Tutorial 2: Something a little more practical – a web page change detector

Now that you’ve gone through the basics of creating an Asphalt application, it’s time to expand your horizons a little. In this tutorial you will learn to use a container component to create a multi-component application and how to set up a configuration file for that.

The application you will build this time will periodically load a web page and see if it has changed since the last check. When changes are detected, it will then present the user with the computed differences between the old and the new versions.

Setting up the project structure

As in the previous tutorial, you will need a project directory and a virtual environment. Create a directory named tutorial2 and make a new virtual environment inside it. Then activate it and use pip to install the asphalt-mailer and aiohttp libraries:

pip install asphalt-mailer aiohttp

This will also pull in the core Asphalt library as a dependency.

Next, create a package directory named webnotifier and, inside it, a module named app (app.py). The code in the following sections should be put in the app module (unless explicitly stated otherwise).

Detecting changes in a web page

The first task is to set up a loop that periodically retrieves the web page. For that, you can adapt code from the aiohttp HTTP client tutorial:

from __future__ import annotations

import asyncio
import logging
from typing import Any

import aiohttp
from asphalt.core import CLIApplicationComponent, Context, run_application

logger = logging.getLogger(__name__)


class ApplicationComponent(CLIApplicationComponent):
    async def run(self, ctx: Context) -> None:
        async with aiohttp.ClientSession() as session:
            while True:
                async with session.get("http://imgur.com") as resp:
                    await resp.text()

                await asyncio.sleep(10)

if __name__ == "__main__":
    run_application(ApplicationComponent(), logging=logging.DEBUG)

Great, so now the code fetches the contents of http://imgur.com at 10 second intervals. But this isn’t very useful yet – you need something that compares the old and new versions of the contents somehow. Furthermore, constantly loading the contents of a page exerts unnecessary strain on the hosting provider. We want our application to be as polite and efficient as reasonably possible.

To that end, you can use the if-modified-since header in the request. If the requests after the initial one specify the last modified date value in the request headers, the remote server will respond with a 304 Not Modified if the contents have not changed since that moment.

So, modify the code as follows:

class ApplicationComponent(CLIApplicationComponent):
    async def run(self, ctx: Context) -> None:
        last_modified = None
        async with aiohttp.ClientSession() as session:
            while True:
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get("http://imgur.com", headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        await resp.text()
                        logger.info("Contents changed")

                await asyncio.sleep(10)

The code here stores the date header from the first response and uses it in the if-modified-since header of the next request. A 200 response indicates that the web page has changed so the last modified date is updated and the contents are retrieved from the response. Some logging calls were also sprinkled in the code to give you an idea of what’s happening.
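The bookkeeping behind the conditional request can also be looked at in isolation. The following is a minimal sketch using only the standard library. The helper names build_headers and next_last_modified are hypothetical and exist only for illustration; they are not part of aiohttp or Asphalt.

```python
from __future__ import annotations

from email.utils import formatdate


def build_headers(last_modified: str | None) -> dict[str, str]:
    """Return request headers, adding if-modified-since once a date is known."""
    return {"if-modified-since": last_modified} if last_modified else {}


def next_last_modified(
    status: int, response_headers: dict[str, str], last_modified: str | None
) -> str | None:
    """Adopt the response's date header on 200, keep the previous value otherwise."""
    if status == 200:
        return response_headers["date"]
    return last_modified


# First request: no date is known yet, so no conditional header is sent
first_headers = build_headers(None)

# Simulated 200 response carrying an HTTP-date in its date header
http_date = formatdate(0, usegmt=True)
last_modified = next_last_modified(200, {"date": http_date}, None)

# Later requests carry the stored date; a 304 response leaves it unchanged
headers = build_headers(last_modified)
unchanged = next_last_modified(304, {}, last_modified)
```

Keeping the previous value on any non-200 status mirrors the tutorial code, which only touches last_modified inside the resp.status == 200 branch.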

Computing the changes between old and new versions

Now you have code that actually detects when the page has been modified between the requests. But it doesn’t yet show what in its contents has changed. The next step will then be to use the standard library difflib module to calculate the difference between the contents and send it to the logger:

from difflib import unified_diff


class ApplicationComponent(CLIApplicationComponent):
    async def run(self, ctx: Context) -> None:
        async with aiohttp.ClientSession() as session:
            last_modified, old_lines = None, None
            while True:
                logger.debug("Fetching webpage")
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get("http://imgur.com", headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        new_lines = (await resp.text()).split("\n")
                        if old_lines is not None and old_lines != new_lines:
                            difference = "\n".join(unified_diff(old_lines, new_lines))
                            logger.info("Contents changed:\n%s", difference)

                        old_lines = new_lines

                await asyncio.sleep(10)

This modified code now stores the old and new contents in different variables to enable them to be compared. The .split("\n") is needed because unified_diff() requires the input to be iterables of strings. Likewise, the "\n".join(...) is necessary because the output is also an iterable of strings.
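To see the split and join steps on their own, here is a small sketch that runs unified_diff() on two made-up strings. The sample text is an assumption for illustration. Passing lineterm="" is an optional refinement that is not used in the tutorial code: it keeps the control lines free of trailing newlines, which suits lines that were split without their line endings.

```python
from difflib import unified_diff

old_text = "first line\nsecond line\nthird line"
new_text = "first line\nsecond line (edited)\nthird line"

# unified_diff() works on sequences of lines, not on whole strings
old_lines = old_text.split("\n")
new_lines = new_text.split("\n")

# The result is a generator of lines, so it has to be joined before display
diff_lines = list(unified_diff(old_lines, new_lines, lineterm=""))
difference = "\n".join(diff_lines)
print(difference)
```

Removed lines are prefixed with a minus sign, added lines with a plus sign, and unchanged context lines with a space.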

Sending changes via email

While an application that logs the changes on the console could be useful on its own, it’d be much better if it actually notified the user by means of some communication medium, wouldn’t it? For this specific purpose you need the asphalt-mailer library you installed in the beginning. The next modification will send the HTML formatted differences to you by email.

But you only have a single component in your app now. To use asphalt-mailer, you will need to add its component to your application somehow. Enter ContainerComponent. CLIApplicationComponent, which your application component already extends, is itself a ContainerComponent, so you can create a hierarchy of components where the mailer component is a child component of your own container component.

To use the mailer resource provided by asphalt-mailer, inject it into the run() method by adding a keyword-only argument annotated with the type of the resource you want to inject (Mailer) and giving it resource() as its default value.

And to make the results look nicer in an email message, you can switch to using difflib.HtmlDiff to produce the delta output:

from difflib import HtmlDiff

from asphalt.core import inject, resource
from asphalt.mailer.api import Mailer


class ApplicationComponent(CLIApplicationComponent):
    async def start(self, ctx: Context) -> None:
        self.add_component(
            "mailer", backend="smtp", host="your.smtp.server.here",
            message_defaults={"sender": "your@email.here", "to": "your@email.here"})
        await super().start(ctx)

    @inject
    async def run(self, ctx: Context, *, mailer: Mailer = resource()) -> None:
        async with aiohttp.ClientSession() as session:
            last_modified, old_lines = None, None
            diff = HtmlDiff()
            while True:
                logger.debug("Fetching webpage")
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get("http://imgur.com", headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        new_lines = (await resp.text()).split("\n")
                        if old_lines is not None and old_lines != new_lines:
                            difference = diff.make_file(old_lines, new_lines, context=True)
                            await mailer.create_and_deliver(
                                subject="Change detected in web page",
                                html_body=difference
                            )
                            logger.info("Sent notification email")

                        old_lines = new_lines

                await asyncio.sleep(10)

You’ll need to replace the host, sender and to arguments for the mailer component and possibly add the username and password arguments if your SMTP server requires authentication.

With these changes, you’ll get a new HTML formatted email each time the code detects changes in the target web page.
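If you want to inspect the HTML before wiring up a mail server, HtmlDiff can be tried on its own. This is a minimal sketch with made-up input lines. It only shows that make_file() returns a complete HTML document as a single string, which is what gets passed as html_body above.

```python
from difflib import HtmlDiff

# Made-up page contents, already split into lines
old_lines = ["<h1>Title</h1>", "<p>Old paragraph</p>"]
new_lines = ["<h1>Title</h1>", "<p>New paragraph</p>"]

diff = HtmlDiff()
# context=True limits the output to changed lines plus some surrounding context
html = diff.make_file(old_lines, new_lines, context=True)

# The result is one string containing a full HTML page with a comparison table
print(type(html).__name__, len(html))
```

Writing the string to a file and opening it in a browser is a quick way to preview what the email will look like.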

Separating the change detection logic

While the application now works as intended, you’re left with two small problems. First off, the target URL and checking frequency are hard coded. That is, they can only be changed by modifying the program code. It is not reasonable to expect non-technical users to modify the code when they want to simply change the target website or the frequency of checks. Second, the change detection logic is hardwired to the notification code. A well designed application should maintain proper separation of concerns. One way to do this is to separate the change detection logic to its own class.

Create a new module named detector in the webnotifier package. Then, add the change event class to it:

import asyncio
import logging
from typing import Any

import aiohttp
from asphalt.core import Component, Context, Event, Signal, context_teardown

logger = logging.getLogger(__name__)


class WebPageChangeEvent(Event):
    def __init__(self, source, topic, old_lines, new_lines):
        super().__init__(source, topic)
        self.old_lines = old_lines
        self.new_lines = new_lines

This class defines the type of event that the detector will emit when the target web page changes. The old and new content are stored in the event instance to allow the event listener to generate the output any way it wants.

Next, add another class in the same module that will do the HTTP requests and change detection:

class Detector:
    changed = Signal(WebPageChangeEvent)

    def __init__(self, url: str, delay: float):
        self.url = url
        self.delay = delay

    async def run(self) -> None:
        async with aiohttp.ClientSession() as session:
            last_modified, old_lines = None, None
            while True:
                logger.debug("Fetching contents of %s", self.url)
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get(self.url, headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        new_lines = (await resp.text()).split("\n")
                        if old_lines is not None and old_lines != new_lines:
                            self.changed.dispatch(old_lines, new_lines)

                        old_lines = new_lines

                await asyncio.sleep(self.delay)

The constructor arguments allow you to freely specify the parameters for the detection process. The class includes a signal named changed that uses the previously created WebPageChangeEvent class. The code dispatches such an event when a change in the target web page is detected.
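The idea behind a signal is that the producer of an event does not need to know who consumes it. The sketch below illustrates that idea with the standard library only. MiniSignal and ChangeEvent are hypothetical stand-ins written for this example and are much simpler than Asphalt's Signal and Event classes.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    """Hypothetical stand-in for WebPageChangeEvent."""

    old_lines: list[str]
    new_lines: list[str]


class MiniSignal:
    """Toy signal: every subscriber queue receives each dispatched event."""

    def __init__(self) -> None:
        self._queues: list[asyncio.Queue[ChangeEvent]] = []

    def subscribe(self) -> asyncio.Queue[ChangeEvent]:
        queue: asyncio.Queue[ChangeEvent] = asyncio.Queue()
        self._queues.append(queue)
        return queue

    def dispatch(self, old_lines: list[str], new_lines: list[str]) -> None:
        # Build the event once and hand it to every subscriber
        event = ChangeEvent(old_lines, new_lines)
        for queue in self._queues:
            queue.put_nowait(event)


async def demo() -> ChangeEvent:
    changed = MiniSignal()
    inbox = changed.subscribe()
    # The producer only knows about the signal, not about who is listening
    changed.dispatch(["old"], ["new"])
    return await inbox.get()


event = asyncio.run(demo())
```

The detector follows the same pattern: it calls dispatch() with the old and new lines and leaves it to the consumers to decide what to do with them.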

Finally, add the component class which will allow you to integrate this functionality into any Asphalt application:

class ChangeDetectorComponent(Component):
    def __init__(self, url: str, delay: float = 10):
        self.url = url
        self.delay = delay

    @context_teardown
    async def start(self, ctx: Context) -> None:
        detector = Detector(self.url, self.delay)
        ctx.add_resource(detector, context_attr="detector")
        task = asyncio.create_task(detector.run())
        logger.info(
            'Started web page change detector for url "%s" with a delay of %d seconds',
            self.url,
            self.delay,
        )

        yield

        # This part is run when the context is being torn down
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        logger.info("Shut down web page change detector")

The component’s start() method adds the detector object as a resource and starts the detector’s run() method as a new task. Because start() is decorated with @context_teardown, the code after the yield runs when the context is torn down: it cancels the task and waits for it to finish.
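The start, yield, then clean up shape is conceptually similar to an asynchronous context manager. The following standard library sketch shows the same task lifecycle without Asphalt. The names poll_forever and background_worker are hypothetical, and the short sleep intervals are chosen only to keep the example fast.

```python
import asyncio
from contextlib import asynccontextmanager


async def poll_forever(log: list) -> None:
    """Hypothetical worker standing in for Detector.run()."""
    while True:
        log.append("tick")
        await asyncio.sleep(0.01)


@asynccontextmanager
async def background_worker(log: list):
    # Setup: start the worker as a task so it runs alongside the caller
    task = asyncio.create_task(poll_forever(log))
    try:
        yield task
    finally:
        # Teardown: request cancellation, then wait until the task has finished
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        log.append("stopped")


async def main() -> list:
    log: list = []
    async with background_worker(log) as task:
        await asyncio.sleep(0.05)  # the "application" runs here
    assert task.cancelled()
    return log


log = asyncio.run(main())
```

Passing return_exceptions=True to gather() keeps the expected CancelledError from propagating out of the teardown code.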

Now that you’ve moved the change detection code to its own module, ApplicationComponent will become somewhat lighter:

from contextlib import aclosing  # on Python < 3.10, import from async_generator or contextlib2

from webnotifier.detector import ChangeDetectorComponent, Detector


class ApplicationComponent(CLIApplicationComponent):
    async def start(self, ctx: Context) -> None:
        self.add_component("detector", ChangeDetectorComponent, url="http://imgur.com")
        self.add_component(
            "mailer", backend="smtp", host="your.smtp.server.here",
            message_defaults={"sender": "your@email.here", "to": "your@email.here"})
        await super().start(ctx)

    @inject
    async def run(
        self,
        ctx: Context,
        *,
        mailer: Mailer = resource(),
        detector: Detector = resource(),
    ):
        diff = HtmlDiff()
        async with aclosing(detector.changed.stream_events()) as stream:
            async for event in stream:
                difference = diff.make_file(
                    event.old_lines, event.new_lines, context=True
                )
                await mailer.create_and_deliver(
                    subject=f"Change detected in {event.source.url}",
                    html_body=difference,
                )
                logger.info("Sent notification email")

The main application component will now use the detector resource added by ChangeDetectorComponent. Instead of running the detection loop itself, it iterates over the stream of events from the detector’s changed signal and reacts to each one by creating an HTML formatted difference and sending it to the default recipient.

Once the start() method here has run to completion, the event loop finally has a chance to run the task created for Detector.run(). This will allow the detector to do its work and dispatch the changed events that the loop in run() is waiting for.

Setting up the configuration file

Now that your application code is in good shape, you will need to give the user an easy way to configure it. This is where YAML configuration files come in handy. They’re clearly structured and are far less intimidating to end users than program code. And you can also have more than one of them, in case you want to run the program with a different configuration.

In your project directory (tutorial2), create a file named config.yaml with the following contents:

---
component:
  type: webnotifier.app:ApplicationComponent
  components:
    detector:
      url: http://imgur.com/
      delay: 15
    mailer:
      host: your.smtp.server.here
      message_defaults:
        sender: your@email.here
        to: your@email.here

logging:
  version: 1
  disable_existing_loggers: false
  formatters:
    default:
      format: '[%(asctime)s %(levelname)s] %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      formatter: default
  root:
    handlers: [console]
    level: INFO
  loggers:
    webnotifier:
      level: DEBUG

The component section defines parameters for the root component. Aside from the special type key which tells the runner where to find the component class, all the keys in this section are passed to the constructor of ApplicationComponent as keyword arguments. Keys under components will match the alias of each child component, which is given as the first argument to add_component(). Any component parameters given here can now be removed from the add_component() call in ApplicationComponent’s code.

The logging configuration here sets up two loggers, one for webnotifier and its descendants and another (root) as a catch-all for everything else. It specifies one handler that just writes all log entries to the standard error stream, which is the default for StreamHandler. To learn more about what you can do with the logging configuration, consult the Configuration dictionary schema section in the standard library documentation.
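Since the logging section follows the standard library's configuration dictionary schema, you can try the same settings directly with logging.config.dictConfig(). The sketch below repeats the YAML structure as a Python dictionary and checks which level each logger ends up with. The logger name some.other.library is made up for illustration.

```python
import logging
import logging.config

# The same structure as the logging section of config.yaml, as a Python dict
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {"format": "[%(asctime)s %(levelname)s] %(message)s"}
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "default"}
    },
    "root": {"handlers": ["console"], "level": "INFO"},
    "loggers": {"webnotifier": {"level": "DEBUG"}},
}

logging.config.dictConfig(LOGGING_CONFIG)

# Loggers under "webnotifier" inherit DEBUG; everything else falls back to INFO
detector_level = logging.getLogger("webnotifier.detector").getEffectiveLevel()
other_level = logging.getLogger("some.other.library").getEffectiveLevel()
```

This is why the debug messages from your own modules show up while third-party libraries only log at INFO and above.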

You can now run your app with the asphalt run command, provided that the project directory is on Python’s search path. When your application is properly packaged and installed in site-packages, this won’t be a problem. But for the purposes of this tutorial, you can temporarily add it to the search path by setting the PYTHONPATH environment variable:

PYTHONPATH=. asphalt run config.yaml

On Windows:

set PYTHONPATH=%CD%
asphalt run config.yaml

Note

The if __name__ == '__main__': block is no longer needed since asphalt run is now used as the entry point for the application.

Conclusion

You now know how to take advantage of Asphalt’s component system to add structure to your application. You’ve learned how to build reusable components and how to make the components work together through the use of resources. Last, but not least, you’ve learned to set up a YAML configuration file for your application and to set up a fine grained logging configuration in it.

You now possess enough knowledge to leverage Asphalt to create practical applications. You are now encouraged to find out what Asphalt component projects exist to aid your application development. Happy coding ☺