Tutorial 2: Something a little more practical – a web page change detector
Now that you’ve gone through the basics of creating an Asphalt application, it’s time to expand your horizons a little. In this tutorial you will learn to use a container component to create a multi-component application and how to set up a configuration file for that.
The application you will build this time will periodically load a web page and see if it has changed since the last check. When changes are detected, it will then present the user with the computed differences between the old and the new versions.
Setting up the project structure
As in the previous tutorial, you will need a project directory and a virtual environment. Create a directory named tutorial2 and make a new virtual environment inside it. Then activate it and use pip to install the asphalt-mailer and aiohttp libraries:
pip install asphalt-mailer aiohttp
This will also pull in the core Asphalt library as a dependency.
Next, create a package directory named webnotifier and a module named app (app.py).
The code in the following sections should be put in the app module (unless explicitly stated otherwise).
Detecting changes in a web page
The first task is to set up a loop that periodically retrieves the web page. For that, you can adapt code from the aiohttp HTTP client tutorial:
from __future__ import annotations

import asyncio
import logging
from typing import Any

import aiohttp
from asphalt.core import CLIApplicationComponent, Context, run_application

logger = logging.getLogger(__name__)


class ApplicationComponent(CLIApplicationComponent):
    async def run(self, ctx: Context) -> None:
        async with aiohttp.ClientSession() as session:
            while True:
                async with session.get("http://imgur.com") as resp:
                    await resp.text()

                await asyncio.sleep(10)


if __name__ == "__main__":
    run_application(ApplicationComponent(), logging=logging.DEBUG)
Great, so now the code fetches the contents of http://imgur.com at 10 second intervals. But this isn’t very useful yet – you need something that compares the old and new versions of the contents somehow. Furthermore, constantly loading the contents of a page exerts unnecessary strain on the hosting provider. We want our application to be as polite and efficient as reasonably possible.
To that end, you can use the if-modified-since header in the request. If the requests after the initial one specify the last modified date value in the request headers, the remote server will respond with a 304 Not Modified if the contents have not changed since that moment.
So, modify the code as follows:
class ApplicationComponent(CLIApplicationComponent):
    async def run(self, ctx: Context) -> None:
        last_modified = None
        async with aiohttp.ClientSession() as session:
            while True:
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get("http://imgur.com", headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        await resp.text()
                        logger.info("Contents changed")

                await asyncio.sleep(10)
The code here stores the date header from the first response and uses it in the if-modified-since header of the next request. A 200 response indicates that the web page has changed, so the last modified date is updated and the contents are retrieved from the response.
Some logging calls were also sprinkled in the code to give you an idea of what’s happening.
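The header handling above can be tried in isolation, without any network access. The following is a minimal sketch; conditional_headers() is a hypothetical helper name that does not appear in the tutorial code, and the timestamp is made up for illustration:

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_modified: str | None) -> dict[str, str]:
    """Build the headers for a conditional GET (hypothetical helper)."""
    # Without a stored timestamp this is the first request: send nothing extra
    return {"if-modified-since": last_modified} if last_modified else {}


# An HTTP date string in the format servers use for the "date" response header
stamp = format_datetime(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), usegmt=True)

first_request = conditional_headers(None)
later_request = conditional_headers(stamp)
print(first_request)
print(later_request)
```

The first request carries no extra headers, while every later one sends the stored date back so the server can answer with 304 Not Modified.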
Computing the changes between old and new versions
Now you have code that actually detects when the page has been modified between the requests. But it doesn’t yet show what in its contents has changed. The next step will then be to use the standard library difflib module to calculate the difference between the contents and send it to the logger:
from difflib import unified_diff


class ApplicationComponent(CLIApplicationComponent):
    async def run(self, ctx: Context) -> None:
        async with aiohttp.ClientSession() as session:
            last_modified, old_lines = None, None
            while True:
                logger.debug("Fetching webpage")
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get("http://imgur.com", headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        new_lines = (await resp.text()).split("\n")
                        if old_lines is not None and old_lines != new_lines:
                            # unified_diff() returns a generator of lines,
                            # so join them into one string for logging
                            difference = "\n".join(unified_diff(old_lines, new_lines))
                            logger.info("Contents changed:\n%s", difference)

                        old_lines = new_lines

                await asyncio.sleep(10)
This modified code now stores the old and new contents in different variables to enable them to be compared. The .split("\n") is needed because unified_diff() requires the input to be iterables of strings. Likewise, the "\n".join(...) is necessary because the output is also an iterable of strings.
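To see the split and join steps on their own, here is a small self-contained sketch using made-up sample text. The fromfile, tofile and lineterm arguments are optional extras that the tutorial code does not use. Passing lineterm="" is one way to keep the header lines free of trailing newlines, so that joining with "\n" does not leave blank lines after them:

```python
from difflib import unified_diff

# Sample contents standing in for two versions of a web page
old_lines = "alpha\nbeta\ngamma".split("\n")
new_lines = "alpha\nBETA\ngamma".split("\n")

# unified_diff() yields the diff one line at a time
diff_lines = list(
    unified_diff(old_lines, new_lines, fromfile="old", tofile="new", lineterm="")
)

# Join the lines into a single string suitable for a log message
diff_text = "\n".join(diff_lines)
print(diff_text)
```

Removed lines are prefixed with "-", added lines with "+", and unchanged context lines with a space.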
Sending changes via email
While an application that logs the changes on the console could be useful on its own, it’d be much
better if it actually notified the user by means of some communication medium, wouldn’t it?
For this specific purpose you need the asphalt-mailer library you installed in the beginning. The next modification will send the HTML formatted differences to you by email.
But, you only have a single component in your app now. To use asphalt-mailer, you will need to add its component to your application somehow. Enter ContainerComponent. With that, you can create a hierarchy of components where the mailer component is a child component of your own container component.
To use the mailer resource provided by asphalt-mailer, inject it into the run() method: decorate the method with @inject and add a keyword-only argument that is annotated with the type of the resource you want (Mailer) and has resource() as its default value.
And to make the results look nicer in an email message, you can switch to using difflib.HtmlDiff to produce the delta output:
from difflib import HtmlDiff

from asphalt.core import inject, resource
from asphalt.mailer.api import Mailer


class ApplicationComponent(CLIApplicationComponent):
    async def start(self, ctx: Context) -> None:
        self.add_component(
            "mailer",
            backend="smtp",
            host="your.smtp.server.here",
            message_defaults={"sender": "your@email.here", "to": "your@email.here"},
        )
        await super().start(ctx)

    @inject
    async def run(self, ctx: Context, *, mailer: Mailer = resource()) -> None:
        async with aiohttp.ClientSession() as session:
            last_modified, old_lines = None, None
            diff = HtmlDiff()
            while True:
                logger.debug("Fetching webpage")
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get("http://imgur.com", headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        new_lines = (await resp.text()).split("\n")
                        if old_lines is not None and old_lines != new_lines:
                            difference = diff.make_file(old_lines, new_lines, context=True)
                            await mailer.create_and_deliver(
                                subject="Change detected in web page",
                                html_body=difference,
                            )
                            logger.info("Sent notification email")

                        old_lines = new_lines

                await asyncio.sleep(10)
You’ll need to replace the host, sender and to arguments for the mailer component and possibly add the username and password arguments if your SMTP server requires authentication.
With these changes, you’ll get a new HTML formatted email each time the code detects changes in the target web page.
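If you want to inspect the HTML before sending any email, HtmlDiff can be run on its own. This is a minimal sketch with made-up input lines; the fromdesc and todesc column titles are optional arguments that the tutorial code does not pass:

```python
from difflib import HtmlDiff

# Two made-up versions of a page, already split into lines
old_lines = ["<h1>Title</h1>", "<p>old text</p>"]
new_lines = ["<h1>Title</h1>", "<p>new text</p>"]

# make_file() returns a complete HTML document as a single string
html = HtmlDiff().make_file(
    old_lines, new_lines, fromdesc="before", todesc="after", context=True
)

print(html[:200])
```

Because the result is a complete HTML document containing a side-by-side table, it can be passed as the html_body of the message as-is. Note that HtmlDiff escapes the markup in the compared lines, so the recipient sees the page source rather than a rendered page.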
Separating the change detection logic
While the application now works as intended, you’re left with two small problems. First off, the target URL and checking frequency are hard coded. That is, they can only be changed by modifying the program code. It is not reasonable to expect non-technical users to modify the code when they want to simply change the target website or the frequency of checks. Second, the change detection logic is hardwired to the notification code. A well designed application should maintain proper separation of concerns. One way to do this is to separate the change detection logic to its own class.
Create a new module named detector in the webnotifier package. Then, add the change event class to it:
from __future__ import annotations

import asyncio
import logging
from typing import Any  # used for the request headers annotation

import aiohttp
from asphalt.core import Component, Context, Event, Signal, context_teardown

logger = logging.getLogger(__name__)


class WebPageChangeEvent(Event):
    def __init__(self, source, topic, old_lines, new_lines):
        super().__init__(source, topic)
        self.old_lines = old_lines
        self.new_lines = new_lines
This class defines the type of event that the notifier will emit when the target web page changes. The old and new content are stored in the event instance to allow the event listener to generate the output any way it wants.
Next, add another class in the same module that will do the HTTP requests and change detection:
class Detector:
    changed = Signal(WebPageChangeEvent)

    def __init__(self, url: str, delay: float):
        self.url = url
        self.delay = delay

    async def run(self) -> None:
        async with aiohttp.ClientSession() as session:
            last_modified, old_lines = None, None
            while True:
                logger.debug("Fetching contents of %s", self.url)
                headers: dict[str, Any] = (
                    {"if-modified-since": last_modified} if last_modified else {}
                )
                async with session.get(self.url, headers=headers) as resp:
                    logger.debug("Response status: %d", resp.status)
                    if resp.status == 200:
                        last_modified = resp.headers["date"]
                        new_lines = (await resp.text()).split("\n")
                        if old_lines is not None and old_lines != new_lines:
                            self.changed.dispatch(old_lines, new_lines)

                        old_lines = new_lines

                await asyncio.sleep(self.delay)
The constructor arguments allow you to freely specify the parameters for the detection process.
The class includes a signal named changed that uses the previously created WebPageChangeEvent class. The code dispatches such an event when a change in the target web page is detected.
Finally, add the component class which will allow you to integrate this functionality into any Asphalt application:
class ChangeDetectorComponent(Component):
    def __init__(self, url: str, delay: float = 10):
        self.url = url
        self.delay = delay

    @context_teardown
    async def start(self, ctx: Context) -> None:
        detector = Detector(self.url, self.delay)
        ctx.add_resource(detector, context_attr='detector')
        task = asyncio.create_task(detector.run())
        logger.info(
            'Started web page change detector for url "%s" with a delay of %d seconds',
            self.url,
            self.delay,
        )

        yield

        # This part is run when the context is being torn down
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        logger.info("Shut down web page change detector")
The component’s start() method starts the detector’s run() method as a new task and adds the detector object as a resource. Because start() is decorated with context_teardown, the code after the yield runs when the context is torn down: it cancels the task and waits for it to finish.
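The start, cancel and wait sequence used for the detector task can be tried with the standard library alone. In this sketch, poll() is a hypothetical stand-in for Detector.run(), and the short sleeps are arbitrary values chosen to keep the example fast:

```python
from __future__ import annotations

import asyncio


async def poll(delay: float, ticks: list[int]) -> None:
    """Hypothetical stand-in for Detector.run(): loops until cancelled."""
    while True:
        ticks.append(len(ticks))
        await asyncio.sleep(delay)


async def main() -> tuple[int, bool]:
    ticks: list[int] = []
    task = asyncio.create_task(poll(0.01, ticks))
    await asyncio.sleep(0.05)  # the rest of the application runs here

    # Teardown: request cancellation, then wait until the task has finished.
    # return_exceptions=True returns the CancelledError as a result
    # instead of raising it in the caller.
    task.cancel()
    results = await asyncio.gather(task, return_exceptions=True)
    return len(ticks), isinstance(results[0], asyncio.CancelledError)


tick_count, was_cancelled = asyncio.run(main())
print(tick_count, was_cancelled)
```

Waiting on the cancelled task matters because cancel() only requests cancellation. The task stops the next time it reaches an await, and gathering it ensures that any cleanup has completed before teardown continues.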
Now that you’ve moved the change detection code to its own module, ApplicationComponent will become somewhat lighter:
from contextlib import aclosing  # on Python < 3.10, import from async_generator or contextlib2

from webnotifier.detector import ChangeDetectorComponent, Detector


class ApplicationComponent(CLIApplicationComponent):
    async def start(self, ctx: Context) -> None:
        self.add_component("detector", ChangeDetectorComponent, url="http://imgur.com")
        self.add_component(
            "mailer",
            backend="smtp",
            host="your.smtp.server.here",
            message_defaults={"sender": "your@email.here", "to": "your@email.here"},
        )
        await super().start(ctx)

    @inject
    async def run(
        self,
        ctx: Context,
        *,
        mailer: Mailer = resource(),
        detector: Detector = resource(),
    ) -> None:
        diff = HtmlDiff()
        async with aclosing(detector.changed.stream_events()) as stream:
            async for event in stream:
                difference = diff.make_file(
                    event.old_lines, event.new_lines, context=True
                )
                await mailer.create_and_deliver(
                    subject=f"Change detected in {event.source.url}",
                    html_body=difference,
                )
                logger.info("Sent notification email")
The main application component will now use the detector resource added by ChangeDetectorComponent. Its run() method iterates over the stream of change events and reacts to each one by creating an HTML formatted difference and sending it to the default recipient.
Once the start() method here has run to completion, the event loop finally has a chance to run the task created for Detector.run(). This will allow the detector to do its work and dispatch those changed events that the event stream in run() is waiting for.
Setting up the configuration file
Now that your application code is in good shape, you will need to give the user an easy way to configure it. This is where YAML configuration files come in handy. They’re clearly structured and are far less intimidating to end users than program code. And you can also have more than one of them, in case you want to run the program with a different configuration.
In your project directory (tutorial2), create a file named config.yaml with the following contents:
---
component:
  type: webnotifier.app:ApplicationComponent
  components:
    detector:
      url: http://imgur.com/
      delay: 15
    mailer:
      host: your.smtp.server.here
      message_defaults:
        sender: your@email.here
        to: your@email.here
logging:
  version: 1
  disable_existing_loggers: false
  formatters:
    default:
      format: '[%(asctime)s %(levelname)s] %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      formatter: default
  root:
    handlers: [console]
    level: INFO
  loggers:
    webnotifier:
      level: DEBUG
The component section defines parameters for the root component. Aside from the special type key which tells the runner where to find the component class, all the keys in this section are passed to the constructor of ApplicationComponent as keyword arguments. Keys under components will match the alias of each child component, which is given as the first argument to add_component(). Any component parameters given here can now be removed from the add_component() call in ApplicationComponent’s code.
The logging configuration here sets up two loggers, one for webnotifier and its descendants and another (root) as a catch-all for everything else. It specifies one handler that just writes all log entries to the console (logging.StreamHandler writes to the standard error stream unless told otherwise). To learn more about what you can do with the logging configuration, consult the Configuration dictionary schema section in the standard library documentation.
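To check how the two loggers interact, the same settings can be written as a Python dictionary and passed to logging.config.dictConfig() directly. This sketch assumes the logging section follows that schema unchanged; the logger names webnotifier.detector and aiohttp.client are only examples of a descendant logger and an unrelated one:

```python
import logging
import logging.config

# The "logging" section of config.yaml, expressed as a Python dictionary
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"default": {"format": "[%(asctime)s %(levelname)s] %(message)s"}},
    "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "default"}},
    "root": {"handlers": ["console"], "level": "INFO"},
    "loggers": {"webnotifier": {"level": "DEBUG"}},
}

logging.config.dictConfig(LOGGING)

# A descendant of "webnotifier" inherits its DEBUG level
app_logger = logging.getLogger("webnotifier.detector")
# Any other logger falls back to the root logger's INFO level
other_logger = logging.getLogger("aiohttp.client")

app_debug = app_logger.isEnabledFor(logging.DEBUG)
other_debug = other_logger.isEnabledFor(logging.DEBUG)
other_info = other_logger.isEnabledFor(logging.INFO)
print(app_debug, other_debug, other_info)
```

Debug messages from the application’s own modules are shown, while third-party libraries only log at INFO and above.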
You can now run your app with the asphalt run command, provided that the project directory is on Python’s search path. When your application is properly packaged and installed in site-packages, this won’t be a problem. But for the purposes of this tutorial, you can temporarily add it to the search path by setting the PYTHONPATH environment variable:
PYTHONPATH=. asphalt run config.yaml
On Windows:
set PYTHONPATH=%CD%
asphalt run config.yaml
Note
The if __name__ == '__main__': block is no longer needed since asphalt run is now used as the entry point for the application.
Conclusion
You now know how to take advantage of Asphalt’s component system to add structure to your application. You’ve learned how to build reusable components and how to make the components work together through the use of resources. Last, but not least, you’ve learned to set up a YAML configuration file for your application and to set up a fine grained logging configuration in it.
You now possess enough knowledge to leverage Asphalt to create practical applications. You are now encouraged to find out what Asphalt component projects exist to aid your application development. Happy coding ☺