Watching URLs For Changes

It's often useful to get a notification when a webpage changes. Here's a simple shell script for that:

#!/usr/bin/env bash

set -euo pipefail

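url=$1        # URL to watch
query=${2:-}  # optional XPath query selecting the part of the page to watch

# One snapshot file per URL/query pair; slashes become underscores so the name is a single path component.
state="${XDG_DATA_HOME:-$HOME/.local/share}/web-watch/${url//\//_}-${query//\//_}"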
mkdir -p "$(dirname "$state")"

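# Work in a private temporary directory that is removed on exit, even if a step fails.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

tmp="$tmpdir/page"
curl --silent --fail --location --output "$tmp" "$url"
if [[ -n "$query" ]]; then
  # Keep only the nodes selected by the XPath query.
  xmllint --html --xpath "$query" "$tmp" > "$tmpdir/processed" 2>/dev/null
  mv "$tmpdir/processed" "$tmp"
fi
# Print a diff against the previous snapshot, if any. diff exits 1 when the files differ, which is not an error here.
[[ -e "$state" ]] && diff -u "$state" "$tmp" || :
mv "$tmp" "$state"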

That is, each time it runs, it prints a diff between the content fetched on the previous run and the current content. Optionally, it can watch only part of the page, selected by an XPath query passed as the second argument.
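To make the run-to-run behaviour concrete, here is a minimal offline sketch of the compare-then-store step. It replaces the download with local files, and `report_change`, `demo_dir` and the sample content are hypothetical names introduced only for illustration; they are not part of the script above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# report_change is a hypothetical helper, not part of the script above.
# It prints a unified diff against the previous snapshot (if one exists),
# then stores the new content as the snapshot for the next run.
report_change() {
  local state=$1 new=$2
  if [[ -e "$state" ]]; then
    diff -u "$state" "$new" || :   # diff exits 1 when the files differ
  fi
  mv "$new" "$state"
}

demo_dir=$(mktemp -d)
state="$demo_dir/state"

# Run 1: no snapshot yet, so nothing is printed.
printf 'chapter 1\n' > "$demo_dir/new"
first=$(report_change "$state" "$demo_dir/new")

# Run 2: the content gained a line, so a diff is printed.
printf 'chapter 1\nchapter 2\n' > "$demo_dir/new"
second=$(report_change "$state" "$demo_dir/new")

# Run 3: unchanged content, so nothing is printed again.
printf 'chapter 1\nchapter 2\n' > "$demo_dir/new"
third=$(report_change "$state" "$demo_dir/new")

printf '%s\n' "$second"
```

Only the second run produces output, which is the property that makes the script suitable for unattended use: silence means nothing changed.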

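It works very nicely with cron: cron mails a job's output to the crontab's owner (assuming mail delivery is set up), so a message arrives only when the diff is non-empty. Some examples: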

50 3 * * 2 web-watch https://spectrum-os.org/
40 5 * * * web-watch https://revolutionrobotics.org/collections/all '//*[contains(@class,"grid-product__tag") or contains(@class, "collection-filter__item--count")]/text()' # Shut up and take my money!

# New chapter available?
33 7 * * 3 web-watch https://aphyr.com/tags/interviews '//article//h1//text()'
24 4 * * * web-watch https://palewebserial.wordpress.com/table-of-contents/ '//article//a//text()'
43 2 * * * web-watch https://www.royalroad.com/fiction/45534/this-used-to-be-about-dungeons '//table[@id="chapters"]//td[1]/a/text()'
37 6 * * * web-watch https://www.projectlawful.com/ '//*[contains(@class, "post-subject") or contains(@class, "post-replies")]/a[1]/text()'
38 6 * * * web-watch https://www.projectlawful.com/board_sections/721 '//*[contains(@class, "post-subject") or contains(@class, "post-replies")]/a[1]/text()'
47 7 * * * web-watch https://mangaclash.com/manga/tomo-chan-wa-onnanoko/ '//li[contains(@class, "wp-manga-chapter")]/a/text()'

# New versions available?
44 5 * * 0 web-watch https://nixos.org/download.html '//*[@id = "download-nixos"]//a/text()'
36 4 * * * web-watch https://www.qemu.org/download/ '//article[@id="source"]//a[not(contains(@href,"wiki.qemu.org"))][contains(text(),".")]/text()'
23 2 * * * web-watch https://www.gnucash.org/ '//h2[@id="dwnld-box"]/text()'
41 0 * * * web-watch https://liballeg.org/feed_atom.xml '//*[local-name()="title" and contains(., "released")]/text()'
37 2 * * * web-watch https://pypi.org/rss/project/backoff/releases.xml '//*[local-name()="title"]/text()'
44 6 * * * web-watch https://ftp.gnu.org/gnu/gawk/ '//a/text()'
27 7 * * * web-watch https://ftp.gnu.org/gnu/gzip/ '//a/text()'
46 0 * * * web-watch https://www.minetest.net/downloads/ '//a[contains(text(), " portable")]/text()'
18 3 * * * web-watch 'https://git.librecmc.org/?p=librecmc/librecmc.git;a=tags' '//tr[position()<3]//a[@class="list name"]/text()'
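
Each entry keeps its snapshot in its own file, named after the URL and the query. The following sketch shows where a given entry's snapshot ends up; `state_path` is a hypothetical helper that mirrors the script's naming rule and is not part of the script itself.

```shell
#!/usr/bin/env bash

# state_path is a hypothetical helper; it mirrors the script's naming rule:
# every "/" in the URL and in the query becomes "_", and the two are joined by "-".
state_path() {
  local url=$1 query=${2:-}
  printf '%s/web-watch/%s-%s\n' \
    "${XDG_DATA_HOME:-$HOME/.local/share}" "${url//\//_}" "${query//\//_}"
}

# Example: locate the snapshot for the gawk entry.
state_path https://ftp.gnu.org/gnu/gawk/ '//a/text()'
```

Deleting that file resets the watch: the next run stores a fresh snapshot without printing a diff.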