HTML Hacking Scripts

Here are a few useful web-related programs I've written lately. You might also want to look at this interesting program by Abigail.

Shaking Up the Web

Latro finds idiotic PC sites open to perl.exe? abuse and reports their little problem.

HTML Munging

Extract URLs and verify validity; currently only looks for FTP:, HTTP:, and FILE: schemata, stored in A or IMG tags.
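To give a feel for the extraction half of that job, here is a minimal sketch in Python rather than the original script. It collects the HREF of A tags and the SRC of IMG tags and keeps only the three schemes named above. The names `LinkCollector` and `extract_urls` are made up for this illustration, and the validity check (actually contacting each server) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Schemes the description above says are recognized.
SCHEMES = {"ftp", "http", "file"}


class LinkCollector(HTMLParser):
    """Collect URLs found in <a href=...> and <img src=...> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and urlsplit(value).scheme.lower() in SCHEMES:
                self.urls.append(value)


def extract_urls(html):
    """Return the ftp:, http: and file: URLs in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```

Relative links and other schemes such as mailto: are skipped here because they carry none of the three schemes.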

Strip out all the HTML bits from a document, leaving (unformatted) plain text in its wake.
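One way to do this kind of stripping, sketched in Python with the standard `html.parser` module. This is not the original script; `TextExtractor` and `strip_html` are hypothetical names, and the choice to drop the contents of SCRIPT and STYLE elements is an assumption of this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep character data, drop tags, comments, and script/style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the unformatted text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

No attempt is made to lay the text out; whitespace comes through exactly as it appears in the source.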

Strip out comments from an HTML document.
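For well-formed comments, a single non-greedy pattern is enough. The following Python sketch is an illustration only, not the original script, and `strip_comments` is a made-up name. It assumes every comment opens with `<!--` and closes with the next `-->`, so malformed comments or comment-like text inside scripts are not treated specially.

```python
import re

# Non-greedy match from "<!--" to the nearest "-->", across newlines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html):
    """Remove every <!-- ... --> span from the given HTML string."""
    return COMMENT_RE.sub("", html)
```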

Retrieve the title from a URL.
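A rough Python sketch of the idea, not the original script: fetch the page, then pull out the contents of the first TITLE element. The function names are hypothetical, and the fallback to UTF-8 when the server declares no character set is an assumption of this example.

```python
import re
from html import unescape
from urllib.request import urlopen

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_from_html(html):
    """Return the first <title> text with whitespace collapsed, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return " ".join(unescape(match.group(1)).split())


def title_from_url(url, timeout=10):
    """Fetch a URL and return its title (None if the page has no title)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return title_from_html(html)
```

Splitting the parsing from the fetching keeps the title logic usable on documents that are already on disk.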

URL Munging

Given a list of URLs, sort them by last-modified date.
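A sketch of how such a sort could work, in Python rather than the original Perl. It assumes the date comes from the HTTP Last-Modified header of a HEAD request, that the oldest URL goes first, and that URLs with no date go last; none of these choices is confirmed by the description above. `last_modified` and `sort_by_last_modified` are hypothetical names.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def last_modified(url, timeout=10):
    """Return the Last-Modified header as a datetime, or None if absent."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Last-Modified")
    return parsedate_to_datetime(header) if header else None


def sort_by_last_modified(urls, lookup=last_modified):
    """Sort URLs oldest first; URLs without a date keep their order at the end."""
    dated = [(lookup(url), url) for url in urls]
    known = sorted((pair for pair in dated if pair[0] is not None),
                   key=lambda pair: pair[0])
    unknown = [url for stamp, url in dated if stamp is None]
    return [url for _, url in known] + unknown
```

The `lookup` parameter lets the sort be exercised without touching the network, for example with dates read from a table.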

Given one URL, extract all URLs it contains. Uses the LWP library, and is pretty complete.
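The canonicalizing step, turning relative links into absolute ones against the page's own URL, can be sketched in Python with `urllib.parse.urljoin`. This stands in for what the original does with LWP and is not a port of it. `AnyLinkCollector` and `absolute_urls` are made-up names, and looking only at HREF and SRC attributes is a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnyLinkCollector(HTMLParser):
    """Collect every href and src attribute value, whatever the tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def absolute_urls(html, base_url):
    """Return the links in html, each resolved against base_url."""
    collector = AnyLinkCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base_url, link) for link in collector.links]
```

Fetching the page itself is left out; the HTML is assumed to be in hand already.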

Somewhat like xurl (the name means ``quick xurl''), but it expects to read from files, not URLs, and doesn't canonicalize relative links. It also runs about 100x faster and doesn't require an external library.
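The quick approach amounts to a pattern match over the raw text, with no parsing and no link resolution. Here is a minimal Python sketch under that assumption; the pattern and the names `quick_urls` and `main` are illustrative and not taken from the original script.

```python
import re

# Grab the value of href= or src= inside an <a ...> or <img ...> tag.
# Quotes around the value are optional, as in much hand-written HTML.
ATTR_URL_RE = re.compile(
    r"""<\s*(?:a|img)\b[^>]*?\b(?:href|src)\s*=\s*["']?([^"'\s>]+)""",
    re.IGNORECASE,
)


def quick_urls(text):
    """Return link values exactly as written; relative links stay relative."""
    return ATTR_URL_RE.findall(text)


def main(paths):
    """Print the links found in each named file, one per line."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for url in quick_urls(handle.read()):
                print(url)
```

Calling `main(sys.argv[1:])` from a small wrapper would turn this into a command-line filter over files.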

reltree: Fix up a tree's URLs to make them all relative instead of absolute.
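The heart of such a fix-up is computing, for one link on one page, the relative path between them. A Python sketch of that single step follows, assuming both URLs are absolute and that links to other hosts should be left alone. `relativize` is a hypothetical name, and walking the tree and rewriting the files is not shown.

```python
import posixpath
from urllib.parse import urlsplit


def relativize(link, page_url):
    """Rewrite link relative to page_url when both live on the same site."""
    link_parts = urlsplit(link)
    page_parts = urlsplit(page_url)
    if (link_parts.scheme, link_parts.netloc) != (page_parts.scheme, page_parts.netloc):
        return link  # different site (or already relative): leave untouched

    page_dir = posixpath.dirname(page_parts.path) or "/"
    relative = posixpath.relpath(link_parts.path or "/", start=page_dir)
    if link_parts.path.endswith("/"):
        relative += "/"  # relpath drops the trailing slash of a directory URL
    if link_parts.query:
        relative += "?" + link_parts.query
    if link_parts.fragment:
        relative += "#" + link_parts.fragment
    return relative
```

For a page at http://www.example.com/docs/guide/index.html, a link to http://www.example.com/docs/img/logo.gif becomes ../img/logo.gif.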

Netscape Munging

Grovel global history. Search or dump out the Netscape global history file.