archiver
any request whose user-agent contains "atmmachine.nz" is a user-initiated crawl
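for example, to pick these requests out of a server access log (the log path here is just an illustration, adjust for your setup):

    # case-insensitive match on the crawler's user-agent string
    grep -i "atmmachine.nz" /var/log/nginx/access.log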
for website operators
purpose
crawls are for archival purposes exclusively. at this stage all archives are kept locally only (with a local duplicate) and accessed using py-wb, which is not exposed to the internet.
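a minimal sketch of that setup, assuming py-wb here refers to the pywb replay tool (the collection name and warc filename are made up for illustration):

    # import a crawl into a local pywb collection
    wb-manager init local-archive
    wb-manager add local-archive crawl-2024.warc.gz
    # replay it, bound to the loopback interface only so it is never
    # reachable from the internet
    wayback --bind 127.0.0.1 --port 8080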
following robots.txt
if the crawl is a sitewide crawl, then Crawl-delay and Disallow rules in robots.txt will be observed for the following user-agents: *; Wget; atmmachine.nz; and whatever user-agent is currently in use (probably atmmachine.nz).
however, if the crawl is targeted (meaning it will only recurse one level from the target url), robots.txt will not be followed.
for both crawl types, a default crawl delay of one second between requests is in force.
robots.txt is only checked at the beginning of a crawl; any changes made to it while the crawl is running will not be picked up automatically.
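the script itself isn't published yet (see below), but the behaviour described above maps roughly onto wget invocations like these. note that wget on its own would not honour Crawl-delay or match robots.txt rules against multiple agent names, so the real script presumably does its own robots.txt handling; this is only a sketch, and the user-agent string and urls are placeholders:

    # sitewide crawl: recursive mirror, robots.txt honoured (wget's default
    # in recursive mode), one second between requests
    wget --mirror --wait=1 \
         --user-agent="archiver/0.1 (atmmachine.nz)" \
         https://example.com/

    # targeted crawl: recurse one level from the target url, ignore
    # robots.txt, same one-second delay
    wget -r -l 1 -e robots=off --wait=1 \
         --user-agent="archiver/0.1 (atmmachine.nz)" \
         https://example.com/some/page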
should you have any concerns, feel free to get in touch: atm@atmmachine.nz.
more
planning to put the bash script used for this on codeberg at some point, but things need polishing up first, including removing some hard-coded defaults.