North American Network Operators Group|
RE: Crawler Etiquette
- From: Hunter, Jonathan
- Date: Thu Jan 24 06:45:25 2002
> a) Obey robots.txt files
> b) Allow network admins to automatically have their
> netblocks exempted on request
> c) Allow ISP's caches to sync with it.
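For what it's worth, (a) is cheap to honour in code. As a rough sketch, Python's standard-library parser can answer "may this user-agent fetch this URL?" given the text of a site's robots.txt (the user-agent string and URLs below are just placeholders):

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, target_url):
    """Decide whether `user_agent` may fetch `target_url`, given the
    text of the site's robots.txt (fetched separately by the crawler)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, target_url)

# Example: a robots.txt that fences off /private/ for all crawlers.
rules = "User-agent: *\nDisallow: /private/\n"
allowed(rules, "examplebot", "http://www.example.com/index.html")   # permitted
allowed(rules, "examplebot", "http://www.example.com/private/x")    # blocked
```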
I don't know if this is already on your list, but I'd also suggest "d) Rate-limiting of requests to a netblock/server". I haven't got any references immediately to hand, but I do seem to recall a crawler written in such a way that it remained "server-friendly" and would not fire off too many requests too quickly.
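The "server-friendly" behaviour I'm thinking of boils down to remembering when you last hit each host and sleeping until a minimum interval has passed. A minimal sketch (the two-second delay is just an assumption; a real crawler might honour a Crawl-delay directive instead):

```python
import time
from urllib.parse import urlparse

class PoliteFetcher:
    """Per-host rate limiter: enforces at least `delay` seconds
    between successive requests to the same host."""

    def __init__(self, delay=2.0):
        self.delay = delay
        self.last_request = {}  # host -> monotonic time of last request

    def wait(self, url):
        """Block until it is polite to fetch `url`, then record the request."""
        host = urlparse(url).netloc
        earliest = self.last_request.get(host, 0.0) + self.delay
        now = time.monotonic()
        if now < earliest:
            time.sleep(earliest - now)
        self.last_request[host] = time.monotonic()
```

The same dictionary could be keyed on a netblock rather than a hostname if you want coarser-grained limiting.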
> ISPs who cache would have an advantage if they used the cache
> developed by this project to load their tables, but I do not
> know if there is an internet-wide WCCP or equivalent out there
> or if the improvement is worth the management overhead.
It may be worth having a quick look at http://www.ircache.net/ - there is a database of known caches available through a WHOIS interface, amongst other things.
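Querying that kind of database is simple, since WHOIS is just "open TCP port 43, send one line, read until EOF". A bare-bones client sketch (the server name and query are placeholders; substitute whatever the ircache.net interface expects):

```python
import socket

def whois_query(server, query, port=43, timeout=10):
    """Minimal WHOIS client: send the query line to `server` and
    return the response text, reading until the server closes."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```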