[aklug] Re: recursive HTTP file downloading...

From: Leif Sawyer <lsawyer@gci.com>
Date: Thu Jun 10 2010 - 11:40:10 AKDT

Supposedly, wget supports a "robots = off" line in .wgetrc, which
will make it ignore the robots.txt restrictions.

or

wget -e robots=off

So, no need to hack the source.
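
Something like this ought to pull down the whole Specs/ tree in one
shot (untested against their server, so the flags may need tweaking):

wget -r -np -nH --cut-dirs=4 -e robots=off -R "index.html*" \
     http://www.dot.state.ak.us/creg/design/highways/Specs/

(-r recurses, -np keeps it from climbing back up to the parent
directories, -nH and --cut-dirs=4 trim the saved path down to just
the Specs/ contents, and -R skips the auto-generated index pages.)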

> -----Original Message-----
> From: aklug-bounce@aklug.org [mailto:aklug-bounce@aklug.org]
> On Behalf Of Arthur Corliss
> Sent: Thursday, June 10, 2010 11:35 AM
> To: blair parker
> Cc: aklug@aklug.org
> Subject: [aklug] Re: recursive HTTP file downloading...
>
> On Thu, 10 Jun 2010, blair parker wrote:
>
> > Ok... Maybe somebody out there can help me with a
> recursive download
> > issue...
> >
> > The state DOT has a bunch of specs that my wife wants to download:
> >
> > http://www.dot.state.ak.us/creg/design/highways/Specs/
> >
> > She wants all of the files, subdirectories included.
> >
> > I can't seem to get 'wget' to download any of the files listed, and
> > 'curl' only downloads files individually. Am I missing
> something, or
> > is there some relatively simple, recursive command to
> download all of
> > these files?
> >
> > Thanks.
>
> :-) Looks like their robots.txt forbids it, which wget obeys.
> That's where it's nice to have the source: you can edit
> wget's source to ignore robots.txt. Of course, whether or not
> that's in keeping with proper web etiquette should factor
> into your decision. If they catch you, they'll certainly try
> to ban you.
>
> --Arthur Corliss
> Live Free or Die
> ---------
> To unsubscribe, send email to <aklug-request@aklug.org> with
> 'unsubscribe' in the message body.
>
>
---------
To unsubscribe, send email to <aklug-request@aklug.org>
with 'unsubscribe' in the message body.
Received on Thu Jun 10 11:40:15 2010
