[aklug] Re: recursive HTTP file downloading...

From: Shane R. Spencer <shane@bogomip.com>
Date: Thu Jun 10 2010 - 11:46:09 AKDT

I typically have to copy /etc/wgetrc to ~/.wgetrc (or whatever it uses), then disable
robots in the config file, since there is no dedicated command-line flag.
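
A rough sketch of the two approaches discussed in this thread, for illustration only. It writes a scratch copy of the config instead of touching ~/.wgetrc, and it prints the wget commands rather than running them. The scratch path and the variable names (WGETRC_DEMO, URL) are my own assumptions; the -r and -np options are standard wget flags that I am adding here to get the recursive behaviour Blair asked for.

```shell
#!/bin/sh
# Sketch only: build a per-user wgetrc with robots disabled, then show
# the wget invocations. Nothing is downloaded by this script.

# Hypothetical scratch location; in real use this would be ~/.wgetrc.
WGETRC_DEMO="${TMPDIR:-/tmp}/wgetrc.demo"

# Start from the system-wide file if it exists, dropping any existing
# "robots = ..." line so the setting appears exactly once.
if [ -r /etc/wgetrc ]; then
    grep -v '^[[:space:]]*robots[[:space:]]*=' /etc/wgetrc > "$WGETRC_DEMO"
else
    : > "$WGETRC_DEMO"
fi
echo 'robots = off' >> "$WGETRC_DEMO"

URL='http://www.dot.state.ak.us/creg/design/highways/Specs/'

# Option 1: point wget at the edited config via the WGETRC variable.
#   -r   recurse into linked pages and directories
#   -np  never ascend to the parent directory
echo "WGETRC=$WGETRC_DEMO wget -r -np $URL"

# Option 2: skip the config file and pass the setting with -e,
# which executes a wgetrc-style command for this run only.
echo "wget -e robots=off -r -np $URL"
```

Whether ignoring robots.txt is appropriate for a given site is a separate question, as Arthur points out below.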

On 06/10/2010 11:40 AM, Leif Sawyer wrote:
> Supposedly, wget supports a "robots = off" line in .wgetrc, which
> makes it ignore the robots.txt file restrictions.
>
> or
>
> wget -e robots=off
>
> So, no need to hack it.
>
>> -----Original Message-----
>> From: aklug-bounce@aklug.org [mailto:aklug-bounce@aklug.org]
>> On Behalf Of Arthur Corliss
>> Sent: Thursday, June 10, 2010 11:35 AM
>> To: blair parker
>> Cc: aklug@aklug.org
>> Subject: [aklug] Re: recursive HTTP file downloading...
>>
>> On Thu, 10 Jun 2010, blair parker wrote:
>>
>>> Ok... Maybe somebody out there can help me with a recursive download
>>> issue...
>>>
>>> The state DOT has a bunch of specs that my wife wants to download:
>>>
>>> http://www.dot.state.ak.us/creg/design/highways/Specs/
>>>
>>> She wants all of the files, subdirectories included.
>>>
>>> I can't seem to get 'wget' to download any of the files listed, and
>>> 'curl' only downloads files individually. Am I missing something, or
>>> is there some relatively simple, recursive command to download all of
>>> these files?
>>>
>>> Thanks.
>>
>> :-) Looks like their robots.txt forbids it, which wget obeys.
>> That's where it's nice to be able to have the source. You
>> can edit wget's source to ignore robots.txt. Of course,
>> whether or not that's in keeping with proper web etiquette
>> should weigh in your decision. If they catch you they'll
>> certainly try to ban you.
>>
>> --Arthur Corliss
>> Live Free or Die
>> ---------
>> To unsubscribe, send email to <aklug-request@aklug.org> with
>> 'unsubscribe' in the message body.
>>
>>

Received on Thu Jun 10 11:46:20 2010

This archive was generated by hypermail 2.1.8 : Thu Jun 10 2010 - 11:46:20 AKDT