Simple scraping with wget: Difference between revisions

Revision as of 15:54, 19 May 2014

From Roel, a very nice one-liner:

 wget --random-wait -r -p -e robots=off -U mozilla www.somepage.com

The options:

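--random-wait: Varies the pause between requests so the traffic looks less mechanical. It scales the --wait interval, so it is meant to be used together with --wait.
-r (--recursive): Recursive (i.e. keep following links; the default depth is 5 levels).
-p (--page-requisites): Download the files each page needs to display properly, such as images and stylesheets.
-e (--execute): Runs a command as if it were part of .wgetrc; in this case robots=off makes wget ignore robots.txt.
-U (--user-agent): Sets the User-Agent header, in this case to the string "mozilla" instead of wget's default, so the request looks more like it comes from a browser.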
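
For readability, the same one-liner can be spelled with long options. The sketch below only prints the command (a dry run) rather than fetching anything. The build_cmd helper is a hypothetical name used for illustration, and --wait=1 is an assumption added here, not part of Roel's original command: --random-wait varies the --wait interval, so it needs a base delay to work with.

```shell
#!/bin/sh
# Print the one-liner with long options for a given start URL.
# build_cmd is a hypothetical helper; --wait=1 is an added assumption
# (not in the original command) giving --random-wait a base delay to vary.
build_cmd() {
    printf '%s\n' "wget --wait=1 --random-wait --recursive --page-requisites --execute robots=off --user-agent=mozilla $1"
}

# Dry run: show the command for the example site from above.
build_cmd www.somepage.com
```

Copy the printed line into a terminal to actually run it. Dropping --wait=1 gives back the original behaviour.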
