Scraping the Open Directory with Python
Latest revision as of 13:57, 17 June 2014
From the dmoz website:
DMOZ is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a passionate, global community of volunteer editors. It was historically known as the Open Directory Project (ODP).
Step 1: Extracting the links from a particular directory page
Let's start at a single page on dmoz, such as:
http://www.dmoz.org/Science/Astronomy/
If we look at the page source, we can see the structure around the URLs listed at the bottom of the page:
<ul style="margin-left:0;" class="directory-url">
<li>
<a class="listinglink" href="http://www.absoluteastronomy.com/">Absolute Astronomy</a>
- Facts and statistical information about planets, moons, constellations, stars, galaxies, and Messier objects.
<div class="flag"><a href="/public/flag?cat=Science%2FAstronomy&url=http%3A%2F%2Fwww.absoluteastronomy.com%2F"><img title="report an issue with this listing" alt="[!]" src="/img/flag.png"></a></div>
</li>
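from __future__ import print_function  # without this, print () shows "()" in Python 2
import urllib2, html5lib, sys

url = sys.argv[1]
f = urllib2.urlopen(url)
src = f.read()
tree = html5lib.parse(src, namespaceHTMLElements=False)
# The external listings live in <ul class="directory-url">, in <a class="listinglink">
for ul in tree.findall(".//ul"):
    if "directory-url" in ul.get("class", "").split():
        for li in ul.findall("li"):
            for a in li.findall("a"):
                if "listinglink" in a.get("class", "").split():
                    linkurl = a.get("href")
                    # The description is the text that follows </a>, minus the leading dash
                    linkdescription = (a.tail or "").strip().strip("-").strip()
                    print(linkurl)
                    print("\t" + linkdescription.encode("utf-8"))
                    print()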
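The same extraction idea can be tried offline against the HTML fragment shown above. The following is only a sketch, and it makes a few assumptions that are not in the original listing: it targets Python 3, it uses the standard library's `html.parser` instead of html5lib, the class name `ListingLinkParser` is made up for illustration, and the fragment is abridged. Unlike the script above, it looks only at the `listinglink` class of the anchor and does not check the enclosing `ul`.

```python
from html.parser import HTMLParser

# Abridged copy of the fragment shown above (the flag link is shortened).
SAMPLE = """
<ul style="margin-left:0;" class="directory-url">
<li>
<a class="listinglink" href="http://www.absoluteastronomy.com/">Absolute Astronomy</a>
- Facts and statistical information about planets, moons, constellations, stars, galaxies, and Messier objects.
<div class="flag"><a href="/public/flag"><img alt="[!]" src="/img/flag.png"></a></div>
</li>
</ul>
"""


class ListingLinkParser(HTMLParser):
    """Collect (href, description) pairs for <a class="listinglink"> anchors."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, description) pairs
        self._href = None        # href of the listing currently being read
        self._in_anchor = False  # True while inside the listing's <a>...</a>
        self._tail = []          # text seen after </a>, i.e. the description

    def _finish(self):
        # Store the pending listing (if any) and reset the state.
        if self._href is not None:
            description = "".join(self._tail).strip().strip("-").strip()
            self.links.append((self._href, description))
        self._href, self._in_anchor, self._tail = None, False, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "listinglink" in (attrs.get("class") or "").split():
            self._finish()
            self._href = attrs.get("href")
            self._in_anchor = True
        elif self._href is not None and not self._in_anchor:
            # Any other tag after the anchor ends the description text.
            self._finish()

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self._in_anchor = False
        elif tag == "li":
            self._finish()

    def handle_data(self, data):
        # Only text between </a> and the next tag belongs to the description.
        if self._href is not None and not self._in_anchor:
            self._tail.append(data)


parser = ListingLinkParser()
parser.feed(SAMPLE)
links = parser.links
for href, description in links:
    print(href)
    print("\t" + description)
```

Working from a saved fragment like this makes it possible to check the parsing logic without fetching anything from the live site.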
Step 2: Following links of related categories
The pages also contain links to sub- and related categories, which we can follow as well.
Conceptually, we add a "to do" list of URLs we still want to check, and a "seen" list that remembers where we have already been. The seen list is more efficient to implement as a Python dictionary: we "add" a URL by using it as a key whose value is True (the value itself doesn't matter in this example). The category links appear in the page source like this:
<div class="dir-1 borN">
<span><img style="height:2px;float:left;width:100%" src="http://o.aolcdn.com/os/dmoz/img/dividerN.gif"></span>
<ul class="directory dir-col">
<li class="">
<a href="/Science/Anomalies_and_Alternative_Science/Astronomy%2C_Alternative/">Alternative</a>@
<em>(72)</em>
</li>
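Note that the `href` in this fragment is relative to the site root, so it has to be resolved against the URL of the page it was found on before it can be fetched. A small sketch of that step, assuming Python 3, where `urljoin` lives in `urllib.parse` (in Python 2 it is in the `urlparse` module):

```python
from urllib.parse import urljoin

page_url = "http://www.dmoz.org/Science/Astronomy/"
href = "/Science/Anomalies_and_Alternative_Science/Astronomy%2C_Alternative/"

# A root-relative href replaces the path of the page URL.
suburl = urljoin(page_url, href)
print(suburl)

# An href that is already absolute is returned unchanged.
external = urljoin(page_url, "http://www.absoluteastronomy.com/")
print(external)
```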
from __future__ import print_function
import urllib2, html5lib, sys
from urlparse import urljoin

url = sys.argv[1]
todo = [url]   # URLs we still want to check
seen = {}      # URLs we have already visited (only the keys matter)
while len(todo) > 0:
    # Take the next URL from the front of the to-do list
    url, todo = todo[0], todo[1:]
    if url not in seen:
        f = urllib2.urlopen(url)
        print("VISITING", url)
        seen[url] = True
        src = f.read()
        tree = html5lib.parse(src, namespaceHTMLElements=False)

        # Extract links
        print("LINKS")
        for ul in tree.findall(".//ul"):
            if "directory-url" in ul.get("class", "").split():
                for li in ul.findall("li"):
                    for a in li.findall("a"):
                        if "listinglink" in a.get("class", "").split():
                            linkurl = a.get("href")
                            linkdescription = (a.tail or "").strip().strip("-").strip()
                            print(linkurl)
                            print("\t" + linkdescription.encode("utf-8"))

        # Follow the related category pages
        print("RELATED")
        for ul in tree.findall(".//ul"):
            if "directory" in ul.get("class", "").split():
                for li in ul.findall("li"):
                    for a in li.findall("a"):
                        # Category hrefs are relative, so resolve them against the current page
                        suburl = a.get("href")
                        suburl = urljoin(url, suburl)
                        description = (a.text or "").strip()
                        print(suburl)
                        print("\t" + description.encode("utf-8"))
                        if suburl not in seen:
                            todo.append(suburl)
        print()
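The "to do" and "seen" bookkeeping can be studied on its own, without any network access. The sketch below is an illustration under stated assumptions rather than part of the original script: it targets Python 3, the function name `crawl_order` and the miniature `SITE` dictionary are hypothetical, and it swaps in a `set` for the seen list and a `collections.deque` for the to-do list (removing from the front of a deque does not copy the rest of the list, as `todo[1:]` does).

```python
from collections import deque


def crawl_order(start, neighbours):
    """Return the pages in the order a breadth-first crawl would visit them."""
    todo = deque([start])  # URLs still to check, oldest first
    seen = set()           # URLs already visited
    order = []
    while todo:
        url = todo.popleft()
        if url in seen:
            continue       # queued more than once, visited only once
        seen.add(url)
        order.append(url)
        for suburl in neighbours.get(url, []):
            if suburl not in seen:
                todo.append(suburl)
    return order


# Hypothetical miniature "site": each page maps to the pages it links to.
SITE = {
    "/A/": ["/B/", "/C/"],
    "/B/": ["/C/", "/A/"],
    "/C/": ["/D/"],
    "/D/": [],
}

order = crawl_order("/A/", SITE)
print(order)
```

Because "/C/" is linked from both "/A/" and "/B/", it ends up on the to-do list twice; the check against `seen` when a URL is taken off the list is what keeps it from being visited twice.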
Step 3: Putting it all together
from __future__ import print_function
import urllib2, html5lib, sys
from urlparse import urljoin

url = sys.argv[1]
todo = [url]   # URLs we still want to check
seen = {}      # category pages we have already visited
links = {}     # external link URL -> description
while len(todo) > 0:
    # Take the next URL from the front of the to-do list
    url, todo = todo[0], todo[1:]
    if url not in seen:
        f = urllib2.urlopen(url)
        print("VISITING", url)
        seen[url] = True
        src = f.read()
        tree = html5lib.parse(src, namespaceHTMLElements=False)

        # Extract links
        for ul in tree.findall(".//ul"):
            if "directory-url" in ul.get("class", "").split():
                for li in ul.findall("li"):
                    for a in li.findall("a"):
                        if "listinglink" in a.get("class", "").split():
                            linkurl = a.get("href")
                            linkdescription = (a.tail or "").strip().strip("-").strip()
                            if linkurl in links:
                                print("already seen", linkurl)
                            # Record the link
                            links[linkurl] = linkdescription

        # Follow the related category pages
        for ul in tree.findall(".//ul"):
            if "directory" in ul.get("class", "").split():
                for li in ul.findall("li"):
                    for a in li.findall("a"):
                        suburl = a.get("href")
                        suburl = urljoin(url, suburl)
                        if suburl not in seen:
                            # Add the suburl to the todo list
                            todo.append(suburl)

        # Report the running total of unique links after each page
        print("links", len(links))
Output
Running the code from Step 3...
python crawl.py http://www.dmoz.org/Science/Astronomy
... produces the output shown below. The program can run for a very long time: after 5 minutes, the crawler had found over 2000 unique links and had moved on to categories like Math History and Physics History. We use ctrl-c to stop the script and prevent it from crawling the entire dmoz website. A next step would be to think about when and how to stop the crawler, perhaps after a certain number of links have been found, or by limiting the number of steps it may take away from the starting page.
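Both stopping rules can be sketched in a few lines. This is a hedged illustration for Python 3, not the script that produced the output: `bounded_crawl`, its `max_depth` and `max_links` parameters, the `get_page` callback and the `FAKE_SITE` data are all hypothetical names introduced here. `get_page(url)` is assumed to return the external links found on a page (as a dictionary of URL to description) together with the list of category URLs found on it.

```python
from collections import deque


def bounded_crawl(start, get_page, max_depth=2, max_links=1000):
    """Crawl from start, limited by distance from start and by links found."""
    todo = deque([(start, 0)])  # (url, number of steps away from the start)
    seen = set()
    links = {}
    # The link limit is checked between pages, so the last page visited
    # may push the total somewhat past max_links.
    while todo and len(links) < max_links:
        url, depth = todo.popleft()
        if url in seen:
            continue
        seen.add(url)
        page_links, categories = get_page(url)
        links.update(page_links)
        # Only queue category pages that are still within the depth limit.
        if depth < max_depth:
            for suburl in categories:
                if suburl not in seen:
                    todo.append((suburl, depth + 1))
    return links, seen


# Hypothetical stand-in for fetching and parsing a page.
FAKE_SITE = {
    "/Astronomy/": ({"http://a.example/": "A"}, ["/Astronomy/Stars/"]),
    "/Astronomy/Stars/": ({"http://b.example/": "B"}, ["/Astronomy/Stars/Names/"]),
    "/Astronomy/Stars/Names/": ({"http://c.example/": "C"}, []),
}


def fake_get_page(url):
    return FAKE_SITE[url]


links, seen = bounded_crawl("/Astronomy/", fake_get_page, max_depth=1)
print(sorted(links))
print(sorted(seen))
```

With `max_depth=1` the crawl visits the start page and the pages it links to directly, and goes no further. The listing that follows is the output of the original, unbounded Step 3 script.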
VISITING http://www.dmoz.org/Science/Astronomy
links 20
VISITING http://www.dmoz.org/Science/Anomalies_and_Alternative_Science/Astronomy%2C_Alternative/
links 39
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Topics/Archaeoastronomy/
links 94
VISITING http://www.dmoz.org/Science/Physics/Astrophysics/
links 128
VISITING http://www.dmoz.org/Science/Astronomy/Calendars_and_Timekeeping/
links 154
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/
links 170
VISITING http://www.dmoz.org/Society/Issues/Environment/Light_Pollution/
links 221
VISITING http://www.dmoz.org/Science/Astronomy/Eclipses%2C_Occultations_and_Transits/
links 221
VISITING http://www.dmoz.org/Science/Astronomy/Extrasolar_Planets/
links 250
VISITING http://www.dmoz.org/Science/Astronomy/Extraterrestrial_Life/
links 258
VISITING http://www.dmoz.org/Science/Astronomy/Galaxies/
links 280
VISITING http://www.dmoz.org/Science/Astronomy/Interstellar_Medium/
links 285
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/
links 320
VISITING http://www.dmoz.org/Science/Astronomy/Star_Clusters/
links 322
VISITING http://www.dmoz.org/Science/Astronomy/Stars/
links 347
VISITING http://www.dmoz.org/Science/Astronomy/Academic_Departments/
links 355
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/
already seen http://www.sundial.thai-isan-lao.com/
links 388
VISITING http://www.dmoz.org/Science/Astronomy/Astronomers/
links 405
VISITING http://www.dmoz.org/Science/Astronomy/Data_Archives/
links 448
VISITING http://www.dmoz.org/Science/Astronomy/Directories/
already seen http://www.cosmobrain.com/
links 457
VISITING http://www.dmoz.org/Science/Astronomy/Education/
already seen http://www.math.nus.edu.sg/aslaksen/teaching/heavenly.shtml
already seen http://www.windows2universe.org/
links 490
VISITING http://www.dmoz.org/Science/Astronomy/History/
links 531
VISITING http://www.dmoz.org/Recreation/Humor/Science/Astronomy/
links 536
VISITING http://www.dmoz.org/Science/Astronomy/Images/
links 575
VISITING http://www.dmoz.org/Science/Astronomy/In_the_Arts/
already seen http://www.lindahall.org/events_exhib/exhibit/exhibits/stars/index.html
links 578
VISITING http://www.dmoz.org/Science/Astronomy/News_and_Media/
links 601
VISITING http://www.dmoz.org/Science/Astronomy/Organizations/
already seen http://planetary.org/
links 619
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Personal_Pages/
links 676
VISITING http://www.dmoz.org/Science/Astronomy/Popular_Topics/
links 676
VISITING http://www.dmoz.org/Science/Astronomy/Products_and_Services/
links 676
VISITING http://www.dmoz.org/Science/Astronomy/Publications/
links 677
VISITING http://www.dmoz.org/Science/Astronomy/Regional/
links 677
VISITING http://www.dmoz.org/Science/Astronomy/Research_Groups_and_Centers/
already seen http://aa.usno.navy.mil/
links 711
VISITING http://www.dmoz.org/Shopping/Recreation/Science_and_Nature/Astronomy/
links 721
VISITING http://www.dmoz.org/Science/Astronomy/Software/
links 738
VISITING http://www.dmoz.org/Science/Astronomy/News_and_Media/Weblogs/
links 743
VISITING http://www.dmoz.org/Society/People/Women/Science_and_Technology/Astronomy/
links 752
VISITING http://www.dmoz.org/Science/Astronomy/Observatories/
links 807
VISITING http://www.dmoz.org/Science/Astronomy/Planetariums/
links 807
VISITING http://www.dmoz.org/Kids_and_Teens/School_Time/Science/Astronomy_and_Space/
already seen http://www.iau.org/
already seen http://www.pbs.org/deepspace/
already seen http://planetquest.jpl.nasa.gov/
already seen http://www.pbs.org/wgbh/nova/universe/
already seen http://www.space.com/
already seen http://library.thinkquest.org/C0110484/
already seen http://www.windows2universe.org/
links 859
VISITING http://www.dmoz.org/Science/Earth_Sciences/Atmospheric_Sciences/Atmospheric_Physics/
links 882
VISITING http://www.dmoz.org/Science/Physics/
links 896
VISITING http://www.dmoz.org/Science/Technology/Space/
links 907
VISITING http://www.dmoz.org/Science/Anomalies_and_Alternative_Science/Astronomy%2C_Alternative/Cosmology/
links 953
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Mars/Life_on_Mars/
links 961
VISITING http://www.dmoz.org/Science/Anomalies_and_Alternative_Science/Astronomy%2C_Alternative/Planetary_Anomalies/
links 964
VISITING http://www.dmoz.org/Science/Anomalies_and_Alternative_Science/Astronomy%2C_Alternative/Sirius_and_the_Dogon/
links 968
VISITING http://www.dmoz.org/Science/Anomalies_and_Alternative_Science/Geology%2C_Alternative/Velikovsky%2C_Immanuel/
links 976
VISITING http://www.dmoz.org/Science/Astronomy/
already seen http://www.absoluteastronomy.com/
already seen http://aa.usno.navy.mil/
already seen http://astronomicaloptical.blogspot.com/
already seen http://www.astrosociety.org/education/resources/pseudobib.html
already seen http://en.citizendium.org/wiki/Astronomy
already seen http://casswww.ucsd.edu/public/astroed.html
already seen http://www.funtrivia.com/ql.cfm?cat=59
already seen http://www.pd.astro.it/E-MOSTRA/A0000HOM.HTM
already seen http://astronomyonline.org/
already seen http://www.astronomytoday.com/
already seen http://www.cv.nrao.edu/fits/www/astronomy.html
already seen http://www.atlasoftheuniverse.com/
already seen http://www.cosmosportal.org/
already seen http://scienceworld.wolfram.com/astronomy/
already seen http://coolcosmos.ipac.caltech.edu/cosmic_classroom/ir_tutorial/
already seen https://www.lifereader.com/research-articles/logarithmic-maps-of-the-universe
already seen http://muse.univ-lyon1.fr/
already seen http://www.nmm.ac.uk/server/show/conWebDoc.309
already seen http://universe-review.ca/
already seen http://jumk.de/astronomie/astronomy.shtml
links 976
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Periods_and_Cultures/Mesoamerican/Aztec/Calendar/
links 979
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Topics/Archaeoastronomy/Mayan/
links 984
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Topics/Archaeoastronomy/Stonehenge/
links 989
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Alternative/Archaeoastronomy/
already seen http://www.archaeoastronomy.com/
links 1006
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Archaeologists/Archaeoastronomers/
links 1009
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Topics/Archaeoastronomy/Conferences/
links 1011
VISITING http://www.dmoz.org/Science/Social_Sciences/Archaeology/Topics/Archaeoastronomy/Journals/
links 1015
VISITING http://www.dmoz.org/Science/Physics/Relativity/Black_Holes/
links 1034
VISITING http://www.dmoz.org/Science/Astronomy/Stars/Neutron_Stars/
already seen http://www.bigear.org/vol1no1/burnell.htm
links 1053
VISITING http://www.dmoz.org/Science/Astronomy/Stars/Stellar_Evolution/
links 1061
VISITING http://www.dmoz.org/Science/Astronomy/Software/Computational_Astrophysics/
links 1077
VISITING http://www.dmoz.org/Science/Physics/Astrophysics/Research_Groups_and_Centers/
already seen http://www.cfa.harvard.edu/hco/
already seen http://www.cfa.harvard.edu/
already seen http://www.cfa.harvard.edu/sao/
links 1091
VISITING http://www.dmoz.org/Science/Physics/Nuclear/
links 1112
VISITING http://www.dmoz.org/Science/Physics/Particle/Astro_Particle/
links 1113
VISITING http://www.dmoz.org/Reference/Time/Clocks_and_Watches/
links 1118
VISITING http://www.dmoz.org/Science/Reference/Standards/Individual_Standards/ISO/ISO_8601/
links 1159
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Earth/Moon/Phases/
links 1176
VISITING http://www.dmoz.org/Science/Astronomy/Calendars_and_Timekeeping/Sundials/
links 1195
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Sun/Sunrise_and_Sunset_Times/
links 1204
VISITING http://www.dmoz.org/Kids_and_Teens/School_Time/Science/Astronomy_and_Space/Time/
already seen http://www.cl.cam.ac.uk/~mgk25/iso-time.html
already seen http://americanhistory.si.edu/ontime/
links 1224
VISITING http://www.dmoz.org/Reference/Time/
already seen http://www.hermetic.ch/cal_stud.htm
already seen http://www-history.mcs.st-andrews.ac.uk/HistTopics/Time_1.html
already seen http://www.cl.cam.ac.uk/~mgk25/iso-time.html
already seen http://www.npr.org/templates/story/story.php?storyId=4572036
links 1238
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Sky_Calendars/
already seen http://www.sky-watch.com/
links 1248
VISITING http://www.dmoz.org/Science/Technology/Metrology/
links 1252
VISITING http://www.dmoz.org/Society/Holidays/Calendars_and_Lists/
links 1279
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Cosmic_Background_Radiation/
links 1287
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Cosmological_Constant_and_Dark_Energy/
links 1291
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Dark_Matter/
links 1310
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Earliest_Universe/
links 1312
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Inflation/
links 1315
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Large-Scale_Structure/
links 1318
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Nucleosynthesis/
links 1320
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Topological_Defects/
links 1326
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Articles/
links 1332
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Courses_and_Tutorials/
links 1334
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/History/
links 1336
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Research_Groups/
links 1342
VISITING http://www.dmoz.org/Science/Astronomy/Cosmology/Software/
links 1344
VISITING http://www.dmoz.org/Science/Physics/Relativity/
already seen http://www.damtp.cam.ac.uk/user/gr/public/
links 1365
VISITING http://www.dmoz.org/Society/Issues/Environment/Light_Pollution/Chats_and_Forums/
links 1369
VISITING http://www.dmoz.org/Society/Issues/Environment/Light_Pollution/News_and_Media/
links 1384
VISITING http://www.dmoz.org/Science/Technology/Lighting/
links 1400
VISITING http://www.dmoz.org/Society/Issues/Environment/Light_Pollution/Opposing_Views/
links 1400
VISITING http://www.dmoz.org/Society/Issues/Environment/Light_Pollution/Regulation/
links 1400
VISITING http://www.dmoz.org/Society/Issues/Environment/Energy/
links 1442
VISITING http://www.dmoz.org/Society/Issues/Environment/Pollution/
links 1445
VISITING http://www.dmoz.org/Science/Astronomy/Eclipses%2C_Occultations_and_Transits/Eclipses/
links 1472
VISITING http://www.dmoz.org/Science/Astronomy/Eclipses%2C_Occultations_and_Transits/Occultations/
already seen http://www.occultations.org/
links 1474
VISITING http://www.dmoz.org/Science/Astronomy/Eclipses%2C_Occultations_and_Transits/Transits/
links 1483
VISITING http://www.dmoz.org/Science/Astronomy/History/Eclipses%2C_Occultations_and_Transits/
links 1488
VISITING http://www.dmoz.org/Science/Astronomy/Products_and_Services/Expeditions/
links 1494
VISITING http://www.dmoz.org/Science/Technology/Space/Missions/Unmanned/Deep_Space/Kepler/
links 1495
VISITING http://www.dmoz.org/Science/Astronomy/Extraterrestrial_Life/Exobiology/
links 1518
VISITING http://www.dmoz.org/Science/Astronomy/Extraterrestrial_Life/Fermi%27s_Paradox/
links 1529
VISITING http://www.dmoz.org/Science/Astronomy/Extraterrestrial_Life/Panspermia/
already seen http://www.panspermia.org/
links 1533
VISITING http://www.dmoz.org/Science/Astronomy/Extraterrestrial_Life/SETI/
links 1549
VISITING http://www.dmoz.org/Society/Paranormal/UFOs/
links 1578
VISITING http://www.dmoz.org/Science/Astronomy/Galaxies/Active_Galactic_Nuclei_-_Quasars/
already seen http://charles_w.tripod.com/quasar.html
links 1584
VISITING http://www.dmoz.org/Science/Astronomy/Galaxies/Milky_Way/
already seen http://mwmw.gsfc.nasa.gov/
links 1592
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Deep_Sky_Observing/
already seen http://arpgalaxy.com/
already seen http://www.webbdeepsky.com/
links 1610
VISITING http://www.dmoz.org/Science/Astronomy/Interstellar_Medium/Emission_Nebulae/
links 1611
VISITING http://www.dmoz.org/Science/Astronomy/Interstellar_Medium/Planetary_Nebulae/
links 1616
VISITING http://www.dmoz.org/Science/Astronomy/Interstellar_Medium/Reflection_Nebulae/
links 1617
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Earth/
links 1626
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Jupiter/
links 1637
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Mars/
links 1666
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Mercury/
links 1676
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Neptune/
links 1686
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Dwarf_Planets/Pluto_and_Charon/
links 1697
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Saturn/
links 1711
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Sun/
already seen http://solar-center.stanford.edu/
links 1734
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Earth/Moon/
links 1749
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Uranus/
links 1757
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/Venus/
links 1769
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Asteroid_Belt/
links 1769
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Small_Bodies/
links 1792
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Dwarf_Planets/
links 1792
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Kuiper_Belt/
links 1810
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Oort_Cloud/
links 1811
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Planets/
already seen http://pds.jpl.nasa.gov/planets/
links 1814
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Solar_System_Observing/
links 1815
VISITING http://www.dmoz.org/Science/Astronomy/Solar_System/Conferences/
links 1815
VISITING http://www.dmoz.org/Science/Astronomy/Data_Archives/Solar_System/
already seen http://pds.jpl.nasa.gov/
links 1817
VISITING http://www.dmoz.org/Science/Astronomy/History/Solar_System/
already seen http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Neptune_and_Pluto.html
links 1821
VISITING http://www.dmoz.org/Science/Technology/Space/Missions/
links 1829
VISITING http://www.dmoz.org/Science/Astronomy/Star_Clusters/Globular_Clusters/
links 1831
VISITING http://www.dmoz.org/Science/Astronomy/Star_Clusters/Open_Clusters/
links 1832
VISITING http://www.dmoz.org/Science/Astronomy/Stars/Binary_Stars/
already seen http://www.phys.lsu.edu/astro/nap98/bf.final.html
links 1842
VISITING http://www.dmoz.org/Science/Astronomy/Stars/Dwarf_Stars/
links 1849
VISITING http://www.dmoz.org/Science/Astronomy/Stars/Names/
links 1853
VISITING http://www.dmoz.org/Science/Astronomy/Stars/Novae_and_Supernovae/
already seen http://chandra.harvard.edu/press/01_releases/press_011001.html
links 1863
VISITING http://www.dmoz.org/Science/Astronomy/Stars/Variable_Stars/
already seen http://www.aavso.org/
links 1869
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Sky_Maps_and_Atlases/
already seen http://www.lindahall.org/events_exhib/exhibit/exhibits/stars/index.html
links 1892
VISITING http://www.dmoz.org/Science/Astronomy/Data_Archives/Stellar_and_Astrometric/
already seen http://ad.usno.navy.mil/wds/dsl.html
links 1898
VISITING http://www.dmoz.org/Shopping/Recreation/Science_and_Nature/Astronomy/Space_Novelties/Star_Names/
links 1903
VISITING http://www.dmoz.org/Reference/Education/Colleges_and_Universities/
links 1903
VISITING http://www.dmoz.org/Science/Academic_Departments/
links 1912
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Amateur_Contributions_to_Science/
already seen http://mbond.free.fr/
links 1917
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Amateur_Telescope_Making/
links 1942
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Astrophotography_and_CCD_Imaging/
already seen http://www.astropix.com/
links 1988
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Beginners/
already seen http://home.pcisys.net/~astrogirl/
links 2016
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Binocular_Astronomy/
links 2023
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Constellations/
already seen http://www.astro.wisc.edu/~dolan/constellations/
links 2034
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Observatories/
already seen http://mysite.verizon.net/vzeed81b/
already seen http://www.grimee.com/
links 2044
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Radio_Astronomy/
already seen http://www.bigear.org/
links 2056
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Satellites/
already seen http://www.planet4589.org/space/jsr/jsr.html
already seen http://spaceflight.nasa.gov/realdata/sightings/
links 2062
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Spectroscopy/
already seen http://www.spectrashift.com/
links 2066
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Star_Parties/
links 2087
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Telescope_Owner_Resources/
links 2105
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Chats_and_Forums/
links 2117
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Classifieds/
links 2122
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Events/
links 2125
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Organizations/
links 2126
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Product_Reviews/
already seen http://www.scopereviews.com/
links 2131
VISITING http://www.dmoz.org/Science/Astronomy/Data_Archives/Supernovae_and_Remnants/
already seen http://www.mrao.cam.ac.uk/surveys/snrs/
already seen http://www.rochesterastronomy.org/snimages/
links 2136
VISITING http://www.dmoz.org/Science/Astronomy/Amateur/Organizations/Directories/
links 2138
VISITING http://www.dmoz.org/Science/Directories/
links 2163
VISITING http://www.dmoz.org/Science/Technology/Space/NASA/Education/
already seen http://quest.arc.nasa.gov/
already seen http://origins.stsci.edu/
already seen http://cse.ssl.berkeley.edu/SegwayEd/index.html
already seen http://www.windows2universe.org/
links 2241
VISITING http://www.dmoz.org/Science/Astronomy/Observatories/Teaching_and_Public/
links 2254
VISITING http://www.dmoz.org/Science/Astronomy/Products_and_Services/Stargazing_Events/
links 2256
VISITING http://www.dmoz.org/Science/Astronomy/Software/Education_and_Multimedia/
links 2267
VISITING http://www.dmoz.org/Science/Astronomy/Products_and_Services/Educational_Resources/
links 2276
VISITING http://www.dmoz.org/Shopping/Entertainment/Recordings/Video/Science/
links 2284
VISITING http://www.dmoz.org/Science/Physics/Education/
already seen http://microgravity.grc.nasa.gov/DIME.html
already seen http://www.grc.nasa.gov/WWW/K-12/Numbers/Math/Mathematical_Thinking/index.htm
already seen http://www.colorado.edu/physics/2000/
links 2363
VISITING http://www.dmoz.org/Science/Technology/Space/Education/
already seen http://www.challenger.org/
already seen http://www.phy6.org/stargaze/Sintro.htm
links 2380
VISITING http://www.dmoz.org/Science/Astronomy/History/Instruments/
links 2384
VISITING http://www.dmoz.org/Science/Astronomy/History/Observatories/
links 2387
VISITING http://www.dmoz.org/Science/Astronomy/History/People/
already seen http://www.astrosociety.org/education/resources/womenast_bib.html
links 2397
VISITING http://www.dmoz.org/Science/Astronomy/History/Worldviews/
links 2408
VISITING http://www.dmoz.org/Science/Math/History/
links 2436
VISITING http://www.dmoz.org/Science/Physics/History/
links 2474