User:Francg/expub/thesis/prototype

Extracting URLs from any website

Now that we know what BS4 (Beautiful Soup 4) is and have installed it on our machine, let's see what we can do with it.

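# Written for Python 3 (on Python 2, use raw_input instead of input)
from bs4 import BeautifulSoup
import requests

url = input("Enter a website to extract the URLs from: ")

# Assumes the address is typed without a scheme, e.g. "example.com"
r = requests.get("http://" + url)
data = r.text

# Name the parser explicitly so BS4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")

# Print the href of every <a> tag (None if the tag has no href)
for link in soup.find_all('a'):
    print(link.get('href'))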
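
The script above prints each href exactly as it appears in the page, so relative links such as /about come out without a host, and anchors without an href print as None. A minimal sketch of one way to tidy this up follows. It assumes Python 3 and parses an inline HTML string so that it runs without a network connection; the function name extract_links, the sample_html string and the example.com addresses are made up for illustration and are not part of the original prototype.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return an absolute URL for every <a> tag in html that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute at all
    for anchor in soup.find_all("a", href=True):
        # urljoin keeps absolute URLs as they are and resolves relative ones
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Hypothetical sample page, used here instead of a live request
sample_html = """
<a href="/about">About</a>
<a href="https://example.org/docs">Docs</a>
<a name="top">No href here</a>
"""

for link in extract_links(sample_html, "http://example.com/"):
    print(link)
```

To use this with a real page, the text returned by requests.get(...) could be passed in as html, with the requested address as base_url.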