User:Francg/expub/thesis/prototype: Difference between revisions

From XPUB & Lens-Based wiki
Revision as of 00:52, 5 October 2017


Prototype

Extracting data (in this case I only scrape URLs / web links) from: https://www.reddit.com/



Run Python (I did it from a virtual environment):
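from bs4 import BeautifulSoup
import requests

# The prompt shows the base URL; whatever is typed is appended to it
# (leave it empty for the front page). On Python 2, use raw_input() instead of input().
url = input("https://www.reddit.com/: ")
# Download the page
r = requests.get("https://www.reddit.com/" + url)
data = r.text
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")
# Print the href attribute of every <a> tag (None if the tag has no href)
for link in soup.find_all('a'):
    print(link.get('href'))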
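
The loop above prints every href exactly as it appears in the page, so relative paths stay relative and anchors without an href show up as None. Below is a minimal sketch of how the list could be cleaned up. It is not part of the prototype: it swaps BeautifulSoup for the standard library's HTMLParser so it runs without extra packages, and it works on a made-up HTML snippet (SAMPLE_HTML) instead of a live request. The names LinkCollector and extract_links are my own, chosen for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.reddit.com/"  # base URL used in the prototype above

# Hypothetical sample markup, standing in for r.text so no network is needed
SAMPLE_HTML = """
<a href="/r/python/">relative link</a>
<a href="https://example.com/page">absolute link</a>
<a name="top">anchor without href</a>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, similar to soup.find_all('a')."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; href may be missing
            self.links.append(dict(attrs).get("href"))


def extract_links(html, base_url):
    """Return absolute URLs for all links in html, skipping tags without href."""
    collector = LinkCollector()
    collector.feed(html)
    # Drop missing hrefs and resolve relative paths against the base URL
    return [urljoin(base_url, href) for href in collector.links if href]


links = extract_links(SAMPLE_HTML, BASE_URL)
for link in links:
    print(link)
```

With the real page, the same filtering and urljoin step could be applied to the values returned by link.get('href') in the BeautifulSoup loop.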