User:Francg/expub/thesis/prototype: Difference between revisions

Latest revision as of 13:02, 5 October 2017

Prototype

Extracting data (scraping URLs / web links from content only)
from: https://www.reddit.com/

Run Python (I did this from a virtual environment on my laptop)
and then enter the following commands:

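from bs4 import BeautifulSoup
import requests
# Ask for the part of the address that follows https://www.reddit.com/
# (Python 3; on Python 2 use raw_input() instead of input())
url = input("https://www.reddit.com/: ")
r = requests.get("https://www.reddit.com/" + url)
data = r.text
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")
# Print the href attribute of every <a> tag on the page
for link in soup.find_all('a'):
    print(link.get('href'))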
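
The loop above prints every href exactly as it appears in the page, which can include None (for <a> tags without an href), relative paths such as /r/python/, in-page anchors and repeated links. A minimal sketch of one way to tidy that output, using only the standard library: the helper name clean_links and the sample values are hypothetical and not part of the original prototype, and the list of hrefs is assumed to come from something like [link.get('href') for link in soup.find_all('a')].

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.reddit.com/"  # same site as in the commands above


def clean_links(hrefs, base_url=BASE_URL):
    """Return absolute http(s) URLs from raw href values, in order, without duplicates."""
    seen = set()
    cleaned = []
    for href in hrefs:
        # link.get('href') gives None when an <a> tag has no href attribute
        href = (href or "").strip()
        if not href or href.startswith("#"):
            continue  # skip empty values and in-page anchors
        # Turn relative paths into full addresses on the base site
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip javascript:, mailto: and similar
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


# Hypothetical sample values, standing in for the hrefs collected from the page
sample = [
    None,
    "/r/python/",
    "https://www.reddit.com/r/python/",
    "javascript:void(0)",
    "#content",
    "https://example.com/page",
]
for link in clean_links(sample):
    print(link)
```

Keeping this step separate from the download means it can be tried on a handful of made-up values without sending any request to the site.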

Screenshots: Bs4-test-reddit1, Bs4-test-reddit1-2, Bs4-test-reddit1-3

@@ Line 1: / Line 1: @@
+<div style="font-size:100%; letter-spacing: 0.05em; line-height: 1.6em; margin-left: 80px; margin-right: 140px;">
 <center>
+<br>
+'''Prototype'''
-'''Extracting URL's from any website'''
+Extracting data (scraping URLs / web links from content only)
+<br>from: https://www.reddit.com/
-Now when we know what BS4 is and we have installed it on our machine,
+<br>
-let's see what we can do with it.
+<br>
+</center>
-from bs4 import BeautifulSoup
+Run Python (I did this from a virtual environment on my laptop)
+<br>and then enter the following commands:
-import requests
+<br>from bs4 import BeautifulSoup
+<br>import requests
+<br>url = input("https://www.reddit.com/: ")
+<br>r = requests.get("https://www.reddit.com/" + url)
+<br>data = r.text
+<br>soup = BeautifulSoup(data, "html.parser")
+<br>for link in soup.find_all('a'):
+<br>    print(link.get('href'))
-url = raw_input("Enter a website to extract the URL's from: ")
+Bs4-test-reddit1-2.png
-r  = requests.get("http://" +url)
+<img src="https://pzwiki.wdka.nl/mw-mediadesign/images/9/98/Bs4-test-reddit1.png" alt="Bs4-test-reddit1" width="250%" height="250%"/>
-data = r.text
+<img src="https://pzwiki.wdka.nl/mw-mediadesign/images/5/56/Bs4-test-reddit1-2.png" alt="Bs4-test-reddit1-2" width="250%" height="250%"/>
-soup = BeautifulSoup(data)
+<img src="https://pzwiki.wdka.nl/mw-mediadesign/images/9/91/Bs4-test-reddit1-3.png" alt="Bs4-test-reddit1-3" width="250%" height="250%"/>
-for link in soup.find_all('a'):
-    print(link.get('href'))
-</center>