User:Tash/Prototyping 03

From XPUB & Lens-Based wiki

Workshop: CGI & TF-IDF search engine

Search tash.png
#!//usr/local/bin/python3
import cgi
import cgitb; cgitb.enable()
import nltk
import re

print ("Content-type:text/html;charset=utf-8")
print ()

#cgi.print_environ()

f = cgi.FieldStorage()
submit1 = f.getvalue("submit1", "")
submit2 = f.getvalue("submit2", "")

text = f.getvalue("text", "")

### SORTING
import os
import csv
import string
import pandas as pd
import sys
### SEARCHING
#input keyword you want to search
keyword = text


print ("""<!DOCTYPE html>
<html>
<head>
	<title>Search</title>
	<meta charset="utf-8">
</head>
<body>
<p style='font-size: 20pt; font-family: Courier'>Search by keyword</p>
	<form method="get">
	<textarea name="text" style="background: yellow; font-size: 10pt; width: 370px; height: 28px;" autofocus></textarea>
	<input type="submit" name="submit" value="Search" style='font-size: 9pt; height: 32px; vertical-align:top;'>

</form>
<p style='font-size: 9pt; font-family: Courier'>
	webring <br>
<a href="http://145.24.204.185:8000/form.html">joca</a>
<a href="http://145.24.198.145:8000/form.html">alice</a>
<a href="http://145.24.246.69:8000/form.html">michael</a>
<a href="http://145.24.165.175:8000/form.html">ange</a>
<a href="http://145.24.254.39:8000/form.html">zalan</a>

</p>
</body>
</html>""")
x = 0
if text :
	#read csv, and split on "," the line
	csv_file = csv.reader(open('tfidf.csv', "r"), delimiter=",")
	col_names = next(csv_file)
	#loop through csv list
	for row in csv_file:
		#if current rows value is equal to input, print that row
		if keyword == row[0] :
			tfidf_list = list(zip(col_names, row))
			del tfidf_list[0]
			sorted_by_second = sorted(tfidf_list, key=lambda x:float(x[1]), reverse=True)
			print ("<p></p>")
			print ("--------------------------------------------------------------------------------------")
			print ("<p style='font-size: 20pt; font-family: Courier'>Results</p>")
			for item in sorted_by_second:
				x = x+1
				print ("--------------------------------------------------------------------------------------")
				print ("<br></br>")
				print(x, item)
				n = item[0]

				f = open("cgi-bin/texts/{}".format(n), "r")
				sents = nltk.sent_tokenize(f.read())

				for sentence in sents:
					if re.search(r'\b({})\b'.format(text), sentence):
						print ("<br></br>")
						print(sentence)
				f.close()
				print ("<br></br>")

Workshop with Marcell Mars

On tunneling: contexts: censorship (e.g. of nation states like China, Turkey, Iran or of institutions and academia), anonymity, organizing resistance or political action rules: physical location (as far as the wifi goes),

AP: access point (for wireless routers for example) router: a networking device that forwards data packets between computer networks. Routers perform the traffic directing functions on the Internet. encryption certificates: used by websites to enable secure HTTPS connections, issued to domain and subdomain. Issued by authorities like Let’s Encrypt (recently free) and DigiCert. DNS: domain name server or system, which resolves and distributes IP adresses, and lets you get to the domain. Usually set to automatic DHCP, but you can manually choose your own conversion point, like those served by Google (8.8.8.8)
 ping: command line tool to send a quick byte of info to check if a domain is alive network interface: the device / card (e.g. wireless or ethernet) through which your computer is talking to the internet. IP addresses are assigned to the network interface

> so to avoid network admins from seeing your DNS requests (and tracking domain and subdomains) you can ‘tunnel’ and use things like an encrypted DNS server, or proxy servers

When talking about networks: in Unix philosophy, everything is a file, with paths which you can read and write into. Networks are streaming media, so here things become more complex. Here, ports are the sockets through which you can make connections. Over time, default conventions have been assigned – like 22 for SSH and 443 for HTTPS.

Repositories: https://gitlab.com/marcellmars/letssharebooks https://github.com/marcellmars/logan_and_jessica

Exercise: https://imgur.com/a/xuUuN https://rsync.samba.org/

Interesting projects: https://beakerbrowser.com/ https://ipfs.io/ https://zerotier.com/


Self directed research

Brainstorm 23.04.2018

Interface: How do you visualize that which is UNSTABLE? Serendipity? Missing data? Uncertainty? Dissent? Multiple views? On data provenance and feminist visualization: https://civic.mit.edu/feminist-data-visualization HOW can you GET data that's MISSING ?! E.G. from LibGen: where is the UPLOAD DATA? what could we do with it?

Simple test to highlight absent information: in LibGen's catalogue CSV there are row without titles How to search for blanks?

something like:

  csvgrep -c Title -m "" content.csv

^ this solution matches spaces but doesn't look for empty state cells.

Libgen blanks.gif
  csvgrep -c Author -r "^$" content.csv

^ this solution finds rows with empty state cells in the 'Author' column

andre's exciting explorations of the archive.org api search: Internet Archive Advanced search: https://archive.org/advancedsearch.php ghost in the mp3


Interface & database

SQL

SQL - Structured Query Language. It is declarative computer language aimed at querying relational databases. MySQL is a relational database - a piece of software optimized for data storage and retrieval. There are many such databases - Oracle, Microsoft SQL Server, SQLite and many others are examples of such.

SQLite

SQLite is an embedded SQL database engine that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. The code for SQLite is in the public domain and is thus free for use for any purpose, commercial or private. SQLite is the most widely deployed database in the world with more applications than we can count, including several high-profile projects.

Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. https://sqlite.org/about.html

Flask

Flask is a BSD-licensed microframework for Python based on Werkzeug and Jinja 2.


Search functionality

Using Flask-WTForms to create a search which queries the SQL database. Links: https://pythonhosted.org/Flask-Bootstrap/forms.html and https://programfault.com/flask-101-how-to-add-a-search-form/

in forms.py

  • simple string search field
class SearchForm(FlaskForm):
    search = StringField('', validators=[InputRequired()])
</search>


'''in views.py'''
* putting search bar on home page
* routing results.html, setting up redirect and error message

<source lang= python>
@app.route('/', methods=['GET', 'POST'])
def home():
    """Render website's home page."""
    #return render_template('home.html')
    search = SearchForm(request.form)
    if request.method == 'POST':
        return search_results(search)
    return render_template('home.html', form=search)

## search
@app.route('/results', methods= ['GET'])
def search_results(search):
    results = []
    search_string = search.data['search']

    if search_string:
        results=Book.query.filter(Book.title.contains(search_string)).all()

    if not results:
        flash('No results found!')
        return redirect('/')

    else:
        # display results
        return render_template('results.html', books=results)

in results.html

  • template page for showing results, same as show_books.html
{% extends 'base.html' %}

{% block main %}
<div class="container">
  <h1 class="page-header">Search Results</h1>
  {% with messages = get_flashed_messages() %}
    {% if messages %}
      <div class="alert alert-success">
        <ul>
        {% for message in messages %}
          <li>{{ message }}</li>
        {% endfor %}
        </ul>
      </div>
    {% endif %}
  {% endwith %}

  <table style="width:100%">
    <tr>
        <th>Cover</th>
      <th>Title</th>
      <th>Author</th>
      <th>Filetype</th>
      <th>Tag</th>
    </tr>
        {% for book in books %}
    <tr>
      <td><img src="../uploads/cover/{{ book.cover }}" width="80"></td>
      <td><a href="books/{{ book.id }}">{{ book.title }}</a></td>

      <td>  {% for author in book.authors %}

              <li><a href="{{url_for('show_author_by_id', id=author.id)}}">{{ author.author_name }}</a>  </li>

        {% endfor %}</td>
      <td>{{ book.fileformat }}</td>
      <td>{{ book.tag}}</td>
    </tr>
  {% endfor %}
  </table>


</div>
{% endblock %}

HTML / bootstrap visualizations

Responsive image gallery, for search interface:

Responsiveweb.jpg