User:Tash/Special Issue 06

From XPUB & Lens-Based wiki

Research and contribution to XPPL

Research questions

on interface:

  • How do you engage with UNSTABLE libraries? Can we design reading / searching interfaces that are able to represent uncertainty, locate outsides, explore provenances?

on sociality and access:

  • What modes of sociality can we embed into a library interface? Can we devise new ways to talk back to the data?
  • How to hero the enunciative materiality of digital libraries? Prioritize ecosystems and interactions instead of objects?
  • What is the current state of media piracy in the West vs in Asia? In both contexts, who is the pirate downloader and the outlaw uploader?

More research here.

Brainstorm 23.04.2018

Interface: How do you visualize that which is UNSTABLE? Serendipity? Missing data? Uncertainty? Dissent? Multiple views?

On data provenance and feminist visualization: https://civic.mit.edu/feminist-data-visualization

HOW can you GET data that's MISSING?! E.g. from LibGen: where is the UPLOAD DATA? What could we do with it?

Simple test to highlight absent information: in LibGen's catalogue CSV there are rows without titles. How to search for blanks?

something like:

  csvgrep -c Title -m "" content.csv

^ this solution matches spaces but doesn't look for empty state cells.

Libgen blanks.gif
  csvgrep -c Author -r "^$" content.csv

^ this solution finds rows with empty state cells in the 'Author' column
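
The same check can be sketched in Python with the standard csv module, which also makes it easy to count whitespace-only cells as blank. The function name rows_with_blank and the sample rows are made up for illustration; they are not taken from the LibGen catalogue.

```python
import csv
import io


def rows_with_blank(csv_file, column):
    """Return (line_number, row) pairs whose `column` cell is empty or whitespace."""
    reader = csv.DictReader(csv_file)
    blanks = []
    for row in reader:
        value = row.get(column)
        # A cell missing from a short row comes back as None; treat it as blank too.
        if value is None or value.strip() == "":
            blanks.append((reader.line_num, row))
    return blanks


# Tiny made-up sample standing in for content.csv.
sample = io.StringIO(
    "Title,Author\n"
    "Example Book,Example Author\n"
    ",Anonymous\n"
    "   ,\n"
)
for line_num, row in rows_with_blank(sample, "Title"):
    print(line_num, row)
```

With a real file, the call would be something like rows_with_blank(open("content.csv", newline=""), "Title"), assuming the column header is spelled exactly that way.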

Andre's exciting explorations of the archive.org API search:

  • Internet Archive Advanced Search: https://archive.org/advancedsearch.php
  • ghost in the mp3

Project description

Visualizing data provenance. Map to Not Indicate (Art & Language, 1967)

The default web interface of the library is a space for researching as well as reading. Here users can choose to navigate through the entire X-LIB catalogue, or through various stacks. Unlike most search engines, X-LIB's is designed to prioritize ecosystems and interactions instead of results or objects. Multiple queries for non-existent items in the collection are tracked and automatically made into red links, which are visualised and placed back into the library. In this way the collection is always represented in relation to its own limits, outsides and peripheries. Making these 'wishlists' visible also offers context: we get to know our fellow researchers and situate our own knowledge alongside theirs. Within the core network, pirate downloaders and outlaw uploaders can interact more directly with each other. When you see an empty item, either in the full catalogue or in someone's stack, you can choose to upload to it. You can create an entire stack of wished-for items and wait for others who may have the file to help you complete it. The search engine also offers more playful orderings, like randomization or sorting by reading time.
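
The red-link idea could be sketched roughly like this: every search that comes back empty is counted, and once the same query has failed a few times it becomes a wished-for item. The class name Wishlist, the threshold of two misses and the sample catalogue are all assumptions for illustration, not part of the X-LIB code.

```python
from collections import Counter


class Wishlist:
    """Track searches that returned nothing, so they can be shown as red links."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # misses needed before a query becomes a red link
        self.misses = Counter()

    def record_search(self, query, results):
        """Count a query as a miss when the catalogue had no results for it."""
        if not results:
            self.misses[query.strip().lower()] += 1

    def red_links(self):
        """Queries missed at least `threshold` times, most wanted first."""
        return [q for q, n in self.misses.most_common() if n >= self.threshold]


# Made-up catalogue: normalised title -> list of files.
catalogue = {"example reader": ["example_reader.pdf"]}
wishlist = Wishlist()
for query in ["Missing Zine", "missing zine ", "Example Reader", "another gap"]:
    wishlist.record_search(query, catalogue.get(query.strip().lower(), []))
print(wishlist.red_links())
```

Normalising the query (strip and lowercase) is a design choice here, so that near-identical requests from different researchers add up to one wished-for item.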

The search engine can be made using HTML, Python and CGI scripts. The files would be stored in separate directories and JSON files which can be called and created via a web interface. Another option is to use the Semantic MediaWiki platform, which already has built-in functions like the automatic creation of red links, categories and tags for archiving, and also supports the maintenance of these files. To research further: how each of these platforms will deal with user accounts / anonymity / interactivity.
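
One way to picture the JSON option: each stack is a single JSON file listing its items, and a wished-for item is simply an entry without a file. The field names (name, items, title, file) and the one-file-per-stack layout are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path


def save_stack(directory, stack):
    """Write one stack to <directory>/<name>.json and return the path."""
    path = Path(directory) / f"{stack['name']}.json"
    path.write_text(json.dumps(stack, indent=2), encoding="utf-8")
    return path


def empty_items(path):
    """Titles in a stack file that have no uploaded file yet."""
    stack = json.loads(Path(path).read_text(encoding="utf-8"))
    return [item["title"] for item in stack["items"] if not item.get("file")]


# Made-up stack with one complete item and one wished-for item.
stack = {
    "name": "wishlist",
    "items": [
        {"title": "Example reader", "file": "example_reader.pdf"},
        {"title": "Wished-for book", "file": None},
    ],
}
with tempfile.TemporaryDirectory() as stacks_dir:
    stack_path = save_stack(stacks_dir, stack)
    print(empty_items(stack_path))
```

A web interface (CGI script or otherwise) would then only need to read and rewrite these files when someone uploads to an empty item.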

We want this library to exist in the space between researching and the act of downloading/uploading. Piracy is necessary for studying – but it is not just about file sharing. It is also about learning what it means to be a librarian, to pass on information and to explore questions of data provenance. In this way it is important that the default interface of X-LIB explores more social modes of reading and searching.

This project continues my research into feminist ways of representing data, of making visible what is included and what is excluded in an archive. My research into the social aspects of the digital library is also relevant to the concept of enunciative materiality, which we started to explore last trimester.


Search functionality

Interesting project on the politics of the search engine: http://www.feministsearchtool.nl/

Using Flask-WTF (WTForms) to create a search form which queries the SQL database. Links: https://pythonhosted.org/Flask-Bootstrap/forms.html and https://programfault.com/flask-101-how-to-add-a-search-form/

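in forms.py

  • simple string search field

<source lang=python>
from flask_wtf import FlaskForm
from wtforms import StringField
from wtforms.validators import InputRequired

class SearchForm(FlaskForm):
    search = StringField('', validators=[InputRequired()])
</source>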


in views.py

  • putting search bar on home page
  • routing results.html, setting up redirect and error message

<source lang=python>
from flask import flash, redirect, render_template, request

# app, SearchForm and Book are defined in the project's own modules.


@app.route('/', methods=['GET', 'POST'])
def home():
    """Render website's home page."""
    search = SearchForm(request.form)
    if request.method == 'POST':
        return search_results(search)
    return render_template('home.html', form=search)


## search
@app.route('/results', methods=['GET'])
def search_results(search=None):
    """Show books whose title contains the search string."""
    # home() passes the submitted form; a direct GET to /results has no form,
    # so fall back to the ?search= query parameter.
    if search is not None:
        search_string = search.data['search']
    else:
        search_string = request.args.get('search', '')

    results = []
    if search_string:
        results = Book.query.filter(Book.title.contains(search_string)).all()

    if not results:
        flash('No results found!')
        return redirect('/')

    # display results
    return render_template('results.html', books=results)
</source>
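
For reference, SQLAlchemy's contains() filter compiles to a SQL LIKE '%…%' comparison. Below is a minimal sketch of the same substring search using only the standard sqlite3 module. It assumes a hypothetical books table with id and title columns and made-up titles, not the project's actual schema.

```python
import sqlite3


def search_titles(connection, search_string):
    """Return (id, title) rows whose title contains search_string."""
    if not search_string:
        return []
    cursor = connection.execute(
        # The bound parameter keeps user input out of the SQL text itself.
        "SELECT id, title FROM books WHERE title LIKE '%' || ? || '%' ORDER BY id",
        (search_string,),
    )
    return cursor.fetchall()


# In-memory database with made-up sample titles.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
connection.executemany(
    "INSERT INTO books (title) VALUES (?)",
    [("Example Publishing Reader",), ("Another Title",), ("Third Example",)],
)
print(search_titles(connection, "publish"))
```

Two things to keep in mind: SQLite's LIKE ignores case for ASCII letters by default, and any % or _ typed into the search box acts as a wildcard unless it is escaped.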

in results.html

  • template page for showing results, same as show_books.html
{% extends 'base.html' %}

{% block main %}
<div class="container">
  <h1 class="page-header">Search Results</h1>
  {% with messages = get_flashed_messages() %}
    {% if messages %}
      <div class="alert alert-success">
        <ul>
        {% for message in messages %}
          <li>{{ message }}</li>
        {% endfor %}
        </ul>
      </div>
    {% endif %}
  {% endwith %}

  <table style="width:100%">
    <tr>
      <th>Cover</th>
      <th>Title</th>
      <th>Author</th>
      <th>Filetype</th>
      <th>Tag</th>
    </tr>
    {% for book in books %}
    <tr>
      <td><img src="../uploads/cover/{{ book.cover }}" width="80"></td>
      <td><a href="books/{{ book.id }}">{{ book.title }}</a></td>
      <td>
        <ul>
          {% for author in book.authors %}
          <li><a href="{{ url_for('show_author_by_id', id=author.id) }}">{{ author.author_name }}</a></li>
          {% endfor %}
        </ul>
      </td>
      <td>{{ book.fileformat }}</td>
      <td>{{ book.tag }}</td>
    </tr>
    {% endfor %}
  </table>


</div>
{% endblock %}

Extracting images from PDF

pdfimages extracts more images, but many of them are fragments

To make more dynamic 'cover images':

Option 1: using pdfimages -j magnet_reader_3_processual_publishing_actual_gestures.pdf ./pdfimages

Option 2: a Python script which looks for the start and end bytes of JPG files:

The Python script extracts fewer images, as it only recognizes complete JPGs

<source lang=python>
# coding=utf-8
# Extract jpg's from pdf's. Quick and dirty.

import sys

with open(sys.argv[1], "rb") as file:
    pdf = file.read()

startmark = b"\xff\xd8"
startfix = 0
endmark = b"\xff\xd9"
endfix = 2
i = 0

njpg = 0
while True:
    istream = pdf.find(b"stream", i)
    if istream < 0:
        break
    istart = pdf.find(startmark, istream, istream + 20)
    if istart < 0:
        i = istream + 20
        continue
    iend = pdf.find(b"endstream", istart)
    if iend < 0:
        raise Exception("Didn't find end of stream!")
    iend = pdf.find(endmark, iend - 20)
    if iend < 0:
        raise Exception("Didn't find end of JPG!")
    istart += startfix
    iend += endfix
    print("JPG %d from %d to %d" % (njpg, istart, iend))
    jpg = pdf[istart:iend]
    with open("jpg%d.jpg" % njpg, "wb") as jpgfile:
        jpgfile.write(jpg)
    njpg += 1
    i = iend
</source>