User:Alice/IFL

From XPUB & Lens-Based wiki
Revision as of 21:46, 14 May 2018 by Alice

My contribution to the XPUB Library

Research questions

  • How can we represent the books in the collection differently, using the idea of stacks as mixtapes, organised by study path and reading time?
  • What kind of interface would best suit a library that serves our community?

Stacks

A stack is a set of books that are read during the same period, with the reader alternating between them. The books usually share a topic, or follow a study path that leads to a certain point of knowledge. Unlike a bookshelf, where books are lined up and often forgotten, the stack on your table/nightstand/toilet consists of books likely to be opened and reopened at any time.
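
One possible way to model a stack as a mixtape is sketched below: an ordered list of books with a total "running time". This is only an illustration, not the library's data model. The `Book` and `Stack` classes, the `words` field, the placeholder titles, and the reading speed of 180 words per minute are all assumptions.

```python
from dataclasses import dataclass, field
from math import ceil

WPM = 180  # assumed reading speed, in words per minute


@dataclass
class Book:
    title: str
    words: int  # hypothetical field; the real catalogue may not store word counts

    def reading_minutes(self):
        # Round up so that even a very short text counts as one minute
        return ceil(self.words / WPM)


@dataclass
class Stack:
    topic: str
    books: list = field(default_factory=list)  # order matters, like tracks on a mixtape

    def total_minutes(self):
        # The "running time" of the stack is the sum of its books' reading times
        return sum(book.reading_minutes() for book in self.books)


# Placeholder titles and word counts, for illustration only
stack = Stack("interfaces", [Book("Book A", 9000), Book("Book B", 18000)])
print(stack.total_minutes())  # 50 + 100 = 150
```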

Reading time

For this first challenge, I looked into representing a text by the amount of time it takes to read (similar to Medium). I adapted this script (license); it uses BeautifulSoup to extract the text from an HTML file and prints the estimated reading time in minutes and hours.

import urllib.request
from math import ceil

import bs4

# Average reading speed (words per minute) and average word length
# (characters) used for the estimate
WPM = 180
WORD_LENGTH = 5

def extract_text(url):
    """Fetch the page at url and return all of its text nodes."""
    html = urllib.request.urlopen(url).read()
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # string=True matches every text node in the document
    return soup.find_all(string=True)

def is_visible(element):
    """Return True for text nodes a reader would actually see."""
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    if isinstance(element, bs4.element.Comment):
        return False
    # Skip nodes that contain only whitespace, such as bare newlines
    if not element.strip():
        return False
    return True

def filter_visible_text(page_texts):
    return filter(is_visible, page_texts)

def count_words_in_text(text_list, word_length):
    """Estimate the word count as total characters divided by word_length."""
    total_words = 0
    for current_text in text_list:
        total_words += len(current_text) / word_length
    return total_words

def estimate_reading_time(url):
    """Return the estimated reading time as [minutes, hours]."""
    texts = extract_text(url)
    filtered_text = filter_visible_text(texts)
    total_words = count_words_in_text(filtered_text, WORD_LENGTH)
    minutes = ceil(total_words / WPM)
    hours = round(minutes / 60, 2)
    return [minutes, hours]

html_file = 'file:///home/alice/Documents/Reader%20final/txt/where_is.html'

values = estimate_reading_time(html_file)

print('It will take you', values[0], 'minutes or', values[1], 'hours to read this text')
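
To check the arithmetic without a file on disk, here is a small sketch that runs the same estimate on an HTML string. It swaps BeautifulSoup for the standard library's `html.parser` so that it has no dependencies. The `VisibleText` class, the `reading_time_from_html` function, and the sample text are made up for this example.

```python
from html.parser import HTMLParser
from math import ceil

WPM = 180        # same reading speed as in the script above
WORD_LENGTH = 5  # same characters-per-word estimate


class VisibleText(HTMLParser):
    """Collect text that is not inside a hidden tag."""

    HIDDEN = {"style", "script", "head", "title"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of hidden tags we are currently inside
        self.chunks = []  # visible text fragments found so far

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside hidden tags
        if self.depth == 0 and data.strip():
            self.chunks.append(data)


def reading_time_from_html(html):
    """Return (minutes, hours) for the visible text in an HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    chars = sum(len(chunk) for chunk in parser.chunks)
    minutes = ceil(chars / WORD_LENGTH / WPM)
    return minutes, round(minutes / 60, 2)


# 1800 five-character "words" (4 letters + space) = 9000 characters
sample = ("<html><head><title>Hidden</title></head><body><p>"
          + "word " * 1800 + "</p></body></html>")
minutes, hours = reading_time_from_html(sample)
print(minutes, hours)  # 10 0.17
```

At 180 words per minute, 1800 words come out at 10 minutes. The title text is ignored because it sits inside `head`.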

Interface

I want to explore giving the library only a command-line interface, shared online using gotty, through which anyone can search the collection.
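
As a rough sketch of what such a command-line search could look like, the script below matches a query against titles and authors. Everything in it is an assumption for illustration: the in-memory `CATALOGUE`, its placeholder entries, and the `search` and `main` functions are hypothetical, and the real library's data source is not described here.

```python
import argparse

# Hypothetical catalogue with placeholder entries
CATALOGUE = [
    {"title": "Placeholder Title on Mixtapes", "author": "First Author"},
    {"title": "Placeholder Title on Libraries", "author": "Second Author"},
    {"title": "Another Placeholder", "author": "Third Author"},
]


def search(query, catalogue=CATALOGUE):
    """Return the books whose title or author contains the query (case-insensitive)."""
    needle = query.lower()
    return [
        book for book in catalogue
        if needle in book["title"].lower() or needle in book["author"].lower()
    ]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Search the library from the command line")
    parser.add_argument("query", help="text to look for in titles and authors")
    args = parser.parse_args(argv)

    results = search(args.query)
    for book in results:
        print(f'{book["title"]} by {book["author"]}')
    if not results:
        print("No matches")
    return results


# Example call with an explicit argument list; calling main() with no
# arguments would read the query from the real command line instead.
main(["mixtape"])
```

Since gotty shares a terminal command as a web page, a script along these lines could in principle be the command it wraps. The exact gotty options would need checking against its documentation.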

Things I'm currently playing with:

Wishlist