User:Alice/IFL: Difference between revisions
Revision as of 22:18, 13 May 2018
My contribution to Xpub Library
Research questions
- How can we represent the books in the collection differently, for example as stacks that work like mixtapes, organised by study path and reading time?
- What kind of interface would best suit a library that serves our community?
Reading time
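For this first challenge, I looked into representing a text by the amount of time it takes to read (similar to the reading-time estimates on Medium). I adapted the reading_time_estimator script (https://github.com/assafelovic/reading_time_estimator, license: https://github.com/assafelovic/reading_time_estimator/blob/master/LICENSE), using BeautifulSoup to extract the text from an HTML file and print the estimated reading time in minutes and hours.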
import urllib.request
from math import ceil

import bs4

WPM = 180        # assumed reading speed, in words per minute
WORD_LENGTH = 5  # assumed average word length, in characters

def extract_text(url):
    # Fetch the page and return every text node in the document.
    html = urllib.request.urlopen(url).read()
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return soup.find_all(string=True)

def is_visible(element):
    # Skip text inside non-rendered tags, HTML comments, and bare newlines.
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    if isinstance(element, bs4.element.Comment):
        return False
    if element.string == "\n":
        return False
    return True

def filter_visible_text(page_texts):
    return filter(is_visible, page_texts)

def count_words_in_text(text_list, word_length):
    # Estimate the word count from the character count of each text node.
    total_words = 0
    for current_text in text_list:
        total_words += len(current_text) / word_length
    return total_words

def estimate_reading_time(url):
    texts = extract_text(url)
    filtered_text = filter_visible_text(texts)
    total_words = count_words_in_text(filtered_text, WORD_LENGTH)
    minutes = ceil(total_words / WPM)
    hours = round(minutes / 60, 2)  # rounded so the printed value stays readable
    return [minutes, hours]

html_file = 'file:///home/alice/Documents/Reader%20final/txt/where_is.html'
minutes, hours = estimate_reading_time(html_file)
print(f'It will take you {minutes} minutes or {hours} hours to read this text')
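To make the arithmetic behind the estimate easier to check, here is a minimal sketch that applies the same character-based formula to a plain string instead of a fetched page. The helper names `estimate_minutes` and `format_reading_time` and the sample text are hypothetical and only for illustration; the values of 180 words per minute and 5 characters per word are the ones used in the script.

```python
from math import ceil

WPM = 180        # words per minute, same value as in the script
WORD_LENGTH = 5  # characters per word, same value as in the script

def estimate_minutes(text, wpm=WPM, word_length=WORD_LENGTH):
    """Hypothetical helper: estimate reading time in whole minutes."""
    estimated_words = len(text) / word_length
    return ceil(estimated_words / wpm)

def format_reading_time(minutes):
    """Hypothetical helper: format whole minutes as 'Xh YYm' or 'Ym'."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h {rest:02d}m" if hours else f"{rest}m"

# 9000 characters / 5 = 1800 estimated words; 1800 / 180 = 10 minutes.
sample = "x" * 9000
print(estimate_minutes(sample))
print(format_reading_time(135))
```

Because the word count is derived from characters rather than from splitting on whitespace, the result depends only on the length of the text, which keeps the estimate cheap but approximate.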
Interface
I want to explore whether the library could be offered solely through a command-line interface that anyone can use to search it.
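As a starting point, a command-line search could look roughly like the sketch below. It is only an assumption about how such a tool might work: the in-memory `CATALOGUE`, its placeholder entries, the `search` function, and the `--max-minutes` option are all hypothetical, and a real version would read the collection from wherever the library stores it.

```python
import argparse

# Hypothetical in-memory catalogue with placeholder entries;
# a real library would load this from a file or database.
CATALOGUE = [
    {"title": "Placeholder Book One", "author": "Author A", "minutes": 120},
    {"title": "Placeholder Book Two", "author": "Author B", "minutes": 45},
    {"title": "Another Placeholder", "author": "Author A", "minutes": 300},
]

def search(catalogue, query, max_minutes=None):
    """Return entries whose title or author contains the query (case-insensitive)."""
    query = query.lower()
    results = []
    for book in catalogue:
        haystack = (book["title"] + " " + book["author"]).lower()
        fits_time = max_minutes is None or book["minutes"] <= max_minutes
        if query in haystack and fits_time:
            results.append(book)
    return results

def main(argv=None):
    parser = argparse.ArgumentParser(description="Search the library from the command line.")
    parser.add_argument("query", help="text to look for in titles and authors")
    parser.add_argument("--max-minutes", type=int, default=None,
                        help="only show texts readable within this many minutes")
    args = parser.parse_args(argv)
    for book in search(CATALOGUE, args.query, args.max_minutes):
        print(f'{book["title"]} ({book["author"]}), about {book["minutes"]} min')

# Example call with explicit arguments instead of sys.argv.
main(["placeholder", "--max-minutes", "150"])
```

Filtering by reading time ties the search back to the reading-time estimate, so a reader could ask for texts that fit the time they have available.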
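Things I'm currently playing with:
- Gotty (https://github.com/yudai/gotty)
- Tmux (https://github.com/tmux/tmux/wiki)
- Urwid (http://urwid.org/tutorial/index.html)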