User:Alice/XPPL

From XPUB & Lens-Based wiki


Stacks

Theoretical approach

The concept of stacks relies on a combination of two meta-theories in information science: collectivism (social constructivism) and constructionism. In broad terms, this means that the concept of the stack emphasizes discourse as knowledge production, social understandings of knowledge, and knowledge as a product of thought-collectives.

When building a stack, a researcher can think of a particular project she was involved in. In the process of working on that project, references built up from many different sources, some contradictory, some irrelevant at first glance, but ultimately useful in creating a broad view of the research topic. The community the researcher is part of has a major influence on the reference points that are accessed within the research. All information that circulates around this particular project is reconstructed in the idea of a stack of books kept beside the bed, one that is consulted often, with books browsed at random when needed.

As Marcia Bates references in her paper on digital libraries, ‘The user’s experience is phenomenologically different from the indexer’s experience. The user’s task is to describe something that, by definition, he or she does not know (Belkin, 1982)’. What is more, as Harter (1992) points out, discovering that a particular article is relevant to one’s concerns does not necessarily mean that the article is ‘about’ the same subject as one’s originating interest. Generally, catalogers do not index on the basis of these infinitely many possible anticipated needs. Trying to combine both of these experiences made me realize why elements such as stack description, other books in the stack, links between stacks and annotations are so necessary. They are there to guide the reader to the desired topic without her necessarily knowing exactly what she is looking for.

Implementation

Connected research

Git branches

When you create a branch within a project in git, you diverge from the main line of development and continue working on a separate line without affecting the main line. A branch is a movable pointer to one of the snapshots of your project (a commit).


A repository with two branches
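The idea of a branch as a movable pointer can be illustrated with a toy model. This is only a sketch of the concept, not how git is implemented: the `Commit` and `Repo` classes and their methods are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A snapshot of the project plus a link to the snapshot before it."""
    message: str
    parent: Optional["Commit"] = None


class Repo:
    """Toy repository (hypothetical): each branch name points at one commit."""

    def __init__(self):
        root = Commit("initial commit")
        self.branches = {"master": root}
        self.current = "master"

    def branch(self, name):
        # A new branch is just a second pointer to the current commit.
        self.branches[name] = self.branches[self.current]

    def checkout(self, name):
        # Switching branches only changes which pointer will move next.
        self.current = name

    def commit(self, message):
        # Committing moves only the current branch's pointer forward.
        tip = self.branches[self.current]
        self.branches[self.current] = Commit(message, parent=tip)


repo = Repo()
repo.branch("name_of_branch")
repo.checkout("name_of_branch")
repo.commit("work on the new branch")

print(repo.branches["master"].message)
print(repo.branches["name_of_branch"].message)
```

After the commit, `master` still points at the initial snapshot, while `name_of_branch` points at a new snapshot whose parent is the one `master` points at.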

Setting up

To create a new branch, you first have to go to the folder in which you cloned the git repository.

  • Check your current status and branch:
git status
git branch -v
  • To create a new branch:
git branch name_of_branch

or

git checkout -b name_of_branch

which creates a new branch and switches to it in one step.

If you created the branch with git branch, switch to it with:

git checkout name_of_branch

Now you can start working on your new branch and committing to it.

Merging the branches

First, move to the main branch and get the latest version, in case other people have pushed their commits to it in the meantime.


git checkout master

To check the state of your local master branch and see any uncommitted changes before pulling:

git status
git diff

Now pull, and resolve conflicts if there are any:

git pull origin master

In order to merge both branches, you can use the rebase command to replay your latest commits on top of all other commits in the master branch. If conflicts occur, fix them in your text editor, add the changed files and continue the rebase. Then merge and push.

git checkout name_of_branch
git rebase master

If there are conflicts, fix them, then:

git add name_of_changed_files
git rebase --continue

git checkout master

git merge name_of_branch

git push origin master


Research questions: If computers made book synopses, or were in charge of categorizing books using the Dewey Decimal Classification, how would they weigh up against humans in making these decisions?

Markov Chains

Markov chains can be used to create statistical models of the sequences of letters in a text. They are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering.

From A Mathematical Theory of Communication by C.E. Shannon

Using a fork of a Markov text generator script (listed below) I generated abstracts of various texts and analyzed the results.
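To show the general technique, here is a minimal character-level Markov text generator. It is a sketch and not the script used for the examples on this page: the function names, the `order` parameter and the sample sentence are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=4):
    """Map each run of `order` characters to the characters seen right after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        state = text[i:i + order]
        model[state].append(text[i + order])
    return model


def generate(model, length=200, seed=None):
    """Walk the chain: keep picking a random follower of the last `order` characters."""
    rng = random.Random(seed)
    state = rng.choice(list(model))  # start from a random state, possibly mid-word
    output = state
    order = len(state)
    while len(output) < length:
        followers = model.get(output[-order:])
        if not followers:  # dead end: this state only occurs at the very end of the text
            break
        output += rng.choice(followers)
    return output


# Hypothetical sample text; any longer text can be substituted.
sample = "the story of the computer and the story of the women who programmed the computer "
model = build_model(sample, order=4)
print(generate(model, length=120, seed=1))
```

Because a generator like this starts from an arbitrary state and only looks a few characters back, its output can begin mid-word and drift between sentences of the source text.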

Example 1

Original text: When Computers Were Women, by Jennifer Light

Abstract:

ved credit for innovation or invention. The story presents an apparent paradox. It suggests that women from the history while the way women were breaking into traditionally male occupations during World War II. These two talented engineering.

A closer look at this literature explicates the paradox by revealing wide spread ambivalence about women from the history of ENIAC's female computations during this stage of computers," performing ballistics computations as subprofessional. While celebrating women's work. While celebrated than their own.

The omission of labor. The story of ENIAC's "invention" with special focus on the female computer science perpetuates misconceptions of them were selected to programmer, perceived credit for innovation or invention.

The story of computer, ENIAC, to automate ballistics computers supports Ruth Milkman's thesis of an "idiom of sex-typing" during this stage of computations during the difficulty of these women with mathematical training were ass


Notes

  • This text has been OCR'd from a scan, so the text quality is not perfect, hence the occasional fragments of words.

Example 2

Original text: A news article on AlJazeera

Abstract:

The government's "Yes" camp. The call for an early polls.

The government has asked parliamentary and presidential polls had previously been slated for snap election. Earlier this year, his ruling Justice and Development Party (AK Party), reached an elections would begin. His comments came after meeting Devlet Bahceli's MHP. State of emergency that will increase the powers to appoint vice president new powers to appoint vice presidential election comes as nationalist Movement led by US-based cleric Fethullah Gulen, whom Turkey's continued struggle against Kurdish fighters as "terror threats from a northern enclave.

Erdogan said.

Ankara has labeled the prospect of early polls.

The constitutional changed in an April 2017 referendum that will increase the powers to appoint vice presidential system needs to be confirmed by the elections to be re-elected in the vote give the next president. The snap election commission, he said, but preparations would begin.

Resources


Working with databases

Creating a search engine based on tf-idf

Pmgoogle3.png
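As a rough illustration of the idea, the sketch below ranks documents by tf-idf: a term counts for more when it is frequent in a document (tf) and rare across the collection (idf). It is not the search engine shown in the screenshot. The document names, texts and function names are made up, and this is only one common variant of the weighting.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and the inverse document frequency of each term."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return counts, idf


def search(query, counts, idf):
    """Rank documents by the summed tf-idf weight of the query terms."""
    scores = {}
    for name, c in counts.items():
        total = sum(c.values()) or 1  # avoid dividing by zero for empty documents
        score = sum((c[t] / total) * idf.get(t, 0.0) for t in tokenize(query))
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical mini collection.
docs = {
    "a.txt": "markov chains model sequences of letters",
    "b.txt": "git branches are pointers to commits",
    "c.txt": "markov models and git branches",
}
counts, idf = build_index(docs)
print(search("markov letters", counts, idf))
```

Documents that contain none of the query terms get no score and are left out of the results.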

My contribution to Xpub Library

Research questions

  • How can we represent the books in the collection in a different way, using the idea of stacks as mixtapes that depend on study path and reading time?
  • What kind of interface would be best suited for a library that serves our community?

Knowledge organization

Examples

Stacks

A stack is a number of books that are read at a certain point in time, alternating between them. They usually have a topic in common, or follow a certain study path that can bring you to a point of knowledge. Rather than a bookshelf, where books are lined up and often forgotten, the stack on your table/nightstand/toilet consists of books prone to be opened and reopened at any time.

Reading time

For this first challenge, I looked into representing a text through the amount of time it takes to read (similar to Medium). I adapted this script (license) using BeautifulSoup to extract the text from an html file and print out the estimated reading time in minutes and hours.

import urllib.request
from math import ceil

import bs4

WPM = 180         # reading speed, in words per minute
WORD_LENGTH = 5   # average word length, in characters


def extract_text(url):
    # Download the page and return every text node in the document.
    html = urllib.request.urlopen(url).read()
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return soup.find_all(string=True)


def is_visible(element):
    # Skip text that is not rendered: scripts, styles, head content, comments.
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    elif isinstance(element, bs4.element.Comment):
        return False
    elif element.string == "\n":
        return False
    return True


def filter_visible_text(page_texts):
    return filter(is_visible, page_texts)


def count_words_in_text(text_list, word_length):
    # Approximate the word count as total characters divided by average word length.
    total_words = 0
    for current_text in text_list:
        total_words += len(current_text) / word_length
    return total_words


def estimate_reading_time(url):
    texts = extract_text(url)
    filtered_text = filter_visible_text(texts)
    total_words = count_words_in_text(filtered_text, WORD_LENGTH)
    minutes = ceil(total_words / WPM)
    hours = round(minutes / 60, 2)
    return [minutes, hours]


html_file = 'file:///home/alice/Documents/Reader%20final/txt/where_is.html'

values = estimate_reading_time(html_file)

print('It will take you', values[0], 'minutes or', values[1], 'hours to read this text')

Interface

I want to research the prospect of having only a command line interface through which anyone can search the library, shared online using gotty.

Things I'm currently playing with:

Syncthing testing

Syncthing

Session with Tash, Andre & Alice: 28.05.2018
How do we install and configure Syncthing on the Raspberry Pi and on two of our own machines?
Syncthing can be used to sync book files and catalog files between different instances of our library (e.g. syncing the catalog between the server and the Pis, or syncing book files between Pis). Files are not stored in the cloud, and it allows for a decentralized, read-write architecture (unlike rsync, which uses a master-slave relationship).

Running Syncthing
At first start Syncthing will generate a configuration file and some keys, and then start the admin GUI in your browser. The GUI remains available on https://localhost:8384/.

For Syncthing to be able to synchronize files with another device, it must be told about that device. This is accomplished by exchanging “device IDs”. A device ID is a unique, cryptographically-secure identifier that is generated as part of the key generation the first time you start Syncthing. It is printed in the startup log, and you can see it in the web GUI by selecting the “gear menu” (top right) and “Show ID”. Two devices will only connect and talk to each other if they are both configured with each other’s device ID. Since the configuration must be mutual for a connection to happen, device IDs don’t need to be kept secret. They are essentially part of the public key.

To get your two devices to talk to each other, click “Add Device” at the bottom right on both, and enter the device ID of the other side. You should also select the folder(s) that you want to share. The device name is optional and purely cosmetic. It can be changed later if required.

Configuration
Syncthing is configured through its config.xml file, which can be edited in the terminal or through the web GUI. Each folder element describes one folder. The following attributes may be set on the folder element:

id - The folder ID, must be unique. (mandatory)

label - The label of a folder is a human readable and descriptive local name. May be different on each device, empty, and/or identical to other folder labels. (optional)

path - The path to the directory where the folder is stored on this device; not sent to other devices. (mandatory)

type - Controls how the folder is handled by Syncthing. Possible values are:

readwrite - The folder is in default mode. Sending local and accepting remote changes.

readonly - The folder is in “send-only” mode – it will not be modified by Syncthing on this device.

rescanIntervalS - The rescan interval, in seconds. Can be set to zero to disable when external plugins are used to trigger rescans.
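To check these attributes without opening the GUI, the folder elements can be read with Python's standard XML parser. This is a sketch: the sample XML below is a made-up excerpt (folder IDs, labels and paths are hypothetical), a real config.xml contains many more elements, and `list_folders` is a name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up excerpt for illustration; only the attributes described above are included.
SAMPLE_CONFIG = """
<configuration>
    <folder id="catalog" label="Catalog" path="/home/pi/catalog" type="readwrite" rescanIntervalS="60"/>
    <folder id="books" label="Books" path="/home/pi/books" type="readonly" rescanIntervalS="3600"/>
</configuration>
"""


def list_folders(config_xml):
    """Return the attributes of every <folder> element as a list of dicts."""
    root = ET.fromstring(config_xml.strip())
    folders = []
    for folder in root.findall("folder"):
        folders.append({
            "id": folder.get("id"),
            "label": folder.get("label", ""),  # optional attribute
            "path": folder.get("path"),
            "type": folder.get("type"),
            "rescanIntervalS": int(folder.get("rescanIntervalS", "0")),
        })
    return folders


for f in list_folders(SAMPLE_CONFIG):
    print(f["id"], f["type"], f["path"])
```

To inspect a real installation, the contents of the actual config.xml file would be passed to `list_folders` instead of the sample string.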

Because the Pi has no browser in which to open the GUI, you can change the GUI address in the config file from 127... to 0000, served on the Apache web server. Then you can look at the GUI remotely in your browser. Alternatively, you can add device keys via the terminal in the config file.

Question: Can we have rw permissions on the main Pi, and read-only permissions on all others? - probs

Sharksd.jpg

Troubleshooting

Kernel Panic
Don't use the shark SD card! Aymeric bought them for super cheap and they will corrupt the f up. Kernel panic means you have to try and reboot the Pi in recovery mode. Or... abort.

Merging & file conflicts
Editing CSV files on different nodes at the same time will result in conflicts. How can we make a fault-tolerant, decentralized file system that allows up-to-date uploads, edits and deletions between different nodes? Important for us: how do we keep catalog and files separate so that only the catalog is visible to the public? And how do we make sure files and catalog are synced in a way that is distributed?


Wishlist

To continue

A couple of points I would like to work on further:

  • helping to maintain this library and make the database more stable
  • adding new features to stacks: grouping stacks under the name of the author, considering ways to visualize relationships between stacks
  • allowing bulk downloading for books in stacks
  • adding collections vs adding individual books - thinking in stacks prior to the upload