User:Alice/Code Exercises

Oulipo exercise

I improved the N + 7 code we wrote in Prototyping by debugging it and adding some tests. Work in progress.

Improvements

Bug example: the for loop was skipping capitalized words, because the lookup in the input file with nouns is case-sensitive and could not find them.

To fix this, I used the string method lower() to turn each word to lowercase before the lookup.

    for word in separated:
        word = word.lower() + '\n'
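
A minimal sketch of why the lookup failed, assuming the noun file is read with readlines() so that every entry ends with a newline. The short `nouns` list is made up for illustration and stands in for the real file.

```python
# Hypothetical stand-in for the lines read from the nouns file.
nouns = ['baboon\n', 'baboons\n', 'babushka\n']

word = 'Baboons'

# Without lowercasing, the capitalized word does not match any entry.
found_before = (word + '\n') in nouns

# After lowercasing, the same word matches its entry.
found_after = (word.lower() + '\n') in nouns

print(found_before, found_after)
```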

To check the fix, I added a test.

def test_seven():
    assert seven('Baboons') == 'babushkas'

Full script (in progress)

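import string


def seven(sentence):
    # Read the noun list; every entry keeps its trailing newline.
    with open('91K nouns.txt') as noun_file:
        nouns = noun_file.readlines()
    separated = sentence.split()
    #print(separated)
    new_separated = []
    for word in separated:
        # Lowercase and drop surrounding punctuation, so that
        # 'Baboons,' matches the entry 'baboons'.
        word = word.lower().strip(string.punctuation) + '\n'
        if word in nouns:
            position = nouns.index(word)
            # Wrap around so nouns near the end of the list
            # do not raise an IndexError.
            new_word = nouns[(position + 7) % len(nouns)]
            #print(" replacing", new_word)
            new_separated.append(new_word.strip())
        else:
            #print("notinlist")
            #print("adding to new_separated ", word)
            new_separated.append(word.strip())
    #print(new_separated)
    return ' '.join(new_separated)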

#sentence = input('What is your sentence? ')
#seven(sentence)

# pytest requires that you name your tests with test_<your-name>
# run with the 'pytest' command in your terminal
def test_seven():
    assert seven('Baboons') == 'babushkas'
    assert seven('Baboons,') == 'babushkas'
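
The tests above depend on the contents of '91K nouns.txt'. One possible way to make them independent of that file is to pass the noun list in as an argument. The sketch below is an assumption, not the script above: `seven_from_list` and the short `NOUNS` list are hypothetical, and the list is arranged so that 'baboons' sits seven places before 'babushkas'. Wrapping around at the end of the list is also an assumption.

```python
import string


def seven_from_list(sentence, nouns):
    """Replace each known noun with the noun 7 places later in `nouns`."""
    result = []
    for word in sentence.split():
        # Lowercase and drop surrounding punctuation before the lookup.
        cleaned = word.lower().strip(string.punctuation)
        if cleaned in nouns:
            position = nouns.index(cleaned)
            # Wrap around at the end of the list (an assumption).
            result.append(nouns[(position + 7) % len(nouns)])
        else:
            result.append(cleaned)
    return ' '.join(result)


# Hypothetical noun list, for illustration only.
NOUNS = ['baboons', 'n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'babushkas']


def test_seven_from_list():
    assert seven_from_list('Baboons', NOUNS) == 'babushkas'
    assert seven_from_list('Baboons,', NOUNS) == 'babushkas'
    assert seven_from_list('hello', NOUNS) == 'hello'
```

Because the list is passed in, the test runs without reading any file and stays valid if the noun file changes.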

Tesseract exercise

Initial test to train tesseract to recognise an image as a character/word

First, I used ImageMagick to convert the JPG file into a TIFF file, for better OCR results:

convert -density 300 flower3.jpg -depth 8 -strip -background white -alpha off flower3.tiff

Using Tesseract's page segmentation modes 8 and 10 (-psm 8 and -psm 10), I tested what kind of text output I would get when the image is treated as a single word (mode 8) or as a single character (mode 10).

tesseract flower.tiff  -psm 8 output
tesseract flower.tiff  -psm 10 output2

The results were:

a

<23

I created a boxfile for the best result (with -psm 10):

tesseract flower4.tiff -psm 10 flower4 makebox

I then opened the image/boxfile combination with moshpytt and edited the content of the box, so that it is recognised as the word 'flower'.

python moshpytt.py
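
The same edit can also be done without a graphical editor. The sketch below assumes the box file holds one line per box in the form `symbol left bottom right top page`, the layout used by Tesseract 3 box files. `relabel_box_line` is a hypothetical helper and the coordinates in the example are made up.

```python
def relabel_box_line(line, new_label):
    """Return the box line with its symbol replaced by `new_label`."""
    # Split off the first field (the symbol); keep the coordinates as they are.
    _symbol, coordinates = line.split(' ', 1)
    return new_label + ' ' + coordinates


# Made-up example line: symbol, left, bottom, right, top, page.
example = 'a 12 8 310 295 0'
print(relabel_box_line(example, 'flower'))
```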

Python exercise inspired by the work of Carl Andre

I wrote a script that turns a list of words of different lengths into a pattern similar to the ones Carl Andre typed by hand on a typewriter. So far, all the tests pass. However, when it receives a long list of words as input, it raises a ValueError whenever the number of usable words is not an exact multiple of the pattern size, so it still needs debugging...

import pytest
from math import ceil


def grabber(words, numgrab):
    grabbedwords = []
    for number in range(numgrab):
        grabbedwords.append(words.pop(0))
    return (grabbedwords, words)


def pattern(words, maxlength):
    goodwords = []
    for word in words:
        if len(word) <= maxlength:
            goodwords.append(word)

    items_pattern = maxlength + (maxlength - 1)
    if len(goodwords) % items_pattern != 0:
        raise ValueError

    # Count patterns from the filtered list, not from the original input.
    times = len(goodwords) // items_pattern
    final_pattern = []
    for each_time in range(times):
        grabbed, whatisleft = grabber(goodwords, items_pattern)
        goodwords = whatisleft
        middle = ceil(len(grabbed)/2)
        sorted_pattern = (
            sorted(grabbed[:middle]) +
            sorted(grabbed[middle:], reverse=True)
        )
        final_pattern.append(sorted_pattern)

    return final_pattern

def test_pattern_returns_list():
    assert isinstance(pattern(['a', 'b', 'c', 'd', 'e'], 3), list)

def test_pattern_removes_over_max_len():
    right_one = [['a', 'aa', 'aaa', 'aa', 'a']]
    assert pattern(right_one[0] + ['aaaaa'], 3) == right_one

def test_pattern_too_short_wont_work():
    with pytest.raises(ValueError):
        pattern(['a', 'aa'], 3)

def test_grabber():
    assert grabber(['a', 'aaa'], 1) == (['a'], ['aaa'])

def test_two_patterns():
    words = ['a', 'aa', 'aaa', 'aa', 'a', 'b', 'bb', 'bbb', 'bb', 'b']
    expected = [['a', 'aa', 'aaa', 'aa', 'a'], ['b', 'bb', 'bbb', 'bb', 'b']]
    assert pattern(words, 3) == expected
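
The ValueError on long inputs comes from the check that the number of usable words is an exact multiple of the pattern size, maxlength + (maxlength - 1). A possible way to handle arbitrary lists, sketched under the assumption that leftover words may simply be dropped, is to trim the filtered list before building the pattern. `trim_to_pattern_size` is a hypothetical helper, not part of the script above.

```python
def trim_to_pattern_size(words, maxlength):
    """Keep words no longer than maxlength, cut to a whole number of patterns."""
    goodwords = [word for word in words if len(word) <= maxlength]
    items_pattern = maxlength + (maxlength - 1)
    # Largest multiple of items_pattern that fits into the filtered list.
    usable = len(goodwords) - (len(goodwords) % items_pattern)
    return goodwords[:usable]


words = ['a', 'aa', 'aaa', 'aa', 'a', 'b', 'bb', 'toolong']
print(trim_to_pattern_size(words, 3))
```

The trimmed list could then be passed to pattern() with the same maxlength, since its length is always a multiple of the pattern size.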