User:Lbattich/Replace ME (python text replace)

From XPUB & Lens-Based wiki
Revision as of 16:21, 22 April 2015 by Lbattich

Scripts for replacing famous names in a text with my own name, for instance the names that appear in Wikipedia entry pages.

guide in steps

step 1

compile a list of names into a plain txt file, one name per line, with a layout like this:

Vito Acconci
Bas Jan Ader
Vikky Alexander
Roy Ascott
Marina Abramović
Billy Apple
Shusaku Arakawa
Christopher D'Arcangelo
Michael Asher
Mireille Astore

I took the names from Wikipedia lists like this one and this one.

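Important! Make sure the list is clean: one name per line, no blank lines, and no stray characters such as parentheses: (). Every word in the list is later used as a search pattern, so anything extra ends up in the pattern.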
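One way to tidy a raw list before step 2 is sketched below. It is only an illustration: the function name clean_names and the cleaning rules (dropping parenthetical notes, blank lines and repeated names) are assumptions, not part of the original scripts.

```python
import re

def clean_names(lines):
    """Return a tidied list of names (hypothetical helper, not one of the original scripts)."""
    cleaned = []
    seen = set()
    for line in lines:
        # Assumption: anything in parentheses is a note, e.g. "(artist)", and can be dropped.
        line = re.sub(r"\([^)]*\)", "", line)
        # Collapse runs of whitespace and trim both ends.
        line = " ".join(line.split())
        # Skip blank lines and names that were already seen.
        if not line or line in seen:
            continue
        seen.add(line)
        cleaned.append(line)
    return cleaned

raw = ["Vito Acconci\n", "\n", "Billy Apple (artist)\n", "Vito  Acconci \n"]
print(clean_names(raw))  # ['Vito Acconci', 'Billy Apple']
```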

step 2

run this command in a terminal:

cat list.txt | python break.py > sublist.txt

where list.txt is your source list file, laid out like the one in step 1

and where break.py contains this script:

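import re, sys

# Read the names from standard input, one name per line.
text = sys.stdin.readlines()

for line in text:
	line = line.strip()
	# Skip blank lines, which would otherwise produce an entry with no search word.
	if not line:
		continue
	# Every word followed by a space is a first or middle name: pair it with "Lucas".
	# The "_" is a temporary marker for a line break.
	line = re.sub(r" +", r"\tLucas_", line)
	# The last word is the surname: pair it with "Battich".
	line = re.sub(r"$", r"\tBattich\n", line)
	# Turn each "_" marker into a real line break.
	line = re.sub(r"_", r"\n", line)

	sys.stdout.write(line)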

Your end product list file, sublist.txt, now looks like this (notice that for names with middle names, like Bas Jan Ader, the middle names are also turned into Lucas):

Vito	Lucas
Acconci	Battich
Bas	Lucas	
Jan	Lucas
Ader	Battich
Vikky	Lucas
Alexander	Battich
Roy	Lucas
Ascott	Battich
Marina	Lucas
Abramovi	Battich
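Before moving on to step 3, it can help to check that every line of sublist.txt has the shape replace.py expects: a search word, a tab, and a replacement. A minimal sketch of such a check follows; the function name find_bad_lines is made up for this example and is not one of the original scripts.

```python
def find_bad_lines(lines):
    """Return the 1-based numbers of lines that are not 'word<TAB>replacement'.

    Hypothetical helper for illustration only.
    """
    bad = []
    for number, line in enumerate(lines, 1):
        # Strip first, as replace.py does, so a trailing tab or newline is ignored.
        fields = line.strip().split("\t")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            bad.append(number)
    return bad

sample = ["Vito\tLucas\n", "Acconci\tBattich\n", "\tBattich\n", "Bas\tLucas\t\n", "\n"]
print(find_bad_lines(sample))  # [3, 5]
```

Here line 3 has no search word and line 5 is blank; the trailing tab on line 4 is harmless because the line is stripped before it is split.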

step 3

run this:

cat original-wiki.html | python replace.py > new-wiki.html

where original-wiki.html is your source text, in this case an HTML file that is an exact copy of a Wikipedia entry page.

and new-wiki.html is your end product

and replace.py looks like this:

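import re, sys

text = sys.stdin.readlines()

# Read the substitution list once, instead of re-parsing it for every line of text.
pairs = []
for subline in open("sublist.txt").readlines():
	subline = re.split(r"\t", subline.strip())
	# Skip blank or malformed lines that have no replacement part.
	if len(subline) < 2:
		continue
	# re.escape makes special characters in a word match literally;
	# \b restricts the match to whole words.
	search = r"\b{0}\b".format(re.escape(subline[0]))
	pairs.append((search, subline[1]))

for line in text:
	for search, replace in pairs:
		line = re.sub(search, replace, line)

	# Lines are not stripped, so the output keeps the original line breaks and indentation.
	sys.stdout.write(line)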
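The \b markers that replace.py wraps around each search word are regular-expression word boundaries: they make the pattern match whole words only. The sketch below shows the difference on a made-up phrase; the helper name swap_name and its whole_word switch exist only for this demonstration.

```python
import re

def swap_name(text, word, replacement, whole_word=True):
    """Replace word in text (hypothetical helper, for demonstration only)."""
    if whole_word:
        # Same pattern shape as in replace.py: the word between two \b boundaries.
        pattern = r"\b{0}\b".format(word)
    else:
        pattern = word
    return re.sub(pattern, replacement, text)

phrase = "Roy Ascott, Royal Academy"

# Without word boundaries, "Roy" is also replaced inside "Royal".
print(swap_name(phrase, "Roy", "Lucas", whole_word=False))  # Lucas Ascott, Lucasal Academy

# With word boundaries, only the standalone word "Roy" is replaced.
print(swap_name(phrase, "Roy", "Lucas"))  # Lucas Ascott, Royal Academy
```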

Note! The python script REQUIRES the substitution list to be in a file called sublist.txt, in the directory where you run the command. The filename is written into replace.py, so the file cannot have any other name.
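If the fixed filename is inconvenient, the list could instead be named on the command line. The sketch below is an assumption about how replace.py might be adapted, not part of the original script; the helper name sublist_path is hypothetical, and it falls back to sublist.txt when no argument is given.

```python
import sys

def sublist_path(argv, default="sublist.txt"):
    """Return the substitution-list filename: the first argument if present, else the default."""
    # argv[0] is the script name, so a user-supplied filename would be argv[1].
    if len(argv) > 1:
        return argv[1]
    return default

# In replace.py, the fixed open("sublist.txt") would then become:
#     subtext = open(sublist_path(sys.argv)).readlines()
print(sublist_path(["replace.py"]))                 # sublist.txt
print(sublist_path(["replace.py", "artists.txt"]))  # artists.txt
```

With that change, a call such as cat original-wiki.html | python replace.py artists.txt > new-wiki.html would read the list from artists.txt (an example name).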

what happened

I took this and it became THIS

I also took this and it became THIS

My original list of names is not so comprehensive – doesn't include names of theorists, writers and pre-modern artists, etc, basically of anyone who didn't appear in the small lists I used – so the result is not entirely polished.