Regular Expressions

From XPUB & Lens-Based wiki

Regular expressions are descriptions of patterns that can be used for sophisticated text processing. For instance, you might want to find all lines that begin with a '#' character, or all five-letter words beginning with a capital P. "Regular expression" is the 're' in the name of the well-known UNIX command-line tool grep, and regular expressions are available in the search-and-replace functions of many text editors. You can use them in most programming languages (Python included). Regular expressions belong to the class of DeclarativeLanguages.
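The two searches mentioned above can be sketched in Python roughly as follows. The sample text and variable names are made up for illustration, and "five-letter word" is assumed here to mean a capital P followed by exactly four lowercase ASCII letters.

```python
import re

# Made-up sample text for illustration
text = """# a comment line
Paris is in France
  # indented, so it does not begin with '#'
Peter ate a Pizza in Paris"""

# Lines that begin with '#': with re.MULTILINE, ^ and $ match at the
# start and end of every line rather than of the whole string
comment_lines = re.findall(r'^#.*$', text, flags=re.MULTILINE)

# Five-letter words beginning with a capital P: \b marks a word boundary,
# [a-z]{4} is exactly four lowercase letters (an assumption about "word")
p_words = re.findall(r'\bP[a-z]{4}\b', text)

print(comment_lines)
print(p_words)
```

The pattern only describes what a match looks like; the search itself is left to the re module, which is what makes regular expressions declarative.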


In class, we looked at an example of a regular expression in a Python script to process the timecodes of a subtitles file:

import re
import sys
from datetime import datetime, timedelta

# Subtitle timecode of the form HH:MM:SS,mmm, one group per field
timecode_pattern = r'(\d\d):(\d\d):(\d\d),(\d\d\d)'

# Shift every timecode by this many seconds (may be fractional)
ADJUST_SECONDS = 4.25

def adjustTime (match):
	# The four captured groups arrive as strings; convert them to integers
	(hours, mins, secs, msecs) = [int(x) for x in match.groups()]
	# The date is arbitrary; only the time of day is used
	time = datetime(1984, 1, 1, hour = hours, minute = mins, second = secs, microsecond = msecs*1000)
	# timedelta accepts fractional seconds, so no manual split into secs and ms is needed
	time += timedelta(seconds = ADJUST_SECONDS)
	return "%02d:%02d:%02d,%03d" % (time.hour, time.minute, time.second, time.microsecond // 1000)

# Read the subtitles from stdin and write the shifted version to stdout
for line in sys.stdin:
	line = re.sub(timecode_pattern, adjustTime, line)
	sys.stdout.write(line)
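To see what the pattern captures and how re.sub calls a function for each match, here is a small sketch on a single line. The sample line and the helper name `label` are invented for illustration; only `timecode_pattern` comes from the script above.

```python
import re

timecode_pattern = r'(\d\d):(\d\d):(\d\d),(\d\d\d)'

# Made-up line in the style of a subtitle file
sample = "00:01:15,500 --> 00:01:18,000"

# Each match exposes its four parenthesised groups as strings
all_groups = [m.groups() for m in re.finditer(timecode_pattern, sample)]
print(all_groups)

# re.sub accepts a function instead of a replacement string: it is called
# once per match, and whatever it returns replaces the matched text
def label(match):
    hours, mins, secs, msecs = match.groups()
    return "[%sh %sm %ss %sms]" % (hours, mins, secs, msecs)

result = re.sub(timecode_pattern, label, sample)
print(result)
```

Text that does not match the pattern, such as the arrow between the two timecodes, passes through unchanged.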

Resources

Tutorials and Howtos

Regular Expressions in Python

Reference Material

Regular expressions are one of the most esoteric aspects of programming! (AndreaFiore) Look at this, for example:

r'([01])(?:&\1|(?:(?<=0)&1)|(?:(?<=1&0)(?:[^&01]|\Z))|(?:(?<=0)\|0)| (?:(?<=0\|1)(?:[^|01]|\Z))|(?:(?<=1)\|0)|(?:(?<=1)\|1))' 
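Much of the density in a pattern like this comes from a handful of constructs: backreferences (\1), lookbehinds ((?<=...)), non-capturing groups ((?:...)) and the end-of-string anchor (\Z). The sketch below tries each one in isolation on made-up strings; it does not attempt to explain the pattern as a whole.

```python
import re

# \1 refers back to whatever group 1 captured, so this finds
# the same bit on both sides of '&'
same_bits = re.findall(r'([01])&\1', "1&1 0&1 0&0")
print(same_bits)

# (?<=0) is a lookbehind: '&1' matches only when a '0' comes right
# before it, and that '0' is not part of the match
stripped = re.sub(r'(?<=0)&1', '', "0&1 1&1")
print(stripped)

# (?:...) groups alternatives without capturing them;
# \Z matches only at the very end of the string
at_end = re.findall(r'(?:ab|cd)\Z', "ab cd")
print(at_end)
```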

Having good reference material in front of you can make your life easier, so consider printing a copy of this:

http://www.greenend.org.uk/rjk/2002/06/regexp.html#grouping

Links to examples and usage

xkcd_regular_expressions.png