Regular Expressions
Revision as of 16:47, 11 October 2010
Regular expressions are descriptions of patterns that can be used for sophisticated text processing. For instance, you might want to find all lines that begin with a '#' character, or all five-letter words beginning with a capital P. "Regular expression" is the 're' in the name of the well-known UNIX command-line tool grep, and regular expressions are available in the search-and-replace functions of many text editors. You can use them in most programming languages (Python included). Regular expressions belong to the class of DeclarativeLanguages.
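The two searches mentioned above can be sketched in Python with the standard re module. This is only an illustration: the sample text and the variable names are made up for the example, and "letter" is assumed to mean an unaccented A-Z or a-z.

```python
import re

# Made-up sample text for the illustration
text = """# a comment line
Peter and Paul found a Piano in Paris
x = 1  # trailing comment
Python is not a five-letter word"""

# Lines that begin with '#': with re.MULTILINE, ^ and $ match at every line start and end
comment_lines = re.findall(r'^#.*$', text, flags=re.MULTILINE)

# Five-letter words with a capital P: \b is a word boundary, {4} means exactly four more letters
p_words = re.findall(r'\bP[A-Za-z]{4}\b', text)

print(comment_lines)  # ['# a comment line']
print(p_words)        # ['Peter', 'Piano', 'Paris']
```

Note that the third line is not reported, because its '#' is not at the start of the line, and that "Paul" and "Python" are skipped because they are four and six letters long.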
In class, we looked at an example of a regular expression in a Python script to process the timecodes of a subtitles file:
import re, sys
from datetime import datetime, timedelta

# Matches a timecode such as 00:01:58,900 (hours:minutes:seconds,milliseconds)
timecode_pattern = r'(\d\d):(\d\d):(\d\d),(\d\d\d)'

# Shift every timecode by this many seconds
ADJUST_SECONDS = 4.25
ADJUST_SECS = int(ADJUST_SECONDS)
ADJUST_MS = int(round((ADJUST_SECONDS - ADJUST_SECS) * 1000))

def adjustTime(match):
    # Convert the four captured groups to integers
    (hours, mins, secs, msecs) = [int(x) for x in match.groups()]
    # The date is arbitrary; datetime is only used for the time arithmetic
    time = datetime(1984, 1, 1, hour=hours, minute=mins, second=secs, microsecond=msecs * 1000)
    time += timedelta(seconds=ADJUST_SECS, milliseconds=ADJUST_MS)
    return "%02d:%02d:%02d,%03d" % (time.hour, time.minute, time.second, time.microsecond // 1000)

# Read the subtitles from stdin, rewrite each timecode, write the result to stdout
for line in sys.stdin:
    line = re.sub(timecode_pattern, adjustTime, line)
    sys.stdout.write(line)
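To see what the substitution does to a single line, here is a small self-contained sketch of the same idea. The helper name shift_line and the sample subtitle line are made up for the illustration; the pattern and the datetime arithmetic follow the script from class.

```python
import re
from datetime import datetime, timedelta

TIMECODE = re.compile(r'(\d\d):(\d\d):(\d\d),(\d\d\d)')

def shift_line(line, seconds):
    """Return line with every timecode shifted by the given number of seconds (hypothetical helper)."""
    def shift(match):
        hours, mins, secs, msecs = (int(x) for x in match.groups())
        # The date is arbitrary; only the time of day is used
        t = datetime(1984, 1, 1, hours, mins, secs, msecs * 1000)
        t += timedelta(seconds=seconds)
        return "%02d:%02d:%02d,%03d" % (t.hour, t.minute, t.second, t.microsecond // 1000)
    # re calls shift() once per match and inserts its return value
    return TIMECODE.sub(shift, line)

sample = "00:01:58,900 --> 00:02:01,500\n"
print(shift_line(sample, 4.25), end="")  # 00:02:03,150 --> 00:02:05,750
```

The key point is that the replacement argument can be a function instead of a string: it receives the match object and returns the text to substitute. Lines without a timecode pass through unchanged. An hours value of 24 or more would make datetime raise a ValueError, so this sketch assumes ordinary time-of-day values.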
Resources
Tutorials and Howtos
Reference Material
Regular expressions are one of the most esoteric aspects of programming! (AndreaFiore) Look at this, for example:
r'([01])(?:&\1|(?:(?<=0)&1)|(?:(?<=1&0)(?:[^&01]|\Z))|(?:(?<=0)\|0)| (?:(?<=0\|1)(?:[^|01]|\Z))|(?:(?<=1)\|0)|(?:(?<=1)\|1))'
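Dense patterns can be made less cryptic in Python with the re.VERBOSE flag, which ignores whitespace inside the pattern and allows comments. As a sketch, here is the timecode pattern from the subtitles script written that way; the group names and the sample string are made up for the illustration.

```python
import re

timecode = re.compile(r"""
    (?P<hours>\d\d) :      # two-digit hours, then a colon
    (?P<minutes>\d\d) :    # two-digit minutes, then a colon
    (?P<seconds>\d\d) ,    # two-digit seconds, then a comma
    (?P<millis>\d\d\d)     # three-digit milliseconds
""", re.VERBOSE)

m = timecode.search("00:01:58,900 --> 00:02:01,500")
print(m.groupdict())
# {'hours': '00', 'minutes': '01', 'seconds': '58', 'millis': '900'}
```

Named groups such as (?P<hours>...) are optional, but they let you refer to a part of the match by name instead of by position.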
Having good reference material in front of you can make your life easier, so consider printing a copy of these Regular Expression Quick Reference Cards:
http://www.greenend.org.uk/rjk/2002/06/regexp.html#grouping