ZeroPadding

From XPUB & Lens-Based wiki

Zero-padding names

When you work with a sequence of numbered file names, a common problem is that the names get sorted in the wrong order. For instance, given the files named:

1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
6.jpg
7.jpg
8.jpg
9.jpg
10.jpg
11.jpg
12.jpg


Using a simple "ls" command:

ls -1 *.jpg


Would result in:

10.jpg
11.jpg
12.jpg
1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
6.jpg
7.jpg
8.jpg
9.jpg


This is because "ls" sorts filenames as text, not as numbers. Compared character by character, "10.jpg" can come before "1.jpg" (the exact order depends on the system's locale settings), and "2.jpg" ends up after "12.jpg". The solution is to "pad" the names with zeros. This means giving all the names the same length (say 4 characters) by adding (or "filling", "padding") extra "0"s in front of the actual number. For instance:

0001.jpg
0002.jpg
0003.jpg
0004.jpg
0005.jpg
0006.jpg
0007.jpg
0008.jpg
0009.jpg
0010.jpg
0011.jpg
0012.jpg


Zero-padded names will sort the same way alphabetically as numerically.
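
To see the difference outside the shell, here is a minimal Python sketch. The helper pad_name is made up for this example (it is not part of any library) and assumes every name is a number followed by an extension. Python compares strings character by character, so the unpadded order is not exactly what "ls" shows, but it is wrong in the same way:

```python
# The twelve unpadded names from the example: "1.jpg" ... "12.jpg"
names = ["%d.jpg" % n for n in range(1, 13)]


def pad_name(name, width=4):
    """Zero-pad the part before the extension (hypothetical helper).

    Assumes the name looks like "<number>.<extension>".
    """
    stem, dot, ext = name.rpartition(".")
    return stem.zfill(width) + dot + ext


# Sorting as text puts 10, 11 and 12 in between 1 and 2.
plain_sorted = sorted(names)

# After padding, sorting as text gives the numeric order.
padded_sorted = sorted(pad_name(n) for n in names)

print(plain_sorted[:5])
print(padded_sorted[:5])
```

With the padded names, the text order and the numeric order agree.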

When generating names automatically (via a script, or using a program such as ImageMagick), a special "formatting code" can be used to automatically zero-pad the names.

convert picture.jpg -crop 30x "%04d.jpg"


The code "%04d" means "pad the number to 4 places using the 0 character". The % starts the code, and the final d means 'decimal', telling the program that we are using whole numbers. You can also give a name like 'tile' (or whatever you want) to appear before the number:

convert picture.jpg -crop 30x "tile%04d.jpg"


In Python, the same formatting trick can be used with the "%" string-formatting operator. Note that in this case there are two % characters: the one inside the quotation marks is part of the string-format "template", and the second % is the operator that tells Python to fill in the template (the string preceding the operator) with the value (following the operator):

for i in range(100000):
    # %06d pads i with zeros to 6 digits: hello 000000, hello 000001, ...
    print("hello %06d" % i)
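
As a small sketch of how the parts of the code behave, the lines below try a few variations. The function tile_name and its parameters are only illustrative names, not something from ImageMagick or Python itself. The * in "%0*d" takes the width from the values after the operator instead of writing it into the template:

```python
# The same number formatted with different codes.
print("%d" % 7)        # no padding
print("%4d" % 7)       # width 4, padded with spaces
print("%04d" % 7)      # width 4, padded with zeros
print("%04d" % 12345)  # numbers longer than the width are not cut off


def tile_name(i, prefix="tile", width=4, ext="jpg"):
    """Build a zero-padded file name (hypothetical helper)."""
    # "%0*d" reads the width from the tuple, then the number itself.
    return "%s%0*d.%s" % (prefix, width, i, ext)


# Other standard ways to get the same padding in Python.
same_with_format = "{:04d}".format(7)
same_with_zfill = str(7).zfill(4)

for i in range(3):
    print(tile_name(i))
```

Whichever form is used, pick a width large enough for the biggest number you expect, since longer numbers are printed in full and break the sort order again.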