User:Riviera/Crop continued: Difference between revisions
Revision as of 20:23, 24 February 2024
More cropping
I recently wrote a sketch of a command line application written in Python which produces pdf output. I have since enhanced the code, integrating it with ConTeXt output. At the end of this post, I illustrate ways in which the software can be applied to visual ends.
import argparse
import math
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
The argparse module is used to set up the command line application. Argparse allows flags and positional arguments to be given to the script when it is executed at the command line. The values passed in by the user are stored in variables. Argparse also implements a help flag which offers information about the available flags.
parser = argparse.ArgumentParser(description='Crop typesetting areas.')
Arguments are added. outfile is a positional argument, whereas the remaining arguments are flags. The flags will be looked at in more detail later on.
parser.add_argument('outfile',
                    metavar='OUTFILE',
                    nargs=1,
                    help="Write to a file")
# With nargs=1 a value given on the command line arrives as a one-item list,
# so each default is wrapped in a list to match.
parser.add_argument('--papersize',
                    metavar='PAPERSIZE',
                    nargs=1,
                    default=['A4'],
                    help="Provide a standard papersize")
parser.add_argument('--ratio',
                    metavar='RATIO',
                    nargs=1,
                    default=['2:3'],
                    help="Crop the paper to this proportion")
parser.add_argument('--orientation',
                    metavar='ORIENTATION',
                    nargs=1,
                    default=['portrait'],
                    help="Switch between portrait and landscape.")
# type=float converts the scale so that it can be used in calculations.
parser.add_argument('--scale',
                    metavar='SCALE',
                    nargs=1,
                    type=float,
                    default=[90.0],
                    help="Scale the size of the cropped page.")
For the sake of example, let’s pass the following arguments to the script.
args = parser.parse_args(args=['--scale', '90',
                               '--ratio', '5:3',
                               '--papersize', 'A3',
                               '--orientation', 'landscape',
                               'main.tex'])
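One detail of nargs=1 is worth illustrating: argparse wraps a value given on the command line in a one-item list, but uses a default exactly as it was written. The sketch below uses a throwaway parser (demo_parser and its two flags are names introduced here for illustration, not part of the script) to show how a plain string default and a list default behave when indexed with [0].

```python
import argparse

# Throwaway parser for illustration only; not the script's parser.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument('--plain', nargs=1, default='A4')      # string default
demo_parser.add_argument('--wrapped', nargs=1, default=['A4'])  # list default

# No flags given, so both defaults are used exactly as written.
defaults = demo_parser.parse_args([])
print(defaults.plain[0])    # first character of the string: A
print(defaults.wrapped[0])  # first item of the list: A4

# A value given on the command line always arrives as a one-item list.
given = demo_parser.parse_args(['--plain', 'A3'])
print(given.plain)          # ['A3']
```

Wrapping defaults in a list keeps lookups such as args.papersize[0] consistent whether or not the flag was given.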
Wishlist
It would be interesting to add a --page-on-page flag which introduces variation in the output. When active, this flag would print the cropped page on the given paper size at the given scale and ratio, which is the default behaviour at the moment. With the flag implemented, the default would change so that the output is a page already cropped to size.
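A boolean flag of this kind could be declared with argparse's store_true action. The following is only a sketch of the wished-for flag, which does not exist in the script yet; wish_parser is a hypothetical stand-in for the script's parser.

```python
import argparse

# Hypothetical sketch: the --page-on-page flag is a wish, not an existing feature.
wish_parser = argparse.ArgumentParser(description='Crop typesetting areas.')
wish_parser.add_argument('--page-on-page',
                         action='store_true',
                         help="Print the cropped page on the full sheet")

args_default = wish_parser.parse_args([])
args_flagged = wish_parser.parse_args(['--page-on-page'])

# argparse turns the dashes in the flag name into underscores.
print(args_default.page_on_page)  # False
print(args_flagged.page_on_page)  # True
```

The rest of the script could then branch on args.page_on_page when deciding which paper size to write out.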
Papersize Dictionary
I drew up a dictionary of A-series papersizes based on information at papersizes.io. This way paper dimensions can be referenced by name.
portrait_paper_sizes = {
    # size  width  height (mm)
    "A0" : [841, 1189],
    "A1" : [594, 841],
    "A2" : [420, 594],
    "A3" : [297, 420],
    "A4" : [210, 297],
    "A5" : [148, 210],
    "A6" : [105, 148],
    "A7" : [74, 105],
    "A8" : [52, 74],
    "A9" : [37, 52],
    "A10": [26, 37],
    "A11": [18, 26],
    "A12": [13, 18],
    "A13": [9, 13],
    "2A0": [1189, 1682],
    "4A0": [1682, 2378],
    "A0+": [914, 1292],
    "A1+": [609, 914],
    "A3+": [329, 483]
}
Portrait and Landscape
I decided to implement portrait and landscape orientations in the script. Portrait mode is the default. Passing --orientation landscape to the command switches to landscape output. It might be more concise to have a --landscape flag.
if "portrait" in args.orientation:
    paper_width = portrait_paper_sizes[args.papersize[0]][0]
    paper_height = portrait_paper_sizes[args.papersize[0]][1]
    print(args.papersize[0], "portrait", paper_width, "mm x", paper_height, "mm")
I have not accounted for a situation in which someone provides a papersize which is not listed in the dictionary. I expect that, at the moment, the script will throw an error if this happens. In any case, it's necessary to exchange the values of the width and height in landscape mode. This can be done in at least two ways. I decided to change the indexes like so.
if "landscape" in args.orientation:
    paper_width = portrait_paper_sizes[args.papersize[0]][1]
    paper_height = portrait_paper_sizes[args.papersize[0]][0]
    print(args.papersize[0], "landscape", paper_width, "mm x", paper_height, "mm")
A3 landscape 420 mm x 297 mm
But it is also possible to switch the values of paper_width and paper_height by creating a new dictionary of landscape paper sizes. The code commented out below does that.
# if "landscape" in args.orientation:
#     landscape_paper_sizes = {}
#     for size in portrait_paper_sizes:
#         landscape_paper_sizes[size] = portrait_paper_sizes[size][::-1]
#     paper_width = landscape_paper_sizes[args.papersize[0]][0]
#     paper_height = landscape_paper_sizes[args.papersize[0]][1]
#     print(args.papersize[0], "landscape", paper_width, "mm x", paper_height, "mm")
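The unhandled case mentioned above, a papersize missing from the dictionary, could be caught before the lookup. The sketch below is one possible approach rather than part of the script: paper_dimensions is a hypothetical helper, and sizes is a two-entry excerpt of the dictionary repeated so the example runs on its own.

```python
# Excerpt of the papersize dictionary, repeated so the sketch is self-contained.
sizes = {"A3": [297, 420], "A4": [210, 297]}

def paper_dimensions(name, orientation, sizes):
    """Return (width, height) in mm, swapping the pair for landscape."""
    if name not in sizes:
        known = ", ".join(sorted(sizes))
        raise ValueError(f"Unknown papersize {name!r}; choose from: {known}")
    width, height = sizes[name]
    if orientation == "landscape":
        width, height = height, width
    return width, height

print(paper_dimensions("A3", "landscape", sizes))  # (420, 297)
print(paper_dimensions("A4", "portrait", sizes))   # (210, 297)
```

Raising an error with the list of known names gives the user something to act on, where a bare dictionary lookup would only report a KeyError.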
Ratio
ratio = args.ratio[0].split(":")
ratio_x = int(ratio[0])
ratio_y = int(ratio[1])
print(f"Crop ratio: {ratio_x}:{ratio_y}")
Crop ratio: 5:3
The ratio is provided to the script with the --ratio flag. By default the ratio is 2:3. Some calculations need to be done, so let's initialise some variables.
possible_widths_list = []
possible_heights_list = []
w = ratio_x
h = ratio_y
To work out the size of the cropped page, I calculate two lists of measurements: the widths and heights, in millimetres, of pages which match the ratio. These lists describe the range of areas the cropped page can take, and the values are used later by the scale feature. The following calculation checks the ratio against the dimensions of the page. A for loop limits the length of the lists.
if (math.floor(paper_width / ratio_y)) > (math.floor(paper_height / ratio_x)):
    # If the paper is landscape
    for dimension in range(math.floor(paper_width / ratio_x)):
        possible_widths_list += [w]
        possible_heights_list += [h]
        w += ratio_x
        h += ratio_y
else:
    for dimension in range(math.floor(paper_height / ratio_y)):
        possible_widths_list += [w]
        possible_heights_list += [h]
        w += ratio_x
        h += ratio_y
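The two branches above each take the loop limit from one side of the paper. An alternative, sketched here as an assumption rather than as the script's method, takes the smaller of the two limits so that neither dimension can exceed the paper; possible_sizes is a hypothetical helper. For the 5:3 example on A3 landscape it produces the same 84 steps. For a 1:1 ratio on the same sheet, the comparison in the script would select floor(paper_width / ratio_x), which is 420 steps, and the heights would grow past 297 mm.

```python
import math

def possible_sizes(paper_width, paper_height, ratio_x, ratio_y):
    """List every (width, height) multiple of the ratio that fits the paper."""
    # The number of steps is limited by whichever side runs out first.
    steps = min(math.floor(paper_width / ratio_x),
                math.floor(paper_height / ratio_y))
    widths = [ratio_x * n for n in range(1, steps + 1)]
    heights = [ratio_y * n for n in range(1, steps + 1)]
    return widths, heights

widths, heights = possible_sizes(420, 297, 5, 3)   # A3 landscape, 5:3
print(len(widths), widths[-1], heights[-1])        # 84 420 252

widths, heights = possible_sizes(420, 297, 1, 1)   # A3 landscape, 1:1
print(len(widths), widths[-1], heights[-1])        # 297 297 297
```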
Pandas, Numpy and Sklearn
At the beginning of the script, I imported (parts of) these modules. This was to enable Python to make use of different mathematical functions. In particular, I'm going to use a pandas DataFrame, scikit-learn's MinMaxScaler and NumPy's interp function. The purpose is to provide the user with the ability to scale the size of the cropped page in the output. In short, the values in possible_widths_list and possible_heights_list are adjusted to a percentage scale. Since there can be more or fewer than 100 values in possible_widths_list and possible_heights_list, the last value in each list needs to represent 100%. To begin with, let's create a DataFrame and a scaler. The code which appears below was adapted from this website: https://codefellows.github.io/sea-python-401d5/lectures/rescaling_data.html
df = pd.DataFrame({"widths": possible_widths_list, "heights": possible_heights_list})
scaler = MinMaxScaler()
Visualising the dataframe
The dataframe resembles a table of widths and heights spanning a range of values.
print(df)
    widths  heights
0        5        3
1       10        6
2       15        9
3       20       12
4       25       15
..     ...      ...
79     400      240
80     405      243
81     410      246
82     415      249
83     420      252

[84 rows x 2 columns]
Adding scaled values to the dataframe
This code assigns a percentage-based value to each possible width and height.
tmp_widths = df.widths - df.widths.min()
tmp_heights = df.heights - df.heights.min()
scaled_widths = tmp_widths / tmp_widths.max() * 100
scaled_heights = tmp_heights / tmp_heights.max() * 100
df["scaled_widths"] = scaled_widths
df["scaled_heights"] = scaled_heights
print(df)
    widths  heights  scaled_widths  scaled_heights
0        5        3       0.000000        0.000000
1       10        6       1.204819        1.204819
2       15        9       2.409639        2.409639
3       20       12       3.614458        3.614458
4       25       15       4.819277        4.819277
..     ...      ...            ...             ...
79     400      240      95.180723       95.180723
80     405      243      96.385542       96.385542
81     410      246      97.590361       97.590361
82     415      249      98.795181       98.795181
83     420      252     100.000000      100.000000

[84 rows x 4 columns]
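The same rescaling can be written without pandas to make the arithmetic visible. This is a minimal sketch which assumes the list holds at least two different values (otherwise it would divide by zero); to_percent and example_widths are names introduced here for illustration.

```python
def to_percent(values):
    """Rescale a list of numbers so the smallest is 0 and the largest is 100."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) * 100 for v in values]

example_widths = [5 * n for n in range(1, 85)]   # 5, 10, ... 420
percent = to_percent(example_widths)
print(round(percent[1], 6))     # 1.204819
print(percent[0], percent[-1])  # 0.0 100.0
```

The second value, 1.204819, matches row 1 of the scaled_widths column in the table above.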
Interpolating the values
Next, the values are interpolated. To my understanding, this is like cross-referencing the values in one list against the values in another, or like creating an array with floating-point indexes. A scale which falls between two rows is interpolated and then rounded down to a whole number of millimetres with math.floor, so the resulting values are approximate in a consistent way.
scaled_paper_height = math.floor(np.interp(95.2, scaled_heights, possible_heights_list))
scaled_paper_width = math.floor(np.interp(95.2, scaled_widths, possible_widths_list))
print(scaled_paper_width)
print(scaled_paper_height)
400
240
Notice that the printed values correspond to the row of the DataFrame whose scaled values are closest to 95.2. It's best if the user can determine the scale to crop the paper to. So, the first argument to np.interp is replaced with args.scale[0].
# float() ensures the scale is numeric even if argparse delivered it as a string.
scaled_paper_height = math.floor(np.interp(float(args.scale[0]), scaled_heights, possible_heights_list))
scaled_paper_width = math.floor(np.interp(float(args.scale[0]), scaled_widths, possible_widths_list))
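Because the percentage column is itself computed from the sizes, interpolating a scale against it amounts to a straight-line blend between the smallest and largest size. The sketch below shows that arithmetic without NumPy. It assumes a scale between 0 and 100 (np.interp clamps values outside that range, which this sketch does not), and scaled_size and the two example lists are names introduced here for illustration.

```python
import math

def scaled_size(scale, sizes):
    """Pick the size at `scale` percent between the smallest and largest size."""
    smallest, largest = sizes[0], sizes[-1]
    return math.floor(smallest + scale / 100 * (largest - smallest))

example_widths = [5 * n for n in range(1, 85)]    # 5, 10, ... 420
example_heights = [3 * n for n in range(1, 85)]   # 3, 6, ... 252
print(scaled_size(95.2, example_widths))   # 400
print(scaled_size(95.2, example_heights))  # 240
```

These are the same two values printed by the np.interp example with a scale of 95.2.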
Writing to a file
The output of the script is code which can be understood by the ConTeXt typesetting software. The values calculated by or provided to the script are inserted at key points in the ConTeXt code using f-strings. First, the file is created. Then, a blank layout is defined and set up.
f = open(args.outfile[0], "w")
f.write("""\\definelayout[blank][
topspace=0mm,
backspace=0mm,
bottomspace=0mm,
width=fit,
height=fit,
header=0mm,
footer=0mm,
leftmargin=0mm,
rightmargin=0mm,
leftmargindistance=0mm,
rightmargindistance=0mm]
\\setuplayout[blank]""")
Then, an f-string passes the values of scaled_paper_width and scaled_paper_height to \definepapersize.
f.write(f"""\\definepapersize[scaled][width={scaled_paper_width}mm, height={scaled_paper_height}mm]
\\setuppapersize[scaled]""")
The code takes landscape mode into account using an if statement.
if "portrait" in args.orientation:
    f.write(f"[{args.papersize[0]}]")
else:
    f.write(f"[{args.papersize[0]}, landscape]")
Finally, the layout is set up, the frame is switched on and the text environment is invoked. Inside the text environment, a frame which fills the typesetting area is included to ensure there is content in the document.
f.write("""\\setuplayout[location=""" "{middle,middle}" """,marking=empty]
\\showframe
\\starttext
\\startframedtext[width=\\textwidth,height=\\textheight]
\\stopframedtext
\\stoptext
""")
f.close()
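The open and close calls above can also be expressed with a with statement, which closes the file even if a write fails. The following sketch covers only the paper size step; write_papersize is a hypothetical helper, and the values 400 and 240 are taken from the interpolation example rather than from a run of the script.

```python
import os
import tempfile

def write_papersize(path, width_mm, height_mm, papersize, orientation):
    """Write the paper size commands to `path`, closing the file automatically."""
    sheet = papersize if orientation == "portrait" else f"{papersize}, landscape"
    with open(path, "w") as f:
        f.write(f"\\definepapersize[scaled][width={width_mm}mm, height={height_mm}mm]\n")
        f.write(f"\\setuppapersize[scaled][{sheet}]\n")

# Write into a temporary directory so nothing in the working directory changes.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "main.tex")
    write_papersize(target, 400, 240, "A3", "landscape")
    with open(target) as f:
        written = f.read()

print(written)
```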
PDF Output
ConTeXt can be run on the output file, in this case main.tex, to produce a pdf.
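This last step could be triggered from Python as well. The sketch below assumes that a ConTeXt installation provides a context command on the PATH; run_context is a hypothetical helper and is not part of the script.

```python
import shutil
import subprocess

def run_context(tex_file):
    """Run ConTeXt on tex_file; return False when the context command is missing."""
    if shutil.which("context") is None:
        return False
    # check=True raises CalledProcessError if ConTeXt reports a failure.
    subprocess.run(["context", tex_file], check=True)
    return True
```

Calling run_context("main.tex") after the file has been written would then produce main.pdf alongside it, provided ConTeXt is installed.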