User:Riviera/Crop continued: Difference between revisions

From XPUB & Lens-Based wiki
Latest revision as of 20:33, 24 February 2024


More cropping

I recently sketched a command line application in Python which produces PDF output. I have since enhanced the code, integrating it with ConTeXt output. At the end of this post, I illustrate ways in which the software can be applied to visual ends.

import argparse
import math
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

To set up a command line application the argparse module is used. Argparse allows for flags and positional arguments to be given to the script when executed at the command line. The values passed in by the user are stored in variables. Argparse also implements a help flag which offers information about available flags.

parser = argparse.ArgumentParser(description='Crop typesetting areas.')

Arguments are added. outfile is a positional argument whereas the remaining arguments are flags. The flags will be looked at in more detail later on.

parser.add_argument('outfile',
                    metavar='OUTFILE',
                    nargs=1,
                    help="Write to a file")
parser.add_argument('--papersize',
                    metavar='PAPERSIZE',
                    nargs=1,
                    default=['A4'],  # a list, matching what nargs=1 produces
                    help="Provide a standard papersize")
parser.add_argument('--ratio',
                    metavar='RATIO',
                    nargs=1,
                    default=['2:3'],  # a list, so that args.ratio[0] is '2:3'
                    help="Crop the paper to this proportion")
parser.add_argument('--orientation',
                    metavar='ORIENTATION',
                    nargs=1,
                    default=['portrait'],
                    help="Switch between portrait and landscape.")
parser.add_argument('--scale',
                    metavar='SCALE',
                    nargs=1,
                    type=float,  # np.interp needs a number, not a string
                    default=[90.0],
                    help="Scale the size of the cropped page.")
_StoreAction(option_strings=['--scale'], dest='scale', nargs=1, const=None, default=[90.0], type=<class 'float'>, choices=None, required=False, help='Scale the size of the cropped page.', metavar='SCALE')

For the sake of example, let’s pass the following arguments to the script.

args = parser.parse_args(args=['--scale', '90',
                               '--ratio', '5:3',
                               '--papersize', 'A3',
                               '--orientation', 'landscape',
                               'main.tex'])

Wishlist

It would be interesting to add a --page-on-page flag which introduces variation in the output. When active, this flag would print the cropped page on the given page size at the given scale and ratio. This is what the script always does at the moment. With the flag implemented, the default behaviour would change: without the flag, the output would be a page already cropped to size.
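A minimal sketch of how such a flag might look, assuming it is a simple on/off switch. The flag does not exist in the script yet, and the helper output_paper_size is a hypothetical name introduced only for this illustration.

```python
import argparse

# Hypothetical sketch: --page-on-page is a wishlist item, not an existing flag.
parser = argparse.ArgumentParser(description='Crop typesetting areas.')
parser.add_argument('--page-on-page',
                    action='store_true',
                    help="Print the cropped page on the full paper size.")


def output_paper_size(args, paper_size, cropped_size):
    """Return the (width, height) in mm of the sheet the PDF should use."""
    if args.page_on_page:
        # Current behaviour: the cropped page sits on the full sheet.
        return paper_size
    # Proposed default: the sheet itself is already cropped to size.
    return cropped_size


with_flag = parser.parse_args(['--page-on-page'])
without_flag = parser.parse_args([])
print(output_paper_size(with_flag, (420, 297), (378, 227)))
print(output_paper_size(without_flag, (420, 297), (378, 227)))
```

With action='store_true' the flag takes no value, so argparse stores False unless the flag is given.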

Papersize Dictionary

I drew up a dictionary of A-series papersizes based on information at papersizes.io. This way paper dimensions can be referenced by name.

portrait_paper_sizes = {
    # size width height (mm)
    "A0" : [841, 1189],
    "A1" : [594, 841],
    "A2" : [420, 594],
    "A3" : [297, 420],
    "A4" : [210, 297],
    "A5" : [148, 210],
    "A6" : [105, 148],
    "A7" : [74, 105],
    "A8" : [52, 74],
    "A9" : [37, 52],
    "A10": [26, 37],
    "A11": [18, 26],
    "A12": [13, 18],
    "A13": [9, 13],
    "2A0": [1189, 1682],
    "4A0": [1682, 2378],
    "A0+": [914, 1292],
    "A1+": [609, 914],
    "A3+": [329, 483]
}

Portrait and Landscape

I figured I would implement portrait and landscape orientations in the script. Portrait mode is enabled by default. Passing --orientation landscape to the command switches to landscape output. It might be more concise to have a --landscape flag.

if "portrait" in args.orientation:
    paper_width = portrait_paper_sizes[args.papersize[0]][0]
    paper_height = portrait_paper_sizes[args.papersize[0]][1]
    print(args.papersize[0], "portrait", paper_width, "mm x", paper_height, "mm")

I have not accounted for a situation in which someone provides a papersize which is not listed in the dictionary. I expect that at the moment the script will raise a KeyError if this happens. In any case, it’s necessary to exchange the values of the width and height in landscape mode. This can be done in at least two ways. I decided to change the indexes like so.

if "landscape" in args.orientation:
    paper_width = portrait_paper_sizes[args.papersize[0]][1]
    paper_height = portrait_paper_sizes[args.papersize[0]][0]
    print(args.papersize[0], "landscape", paper_width, "mm x", paper_height, "mm")
A3 landscape 420 mm x 297 mm

But it is also possible to switch the values of paper_width and paper_height by creating a new dictionary of landscape paper sizes. The code commented out below does that.

# if "landscape" in args.orientation:
#     landscape_paper_sizes = {}
#     for size in portrait_paper_sizes:
#         landscape_paper_sizes[size] = portrait_paper_sizes[size][::-1]
#     paper_width = landscape_paper_sizes[args.papersize[0]][0]
#     paper_height = landscape_paper_sizes[args.papersize[0]][1]        
#     print(args.papersize[0], "landscape", paper_width, "mm x", paper_height, "mm")
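The unknown-papersize case mentioned in this section could be handled along the following lines. This is only a sketch under stated assumptions: resolve_paper_size is a hypothetical helper, not part of the script, and the dictionary is shortened to three entries for the example.

```python
# A shortened copy of the papersize dictionary, for illustration only.
portrait_paper_sizes = {
    "A3": [297, 420],
    "A4": [210, 297],
    "A5": [148, 210],
}


def resolve_paper_size(name, orientation="portrait"):
    """Return (width, height) in mm, swapping the values for landscape."""
    try:
        width, height = portrait_paper_sizes[name]
    except KeyError:
        known = ", ".join(sorted(portrait_paper_sizes))
        raise ValueError(
            f"Unknown papersize {name!r}; choose one of: {known}") from None
    if orientation == "landscape":
        # Landscape is the portrait size with width and height exchanged.
        width, height = height, width
    return width, height


print(resolve_paper_size("A3", "landscape"))  # (420, 297)
```

Another option would be to pass choices=list(portrait_paper_sizes) to add_argument, so that argparse itself rejects unknown names with a usage message.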

Ratio

ratio = args.ratio[0].split(":")
ratio_x = int(ratio[0])
ratio_y = int(ratio[1])
print(f"Crop ratio: {ratio_x}:{ratio_y}")
Crop ratio: 5:3

The ratio is provided to the script with the --ratio flag. By default the ratio is 2:3. Some calculations need to be done so let’s initialise some variables.

possible_widths_list = []
possible_heights_list = []
w = ratio_x
h = ratio_y
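The ratio parsing in this section assumes well-formed input such as 5:3. A hedged sketch of a stricter parser follows. parse_ratio is a hypothetical helper, not part of the script; it is written so that it could be passed to add_argument as type=parse_ratio.

```python
import argparse


def parse_ratio(text):
    """Parse a string of the form 'X:Y' into a pair of positive integers."""
    parts = text.split(":")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError(f"expected X:Y, got {text!r}")
    try:
        x, y = int(parts[0]), int(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"ratio parts must be integers: {text!r}") from None
    if x <= 0 or y <= 0:
        # A zero would later cause a division by zero.
        raise argparse.ArgumentTypeError(
            f"ratio parts must be positive: {text!r}")
    return x, y


print(parse_ratio("5:3"))  # (5, 3)
```

Raising argparse.ArgumentTypeError lets argparse report the problem as a normal usage error instead of a traceback.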

In order to ascertain the size of the cropped page, I’m calculating a list of measurements. These measurements are the candidate widths and heights of the cropped page: every whole multiple of the ratio that fits on the paper. The values are later used in the context of the scale feature. The following calculation checks the ratio against the dimensions of the page. A for loop is used to limit the length of the lists which contain the measurements described above.

# Each step adds one multiple of the ratio. The number of steps is limited
# by whichever side of the paper runs out first, so neither dimension can
# exceed the page (for example 4:3 on A3 landscape is limited by the height).
steps = min(math.floor(paper_width / ratio_x),
            math.floor(paper_height / ratio_y))
for dimension in range(steps):
    possible_widths_list += [w]
    possible_heights_list += [h]
    w += ratio_x
    h += ratio_y

Pandas, Numpy and Sklearn

At the beginning of the script, I imported (parts of) these modules. This was to enable Python to make use of different mathematical functions. In particular, I’m going to use a pandas DataFrame, scikit-learn’s MinMaxScaler and NumPy’s interp function. The purpose is to provide the user with the ability to scale the size of the cropped page in the output. In short, the values in possible_widths_list and possible_heights_list are adjusted to a percentage scale. Because there can be more or fewer than 100 values in possible_widths_list and possible_heights_list, the last value in each list needs to represent 100%. To begin with, let’s create a DataFrame and a scaler. The code which appears below was adapted from the Code Fellows notes on rescaling data (https://codefellows.github.io/sea-python-401d5/lectures/rescaling_data.html).

df = pd.DataFrame({"widths": possible_widths_list, "heights": possible_heights_list})
scaler = MinMaxScaler()

Visualising the dataframe

The dataframe resembles a table of widths and heights spanning a range of values.

print(df)
    widths  heights
0        5        3
1       10        6
2       15        9
3       20       12
4       25       15
..     ...      ...
79     400      240
80     405      243
81     410      246
82     415      249
83     420      252

[84 rows x 2 columns]

Adding scaled values to the dataframe

This code assigns a percentage-based value to each possible width and height. The smallest value in each column becomes 0 and the largest becomes 100.

tmp_widths = df.widths - df.widths.min()
tmp_heights = df.heights - df.heights.min()
scaled_widths = tmp_widths / tmp_widths.max() * 100
scaled_heights = tmp_heights / tmp_heights.max() * 100

df["scaled_widths"] = scaled_widths
df["scaled_heights"] = scaled_heights

print(df)
    widths  heights  scaled_widths  scaled_heights
0        5        3       0.000000        0.000000
1       10        6       1.204819        1.204819
2       15        9       2.409639        2.409639
3       20       12       3.614458        3.614458
4       25       15       4.819277        4.819277
..     ...      ...            ...             ...
79     400      240      95.180723       95.180723
80     405      243      96.385542       96.385542
81     410      246      97.590361       97.590361
82     415      249      98.795181       98.795181
83     420      252     100.000000      100.000000

[84 rows x 4 columns]
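The scaling above is min-max scaling: scaled = (value - min) / (max - min) * 100. As a small sketch of the arithmetic, here is the same calculation in plain Python, without pandas. The function name to_percentages is an assumption made for this example.

```python
def to_percentages(values):
    """Min-max scale a list of numbers onto the range 0 to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values to scale")
    return [(v - lo) / (hi - lo) * 100 for v in values]


# The 84 possible widths for a 5:3 crop on A3 landscape: 5, 10, ..., 420.
widths = list(range(5, 425, 5))
scaled = to_percentages(widths)

print(len(widths))            # 84
print(round(scaled[1], 6))    # 1.204819
print(scaled[-1])             # 100.0
```

The second value, 1.204819, matches row 1 of the DataFrame: (10 - 5) / (420 - 5) * 100.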

Interpolating the values

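Next, the values are interpolated. To my understanding, this is like cross-referencing the values in one list against the values in another: np.interp takes a position on the percentage scale and returns the matching position on the millimetre scale. It’s like creating an array with floating-point indexes. The values in between are interpolated and then rounded down to a whole number of millimetres with math.floor, so the result is approximate but never larger than the interpolated size.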

scaled_paper_height = math.floor(np.interp(95.2, scaled_heights, possible_heights_list))
scaled_paper_width = math.floor(np.interp(95.2, scaled_widths, possible_widths_list))

print(scaled_paper_width)
print(scaled_paper_height)
400
240

Notice that the printed values correspond to row 79 of the DataFrame, whose scaled value (95.18) lies just below 95.2. It’s best if the user can determine the scale to crop the paper to. So, the first argument to np.interp is replaced with args.scale[0].

scaled_paper_height = math.floor(np.interp(args.scale[0], scaled_heights, possible_heights_list))
scaled_paper_width = math.floor(np.interp(args.scale[0], scaled_widths, possible_widths_list))
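To make the interpolation step more concrete, here is a hedged sketch of piecewise-linear interpolation in plain Python. It is not NumPy's implementation; interpolate is a hypothetical stand-in that follows the same idea, including returning the first or last value when the input falls outside the range.

```python
import math
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of ys over increasing xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


# Possible sizes for a 5:3 crop on A3 landscape, as in the DataFrame.
widths = list(range(5, 425, 5))    # 5 ... 420
heights = list(range(3, 255, 3))   # 3 ... 252
scaled_widths = [(w - 5) / (420 - 5) * 100 for w in widths]
scaled_heights = [(h - 3) / (252 - 3) * 100 for h in heights]

print(math.floor(interpolate(95.2, scaled_widths, widths)))    # 400
print(math.floor(interpolate(95.2, scaled_heights, heights)))  # 240
```

A scale of 95.2 falls between rows 79 and 80, giving roughly 400.08 mm and 240.05 mm before rounding down.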

Writing to a file

The output of the script is code which can be understood by the ConTeXt typesetting software. F-strings containing the values calculated by or provided to the script are used. The variables feature at key points in the ConTeXt code. First, the file is created. Then, a blank layout is defined and set up.

f = open(args.outfile[0], "w")
f.write("""\\definelayout[blank][
topspace=0mm,
backspace=0mm,
bottomspace=0mm,
width=fit,
height=fit,
header=0mm,
footer=0mm,
leftmargin=0mm,
rightmargin=0mm,
leftmargindistance=0mm,
rightmargindistance=0mm]
\\setuplayout[blank]""")

Then, having turned off page numbering, an f-string passes the values of scaled_paper_width and scaled_paper_height to \definepapersize.

f.write(f"""\\definepapersize[scaled][width={scaled_paper_width}mm, height={scaled_paper_height}mm]
    \\setuppapersize[scaled]""")

The code takes landscape mode into account using an if statement.

if "portrait" in args.orientation:
    f.write(f"[{args.papersize[0]}]")
else:
    f.write(f"[{args.papersize[0]}, landscape]")

Finally, the layout is set up, the frame is switched on and the text environment is invoked. Inside the text environment, a frame which fills the typesetting area is included to ensure there is content in the document.

f.write("""\\setuplayout[location=""" "{middle,middle}" """,marking=empty]
    \\showframe
    \\starttext
    \\startframedtext[width=\\textwidth,height=\\textheight]
    
    
    
    \\stopframedtext
    \\stoptext
    """)
f.close()
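As an alternative arrangement of the same steps, the ConTeXt source could be assembled as a string first and written in one go. This is a sketch, not the script's actual structure: build_context_source and write_context_file are hypothetical names, and the ConTeXt options simply mirror those used above. Raw strings avoid the doubled backslashes.

```python
def build_context_source(width_mm, height_mm, papersize, orientation):
    """Return ConTeXt source for a framed page of the given cropped size."""
    if orientation == "portrait":
        target = papersize
    else:
        target = f"{papersize}, landscape"
    lines = [
        r"\definelayout[blank][topspace=0mm, backspace=0mm, bottomspace=0mm,",
        r"  width=fit, height=fit, header=0mm, footer=0mm,",
        r"  leftmargin=0mm, rightmargin=0mm,",
        r"  leftmargindistance=0mm, rightmargindistance=0mm]",
        r"\setuplayout[blank]",
        rf"\definepapersize[scaled][width={width_mm}mm, height={height_mm}mm]",
        rf"\setuppapersize[scaled][{target}]",
        r"\setuplayout[location={middle,middle}, marking=empty]",
        r"\showframe",
        r"\starttext",
        r"\startframedtext[width=\textwidth, height=\textheight]",
        r"\stopframedtext",
        r"\stoptext",
    ]
    return "\n".join(lines) + "\n"


def write_context_file(path, source):
    """Write the source to path; the with block closes the file automatically."""
    with open(path, "w") as f:
        f.write(source)


source = build_context_source(378, 227, "A3", "landscape")
print(source)
```

Calling write_context_file("main.tex", source) would then take the place of the separate open, write and close calls.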

PDF Output

ConTeXt can be run on the output file, in this case main.tex, to produce a pdf.
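For completeness, ConTeXt could also be started from Python once the file is written. The sketch below assumes the context command is installed and on the PATH, and it skips the run if the command is not found. The function names are hypothetical.

```python
import shutil
import subprocess


def context_command(tex_file):
    """Return the command line used to typeset tex_file with ConTeXt."""
    return ["context", tex_file]


def run_context(tex_file):
    """Run ConTeXt if it is installed; return True when the run succeeded."""
    if shutil.which("context") is None:
        print("ConTeXt not found on PATH; skipping.")
        return False
    result = subprocess.run(context_command(tex_file), check=False)
    return result.returncode == 0


print(" ".join(context_command("main.tex")))
```

On success, ConTeXt writes main.pdf next to main.tex.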

[Figure: The consequence of running ConTeXt on the output of the script.]