User:Riviera/Emacs layout

From XPUB & Lens-Based wiki

Part One

An aim of the reading, writing and research methods seminars this trimester is to create a personal reader. In a previous wiki post, I discussed my engagement with free-software-based solutions to the typographical challenges posed by this task. This wiki post picks up on the same theme from a perspective closer to these wiki posts about Emacs. The outcome is the result of minimally typesetting and then printing Emacs buffers. The font is DejaVu Sans Mono, the text has double-height line spacing and it’s paginated more or less by hand.

To help with this endeavour, I wrote a handful of functions in Emacs Lisp.

Right Margin

Below is an example:

(defun configure-right-margin (right-margin-width file
                               &optional column justify)
  "Configure RIGHT-MARGIN-WIDTH in FILE, optionally set fill column to COLUMN or JUSTIFY."
  (find-file-other-window file)
  (set-right-margin (point-min) (point-max) right-margin-width)
  ;; The argument is named COLUMN so that it does not shadow the
  ;; variable `fill-column'.
  (when column
    (set-fill-column column))
  (when justify
    (fill-region (point-min) (point-max) justify)))

The function weaves together various actions which I would otherwise need to perform by hand in Emacs for each text. The idea is that one could call this function on a file whilst running Emacs in batch mode. I hope this could be achieved by running a command such as:

$ emacs -Q --batch -l paginate.el --eval '(configure-right-margin 8 "foo.txt")'

This command runs Emacs non-interactively. It loads the file containing the function definitions and then evaluates a call to one of those functions. Presumably there is a way in which this could be hooked into a Python or Bash script.
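As a small aside on what the right margin does to filling: as far as I understand it, set-right-margin stores a right-margin text property, and the built-in current-fill-column subtracts that property from fill-column. The sketch below checks this in a temporary buffer. The name effective-fill-width is hypothetical and not part of paginate.el.

```elisp
;; Hypothetical helper for illustration; not part of the original script.
(defun effective-fill-width (column right-margin-width)
  "Return the column at which filling wraps, given COLUMN and RIGHT-MARGIN-WIDTH."
  (with-temp-buffer
    (insert "Lorem ipsum dolor sit amet.\n")
    (setq-local fill-column column)
    ;; Same call as in `configure-right-margin'.
    (set-right-margin (point-min) (point-max) right-margin-width)
    (goto-char (point-min))
    ;; `current-fill-column' takes the right margin into account.
    (current-fill-column)))

(effective-fill-width 70 8)
```

If that assumption holds, it is also why the later paginate clauses pass (- fill-column right-margin-width) as the line width.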


Paginate

(defun paginate (page-length initial-page-number offset)
  "Insert numbered page headers at PAGE-LENGTH intervals; OFFSET shifts the number left."
  (insert-char 12) ; form feed
  (insert-char 13 2) ; carriage return twice
  (newline)
  ;; format page number
  (insert-char 32 (- fill-column offset)) ; space
  (insert "[" (int-to-string initial-page-number) "]") ; pg number in header
  (setq initial-page-number (1+ initial-page-number))
  (newline)
  (insert-char 13)
  (open-line 1)
  (forward-line (- page-length 4))
  (forward-line)
  (if (= (point) (point-max))
      (goto-char (point-min))
    (forward-line -1)
    (paginate page-length initial-page-number offset)))

The script above inserts particular characters at point, at intervals which are determined by variables such as page-length. Lines 12-14 (the calls to open-line and forward-line) illustrate the spatial aspects of Emacs. Furthermore, it is a recursive function, which means it calls itself until completion. The script paginated documents in a rudimentary way. The paginate function therefore became the basis for the new-page function.
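One caveat with the recursive approach: Emacs Lisp does not optimise tail calls, so every page adds to the call depth, and a long buffer may run into the max-lisp-eval-depth limit. A loop-based variant avoids this. The sketch below is a simplified stand-in (it only inserts form feeds, no page numbers), and insert-page-breaks is a hypothetical name.

```elisp
;; Hypothetical loop-based sketch; simplified to form feeds only.
(defun insert-page-breaks (page-length)
  "Insert a form feed after every PAGE-LENGTH lines, starting at point-min.
Return the number of form feeds inserted."
  (let ((count 0))
    (goto-char (point-min))
    ;; `forward-line' returns 0 only when it moved the full distance.
    (while (and (zerop (forward-line page-length))
                (not (eobp)))
      (insert-char 12)                  ; form feed, as in `paginate'
      (setq count (1+ count)))
    count))
```

The loop stops as soon as forward-line cannot move a full page, which plays the role of the (= (point) (point-max)) check in the recursive version.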

(defun new-page (page-number offset)
  "Insert a form feed, two carriage returns, the page number and another carriage return.
The form feed is omitted when point is at the beginning of the buffer."
  (unless (= (point) (point-min))
    (insert-char 12)) ; form feed
  (insert-char 13 2) ; two carriage returns
  (newline)
  (insert-char 32 (- fill-column offset)) ; spaces to position the page number
  (insert "[" (int-to-string page-number) "]")
  (newline)
  (insert-char 13) ; carriage return
  (open-line 1))

Paginate Revisited

Accordingly, I wrote an improved version of the paginate script which utilises cond to insert page breaks. cond allows for the execution of code under particular conditions. The condition of each clause in cond is evaluated in order until one returns non-nil, at which point the body of that clause is run and the remaining clauses are skipped. The discussion proceeds by way of each of these clauses. For the predicates, the functions with -p suffixes, please see the appendix.
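To make the clause ordering concrete, here is a minimal sketch of cond on its own, outside any buffer. The function classify-line and its categories are made up for illustration and only loosely mirror the clauses discussed below.

```elisp
;; Hypothetical example; the categories loosely mirror the clauses below.
(defun classify-line (line width)
  "Return a symbol describing the string LINE relative to WIDTH."
  (cond
   ;; Clauses are tried top to bottom; the first true condition wins.
   ((string= line "") 'empty)
   ((string-prefix-p "#" line) 'headline)
   ((= (length line) width) 'full-width)
   ((< (length line) width) 'short)
   ;; A final `t' clause catches everything else.
   (t 'overlong)))
```

Because the headline test comes before the width tests, a headline that happens to be exactly WIDTH characters long is still classified as a headline.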

(defun paginate (page-length initial-page-number offset fill-column
                 right-margin-width)
    "Insert Form Feeds, Carriage Returns and page numbers at conditional
intervals throughout the document. Remove orphans also."
    (cond

(= (point) (point-min))

The first condition checks if point and point-min are equal. If so, the function new-page is called. It’s important to check this condition first, as the behaviour of new-page changes if point and point-min are equal.[1] Then the value of initial-page-number is incremented and point moves forward several lines. Lastly, paginate calls itself. As point has moved forwards several lines, this condition will never be true in subsequent phases of function execution.

      ((= (point) (point-min))
       (new-page initial-page-number offset)
       (setq initial-page-number (1+ initial-page-number))
       (forward-line (- page-length 4))
       (paginate page-length initial-page-number offset fill-column
             right-margin-width))

(current-line-is-headline-p)

The second condition checks if the current line is a headline. I’ve taken a leaf out of the Markdown book and used ‘#’ delimiters to designate headlines in the body of the text. I don’t want these to be formatted in the final version, so I’ll have to write another function called cleanup-headlines.

      ((current-line-is-headline-p)
       (new-page initial-page-number offset)
       (setq initial-page-number (1+ initial-page-number))
       (forward-line (- page-length 4))
       (paginate page-length initial-page-number offset fill-column
             right-margin-width))

This clause executes in exactly the same way as the clause above.
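Note that the headline pattern used by current-line-is-headline-p (listed in the appendix) expects ‘#’ delimiters on both sides of the title, unlike plain Markdown where the closing ones are optional. The sketch below applies the same regular expression to strings rather than buffer lines; headline-string-p is a hypothetical name.

```elisp
;; Hypothetical string-based check using the regexp from the appendix.
(defun headline-string-p (line)
  "Return non-nil if the string LINE matches the headline regexp."
  (string-match-p "^#+[^#]+#+[[:space:]]*$" line))

(headline-string-p "# Part One #")   ; matches: hashes on both sides
(headline-string-p "# Part One")     ; no match: closing hash missing
```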

(beginning-of-section-p)

Each time point moves forward a certain number of lines in the script, it arrives at a possible location for a page break. This condition checks if the current line is the opening line of a section. In other words, the current line is preceded by an empty line which is in turn preceded by a headline. Suppose the current line comes at the opening of a section. I want the headline two lines beforehand to appear on the same page, rather than on the previous page. Therefore, the script:

  • sets mark
  • moves point two lines backwards
  • kills the region between point and mark
  • inserts a new page
  • increments the page number
  • moves point forward one line
  • stores point
  • yanks the region back into the buffer
  • restores point
  • moves forward several lines
  • calls paginate

      ((beginning-of-section-p)
       (set-mark (point))
       (forward-line (- 2))
       (kill-region (mark) (point))
       (new-page initial-page-number offset)
       (setq initial-page-number (1+ initial-page-number))
       (forward-line)
       (save-excursion (yank))
       ;; (format-headline)
       (forward-line (- page-length 4))
       (paginate page-length initial-page-number offset fill-column
             right-margin-width))
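The kill-and-yank steps above go through the mark and the kill ring. A sketch of the same move that leaves both untouched, using delete-and-extract-region, might look as follows. The function name and the BREAK argument (a plain string standing in for what new-page inserts) are assumptions made for this example.

```elisp
;; Hypothetical sketch; BREAK stands in for the output of `new-page'.
(defun move-lines-below-break (n break)
  "Cut the N lines before point and reinsert them after the string BREAK.
Point is left at the start of the reinserted lines."
  (let* ((end (point))
         (start (progn (forward-line (- n)) (point)))
         ;; Remove the lines and keep them as a string (no kill ring).
         (text (delete-and-extract-region start end)))
    (insert break)
    ;; Insert after point but stay in front of the text.
    (save-excursion (insert text))))
```

Called with point on the first line of a section and N set to 2, this moves the headline and the empty line after it below the page break.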

(current-line-is-orphan-p width)

This condition checks, on the basis of line width, whether the current line is an orphan. The implicit assumption of current-line-is-orphan-p is that the final line of a paragraph is never full-width. This is clarified in more detail in the Appendix.

      ((current-line-is-orphan-p (- fill-column right-margin-width))
       (forward-line)
       (new-page initial-page-number offset)
       (setq initial-page-number (1+ initial-page-number))
       (forward-line (- page-length 4))
       (paginate page-length initial-page-number offset fill-column
             right-margin-width))

(current-line-is-full-width-p width)

This predicate was the basis for the preceding current-line-is-orphan-p. It checks if the line is full width. A full-width line should be a “normal” place for a page break, as there is no white space on either side of the line. That said, I think it’s important to write a current-line-is-widow-p clause and place it before this clause.

      ((current-line-is-full-width-p (- fill-column right-margin-width))
       (new-page initial-page-number offset)
       (setq initial-page-number (1+ initial-page-number))
       (forward-line (- page-length 4))
       (paginate page-length initial-page-number offset fill-column
             right-margin-width))

(= (point) (point-max))

This condition checks if the end of the buffer has been reached; if so, the function returns nil. Note: it’s important for this clause to precede the following clause, otherwise the script may enter an infinite loop. Eventually, this condition will evaluate to non-nil and the script will exit.

      ((= (point) (point-max))
       nil)

(current-line-empty-p)

The last clause checks if the current line is empty; if so, it inserts a new page and removes that empty line from the top of it. This ensures all pages start at the same height from the top of the page. Then it calls paginate.

      ((current-line-empty-p)
       (new-page initial-page-number offset)
       (setq initial-page-number (1+ initial-page-number))
       (kill-line)
       (forward-line (- page-length 4))
       (paginate page-length initial-page-number offset fill-column
             right-margin-width))
      ))
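The predicate current-line-empty-p is not listed in the appendix. The following is only a guess at what such a definition could look like, assuming that a line containing nothing but spaces or tabs should also count as empty.

```elisp
;; Assumed definition; the original is not shown on this page.
(defun current-line-empty-p ()
  "Return non-nil if the current line is empty or only whitespace."
  (save-excursion
    (beginning-of-line)
    (looking-at-p "[ \t]*$")))
```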

Part Two

Runts

In typography, a final line of a paragraph that consists of a single word is known as a runt (according to Wikipedia). The presence of runts in typeset text is considered poor typography. As part of my attempts to use Emacs as a layout engine, I wrote a script which removes runts from buffers. Here’s how I did so.

Firstly, a function needs to be defined which checks if the current line consists of a single word followed by ‘.’. That is handled by the following function.

(defun current-line-is-runt-p ()
  "Return non-nil if the current line is a runt."
  (save-excursion
    (beginning-of-line)
    (looking-at-p "^[[:word:]]+\\.$")))
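The pattern in current-line-is-runt-p only recognises runts that end in a full stop. A possible extension, sketched here on strings rather than buffer lines, also accepts ‘?’ and ‘!’. The name line-is-runt-p and the wider character set are my own assumptions, not part of the script above.

```elisp
;; Hypothetical variant: also treats ? and ! as sentence endings.
(defun line-is-runt-p (line)
  "Return non-nil if the string LINE is a single word ending in . ? or !."
  (string-match-p "\\`[[:word:]]+[.?!]\\'" line))

(line-is-runt-p "end.")      ; a single word plus full stop
(line-is-runt-p "the end.")  ; two words, so not a runt
```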

The next function, correct-runts, is based on a while loop. Initially, on line 7, the call to end-of-paragraph-text moves point to what Emacs regards as the end of the paragraph. The code then checks, via current-line-is-runt-p, whether the current line is a runt. If so, it moves point back to the beginning of the buffer, increments the column width by one character and finally refills the buffer, applying justification if requested. The while loop then continues from the top of the buffer. The condition for the while loop can be interpreted as “exit on success or failure”. Success is defined in terms of point reaching the end of the buffer. Failure, on the other hand, occurs when the values of fill-column-min and fill-column-max are equal. By implication, success amounts to removing all runts from the document within the specified range of column widths.

(defun correct-runts (fill-column-min fill-column-max right-margin-width
                      justify)
  "Check every paragraph for runts. Attempt to remove runts by adjusting
the value of fill-column within the range of FILL-COLUMN-MIN and
FILL-COLUMN-MAX."
  (while (not (or (= (point) (point-max)) (= fill-column-min fill-column-max)))
    (end-of-paragraph-text)
    (when (current-line-is-runt-p)
      ;; Runt found: start again from the top with a wider fill column.
      (goto-char (point-min))
      (setq fill-column-min (1+ fill-column-min))
      (set-fill-column fill-column-min)
      (set-right-margin (point-min) (point-max) right-margin-width)
      (fill-region (point-min) (point-max) justify))))
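The search over column widths can also be sketched as a function that works on a string and leaves the current buffer alone. This is a simplified illustration under my own assumptions: it refills a temporary copy of the text at each width, ignores the right margin and justification, and treats any line that is a single word plus full stop as a runt. The name first-runt-free-width is hypothetical.

```elisp
;; Hypothetical, simplified search; not the approach used above.
(defun first-runt-free-width (text min-width max-width)
  "Return the first fill column in MIN-WIDTH..MAX-WIDTH that leaves no runt.
TEXT is filled in a temporary buffer.  Return nil if every width fails."
  (let ((width min-width)
        found)
    (while (and (not found) (<= width max-width))
      (with-temp-buffer
        (insert text)
        (setq-local fill-column width)
        (fill-region (point-min) (point-max))
        (goto-char (point-min))
        ;; Success at this width: no line is a lone word plus full stop.
        (unless (re-search-forward "^[[:word:]]+\\.$" nil t)
          (setq found width)))
      (setq width (1+ width)))
    found))
```

Returning nil corresponds to the “failure” exit described above, and returning a width corresponds to “success”.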

Finally, I wrote an additional function which allows me to execute the programme on text in a particular file.

(defun correct-runts-in-file (file fill-column-min fill-column-max
                   right-margin-width justify)
  "Execute `correct-runts' on FILE"
  (find-file-other-window file)
  (correct-runts fill-column-min fill-column-max right-margin-width
         justify))

Footnotes

  1. Specifically, a Form Feed is not inserted if point is at the beginning of the buffer. This is based on my decision to create separate PDFs which will eventually be concatenated.

Appendix

(defun beginning-of-section-p ()
  "Check if the current line is the first paragraph of a new section."
  (save-excursion
    (forward-line (- 2))
    (looking-at-p "^#+[^#]+#+[[:space:]]*$")))

(defun current-line-is-headline-p ()
  "Check if the current line is a headline."
  (save-excursion
    (looking-at-p "^#+[^#]+#+[[:space:]]*$")))

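(defun current-line-is-full-width-p (width)
  "Check if the current line is exactly WIDTH characters wide."
  (save-excursion
    (looking-at-p (concat  "^.\\{" (int-to-string width) "\\}$"))))

(defun current-line-is-orphan-p (width)
  "Check if the current line is an orphan, i.e. shorter than WIDTH."
  (save-excursion
    ;; NOTE: in the second alternative only the first ^ is an anchor;
    ;; the second one appears to match a literal caret character.
    (looking-at-p (concat  "^.\\{1,"
               (int-to-string (- width 1))
               "\\}$\\|^^[[:space:]]*.\\{1," ; blockquote orphans
               (int-to-string (- width 5))
               "\\}$"))))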