Pandoc: Difference between revisions

From XPUB & Lens-Based wiki
[[Category:Cookbook]]
[[Category:PagedMedia]]

Latest revision as of 13:00, 31 October 2023


Pandoc

https://pandoc.org/

You can use Pandoc to generate PDFs directly from other document formats, like Markdown, wikitext or LibreOffice (ODT) documents. It can also write formats such as slides and InDesign ICML files.

Pandoc is described as a "universal document converter": it converts documents from one markup language into another.

Extensive documentation: Pandoc’s Manual or man pandoc

Pandoc common arguments

-f or --from - option standing for “from”, is followed by the input format

-t or --to - option standing for “to”, is followed by the output format

-s or --standalone - option standing for “standalone”, produces output with an appropriate header and footer

-c or --css - option to link a CSS stylesheet in the output (Weasyprint's own stylesheet option, used in the examples below, is -s or --stylesheet)

-o or --output - option for file output
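
As a minimal sketch of how these options combine, the following converts a small Markdown file into a complete HTML page. The file names page.md and page.html and the title are made up for illustration, and the check for an installed pandoc is only there so the snippet fails gracefully.

```shell
# Create a small Markdown file to convert (hypothetical example content).
printf 'Some *Markdown* text.\n' > page.md

if command -v pandoc >/dev/null 2>&1; then
  # --from/--to select the formats, --standalone adds the HTML header and footer,
  # --metadata sets the page title, --output writes to a file instead of stdout.
  pandoc --from markdown --to html --standalone \
    --metadata title="Hello" --output page.html page.md
else
  echo "pandoc is not installed" >&2
fi
```

Without --standalone, page.html would contain only the converted paragraph, with no surrounding html, head and body elements.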

Pandoc stdin and stdout

By default, Pandoc reads its input from stdin when no input file is given.

Likewise, Pandoc also writes its output to stdout by default.

An example of using stdin: to get your text from a pad, you can use curl to download it and pipe it into Pandoc:

$ curl https://pad.xpub.nl/p/collaborations/export/txt | pandoc --from markdown --to html

This will output HTML in your terminal.

An example of using stdout: to turn this HTML into a PDF, for example using Weasyprint, you can pipe the output of Pandoc into Weasyprint:

$ curl https://pad.xpub.nl/p/collaborations/export/txt | pandoc --from markdown --to html | weasyprint - pad.pdf

Note that Weasyprint, like many other programs, uses a dash - as a special symbol meaning "read the input from stdin".
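
The same stdin and stdout behaviour can be tried offline. This is only a sketch: pad.txt stands in for the text that curl would download, and both file names are hypothetical.

```shell
# Stand-in for the text downloaded from a pad.
printf 'Hello **pad**\n' > pad.txt

if command -v pandoc >/dev/null 2>&1; then
  # No input file is named, so pandoc reads stdin (fed by the pipe);
  # no --output is given, so pandoc writes stdout, which the shell redirects to a file.
  cat pad.txt | pandoc --from markdown --to html > pad.html
  cat pad.html
else
  echo "pandoc is not installed" >&2
fi
```

Replacing cat pad.txt with the curl command gives the pipeline shown above.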

Changing the default template

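First save Pandoc's default HTML template to a file, so that you can edit it:

$ pandoc --print-default-template=html5 > template.html

Then pass the edited template to Pandoc when converting:

$ pandoc --from markdown --to html --template template.html input.md -o output.html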
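
Instead of editing the default template, you can also write a very small one from scratch. The sketch below assumes a hypothetical template called mini-template.html; $title$ and $body$ are template variables that Pandoc fills in, and the title value is made up for the example.

```shell
# A deliberately tiny template (hypothetical). The quoted 'EOF' stops the shell
# from expanding $title$ and $body$, so pandoc sees them unchanged.
cat > mini-template.html <<'EOF'
<!DOCTYPE html>
<html>
<head><title>$title$</title></head>
<body>
$body$
</body>
</html>
EOF

printf 'Some *Markdown* text.\n' > input.md

if command -v pandoc >/dev/null 2>&1; then
  # --template implies --standalone; --metadata supplies the value for $title$.
  pandoc --from markdown --to html --template mini-template.html \
    --metadata title="My page" input.md -o output.html
else
  echo "pandoc is not installed" >&2
fi
```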

PDF

Pandoc supports a range of PDF engines, including Paged.js, weasyprint and LaTeX. Select the one you want with the --pdf-engine option; the engine must be installed on your computer.

You can follow this page for instructions: https://pandoc.org/MANUAL.html#creating-a-pdf
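
A minimal sketch of producing a PDF through Weasyprint, assuming both pandoc and weasyprint are installed. The file names doc.md and doc.pdf and the title are hypothetical.

```shell
# A small Markdown document to turn into a PDF (example content).
printf 'A page that will become a PDF.\n' > doc.md

if command -v pandoc >/dev/null 2>&1 && command -v weasyprint >/dev/null 2>&1; then
  # pandoc converts to HTML first and then hands that HTML to weasyprint,
  # because the output file ends in .pdf and --pdf-engine names an HTML-based engine.
  pandoc doc.md --to html --pdf-engine=weasyprint \
    --metadata title="Example" --output doc.pdf
else
  echo "pandoc and weasyprint are both needed for this example" >&2
fi
```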

Examples

STRING to MARKDOWN

$ echo "<strong>Hello Pandoc</strong> <em>from html to markdown</em>" | pandoc -f html -t markdown

WIKI to HTML

  • Save the content of a wiki page to a plain-text file, for example page.wiki
  • Convert MediaWiki to HTML:
$ pandoc page.wiki -f mediawiki -t html -o page.html
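
If you have saved several wiki pages, the same command can be run in a loop. This is a sketch: notes.wiki is a hypothetical example file, and every file ending in .wiki in the current folder gets converted.

```shell
# Example wikitext file (hypothetical content).
printf "== Heading ==\n\nSome '''bold''' text.\n" > notes.wiki

if command -v pandoc >/dev/null 2>&1; then
  for f in *.wiki; do
    # ${f%.wiki} strips the .wiki extension, so notes.wiki becomes notes.html.
    pandoc "$f" -f mediawiki -t html -o "${f%.wiki}.html"
  done
else
  echo "pandoc is not installed" >&2
fi
```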

WIKI to HTML to PDF to BOOKLET PDF

MediaWiki provides a way to get the content of a page in wikitext or HTML: add ?action=raw to the page URL for wikitext, or ?action=render for HTML.

You can use Weasyprint to generate a PDF directly from a URL:

$ weasyprint -s stylesheet.css 'https://pzwiki.wdka.nl/mediadesign/Wordhole?action=render' wordhole.pdf

Another example:

$ curl 'https://pzwiki.wdka.nl/mediadesign/Voting_by_show_of_hands?action=raw' > Voting_by_show_of_hands.mediawiki
$ pandoc --from mediawiki --to html Voting_by_show_of_hands.mediawiki --output Voting_by_show_of_hands.html
$ weasyprint Voting_by_show_of_hands.html Voting_by_show_of_hands.pdf

And to make a booklet PDF:

$ pdfbook2 --paper=a4paper --short-edge --no-crop Voting_by_show_of_hands.pdf

PAD to MARKDOWN to HTML to PDF to BOOKLET PDF

In one pipeline:

$ curl https://pad.xpub.nl/p/collaborations/export/txt | pandoc --from markdown --to html | weasyprint -s stylesheet.css - pad.pdf

Or in multiple steps, saving the intermediate content to files:

$ curl https://pad.xpub.nl/p/collaborations/export/txt > pad.md 
$ pandoc --from markdown --to html pad.md --output pad.html 
$ weasyprint -s stylesheet.css pad.html pad.pdf

And to make a booklet PDF:

$ pdfbook2 --paper=a4paper pad.pdf
$ pdfbook2 --paper=a4paper --short-edge --no-crop pad.pdf