User:Simon/Trim4/Extracting text from the web

From XPUB & Lens-Based wiki
[[Category: Tasks of the Contingent Librarian]]

Latest revision as of 16:34, 20 June 2020

11.11.19 Extracting text using curl

curl is a command-line tool that fetches the contents of a URL from the terminal. Its output can be piped into software such as pandoc to convert the text to other formats, which comes in quite handy for a workflow I'm starting to develop.

I'm writing text on the pad and then converting it to markdown. This extra step isn't necessary (in fact it adds to the work), but I'm interested in using pads as multi-flow publishing tools in the future, so I'm testing this out. Also, using a pad lets me style the text with simple markdown rather than HTML.

For example, this is a file I made from some notes on a Flusser interview about linear writing:

$ curl https://pad.xpub.nl/p/flusser_interview_notes/export/txt | pandoc -t markdown > flusser.md
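
If I end up doing this for several pads, the same pipeline could be wrapped in a small shell function. This is only a sketch: the function names pad_export_url and pad_to_markdown are made up for illustration, and it assumes every pad lives on pad.xpub.nl and exposes the same /export/txt address as the one above.

```shell
#!/bin/sh

# Hypothetical helper: build the plain-text export URL for a pad name.
# Assumes the pad is hosted on pad.xpub.nl, like the example above.
pad_export_url() {
    printf 'https://pad.xpub.nl/p/%s/export/txt\n' "$1"
}

# Hypothetical helper: fetch a pad and write it out as a markdown file.
#   $1 = pad name, $2 = output file
# -s hides curl's progress meter; -f stops curl from passing an
# HTTP error page on to pandoc if the pad cannot be fetched.
pad_to_markdown() {
    curl -sf "$(pad_export_url "$1")" | pandoc -t markdown > "$2"
}

# Usage (needs network access, curl and pandoc):
#   pad_to_markdown flusser_interview_notes flusser.md
```

Keeping the URL in its own function means the address only has to change in one place if the pad server moves.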

I then store the files in my git repository (https://git.xpub.nl/simoon/thesis), which is public. Keeping texts in git lets me use its versioning: I can go back over older versions of a file and copy snippets from them that I may want to retain in the future.
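
To show what going back to an older version can look like, here is a minimal sketch that runs in a throwaway repository rather than my actual thesis repo. The file contents, commit messages and the demo identity are invented for the example; only the git commands themselves (log and show) are the point.

```shell
#!/bin/sh

# Work in a temporary repository so the sketch is safe to run anywhere.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.org"   # placeholder identity
git config user.name "Demo"

# First version of the notes
echo "first draft of the notes" > flusser.md
git add flusser.md
git commit -q -m "Add notes"

# Second version overwrites the first
echo "rewritten notes" > flusser.md
git commit -q -am "Rewrite notes"

# List the commits that touched the file
git log --oneline -- flusser.md

# Recover the text as it was one commit earlier, e.g. to copy a snippet
old_text=$(git show HEAD~1:flusser.md)
echo "$old_text"
```

In a real repository, HEAD~1 could be replaced by any commit hash shown by git log.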