Pipelines

From XPUB & Lens-Based wiki

Revision as of 11:13, 20 February 2013

Philosophy of the command line: small tools that each do one thing very well, loosely connected into custom "pipelines" or workflows that do specific (or surprising) things.

stdin and stdout

Every program reads its input from "standard in" (stdin) and sends its output to "standard out" (stdout). By default, stdin comes from the keyboard and stdout is displayed on the screen. These mappings can be changed with redirection, using the special characters '>', '<', and '|'.


Redirecting stdout with >

date

Displays the date to the screen (no stdin used by date).

date > time.txt

Redirects the output of date and "saves as" time.txt, replacing any existing contents of that file.

cat time.txt

Displays time.txt (on the screen by default).

Variation: adding to a file with >>

date >> time.txt

Adds on to the end of a file (or "appends", in CS lingo) instead of replacing its contents.
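
To see the difference between > and >> side by side, here is a small sketch. The scratch directory and the file name log.txt are made up for the example, and echo stands in for date so the contents are predictable:

```shell
# Use a scratch directory so no real file is overwritten
# (log.txt is a hypothetical file name for this example).
scratch=$(mktemp -d)

echo "first"  > "$scratch/log.txt"     # > creates the file (or empties an existing one)
echo "second" >> "$scratch/log.txt"    # >> appends, so the file now holds two lines
after_append=$(cat "$scratch/log.txt")

echo "third"  > "$scratch/log.txt"     # > again: the earlier two lines are gone
after_overwrite=$(cat "$scratch/log.txt")

echo "after >>:"; echo "$after_append"
echo "after >:";  echo "$after_overwrite"
```

After the >> step the file holds both lines; after the final > only the last line is left.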

Redirecting stdin with <

wc -l

The "word count" program wc can be used to count the number of lines in a text file (with the -l option). When the above command is run, wc "listens to stdin", which is the console/keyboard. The program appears to do nothing and the shell "hangs", waiting for input. Type in a few lines, such as:

testing
one
two
three
<CTRL-D>

Finally, pressing Ctrl-D on a blank line signals "END OF FILE" to the program (stop reading input); wc then snaps into action and outputs the number of lines it read from stdin.

wc -l < mytextfile

Tells wc to use mytextfile as stdin and thus shows how many lines are in that file.
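
The interactive session above can be reproduced without typing, roughly like this. It is only a sketch: the temporary file made by mktemp stands in for mytextfile, and printf writes the same four lines into it.

```shell
# A temporary file stands in for "mytextfile" (an assumption for this example).
sample=$(mktemp)
printf 'testing\none\ntwo\nthree\n' > "$sample"

# With <, wc only sees a stream on stdin, so it prints just the number.
from_stdin=$(wc -l < "$sample")
echo "$from_stdin"

# Given a file name as an argument, wc prints the number followed by the name.
with_name=$(wc -l "$sample")
echo "$with_name"
```

The difference in output is a handy reminder that with < the program never learns the file's name; it just reads stdin.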

Piping (stdout=>stdin) with |

ls | wc -l

Is sort of like:

ls >< wc -l  (this is invalid!)

In that the stdout of ls is "piped" to be the stdin of wc. The result is a count of the entries (files and directories) in the current directory. NB: the ls command is smart and disables multiple-column output if it is being redirected (i.e. not going straight to the console). To see this, try:

ls | cat
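
One way to think about the pipe is as a shortcut for "save the output to a file, then read that file as stdin". A minimal sketch, assuming a scratch directory with three made-up files (a.txt, b.txt, c.txt):

```shell
# Scratch directory with three hypothetical files to count.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.txt"

# One step: stdout of ls goes straight into stdin of wc.
piped=$(ls "$dir" | wc -l)

# Two steps with an intermediate file (kept outside $dir so it is not counted).
listing=$(mktemp)
ls "$dir" > "$listing"
two_step=$(wc -l < "$listing")

echo "pipe: $piped, via file: $two_step"
```

Both routes give the same count; the pipe just skips the intermediate file.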

The smoking cat: cat+pipe

Note that you can also get the same effect as < by using the cat program (which just copies the contents of a file to stdout):

cat somefile | wc -l

This is sometimes nicer to read, as a left-to-right flow may be easier to follow than a "<" at the end.
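
A quick check that the two spellings really do the same thing, using a temporary file in place of somefile (the file and its contents are made up for the example):

```shell
# Temporary stand-in for "somefile".
sample=$(mktemp)
printf 'a\nb\nc\n' > "$sample"

via_redirect=$(wc -l < "$sample")      # file attached to stdin with <
via_cat=$(cat "$sample" | wc -l)       # file copied to stdout by cat, then piped

echo "redirect: $via_redirect, cat+pipe: $via_cat"
```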

Other commands

shuf, head, tail

Shuffle a file

shuf is a simple program that randomizes the order of the lines of a file. It can be run like:

shuf < somefile

or

cat somefile | shuf

Also, if shuf is run with the name of a file, it will use that file as its input:

shuf somefile
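
Shuffling only changes the order of the lines, never their content. The sketch below checks that by sorting the original and the shuffled text and comparing them. The fruit list is invented for the example, and sort is an extra tool not otherwise covered on this page:

```shell
# Temporary stand-in for "somefile" with made-up contents.
sample=$(mktemp)
printf 'apple\nbanana\ncherry\ndate\n' > "$sample"

shuffled=$(shuf "$sample")
echo "$shuffled"                       # order differs from run to run

# Sort both versions: if only the order changed, the results are identical.
original_sorted=$(sort "$sample")
shuffled_sorted=$(echo "$shuffled" | sort)

if [ "$original_sorted" = "$shuffled_sorted" ]; then
  echo "same lines, different order"
fi
```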

Head/Tail

To see the top of a file, you can use head:

cat somefile | head

The -n option of head sets how many lines to show (the default is 10).

Similarly, tail shows the bottom of a file. It is very useful for quickly checking a log file:

sudo tail /var/log/apache2/error.log
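
A small sketch of head and tail with -n, run on a made-up twelve-line file so the results are easy to check (tail accepts -n in the same way as head):

```shell
# Temporary file holding the numbers 1 to 12, one per line.
sample=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$sample"

first_three=$(head -n 3 "$sample")     # lines 1, 2, 3
last_two=$(tail -n 2 "$sample")        # lines 11, 12

# Without -n, head stops after 10 lines.
default_count=$(head "$sample" | wc -l)

echo "$first_three"
echo "$last_two"
echo "$default_count"
```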


Random Line

To pick a random line of a file, you could first shuffle it, then pick the first line:

cat somefile | shuf | head -n1
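
The same idea on a throwaway file, as a sketch. The colour list is invented for the example. The second variant assumes the GNU coreutils version of shuf, whose -n option limits how many lines are output, so head is not needed:

```shell
# Temporary stand-in for "somefile".
sample=$(mktemp)
printf 'red\ngreen\nblue\n' > "$sample"

# The pipeline from the text: shuffle, then keep only the first line.
pick=$(cat "$sample" | shuf | head -n1)

# Alternative, assuming GNU shuf: ask shuf itself for a single line.
pick_direct=$(shuf -n 1 "$sample")

echo "$pick"
echo "$pick_direct"
```

Either way, the result is one of the lines of the file, chosen at random on each run.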