Pipelines: Difference between revisions

From XPUB & Lens-Based wiki
 
Latest revision as of 15:12, 8 January 2020

Philosophy of the command line: small tools that each do one thing very well, loosely connected into custom "pipelines" or workflows that do specific (or surprising) things.

The pipeline is a fundamental feature of the UNIX command line. By connecting the output of one program to the input of another, you can build chains of commands. In this way simple commands become the building blocks for more sophisticated, personalized scripts that do powerful things.

stdin and stdout

Every program reads its input from "standard in" (stdin) and sends its output to "standard out" (stdout). By default, stdin is taken from the keyboard and stdout is displayed on the screen. These mappings can be changed, however, by redirection with the special characters '>', '<', and '|'.


Redirecting stdout with >

date

Displays the date to the screen (no stdin used by date).

date > time.txt

Redirects the output of date and "saves as" time.txt. If time.txt already exists, its contents are replaced.

cat time.txt

Displays time.txt (on the screen by default).

Variation, Adding to a file with >>

date >> time.txt

Adds on to the end of a file (or "appends" in CS lingo) instead of replacing its contents.
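
A minimal sketch of the difference between > and >>, using echo instead of date so the result is predictable. The file name demo.txt and the temporary directory are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)

echo "first"  >  "$workdir/demo.txt"   # creates demo.txt
echo "second" >  "$workdir/demo.txt"   # overwrites it: "first" is gone
echo "third"  >> "$workdir/demo.txt"   # appends: "second" stays, "third" is added

cat "$workdir/demo.txt"
```

After these three commands the file should hold only the lines "second" and "third".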

Redirecting stdin with <

wc -l

The "word count" program can be used to count the number of lines of a text file (with the -l option). When the above command is run on its own, wc "listens to stdin", which is the console/keyboard. The program appears to do nothing and the terminal seems to "hang" while wc waits for input. Type in a few lines, such as...

testing
one
two
three
<CTRL-D>

Finally, pressing Ctrl-D on a blank line signals "END OF FILE" (stop reading input), and wc will snap into action and output the number of lines it read from stdin.

wc -l < mytextfile

Tells wc to use mytextfile as stdin and thus shows how many lines are in that file.
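
The same idea as a small script, as a sketch: the sample file is created with printf in a temporary directory (both are assumptions for the example), then handed to wc through stdin.

```shell
# Hypothetical sample file holding the four lines typed in above
workdir=$(mktemp -d)
printf 'testing\none\ntwo\nthree\n' > "$workdir/mytextfile"

# wc reads the file through stdin, so it prints only the count (no file name)
line_count=$(wc -l < "$workdir/mytextfile")
echo "lines: $line_count"
```

Some versions of wc pad the number with spaces, so the exact spacing of the output may vary.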

Piping (stdout=>stdin) with |

ls | wc -l

Is sort of like:

ls >< wc -l  (this is invalid!)

In that the stdout of ls is "piped" to be the stdin of wc. The result is a count of the entries (files and directories) in the current directory. (NB: the ls command is smart and disables multi-column output if it's being redirected, i.e. not going straight to the console.) To see this, try:

ls | cat
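
A small sketch of the pipe in action, assuming a temporary directory with three made-up files so that the count is known in advance:

```shell
# Throwaway directory with three empty files (hypothetical names)
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

# stdout of ls becomes stdin of wc; ls prints one name per line when piped
entry_count=$(ls "$workdir" | wc -l)
echo "entries: $entry_count"
```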

Nested loops (think clock)

for (( h=0; h<=23; h++ ))
do
	for (( m=0; m<=59; m++ ))
	do
		echo "The time is: $h:$m"
	done
done
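
A loop writes to stdout like any other command, so its output can be piped as well. The sketch below counts the lines the clock loop prints. It is written with seq rather than the (( )) form, an assumption made so that it also runs in shells other than bash.

```shell
# The whole nested loop's stdout is piped into wc -l
minute_count=$(
  for h in $(seq 0 23)
  do
    for m in $(seq 0 59)
    do
      echo "The time is: $h:$m"
    done
  done | wc -l
)
echo "minutes in a day: $minute_count"
```

The count should come out as 24 x 60 = 1440.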

The smoking cat: cat+pipe

Note that you can also get the same effect as < by using the cat program (which just copies the contents of a file to stdout):

cat somefile | wc -l

This is sometimes nicer to read, as the left-to-right flow may be easier to follow than a "<" at the end.
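
To check that the two forms really agree, here is a quick sketch with a made-up three-line file:

```shell
# Hypothetical sample file in a temporary directory
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/somefile"

via_redirect=$(wc -l < "$workdir/somefile")       # file attached to stdin by the shell
via_cat=$(cat "$workdir/somefile" | wc -l)        # file copied to stdout by cat, then piped

echo "redirect: $via_redirect, cat: $via_cat"
```

Both variables should hold the same count, since wc sees the same bytes on stdin either way.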

Other commands

shuf, head, tail

Shuffle a file

shuf is a simple program that randomizes the order of the lines of a file. It can be run like:

shuf < somefile

or

cat somefile | shuf

Also, if shuf is run with the name of a file, it will use that as its input:

shuf somefile
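
Because the order is random, a shuffled file is hard to check by eye. One way, sketched here with a made-up four-line file, is to sort the original and the shuffled copy and compare them. This assumes shuf is installed; it is part of GNU coreutils and may be missing on some systems.

```shell
# Hypothetical sample file in a temporary directory
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\n' > "$workdir/somefile"

shuf "$workdir/somefile" > "$workdir/shuffled"

# The order may differ from run to run, but sorting both shows the lines are the same
original_sorted=$(sort "$workdir/somefile")
shuffled_sorted=$(sort "$workdir/shuffled")

[ "$original_sorted" = "$shuffled_sorted" ] && echo "same lines, possibly different order"
```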

Heads and Tails

To see the top of a file, you can use head:

cat somefile | head

The -n option of head sets how many lines to show (10 by default).

Similarly, tail shows the bottom of a file. This is very useful for quickly checking a log file:

sudo tail /var/log/apache2/error.log
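
A short sketch of the -n option on both commands. It uses seq to generate the numbers 1 to 10 as stand-in input, an assumption made so that no log file is needed.

```shell
# seq 1 10 prints the numbers 1..10, one per line
first_three=$(seq 1 10 | head -n 3)   # lines 1, 2, 3
last_two=$(seq 1 10 | tail -n 2)      # lines 9, 10

echo "$first_three"
echo "$last_two"
```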

Random Line

To pick a random line of a file, you could first shuffle it, then pick the first line:

cat somefile | shuf | head -n1
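
As a sketch, the same pipeline can be run on a made-up three-line file, with grep used afterwards to confirm that the chosen line really comes from the file. The file contents and variable names are assumptions for the example.

```shell
# Hypothetical sample file in a temporary directory
workdir=$(mktemp -d)
printf 'red\ngreen\nblue\n' > "$workdir/somefile"

random_line=$(cat "$workdir/somefile" | shuf | head -n1)
echo "picked: $random_line"

# grep -Fx matches the whole line literally; -q suppresses its output
grep -Fxq "$random_line" "$workdir/somefile" && echo "found in file"
```

GNU shuf also accepts -n 1 to limit its own output to one line, which would make the final head unnecessary.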