Streams vs. Arguments

Bash commands are just programs

Bash commands are (mostly) small, independent programs which in theory “do one thing and do it well”. To prove this, you can find out where they live in the filesystem with which:

> which ls

(but you can’t look inside because they are compiled binaries)
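One caveat worth checking on your own machine: a few commands, echo among them, are also built into the shell itself, so what runs is not always a separate file on disk. A minimal sketch using the POSIX command -v, which reports how the shell would resolve a name (the variable names are only for illustration):

```shell
#!/bin/sh
# command -v prints how the current shell would resolve a command name.
ls_location=$(command -v ls)      # a path, because ls is a standalone program
echo_location=$(command -v echo)  # just the name, because the shell has echo built in

echo "ls   resolves to: $ls_location"
echo "echo resolves to: $echo_location"
```

The exact path printed for ls depends on your system, so none is shown here.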

Standard Unix streams

In the early days of Unix, operators used dumb terminals to connect to the huge, centralised monsters that computers were back then. These terminals consisted of a keyboard and a monitor (or printer) and little else - they were basically one of those fat screens you see in old movies. The remote computer would happily wait for the characters typed by the operator at their own speed, with no ill effects if the terminal was slow in sending them and without expecting an end. These connections, and the characters traveling on them, were called streams. Streams were then abstracted, so that writing to or reading from a stream works just like writing to or reading from a file.

In modern computers the job of the terminal is carried out by a small dedicated program, which is referred to as either “the shell”, “the command line”, “the CLI (command line interface)” or, on OS X, “Terminal”. The CLI app uses streams to communicate with other parts of the system, like the keyboard or the network. The three default streams available on Unix terminals are STDIN to read from (the keyboard), STDOUT to write to (the shell), and STDERR to write errors to (also the shell). This standard behaviour can easily be changed so that, for example, an app's STDOUT is sent to a file instead of the shell window.

By default your CLI app connects the keyboard as STDIN and the shell window as both STDOUT and STDERR. When you type something (STDIN) it’s passed on immediately to the app, which does two things - it pushes it as it is to the window (STDOUT) so that you see what you’ve typed; and it keeps it around, waiting for you to hit return. At that point it will parse what you have typed and run it as a command if it can, and print the output to the window (STDOUT); if it doesn’t understand it, it will print an error message to the window (STDERR).

# Terminal taking your STDIN and copying it to STDOUT
> echo "hello"
# <return> detected - command run and result printed to STDOUT
hello

# this command wasn't understood
# error message printed to STDERR (which
# in Terminal is exactly the same window as STDOUT)
> gibberish
-bash: gibberish: command not found

The bash commands we saw earlier, ‘ls’ or ‘grep’ or ‘echo’, will also get the same STDOUT, STDIN and STDERR as Terminal - so, depending on the program, you may see the same text twice.

# program "cat" recognized and started
> cat
# it just sits there and collects everything you type in STDIN until
# you type <return> then it prints it to STDOUT
# meanwhile your shell window is also pushing everything it gets
# to STDOUT,  therefore you get it twice
line1 # printed to STDOUT by shell window as you type
line1 # printed to STDOUT by cat when you typed <return>
line2 # printed to STDOUT by shell window as you type
line2 # etc
# to end the input cleanly press <ctrl-d> (end of file);
# <ctrl-c> also quits cat

Streams redirection

You can easily redirect one of the three standard streams to something else - typically a file. To redirect STDOUT, use >

# program cat recognized; STDOUT redirected to a file
> cat > test.txt
# your shell window is still printing to STDOUT as you type; but cat
# itself is not, it is printing to the text file.
# So this time you only get each line once
line1 # printed to STDOUT by shell window as you type
line2 # printed to STDOUT by shell window as you type
# to end the input cleanly press <ctrl-d> (end of file);
# <ctrl-c> also quits cat
# if you open your text file, it will have the
# text you have just typed inside
> open test.txt

STDIN is redirected with <

# program cat recognized; instead of taking
# STDIN from keyboard, use a file
> cat < test.txt
# all the text in the file is printed out in one go

STDERR is redirected with 2>

# error printed to STDERR
> cat gibberish
cat: gibberish: No such file or directory
# STDERR redirected to "the null device",
# i.e. a special file on Unix systems
# that discards everything written to it
> cat gibberish 2> /dev/null
# no output - it's disappeared into /dev/null
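
The three redirections can also be combined. Here is a small non-interactive sketch, assuming a made-up function called report that writes one line to each output stream, and a scratch directory created with mktemp (both are only for illustration):

```shell
#!/bin/sh
# Hypothetical helper: writes one line to each output stream.
report() {
    echo "result"        # goes to STDOUT (file descriptor 1)
    echo "problem" >&2   # goes to STDERR (file descriptor 2)
}

tmpdir=$(mktemp -d)      # scratch directory for the example files

# Send each output stream to its own file.
report > "$tmpdir/out.txt" 2> "$tmpdir/err.txt"

# Feed each file back in as cat's STDIN.
stdout_text=$(cat < "$tmpdir/out.txt")
stderr_text=$(cat < "$tmpdir/err.txt")

echo "STDOUT file contains: $stdout_text"
echo "STDERR file contains: $stderr_text"

rm -r "$tmpdir"          # clean up
```

Nothing from report reaches the window directly: both of its lines land in files, which shows that STDOUT and STDERR really are separate streams even though they normally share the same window.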

Program arguments

Programs can also have arguments - these are values that are typically typed in and passed to the program by Bash as an array. Arguments are space separated (you can use quotation marks to include a space as part of the argument).

# program "echo" called, and 3 arguments passed to it - a, b, and c
> echo  a b c
# echo does its thing - which is simply to print out arguments
a b c

# this time echo is called with one argument:
# the complete sentence
> echo "What’s it going to be then, eh?"
# in the case of echo, the result looks exactly the same.
# That may not be true for other programs
What’s it going to be then, eh?
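
To see the effect of quoting where it actually matters, you can count the arguments a program receives. A minimal sketch, assuming a made-up shell function count_args that stands in for "a program" ($# is the shell's own count of the arguments passed):

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it was given.
count_args() {
    echo "$#"
}

unquoted=$(count_args a b c)     # three separate arguments
quoted=$(count_args "a b c")     # one argument containing two spaces

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The first call reports 3 and the second reports 1, even though echo would print the same text for both.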

A lot of programs support both arguments and STDIN / STDOUT; but they don’t have to. Take grep for example - a program that prints each line of its input that matches a pattern. When you run it, it looks at how many arguments it was passed to decide what to do:

# grep called with two arguments:
# export and ~/.bash_profile
> grep "export" ~/.bash_profile
# it runs on the file .bash_profile in your home folder (~/)
# and prints out each
# line that matches the pattern
export PATH="$HOME/bin:$PATH"

When it detects two arguments, it treats the first as a pattern, and the second as the path of a file to open and read line by line. It then prints any line in the file that includes the pattern.

But grep also supports STDIN:

# grep called with only one argument: export
# Instead of connecting to a file, it waits for input on STDIN
> grep "export"
# Terminal prints what you type to STDOUT, as usual
I am now typing something
# still Terminal...
grep is looking for the string export - will it find it?
# grep has detected "export" in its STDIN - so it prints it to STDOUT
grep is looking for the string export - will it find it?

With only one argument, the programmers who created grep decided to treat the first argument as a pattern as before, and to wait for input from STDIN. It makes sense, since with only one argument it wouldn’t know which file to open. In the example above I type some random text and press return; whenever a line I typed contains the pattern, grep prints that line again.
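
The decision described above can be sketched as a shell function. This is not how grep is really implemented: mini_grep and match_lines are made-up names, and the matching is a plain substring test rather than grep's regular expressions. It only illustrates the "file argument or STDIN" choice:

```shell
#!/bin/sh
# Hypothetical helper: print every STDIN line that contains the pattern.
# (Literal substring match, unlike real grep, which uses regular expressions.)
match_lines() {
    pattern=$1
    while IFS= read -r line; do
        case $line in
            *"$pattern"*) printf '%s\n' "$line" ;;
        esac
    done
}

# Hypothetical grep stand-in: picks its input based on the argument count.
mini_grep() {
    if [ "$#" -ge 2 ]; then
        match_lines "$1" < "$2"   # second argument given: read that file
    else
        match_lines "$1"          # pattern only: read whatever is on STDIN
    fi
}

sample=$(mktemp)                  # scratch file standing in for a profile
printf 'export PATH=/usr/bin\nalias ll="ls -l"\n' > "$sample"

from_file=$(mini_grep export "$sample")
from_stdin=$(printf 'nothing here\nplease export me\n' | mini_grep export)
rm "$sample"

echo "from file:  $from_file"
echo "from STDIN: $from_stdin"
```

Both calls run the same matching code; the only difference is where STDIN for that code comes from.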

Streams piping

What makes Unix so useful is that you can connect small programs together by joining the STDOUT of a program with the STDIN of another - using the pipe character, |, and because streams are treated like files, it will just work. But you already knew that.

# the STDOUT of the ps program is connected to the STDIN of grep
> ps -ef | grep httpd
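
Since the output of ps differs on every machine, here is a pipeline with a predictable result that you can paste into a script. It is only a sketch: the fruit names are made-up sample data, and grep and tr are used with standard options:

```shell
#!/bin/sh
# printf writes three lines to its STDOUT;
# grep reads them on its STDIN and keeps the lines containing "an";
# tr reads grep's STDOUT on its own STDIN and upper-cases it.
shouted=$(printf 'apple\nbanana\ncherry\n' | grep 'an' | tr '[:lower:]' '[:upper:]')

echo "$shouted"
```

Only banana contains "an", so the script prints BANANA. None of the three programs knows about the others - each one just reads STDIN and writes STDOUT.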

Why can’t you pipe a command to echo?

With all that out of the way, the explanation is quite simple - piping commands to echo does not work, because echo was not programmed to care about STDIN. All it’s wired up to do is to take the arguments and copy them to STDOUT.

# ls puts the output on STDOUT, which is connected to echo's STDIN
# but echo ignores STDIN, all it cares about is command line arguments
> ls | echo
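
You can check this without depending on what is in your folder. In the sketch below printf stands in for ls, and the variable names are only for illustration:

```shell
#!/bin/sh
# The two piped lines arrive on echo's STDIN, where nobody reads them.
# echo prints only its arguments.
with_args=$(printf 'one\ntwo\n' | echo "from arguments")
echo "$with_args"

# With no arguments at all, echo prints just an empty line.
without_args=$(printf 'one\ntwo\n' | echo)
echo "[$without_args]"
```

The first echo prints "from arguments" and the second prints an empty pair of brackets; the piped text "one" and "two" never appears.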

So if your command ignores STDIN, what you have to do is to find a different one which does the same thing, but also reads from STDIN. In the case of echo, that substitute is cat, which as we saw above, does what echo does, but using STDIN as input:

> ls ~/ | cat

But that’s not the whole story.

Using xargs to transform STDIN to arguments

Turns out you can pipe to echo, if you use xargs. Xargs is a command that takes STDIN and turns it into arguments for a command (if it finds no command it will use echo). So:

# xargs is basically creating the command
# echo Applications Desktop Documents ...
> ls ~/ | xargs echo
Applications Desktop Documents ...

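Notice the difference between cat and xargs. cat passes its input through unchanged - the names appear one per line because ls itself prints one entry per line when its STDOUT is a pipe rather than a terminal window. xargs instead removes the newlines - it splits its input on spaces, tabs and newlines and hands the pieces to the command as separate arguments, so echo prints them on a single line.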

You can see that better by passing the option -1 to ls, which forces it to print one entry per line:

> ls -1 ~/
> ls -1 ~/ | xargs echo
Applications Desktop Documents ...
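
The same behaviour can be reproduced with fixed input, so the result does not depend on your home folder. This is a sketch with made-up sample text; -n is a standard xargs option that limits how many arguments go into each call of the command:

```shell
#!/bin/sh
# Three lines on STDIN become three arguments to a single echo call.
joined=$(printf 'Applications\nDesktop\nDocuments\n' | xargs echo)
echo "$joined"

# Extra spaces, tabs and blank lines are all treated as separators.
normalized=$(printf '  a \t b\n\n c\n' | xargs echo)
echo "$normalized"

# -n 1 asks xargs to run echo once per argument instead of once in total.
one_per_call=$(printf 'a b c\n' | xargs -n 1 echo)
echo "$one_per_call"
```

The first two calls each print a single line ("Applications Desktop Documents" and "a b c"), while the third prints a, b and c on separate lines because echo runs three times. One thing to keep in mind: because xargs splits on spaces by default, a file name containing a space would be split into two arguments.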

Further reading

There is lots of info around the web, here are a couple of simple links: