Introduction to Using the Shell in a High-Performance Computing Context: Instructor Notes

Overall

Many people have questioned whether we should still teach the shell. After all, anyone who wants to rename several thousand data files can easily do so interactively in the Python interpreter, and anyone who’s doing serious data analysis is probably going to do most of their work inside the IPython Notebook or R Studio. So why teach the shell?

The first answer is, “Because so much else depends on it.” Installing software, configuring your default editor, and controlling remote machines frequently assume a basic familiarity with the shell, and with related ideas like standard input and output. Many tools also use its terminology (for example, the %ls and %cd magic commands in IPython).

The second answer is, “Because it’s an easy way to introduce some fundamental ideas about how to use computers.” As we teach people how to use the Unix shell, we teach them that they should get the computer to repeat things (via tab completion, ! followed by a command number, and for loops) rather than repeating things themselves. We also teach them to take things they’ve discovered they do frequently and save them for later re-use (via shell scripts), to give things sensible names, and to write a little bit of documentation (like comment at the top of shell scripts) to make their future selves’ lives better.

The third answer is, “Because it enables use of many domain-specific tools and compute resources researchers cannot access otherwise.” Familiarity with the shell is very useful for remote accessing machines, using high-performance computing infrastructure, and running new specialist tools in many disciplines. We do not teach HPC or domain-specific skills here but lay the groundwork for further development of these skills. In particular, understanding the syntax of commands, flags, and help systems is useful for domain specific tools and understanding the file system (and how to navigate it) is useful for remote access.

Finally, and perhaps most importantly, teaching people the shell lets us teach them to think about programming in terms of function composition. In the case of the shell, this takes the form of pipelines rather than nested function calls, but the core idea of “small pieces, loosely joined” is the same.

All of this material can be covered in three hours as long as learners using Windows do not run into roadblocks such as:

Preparing to Teach

Teaching Notes

Specific Lessons

Why Use a Cluster?