Wildcards in Unix & macOS: Why Most People Only Think They Understand Wildcards

A very useful concept on the macOS (and Linux) command line is that of wildcards. Many casual command line users will have a passing familiarity with them but most people don’t really understand how they actually work.

What’s a Wildcard?

Many, many years ago I remember a tutor in an introductory CS lecture telling the whole class that wildcards were nothing to do with cards. That’s patently incorrect, but probably forgivable in those pre-Google days (he didn’t look like a hardcore card player). Because just like the Joker or “wild card” in a pack of cards, a wildcard is a character that can act as a stand-in for any other character (or characters) in a file path.

Single-Character Wildcard: ?

Suppose you have a directory with a lot of similarly-named files in it:-

$ ls
appendix.txt  report1.txt  report3.txt  report5.txt
index.txt     report2.txt  report4.txt

Now what if you want to do some common operation to just the report files? You can use the single wildcard character ? to specify all the report files like this:-

$ ls report?.txt
report1.txt report2.txt report3.txt report4.txt report5.txt

What has happened here? The shell has treated the ? character as a special character that means “match any single character”. The pattern matches filenames composed of the string report, followed by exactly one character, followed by the string .txt. The shell has found all the files in the current directory that match this pattern, and ls has listed them.

Multi-Character Wildcard: *

Single-character wildcards are quite limiting, and in reality they’re not used terribly often. To see why, imagine what would happen when the number of report files in the example directory reached 10: the pattern report?.txt would miss report10.txt, because ? matches exactly one character and the number is now two digits long.
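
To make that limitation concrete, here is a minimal sketch. It assumes a POSIX-style shell and works in a throwaway directory; the file names (including report10.txt) are made up for illustration:-

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical file names: a few single-digit reports plus report10.txt.
touch report1.txt report2.txt report9.txt report10.txt

one_char=$(echo report?.txt)     # exactly one character between "report" and ".txt"
two_chars=$(echo report??.txt)   # exactly two characters in that position

echo "$one_char"    # report1.txt report2.txt report9.txt
echo "$two_chars"   # report10.txt
```

Each ? stands for exactly one character, so neither pattern covers both groups of files.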

Thankfully there’s another way: the * (asterisk) wildcard. This is the workhorse of wildcards, and is the one most people know about. A version of it exists in Windows, too, but it’s nowhere near as powerful. Let’s see why…

Filtering the report files is easy with the asterisk wildcard:-

$ ls report*.txt
report1.txt report2.txt report3.txt report4.txt report5.txt

In fact, it’s easier than that:-

$ ls report*
report1.txt report2.txt report3.txt report4.txt report5.txt

Or even:-

$ ls r*
report1.txt report2.txt report3.txt report4.txt report5.txt

The magic thing about the asterisk is that it matches any number of characters (including none). So in the final example above, anything that starts with the letter r is enough to match: the asterisk matches all the remaining characters. A file called raptor999 would also match. The key thing to remember when using it is to ask: what is the minimum pattern that would uniquely identify what I’m looking for?
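
The “including none” part is easy to overlook, so here is a small sketch of it. The file names are hypothetical (report.txt has nothing between report and .txt), and LC_ALL=C is set only to keep the sort order of the results predictable:-

```shell
# Pin the sort order so the expansions come out in a predictable sequence.
export LC_ALL=C

# Work in a throwaway directory with hypothetical file names.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch report.txt report1.txt raptor999 index.txt

star_txt=$(echo report*.txt)   # the * may match zero characters, so report.txt counts
r_star=$(echo r*)              # anything beginning with r, whatever follows

echo "$star_txt"   # report.txt report1.txt
echo "$r_star"     # raptor999 report.txt report1.txt
```

A pattern as short as r* is convenient, but it also picks up raptor999, which is why it pays to think about the minimum pattern that is still specific enough.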

Say you wanted to make a copy of all the report files to a new subdirectory. First create a new empty directory called copies:-

$ mkdir copies

Then use the wildcard pattern r* as an argument to cp as follows:-

$ cp r* copies

Confirm it has copied the files to the subdirectory:-

$ ls copies
report1.txt report2.txt report3.txt report4.txt report5.txt

Why People Only Think They Understand Wildcards

Wildcards are an incredibly powerful tool, but one that is often misunderstood. On first meeting wildcards, many people (including experienced developers) think the wildcard is somehow sent to the command, such as ls or cp, which then filters the results based on it. Although that’s what happens on other operating systems (such as Windows), in Unix-based systems that is absolutely not what happens.

This can be demonstrated using the lowly echo command. echo simply takes the arguments you pass to it and sends them to the Terminal output:-

$ echo Hello world
Hello world

But if you pass it a wildcard, something else happens:-

$ echo r*
report1.txt report2.txt report3.txt report4.txt report5.txt

So, what’s going on here? To understand, it’s crucial to remember that the wildcard is interpreted by the shell itself, before it is sent to the command.

The shell (not the command) expands the wildcard pattern into the list of every matching name in the current working directory. So it would expand the above command to be:-

$ echo report1.txt report2.txt report3.txt report4.txt report5.txt

Now the output makes sense. echo is just echoing the filenames it received from the shell. Because this crucial step is never seen by the user, the exact behaviour of wildcard expansion is often misunderstood.
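
One way to see this hidden step is to count the arguments a command actually receives. The sketch below assumes a POSIX-style shell; count_args is a hypothetical helper written for this illustration, not a standard command:-

```shell
# Work in a throwaway directory with three hypothetical report files.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch report1.txt report2.txt report3.txt

# Hypothetical helper: prints how many arguments it was given.
count_args() {
  echo "$#"
}

expanded=$(count_args r*)    # the shell expands r* before count_args runs
quoted=$(count_args 'r*')    # quoting suppresses expansion: one literal argument
literal=$(echo 'r*')         # echo receives the two characters r and *

echo "$expanded"   # 3
echo "$quoted"     # 1
echo "$literal"    # r*
```

The helper never sees the pattern in the first call, only the three file names. Quoting the pattern is how you pass a literal asterisk through to a command.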

Pro Tip: Using echo to Preview

echo is a very good (and safe) way of checking a wildcard expansion before sending it to any potentially destructive commands such as mv or rm.
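
As a rough sketch of that habit (again in a throwaway directory, with made-up file names), put echo in front of the real command first, and only run the command itself once the preview looks right:-

```shell
# Work in a throwaway directory with hypothetical file names.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch report1.txt report2.txt index.txt

# Preview: echo prints the command line the shell would build. Nothing is deleted.
preview=$(echo rm report*)
echo "$preview"     # rm report1.txt report2.txt

# The preview looks right, so run the real command.
rm report*

remaining=$(echo *)
echo "$remaining"   # index.txt
```

One caveat: echo joins its arguments with spaces, so a file name that itself contains a space can be hard to tell apart from two separate names in the preview.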

You can also use * on its own to match all files in a directory:-

$ ls *
appendix.txt  report1.txt  report3.txt  report5.txt
index.txt     report2.txt  report4.txt

Again, this might not seem any different from using ls on its own, but the crucial thing is that the asterisk wildcard is expanded by the shell into a set of paths. So if, for example, you wanted to move all the files in the directory to another directory, or delete them, or whatever, you can: * will expand to the list of files in the current working directory (by default, names beginning with a dot are left out), and you can pass that list to any command.
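
Here is a minimal sketch of moving everything with a bare *. It assumes a POSIX-style shell with default settings, uses two throwaway directories, and the file names (including the hidden file .hidden) are invented for the example:-

```shell
# Two throwaway directories: a source and a destination.
src=$(mktemp -d)
dest=$(mktemp -d)
cd "$src" || exit 1

# Hypothetical contents: two ordinary files and one hidden file.
touch index.txt report1.txt .hidden

# The shell expands * to index.txt report1.txt, and mv receives those names.
mv * "$dest"

moved=$(cd "$dest" && echo *)   # what arrived in the destination
left_behind=$(ls -A)            # -A also lists names that start with a dot

echo "$moved"         # index.txt report1.txt
echo "$left_behind"   # .hidden
```

With default shell settings, a bare * skips names that begin with a dot, which is why .hidden stays where it was.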

Lee

A veteran programmer, evolved from the primordial soup of 1980s 8-bit game development. I started coding in hex because I couldn't afford an assembler. Later, my software helped drill the Channel Tunnel, and I worked on some of the earliest digital mobile phones. I was making mobile apps before they were a thing, and I still am.
