As programmers, we spend a big part of our jobs assigning names. Phil Karlton said, “There are only two hard things in Computer Science: cache invalidation and naming things.” It’s a hard problem, and we deal with it every time we write a line of code. Whether it’s a variable, a table, a column in that table, a file on the filesystem, or what we call our projects and products, naming is a big deal.

Bad variable naming is everywhere. Maybe you’ll find variables that are too short to be adequately descriptive. The programmer might as well have been working in TRS-80 BASIC, where only the first two characters of variable names were significant, and we had to keep a handwritten lookup chart of names in a spiral notebook next to the keyboard.

Sometimes you’ll find variables where all vowels have been removed as a shortening technique, instead of simple truncation, so you have $cstmr instead of $cust. I sure hope you don’t have to distinguish the customers from costumers! Worse, $cstmr is harder to type because of the lack of vowels, and is no longer pronounceable in conversation.

There are also intentionally bad variable names, where the writer was more interested in being funny than clear. I’ve seen $crap as a loop variable, and a colleague tells of overhauling old code with a function called THE_LONE_RANGER_RIDES_AGAIN(). That’s not the type of bad variable name I mean.

While I’m well aware that variable naming conventions can often turn into a religious war, I’m entirely confident when I declare The World’s Worst Variable Name is $data.

Of course it’s data! That’s what variables contain! That’s all they ever contain. It’s like if you were packing up your belongings in moving boxes, and on the side you labeled the box “matter.”

Variable names should say what type of data they hold. Asking the question “what kind” is an easy way to enhance your variable naming. I once saw $data used when reading a record from a database table. The code was something like:

$data = read_record();
print "ID = ", $data["CUSTOMER_ID"];

Asking the question “what kind of $data?” turns up immediate ideas for renaming. $record would be a good start. $customer_record would be better still.
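As a rough illustration of that rename, here is the same idea translated into Python. The read_record() below is a hypothetical stand-in that returns a made-up row; the original function's behavior isn't shown in the code I saw.

```python
# Hypothetical stand-in for read_record(); returns one made-up row as a dict.
def read_record():
    return {"CUSTOMER_ID": 1042, "NAME": "Pat Example"}

# Before: the name tells the reader nothing they didn't already know.
data = read_record()

# After: the name answers the question "what kind of data?"
customer_record = read_record()
message = f"ID = {customer_record['CUSTOMER_ID']}"
print(message)
```

Nothing about the behavior changes. Only the reader's workload does.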

Vague names are the worst, but right behind them are nearly identical names given to related things, names that do nothing to tell those things apart. Therefore the World’s Second Worst Variable Name is: $data2.

More generally, any variable that relies on a numeral to distinguish it from a similar variable needs to be refactored, immediately. Usually, you’ll see it like this:

$total = $price * $qty;
$total2 = $total - $discount;
$total2 += $total2 * $taxrate;

$total3 = $purchase_order_value + $available_credit;
if ( $total2 < $total3 ) {
    print "You can't afford this order.";
}

You can see this as an archaeological dig through the code. At one point, the code only figured out the total cost of the order, $total. If that’s all the code does, then $total is a fine name. Unfortunately, someone came along later, added code for handling discounts and tax rate, and took the lazy way out by putting it in $total2. Finally, someone added some checking against the total that the user can pay and named it $total3.

The real killer in this chunk of code is that if statement:

if ( $total2 < $total3 )

You can’t read that without going back up to work out how each value was calculated and keep track of what’s what.

If you’re tempted to name something $total2, change the existing name to something more specific instead. Spend the five minutes to name the variables appropriately. Renaming is one of the easiest, cheapest and safest forms of refactoring you can do, especially if the names are confined to a single subroutine.

Let’s do a simple search-and-replace on the coding horror above:

$order_total = $price * $qty;
$payable_total = $order_total - $discount;
$payable_total += $payable_total * $taxrate;

$available_funds = $purchase_order_value + $available_credit;
if ( $payable_total < $available_funds ) {
    print "You can't afford this order.";
}

The only thing that changed was the variable names, and already it’s much easier to read. Now there’s no ambiguity as to what each of the _total variables means. And look what we found: the comparison in the if statement was reversed. The code complains when $payable_total is less than $available_funds, which is exactly when the customer can afford the order. Effective naming makes it obvious.
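For completeness, here is a minimal sketch of the check with the comparison pointing the right way, translated into Python. The function name can_afford is made up for this example, and treating an exact match between the two totals as affordable is my assumption; the original code doesn't say what should happen in that case.

```python
def can_afford(price, qty, discount, taxrate, purchase_order_value, available_credit):
    """Return True if the customer's funds cover the order (hypothetical helper)."""
    order_total = price * qty
    payable_total = order_total - discount
    payable_total += payable_total * taxrate

    available_funds = purchase_order_value + available_credit

    # Assumption: owing exactly what is available still counts as affordable.
    return payable_total <= available_funds


# 10 * 3 = 30, less 5 discount = 25, plus 10% tax = 27.5 owed against 25 available.
if not can_afford(10, 3, 5, 0.1, 20, 5):
    print("You can't afford this order.")
```

With the descriptive names in place, the condition reads almost like the sentence it implements.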

There is one exception to the rule that all variables ending with numerals are bad. If the entity itself is named with a number, then keep that as part of the name. It’s fine to use $sha1 for a variable that holds a SHA-1 hash. It helps no one to rename it to $sha_one.

After I wrote the first version of this article, I created Perl::Critic policies to check for these two naming problems. My add-on module Perl::Critic::Bangs includes them as ProhibitVagueNames and ProhibitNumberedNames.
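To give a feel for what such a check involves, here is a toy sketch in Python. It is not how the Perl::Critic::Bangs policies are implemented: it scans raw text with a regular expression instead of parsing the code, so it will also flag names inside strings and comments. The word lists are assumptions made up for this example, not the defaults of either policy.

```python
import re

# Assumed lists for illustration only; not the defaults of any real policy.
VAGUE_NAMES = {"data", "info", "tmp", "temp", "var"}
NUMBERED_EXCEPTIONS = {"sha1", "md5", "utf8"}

# Matches Perl-style scalar names such as $total2 and captures the bare name.
SCALAR_NAME = re.compile(r"\$([A-Za-z_]\w*)")


def find_naming_problems(source):
    """Return a sorted list of (name, problem) pairs found in the source text."""
    problems = set()
    for name in SCALAR_NAME.findall(source):
        lowered = name.lower()
        if lowered in VAGUE_NAMES:
            problems.add((name, "vague"))
        elif lowered[-1].isdigit() and lowered not in NUMBERED_EXCEPTIONS:
            # Ends in a numeral and is not something genuinely named with a number.
            problems.add((name, "numbered"))
    return sorted(problems)


sample = "$data = read_record();\n$total2 = $total - $discount;\n$sha1 = digest($data2);"
for name, problem in find_naming_problems(sample):
    print(f"${name}: {problem} name")
```

The exception list is what lets $sha1 through while $total2 and $data2 still get flagged.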

What other naming sins drive you crazy? Have you created automated ways to detect them?