Bad, vile and meaningless: On LC_NUMERIC and Perl from Alan's clob

LC_NUMERIC and Perl -- or when $x != "" . $x - 0

One of my users rather innocuously had a statement like this buried in some application boilerplate code:

use POSIX;
setlocale(LC_ALL, "fi_FI");

He was using one of my classes, Bank::Payment, whose purpose is to present a unified interface to the payment form creation code of Finnish web banks.

He reported a crash: my class Bank::Payment complained that the numeric format of his payment amount was incorrect. He was passing it a Perl number, such as 12.34. There is nothing wrong with that, so the fault was obviously in my code. It died here:

croak "Amount format incorrect: $amount" unless $amount =~ /^\d+\.\d+\z/;

The regex simply checks that the amount looks something like 12.34. But something was wrong! The error message said "Amount format incorrect: 12,34"!

So LC_NUMERIC was in effect and had changed the way perl converts numbers to strings: the fi_FI locale uses a comma as its decimal separator. Maybe this is meant to make it easier to print perl numbers in generated reports (just set LC_NUMERIC and the output looks right), or something. I don't know, but it sure as hell isn't very nice otherwise.

The most important consequence by far is that any code that implicitly converts a number to a string and back to a number somewhere will no longer work. Such code can be very hard to track down. In my case it was easy to find the exact spot because the program died right on it, but more complex programs will undoubtedly have very interesting places where warnings come out, decimal digits get dropped and impossible things happen. Welcome to the land of Perl.
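
A program that cannot control who calls setlocale can at least detect the problem at startup. The sketch below uses a hypothetical helper, numeric_roundtrip_ok (not part of any module), which pushes a run-time value through a string and back and compares the result. A variable is used on purpose, so that perl cannot precompute the conversion at compile time.

```perl
use strict;
use warnings;

# Hypothetical sanity check: does a number survive the trip
# number -> string -> number under the current LC_NUMERIC?
sub numeric_roundtrip_ok {
    my $n = 0.5;              # a variable, so the concatenation happens at run time
    my $s = "" . $n;          # "0.5" normally, "0,5" if the locale leaks in
    no warnings 'numeric';    # "0,5" - 0 may otherwise warn "isn't numeric"
    return ($s - 0) == $n;
}

warn "Number stringification is locale-dependent here!\n"
    unless numeric_roundtrip_ok();
```

If the check fails, "0,5" numifies to 0 and the comparison with 0.5 is false, which is exactly the kind of dropped decimals described above.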

What about use locale

It would be wonderful if all of the above happened only after the programmer had explicitly told Perl that he wants to be locale-aware. Unfortunately, that is not to be: whether or not you use the locale pragma makes no difference here.
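
If a program really does need the Finnish locale for other things, one way to keep number stringification sane, sketched here under the assumption that LC_NUMERIC is the only category causing trouble, is to set LC_ALL as before and then put LC_NUMERIC alone back to the "C" locale, whose decimal separator is always ".".

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL LC_NUMERIC);

# Take the user's locale for everything (messages, collation, dates, ...).
# setlocale returns undef if the locale is not installed on this system.
my $all = setlocale(LC_ALL, "fi_FI");
warn "fi_FI locale not available\n" unless defined $all;

# ...but put number formatting back to the "C" locale,
# whose decimal separator is always '.'.
setlocale(LC_NUMERIC, "C");

my $amount = 12.34;         # a variable, so nothing is precomputed at compile time
print "Amount: $amount\n";  # 12.34 despite the fi_FI setting above
```

This keeps the rest of the Finnish locale in effect while numbers keep their "." separator.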

The inconsistencies

It turns out that LC_NUMERIC stops taking effect under another pragma:

use bignum;

But using the Math::Big* classes is certainly going to slow down your program (and the program of anyone else who happens to receive your heavyweight values from your functions). Nevertheless, it does work as a workaround.

The subtleties

Consider the difference in output between these two programs:

use POSIX;
POSIX::setlocale(LC_NUMERIC, "fi_FI");
$x = 2.03;
$x = "" . $x;
print $x;

and:

use POSIX;
POSIX::setlocale(LC_NUMERIC, "fi_FI");
$x = "" . 2.03;
print $x;

The first one prints 2,03 and the second prints 2.03. The reason is compile-time constant folding. In the first program, perl holds the number 2.03 and converts it to a string at run time, after the LC_NUMERIC change has taken effect; in the second, perl has already folded "" . 2.03 into a string at compile time, before setlocale ran, so at run time it sees only $x = "2.03". Yet the two programs look disturbingly alike.
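
To see the constant folding for yourself, the core B::Deparse module can print back what perl actually compiled. In this small sketch (the two anonymous subs are purely illustrative), the literal version should deparse to a plain string constant '2.03', while the variable version keeps its concatenation for run time.

```perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;

# Literal operand: perl folds "" . 2.03 into the string '2.03' at compile
# time, before any setlocale() call in the program has had a chance to run.
my $folded_text = $deparse->coderef2text(sub { my $x = "" . 2.03; return $x });

# Variable operand: the concatenation is kept and performed at run time,
# so it sees whatever LC_NUMERIC is in effect by then.
my $runtime_text = $deparse->coderef2text(sub { my $y = 2.03; my $x = "" . $y; return $x });

print "folded:\n$folded_text\n\nruntime:\n$runtime_text\n";
```

The same command-line check, perl -MO=Deparse, works on a whole script.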

I originally thought that this would fix it:

sprintf("%s", $amount) =~ /^\d+\.\d+\z/;

But I only believed that because I had tested it with sprintf("%s", 12.34), which gave me a number without a comma: perl had already computed the constant expression at compile time.
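
What does seem to work, sketched here under the assumption that the amount should always be rendered with two decimals, is to do the formatting while LC_NUMERIC is temporarily set to "C" and to restore the caller's setting afterwards. format_amount_c is a hypothetical name, not part of Bank::Payment.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_NUMERIC);

# Hypothetical helper: format a number with two decimals and a '.'
# separator, whatever LC_NUMERIC the rest of the program has chosen.
sub format_amount_c {
    my ($amount) = @_;
    my $saved = setlocale(LC_NUMERIC);               # one argument = query only
    setlocale(LC_NUMERIC, "C");
    my $formatted = sprintf("%.2f", $amount);
    setlocale(LC_NUMERIC, $saved) if defined $saved; # restore caller's locale
    return $formatted;
}

my $amount = 12.34;   # a run-time value, not a folded constant
my $s = format_amount_c($amount);
die "Amount format incorrect: $s" unless $s =~ /^\d+\.\d+\z/;
```

Because setlocale changes process-wide state, this may not be safe if other threads format numbers at the same time.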

The remedy

Perl must give up this LC_NUMERIC behaviour. The locale must not affect the way numbers get stringified at all. For the love of sanity, please: $x = "" . $x - 0 must leave $x unchanged, and I'd prefer it to work the same way in every locale.