Bad, vile and meaningless: Bruteforcing TUPAS V2 from Alan's clob

Analysis of feasibility of bruteforcing the Personal ID from hashed codes

A web merchant who wishes to authenticate his customers using Finnish web banks may do so.

The banks agree to several kind of authentication schemes. One is where the bank just hands you the personal ID code and you may do with that what you will. To be allowed to get that ID, however, you would need a clearance from the appropriate authority for holding such register.

For instance, consider that I would present a random person coming to this website with a form and by pressing a button, the person would be transferred to bank of his choosing. If he logs in successfully, then the bank would redirect his browser back to this site with his details like name and the Personal ID code. Then, if that code is right, I'd show the particular person some content I intended for no-one else. However, this way I would potentially acquire the Personal IDs of people who I have no business with.

So, an alternative way to do authentication is to know who you expect to be authenticated. This might go as follows: I phone someone whose Personal ID I know and tell him to come to my website. But before he gets to see the content I intended for him, I'll redirect him to his webbank of choice and request for a hashed Personal ID code. After the web browser returns from the bank with that hash, I would compute (based on my information of his ID) what the right hash should be, and then I'd compare the two hashes. If the hash is the same, then it is him. If it differs, it must be someone else but I'm not supposed to be able to know who.

Now, here's the problem. The space of Finnish Personal IDs is so small that a moderate desktop computer can crack it in about 30 minutes. Assuming that I get a hash and I don't know what ID it corresponds to, I still do know every other bit of detail that went into building that hash. So, I will go through the space systematically until the hash I get is the same hash, and then I'll know. Voilá, I just circumvented a piece of legislation.

The computation of the feasibility of this attack goes as follows. Firstly, the structure:

DDMMYYXCCCZ (for instance, 010100-123D)

DD  = day from 01 to 31
MM  = month from 01 to 12
YY  = year from 00 to 99
X   = century code, from 1800 to 2000: '+', '-' or 'A'.
CCC = 3-digit integer code, of no inner structure
Z   = check code

Of these, DD, MM, YY, X and the CCC are freely chosen. The Z is computed from DD, MM, YY and CCC. Because there's approximately 365 days in a year, and the C-code is 3 digits, it follows that all the personal codes for that year must be selected from the set of one of these 365.000 different codes.

Because there are only about 100 years you may expect a person to live, it follows that to crack the code, you can make a search checking the most likely age groups first and extend to the unlikelier years shortly, and at maximum it will take you 37 million tries to find the personal code. There are several methods to priorize searches -- one is to keep the years-to-be-checked in an array and each time when you crack a code, move that year from its position to the head of the array, so it gets checked early in the future.

The bank code hashes are seeded from the request data in such a manner that it is impossible to pretabularize this data. That means that rainbow tables and prehashings and so on won't help you here. But a malicious merchant can (albeit usually not in real time) compute the personal ID of an user he tricked into going through the webbank authentication scheme.

Caveat: My 30 minutes estimate is based on the performance of a very naive Perl code I wrote. An intelligent algorithm in C would search this space possibly up to 100 times faster (or more likely, 10-20 times faster). So that makes it reasonable to expect answers within the first minute with the hypothetical C implementation -- and that means a single computer will deliver near realtime results.