Tuesday, September 29, 2020

A waste of two days

This post is just for reference, for when people ask me what could possibly go wrong with a simple process. In this case, the solution was trivial but the problem was enormous.

Basically, I had some code for which decryption was central. My tests ran on every check-in and used Bouncy Castle to decrypt data that had been encrypted with GnuPG. The code was also tested in lower environments, but the upper environments were closed to me; only DevOps could go there. And, lo and behold, decryption failed there with exactly the same artifact that had been tested elsewhere.

The error was "checksum mismatch at 0 of 20". A quick Google turned up a StackOverflow post indicating that there was a problem with the passphrase (the secret that protects the private key), so I looked at the file in which the passphrase lived.

Playing with hexdump [SO], I ran something like:

hexdump -e'"%07.8_ax  " 8/1 "%03d "  " |\n"' passphrase.txt
00000000  049 050 051 -62 -93 052 053 054 |
00000008  055 010                         |

Well, firstly, there appears to be a line return (010, i.e. a line feed) in there, but this is a red herring: if you get the passphrase into Key Vault with the command outlined in a previous post (that is, by cat-ing the file), the newline is removed. This was serendipitous, as the gpg man page says:

       --passphrase-file file

              Read the passphrase from file file. Only the first line will be read from file file. 

If we'd used the proper az command line (using the -f switch to indicate a file rather than cat-ing it), Key Vault would have contained the newline.
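The man page's "only the first line" rule suggests a defensive habit for code that reads such a file itself. Here is a minimal Java sketch, assuming the passphrase file is UTF-8 text; the class and method names (FirstLinePassphrase, firstLine, readPassphrase) are hypothetical, not taken from my actual code or from Bouncy Castle. It keeps only the first line, so a trailing newline never sneaks into the passphrase, and it decodes explicitly as UTF-8 instead of relying on the JVM's default charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads a passphrase file following the gpg man page's
// "first line only" rule. Assumes the file is UTF-8 encoded.
class FirstLinePassphrase {

    // Returns everything before the first '\n'. Dropping a trailing '\r' too
    // is this sketch's own choice, not necessarily what gpg does.
    static String firstLine(String contents) {
        int end = contents.indexOf('\n');
        String line = end < 0 ? contents : contents.substring(0, end);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    // Decode explicitly as UTF-8 rather than trusting the platform default charset.
    static String readPassphrase(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        return firstLine(new String(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Same bytes as the hexdump above: "123£4567" followed by a newline.
        Path tmp = Files.createTempFile("passphrase", ".txt");
        Files.write(tmp, "123\u00A34567\n".getBytes(StandardCharsets.UTF_8));
        String passphrase = readPassphrase(tmp);
        System.out.println(passphrase.length()); // 8: the newline is gone
        Files.delete(tmp);
    }
}
```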

So, after wasting some time on that, I next looked at why my JVM was saying the passphrase was 21 characters long when I could only see 20. Checking each byte of the passphrase made it clear that the "£" character (the pound sterling sign) was taking up two bytes: -62 and -93 in the signed hexdump output above, i.e. 0xC2 0xA3, its UTF-8 encoding.
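To see how a character count and a byte count can disagree, here is a small Java sketch using the sanitised passphrase from the hexdump above, "123£4567" (an assumption for illustration: the real passphrase was longer). It shows that "£" (U+00A3) encodes to the two UTF-8 bytes 0xC2 0xA3, which Java, like hexdump's %d, prints as -62 and -93. It also shows one possible way for a string to grow by a character, which I'm assuming here rather than claiming was the actual cause: decoding those UTF-8 bytes as ISO-8859-1 turns "£" into "Â£". The class and method names are made up, and the pound sign is written as the escape \u00A3 so the example compiles the same whatever the source-file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares characters with UTF-8 bytes for a
// passphrase containing the pound sign.
class PoundSignBytes {

    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes with the wrong charset: each byte becomes one char,
    // so the two-byte "£" turns into the two chars "Â£".
    static String misdecodeAsLatin1(String s) {
        return new String(utf8Bytes(s), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String passphrase = "123\u00A34567"; // "123£4567", as in the hexdump
        System.out.println(passphrase.length());                     // 8 characters
        System.out.println(utf8Bytes(passphrase).length);            // 9 bytes
        // Java bytes are signed, like hexdump's %d: 0xC2 0xA3 prints as -62, -93
        System.out.println(Arrays.toString(utf8Bytes("\u00A3")));    // [-62, -93]
        System.out.println(misdecodeAsLatin1(passphrase).length());  // 9 characters
    }
}
```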

Fair enough, I thought. It's not ASCII, but I can encrypt and decrypt fine with this passphrase using gpg. So why was my code failing?

And this is the kick in the pants: "Unless otherwise specified, the character set for text is the UTF-8 [RFC3629] encoding of Unicode [ISO10646]." (OpenPGP Message Format, RFC 4880). But there is a UTF-8 encoding for the pound sterling symbol, so what gives? Well, it seems implementations are at liberty to use a character set other than UTF-8 if they see fit:

"Please note that OpenPGP defines text to be in UTF-8.  An implementation will get best results by translating into and out of UTF-8.  However, there are many instances where this is easier said than done.  Also, there are communities of users who have no need for UTF-8 because they are all happy with a character set like ISO Latin-5 or a Japanese character set.  In such instances, an implementation MAY override the UTF-8 default by using this header key.  An implementation MAY implement this key and any translations it cares to; an implementation MAY ignore it and assume all text is UTF-8." [ibid]

So there is some ambiguity about how to handle these characters, and judging by the Bouncy Castle mailing lists, I don't appear to be the only one to have fallen into this trap. And it's not just Bouncy Castle: OpenKeychain also appears to have this problem.
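Why does this ambiguity break decryption outright rather than just a length check? The key that unlocks an OpenPGP private key is derived by hashing the passphrase's bytes (the string-to-key, or S2K, step), so two implementations that turn the same characters into different bytes derive different keys; the wrong key then plausibly surfaces as a failed integrity check like the "checksum mismatch" above. The Java sketch below makes this concrete with RFC 4880's simple S2K (SHA-1, leftmost 16 bytes for a 128-bit key). It is a simplification: GnuPG normally uses the iterated and salted variant, and I'm not claiming this is how Bouncy Castle or gpg actually convert the passphrase to bytes. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo: the same passphrase, turned into bytes with two different
// charsets, yields two different keys under simple S2K (RFC 4880, 3.7.1.1).
class CharsetS2K {

    // Simple S2K with SHA-1 for a 16-byte (AES-128) key: the key is the
    // leftmost 16 bytes of SHA-1(passphrase bytes). No salt, no iterations.
    static byte[] simpleS2k(String passphrase, Charset charset) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(passphrase.getBytes(charset));
            return Arrays.copyOf(digest, 16);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String passphrase = "123\u00A34567"; // "123£4567"
        byte[] utf8Key = simpleS2k(passphrase, StandardCharsets.UTF_8);
        byte[] latin1Key = simpleS2k(passphrase, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 key:      " + hex(utf8Key));
        System.out.println("ISO-8859-1 key: " + hex(latin1Key));
        // Different input bytes, so (barring a SHA-1 collision) different keys.
        System.out.println("Same key? " + Arrays.equals(utf8Key, latin1Key));
    }
}
```

With an ASCII-only passphrase, both charsets produce identical bytes and therefore the same key, so the disagreement only appears once a character like "£" is involved.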

Ambiguous specifications are the root of all evil.

