Bruce Fitzsimons (bwooce) wrote,

Joys of Hexdump

No one on the internet apparently knows this, but say you have a binary file A that you want to embed into the source code of program B.

One way of doing it is to use hexdump, which is non-portable, obtuse, and finicky, but works with the following format:

" " 1/1 "0x%02x, " ""

and a few options: hexdump -v -ftmpformat /usr/share/zoneinfo/posixrules
where tmpformat is a file containing the format above (I could never get the command-line format string working).
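If hexdump isn't available (or won't cooperate), the same kind of output can be produced by a few lines of C. This is only a sketch: the function name dump_as_c_bytes is made up for illustration, and the line break every 12 bytes is my own choice, not something the hexdump format does.

```c
#include <stdio.h>

/* Write every byte read from `in` to `out` as " 0x%02x,", roughly the
 * shape the hexdump format above is meant to produce.
 * Returns the number of bytes converted, or -1 on an I/O error. */
long dump_as_c_bytes(FILE *in, FILE *out)
{
    long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (fprintf(out, " 0x%02x,", (unsigned)c) < 0)
            return -1;
        /* Break the line every 12 bytes so the header stays readable. */
        if (++count % 12 == 0 && putc('\n', out) == EOF)
            return -1;
    }
    if (ferror(in))
        return -1;
    return count;
}
```

A small main() would open the input with fopen(path, "rb"), pass stdout as the output, and redirect that into fileAsString.h.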

The resulting output can then be #include'd into the middle of a static char array definition, like so:

char fileAsString[] = {
#include "fileAsString.h"
0 };

(The final 0 is there to absorb the trailing comma after the last byte. It isn't part of the file, so it has to be worked around wherever the array length matters.)
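For what it's worth, here is roughly how the array looks once the preprocessor is done with it, and how to get the real file length back. The four bytes are a stand-in for the generated header (they spell "TZif", the magic at the start of a compiled zoneinfo file), and fileAsStringLen is a name I've made up. I've also used unsigned char here rather than plain char: bytes above 0x7f don't fit in a signed char, and compilers tend to complain about that.

```c
#include <stddef.h>

/* Stand-in for the generated header. In real use the byte list below is
 * replaced by:  #include "fileAsString.h"  */
static const unsigned char fileAsString[] = {
    0x54, 0x5a, 0x69, 0x66,
0 };

/* The trailing 0 is not part of the embedded file, so subtract it. */
static const size_t fileAsStringLen = sizeof fileAsString - 1;
```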

You may ask the reason I'm doing this. It's not cryptography, don't get me started...
The reason for this malarkey is that unix timezone conversion code doesn't exist in any sane form.

Most people work around this by playing with TZ. This means you encounter several problems:


The getenv/putenv function calls don't do what they appear to do. No, no, they don't. The fucking things are a relic of half-arsed '70s API design. You see, putenv takes a char *. A single char *, which initially just looks inconvenient. It's even a const char * on some archs. So you pass it a char * of "TZ=MET". Was that a stack-allocated array? Oh sorry, you've just made your first mistake. The fucking thing does not use the first chars as a key and copy the value. Oh no. It takes *ownership* of the entire thing.

Ok, so you give it a static char * every time then? Bzzzt. No no no, this in fact causes major problems and odd slowdowns on some archs (HP-UX 11.0). It almost seems like each putenv pushed more onto an internal stack. It was odd, and I can't imagine why it would be implemented like that. I digress. In fact the only reason to use putenv is to make sure that your static char * is the one being used (and therefore a goodly size). Yes, that's right gentlepeople, you have to give away ownership of the memory. You can't get it back, so it's damn lucky that relying on the OS to free memory at process termination isn't a capital crime.
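One reading of the above, as a sketch: hand putenv a single static buffer exactly once, and from then on change the timezone by overwriting the value part in place. The names tz_env, tz_env_installed and set_tz are made up, the 64-byte size is an arbitrary "goodly size", and this assumes putenv keeps your pointer rather than copying (which is what POSIX describes, but check your arch). It is not thread-safe.

```c
#define _XOPEN_SOURCE 700   /* putenv() and tzset() on strict compilers */
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical names. Once handed to putenv(), this buffer belongs to
 * the environment for the rest of the process lifetime. */
static char tz_env[64] = "TZ=UTC";
static int tz_env_installed = 0;

/* Point TZ at `zone`. Returns 0 on success, -1 if the name does not fit
 * in the buffer or putenv() fails. */
int set_tz(const char *zone)
{
    size_t len = strlen(zone);

    if (len >= sizeof tz_env - 3)        /* 3 == strlen("TZ=") */
        return -1;
    memcpy(tz_env + 3, zone, len + 1);   /* overwrite the value in place */

    if (!tz_env_installed) {
        if (putenv(tz_env) != 0)         /* give the buffer away, once */
            return -1;
        tz_env_installed = 1;
    }
    tzset();                             /* make libc notice the change */
    return 0;
}
```

Calling set_tz("MET") and later set_tz("UTC") results in one putenv call in total, which is the point.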

Which brings us to getenv. Lovely function. Now that I've explained putenv, it won't surprise you to know that it returns a char * to the first character after the = in the original string. Consequently, gentle reader, it is not possible to determine if the buffer in use is yours or not by a simple pointer comparison. You need to compare the original + [= offset] to the pointer returned by getenv.
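The comparison described above, sketched in C. The names tz_env, TZ_VALUE_OFFSET and tz_env_is_live are made up for illustration, and it relies on getenv returning a pointer into the very string that was given to putenv, which is how the implementations I'm aware of behave but is worth verifying on yours.

```c
#define _XOPEN_SOURCE 700   /* putenv()/setenv() on strict compilers */
#include <stdlib.h>

/* Hypothetical buffer that we intend to hand to putenv(). */
static char tz_env[64] = "TZ=UTC";

#define TZ_VALUE_OFFSET 3   /* strlen("TZ="): where the value starts */

/* Returns 1 if the TZ currently in the environment is backed by our
 * buffer, 0 if TZ is unset or someone else's string is in use. */
int tz_env_is_live(void)
{
    const char *cur = getenv("TZ");

    return cur != NULL && cur == tz_env + TZ_VALUE_OFFSET;
}
```

If this returns 0, some other code has replaced TZ since your last putenv, and the buffer has to be handed over again before overwriting it does anything.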

All of this is documented to a greater or lesser extent, depending on your architecture of choice. It's certainly non-obvious on some, and the problems resulting from any misunderstanding are guaranteed to be odd. We had *performance* problems from this... of all things.


Nearly all the timezone code in the unix world is descended from one point, apparently. It's very... interesting code. HP-UX doesn't use it. It doesn't seem to cope with the simple TZ case code either, which isn't immediately apparent. Regardless of this, there is an implicit problem: changing the TZ will result in a new call to tzset (eventually). tzset is always implemented as going and searching the filesystem for that TZ and reading the appropriate file. Even on HP-UX, where they have one file (tztab), they don't cache the thing, they reread it. Every single time. If you're doing much TZ conversion then this becomes a little bit of an issue, if you can find a way to determine that you're doing 10 filesystem reads per TZ change...

Two performance problems in one bit of code. Stunning. I've glossed over the problems caused by the stupid half-arsed '80s API design of the posix time calls that use static buffers. All good and easy to remember until you call someone else's function that does any interesting time calls in the course of its business. Like debug output, in my case.
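Where the reentrant variants exist (gmtime_r and localtime_r are in POSIX, but check your platform), the static-buffer trap can be sidestepped by supplying your own struct tm. A minimal sketch, with format_utc being a made-up name:

```c
#define _POSIX_C_SOURCE 200809L   /* gmtime_r() on strict compilers */
#include <stddef.h>
#include <time.h>

/* Format `t` as UTC into caller-supplied storage. Nothing here touches
 * the static struct tm shared by gmtime()/localtime(), so a debug helper
 * calling those in between cannot clobber the result.
 * Returns 0 on success, -1 on conversion failure or too small a buffer. */
int format_utc(time_t t, char *out, size_t cap)
{
    struct tm tm_buf;   /* our own storage, not libc's */

    if (gmtime_r(&t, &tm_buf) == NULL)
        return -1;
    if (strftime(out, cap, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0)
        return -1;
    return 0;
}
```

This only covers the struct tm half of the problem. Anything that depends on TZ still goes through the shared process-wide state described earlier.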


I'm embedding the posix TZ information in a file with a lookup function and some time calls that use it and we'll ship the fucker. TZ can't actually be fully described by this, but it'll be good enough for our purposes. I don't even want to approach the mire of TZ=[Three to Four different syntaxes, with optional bits, and some undefined arch-dependent syntax].
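The general shape of "embedded data plus a lookup function" might look something like this. To be clear, it is a guess at the structure and not the code we're shipping: the struct, the zone names, and find_zone are all hypothetical, and the byte arrays are placeholders where the #include'd hexdump output would go.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a zone name and its embedded bytes. */
struct embedded_zone {
    const char          *name;
    const unsigned char *data;
    size_t               len;
};

/* Placeholder contents; in real use each list would come from an
 * #include of generated hexdump output, with the same trailing 0. */
static const unsigned char met_data[] = { 0x54, 0x5a, 0x69, 0x66, 0 };
static const unsigned char utc_data[] = { 0x54, 0x5a, 0x69, 0x66, 0 };

static const struct embedded_zone zones[] = {
    { "MET", met_data, sizeof met_data - 1 },   /* -1 drops the final 0 */
    { "UTC", utc_data, sizeof utc_data - 1 },
};

/* Linear search by name. Returns NULL if the zone is not embedded.
 * No filesystem access and no TZ environment fiddling involved. */
const struct embedded_zone *find_zone(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof zones / sizeof zones[0]; i++) {
        if (strcmp(zones[i].name, name) == 0)
            return &zones[i];
    }
    return NULL;
}
```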
