Daily Log

Saturday, October 14th, 2006
12:10 pm - TCP Naiveté and Jabber, it's even worse
Ohhh. Just had this thought -- even worse than showing presence information erroneously based on a dead s2s connection that neither side has timed out is the fact that neither side will bother to re-establish it until it completely times out. Even then I'm not sure if they'll tear down any existing apparently valid but broken connections to the same server that they have. Urg. c2s might be the same.

I should go and read the XMPP specs and source code again.

12:09 pm - Is Presence Information a 2nd Class Citizen? part 1
Presence information, in IM systems and others, looks likely to rival spam in terms of the quantity of barely-relevant traffic transferred over the internet.

See this analysis for some eye-opening statistics. I'm still not sure if I believe them, however I've done my own calculations for status notifications to mobile phones to tell them if they're in a geo-fenced zone and they were similarly unpretty. Even worse: start thinking about telling everyone on your IM list not just about your state but also geolocation. Spinning yet?

Presence information is odd stuff. Most of the time no one cares about it, but when they do it had better be up-to-date. There must be a good term for this. Temporal relevance?

Anyway, we were thinking about this a bit. Perhaps presence could be OOB from messages? So it could be throttled under load? Is order important? Is order even important for messages in a conference?

It's clear that if anything should be throttled between content and presence, then presence should lose. I'm not so sure that it should ever come to this though, or that the relative ordering of the streams should ever be lost. Ideally, and I like ideals, we should strive to present the same picture to all sides of the audience at the same time. Knowing someone is there, and even where there is, can be just as important as what they said and when they said it.

It is clear that polling for presence doesn't work (inefficient on the watchers, slow to notice changes) but the current unicasting of presence information also looks like a doomed model (inefficient on the watched, linear traffic increase per watcher).
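
To make "linear traffic increase per watcher" concrete, here's the back-of-envelope arithmetic as a sketch; every number below is invented for illustration, not from the analysis linked above:

```cpp
#include <cstdint>

// Unicast presence fanout: every state change is sent once per watcher,
// so daily traffic is users * watchers * changes. All inputs hypothetical.
uint64_t daily_presence_msgs(uint64_t users, uint64_t watchers_per_user,
                             uint64_t changes_per_day) {
    return users * watchers_per_user * changes_per_day;
}

// e.g. a million users with 50 watchers each, changing status 20 times a
// day, is a billion presence messages daily, before geolocation even starts.
```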

So, what to do? Is presence even important? I'd argue that we're increasingly relying on presence information, that we unconsciously want more, and that the demand for this is only going to increase. Look at the common phone and its evolution. It has always been as much a presence device as a communication device. Mobile phones have improved on this (text messaging, the "no I won't answer you" button, call routing per CLI, etc.). Presence clearly is important.

Hmm. I need more time to think the solution to this through, so this will be part 1.

Thursday, October 12th, 2006
6:18 am - TCP Naiveté and Jabber
Wow. Well, I've been having a few problems with TelstraClear, Jabber/XMPP, and the ejabberd implementation.

Things I now know:
1. Jabber doesn't confirm messages, ever. If a call to send() succeeds then the message is forgotten. This means it could be in a multi-megabyte TCP buffer (un-ack'd or ready to send). Link failure of any kind means that these messages are blackholed.
2. Jabber opens two TCP ports for server to server (s2s) connections, and treats them as unrelated. I've done this before (and been criticised for it) but I was using the state of the socket (open/closed) as explicit flow control for the traffic -- the recipient of traffic didn't start listening until they were ready. I can't see any reason for it in Jabber, and the two connections are explicitly unlinked. Weird, esp. when the same fault can hit both, with the result of blackholed traffic and misleading statuses for different durations.
3. ejabberd relies on TCP keepalive (which by default only starts 2 hours after the last traffic) to close connections. I did float doing TCP heartbeating, but it was correctly pointed out that this only marginally improves matters. It can still take 30 minutes before the default Linux configuration will close a TCP connection because an ACK has not been received.
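
None of this fixes the protocol, but for the record the kernel keepalive timers are tunable per socket on Linux. A sketch with illustrative values (the option names are the Linux-specific ones):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Tighten TCP keepalive on one socket (Linux-specific options).
// With these illustrative values a dead peer is noticed after roughly
// 60 + 3*10 = 90 seconds instead of the 2-hour default.
int enable_fast_keepalive(int fd) {
    int on = 1, idle = 60, intvl = 10, cnt = 3;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on)) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle)) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl)) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt)) return -1;
    return 0;
}
```

Even then, keepalive only detects an idle dead link; it does nothing for data already handed to the kernel.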

I can't believe this. We know a lot about TCP in this century, so protocol designers should not be doing this shite anymore. TCP links are only reliable in the sense that traffic sent into the stack will come out the other end in order, or not at all. It doesn't properly monitor connections, or give any assurance that stuff you thought you sent did arrive.
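
The "succeeded into a buffer" failure mode is easy to demonstrate with a local socket pair; this sketch has nothing Jabber-specific in it:

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Returns the number of bytes send() claims to have delivered, even
// though the peer is closed without ever reading them: a blackhole.
ssize_t send_into_the_void() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    const char msg[] = "important message";
    ssize_t n = send(sv[0], msg, sizeof msg - 1, 0);  // "succeeds" into the buffer
    close(sv[1]);  // peer goes away; the buffered bytes are silently discarded
    close(sv[0]);
    return n;
}
```

send() reports success the moment the kernel accepts the bytes; the application has no idea whether they ever arrived.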

(sidenote: just found this as an error description for send(2) -- emphasis added:
The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.))

Various proposals for "correctly closing" streams have been made; others have promoted counting the length of the TCP buffer so you know how much wasn't sent when a stream closes (yurk); others suggest reusing the same unreliable mechanism for ack packets that was designed for human feedback. A relatively sane proposal for ack'ing packets does exist but is remarkably inefficient for general-purpose reliable transport. None of these fixes the hole; the only one that comes close is the buffer counting, but translating a buffer byte count back into specific messages makes baby jesus cry, and it still opens up the duplicate message problem (losing ACKs means duplicates).

Using SCTP rather than TCP would help. The auto-heartbeating would mean that link failure would be noticed sub-second. Link diversity would be possible at the SCTP layer without protocol knowledge, and head-of-line blocking and some DoS attacks would be reduced. But fundamentally it would still lose packets that had been asynchronously sent by the application.

SCTP uses a non-blocking, efficient ACK mechanism: each ack tells the source the highest packet number received in order on that connection. Gaps are also listed, so those (only) can be retransmitted. This could be implemented in Jabber, certainly for s2s, where loss of messages is particularly egregious. It should be clear though that connection loss is for the application to sort out, and it must decide whether it wants to validate link-to-link or end-to-end. I've not decided either way.
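
The sender-side bookkeeping for such a cumulative-ack-plus-gaps scheme is small. A sketch (names invented, not from any XMPP proposal):

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Given the receiver's report (highest seq received in order, plus a set
// of out-of-order seqs received beyond it) and the highest seq we sent,
// derive which sequence numbers need retransmitting.
std::vector<uint64_t> to_retransmit(uint64_t cum_ack,
                                    const std::set<uint64_t>& gap_acks,
                                    uint64_t highest_sent) {
    std::vector<uint64_t> out;
    for (uint64_t s = cum_ack + 1; s <= highest_sent; ++s)
        if (gap_acks.count(s) == 0)  // not reported received: resend it
            out.push_back(s);
    return out;
}
```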

SCTP could work better as a solution for Jabber reliability if there was an interface to blocking writes. Yes, dear reader, I said blocking writes. This mode would fit amazingly well into a model where each conversation-stream direction had its own process. Yeah, I'm high on Erlang. It would remove the "oh, it was buffered" failure case, because the process would have direct and immediate feedback about the success of each packet. It would stuff up performance, though only within the same conversation-stream, because none of the packets would overlap. I'm not sure I care about that for a chat protocol though...

I'm criticising the protocol here, not ejabberd, let's be clear. I can't see any way to significantly improve ejabberd within the bounds of the protocol. There isn't even any proper/cheap way of doing server-initiated heartbeats within the protocol. Grrr.

Next up: Is Presence Information a 2nd Class Citizen?

Sunday, October 8th, 2006
11:25 am - w00t! My own Jabber server.
Right, I now have ejabberd going, so I can finally reduce all my jabber contacts onto one, on my own server.

Federation with google talk appears to be working, but not with jabber.org. I'm going to put that down to general jabber.org flakiness (despite running ejabberd) and I'll try again in a few days.

My jabber buddies are going to have to re-authorise me but this should be the last time. Hurrah. Next stop Jingle!

All hail the (jabber) Federation.

Sunday, October 1st, 2006
4:53 pm - Right, back to it...
Okay, it's been drawn to my attention that I've not updated this in a while. Let me have another go at this.

Things I want to do:
1. Become a board director. I'd like to have some influence over some companies. I think I have a lot to learn, rather than teach, at this point though. It seems like a great way of never retiring too, it would keep the brain going in my dotage.
2. Get my own company running properly. One of them... It sure is hard turning ideas into reality without getting distracted by the next good idea.
3. Get CISSP certified. Which shouldn't be that hard, with a bit of study, it's just organising it. I might hit my company up for it.
4. Live in Spain for a bit. I've tried every other way to improve my Spanish, and eating nice food in a hot climate while doing it is attractive.
5. Learn Mandarin. It's the new world and the writing is on the wall for the English and the USA at least. I'd learn an Indian language, but there are so many, I wonder which one is most common?
6. Get EU Citizenship. I won't give up residency so easily next time, I'm finding it's hard to get back in. It would be a reasonable gift for my kids.

Right. That's enough for now. I'd better hop to it and take a step towards one of these...

current mood: introspective

Saturday, September 13th, 2003
2:03 pm - Evil application idea for today: DomainSnatcher.co.nz
Let me avoid some criticism upfront and say I hate domain speculators. I think that's the polite term.

But...sometimes there's a domain name that's not being used (perhaps someone is speculating with it), and doesn't look like it ever will be. I'd like to set up a service (for myself, or for profit) to look periodically at the Domain Name Commissioner's website, see when it's going to expire, and nab it when/if it becomes free.

I think I could even make money off it: make the contract "best-efforts" and, worst case, you refund someone's money at some point in the future.

If I have time, I might even implement it. With a high-enough cost of entry it won't be usable by speculators, but won't deter casual seekers.

I think I would prefer a flat namespace where anyone can invent any extension, but I don't see it happening in the current environment. Extra suffixes are currently a huge money-making exercise, so I see them being drip-fed for years to come. They are a scam: if a large company doesn't immediately take up the ones that relate to it, then it can be accused of not defending its trademarks and they could become public property. I can imagine this will cause some...concern...when .sex or .xxx comes out. mcdonalds.sex??

1:36 pm - The TZ issue is still not resolved

Facts I have discovered

1. There is 1 (one) publicly maintained list of timezone rules for computers in the entire world (ftp://elsie.nci.nih.gov/pub/), the "Olson" lists.
2. These lists and the associated portable C code are used by almost every Unix in the world, HP-UX being the only exception I can find.
3. The TZ rule text format is one of the most unparsable I've seen. The rules (although complex and flexible) have many implicit facets.
4. All the code seems to ignore that most people in the world only know their 3-4 character TZ code (NZDT, CET, EST, etc). The American ones are listed, of course...
5. Some people have attempted to reparse the TZ database into a more usable form:
5.1 The libical team, implementing VTIMEZONE from the iCal RFC, although this project seems to have fractured and splintered a lot. The Ximian Evolution team who originally did this had a converter from Olson to VTIMEZONE format, but it is apparently now lost, and they're using static copies of old rules.
5.2 The IBM OSS library whose name I forget, which now (just) has code implementing a full set of Olson TZ rules. It "compiles" the files into almost the same form as the original, then works over that to produce a file that (so far) has defied my attempts to work out what the hell they're doing.
6. POSIX is pretty shite. It ignores leap seconds, though the Olson code allows you to read them.
7. The posixrules file I was going to use is simply a weirdass artifact of POSIX, and is usually just a link to the file you're using for your local timezone. Except some Linux distros (Gentoo) don't get that, which is probably because it's completely stupid.
8. All the POSIX unix time functions are a pile of cack. The Olson ones (sorry Mr Olson) aren't much better.


There does not appear to be any code anywhere in the world that allows accurate, updateable, TZ conversions without shagging around the operating system (and causing the implicit actions I ranted about below).

I briefly attempted to define some C++ objects to hold the Olson data in a usable form. The final design I came up with was basically just a consecutive list of A-B periods describing an offset from GMT/UTC for each TZ. I've stopped for the moment as parsing caused me headaches, and I've realised that doing the above is quite a lot of work for a machine to do, so maybe precompiling is a better option. I do want to end up with one file or header that describes it all though. Maybe I'll end up doing a 3GPP copout and gzip the expanded version...
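
For what it's worth, the A-B period design boils down to something like this sketch (types, names and the sample values are illustrative only):

```cpp
#include <cstdint>
#include <vector>

// One zone is a sorted list of [start, end) UTC ranges, each carrying a
// fixed offset from UTC in seconds. DST transitions become period edges.
struct Period { int64_t start_utc, end_utc; int offset_sec; };

// Convert a UTC timestamp to local time by finding its period.
int64_t utc_to_local(const std::vector<Period>& zone, int64_t utc) {
    for (const Period& p : zone)
        if (utc >= p.start_utc && utc < p.end_utc)
            return utc + p.offset_sec;
    return utc;  // unknown period: fall back to UTC
}
```

The lookup is linear here for clarity; a real table would be binary-searched, and the hard part remains generating the periods from the rule files.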

I cannot believe this situation. Does no one in the world need to do TZ conversions properly?

Monday, September 1st, 2003
12:25 am - Joys of Hexdump
No one on the internet apparently knows this, but say you have binary file A that you want to embed into the source code of program B.

One way of doing it is to use hexdump, which is non-portable, obtuse, and finicky, but works with the following format:

" " 1/1 "0x%02x, " ""

and a few options: hexdump -v -f tmpformat /usr/share/zoneinfo/posixrules
where tmpformat is a file containing the format above (I could never get the command-line format string working).

The resulting output can then be #include'd into the middle of a static char array definition, like so:

char fileAsString[] = {
#include "fileAsString.h"
0 };

(the trailing 0 works around the final comma the format leaves behind, and conveniently null-terminates the array).

You may ask why I'm doing this. It's not cryptography, don't get me started...
The reason for this malarkey is that unix timezone conversion code doesn't exist in any sane form.

Most people work around this by playing with TZ. This means you encounter several problems:


The getenv/putenv function calls don't do what they appear to do. No, no, they don't. The fucking things are a relic of halfarse '70s API design. You see, putenv takes a char *. A single char *, which initially just looks inconvenient. It's even a const char * on some archs. So you pass it a char * of "TZ=MET". Was that a stack-allocated array? Oh sorry, you've just made your first mistake. The fucking thing does not use the first characters as a key and copy the value. Oh no. It takes *ownership* of the entire thing.

Ok, so you give it a static char * every time then? Bzzzt. No no no, this in fact causes major problems and odd slowdowns on some archs (HP-UX 11.0). It almost seems like each putenv pushed more onto an internal stack; it was odd, and I can't imagine why it would be implemented like that. I digress. In fact the only reason to use putenv is to make sure that your static char * is the one being used (and therefore of a goodly size). Yes, that's right gentlepeople, you have to give away ownership of the memory. You can't get it back, so it's damn lucky that relying on the OS to free memory at process termination isn't a capital crime.

Which brings us to getenv. Lovely function. Now I've explained putenv it won't surprise you to know that it returns you a char * to the first character after the = in the original string. Consequently, gentle reader, it is not possible to determine if the buffer in use is yours or not by a simple pointer comparison. You need to compare the original + [= offset] to the pointer returned by getenv.

All of this is documented to a greater or lesser extent, depending on your architecture of choice. It's certainly non-obvious on some, and the resulting problems from any misunderstanding are assured to be odd. We had *performance* problems from this...of all things.
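
Putting the putenv/getenv behaviour above into one sketch (the pointer arithmetic in the comment is how glibc and other POSIX libcs behave, since the string itself becomes part of the environment):

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// putenv() takes ownership of the pointer you pass, so the buffer must
// outlive every later use of the variable: hence a static, goodly-sized one.
static char tz_buf[64];

const char* set_tz(const char* zone) {
    snprintf(tz_buf, sizeof tz_buf, "TZ=%s", zone);
    putenv(tz_buf);        // the environment now points INTO tz_buf
    return getenv("TZ");   // returns tz_buf + 3, i.e. just past the '='
}
```

This is why the "is my buffer in use?" check has to compare original + offset-of-'=' against getenv's return, not the original pointer itself.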


Nearly all the timezone code in the unix world is descended from one point, apparently (ftp://elsie.nci.nih.gov/pub/). It's very...interesting code. HP-UX doesn't use it. It doesn't seem to cope with the simple TZ case either, which isn't immediately apparent. Regardless, there is an implicit problem: changing TZ will result in a new call to tzset (eventually), and tzset is always implemented as searching the filesystem for that TZ and reading the appropriate file. Even on HP-UX, where they have one file (tztab), they don't cache the thing; they reread it. Every single time. If you're doing much TZ conversion this becomes a bit of an issue, if you can find a way to determine that you're doing 10 filesystem reads per TZ change...

2 performance problems in one bit of code. Stunning. I've glossed over the problems caused by the stupid halfarsed '80s API design of the posix time calls that use static buffers. All good and easy to remember until you call someone else's function that does any interesting time calls in the course of its business. Like debug output, in my case.


I'm embedding the posix TZ information in a file with a lookup function and some time calls that use it, and we'll ship the fucker. TZ can't actually be fully described by this, but it'll be good enough for our purposes. I don't even want to approach the mire of TZ=[Three to Four different syntaxes, with optional bits, and some undefined arch-dependent syntax].


current mood: aggravated

Thursday, August 28th, 2003
9:12 pm - Muzakerl, part two
Well, it broke. Lacking a remote maintenance connection I'm not quite sure what the person did, but it's apparently not playing music and not serving pages.

I'm very suspicious that they've stuffed their Windows box, rather than this machine not serving, though. That's probably just my rampant optimism acting up again.

So now they've gone quiet, and are not replying to emails, which I consider to be a ploy so they can reformat the drive and install windows/winamp. So I'll get it shipped back here (sigh) and fixerise it.

Thursday, August 14th, 2003
3:49 pm - Muzakerl, part one
Doing software for friends and family is sort of like lending them money. It sets a precedent, they come back to you all the time expecting support, and it's generally a pain. But rewarding, yes rewarding, I must remember rewarding. I'm at the point of "just ship the fucker" now.

I've been doing this jukebox in Erlang and Yaws for my brother's background music co. It's called muzakerl, which is a name that's probably going to get me sued, so I'll change it to something better (eventually). It's been running really well; the only problems I've had are around:

1. The Xitel HiFi Link AN-1 (a USB audio output device). Not officially supported by Linux, and by ALSA even less, but I've got it mostly working. Quite a few "usb_control/bulk_msg: timeout" messages come out (kernel 2.4.20), and I've not monitored the output properly to see if they coincide with stuttering output.
2. There is exactly one command-line MP3 player in the world that supports the auto-levelling MP3 tags (RVA2, Relative Volume Adjustment), and it's almost impossible to find: madplay, built on libmad, and it kicks mpg123 and mpg321's arse. You get the tags into the MP3s using normalize, which also seems to be the only thing on the planet that *can* insert the tags.
3. If playing fails, because the device is not there/not accepting or for some other reason, it spins like a wild crazy thing. So I'll have to build in some sort of dynamic delay which decays away on success, or something. Grrr.
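
The dynamic delay in point 3 could be as simple as this sketch (the step and cap constants are invented):

```cpp
// Decaying retry delay: double the wait on each failure (capped at 30s),
// and halve it again on each success so it decays away once things work.
struct Backoff {
    int delay_ms = 0;
    void on_failure() {
        delay_ms = delay_ms ? delay_ms * 2 : 250;  // start at 250ms, then double
        if (delay_ms > 30000) delay_ms = 30000;    // cap at 30s
    }
    void on_success() { delay_ms /= 2; }           // decays to zero on a healthy device
};
```

The player loop would sleep for delay_ms before each retry, which stops the wild spinning when the device vanishes.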

It's super-robust apart from those little issues. On a Duron 1300 it uses 1.3% CPU to play the MP3s, and it would be lower if I could be bothered downsampling them.

Off to Nelson tomorrow for a long weekend. The break from work will be nice, even if I spend all weekend setting up this system.

current mood: contemplative

Wednesday, August 13th, 2003
5:34 pm - Random bits'o'TVP
ETYMOLOGY: From the phrase the real Simon Pure, after Simon Pure, a character in the play A Bold Stroke for a Wife, by Susannah Centlivre (1669?–1723).

(after discovering the game from a link on slashdot, I think, and forwarding it to Damian many many times, he responded with this):

In other news my level of emacsese has increased to the point where I compile and debug inside it. I should have progressed to this point ages ago. A major step forward, esp. now it's bound to F9 (blame Microsoft Quick-C exposure at an early age).

I still seem to have a problem with emacs "forgetting" how to format files if they are reloaded from the saved desktop, which is annoying, but I will file that investigation until I get some more time.

The Windows RPC-DCOM worm has been hammering poor Pedro all night, but he limps along still. I suspect he will die soon; he's 10+ years old now. It would be nice to have the justification to replace him with a small Linksys router, but I do wonder if one of those would just end up frustrating me. Certainly the replacement will be quiet and non-dust-bunny-sucky.

I'm ignoring heaps of Aplio email. Which is crap, but being tech support for them takes time I don't have at the moment.

Accelerating a product roadmap, which means releasing product versions before they were officially due, basically craps all over everyone's plans: unless you can delay the subsequent release, you've permanently altered all the schedules. This wasn't initially clear to me, but now I've got a basis to stop it happening in the future. It also impacts the TCO guarantees we've contracted to, by increasing the number of releases in a 5 year timeframe.

Tuesday, August 12th, 2003
4:54 pm - Rules for C++ config reading
If you're going to make objects read or accept config then the following best practice applies...

* Make it reconfigurable at the outset
It's much easier than retrofitting it later, honest.

* Make the ctor use the reconfigure method directly
Then the code is concentrated in one place, and the reconfigure method is generally the more complicated of the two, as it's dealing with existing state. I've had so many leaks/cores from forgetting or over-deleting objects after adding in reconfiguration. Do it right the first time.

* Make the reconfigure method only start using the new config if it was all sane and relevant
It makes sense that it shouldn't crash, as that's the whole intent of making it runtime-reconfigurable. This is really hard to do later, as by that stage you've got config you can't copy/move from the temp space into permanent privates.

* Make the reconfigure method throw a specific bad config exception
This caters for bad config in both the ctor and reconfigure cases. In the ctor case the parent can die after printing/rethrowing the nice error message you've put in the exception; in the reconfigure case the parent can print/syslog/rethrow and continue running. You're giving the parent the choice, at least, about what is most appropriate.
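
The four rules above, condensed into one minimal sketch (the class, field and exception names are illustrative, not from any real codebase):

```cpp
#include <stdexcept>

// Rule 4: a specific bad-config exception the parent can catch.
struct BadConfig : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Config { int port = 0; };  // hypothetical config

class Listener {
public:
    explicit Listener(const Config& c) { reconfigure(c); }  // rule 2: ctor delegates
    void reconfigure(const Config& c) {                     // rule 1: reconfigurable from day one
        if (c.port <= 0 || c.port > 65535)
            throw BadConfig("Listener: port out of range"); // rule 4
        port_ = c.port;  // rule 3: commit only after everything validated
    }
    int port() const { return port_; }
private:
    int port_ = 0;
};
```

A failed reconfigure throws before touching any state, so the old config stays live and the parent decides whether to die or carry on.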

current mood: Pensive...

10:34 am - Spellcheckerisms
Fitzsimons - Unknown Word, suggestions:

After a lifetime of correcting the spelling of my surname I'm allowed to gain some amusement from this.

On a more practical note, you'd think that the configured username and company would be passed into the spell checker automatically. I know the guy writing the spell checker wasn't the same guy who wrote the prefs, but really, this is a known problem. I'm sure there are Risks posts going back to the '80s about this.

current mood: Jovial

Monday, August 11th, 2003
11:18 pm - Well, here we are then
Passé it may be, but here's my little space to rant and rave about the world in general, and my life specifically.
