I've noticed something exceptionally stupid:
the djb2 hash function.
http://www.cse.yorku.ca/~oz/hash.html
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}
and the more popular version, which instead has
    hash = ((hash << 5) + hash) ^ c;
(The latter one is more prevalent. There was a TopCoder match about finding collisions in the latter one.)
The function itself is not all that stupid.
What's stupid is that if you search for djb2 on Google, you see all sorts of people 'personally recommending' it as the best and fastest simple hash, people trying to explain why it is good (when the answer is: it is NOT good), people wondering why 5381 is better (it's not), people tracking the history of this "excellent" function, naive people who trust this sort of advice and use it in various software (Ruby's hash tables?), and so on. All in all, people presume that 5381 and 33 have some special significance and are much better than, e.g., 0 and 31.
What is so stupid about it? For starters, even though the output of this function is at least 32 bits wide, you are not guaranteed freedom from collisions even among 2-character ASCII strings. In fact "SW" collides with "Pt", and "g5" collides with "as" [both in the second version, the one that uses xor], just to name two examples out of hundreds. Each character except the first contributes only about 5 bits, because that is all you get out of a multiplication by 33 (log2(33) is about 5.04). That's not good. That's complete crap. From a 32-bit hash you should expect no collisions at all between 2-character strings, which carry 16 bits at most, and fewer still when restricted to lowercase or uppercase alphanumerics.
Most primes work no worse. You can use 257, and then your function at least will not collide on 2-character strings (it will still be crap though, especially if you use only part of the hash). The multiplier doesn't even need to be prime, it only needs to be odd. If you want a good number, run code that picks the best one for hashing some real data, such as a list of all your file names, instead of looking on the internet and judging by people's reputation.
Furthermore, there are a lot of collisions between strings that differ in just 2 characters, because 2 consecutive characters can be altered together so that the 'hash' stays the same.
Got to give some credit though. In some very limited original usage (a hash table of a specific size, with specific key statistics, e.g. English words), which I do not know, and which you are highly unlikely to replicate, it may have been good. Or actually, not good. Merely not too bad.
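To check this sort of claim yourself rather than take my word for it, here is a minimal sketch in C. The names hash_xor and count_collisions are mine, not from any library. I use uint32_t so the arithmetic is 32 bits wide on every platform, and the multiplier is a parameter so that 33 and 257 can be compared. It only looks at 2-character strings of printable ASCII, which is an arbitrary test set chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* The xor variant of djb2, with the multiplier exposed as a parameter.
   uint32_t pins the arithmetic to 32 bits regardless of platform. */
uint32_t hash_xor(uint32_t mult, const char *s)
{
    uint32_t h = 5381;
    while (*s)
        h = (h * mult) ^ (unsigned char)*s++;
    return h;
}

/* Three-way comparison of two uint32_t values for qsort. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Hash every 2-character string over printable ASCII (0x20..0x7E) and
   count how many of them land on a hash value already taken by another
   string. Zero means the multiplier separates all 9025 strings. */
int count_collisions(uint32_t mult)
{
    enum { LO = 0x20, HI = 0x7E, N = (HI - LO + 1) * (HI - LO + 1) };
    static uint32_t hashes[N];
    char s[3] = { 0, 0, 0 };
    int n = 0, dup = 0;

    for (int a = LO; a <= HI; a++)
        for (int b = LO; b <= HI; b++) {
            s[0] = (char)a;
            s[1] = (char)b;
            hashes[n++] = hash_xor(mult, s);
        }

    /* After sorting, equal hashes sit next to each other. */
    qsort(hashes, (size_t)n, sizeof hashes[0], cmp_u32);
    for (int i = 1; i < n; i++)
        dup += (hashes[i] == hashes[i - 1]);
    return dup;
}
```

Comparing hash_xor(33, "g5") with hash_xor(33, "as") shows one of the collisions directly, and calling count_collisions(33) and count_collisions(257) from your own main() gives the totals for both multipliers. To judge a multiplier for real use, swap the generated strings for your actual keys.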
What is the significance of 5381? Apart from the low 8 bits of 5381*33 (in the variation that has xor instead of add), it is pretty much irrelevant to collision resistance: it just gets multiplied by 33^n and added in. This function is pretty much as crap with a start value of 5381 as with a start value of 42 or 100 or 12345. The only difference is that an unexplained 5381 hints at some deep wisdom, whereas 12345 does not.
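For the version with +, this is easy to see on paper: for a string c_1 ... c_n the result is seed*33^n + c_1*33^(n-1) + ... + c_n (modulo the word size), so two strings of the same length get the exact same seed term, and it cancels when you compare them. A small sketch to try it, where hash_add and collide are names I made up and the seed is turned into a parameter:

```c
#include <stdint.h>

/* The add variant of djb2 with the start value exposed as a parameter. */
uint32_t hash_add(uint32_t seed, const char *s)
{
    uint32_t h = seed;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns 1 if a and b hash to the same value under the given seed.
   For equal-length strings the seed adds the same seed * 33^n term to
   both sides, so the answer does not depend on it. */
int collide(uint32_t seed, const char *a, const char *b)
{
    return hash_add(seed, a) == hash_add(seed, b);
}
```

For example, "az" and "bY" collide under any seed, because 33*'a' + 'z' equals 33*'b' + 'Y' (both are 3323). The same two-character swap works in the middle of a longer string, so "hazard" and "hbYard" collide as well.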
Now the structure. The newer version of that hash function uses multiply and xor. The function sucks as much with xor as it does with addition, but with xor it looks better and is harder to analyze than with addition.
Moral of the story:
Do not trust magical-looking code. 99.9% of magical-looking code out there is utter shit, only obfuscated enough that you can't immediately recognize that it is shit. Most of the time the 'smart'-looking constants are also really shitty choices, or at best random ones (at the VERY best, magical constants could have been selected for some very particular case which you know nothing about, by a method which you know nothing about, and are still more likely than not bad for whatever you want to do).
Do not trust internet advice or consensus either. Keep in mind that the majority of acclaimed programming experts are expert in only one thing: persistent bullshitting to obtain 'expert' status and sell some books. They're totally mediocre programmers and software engineers (doesn't apply in this case 'cause qmail seems kind of good, but nonetheless).
That goes especially for the old Usenet celebrities. Don't trust them even if they have a wiki page about themselves, and not even the ones who somehow got a PhD. Also, keep in mind that the majority of people in a 'consensus' are simply repeating each other, like parrots, and haven't devoted much brain time to thinking about the question.
This is why science does not and cannot function by reference to authority, but only by reference to argument, to actual reasons, and why if no reasons are given you shouldn't assume that any exist.
edit: also, don't even get me started on "fast". If you want fast, you'd better do 4 chars at once on a 32-bit machine.
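To show what I mean by 4 chars at once, here is a rough sketch. It is NOT djb2 and not a recommendation: hash4 is a hypothetical function, and MULT is an arbitrary odd constant I picked for illustration, so everything said above about unexplained constants applies to it too. It takes an explicit length instead of scanning for the terminating zero, and its output depends on the machine's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary odd multiplier, for illustration only. Not tuned, not
   recommended; measure on your own data before trusting any constant. */
#define MULT 0x9E3779B1u

/* Hypothetical word-at-a-time hash: one xor and one multiply per
   4 input bytes instead of per byte. */
uint32_t hash4(const void *data, size_t len)
{
    const unsigned char *p = data;
    /* Start from the length so that zero padding of the tail does not
       make "ab" and "ab\0" hash alike. */
    uint32_t h = (uint32_t)len;

    while (len >= 4) {
        uint32_t w;
        memcpy(&w, p, 4);   /* unaligned-safe load; value depends on endianness */
        h = (h ^ w) * MULT;
        p += 4;
        len -= 4;
    }
    if (len > 0) {
        uint32_t w = 0;
        memcpy(&w, p, len); /* remaining 1..3 bytes, zero padded */
        h = (h ^ w) * MULT;
    }
    return h;
}
```

Whether this actually beats the byte loop on your compiler and your keys is something to measure, not assume.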
edit: clarified which statements refer to the version with + and which to the version with ^, even though the two have very similar properties.