December 24, 2006

Why use symbols in Ruby?

I ran a test where I did the following:
- created 100,000 random 32 character strings
- chose 1,000 of those strings randomly
- found each of the 1,000 strings in the 100,000 string array

I did this using strings, then repeated it with the exact same strings converted into symbols, to see how symbol matching compares with string matching. Here are the results on my laptop:

Test String Time = 38.2969999313354
Test Symbol Time = 26.4219999313354
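
For anyone who wants to try this, here is a minimal sketch of the test described above. It is not the original script: it assumes a plain linear scan of the array with ==, a reasonably modern Ruby (Array#sample, Benchmark.realtime), and made-up helper names (random_string, linear_find, time_lookups, run_comparison). The sizes are kept small so it finishes quickly; raise them to approximate the 100,000/1,000 run.

```ruby
require 'benchmark'

# Hypothetical helper: build a random lowercase string of the given length.
def random_string(length)
  Array.new(length) { rand(97..122).chr }.join
end

# Linear scan (an assumption about how the lookup was done):
# return the index of the first element equal to target, or nil.
def linear_find(haystack, target)
  haystack.each_with_index do |item, i|
    return i if item == target
  end
  nil
end

# Time, in seconds, how long it takes to find every needle in the haystack.
def time_lookups(haystack, needles)
  Benchmark.realtime do
    needles.each { |needle| linear_find(haystack, needle) }
  end
end

# Run the same lookups once with strings and once with symbols.
def run_comparison(set_size, sample_size, length)
  strings = Array.new(set_size) { random_string(length) }
  picked  = strings.sample(sample_size)

  # Convert outside the timed section so only matching is measured.
  symbols        = strings.map(&:to_sym)
  picked_symbols = picked.map(&:to_sym)

  {
    string: time_lookups(strings, picked),
    symbol: time_lookups(symbols, picked_symbols)
  }
end

result = run_comparison(2_000, 100, 32)
puts "Test String Time = #{result[:string]}"
puts "Test Symbol Time = #{result[:symbol]}"
```

The absolute numbers will differ from mine depending on the machine and Ruby version, so treat the output as a relative comparison only.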

The results don't depend on which of the two tests runs first, and they are fairly consistent from run to run. It is interesting how non-linear the timing is in the length of the string:

2 char/10k test set:
Test String Time = 1.84299993515015
Test Symbol Time = 1.35999989509583

8 char/10k test set:
Test String Time = 3.20300006866455
Test Symbol Time = 2.67199993133545

32 char/10k test set:
Test String Time = 3.15700006484985
Test Symbol Time = 2.73399996757507

256 char/10k test set:
Test String Time = 3.28099989891052
Test Symbol Time = 2.6560001373291

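Yes, comparing 256 character symbols took LESS time than comparing 32 character symbols, and the string times barely move between 8 and 256 characters. Clearly there is some form of indexing or early exit in play here: random strings almost always differ within the first character or two, so a comparison can stop long before it reaches the end of the string.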
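
One likely contributor to the symbol advantage (this is about how Ruby works in general, not something I measured in the test) is that Ruby interns symbols: every occurrence of the same symbol name refers to a single object, so comparing two symbols comes down to an identity check. Two strings with the same contents are separate objects and have to be compared by content. A small sketch, with variable names picked just for illustration:

```ruby
# Two strings with identical contents. dup guarantees distinct objects
# even if string literals are frozen and shared.
a = "hello".dup
b = "hello".dup

equal_string_value = (a == b)        # content comparison
same_string_object = a.equal?(b)     # identity comparison

# Converting both to symbols yields the very same object.
same_symbol_object = a.to_sym.equal?(b.to_sym)

puts "strings equal by value:  #{equal_string_value}"   # true
puts "strings are same object: #{same_string_object}"   # false
puts "symbols are same object: #{same_symbol_object}"   # true
```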

November 21, 2006

Webscale Papers

I had a nice chat with an old friend tonight about Snapvine & recent stuff he's come across at MSR. This inspired me to go look up a few really interesting papers from MSR & Google Labs. If you haven't had a chance to look at the papers on Google Labs, check them out: http://labs.google.com/papers/

Of particular interest are GFS, Chubby & BigTable.

Also worth a read: Paxos, a consensus algorithm that can be used for electing a master in a fault-tolerant system
http://research.microsoft.com/users/lamport/pubs/paxos-simple.pdf

This is all quite far beyond anything we need to implement at Snapvine, but it's interesting reading about how you go from 100s of servers to 10,000s of servers.