My first motivation for investigating redis was due to an optimization challenge at work. Our network of websites covers track and field and cross-country. We have a database of 13 million performance records, and naturally our site provides visitors with the ability to view up-to-date rankings. Currently we use Sphinx for this, but Sphinx is geared for full-text search, and was quite slow (we’ve since optimized sphinx further, which I hope to write about later).
Had the energy and focus to make more progress on my ideas for redis last night. See my previous post here if you don’t remember what those ideas were.
The only things left to do:
Fix freeSetObject() such that it frees intset and/or hash table memory that may have been allocated Fix code in rdb.c that’s responsible for loading the data from disk and writing to it when redis is shut down.
Had an idea for reducing the memory footprint of redis sets. A redis set may be encoded as an intset (if the set contains nothing but integers) or a hash table. But if a set contains integers and other values, you lose out on the benefit of intsets (speed and memory efficiency) and in some cases you have to wait while redis converts your intset to a hash table. This may not be a big penalty, but it all depends on the order in which you add items.
Pieter responded to my thread. He’s worked on compressed sorted sets recently, and while doing that he fixed the issue I was having with intsets. His code should be merged into 2.2 in a few weeks after enough people have tested. Apparently I’m the only one to report the problem with intsets.
His branch’s code is similar to my approach, in that he also extended zsetopsrc with type and encoding variables.
At Thursday night’s PDX Hackathon I worked more on the redis changes I wanted to make. I mostly completed them, and announced my progress on the mailing list that night. Seems no one was aware that zinterstore was broken for a set encoded as an “intset” (a more efficient way to store sets of integers). So yay, I found a bug! But now another programmer has alerted me he’s made progress in another area of the code that should be relevant to me, so I’ve got to check that out.
Spent another hour reading the redis source today, and I’m beginning to feel more enlightened. I’m thankful for the comment at the top of t_zset.c that describes how sorted sets are implemented with a hash and a skiplist. Weighing that knowledge against the t_set.c code for un-sorted sets, it’s now much clearer how the two types are alike and how they differ.
This is all in an effort to teach redis to zinterstore sets against sorted sets.
Yesterday, during lunch, I read through these blog posts that walk through the redis source code:
I won’t go into the details now, but while investigating redis for work I saw a use case where being able to intersect sorted and unsorted sets would be nice. Un-sorted elements would have a 0 score (even better if it were customizable). Redis doesn’t currently support this, so I hope to contribute by adding this feature.