TL;DR: I’m learning Rust, building a sharding StatsD proxy as a learning project, and recording YouTube videos of the process. See Relaxing With Code.
I’ve always loved low-level programming. The first major project I worked on as a young programmer was a cross-platform (Linux and Win32) peer-to-peer chat app written in C. It was so incredibly rewarding! Over the years I’ve found employment as a webapp developer and, currently, as a DevOps/Infrastructure Engineer, but coding “closer to the machine” has remained dear to me. I’ve had the chance to write a few client/server Node.js and Python 3 apps for personal projects over the years. That has kept my skills sharp, but I want more!
Given the above, it shouldn’t come as a surprise that Rust has been on my radar. I also have books on Embedded Linux and Linux Kernel Development sitting on my shelf. Those too are on my radar! For some day, hopefully in the near future. But I digress … Rust is a fascinating language, a joy to learn, and a common choice for low-level and embedded programming these days. Sold!
The best way for me to learn a new language or tool is to use it for a non-trivial project. Especially if the project solves a problem I’ve seen in my day-to-day. The first project idea I had – and a perfect fit for Rust – was a sharding StatsD proxy. Learning by building. What’s StatsD? Give me a couple paragraphs and then I’ll come back to Rust …
StatsD is a tool created by Etsy. Say you’ve created a relatively high-traffic webapp and you want more visibility into how it operates for debugging purposes. You should likely emit metrics that can be graphed or alerted upon. Things like “user logins per minute” or “latency between microservice1 and microservice2” are examples of useful metrics to have. You can relatively easily add metric-emitting logic to your webapp code by using a StatsD library for your language. By default the metrics are sent as UDP packets to a central StatsD server, where they’re aggregated every 10 seconds. If you’re emitting a metric every time a user logs in, the central server sums those logins to arrive at the final login count for a given 10-second period. Ten seconds is the default “collection period” for StatsD, but that can be changed. After aggregation, the data can be flushed to a time-series database like InfluxDB. To visualize and alert upon those metrics you might use something like Grafana.
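The wire format itself is tiny: `name:value|type`, one metric per datagram. As a minimal sketch of what a client library does under the hood (the `counter_line` helper name is my own, and I’m assuming a StatsD server on localhost at the default UDP port 8125):

```rust
use std::net::UdpSocket;

/// Format a StatsD counter metric line. The wire format is
/// `<name>:<value>|<type>`, e.g. `user.logins:1|c` for a counter.
fn counter_line(name: &str, value: i64) -> String {
    format!("{}:{}|c", name, value)
}

fn main() -> std::io::Result<()> {
    let line = counter_line("user.logins", 1);
    println!("{}", line); // user.logins:1|c

    // Fire-and-forget over UDP to a (hypothetical) local StatsD server.
    // Binding to port 0 lets the OS pick an ephemeral source port.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    let _ = socket.send_to(line.as_bytes(), "127.0.0.1:8125");
    Ok(())
}
```

Fire-and-forget UDP is the point: the webapp never blocks waiting for the metrics backend.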
Now consider what happens when you have 100,000 metric messages going to a central server every minute. If the StatsD server takes too long to aggregate a slew of messages, you’ll have missing or inaccurate metrics data in your time-series database. Hence the need for a sharding proxy. It would serve as a central collection point in front of two or more StatsD servers, accepting messages and forwarding each one to a backend server. Sharding is necessary because of the way aggregation works: all of the “a user has logged in” messages have to reach the same server for the aggregation to be correct.
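That “same metric, same server” requirement boils down to hashing the metric name. A minimal sketch (the `shard_for` name is my own; note that `DefaultHasher` isn’t guaranteed stable across Rust releases, so a real proxy would likely want a fixed algorithm like FNV):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick a backend index for a metric by hashing its name.
/// Hashing only the name (never the value) guarantees every
/// "user.logins" message lands on the same backend, so that
/// backend's 10-second aggregation is complete and correct.
fn shard_for(metric_name: &str, num_backends: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    metric_name.hash(&mut hasher);
    hasher.finish() % num_backends
}

fn main() {
    // The same name always maps to the same shard...
    assert_eq!(shard_for("user.logins", 4), shard_for("user.logins", 4));
    // ...while different names may spread across the backends.
    println!("user.logins -> shard {}", shard_for("user.logins", 4));
    println!("request.latency -> shard {}", shard_for("request.latency", 4));
}
```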
Ok, back to Rust and my learning project. Some of the concepts that will be involved in creating a sharding StatsD proxy are:
- UDP network programming
- Regular Expressions for parsing StatsD messages
- Sharding function implementation, which shards messages into N queues
- Concurrently sending queued messages to the appropriate backing StatsD server as quickly as possible
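To give a feel for how the last two bullets might fit together, here’s a toy sketch with one `std::sync::mpsc` channel per shard and a worker thread per backend. Printing stands in for the actual UDP forwarding, and the `route` function is a throwaway placeholder for a real sharding function:

```rust
use std::sync::mpsc;
use std::thread;

/// Placeholder shard function: route by message length. A real proxy
/// would hash the parsed metric name instead.
fn route(msg: &str, num_shards: usize) -> usize {
    msg.len() % num_shards
}

fn main() {
    const NUM_SHARDS: usize = 2;
    let mut senders = Vec::new();
    let mut workers = Vec::new();

    // One queue (channel) per backend shard; each worker thread drains
    // its queue and would forward to its StatsD server.
    for shard in 0..NUM_SHARDS {
        let (tx, rx) = mpsc::channel::<String>();
        senders.push(tx);
        workers.push(thread::spawn(move || {
            for msg in rx {
                println!("shard {} -> backend: {}", shard, msg);
            }
        }));
    }

    for msg in ["user.logins:1|c", "latency:20|ms"] {
        let shard = route(msg, NUM_SHARDS);
        senders[shard].send(msg.to_string()).unwrap();
    }

    drop(senders); // close the channels so the workers exit
    for w in workers {
        w.join().unwrap();
    }
}
```

Whether the real project ends up on threads and channels or on an async runtime is exactly the kind of design question I’m hoping to explore on camera.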
Thanks for reading!