Memcache++ Updates (New Releases!)
The past week has been a big open source week for me; it started over the weekend and spilled into the week. I've been updating and improving the implementation of the Memcache++ client, whose first version (0.9) I released in February 2008. A full year and some months later, I got an email from a former client asking whether I could improve the client to handle newer memcache protocol commands that I didn't need at first, along with some older ones that he does need. Needless to say, I went at it, and the result is a cleaner implementation that also supports more commands than it used to.
Before I dive into the details of the implementation, let me first give you some history on the memcache protocol, why it's important, and the project that led to the development (and release) of the memcache++ client library.
History
So back in February 2007, I started work with one of the largest social networking sites in the world. While there, I found myself in the middle of a project meant to increase the performance and scalability of an internal service that was crucial to the proper functioning of the site. One of the approaches taken was to leverage memcached to store frequently accessed, seldom-changing data.
Without going through the whole project's history: we first used libmemcache, which was the easiest thing to do at the time. It worked for a while, until we saw that the library made too many connections to the memcached server, which degraded performance over time. We might have addressed this by tuning the servers our software (written in C++) ran on, but that would have meant operational overhead whenever we added machines or hit a spike that caused the number of connections to balloon to an insane number. While it worked, it was not the scalable solution we were looking for.
For a while we tried to stick with it, but performance and stability tests kept showing that over time the application became unstable and its performance numbers suffered, because it opened and closed lots of connections to the memcached servers. This meant we had to either abandon the memcached idea altogether or write our own library for accessing the memcached servers from scratch. So we went with writing our own library, and we did it from the ground up in C++, forgoing the C implementation.
After a month or so of heavy development and integration work, we were finally able to meet the performance requirements of the internal service. Not only that, we were also able to make the library a separate component that did just what we needed -- access memcached, pool a set of connections to many servers, and re-use long-lived connections instead of opening and closing new sockets every time we needed to issue requests. The performance numbers were impressive, and we were happy that we got to where we needed to be.
Open Source
Then the idea came to mind: why not open source this library that we had worked so hard to implement internally, and that got us to the point where we wanted to be (in a position to succeed)? After all, memcached was instrumental to the success of other sites and was open source, so why not the client library? Besides, it's one way for us to give back to the open source community whose software a lot of what we've written runs on.
So after a short discussion with the powers that be, the library was open sourced in January 2008 -- the project was registered on Sourceforge and the first release was uploaded in February 2008. This was met with some buzz, but apparently there weren't a lot of C++ projects out in the open that required a memcache client. It seems most of the web projects that needed memcache were written in languages other than C++ -- hardly disheartening, because just a few days later some hits came through, and there were some users out there who were thinking about using the client in their C++ applications too.
Why Memcache?
So is memcached just for the web? I'd say hardly so. Let me give you a few examples where you'd want to be able to use memcache in your non-web (C++) applications:
- Multiple Reads, Minimal Writes -- if you have an application that does a huge number of selects from a database from multiple sources (maybe different threads or different machines) and the data being queried rarely changes or is not time-critical, then you can safely cache it either in your application or in an external cache provider. This external cache provider can be memcached, and since you can store practically anything in memcache, you can definitely leverage this to relieve your database of the strain of multiple readers and infrequent writers.
- Shared Data -- Say you have an application which appends information to a queue or a file and you'd like to share this across many instances of your application. You might think about putting this data straight into a database and then querying the database anytime you need it. The problem here is that the database becomes your bottleneck, and it's pretty hard to fix that bottleneck if you rely on it for your shared data needs. Enter memcached, which supports "append" and "prepend" operations -- so instead of just writing to the DB, you can also append or prepend the data to a key in memcached and query that key across instances of your application.
- Atomic Counters -- Sometimes you might want to keep just a count of the number of times something is done (say the number of times a particular piece of data is accessed or modified). Typically you'll use a DB for this too so that you can query the information anytime you need it -- the problem is that doing COUNT() on primary keys in a DB can be resource-intensive, especially if you have a large table. Enter memcached's 'incr' and 'decr' operations: you can set a key to '0' and merely increment that key whenever a piece of data is accessed or modified. You can then query this information as often as you like without putting that strain on your database.
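To make the 'append' and 'incr' ideas above a little more concrete, here is a minimal sketch of what those requests look like on the wire in memcached's text protocol. The function names `format_storage_request` and `format_counter_request` are made up for this illustration and are not part of the Memcache++ API; the sketch only builds the request strings and does not talk to a server.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a Memcache++ function): builds a text-protocol
// storage request for "set", "add", "replace", "append", or "prepend":
//   <verb> <key> <flags> <exptime> <bytes>\r\n<data>\r\n
std::string format_storage_request(const std::string& verb,
                                   const std::string& key,
                                   std::uint32_t flags,
                                   long exptime,
                                   const std::string& data) {
    return verb + " " + key + " " + std::to_string(flags) + " " +
           std::to_string(exptime) + " " + std::to_string(data.size()) +
           "\r\n" + data + "\r\n";
}

// Hypothetical helper: builds an "incr" or "decr" request:
//   <verb> <key> <delta>\r\n
std::string format_counter_request(const std::string& verb,
                                   const std::string& key,
                                   std::uint64_t delta) {
    return verb + " " + key + " " + std::to_string(delta) + "\r\n";
}
```

A counter would first be seeded with something like `format_storage_request("set", "hits", 0, 0, "0")` and then bumped with `format_counter_request("incr", "hits", 1)`; the seeding matters because 'incr' on a key that does not exist is answered with NOT_FOUND rather than creating the key.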
There are all sorts of places in Enterprise or High Performance applications where you may opt to cache data, and usually this is where memcached would be a good fit. Of course, you have to keep in mind that the data is just cache data -- data you can afford to lose. If you do, it fits perfectly into a solution where you access the cache first and, if the data is not there, go to the canonical source (usually a database server).
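The "cache first, then the canonical source" flow can be sketched roughly as follows. `InMemoryCache` is a hypothetical stand-in for a real cache client (it is not the Memcache++ interface), and `load_from_source` stands for whatever query you would run against your database.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cache client; a real one would talk to
// memcached over the network instead of using a local map.
class InMemoryCache {
public:
    // Returns true and fills `out` on a hit; returns false on a miss.
    bool get(const std::string& key, std::string& out) const {
        auto it = entries_.find(key);
        if (it == entries_.end()) return false;
        out = it->second;
        return true;
    }
    void set(const std::string& key, const std::string& value) {
        entries_[key] = value;
    }
    // Simulates the cache losing an entry (eviction, restart, ...).
    void evict(const std::string& key) { entries_.erase(key); }

private:
    std::unordered_map<std::string, std::string> entries_;
};

// Try the cache first; on a miss, load from the canonical source and
// store the result so the next reader gets a hit.
std::string get_or_load(
    InMemoryCache& cache, const std::string& key,
    const std::function<std::string(const std::string&)>& load_from_source) {
    std::string value;
    if (cache.get(key, value)) return value;
    value = load_from_source(key);
    cache.set(key, value);
    return value;
}
```

Because a lost entry only costs one extra trip to the canonical source, nothing breaks when the cache forgets something, which is the property that makes "data you can afford to lose" a workable rule.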
The Client
Memcached is the server which maintains a hash map in memory -- mapping keys to values. Values can be anything, and keys can be any strings that don't have control characters or spaces in them. A key can be a UUID, an aggregated string, a random value, an MD5 hash, or just any string you want to associate data with. Although there's a lot of cool technology in the memcached server, the client we've released (and which I now maintain) leverages this server-side technology by offering a fast and stable interface for setting and retrieving the data.
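As a small illustration of the key rules above, here is a sketch of a check you could run before sending a key to the server. `is_valid_key` and `kMaxKeyLength` are names made up for this example, not Memcache++ functions; the 250-byte limit comes from memcached's text protocol rather than from this post.

```cpp
#include <cstddef>
#include <string>

// Maximum key length in bytes accepted by memcached's text protocol.
constexpr std::size_t kMaxKeyLength = 250;

// Hypothetical helper: returns true if `key` is non-empty, at most 250
// bytes long, and free of spaces and control characters.
bool is_valid_key(const std::string& key) {
    if (key.empty() || key.size() > kMaxKeyLength) return false;
    for (unsigned char c : key) {
        // Reject control characters (0x00-0x1F), space (0x20), and DEL (0x7F).
        if (c <= 0x20 || c == 0x7F) return false;
    }
    return true;
}
```

Keys that might break these rules (long URLs, user-supplied strings) are a good reason to hash them first, for example with the MD5 approach mentioned above.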
The Memcache++ client offers the following features:
- Support for memcached's 'get', 'set', 'add', 'replace', 'append', and 'prepend' commands.
- Support for redundancy pools, where you can associate more than one memcached server with a pool identifier. When getting data from such a pool, the client tries all the members of the pool to retrieve the data; when setting data on it, all 'set', 'add', 'replace', 'append', and 'prepend' operations are mirrored across the members of the pool for redundancy.
- Uses persistent connections per memcache client handle: using one handle per thread allows you to maximize multi-core machines without having to synchronize operations on a single handle.
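The redundancy pool behaviour described in the list above can be sketched roughly like this. `FakeServer` and `RedundancyPool` are hypothetical names for this illustration, backed by in-memory maps; the actual Memcache++ interface and its internals may look quite different.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for one memcached server connection.
class FakeServer {
public:
    void set(const std::string& key, const std::string& value) {
        data_[key] = value;
    }
    // Returns true and fills `out` on a hit; returns false on a miss.
    bool get(const std::string& key, std::string& out) const {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }
    // Simulates the server losing all of its data.
    void flush() { data_.clear(); }

private:
    std::unordered_map<std::string, std::string> data_;
};

// Hypothetical pool: writes are mirrored to every member, and reads try
// each member in turn until one of them has the key.
class RedundancyPool {
public:
    explicit RedundancyPool(std::vector<FakeServer*> members)
        : members_(std::move(members)) {}

    void set(const std::string& key, const std::string& value) {
        for (FakeServer* server : members_) server->set(key, value);
    }

    bool get(const std::string& key, std::string& out) const {
        for (const FakeServer* server : members_) {
            if (server->get(key, out)) return true;
        }
        return false;
    }

private:
    std::vector<FakeServer*> members_;  // not owned by the pool
};
```

In this sketch a read still succeeds as long as at least one member holds the key. Following the one-handle-per-thread advice above, each thread would construct and use its own pool object rather than sharing one behind a lock.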
The latest release (which came out last Wednesday, August 26, 2009) is version 0.11.1, which supports all the features mentioned above. In the following days I may release version 0.11.2, which adds a little more polish to the implementation.
These releases are just gearing up for release 1.0, which should support the "check and set" (cas) operation, multiple get support, and proper 'incr' and 'decr' support, among other performance and implementation enhancements.
Try it Out!
So if you have a C++ application that you think would greatly benefit from a separate cache provider like memcached, try out the library and let me know what you think! Hopefully it helps you achieve the same performance and stability it gave us back then -- and still does today.
You can get the source code through two means:
- Download the latest from Sourceforge.net.
- Check out the Git repository: git://memcachepp.git.sourceforge.net/gitroot/memcachepp/memcachepp



3 comments:
can u see any reason on coding in java for high performance servers and why?
Hi CodeMonkey -- I personally don't use Java for the work that I do. I would rather not comment on whether I see any reason for coding in Java for high performance servers because the choice of what programming language to use depends on the many different things you consider when doing a software project.
For high performance servers I would like to think languages that compile to machine code (like C, C++, Haskell, D, etc.) may be better choices if you're concerned about just performance. Software is rarely about just performance, though; it is usually also about maintainability and what your team members know how to use. If you were starting from scratch and any language would do, I'd say a safe bet would be to prototype first in a high level programming language like Python, then see if Erlang, C++, or other languages will give you better performance.