Hacker News | bmatheny's comments

This screenshot was taken on my dev box, which has plenty of other activity happening. On a production memcache box pushing 1Gb/s, I see ~2% packet loss.


Since I wrote memkeys, maybe I can clarify a few things.

First, dropping packets matters. If you see only 30-40% of your traffic, you can't guarantee that you have enough data to know what your hot keys actually are. This is especially true when you are interested in (for instance) sorting keys by bandwidth usage. You might have a key that gets half as many hits as the hottest key but is 4x the size and is saturating the network link. In this case, depending on how much data you're able to capture, you may or may not even see this data point. Also, the follow-up comment from corresation about patching memcache doesn't make sense to me.
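To make the bandwidth-vs-hits point concrete, here's a toy sketch (not memkeys' actual code, which is C++; the key names and sizes are made up) of ranking keys by bytes moved rather than by hit count:

```python
from collections import defaultdict

# Toy per-key stats, as a sniffer might aggregate them from cache responses.
stats = defaultdict(lambda: {"hits": 0, "bytes": 0})

def record(key, value_size):
    stats[key]["hits"] += 1
    stats[key]["bytes"] += value_size

# Simulated traffic: "big" gets half the hits of "hot" but is 4x the size.
for _ in range(1000):
    record("hot", 100)
for _ in range(500):
    record("big", 400)

by_hits = sorted(stats, key=lambda k: stats[k]["hits"], reverse=True)
by_bw = sorted(stats, key=lambda k: stats[k]["bytes"], reverse=True)
print(by_hits[0], by_bw[0])  # -> hot big
```

Ranked by hits, "hot" wins; ranked by bandwidth, "big" wins (200KB moved vs 100KB). That's exactly the kind of key that heavy packet loss can hide from you.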

Second, this was no 'jab' at Etsy. I know the Etsy guys incredibly well and we're all friends. We've collaborated on work on more than one occasion. The jab comment seems like unnecessary speculation. The comment about seeing how memkeys affects performance is of course spot on. In this case, one thread will peg a CPU core for packet capture, but otherwise it is not CPU intensive. Since it uses packet capture, memkeys doesn't actually interact with memcached directly, so the impact should be minimal. We used it at Tumblr.
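The thread split described above can be sketched roughly like this (a Python toy, with simulated packets standing in for libpcap; memkeys itself is C++): one dedicated capture thread pushes packets onto a queue, and aggregation happens off the hot path.

```python
import queue
import threading

packets = queue.Queue()
counts = {}

def capture():
    # Stand-in for a pcap capture loop: in real life this thread is the
    # one that pegs a core pulling packets off the wire.
    for i in range(1000):
        packets.put(f"key-{i % 10}")
    packets.put(None)  # sentinel: capture finished

def aggregate():
    # Lightweight stats thread: drains the queue and tallies key hits.
    while (key := packets.get()) is not None:
        counts[key] = counts.get(key, 0) + 1

t1 = threading.Thread(target=capture)
t2 = threading.Thread(target=aggregate)
t1.start(); t2.start()
t1.join(); t2.join()
```

The point of the split is that only the capture side has to keep up with the wire; everything else can lag behind the queue without causing drops at the kernel level.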

Third, fixing the packet loss issue in mctop wasn't feasible as the problem is with ruby-pcap not with mctop. Additionally, while Tumblr has plenty of ruby code in production we don't generally use it for building 'real-time' applications. There are better languages for the job.

I built memkeys because it solved a problem we had, and was fun. That's it.


I'm just going to point out that this site is hosted by wisegeek.com, a site that was hit hard by the Panda update (see http://www.quantcast.com/wisegeek.com). Although the data is accurate, I wouldn't feel sorry for most of these sites.

Full disclosure: I used to work for ChaCha. I am no longer associated with the company, but in my time there I know a lot of time/effort/money went into producing original content.

An aside: Pandalized was using a domain proxy, so I connected to port 80 via telnet, which gave me back the following banner (which gave up the hostname): Apache/2.2.8 (Debian) DAV/2 SVN/1.4.2 PHP/5.2.5-3+lenny2 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g mod_perl/2.0.2 Perl/v5.8.8 Server at strongwiki.wisegeek.com Port 80


Interesting aside. I didn't even know domain proxies existed, although I'm sure that's common knowledge, and a fairly obvious thing to create.

Can you walk me through the way you discovered the hostname? If I `telnet pandalized.com 80`, I don't get anything interesting back.


Ah, I got it: telnet in, then send some garbage, causing the server to respond with an error. The error comes from the main Apache instance rather than the individual virtualhost, and it carries the hostname you mentioned. Clever.
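A minimal sketch of the trick, assuming a small hypothetical helper for pulling the Server header out of a raw response (the sample banner below is abbreviated from the one quoted upthread):

```python
def server_header(raw: bytes) -> str:
    """Pull the Server: header out of a raw HTTP response."""
    for line in raw.split(b"\r\n"):
        if line.lower().startswith(b"server:"):
            return line.split(b":", 1)[1].strip().decode()
    return ""

# In practice you'd obtain `raw` by opening a TCP connection and sending
# a deliberately malformed request, e.g.:
#   s = socket.create_connection(("pandalized.com", 80))
#   s.sendall(b"garbage\r\n\r\n")
#   raw = s.recv(4096)
# A bad request is answered by the default vhost, whose error page and
# headers can reveal the canonical hostname behind a domain proxy.

sample = (b"HTTP/1.1 400 Bad Request\r\n"
          b"Server: Apache/2.2.8 (Debian) DAV/2 PHP/5.2.5-3+lenny2\r\n"
          b"Content-Type: text/html\r\n\r\n")
print(server_header(sample))  # -> Apache/2.2.8 (Debian) DAV/2 PHP/5.2.5-3+lenny2
```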


I worked on Staircar so maybe I can answer a couple of questions.

Machine failures: we'll likely address this in a future post, but basically each redis instance can have a slave which can be failed over to.

Performance: Staircar isn't nearly as fast as redis, but the primary project goal was making the redis infrastructure opaque to any clients. The secondary goal was performance, which is still quite good.

Early Optimization: In a large infrastructure (thousands of machines, multiple data centers, etc) you have locality issues, slow clients, machine failures and other operational considerations. Again, redis performance is fine and we weren't concerned with that. We were interested in creating a high performance proxy to a pool of redis instances. Also, with Staircar we can make online changes to the redis pool without needing to notify clients of those changes.
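For illustration only (Staircar's actual routing scheme isn't described here), a proxy in front of a redis pool often maps keys onto shards with something like a consistent-hash ring, so that online pool changes remap only a fraction of keys:

```python
import bisect
import hashlib

class Ring:
    """Minimal consistent-hash ring: a sketch of how a proxy might route
    keys to a redis pool. Node names and replica count are made up."""

    def __init__(self, nodes, replicas=64):
        self._keys = []
        self._map = {}
        for node in nodes:
            # Each node gets several virtual points on the ring for balance.
            for i in range(replicas):
                h = self._hash(f"{node}:{i}")
                bisect.insort(self._keys, h)
                self._map[h] = node

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._map[self._keys[idx]]

ring = Ring(["redis-1", "redis-2", "redis-3"])
shard = ring.node_for("user:42")
```

Because clients only ever talk to the proxy, adding a "redis-4" to the ring is an online change the clients never have to hear about.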


Is the code open source, or do you have plans to open source it? I am actually pretty interested in this and would love to be able to read the code for it. -Steve


Hey Steve, we do have plans to open source the code in two parts. The first part to likely be open sourced will be the high performance logging library that was written for it. After that, the proxy component. Keep an eye on the Tumblr engineering blog for a notification.


Thanks


I find it interesting that there are two distinct bands in the response time plots. Any ideas what the two represent?


The upper band essentially represents new users. At the current rate of growth, we push into a new redis shard every 2-3 days. When we first start writing/reading against that shard, it appears to take a bit of time to warm up.


Great documentation, like the previous commenter mentioned. Question: is this just a Drupal installation, or something else?


Yeah, the documentation site is a Drupal install, with a few modules, some color changes and a new logo. The main Doculicious.com is all custom, but for the doco I didn't want to have to worry about building the site ... just wanted to get some content up there.

Thank you both for your comments too, I appreciate it.


Remember the underpants gnomes from South Park? Their plot was simple. 1. Steal underpants, 2. ???, 3. Profit. Most YC people have something similar going on: 1. Come up with great idea, 2. Get YC to fund it, 3. ???, 4. Profit, 5. Get a Girlfriend. You've already jumped to step 5, skip 1-4.


Found the girl early on in life. Way before college. lol. Now stuck with the girl. Love her to bits though so can't let go and move without her.


If you can afford it, UltraDNS (Neustar) is the best. This is the service that Amazon uses, as well as a number of other very large sites. They have a 100% uptime SLA and a latency SLA as well. We haven't had an outage with them in almost two years. They also have an API, which is nice. Oh, and they can do geo-distribution of requests, as well as a variety of other more advanced functions.


Last time I checked, they charged per 1,000 DNS requests?


Citation?


The googlers I've talked to have said they use gmail and google docs internally.


I can also confirm that.


Google uses pretty much all of their own products internally, notably the corporate version of Gmail and Google Calendar.

Here's a cite: http://blogoscoped.com/archive/2008-03-12-n39.html


Oracle is just 'allowing' you to use the product on EC2. You don't get utility licensing.

See http://blog.mobocracy.net/2008/09/oracle-does-not-enter-aws-...


We use The Planet (http://www.theplanet.com) along with EC2 from AWS. I've used The Planet for about 7 years now, for both business and personal use. Good prices, decent support and a top notch network.


Did the explosion change your opinion of The Planet?

