Mcrouter, the secret sauce of Facebook’s scalability

There’s a piece of technology that’s not discussed very often outside of platform engineering for the largest web apps, and I wanted to write this post to share it more broadly and give it the credit it deserves.

That piece of technology is called mcrouter. It was announced nearly 10 years before the moment I’m writing this. The original blog post gives a great overview of the reasoning behind the project.

Like many other projects from Facebook/Meta, it is open source, and I was fortunate to use it at Shopify, where I became a big fan of it.

Why and what

Here’s a list of tricks that lots of us would do with traditional SQL databases:

  • Keeping a primary database and a replica (or a “follower”) for resiliency and backups
  • Utilizing a database replica to serve extra read throughput
  • Keeping a replica in another region that is closer to customers to serve traffic from that region
  • Using a replica to shadow some traffic to run an experiment or validate new features

But we never do these things with our caches!

Mcrouter brings all of that and more to your memcache fleet.

Why is that important?

Imagine that you’d like to give more cache capacity to some tenants in your SaaS app. Normally, that would require provisioning an extra caching cluster and making various changes to the application logic to use a different cluster depending on the tenant.

Mcrouter can keep caches in the passive region warm for you.

Mcrouter can make it so that all cache keys starting with the prefix tenant1: go to one cluster, keys starting with tenant2: go to another, and the rest go to the default cluster. What’s more, those clusters can have different configurations and sizes.

If you’re running a cache cluster with N partitions and using consistent hashing on the key to spread traffic across partitions, you might find yourself in a situation where one partition gets hotter than the others.

mcrouter can make it so that all writes are replicated to all partitions, and all reads are evenly balanced across them.
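As a sketch of how that looks in mcrouter’s config language (the pool and host names here are made up for illustration), you can select a route handle per operation: fan writes out to every server with AllSyncRoute, and balance reads with RandomRoute:

```json
{
  "pools": {
    "hot-pool": {
      "servers": ["cache1:11211", "cache2:11211", "cache3:11211"]
    }
  },
  "route": {
    "type": "OperationSelectorRoute",
    "default_policy": "PoolRoute|hot-pool",
    "operation_policies": {
      "set": { "type": "AllSyncRoute", "children": ["Pool|hot-pool"] },
      "get": { "type": "RandomRoute", "children": ["Pool|hot-pool"] }
    }
  }
}
```

AllSyncRoute sends the request to all children and waits for all of them to reply, while RandomRoute picks a single child per request.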

You can also think of mcrouter as nginx, but for your cache.

Getting started

While mcrouter may not be the friendliest tool to get started with and may not have a huge community, it’s at the core of Facebook/Meta’s scalability and is well maintained by a team there.

You might have to build your own Docker image with it and spend some time in its wiki to craft a config – but that’s well worth the effort for the scalability it will unlock for you.

Playground

I’ve had to debug a specific setup of mcrouter recently, and I thought it might be worth sharing my local setup to make debugging easier for my readers – and for myself, if I happen to need it again a year from now.

To start, let’s assemble a docker-compose.yml with a small memcache cluster and mcrouter running.

version: '3.8'

services:
  memcache1:
    image: memcached:1.6.26
    command: memcached -vv
    ports:
      - "11211:11211"

  memcache2:
    image: memcached:1.6.26
    command: memcached -vv
    ports:
      - "11212:11211"

  memcache3:
    image: memcached:1.6.26
    command: memcached -vv
    ports:
      - "11213:11211"

  mcrouter:
    image: <your container with mcrouter>
    ports:
      - "11210:11211"
    volumes:
      - "./mcrouter_config:/usr/local/mcrouter/config"
    command:
      - "/usr/local/bin/mcrouter"
      - "--config-file=/usr/local/mcrouter/config/mcrouter_config.json"
      - "-p"
      - "11211"
      - "--num-proxies=1"
      - "--stats-logging-interval=0"
      - "--asynclog-disable"
      - --probe-timeout-initial=1000
      - --probe-timeout-max=15000
      - --reset-inactive-connection-interval=900000
      - --disable-miss-on-get-errors
      - --keepalive-count=6
      - --keepalive-interval=4
      - --keepalive-idle=30
      - --timeouts-until-tko=6
    depends_on:
      - memcache1
      - memcache2
      - memcache3

You may have noticed that I’m mounting the config as a volume. Here’s the content of that:

# mcrouter_config.json
{
  "pools": {
    "generic-pool": {
      "servers": [
        "memcache3:11211"
      ],
      "server_timeout": 150,
      "keep_routing_prefix": true
    },

    "larger-pool": {
      "servers": [
        "memcache1:11211",
        "memcache2:11211"
      ]
    }
  },
  "route": {
    "type": "PrefixSelectorRoute",
    "policies": {
      "tenants:1:": "PoolRoute|larger-pool",
      "tenants:": "PoolRoute|generic-pool"
    },
    "wildcard": "PoolRoute|generic-pool"
  }
}

This config defines two pools, generic-pool and larger-pool, and makes it so that all keys starting with tenants:1: go to larger-pool, while all other keys go to generic-pool.
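To build intuition for how PrefixSelectorRoute matches overlapping prefixes like tenants: and tenants:1:, here’s a toy model in Python – not mcrouter’s actual implementation, just the matching logic: the longest matching prefix wins, and unmatched keys fall through to the wildcard policy.

```python
def route(key, policies, wildcard):
    """Toy prefix routing: the longest matching prefix wins,
    otherwise fall back to the wildcard policy."""
    best = None
    for prefix in policies:
        if key.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return policies[best] if best is not None else wildcard

# Mirrors the policies from the config above
policies = {
    "tenants:1:": "larger-pool",
    "tenants:": "generic-pool",
}

print(route("tenants:1:products", policies, "generic-pool"))  # larger-pool
print(route("tenants:2:products", policies, "generic-pool"))  # generic-pool
print(route("sessions:abc", policies, "generic-pool"))        # generic-pool
```

Note that tenants:1:products matches both prefixes, and the more specific tenants:1: takes priority.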

mcrouter provides some useful meta commands, like get __mcrouter__.route_handles(<op>,<key>), that let you look up how a specific cache key would be routed. You can think of it as EXPLAIN in MySQL. You can read more about meta commands in the wiki.

Let’s get to a more interesting config and build a replicated cache setup.

{
  "pools": {
    "replicated-pool": {
      "servers": [
        "memcache1:11211",
        "memcache2:11211"
      ]
    },

    "generic": {
      "servers": [
        "memcache3:11211"
      ],
      "server_timeout": 150,
      "keep_routing_prefix": true
    }
  },
  "route": {
    "type": "PrefixSelectorRoute",
    "policies": {
      "generic:": "PoolRoute|generic",
      "replicated:": {
        "type": "OperationSelectorRoute",
        "default_policy": "PoolRoute|replicated-pool",
        "operation_policies": {
          "add": {
            "type": "AllInitialRoute",
            "children": [
              "Pool|replicated-pool"
            ]
          },
          "decr": {
            "type": "AllInitialRoute",
            "children": [
              "Pool|replicated-pool"
            ]
          },
          "delete": {
            "type": "AllInitialRoute",
            "children": [
              "Pool|replicated-pool"
            ]
          },
          "incr": {
            "type": "AllInitialRoute",
            "children": [
              "Pool|replicated-pool"
            ]
          },
          "get": {
            "type": "RandomRoute",
            "children": [
              "Pool|replicated-pool"
            ]
          },
          "gets": {
            "type": "RandomRoute",
            "children": [
              "Pool|replicated-pool"
            ]
          },
          "set": {
            "type": "AllInitialRoute",
            "children": [
              "Pool|replicated-pool"
            ]
          }
        }
      }
    },
    "wildcard": "PoolRoute|generic"
  }
}

With this config, all writes to keys that start with replicated: will be replicated to two memcache servers, memcache1:11211 and memcache2:11211. And all reads will be randomly distributed between the two, giving you extra read throughput for especially hot cache keys.

And operations on keys that start with generic: will use a single server. It becomes super convenient to control this from the app just by picking key prefixes.
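As a toy illustration of what the replicated: policy achieves (plain Python dicts standing in for the memcached servers – not real mcrouter code), the behavior boils down to: fan every write out to all replicas, and serve every read from one replica picked at random.

```python
import random

replicas = [{}, {}]  # two in-memory stand-ins for memcache1 and memcache2

def replicated_set(key, value):
    # AllInitialRoute-style behavior: the write goes to every replica
    for replica in replicas:
        replica[key] = value

def replicated_get(key):
    # RandomRoute-style behavior: any single replica can serve the read
    return random.choice(replicas).get(key)

replicated_set("replicated:greeting", "hello")
# No matter which replica gets picked, the read succeeds
assert all(replicated_get("replicated:greeting") == "hello" for _ in range(10))
```

Because every replica holds every replicated: key, read capacity scales with the number of replicas, at the cost of each write costing one operation per replica.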

Wrap up

mcrouter is a powerful tool that can significantly improve the scalability and flexibility of your memcache infrastructure. By leveraging its features, you can achieve better resiliency, increased read throughput, and improved performance for your applications. From keeping caches warm across regions to enabling different cache capacities for tenants in a SaaS app, mcrouter offers a wide range of benefits.

While getting started with mcrouter may require some initial setup and configuration, the effort is well worth it for the scalability and flexibility it unlocks. My quick config examples demonstrate how you can set up a local environment for debugging and experimentation, showcasing the ability to route keys based on prefixes, create replicated cache setups, and control operations at a granular level.

As a core component of Facebook/Meta's scalability strategy, mcrouter is a battle-tested and well-maintained tool.

I hope this post has shed some light on mcrouter and inspired you to explore its capabilities further.

Written in May 2024.
Kir Shatrov

Kir Shatrov helps businesses to grow by scaling the infrastructure. He writes about software, scalability and the ecosystem. Follow him on Twitter to get the latest updates.