cloud-mirror – Platform Engineering Operations Project of the Month

John Ford-8
Hello from Platform Engineering Operations! Once a month we highlight one
of our projects to help the Mozilla community discover a useful tool or an
interesting contribution opportunity. This month's project is cloud-mirror.

Cloud-mirror is a service we wrote to reduce the cost and time of
inter-region S3 transfers. It was designed for use in the Taskcluster
system, but it can also be run independently. Taskcluster,
which is the new automation environment for Mozilla, can support passing
artifacts between dependent tasks. An example of this is that when we do a
build, we want to make the binaries available to the test machines. We
originally hosted all of our artifacts in a single AWS region. This meant
that every time a test was done in a region outside of the main region, we
would incur an inter-region transfer for each test run. This is expensive
and slow compared to in-region transfers.

We decided that a better idea would be to transfer the data from the main
region to the other regions the first time it was requested in that region
and then have all subsequent requests be served from inside that region.
This means that, for the small overhead of an extra in-region copy of the
file, we avoid the cost and time overhead of doing an inter-region transfer
every single time.

Here's an example. We use us-west-2 as our main region for storing
artifacts. A test machine in eu-central-1 requires "firefox-50.tar.bz2" for
use in a test. The test machine in eu-central-1 will ask cloud mirror for
this file. Since this is the first test to request this artifact in
eu-central-1, cloud mirror will first copy "firefox-50.tar.bz2" into
eu-central-1 then redirect to the copy of that file in eu-central-1. The
second test machine in eu-central-1 will then ask for a copy of
"firefox-50.tar.bz2" and because it's already in the region, the cloud
mirror will immediately redirect to the eu-central-1 copy.
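
The flow in this example can be sketched in a few lines. This is only an illustration, not cloud-mirror's actual code: the in-memory `buckets` map stands in for S3, and the `resolveArtifact` function, the `interRegionCopies` counter and the redirect URL shape are all hypothetical names invented for the sketch.

```javascript
// Hypothetical in-memory stand-ins for per-region S3 buckets.
const MAIN_REGION = 'us-west-2';
const buckets = {
  'us-west-2': new Map([['firefox-50.tar.bz2', 'binary-data']]),
  'eu-central-1': new Map(),
};

// Counts how often we had to pay for an inter-region transfer.
let interRegionCopies = 0;

// Return the URL a client in `region` should be redirected to, copying
// the artifact into that region first if it is not there yet.
function resolveArtifact(region, name) {
  const local = buckets[region];
  if (!local.has(name)) {
    const source = buckets[MAIN_REGION];
    if (!source.has(name)) {
      throw new Error(`artifact not found: ${name}`);
    }
    local.set(name, source.get(name)); // the single inter-region copy
    interRegionCopies += 1;
  }
  // Assumed URL shape, for illustration only.
  return `https://${region}.example.invalid/${name}`;
}

// First request triggers the copy; the second is served in-region.
const first = resolveArtifact('eu-central-1', 'firefox-50.tar.bz2');
const second = resolveArtifact('eu-central-1', 'firefox-50.tar.bz2');
console.log(first, second, interRegionCopies);
```

Both calls return the same eu-central-1 URL, but only the first one pays for an inter-region copy.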

We expire artifacts from the destination regions so that we don't incur
excessive storage costs. We also use a Redis cache configured to evict the
least recently used keys first. Cloud-mirror is written for Node 5 and uses
Redis for storage. We use the upstream aws-sdk library for our S3
operations.
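
To make "least recently used first" concrete, here is a toy cache with that eviction order. Redis handles this itself when its `maxmemory-policy` setting is an LRU policy such as `allkeys-lru`; which exact policy cloud-mirror uses is not stated here, and the `LruCache` class below is a hypothetical illustration rather than anything from the project.

```javascript
// Toy LRU cache. A Map iterates in insertion order, so re-inserting a key
// on every access keeps the least recently used key at the front.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  has(key) {
    return this.entries.has(key);
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key); // move the key to the "most recent" end
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

const cache = new LruCache(2);
cache.set('a.tar.bz2', 'eu-central-1');
cache.set('b.tar.bz2', 'eu-central-1');
cache.get('a.tar.bz2');                 // 'a' is now the most recently used
cache.set('c.tar.bz2', 'eu-central-1'); // over capacity: evicts 'b'
```

After the last call the cache holds `a.tar.bz2` and `c.tar.bz2`, and `b.tar.bz2` has been evicted.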

We're in the process of deploying this system to replace our original
implementation, called 's3-copy-proxy'. That earlier version was a much
simpler take on this idea, which we've been using in production. The main
reasons for the rewrite were to abstract the core concepts so that anyone
can write a backend for their storage type, to support more AWS regions,
and to move towards a completely HTTPS-based chain.
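
As a rough idea of what a pluggable backend could look like, here is a minimal sketch. The `StorageBackend` interface, its method names, the `MemoryBackend` class and the `mirror` helper are all hypothetical; cloud-mirror's real backend API may differ considerably.

```javascript
// Hypothetical backend interface: the method names are illustrative only.
class StorageBackend {
  has(name) { throw new Error('not implemented'); }
  put(name, data) { throw new Error('not implemented'); }
  urlFor(name) { throw new Error('not implemented'); }
}

// An in-memory backend, e.g. for tests that should not need cloud access.
class MemoryBackend extends StorageBackend {
  constructor(baseUrl) {
    super();
    this.baseUrl = baseUrl;
    this.objects = new Map();
  }
  has(name) { return this.objects.has(name); }
  put(name, data) { this.objects.set(name, data); }
  urlFor(name) { return `${this.baseUrl}/${encodeURIComponent(name)}`; }
}

// The core logic only talks to the interface, never to a specific service.
function mirror(backend, name, fetchFromSource) {
  if (!backend.has(name)) {
    backend.put(name, fetchFromSource(name));
  }
  return backend.urlFor(name);
}

const backend = new MemoryBackend('https://mirror.example.invalid');
let fetches = 0;
const fetchFromSource = (name) => {
  fetches += 1;
  return `contents of ${name}`;
};

const url1 = mirror(backend, 'firefox-50.tar.bz2', fetchFromSource);
const url2 = mirror(backend, 'firefox-50.tar.bz2', fetchFromSource);
```

Because `mirror` depends only on the three interface methods, an S3, Azure or GCE backend could be swapped in without touching the core logic.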

If this is a project that's interesting to you, we have lots of ways that
you could contribute! Here are some:

   - Switch polling for pending copy operations to use Redis pub/sub
   - Write an Azure or GCE storage backend
   - Modify the API to determine which cloud storage pool a request should
   be redirected to instead of having to encode that into the route
   - Write a localhost storage backend for testing that serves content on

If you have any ideas or find some bugs in this system, please open an
issue. For the time being, you
will need to have an AWS account to run our integration tests (`npm test`).
We would love to have a storage backend that allows running the non-service
specific portions of the system without any extra permissions.

If you're interested in contributing, please ping me (jhford) in
#taskcluster on

For more information about all Platform Ops projects, visit our wiki. If
you're interested in helping out, has
resources for getting started.
dev-quality mailing list