
Christine Spang

Christine went to MIT, dropped out of an operating systems graduate program to be an early engineer at Ksplice, and most recently cofounded Nylas, a startup building an email platform. When she's not building rock-solid infrastructure for the Internet or speaking around the world at conferences like DebConf and PyCon, rumour has it she can be found on cliff walls, remote trails, and dance floors. She lives in Oakland, California.


Billions of Emails Synced with Python: How we built the Nylas Sync Engine

Scalable Python, Intermediate
8/13/2017 | 11:15 AM-12:15 PM | House Canary Room

Description

The open source Nylas Sync Engine provides a RESTful API on top of a powerful email sync platform, making it easy to build messaging into apps. It’s built using Python and gevent and has synced billions of messages over the lifetime of its deployment. In this talk, we’ll show you how it’s built and what technical challenges we’ve solved along the way.

Abstract

Why a sync engine?

If you’ve ever tried to build anything that works with email, you’ll know it’s a problem full of twisty corners. The protocols themselves are obtuse: entire RFCs exist just to describe how to implement sync with them. If you want your integration to work with everyone’s email, you face implementing several different protocols or flavours of protocols (IMAP with CONDSTORE, IMAP without CONDSTORE, Gmail IMAP, Exchange Web Services, Exchange ActiveSync, Office365 REST), plus OAuth authentication for different providers. And once you’ve gotten data flowing, you still need to parse email, which involves a complex format known as MIME as well as pretty much every way of encoding non-ASCII text as ASCII that has ever been invented.
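To make the encoding problem concrete, here is a small sketch using only Python’s standard `email` package. The message is invented for illustration: its Subject is an RFC 2047 “encoded word” and its body is quoted-printable, two of the many ways non-ASCII text gets carried through ASCII. This is not the sync engine’s own parser, just the decoding steps any email integration has to perform.

```python
from email import message_from_bytes, policy
from email.header import decode_header, make_header

# An invented message: the Subject is an RFC 2047 encoded word (UTF-8, Q-encoding)
# and the body is quoted-printable. Both squeeze non-ASCII text into 7-bit ASCII.
RAW = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: =?utf-8?q?Caf=C3=A9_meeting?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Caf=C3=A9 at 10?\r\n"
)

# policy.default returns an EmailMessage whose headers decode encoded words.
msg = message_from_bytes(RAW, policy=policy.default)
subject = str(msg["Subject"])

# get_content() undoes the transfer encoding and applies the declared charset.
body = msg.get_content()

# The older, lower-level API exposes the header decoding step by step.
legacy_subject = str(make_header(decode_header("=?utf-8?q?Caf=C3=A9_meeting?=")))

print(subject)       # Café meeting
print(body.strip())  # Café at 10?
```

Real mail is far messier (nested multiparts, wrong or missing charset declarations, legacy encodings), which is why parsing gets its own layer in a sync engine.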

We’ve built a platform that layers a sync engine over 30 years of email history and allows developers to read and write to mailboxes and calendars using a modern REST API. It’s not just a simple proxy that makes calls to IMAP or Exchange behind the scenes: to meet our customers’ demands for speed and reliability, when a user connects their email account to a developer’s app, our infrastructure syncs a copy of that mailbox and keeps it up to date as changes are made, whether from that app or from traditional web, mobile, and desktop email clients. This is a demanding technical challenge and wasn’t easy to build.
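One piece of keeping a mirrored copy up to date can be sketched with the standard IMAP approach: compare the message UIDs stored locally for a folder with the UIDs currently on the server, and fall back to a full resync if the folder’s UIDVALIDITY has changed. The names `FolderMirror`, `SyncPlan`, and `plan_sync` are hypothetical, and this is not the Nylas implementation. A real engine must also detect flag changes (where CONDSTORE helps) and cover the non-IMAP protocols.

```python
from dataclasses import dataclass, field


@dataclass
class FolderMirror:
    """Hypothetical local state for one mirrored IMAP folder."""
    uidvalidity: int
    uids: set[int] = field(default_factory=set)


@dataclass
class SyncPlan:
    full_resync: bool
    to_fetch: list[int]   # UIDs on the server but not stored locally
    to_delete: list[int]  # UIDs stored locally that the server no longer has


def plan_sync(local: FolderMirror, remote_uidvalidity: int, remote_uids: set[int]) -> SyncPlan:
    """Diff a folder's local UID set against the server's current UID set."""
    if local.uidvalidity != remote_uidvalidity:
        # A new UIDVALIDITY means the old UIDs can no longer be trusted:
        # drop everything stored locally and refetch the whole folder.
        return SyncPlan(True, sorted(remote_uids), sorted(local.uids))
    return SyncPlan(
        False,
        sorted(remote_uids - local.uids),
        sorted(local.uids - remote_uids),
    )


mirror = FolderMirror(uidvalidity=7, uids={1, 2, 3})
print(plan_sync(mirror, 7, {2, 3, 4, 5}))
# SyncPlan(full_resync=False, to_fetch=[4, 5], to_delete=[1])
```

Listing every UID on each pass gets expensive for large mailboxes, which is one reason extensions like CONDSTORE matter for incremental sync at scale.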

How a sync engine?

A semi-monolithic application composed of several services that share a common database and a fair amount of code, but run on separate server fleets (email sync, API frontend, webhooks, etc.)

~90k lines of Python, including tests and migrations

MySQL: one sharded database and one global database
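One common way to route rows in a sharded MySQL setup is to embed the shard number in the high bits of each 64-bit primary key, so any ID maps to its shard without a lookup. The sketch below assumes 16 shard bits and 48 per-shard bits; the constants, function names, and connection strings are hypothetical and not taken from the talk.

```python
SHARD_BITS = 16
LOCAL_BITS = 48
LOCAL_MASK = (1 << LOCAL_BITS) - 1
# Leave the top bit clear so every ID fits in a signed 64-bit MySQL BIGINT.
MAX_SHARD = (1 << (SHARD_BITS - 1)) - 1

# Hypothetical mapping from shard number to connection string.
SHARD_DSNS = {
    0: "mysql://db-shard-0/mail",
    1: "mysql://db-shard-1/mail",
}


def make_id(shard_id: int, local_id: int) -> int:
    """Pack a shard number and a per-shard counter value into one ID."""
    if not 0 <= shard_id <= MAX_SHARD:
        raise ValueError(f"shard_id out of range: {shard_id}")
    if not 0 <= local_id <= LOCAL_MASK:
        raise ValueError(f"local_id out of range: {local_id}")
    return (shard_id << LOCAL_BITS) | local_id


def shard_for_id(object_id: int) -> int:
    """Recover the shard number from the high bits of an ID."""
    return object_id >> LOCAL_BITS


def dsn_for_id(object_id: int, dsns: dict[int, str] = SHARD_DSNS) -> str:
    """Pick the database that stores the row with this ID."""
    return dsns[shard_for_id(object_id)]


message_id = make_id(1, 42)
print(message_id)              # 281474976710698
print(dsn_for_id(message_id))  # mysql://db-shard-1/mail
```

With this scheme, one option is to start each shard’s AUTO_INCREMENT counters at `shard_id << 48` so the database generates correctly tagged IDs itself; per-account data that spans shards would still need a directory, such as a table in the global database.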

Major libraries we use: Flask, gevent, SQLAlchemy, pytest. In production: HAProxy, nginx + gunicorn with the gevent pywsgi adapter.
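The nginx + gunicorn + gevent part of that stack is typically wired up with a gunicorn config file, which is plain Python. The values below are illustrative assumptions, not Nylas’s production settings.

```python
# gunicorn.conf.py: a hypothetical example; gunicorn reads this file as Python.
bind = "127.0.0.1:8000"         # nginx proxies public traffic to this address
workers = 4                     # OS processes, each running its own gevent event loop
worker_class = "gevent_pywsgi"  # serve requests with gevent's pywsgi server
worker_connections = 1000       # max simultaneous clients per worker
timeout = 30                    # seconds of silence before a worker is restarted
```

It would be started with something like `gunicorn -c gunicorn.conf.py myapp:app`, where `myapp:app` is a hypothetical module path to the Flask application. Because each gevent worker multiplexes many connections onto greenlets, the process count can stay small while concurrency stays high.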

Technical challenges (so far!)

What are the major problems that we’ve solved?

  • A universal API across providers
      • Philosophy: whenever possible, unify the API across providers
      • We should allow developers to build one integration, not many
      • A few exceptions: folders vs. labels
  • Transactions, delta streaming & webhooks
      • Capturing mailbox changes & allowing apps to subscribe to them
      • For now: SQLAlchemy events & MySQL are the backbone
  • Error handling & retries using gevent
      • Wrapping greenlets to implement backoff
      • Saving & aggregating errors
  • Sharded data store
      • How we split data across multiple MySQL clusters
  • Performance instrumentation
      • Extensive custom instrumentation built on top of greenlets
      • Available for you to use: nylas-perftools
  • Load balancing
      • Mail accounts are heterogeneous: different protocols, sizes, rates of new mail receipt…
      • How to distribute load across a fleet of servers & keep them balanced?
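Delta streaming and webhooks can both be driven by an append-only transaction log: every change to a synced object gets a monotonically increasing ID, and clients ask for “everything after cursor N”. The sketch below is an in-memory stand-in with hypothetical names. In a database-backed version, something like SQLAlchemy’s `before_flush` or `after_flush` session events could call `record()` so the log row is written in the same database transaction as the change itself.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    id: int           # monotonically increasing; doubles as the delta cursor
    object_type: str  # e.g. "message", "thread", "folder"
    object_id: int
    command: str      # "insert", "update", or "delete"


class TransactionLog:
    """In-memory stand-in for an append-only transaction table."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._rows: list[Transaction] = []

    def record(self, object_type: str, object_id: int, command: str) -> Transaction:
        # A database-backed version would call this from an ORM hook for each changed row.
        txn = Transaction(next(self._ids), object_type, object_id, command)
        self._rows.append(txn)
        return txn

    def deltas_since(self, cursor: int, limit: int = 100) -> tuple[list[Transaction], int]:
        """Return up to `limit` changes after `cursor`, plus the cursor to use next."""
        batch = [txn for txn in self._rows if txn.id > cursor][:limit]
        next_cursor = batch[-1].id if batch else cursor
        return batch, next_cursor


log = TransactionLog()
log.record("message", 101, "insert")
log.record("thread", 55, "update")
log.record("message", 101, "delete")

changes, cursor = log.deltas_since(0, limit=2)
print([(t.object_type, t.command) for t in changes], cursor)
# [('message', 'insert'), ('thread', 'update')] 2
```

Because the cursor is just the last transaction ID a client has seen, an API consumer and a webhook dispatcher can read the same log independently, each resuming from its own position.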
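Retrying with backoff while counting failures can be sketched without gevent: the wrapper below retries transient errors with capped, jittered exponential backoff and tallies errors by type. The function name, defaults, and the `error_counts` aggregate are hypothetical, and this is a plain-function stand-in for the greenlet wrapping the talk describes, not the Nylas code.

```python
import random
import time
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical aggregate of failures seen, keyed by exception class name.
error_counts: Counter = Counter()


def retry_with_backoff(
    fn: Callable[[], T],
    *,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call fn(), retrying transient errors with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on as exc:
            error_counts[type(exc).__name__] += 1
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the last error
            delay = min(max_delay, base_delay * 2**attempt)
            # Scale by a random factor in [0.5, 1.0) so many failing
            # accounts don't all retry in lockstep.
            sleep(delay * (0.5 + jitter() / 2))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Exceptions outside `retry_on` propagate immediately, so programming errors aren’t retried forever. Under gevent, passing `sleep=gevent.sleep` (or monkey-patching `time.sleep`) makes the wait yield to other greenlets, and the whole call can run in its own greenlet via `gevent.spawn(retry_with_backoff, sync_folder)`, where `sync_folder` is a hypothetical per-folder sync function.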
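Balancing heterogeneous accounts resembles multiprocessor scheduling. One simple heuristic that fits the problem as described, though the talk does not say Nylas uses it, is longest-processing-time-first greedy assignment: give each account an estimated cost (a hypothetical weight combining protocol, mailbox size, and rate of new mail), sort descending, and place each account on the currently least-loaded server.

```python
import heapq


def assign_accounts(account_costs: dict[str, float], servers: list[str]) -> dict[str, list[str]]:
    """Greedily place the most expensive accounts first on the least-loaded server."""
    if not servers:
        raise ValueError("need at least one server")
    # Heap entries are (current load, tie-breaking index, server name).
    heap = [(0.0, i, name) for i, name in enumerate(servers)]
    heapq.heapify(heap)
    assignment: dict[str, list[str]] = {name: [] for name in servers}
    # Sort by descending cost, then by account name for a deterministic result.
    for account, cost in sorted(account_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i, server = heapq.heappop(heap)
        assignment[server].append(account)
        heapq.heappush(heap, (load + cost, i, server))
    return assignment


costs = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}  # hypothetical relative costs
print(assign_accounts(costs, ["sync-1", "sync-2"]))
# {'sync-1': ['a', 'd'], 'sync-2': ['b', 'c', 'e']}
```

This only computes a one-off placement; keeping a live fleet balanced also means re-estimating costs as accounts grow and moving accounts when servers are added, removed, or drift out of balance.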

The future

mypy, Python 3, Kafka, more flexible MySQL clusters, and beyond!