June 16, 2022

Scaling Plaid’s internal developer experience with a remote development environment

How Plaid’s Developer Efficiency Team improved engineering velocity by moving development to the cloud.    

By Arjun Puri, Jarrod Dunne, and Oleg Dashevskii

Plaid’s engineering team of 360+ engineers supports a developer customer base of more than 6,000 financial apps and services. Plaid engineering relies on its Developer Efficiency Team to build internal products that optimize common workflows and increase internal velocity. To tackle several scaling challenges we were experiencing, our team created “Devenv”, an internal CLI tool backed by a remote environment. This tool has significantly improved developer efficiency at Plaid over the last couple of years.

This article reviews our results and lessons learned, including:

  • domain-driven interfaces

  • cloud solutions to local performance problems

  • fine-tuning cloud workflows to achieve maximum iteration speed

The challenge: Plaid’s rapid growth required a developer environment upgrade

When Plaid was founded, its engineers developed locally, using Vagrant to run a Linux VM on their work Macs. This setup was so slow and unreliable that when Docker for Mac became available, we chose to migrate all services to containers; the migration was complete by early 2017. We chose Docker Compose for local development and CI because it let us containerize every service and list them all in a single, centralized configuration file. The most frequently used docker and docker-compose commands were automated with a Makefile.

Plaid’s engineers often ran into problems with these tools. Docker Desktop for Mac was slow and resource-intensive, and running multiple service containers alongside tests strained CPU and memory, making iteration painful. The Makefile-based interface wasn’t intuitive. The system could become so overloaded, and so difficult to control and debug, that engineers sometimes had to purge Docker’s virtual disk and start from scratch, losing a great deal of time. It was also difficult to write, test, and debug software effectively across multiple repositories containing services written in three languages and backed by various data stores (MySQL, Mongo, Redis, memcached, and Postgres are all used in different parts of the ecosystem).

Stage 1: Defining the scope of Devenv

Our Developer Efficiency Team realized that we needed a more standardized way to manage the company's software stacks, repositories, and nearly 100 internal services. We decided to build Devenv, a command-line interface (CLI) tool that powers various development workflows as well as the test suites in our continuous integration (CI) environment. We also decided to shift many development workflows to the cloud to address performance issues.

Devenv allows developers to build, debug, and run services from the command line. More concretely, it allows developers to: 

  • set up and validate the local development environment

  • clone and update repositories

  • start, stop, and rebuild dockerized services

  • run unit and integration tests

  • perform lint checks

Stage 2: Introducing an interface

Plaid had to determine how to cohesively bundle its use cases and model them in a way that made workflows faster and more efficient. 

After several rounds of user groups, interviews, and experiments, our team settled on a CLI that abstracted away much of the underlying complexity of Docker, Make, language-specific tooling, and so on. We wanted to give developers an interface that closely modeled a typical development workflow: Repositories, Services, and Groups of Services.

Let’s talk briefly about the 3 primary command groups:

Repositories

The commands in this group help developers with the first phase of shipping features: writing code. When writing code, you want a tight iteration loop on the modules you’re working on. You want to make sure the code meets the hygiene standards enforced by the linters and, by running unit tests frequently, that programmatic contracts aren’t broken during development. Plaid wanted to facilitate cross-repo development, so the repository command group abstracts away the various language-specific test and lint invocations, with their respective flags and options, into a single command: devenv repo unit-test <repo_name> --filter=<test_name>. Running tests and linters inside a Docker container provides a consistent environment and toolchain, removing the need to configure local system libraries to support every repo and language.
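To make the single-command idea concrete, here is a minimal Go sketch of how a command like devenv repo unit-test might translate one uniform request (a repo plus an optional filter) into a language-specific test run inside a toolchain container. The repo names, image names, and the /src mount point are hypothetical; go test -run, pytest -k, and jest -t are the real filter flags of those runners, but nothing here reflects Plaid's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// repoConfig is a hypothetical description of a repository: its language
// and the Docker image that carries that language's toolchain.
type repoConfig struct {
	Language string
	Image    string
}

// repos is a hypothetical registry; real repo names and images will differ.
var repos = map[string]repoConfig{
	"payments-go":  {Language: "go", Image: "devenv/go-toolchain"},
	"ledger-py":    {Language: "python", Image: "devenv/python-toolchain"},
	"dashboard-ts": {Language: "typescript", Image: "devenv/node-toolchain"},
}

// testCommand translates one uniform request (an optional test filter)
// into the language-specific test runner invocation.
func testCommand(lang, filter string) ([]string, error) {
	var cmd []string
	var filterFlag string
	switch lang {
	case "go":
		cmd, filterFlag = []string{"go", "test"}, "-run"
	case "python":
		cmd, filterFlag = []string{"pytest"}, "-k"
	case "typescript":
		cmd, filterFlag = []string{"npx", "jest"}, "-t"
	default:
		return nil, fmt.Errorf("unsupported language %q", lang)
	}
	if filter != "" {
		cmd = append(cmd, filterFlag, filter)
	}
	if lang == "go" {
		cmd = append(cmd, "./...") // test every package in the repo
	}
	return cmd, nil
}

// unitTestArgs builds the full `docker run` argv so tests run inside the
// repo's toolchain container rather than against host-installed libraries.
func unitTestArgs(repo, filter, checkoutDir string) ([]string, error) {
	cfg, ok := repos[repo]
	if !ok {
		return nil, fmt.Errorf("unknown repo %q", repo)
	}
	inner, err := testCommand(cfg.Language, filter)
	if err != nil {
		return nil, err
	}
	args := []string{"docker", "run", "--rm",
		"-v", checkoutDir + ":/src", "-w", "/src", cfg.Image}
	return append(args, inner...), nil
}

func main() {
	// Roughly what `devenv repo unit-test ledger-py --filter=test_refund`
	// might expand to under these assumptions.
	args, err := unitTestArgs("ledger-py", "test_refund", "/home/dev/ledger-py")
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(args, " "))
}
```

Keeping the per-language knowledge in one function means supporting a new language touches a single switch case, while callers only ever see the uniform repo-plus-filter interface.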

Here are some example commands of the devenv repo command group:  

Services

The next step of development is usually turning source code into an executable and running it. At Plaid, these executables are almost always long-running services that serve gRPC or HTTP requests. The devenv service command group abstracts away various lifecycle controls (start, stop, reload) and debugging controls (process list, logs, shell access) for these services. Beyond simplifying these controls, the service command group offers the following features:

  • A simplified service state machine. Services can only be in two states: running or stopped. This contrasts with Docker’s state machine of created container -> running container -> stopped container -> removed container.

  • A wrapper command called reload that represents a full iteration cycle (build artifact + stop service + start service with new artifact). It can also watch for changes and continuously reload the service.

  • The ability to auto-generate boilerplate templates for a new service (docker-compose files, directories, port mappings, etc.).

  • Utilities to slice and dice service logs (such as filtering by log levels, outputting to JSON or raw text, querying by timestamp, tailing, etc.).
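The two-state model and the reload wrapper can be sketched in a few lines of Go. This is an illustrative sketch under stated assumptions: the mapping from Docker's container statuses (created, restarting, running, removing, paused, exited, dead) onto running/stopped is one plausible choice, and the Lifecycle interface and recorder fake are hypothetical stand-ins for whatever actually builds and runs the images.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceState is the two-state view exposed to developers.
type ServiceState string

const (
	Running ServiceState = "running"
	Stopped ServiceState = "stopped"
)

// collapseState maps a Docker container status (the State.Status field of
// `docker inspect`) onto the two-state model. A container that does not
// exist yet, or no longer exists, is simply "stopped".
func collapseState(dockerStatus string, exists bool) ServiceState {
	if exists && dockerStatus == "running" {
		return Running
	}
	// created, restarting, paused, exited, removing, dead: all "stopped" here.
	return Stopped
}

// Lifecycle is a hypothetical interface over the per-service build/stop/start steps.
type Lifecycle interface {
	Build(service string) (artifact string, err error)
	Stop(service string) error
	Start(service, artifact string) error
}

// Reload runs one full iteration cycle. Building first means a failed
// build leaves the currently running version untouched.
func Reload(lc Lifecycle, service string) error {
	artifact, err := lc.Build(service)
	if err != nil {
		return fmt.Errorf("build %s: %w", service, err)
	}
	if err := lc.Stop(service); err != nil {
		return fmt.Errorf("stop %s: %w", service, err)
	}
	if err := lc.Start(service, artifact); err != nil {
		return fmt.Errorf("start %s: %w", service, err)
	}
	return nil
}

// errCompile stands in for a build failure in the demo below.
var errCompile = errors.New("compile error")

// recorder is a fake Lifecycle that records the order of calls.
type recorder struct {
	calls    []string
	buildErr error
}

func (r *recorder) Build(s string) (string, error) {
	r.calls = append(r.calls, "build")
	if r.buildErr != nil {
		return "", r.buildErr
	}
	return s + ":dev", nil
}

func (r *recorder) Stop(s string) error {
	r.calls = append(r.calls, "stop")
	return nil
}

func (r *recorder) Start(s, artifact string) error {
	r.calls = append(r.calls, "start "+artifact)
	return nil
}

func main() {
	healthy := &recorder{}
	err := Reload(healthy, "auth")
	fmt.Println(err, healthy.calls) // <nil> [build stop start auth:dev]

	broken := &recorder{buildErr: errCompile}
	err = Reload(broken, "auth")
	fmt.Println(err, broken.calls) // build auth: compile error [build]

	fmt.Println(collapseState("exited", true)) // stopped
}
```

Ordering build before stop is the key design choice in this sketch: if compilation fails, the developer keeps a working service instead of a stopped one.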

Groups

Groups are the final stage of development before the feedback loop starts over. A group is a collection of services that form a particular subsystem at Plaid, and it typically has an associated integration test suite that helps define the exact contracts of the system. Groups allow developers to test services within the context of a system. Almost all the functionality available for services is available for groups as well (e.g., pulling images for all services in the group, or starting and stopping a group).
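Because most service commands also apply to groups, one natural implementation is a loop that applies a single per-service action to every member. The Go sketch below assumes a hypothetical static group definition and uses errors.Join (Go 1.20+) so one failing service does not hide failures in the others; it is an illustration, not Plaid's code.

```go
package main

import (
	"errors"
	"fmt"
)

// groups maps a hypothetical group name to the services that form it.
var groups = map[string][]string{
	"payments": {"payments-api", "ledger", "risk-scorer"},
}

// applyToGroup runs a per-service action (start, stop, pull, ...) on every
// member of a group. All failures are reported together instead of
// aborting at the first one.
func applyToGroup(name string, action func(service string) error) error {
	members, ok := groups[name]
	if !ok {
		return fmt.Errorf("unknown group %q", name)
	}
	var errs []error
	for _, svc := range members {
		if err := action(svc); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", svc, err))
		}
	}
	return errors.Join(errs...) // nil when every action succeeded
}

func main() {
	var started []string
	err := applyToGroup("payments", func(svc string) error {
		started = append(started, svc)
		return nil
	})
	fmt.Println(started, err) // [payments-api ledger risk-scorer] <nil>

	err = applyToGroup("payments", func(svc string) error {
		if svc == "ledger" {
			return errors.New("port already in use")
		}
		return nil
	})
	fmt.Println(err) // ledger: port already in use
}
```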

High-level architecture of initial Devenv implementation

The diagram below outlines what our initial Devenv implementation looked like: a simple Go CLI with three command groups (service, repo, and group) backed by development technologies such as Docker, git, and the language runtimes for Go, Python, and TypeScript that power various services at Plaid.
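A CLI shaped like this can be dispatched with a two-level lookup from command group to subcommand. The Go sketch below uses only the standard library. The subcommands unit-test, start, stop, reload, and logs come from the descriptions above, while lint and the placeholder handlers (which merely describe their action) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// handler runs one subcommand with its remaining arguments.
type handler func(args []string) (string, error)

// echo returns a placeholder handler that reports what it would do.
func echo(action string) handler {
	return func(args []string) (string, error) {
		return fmt.Sprintf("%s %v", action, args), nil
	}
}

// commands mirrors the three command groups: group -> subcommand -> handler.
var commands = map[string]map[string]handler{
	"repo": {
		"unit-test": echo("run unit tests"),
		"lint":      echo("run linters"),
	},
	"service": {
		"start":  echo("start service"),
		"stop":   echo("stop service"),
		"reload": echo("reload service"),
		"logs":   echo("show logs"),
	},
	"group": {
		"start": echo("start group"),
		"stop":  echo("stop group"),
	},
}

// dispatch resolves `devenv <group> <subcommand> [args...]`.
func dispatch(args []string) (string, error) {
	if len(args) < 2 {
		return "", errors.New("usage: devenv <repo|service|group> <subcommand> [args...]")
	}
	group, ok := commands[args[0]]
	if !ok {
		return "", fmt.Errorf("unknown command group %q", args[0])
	}
	run, ok := group[args[1]]
	if !ok {
		return "", fmt.Errorf("unknown %s subcommand %q", args[0], args[1])
	}
	return run(args[2:])
}

func main() {
	args := os.Args[1:]
	if len(args) == 0 {
		args = []string{"service", "reload", "auth"} // demo default
	}
	out, err := dispatch(args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println(out) // with the demo default: reload service [auth]
}
```

A real tool would more likely use a CLI framework for help text and flag parsing, but the group-then-subcommand lookup is the same shape.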

Stage 3: Extending the Devenv Interface

Over time, the Devenv interface became home to more development workflows. Engineers added tools to iterate on their documentation, to interact with feature flags in the Devenv, and to seed data models. Here are some examples:

Stage 4: Moving to the Remote Devenv in the cloud

As the various parts of the Devenv grew, it became apparent that it was time to shift users to the cloud. Running several services locally was no longer feasible from a performance standpoint. We needed to speed up build times, cut down Docker image pull times, and improve the reliability of the development experience (Docker for Mac was less reliable than its Linux counterpart). A standardized set of EC2 instances under our Developer Efficiency Team's control also made maintenance and debugging easier.

Our team leveraged Docker’s client/server model to offload the heavier work of the Devenv (such as running service containers or building images) to a remote Docker daemon, saving CPU, memory, and disk space on developers’ laptops. In this configuration, the remote instances ran in the same AWS regions, which also made Docker image pulls much faster.

Each engineer was provisioned an EC2 instance running a Docker daemon. The Devenv CLI on the user's laptop acted as the client, with all of the heavy lifting done by the EC2 instance. The infrastructure was configured using Terraform, including "global" configuration (such as security groups, subnets, and routing tables) and per-user configuration (such as the EC2 instance and Route53 record).

After beta testing, the new Remote Devenv was rolled out over several quarters. As of now (~1.5 years after initial development and rollout), 87% of users exclusively use the Remote Devenv, and 97% of all commands are run on it.

High-level architecture of initial Remote Devenv implementation

The diagram below outlines what our initial Remote Devenv implementation looked like. We essentially took our local implementation and configured the CLI to interact with a remote Docker daemon over HTTPS.
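Because the docker CLI already talks to its daemon over an API, pointing it at a remote daemon is mostly a matter of configuration. Below is a minimal Go sketch, assuming TLS-protected daemons reachable at a hypothetical per-user hostname (mirroring the per-user Route53 records) and a hypothetical certificate directory. DOCKER_HOST, DOCKER_TLS_VERIFY, and DOCKER_CERT_PATH are the real environment variables the docker CLI reads, and 2376 is the conventional port for a TLS-enabled daemon.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// remoteDockerEnv returns a copy of base in which the standard docker CLI
// is pointed at a per-user remote daemon over TLS. The hostname pattern
// and certificate directory are hypothetical.
func remoteDockerEnv(base []string, user, certDir string) []string {
	env := make([]string, 0, len(base)+3)
	for _, kv := range base {
		// Drop inherited settings that might point at a local daemon.
		if strings.HasPrefix(kv, "DOCKER_HOST=") ||
			strings.HasPrefix(kv, "DOCKER_TLS_VERIFY=") ||
			strings.HasPrefix(kv, "DOCKER_CERT_PATH=") {
			continue
		}
		env = append(env, kv)
	}
	return append(env,
		"DOCKER_HOST=tcp://"+user+".devenv.example.internal:2376",
		"DOCKER_TLS_VERIFY=1",
		"DOCKER_CERT_PATH="+certDir,
	)
}

func main() {
	cmd := exec.Command("docker", "ps")
	cmd.Env = remoteDockerEnv(os.Environ(), "alice", "/home/alice/.devenv/certs")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	// cmd.Run() would list containers on the remote instance while the
	// CLI itself stays on the laptop; here we only print the settings.
	for _, kv := range cmd.Env[len(cmd.Env)-3:] {
		fmt.Println(kv)
	}
}
```

Docker contexts (docker context create) are a built-in alternative to managing these variables by hand; the environment-variable route simply keeps the sketch short.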

The beginning of Remote Devenv

Stage 5: Optimizing for the cloud

Our Developer Efficiency Team continued to optimize the Remote Devenv, working to improve the iteration flow. We discovered that by using the remote instance's filesystem as a staging ground for changes, the Devenv could sync files to the remote instance with rsync, so that transferring the Docker build context happened on the same machine that ran the Docker daemon. We also ran unit tests and linters in parallel on the remote machine to increase speed, and running them there decreased the latency for impure unit tests that needed to interact with databases.
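The sync-then-build flow might look like the following Go sketch, which only assembles command lines. The rsync flags are real; the hostnames, paths, and the choice to run docker build over ssh on the remote host are assumptions made for illustration, since the post does not say exactly how the build was triggered there.

```go
package main

import (
	"fmt"
	"strings"
)

// rsyncArgs mirrors a local checkout into a staging directory on the
// developer's remote instance. -a copies recursively and preserves
// permissions and timestamps, -z compresses in transit, and --delete
// removes remote files that were deleted locally.
func rsyncArgs(localDir, remoteHost, remoteDir string, excludes []string) []string {
	args := []string{"rsync", "-az", "--delete"}
	for _, pattern := range excludes {
		args = append(args, "--exclude="+pattern)
	}
	// A trailing slash on the source copies the directory's contents
	// rather than nesting the directory itself inside remoteDir.
	return append(args,
		strings.TrimSuffix(localDir, "/")+"/",
		remoteHost+":"+strings.TrimSuffix(remoteDir, "/")+"/")
}

// remoteBuildArgs is one hypothetical way to build from the synced copy:
// run docker build on the remote host itself, so the build context is
// read from disk there instead of being uploaded by the client.
func remoteBuildArgs(remoteHost, remoteDir, tag string) []string {
	return []string{"ssh", remoteHost, "docker", "build", "-t", tag, remoteDir}
}

func main() {
	host := "alice.devenv.example.internal"
	syncCmd := rsyncArgs("/Users/alice/src/ledger", host, "/home/alice/src/ledger",
		[]string{".git", "node_modules"})
	fmt.Println(strings.Join(syncCmd, " "))
	fmt.Println(strings.Join(remoteBuildArgs(host, "/home/alice/src/ledger", "ledger:dev"), " "))
}
```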

Using the remote instance's filesystem also allowed for volume mounts, which had been a key part of iteration workflows in the local Devenv. Some languages (e.g., Node) had good support for watching code for changes and restarting the service whenever the code changed (e.g., with Nodemon). By combining local file watching (using Watchman) that rsyncs code from a developer's laptop to the remote instance on every change with volume mounts from the remote instance into the Docker container, users could get a remote service running the latest code within 15 seconds of saving a file. Engineers extended this behavior to recompile code and trigger service restarts in our other major languages (Go and Python).

Impressive KPIs

By investing in the interface first, Plaid was able to transition smoothly from local Docker to the Remote Devenv and, finally, to the Remote Syncing Devenv. Currently, Devenv powers more than 150 services and is run more than 2,000 times a day by developers; CI performs more than 200,000 Devenv invocations daily. It has become an everyday tool and a necessity: code added to Devenv appears almost instantly on developers’ laptops and in CI instances.

Today, Devenv's future looks promising, as Plaid’s Developer Efficiency Team explores replacing dedicated EC2 instances with a shared Kubernetes cluster. This would bring the development environment even closer to production!

Arjun Puri, Jarrod Dunne, and Oleg Dashevskii are engineers on the Developer Efficiency Team at Plaid.