November 19, 2020
How we built developer efficiency at Plaid
For a long time, Plaid had only a few engineers, who focused primarily on product and execution. Everyone worked on the product to create as much direct customer impact as possible. As we grew to hundreds of engineers, teams came to depend on each other to ship new features and products, and ownership of products sometimes became murky. In the process, some products and systems were abandoned, and new teams with varying needs stretched existing systems to their limits. Responsibilities such as the local development environment, testing infrastructure, and shared libraries, which had been implicitly owned by all engineers, were suddenly owned by no one. Fintech is a fast-paced industry, so it’s incredibly important for us to invest in internal tooling that gives our engineers a seamless development experience.
Enter the Developer Efficiency team! The Developer Efficiency team was officially launched in July 2020 with the express purpose of improving developers’ iteration, observability, and foundational experiences at Plaid. Specifically, we build internal products and focus on common workflows and processes.
Building the Team
While the need for a team dedicated to developer efficiency was always there, we didn’t officially begin planning for one until 2020, when a few things happened:
Our end-of-2019 engineering pulse survey identified internal tooling and technical foundations as two large areas for improvement.
Engineering velocity was highlighted as one of the top three challenges to tackle in 2020.
We first gathered input and enumerated the problems affecting developer efficiency. Throughout the process, we found that no existing team was well positioned to address them. The two closest candidates were Core Services, responsible for Plaid's core microservices and data model abstractions, and Infrastructure, responsible for Plaid's low-level infrastructure. We decided to position Developer Efficiency between these two teams. With these inputs, we kicked off the launch, with the goal of having the team fully staffed by July.
When we announced the team, we put out a company-wide call for engineers. We wanted engineers from both our NYC and SF headquarters to be adequately represented on the Developer Efficiency team, and we resisted the temptation to draw too heavily from either location. We also wanted “good coverage,” in the sense that team members came from a variety of existing teams. As of now, the Developer Efficiency team spans four time zones and consists of seven engineers, most of whom have firsthand experience with the issues and pain points we set out to fix.
The first thing we did as a newly formed team was to write a governing mission statement and a set of principles. This let us align on direction and areas of responsibility:
Mission: DEVELOPER PRODUCTIVITY MULTIPLIER
Principles: We build internal products, we focus on common workflows, and we ship processes and guidelines along with tools.
Identifying Focus Areas & Priorities
We prioritize whatever will have the largest and most direct impact on engineers at Plaid, and that shapes everything else we do. Many inherited systems were poorly maintained or not maintained at all. Rather than be overwhelmed by trying to fix everything at once, we surveyed the landscape and took a holistic approach to prioritization, with an emphasis on improving common workflows.
As mentioned above, we have three overarching focus areas: iteration, observability, and foundations.
The iteration focus area is about speeding up the process by which developers make changes in their local development environments and submit those changes for feedback. This covers CI speed, test flakiness, and the robustness of local development environments. Fast CI runs (under 10 minutes) let developers stay focused without context switching, and flaky tests break this flow. Developer Efficiency is not responsible for fixing flaky tests; instead, we monitor and triage them on behalf of the engineering organization.
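To make "monitor and triage" a little more concrete, here is a minimal Python sketch of one common way to flag flaky tests. It assumes CI results are available as simple (commit, test, passed) records; the `CIResult` type, the helper names, and the rule that a test counts as flaky when it both passes and fails on the same commit are illustrative assumptions, not a description of Plaid's actual tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CIResult:
    """One test outcome from one CI run (hypothetical record format)."""
    commit: str   # commit SHA the CI run was built from
    test: str     # test name
    passed: bool  # True if the test passed in this run


def find_flaky_tests(results):
    """Return names of tests that both passed and failed on the same commit, sorted.

    A test that fails on one commit and passes on another may be a genuine
    regression, so only same-commit disagreement is treated as flakiness here.
    """
    outcomes = defaultdict(set)
    for r in results:
        outcomes[(r.commit, r.test)].add(r.passed)
    return sorted({test for (_, test), seen in outcomes.items() if seen == {True, False}})


def failure_rate(results, test):
    """Fraction of recorded runs of `test` that failed (0.0 if it never ran)."""
    runs = [r for r in results if r.test == test]
    if not runs:
        return 0.0
    return sum(not r.passed for r in runs) / len(runs)


# Hypothetical sample data.
results = [
    CIResult("abc123", "test_login", True),
    CIResult("abc123", "test_login", False),  # same commit, different outcome
    CIResult("abc123", "test_sync", True),
    CIResult("def456", "test_sync", False),   # different commit: possibly a real regression
]

print(find_flaky_tests(results))  # ['test_login']
```

Keying on the commit keeps real regressions out of the flaky list, and a per-test failure rate gives a simple way to rank flagged tests before routing them to their owning teams.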
One tool worth mentioning is Devenv, a Docker-based local development environment. Developers use Devenv to run their own version of Plaid on their laptops. We've been iterating on a second version, Remote Devenv, which uses remote EC2 instances to offload heavy tasks.
The observability focus area is all about making sure that engineers can clearly understand our systems. This includes monitoring server performance, requests per second, and the kinds of requests that are failing; being alerted when errors arise; and having visibility into the entire system when things go wrong. We currently use multiple tools, including Elasticsearch, Grafana, Prometheus, Lightstep, Sentry, and PagerDuty. However, these tools operate independently rather than providing a cohesive experience.
High-level architecture of Plaid's observability stack
Finally, the foundations focus area includes taking ownership of programming language support and of libraries that have historically not been owned by any particular engineering team. Our programming language support extends to Go, Node, and Python, and the core libraries we own include gRPC, OpenTracing abstractions, and feature flagging. The landscape becomes more complex once we consider the Cartesian product of these interactions, since each library may need to work in each language.
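The Cartesian product of languages and core libraries can be made concrete with a few lines of Python. This sketch uses only the languages and libraries named in this post, and it assumes (our simplification) that every (language, library) pair is a separate combination that needs its own support.

```python
from itertools import product

# Languages and core libraries named in this post.
LANGUAGES = ["Go", "Node", "Python"]
CORE_LIBRARIES = ["gRPC", "OpenTracing abstractions", "feature flagging"]

# Assumption: each (language, library) pair is a distinct combination that may
# need its own client code, tests, and documentation.
support_matrix = list(product(LANGUAGES, CORE_LIBRARIES))

for language, library in support_matrix:
    print(f"{language} x {library}")

print(len(support_matrix))  # 9 = 3 languages x 3 libraries
```

Because the count is a product rather than a sum, adding a fourth language under this assumption would add three more pairs, not one.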
We currently have the capacity to cover about 1.5 of our three focus areas, so for now we’re concentrating on iteration and foundational tools. In particular, we’re doubling down on our support for Python, which multiple teams at Plaid use, with the goal of making it one of Plaid’s primary programming languages.
Long-Term Goals
We have several long-term goals. For now, we break each problem into three stages: possibility, speed, and elegance.
First, it must be possible for us to build a given system; next, we ensure that developers can use that system efficiently; and lastly, we provide an elegant way for developers to leverage it. At many companies, internal tools and systems can be of low quality because they are only consumed internally. In keeping with our principles, we want to treat Plaid’s developers as true customers and give them a polished experience. It will take time to bring all three stages to everything our team works on; right now, we have only scratched the surface. We’re keen on standardizing existing systems while keeping them flexible enough to serve a variety of teams. We also know that developers will want to adopt new tools and processes that diverge from our standard repertoire, and we will do our best to incorporate them into our workflow.
We’re very excited to keep improving our internal tooling and to make it easier and more efficient for engineers at Plaid to build meaningful products for our customers.