May 17, 2022

Sync: A new paradigm for building on transaction data

Nick Sundin

To build fintech apps, developers need up-to-date, cleanly categorized feeds of consumers' bank transaction data. Plaid originally introduced the Transactions API to address that need: using Transactions, developers can create clean feeds by maintaining their own dataset and comparing it with the transaction data retrieved from Plaid.

Over the years, we’ve learned a lot from the feedback of developers who leverage Transactions. We listened to our customers and now we’re excited to introduce Sync, an upgraded way to help developers connect their apps to transaction information more quickly.

Sync gives developers direct access to the feeds of changes to transaction data, radically streamlining how developers access and work with transaction information while decreasing integration time with Plaid—making it easier to build better fintech apps.

In this post, we’ll discuss why we’re making such a large interface update, our development process, and what this new endpoint means going forward for Plaid and developers.

Transactions API: The standard since 2013

The original Plaid Transactions API’s mental model was simple and easy to understand: a virtual bank statement with a customizable time range, sorted by transaction date.

Over time a dominant integration pattern emerged: customers keep their own copy of a user’s transactions, with the addition of potential application data, and subsequently perform a merging process any time new results from Plaid become available. This local-copy pattern is very powerful since it gives developers the flexibility and performance of local data backed by a connection to Plaid, providing ongoing updates. To eliminate the need for developers to poll periodically, we also provided webhooks to alert them to fetch the virtual bank statements when updates are available.

This pattern of interacting with Plaid’s Transactions API has become the heart of fintech data ingress logic.

Local Copy Pattern

While implementations of the “keep a copy of all transactions data” pattern vary, the basic algorithm roughly follows these steps:

Fetch - Webhook (or timer) triggers a call to /transactions/get, fetching a statement of transactions for the past 2 weeks.
Reconcile - Developers then compare the latest few weeks of data from Plaid to their local database, removing transactions no longer present in the new statement, and adding new transactions present in the new statement, but not in the local database.

This pattern solves the majority of cases that our customers face: new transactions are discovered, and removed transactions are implied by their absence from the new response.
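The Fetch and Reconcile steps can be sketched roughly as follows. This is a minimal illustration, not Plaid's or any customer's actual code: the function name reconcile_window, the dict-of-dicts data shape, and the sample dates and amounts are all assumptions made for the example.

```python
from datetime import date

def reconcile_window(local, statement, window_start, window_end):
    """Merge a freshly fetched statement into the local copy, in place.

    `local` and `statement` map a transaction id to a dict with a "date" key.
    Only local rows dated inside the fetched window are eligible for removal.
    """
    in_window = {
        tid for tid, txn in local.items()
        if window_start <= txn["date"] <= window_end
    }
    removed = in_window - set(statement)   # absent from the new statement
    added = set(statement) - set(local)    # not yet in the local database
    for tid in removed:
        del local[tid]
    for tid in added:
        local[tid] = statement[tid]
    return sorted(added), sorted(removed)

# Hypothetical data: "old" falls outside the window, "b" has disappeared
# bank-side, and "c" is new.
local = {
    "old": {"date": date(2022, 4, 1), "amount": 12.00},
    "a": {"date": date(2022, 5, 10), "amount": 5.00},
    "b": {"date": date(2022, 5, 12), "amount": 7.50},
}
statement = {
    "a": {"date": date(2022, 5, 10), "amount": 5.00},
    "c": {"date": date(2022, 5, 18), "amount": 20.00},
}
added, removed = reconcile_window(
    local, statement, date(2022, 5, 5), date(2022, 5, 19)
)
```

Note that this sketch only handles additions and removals, as in the steps above. Anything outside the fetched window is left untouched.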

However, developers have found edge cases that are not easily taken care of with this design pattern. An example of a common edge case:

On March 1, a consumer pays $40 for their dinner at a restaurant. They add a $10 tip to the receipt. The merchant charges the $40 meal to the consumer’s bank and it’s reflected on their bank statement and in Plaid’s API.
On March 20, the restaurant adds the $10 tip, updating the transaction total to $50.
Since this update happens on a transaction more than 2 weeks old, the new $50 amount is never reflected in the app.
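To make the timing concrete, here is a small simulation of the scenario. It is only a sketch: fetch_window is a hypothetical stand-in for a two-week statement fetch, and the year 2022 and the transaction id are assumptions added for the example.

```python
from datetime import date, timedelta

def fetch_window(bank_ledger, today, days=14):
    """Return transactions dated within the last `days` days of `today`."""
    start = today - timedelta(days=days)
    return {
        tid: txn for tid, txn in bank_ledger.items()
        if start <= txn["date"] <= today
    }

# March 1: the $40 meal posts, and the app stores it on its next fetch.
bank = {"dinner": {"date": date(2022, 3, 1), "amount": 40}}
local = dict(fetch_window(bank, date(2022, 3, 2)))

# March 20: the tip posts, so the bank now reports $50 for the same transaction.
bank["dinner"] = {"date": date(2022, 3, 1), "amount": 50}
refreshed = fetch_window(bank, date(2022, 3, 20))

# The March 6-20 window does not include a March 1 transaction,
# so the app never sees the new amount.
stale = (
    "dinner" not in refreshed
    and local["dinner"]["amount"] != bank["dinner"]["amount"]
)
```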

Handling cases like this can be quite a heavy lift for certain use cases: each customer needs to build complex logic to ensure consistency. It became apparent that the /transactions/get interface could not easily support data consistency between Plaid and our customers.

Challenges

Implementation cost

Our customers routinely share their pain points with Plaid’s customer engineering team around building the Local Copy Pattern logic in-house. Of all the steps in integrating with Plaid, this is by far the most difficult and error-prone, and customers need to do this for most use cases. To implement Transactions, developers need to:

Design database schema, queries, and algorithms to efficiently compare new and cached transactions (an error-prone and tedious process).
Build a high-availability webhook ingestion system.

However, this initial implementation cost is usually not sufficient. As customers scale, they generally need to build a consistency checking tool on top of this pipeline to build resilience when webhooks are dropped or the transactions change in unexpected ways.

Changing dates

When calling the Transactions API, customers specify two dates as boundaries (usually two weeks apart). For instance, if a developer wants to look for new additions and removals from May 1 to May 19, they’ll first fetch this range from Plaid, then fetch the same range from their database, and find the difference between the new Plaid data and the transactions in their database.

This works as long as the dates remain relatively stable. However, sometimes financial institutions annotate a new date for a previously fetched transaction. In the example above, if a bank decides to re-annotate previously fetched transaction A from May 1st to April 30th, transaction A will no longer be contained in the Plaid-provided date range. Most implementations remove transaction A from the local database in this case, but this omits a valid transaction from their local database.
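The failure mode can be reproduced in a few lines. This is a sketch under assumptions: naive_removals is a hypothetical helper implementing the "absent from the window means removed" rule, and the year and amount are made up for illustration.

```python
from datetime import date

def naive_removals(local, statement, start, end):
    """Ids the naive rule deletes: local rows in the window that the statement lacks."""
    return sorted(
        tid for tid, txn in local.items()
        if start <= txn["date"] <= end and tid not in statement
    )

start, end = date(2022, 5, 1), date(2022, 5, 19)

# The local copy still has transaction A dated May 1.
local = {"A": {"date": date(2022, 5, 1), "amount": 25}}

# The bank re-dates A to April 30, so a May 1-19 statement no longer contains it.
bank = {"A": {"date": date(2022, 4, 30), "amount": 25}}
statement = {
    tid: txn for tid, txn in bank.items()
    if start <= txn["date"] <= end
}

# A is flagged for deletion even though it is still a valid bank transaction.
to_delete = naive_removals(local, statement, start, end)
```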

Overextension of webhooks

While Plaid’s introduction of the TRANSACTIONS_REMOVED webhook provides a way for developers to delete local transactions that are no longer present bank-side, this webhook also introduces a possible consistency point of failure: when a developer misses a webhook, they also miss the only opportunity to remove an erroneous transaction from their database.

Another limitation developers face is webhook size. For large batches of removals, the webhook itself grows very large, easily into the 4MB range. Many customers do not expect webhook payloads this large, and some frameworks can have problems supporting unbounded webhook sizes.

A new mental model

Devising a solution

As we talked with more and more of our customers, we went back to the drawing board: what is the desired end state for them? It turns out that for many of our power users, although they are implementing the “keep a copy of all transactions data” pattern described above, the real goal is synchronization and consistency with Plaid data. That is their measure of integration success.

As we interviewed developers, a wishlist emerged and roughly converged on:

Integration-side fault tolerance (e.g. our customers' servers are offline for 1 hour, how do they re-fetch the right data?)
Plaid-side fault tolerance (e.g. if Plaid has a bug with their webhook or endpoint availability, how do we deal with it?)
Bank-side updates (e.g. when new information is available from the bank, how do we know, and how can we easily patch it into our application data?)

The a-ha moment: For many of our power users, their requirements fit very well with a synchronization paradigm - rather than a snapshot of transactions paradigm. We realized that we can provide synchronization-type solutions for these particular use cases - and that would help our customers build much more quickly and easily without worrying too much about edge cases. With this in mind, we began to prototype a new sync-based interface to deliver transaction data.

Designing the new API

Fortunately, synchronization is a mature and tested space with long-running research.

Contents of the updates

Plaid considered two major approaches to sending changes in our transactions database to our customers:

Diff-based

Plaid calculates the difference between the developer’s last acknowledged transaction data state and the current state, and sends only the information they need to “catch up.” If multiple changes happen between a developer’s checks, we compress these changes into the simplest patch needed.

Event-based

Plaid provides developers with a complete stack of every single change that we have received and internally reconciled from financial institutions. If multiple changes occurred on a single transaction, customers fetch all transformations from Plaid and apply them to bring their copy up-to-date with ours.

Decision

We ultimately chose the diff-based approach for three main reasons:

Simplicity: We want implementations to be as simple as possible for the majority of developers. Of the implementations we researched, all updates from Plaid ended in database storage of some sort on our customers’ side. Therefore, we want to deliver new changes in the easiest form for our customers to apply to their database. Based on our customer interviews, adds, removes, and modifies (the diff-based model) are easier for our customers than the event-based model, since event-based would force customers to write extra event compression code before they can store the updates.
Efficiency: Plaid-side changeset compression improves the efficiency and performance for developers running in a “batch” configuration. For instance, if a bank changed the description of a transaction 3 times in the past week, a weekly update job following the diff-based model will receive only the latest copy. This means that our customers do not have to worry about the outdated and irrelevant prior two changes - or have to write code to account for these edge cases.
Semantics: A diff-based model fits better with the granularity of data Plaid currently has access to. Many Financial Institutions do not provide detailed records of every historic transaction change through their APIs, so we can’t guarantee that our API will have this granularity either.
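To illustrate the difference between the two models, the following sketch collapses an event log into the kind of patch a diff-based interface would deliver. It is an illustration only, not Plaid's implementation: compress_events, the (op, id, payload) event shape, and the "added"/"modified"/"removed" keys are assumptions chosen to mirror the adds, removes, and modifies described above.

```python
_REMOVED = object()  # sentinel marking a transaction whose latest state is "removed"

def compress_events(events):
    """Collapse an ordered event log into the smallest added/modified/removed patch.

    Each event is (op, txn_id, payload) with op in {"add", "modify", "remove"}.
    """
    known_before = {}  # txn_id -> did the client already hold it before this batch?
    latest = {}        # txn_id -> most recent payload, or _REMOVED
    for op, txn_id, payload in events:
        if txn_id not in known_before:
            # A first event of "add" means the client has never seen this id.
            known_before[txn_id] = op != "add"
        latest[txn_id] = _REMOVED if op == "remove" else payload

    patch = {"added": {}, "modified": {}, "removed": []}
    for txn_id, payload in latest.items():
        if payload is _REMOVED:
            # Added and removed within the same batch cancels out entirely.
            if known_before[txn_id]:
                patch["removed"].append(txn_id)
        elif known_before[txn_id]:
            patch["modified"][txn_id] = payload
        else:
            patch["added"][txn_id] = payload
    return patch

# A week of hypothetical bank-side changes, as an event-based feed would expose them.
events = [
    ("modify", "t1", {"description": "SQ *CAFE"}),
    ("modify", "t1", {"description": "Square Cafe"}),
    ("modify", "t1", {"description": "Corner Cafe"}),
    ("add", "t2", {"description": "Grocery"}),
    ("add", "t3", {"description": "Pending hold"}),
    ("remove", "t3", None),
    ("remove", "t4", None),
]
patch = compress_events(events)
```

In this example the three description changes to t1 collapse into a single modification carrying the latest description, and t3, which was added and removed within the same batch, never reaches the developer at all.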

Decoupling webhook delivery from consistency

To overcome the consistency problem caused by missed webhooks on our customers’ side, we designed the Sync interface around a new webhook that contains no transaction information and only notifies our customers that a new update is available. This allows Sync to get rid of consistency limitations while still providing the real-time benefits of webhooks to our customers.

Ordering and delivery time of a “notify” event is unimportant–only the time that customers call Sync to update their data matters. If customers miss a webhook, they will still get the changes when the next webhook fires, or whenever the developer decides to make a manual call to the Sync endpoint. Customers can easily “catch up” on any changes that they may have missed, for example if they experience an outage that coincides with webhook delivery. Sync allows both Plaid and our customers to be more resilient and consistent.
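The decoupling can be sketched with a toy in-memory feed. Everything here is hypothetical and for illustration only: ChangeFeed, Client, the integer cursor, and the "upserts"/"removed"/"next_cursor" keys are assumptions, not the actual Sync request or response format. For brevity the sketch also merges adds and modifies into a single "upserts" map.

```python
class ChangeFeed:
    """Hypothetical stand-in for a server that serves diffs relative to a cursor."""

    def __init__(self):
        self.version = 0
        self.rows = {}  # txn_id -> (version of last change, payload or None if removed)

    def write(self, txn_id, payload):
        """Record an upsert (payload is a dict) or a removal (payload is None)."""
        self.version += 1
        self.rows[txn_id] = (self.version, payload)

    def sync(self, cursor):
        """Return everything that changed after `cursor`, plus the next cursor."""
        changed = {t: p for t, (v, p) in self.rows.items() if v > cursor}
        return {
            "upserts": {t: p for t, p in changed.items() if p is not None},
            "removed": [t for t, p in changed.items() if p is None],
            "next_cursor": self.version,
        }


class Client:
    def __init__(self, feed):
        self.feed, self.cursor, self.local = feed, 0, {}

    def on_notify(self):
        """The webhook carries no data; it only prompts a catch-up call."""
        update = self.feed.sync(self.cursor)
        self.local.update(update["upserts"])
        for txn_id in update["removed"]:
            self.local.pop(txn_id, None)
        self.cursor = update["next_cursor"]


feed = ChangeFeed()
client = Client(feed)

feed.write("t1", {"amount": 40})
feed.write("t2", {"amount": 12})
client.on_notify()                 # webhook delivered, client catches up

feed.write("t1", {"amount": 50})   # webhook for this change is lost
feed.write("t2", None)             # webhook for this removal is lost too
feed.write("t3", {"amount": 8})
client.on_notify()                 # the next webhook (or a manual call) recovers everything
```

Because the client's position lives in the cursor rather than in the webhook, the two lost notifications cost nothing: the second call returns the updated t1, the removal of t2, and the new t3 together.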

Building the API

Text is cheap, code is not

Before building Sync, we caught up with a few of our customers who had given us feedback about the Transactions API challenges they faced. We showed them documentation for the not-yet-built Sync endpoint and asked them to provide feedback. Specifically, we wanted to see whether this would fix the integration problems they had told us about, figure out what was easy to understand, and find improvements we could make to the design of Sync.

This feedback helped shape the API and led to its current form. One concrete example: while moving from four webhooks to one benefited the majority of customers from a simplicity standpoint, one customer mentioned missing the more granular completion status that the separation of two of the webhooks provided (“historical” and “initial”, for those familiar with Plaid), since they needed to trigger specific business logic based on how far along they were in Plaid’s data fetching process with financial institutions. To address this concern, we added additional status information to the new unified webhook.

Seeing is believing

After the initial API design review and feedback from our customers, Plaid internally built out a prototype of Sync in 4 months. In November 2021, we invited a few customers to try the beta version of Sync. The beta program and feedback from our customers every step of the way helped Plaid internally grow confidence and momentum in this product.

What Sync means for Plaid and developers

With the introduction of Sync, we have reduced many integration and data consistency pain points for our customers, making Plaid integrations shorter and simpler. Sync enables new developers to get started on Plaid more quickly while helping existing customers simplify their codebases. For example, with Sync, a developer can go from the very first line of code to a fully updating database of transactions in as little as a few days.

Plaid launched our API almost 10 years ago in order to help developers build transaction-driven applications. In doing so, we helped usher in a decade of fintech innovation. It has been satisfying to see how developers rely on platform companies like Plaid to provide the right mental models that enable our customers to build scalable and robust systems.

At the same time, since we have the privilege of seeing common developer pain points, we are anticipating the next few years of leading-edge changes that our model can (and must) support - and continuously working to make our platform better, more robust, more scalable, and easier to use. Today, we’re leading the fintech industry by introducing the new Sync pattern for financial data delivery. We believe this Sync mental model is the future, guided by customer feedback and the desire for faster, more consistent, and more scalable data systems.

We’re excited to see what folks are going to build on Sync! Our inbox is always open for feedback–don’t hesitate to send us a note: transactions-feedback@plaid.com.

If you’re interested in solving these types of problems at scale and helping to power the next generation of fintechs, click the button below.

Come work with us at Plaid
