Last year at Proton, we migrated from a polyrepo architecture to a monorepo architecture to make it easier to manage the packages that are part of our front-end web application stack. We’d been facing problems for some time, and after considering our options, we decided that a monorepo would be the most suitable solution. This article explains the problems we faced with our polyrepo setup, explores the benefits of a monorepo setup, and describes our journey from polyrepo to monorepo.

Before going further, when I say “polyrepo” and “monorepo”, this is what I mean:

  • Polyrepo: A system of source code modules that depend on one another but live in separate version control repositories.
  • Monorepo: A system of source code modules that depend on one another and all live in a single version control repository.

I’m going to say “Git repositories” or just “repositories” instead of “version control repositories” from here on out. And, to be clear, Git is not a prerequisite of monorepo architecture.

The beginning

Proton started with an email client, Proton Mail, as its sole application but has since evolved into a privacy provider offering a broad range of products, including web applications for Proton Mail, Proton Calendar, Proton Drive, and the Proton Account that links them all. Adding new applications to our stack caused the number of Git repositories we maintain to grow proportionally, with one repository per application. We also created repositories beyond the ones required for our applications: our apps have to share the same functionality, look, and feel, even if they are different products, so we maintained additional repositories for code shared between products.

As an example, we used to have a separate repository for shared React components. This was the result of a natural evolution of our existing systems. However, sharing code across codebases became increasingly complex as we added more applications and products, making it hard to manage packages under this multi-repository structure. There are several reasons this system didn’t scale well.

Our main issue with our polyrepo

During and after our transition to a monorepo, we started seeing how we could benefit from its architecture. However, one issue in particular — the unnecessary and wasteful replication of administrative tasks — drove us to look into the monorepo option in the first place. Whenever a feature required changes across multiple projects (e.g., adding a React component for a new feature inside the Proton Mail application), the accompanying Git administration became highly impractical. To prepare a single feature, we had to mirror Git operations — branching, committing, opening merge requests, reviewing, rebasing, etc. — across many repositories.

We then came across the idea of “atomic changes”, which resonated with us, even if it represented a shift in our philosophy. The main idea behind atomic changes is that instead of scoping changes to a technical concern of your project(s), you scope them semantically, as coherent chunks of change to your product’s functionality. There’s no reason to split up changes that intrinsically affect our shared UI components and (for example) the Proton Mail application if they all address the same concern. Such semantically connected changes should be:

  • Grouped under the same change, diff, and commit
  • Reviewable simultaneously (not in two separate merge requests)
  • Revertible as one unit.

A monorepo allows us to achieve this as it naturally supports atomic changes in the form of Git commits.
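
To make this concrete, here’s a rough sketch of what an atomic change looks like in day-to-day Git terms (the file paths are invented for illustration):

    # One feature, one commit: the shared UI component and the app
    # that consumes it change together and land together.
    git checkout -b feature/attachment-preview
    git add packages/components/AttachmentPreview.tsx \
            applications/mail/src/components/MessageView.tsx
    git commit -m "Add attachment preview to the Mail message view"

The whole feature then goes through a single merge request and, if it ever misbehaves, a single revert.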

In the polyrepo, testing code before accepting and merging it to the main branch also proved challenging, especially from a CI/CD automation point of view. Builds had to include versions of dependencies that weren’t yet on the main branch of their respective repositories. Nonetheless, with some CI/CD hacking and trickery, we could get the job done, and it was possible to send features through the development lifecycle successfully.
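
To give a flavor of that trickery (a generic polyrepo workaround, not necessarily the exact approach we used), a CI job can be pointed at an unmerged branch of a dependency’s repository; the URL and branch name here are hypothetical:

    # Make the app build against the unmerged feature branch of the
    # shared components repository instead of its latest release.
    yarn add https://example.com/proton/react-components.git#feature/attachment-preview
    yarn build

It works, but every such pipeline has to know which branch of which repository belongs to which feature.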

We also weren’t using semver and registry hosting to version our packages (and still aren’t), which would have been one way to address some of these issues. However, semver would have been far from a silver bullet for our needs, and it comes with its own baggage, such as complexity around managing hosted packages, publishing them, and versioning them on consumption.

A polyrepo architecture has many other minor, inconvenient quirks given our needs; I’ll go into more of the problems we faced while discussing the advantages of our monorepo. Beyond developer experience, our polyrepo also presented inherent technical issues. One tangible example: we couldn’t roll back to previous versions on a cross-repository basis. If a new feature that affected multiple repositories was merged and then turned out to have an issue, rollbacks were hard to perform automatically, as no single operation can revert separate Git histories simultaneously.
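
In a monorepo, by contrast, the same cross-cutting feature lands as a single merge commit, so undoing it everywhere is one operation:

    # Revert the feature across every affected app and package at once
    # (placeholder hash; -m 1 reverts relative to the merge's first parent).
    git revert -m 1 <merge-commit-hash>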

These issues were slowly piling up, and it became apparent that we needed a solution. After some consideration, that solution turned out to be migrating to a monorepo architecture.

Weighing our options

With the decision to migrate locked in, we had to devise a plan.

At that time, we had about 15 developers on the Front-end team working on our web application stack. Additionally, many people from other teams, such as Crypto or Back-end, would also frequently contribute to our repositories. Having many people actively working on these repositories meant that the physical migration would need to happen fast, and the implementation would have to be robust once we were on the other side. Otherwise, we risked blocking our colleagues’ work for an extended period of time.

To ensure a robust implementation, we spent quite some time researching different tools and experimenting with proofs of concept. For each option, we checked how it felt to work with and whether we could get it to behave the way we wanted. We explored different package managers (specifically, npm, yarn, pnpm), semantic versioning with a hosted registry, different types of dependency installations, lockfile management, and more.

In the end, we decided to go very bare bones. We chose Yarn (Berry) with Yarn Workspaces, a single lockfile at the root of the monorepo, no semantic versioning, and no zero-installs. We arrived at these decisions because we wanted as little overhead as possible, mature tools, and tools our team was already familiar with.
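
As a rough sketch (not our exact configuration), the root of such a setup stays small: a private root package.json declaring the workspace globs, and a .yarnrc.yml opting for a plain node_modules install rather than zero-installs:

    # Root package.json: every app and shared package is a workspace.
    cat > package.json <<'EOF'
    {
      "private": true,
      "workspaces": [
        "applications/*",
        "packages/*"
      ]
    }
    EOF

    # .yarnrc.yml: regular node_modules linking, no committed Yarn cache.
    cat > .yarnrc.yml <<'EOF'
    nodeLinker: node-modules
    EOF

    yarn install   # generates the single yarn.lock at the repository root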

All the potential benefits of a monorepo

A key moment during our research on monorepos was realizing that, while this architecture would certainly deal with the problems we were facing, these systems offered so much more. Monorepos provided many benefits we hadn’t necessarily considered, most revolving around developer collaboration.

We argued that monorepo architecture would incentivize people to collaborate on projects they don’t necessarily own by making all of the code visible, thus empowering developers to implement simple fixes. Instead of being forced to look for help because you’re looking at a black box, you might be able to implement a necessary change yourself since all of the code would be easily accessible.

Monorepos would also likely make large-scale refactoring a possibility, as we would be able to change huge parts of different projects with unified commits. Since all of the interdependent source code would now be hosted in the same Git repository, the availability and file system location of any piece of code would be predictable. That would make it possible to provide utilities for performing any action necessary to work with the monorepo locally or in continuous integration (CI), e.g., environment configuration, dev servers, builds, checks, automated symlinking, lockfile management, and more. We were pretty hyped about it, to say the least.
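
Yarn Workspaces already covers a surprising amount of this. For instance (a sketch; the workspace name and script are hypothetical):

    # Start one app's dev server from anywhere in the repository:
    yarn workspace proton-mail start

    # Build every workspace in dependency order (Yarn 4 syntax; older
    # Yarn Berry releases need the workspace-tools plugin):
    yarn workspaces foreach --all --topological run build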

After arriving at a monorepo blueprint that we were happy with, we put together a presentation for the rest of the team, presented our findings and proof-of-concept, collected feedback, and iterated upon it. We wanted to make sure that we wouldn’t create a setup that someone would be unable or unhappy to work with. It was well received, and we decided to move forward.

The physical migration

As we prepared to migrate, our main objective was to avoid disrupting ongoing work. We wrote a script that would take all the existing repositories from our polyrepo setup, merge their Git histories into a single history, and fill in the gaps necessary to realize the full monorepo. This script could generate our entire monorepo at the execution of a command, which meant that we could create the monorepo at any instant, no matter what state the polyrepo was currently in. This was much better than having to shut down development while we manually built the monorepo from the polyrepo.
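
Our real script did more than this (path rewriting, branch porting, filling in workspace configuration), but the core of merging histories can be sketched with plain Git; the repository names and URL are hypothetical:

    #!/bin/sh
    # Graft each polyrepo's full history into its own subdirectory
    # of the monorepo, one merge commit per imported repository.
    set -e
    for repo in proton-mail proton-calendar react-components; do
      git subtree add --prefix="packages/$repo" \
        "git@example.com:proton/$repo.git" main
    done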

The full implementation also saw a complete rewrite of our CI for all the app and package checks and deployments, which was quite a big part of the transition. Adjusting and writing CI for a monorepo will be covered in its own article at a later date.

Once everything was ready and set up, we set a date for the migration: a Saturday. We chose a weekend day so people could go home, leave their work behind on a Friday, then come back the following Monday and find what they had been working on now inside the monorepo.

At this point, we considered the polyrepo deprecated because we didn’t want to maintain multiple conflicting Git histories continuously. To ensure that no work got lost, we compiled a list of all the active branches people wanted salvaged and ported over (we added support for this in our monorepo creation script).
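
One way to port such a branch (not necessarily what our script did) is to replay its commits under the package’s new path prefix; the names and paths here are hypothetical:

    # In the old repository: export the branch's commits as patches.
    git format-patch origin/main..feature/attachment-preview --stdout > feature.patch

    # In the monorepo: replay them under the new location of that code.
    git checkout -b feature/attachment-preview
    git am --directory=packages/react-components feature.patch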

On the other side

As unrealistically ambitious as the plan sounds on paper, it worked out for us quite smoothly! During the first week after the migration, a few pipelines failed, and some incomplete bits of code that had been left behind in the polyrepo setup had to be ported over manually post-transition. Apart from these and a few other minor hiccups, everything went well. Nobody was seriously blocked from continuing their work, and now that the migration is complete, nobody has looked back.

We’ve discovered the monorepo offers even more benefits than anticipated since the migration. It’s much easier to onboard people to our codebase now, thanks to a near one-command setup of the local development environment. A small internal community has developed around it, and it’s not just members from the Proton Front-end team. It includes anyone interested in monorepo architecture and anyone who works with ours. In this community, we talk about:

  • Monorepos in general (and our WebClients monorepo in particular)
  • Helping each other with issues around the monorepo
  • Proposing improvements to our monorepo’s workflow.

Most importantly, we’re now all speaking the same language when it comes to Git workflow and administration. Since it’s all one Git repo now, we’ve also normalized Git guidelines across the different front-end feature teams and configured the rules of our Git hosting tool (e.g., merge rules) once for the entire monorepo.

Conclusion

In retrospect, this monorepo implementation has exceeded our expectations. It’s a good solution given our needs, and we’re happy we went with it! The improvement in developer experience led to a notable boost in productivity. It’s still not a silver bullet, and there are many challenges that come with it, but for us, these challenges are heavily outweighed by the benefits it has delivered. We hope this baseline package architecture will hold up and allow us to scale and add any other required packages with ease for the foreseeable future.

The Git repository discussed in this article is open source and can be found at https://github.com/ProtonMail/WebClients.
