Update about the service interruption on Feb. 1, 2021

Share this page

Proton’s servers experienced an outage on Monday that interrupted service for users for about one hour. In line with our policy of full transparency and disclosure, we are publishing the results of our internal investigation into the incident.

The outage began around 7:30 AM CET and impacted all Proton services for approximately one hour. No user data was lost, and this was not the result of any attack or breach. We have identified and corrected the problems that led to the downtime. Moreover, our work over these last few days will inform a number of new tests and safeguards that will make Proton more resistant to problems, even under abnormal conditions.

We understand that even a few minutes of downtime is disruptive to our community. We will be taking steps to ensure our systems are even more resilient in the future. This article details some of those steps, as well as a timeline of the issues.


At 7:22 AM CET on Monday, we switched our API services into offline mode to perform a scheduled maintenance to upgrade some of our database hardware, which was successful. The intervention should have taken less than five minutes.

However, four minutes later, at 7:26 AM, we noticed spikes in system CPU and memory utilization across our API server instances, which almost instantly brought all applications to a halt.

As most of the server fleet became unresponsive due to too-high CPU and memory usage, we were not able to recover quickly. First we attempted to stop traffic from reaching the impacted servers so they could “cool down.” However, that was ultimately ineffective; even with no web traffic reaching the servers, we could not connect to them. The servers could only be recovered via a hard reboot triggered from out-of-band-management. For security reasons, this can only be done manually by a small number of engineers, which means it is not a fast process. 

When the servers came back up, some of them also had improper configurations because some of them had gone down before the config changes we were pushing out had all been applied. This required an additional server-by-server check before we could bring the server fleet back online. In total, we needed approximately one hour to restore all services. 


Each server has a local cache of certain critical configuration data. This data expires in a time period that varies between three and four minutes, and each server then pulls an updated copy. While cache expiration times are normally staggered, setting all services to offline mode had the unfortunate side effect of approximately synchronizing them, as part of this operation resets this cache.

This configuration cache normally should not be needed at all during offline mode. However, there was a bug in our application code where authenticated requests (those of logged-in users, the vast majority of requests) erroneously required this data, even when offline.

What this meant was that offline mode turned into a ticking time bomb. After about three minutes, on every server, the configuration cache expired, triggering requests to a database which at this point was still unreachable. This meant that requests that usually took milliseconds to handle, now required up to 10 seconds as they waited for a timeout. The spike in response times caused API requests to pile up, triggering a massive memory and CPU usage spike which rendered all affected servers unresponsive in a matter of seconds.

We do have some protections in place against “thundering herd” problems such as this, but for reasons which are still under investigation, they were not sufficiently effective to prevent the problem.

To recap, here is the sequence of events and errors:

  1. Each server manages its own cache of configuration data which expires at semi-regular intervals. This data should not be required in offline mode.
  2. Because of a bug, authenticated requests to the servers (the vast majority of requests) triggered dependence on this cached data.
  3. The cached data expired three or four minutes after offline mode was entered, which triggered many requests to an offline database.
  4. This flood of requests resulted in memory and CPU usage spikes on the API servers, which rendered them unresponsive.


Our first priority was to fix the two code issues that caused the immediate problem. There is no longer database dependence in offline mode for authenticated requests, and the configuration cache is being reworked to continue to use the “stale” data in the event of a refresh failure rather than clearing it entirely. This will fully prevent this particular issue from recurring in this manner.

We also continue our efforts to fully simulate the failure so that we can build in additional safeguards at different levels of our technology stack. This in-depth defense is essential for ensuring that one bug cannot create a fatal cascade of errors.

This outage has also revealed the need to be better when it comes to thoroughly stress testing our infrastructure and code in abnormal conditions, such as when various dependent services are offline. Offline mode tests, for instance, must include various types of requests, including authenticated ones, and should have gone on for longer to trigger the cache expiration issue. It also highlights the need for more regular testing. While offline mode has been tested in the past, between the last test and the incident on Feb. 1, Proton’s traffic has grown so dramatically that more tests should have been conducted even though none of the software involved has been changed. 

We are also changing our standard operating procedures for maintenance operations to ensure that more engineers are on-call in case something unexpected happens. One of the reasons for the longer recovery time during this incident was the time required to call additional engineers to accelerate the recovery.

This was a highly unusual downtime and stressful both for our team and our users. We apologize for this inconvenience, and we’re grateful for your trust as we work to provide the highest levels of privacy, security, and reliability to the Proton community.

Best regards,
The Proton Team


Feel free to share your feedback and questions with us via our official social media channels on Twitter(new window) and Reddit(new window).

Protect your privacy with Proton
Create a free account

Share this page

Proton Team

We are scientists, engineers, and specialists from around the world drawn together by a shared vision of protecting freedom and privacy online. Proton was born out of a desire to build an internet that puts people before profits, and we're working to create a world where everyone is in control of their digital lives.

Related articles

Last week, the Spanish Presidency of the European Council delayed a vote regarding the Council’s position on the controversial Child Sexual Abuse Regulation (CSAR) due to a lack of consensus over the issue of encryption, among others. This proposed r
At Proton, we’re always working on new and innovative ways to protect the privacy and data of the Proton community. Sometimes that means developing entirely new services, like our Proton Sentinel program, which combines AI and human security analysts
How to unsend an email in Gmail, Outlook, Proton Mail, and Apple Mail
“Undo Send” gives you a chance to stop an erroneous message you’ve just sent. We’ve all done it. You hit Send on an email only to spot you’ve misspelled someone’s name, forgotten an attachment, or accidentally sent a cringing joke to half your conta
Google has already taken privacy washing to the extreme by trying to brand itself as “privacy focused”, even though its business model is based on surveillance.  Lately, the company’s marketing strategy has turned toward outright Orwellian doublespe
Last week, the UK government made a statement in the House of Lords acknowledging that portions of the controversial Online Safety Bill might not even be technically enforceable without breaking end-to-end encryption. This rightly received a lot of a
What is email spoofing?
Email spoofing is a technique attackers use to make a message appear to be from a legitimate sender — a common trick in phishing and spam emails. Learn how spoofing works, how to identify spoofed messages, and how to protect yourself from spoofing a