One line of code that cost $8,000

Due to a simple bug, the Screen Studio app generated over 2 petabytes of network traffic
Adam Pietrasiak

TLDR

Due to a bug, the screen.studio app kept downloading its auto-update file every 5 minutes, for every single user. The update file is about 250 MB. This resulted in 9 million file downloads and more than 2 petabytes (2,000,000 gigabytes) of traffic on Google Cloud.
[Screenshot: network traffic graph from Google Cloud]
This screenshot might not look scary at first, but look at the scale. For over a month, we generated at least 100 MiB/s and, at times, almost 1 GiB/s of traffic. Every single second!

That bug was painfully simple and stupid.
Screen Studio is a desktop app, which means it needs an auto-updater so users can easily install the latest version.
The app checks for updates every 5 minutes, and also whenever the user activates the app.
Normally, when the app detected an update, it downloaded it and stopped the 5-minute interval until the user installed the update and restarted the app.
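To make that intended flow concrete, here is a minimal TypeScript sketch under stated assumptions. `AutoUpdater`, `Scheduler`, `fetchLatestVersion`, and `downloadUpdate` are hypothetical names, not Screen Studio's actual code, and the network calls are modelled as synchronous callbacks to keep it short (a real updater would await them). The important step is the `stop()` call right after a successful download.

```typescript
// Hypothetical names throughout; this is not Screen Studio's actual updater.
const CHECK_INTERVAL_MS = 5 * 60 * 1000; // poll every 5 minutes

// Wraps setInterval/clearInterval so the logic can be exercised without real timers.
interface Scheduler {
  every(ms: number, fn: () => void): () => void; // returns a cancel function
}

const realScheduler: Scheduler = {
  every(ms, fn) {
    const id = setInterval(fn, ms);
    return () => clearInterval(id);
  },
};

class AutoUpdater {
  private cancelTimer: (() => void) | null = null;
  private downloadedVersion: string | null = null;

  constructor(
    private readonly installedVersion: string,
    private readonly fetchLatestVersion: () => string, // assumed synchronous for brevity
    private readonly downloadUpdate: (version: string) => void,
    private readonly scheduler: Scheduler = realScheduler,
  ) {}

  // Call on startup: begins polling and checks immediately.
  start(): void {
    if (this.cancelTimer !== null || this.downloadedVersion !== null) return;
    this.cancelTimer = this.scheduler.every(CHECK_INTERVAL_MS, () => this.check());
    this.check();
  }

  // Runs on every interval tick and whenever the user activates the app.
  check(): void {
    if (this.downloadedVersion !== null) return; // an update is already waiting to be installed
    const latest = this.fetchLatestVersion();
    if (latest === this.installedVersion) return; // nothing new yet
    this.downloadUpdate(latest);
    this.downloadedVersion = latest;
    this.stop(); // stop polling until the user installs the update and restarts
  }

  stop(): void {
    this.cancelTimer?.();
    this.cancelTimer = null;
  }
}
```

The `downloadedVersion` guard in `check()` is a second line of defence: even if the `stop()` call were ever lost, later ticks and activation checks would return early instead of downloading the same 250 MB file again.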

Tragic refactor

The problem with our old auto-updater was that it prompted the user to update as soon as a new version became available. As a result, a popup could appear while the user was recording their screen and interrupt the recording, which was obviously a bad experience.
While refactoring it, I forgot to add the code that stops the 5-minute interval once the new version has been downloaded.
As a result, the app downloaded the same 250 MB file over and over again, every 5 minutes.

Tragic context: the app running in the background for weeks

It turns out that thousands of our users had the app running in the background for weeks (!), even though they were not using it or even looking at it. That meant thousands of auto-updaters were constantly running and downloading the new version file (250 MB) over and over again, every 5 minutes.

The math

Let’s do some quick math here.
  • Doing something every 5 minutes means doing it 288 times a day (24 × 60 / 5).
  • The update file is about 250 MB, meaning 72 GB of downloads per user daily.
  • We had this situation happening for over a month before we noticed it.
  • We had at least a thousand such app instances running in the background at any moment.
  • 250 MB × 288 downloads per day × 30 days × 1,000 users = 2,160,000,000 MB.
That comes to roughly:
  • 2,000,000 gigabytes,
  • or 2,000 terabytes,
  • or 2 petabytes of traffic.
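The same back-of-the-envelope estimate, written as a few lines of TypeScript using only the numbers from the list above (the constant names are ours, and decimal units are assumed, 1 GB = 1,000 MB, as in the post). It also shows where the "9 million downloads" figure comes from: 288 × 30 × 1,000 = 8,640,000. Because the instance count is a lower bound, so are all the totals.

```typescript
// Inputs from the article; decimal units (1 GB = 1,000 MB).
const FILE_SIZE_MB = 250;
const CHECK_INTERVAL_MIN = 5;
const DAYS = 30;
const BACKGROUND_INSTANCES = 1000; // "at least a thousand" at any moment

const downloadsPerDay = (24 * 60) / CHECK_INTERVAL_MIN; // 288
const perUserDailyGB = (FILE_SIZE_MB * downloadsPerDay) / 1000; // 72
const totalDownloads = downloadsPerDay * DAYS * BACKGROUND_INSTANCES; // 8,640,000
const totalMB = FILE_SIZE_MB * totalDownloads; // 2,160,000,000
const totalGB = totalMB / 1000; // 2,160,000
const totalPB = totalMB / 1e9; // 2.16

console.log({ downloadsPerDay, perUserDailyGB, totalDownloads, totalGB, totalPB });
```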

Series of bad mistakes

We did not have cost alerts on Google Cloud. Before this situation occurred, we were paying at most $300 a month.
We also weren't checking our cloud usage regularly, because everything just worked.
We only noticed because my credit card started blocking the charges due to limits I had set on it (lucky me!).

Consequences for the users

It was not only bad for us; for some users it was even worse.
All this traffic came from our users' own machines, which means it went through their home routers and their internet providers.
One user, who lived in a house, had their contract cancelled by their internet provider because of the enormous traffic generated over a month. This was extremely problematic, as no other internet provider was available in the area.
We decided to take responsibility and offered to cover all the costs related to this situation.
Luckily, this was not needed, as the user was able to sort things out with the provider without major problems.
Still, it was a terrible experience for that person and for me. As a designer, I care about the experience my products provide to their users. And this was not merely a bad experience; it was actually harmful.

Summarising

  • Always set up cost alerts on your cloud.
  • Write your auto-updater code very carefully.
  • In fact, carefully write any code that has the potential to generate costs.
  • Add special signals that you can change on your server and that the app understands, such as a forced update that installs without asking the user.
  • Check your cloud usage regularly.
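One way such server-side signals could work, sketched under assumptions: the app periodically fetches a small JSON document from your own server and decides what to do based on it. The field names `updatesEnabled`, `forceUpdate`, and `minimumVersion`, and the functions `parseSignals`, `decideAction`, and `compareVersions`, are all hypothetical. The `updatesEnabled` kill switch is an extra signal we assume here; it would let you switch off a misbehaving updater without shipping a new build. Plain dotted numeric versions (such as `1.10.0`) are assumed.

```typescript
// Hypothetical shape of a JSON document the app polls from your own server.
interface RemoteSignals {
  updatesEnabled: boolean; // kill switch for the auto-updater
  forceUpdate: boolean; // install without asking the user
  minimumVersion: string; // installed versions below this must update
}

type Action = "disable-updater" | "force-install" | "normal";

// Compares dotted numeric versions: returns -1, 0, or 1.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Missing fields fall back to conservative defaults.
function parseSignals(json: string): RemoteSignals {
  const raw = JSON.parse(json) as Partial<RemoteSignals>;
  return {
    updatesEnabled: raw.updatesEnabled ?? true,
    forceUpdate: raw.forceUpdate ?? false,
    minimumVersion: raw.minimumVersion ?? "0.0.0",
  };
}

function decideAction(signals: RemoteSignals, installedVersion: string): Action {
  if (!signals.updatesEnabled) return "disable-updater"; // kill switch wins
  if (signals.forceUpdate || compareVersions(installedVersion, signals.minimumVersion) < 0) {
    return "force-install";
  }
  return "normal"; // keep the usual prompt-the-user flow
}
```

Letting the kill switch take precedence over everything else is a deliberate choice in this sketch: when the updater itself is the problem, turning it off should not depend on any other flag.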