Published on January 13th, 2019
We are always taught that we “should not put all our eggs in one basket” (🥚🥚🥚 + 🧺 = ☠) to avoid the risk of losing everything at once. This applies to many areas, whether skills, investing, income sources, etc. We always need to diversify.
Should we apply the same rule and diversify when it comes to cloud providers? Let's find out.
Photo by Autumn Mott Rodeheaver on Unsplash
Last December, on Christmas Day (disasters usually happen on the holidays for some reason 🧐, when everyone is off), one of our clients had their AWS account suspended. Because of the suspension, whose cause we did not know at the time, their production servers, databases and storage stopped completely. Connections to the servers and the databases timed out, and nothing could be reached.
They were using the compute service (EC2) for multiple load-balanced servers and a central caching server, the relational database service (RDS) as a central database serving all applications, and the storage service (S3) as a CDN plus an object store for everything else. Luckily the DNS was not managed by Route53, which gave us some hope of restoring backups on another cloud until the issue was resolved…
We wanted to dig deeper into the AWS account suspension to see why it happened and whether it was possible to resolve it and get everything up and running quickly. While checking the account billing (since that’s the only thing you can do for a suspended AWS account), we noticed heavy usage of very large Windows instances that had incurred tremendous charges we knew nothing about.
The server instances that we saw on the bill were the most powerful ones to date (Windows running on p3dn.24xlarge); Amazon had unveiled them that same month:
“p3dn.24xlarge has 2.5 GHz (base) and 3.1 GHz (sustained all-core turbo) Intel Xeon P-8175M processors and supports Intel AVX-512.”
Amazon states the following use cases for these machines:
“Machine/Deep learning, high performance computing, computational fluid dynamics, computational finance, seismic analysis, speech recognition, autonomous vehicles, drug discovery.”
These instances ran for a couple of days on the client’s AWS account before the suspension. What the client knows for sure is that neither they nor anyone with authorized access to the account launched them. This left us with two possible scenarios:
Unfortunately, solving the problem was taking some time, so it made sense to pursue more than one course of action at the same time.
The client has always had regular file and database backups 👍 taken hourly, daily and weekly. We concluded that it was time to temporarily deploy all servers and databases on another cloud provider from the most recent backups.
It all looked good until we realized that all backups were stored on Amazon S3 😱. That was the exact moment the last hope vanished: we could not even restore the backups, because S3 was suspended too. We learned the hard way that we should apply this saying:
“Don’t put all your eggs in one basket”
Having regular backups is just not enough; you are still not safe!
You need to have regular backups stored in more than one cloud basket, since a single cloud account can simply disappear for a reason or two.
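As a rough illustration of the "more than one basket" idea, here is a minimal Python sketch that copies one backup file into several independent destinations and verifies each copy with a SHA-256 checksum. Everything in it is an assumption made for illustration: the names `replicate_backup` and `is_safe` are hypothetical, and the "baskets" are plain local directories standing in for storage on separate cloud providers. A real setup would replace the copy step with each provider's own upload tool or SDK.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(backup: Path, baskets: list[Path]) -> dict[str, bool]:
    """Copy `backup` into every basket and verify each copy.

    Hypothetical helper: each basket is a local directory standing in
    for a bucket on a different cloud provider. Returns a mapping of
    basket path to True (copy verified) or False (copy failed).
    """
    expected = sha256_of(backup)
    results: dict[str, bool] = {}
    for basket in baskets:
        try:
            basket.mkdir(parents=True, exist_ok=True)
            copy = basket / backup.name
            shutil.copy2(backup, copy)
            # A copy only counts if its checksum matches the source.
            results[str(basket)] = sha256_of(copy) == expected
        except OSError:
            # One unreachable basket must not stop the others.
            results[str(basket)] = False
    return results


def is_safe(results: dict[str, bool], minimum: int = 2) -> bool:
    """True when at least `minimum` baskets hold a verified copy."""
    return sum(results.values()) >= minimum


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    sample = root / "site-backup.tar.gz"
    sample.write_bytes(b"pretend this is a real archive")
    outcome = replicate_backup(sample, [root / "cloud-a", root / "cloud-b"])
    print(outcome, "safe:", is_safe(outcome))
```

The design choice worth noting is that a failure in one basket is recorded rather than raised, so a suspended or unreachable provider does not prevent the remaining copies from being written.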
We looked at some of the possible options to see whether they would be sufficient for a quick recovery:
If setting up backups for every project is too much manual work (and indeed it can be), then give SimpleBackups a try.
SimpleBackups makes it a breeze to schedule automated backups of all your website files and databases in a simple dashboard. You will get alerts if any of your backups fail, and you can store your backups on different storage providers like AWS S3, Google Cloud Storage, DigitalOcean Spaces and more.