Your Data are Pets (Not Cattle)

You still have a pet in the cloud, and her name is Data (or whatever you decide to name her – she is yours, after all)

Anyone who has been through Amazon’s AWS Cloud 101 (not a real class) has heard the Cattle vs. Pet metaphor, which was coined some time ago by Bill Baker (ironically from Microsoft).

If you’re unacquainted with the concept, the thought process goes something like this: In a traditional IT shop, your servers are treated like pets. They live with you (if you own your datacenter), you give them names, you check in on them to make sure they’re feeling ok, and you tend to them when they’re sick. Bad CPU? Replace it. Blue screen? Troubleshoot it (sometimes for days). You paid tens of thousands of dollars for this block of metal, and you’re not about to let that go to waste.

Enter the cloud. Here, everything that makes your server unique is (hopefully) defined by code.  It’s not a block of metal in the basement – it’s a logical object that can be blown away and recreated at will. Here, your servers are cattle: nameless clones that can be easily replaced. If it’s not performing like you expect, you don’t nurse it back to health, you put it out of its misery and replace it with another one just like it (except the new one works).

There are two things you need to know about me right off the bat. First, I absolutely buy into this concept. Why spend hours (or days) troubleshooting an issue, when you can just kill the server and spin up a new one? Why pay for a permanent DEV environment (or twelve) that sits idle on nights and weekends? Why patch web servers when you can just add new ones to the pool and kill the old ones? Kill them. Kill them all. We can resurrect them later (if you can’t, you’re doing it wrong).

The second thing you need to know about me is that I’m a Database Administrator. We DBAs aren’t known for throwing caution to the wind when it comes to… well, anything. We back up everything. We secure everything. We treat every server as production whether we need to or not. We are the gatekeepers of very valuable data, without which the companies that employ us cannot operate. The result is that we end up owning a lot more of these “pets” than just about any other layer of the technology stack.

It’s probably not surprising, then, that the first time I heard the Cattle vs. Pet argument, I went full Luddite. Instinctively, it just felt wrong. “How dare you suggest I destroy Whiskers every night?!”

Once I was able to get over the mental hurdle of treating my beloved data pets with callous disregard, however, I was able to see the issue with some clarity.

And my initial reaction was half right:  Yes, the vast majority of your infrastructure is now generic, replaceable “cattle”.

— BUT —

There’s one pet that you have to bring with you, even to the cloud: Your production data store.

Your production data should under no circumstances be treated like cattle. The underlying servers and schema are cattle, but the data isn’t. You don’t just blow it away.

Copies of the production data store? Sure – Truncate tables. Mangle the schema. Burn it with fire when you’re done. Copies of the data are absolutely cattle, which means you can spin up and tear down multiple non-prod versions of your app without worrying about keeping the data in a valid state. But your production data store is and will continue to be a beloved pet.

Go ahead, name it.

In fact, your data in the cloud arguably requires more care and feeding than it did on-premises. Backups and patching might be taken care of automatically if you’re using a PaaS service like Amazon RDS, but now you have to worry about things like SSL and encryption-at-rest (the physical data isn’t in the basement anymore), unencrypted connection strings on web and app servers (which you may or may not have control over), and myriad other aspects of the technology stack that other teams just “took care of” on-premises (public vs. private subnets, VPC peering, firewall rules/security groups, CIDR notation, and the list goes on…).

My goal for this blog is to help other DBAs who are making the transition from on-prem to the cloud, primarily focusing on AWS since that’s where I’m spending all of my time these days. I expect the majority of my posts to be very practical tips and tricks for working with databases in the cloud, and there will most certainly be some opinion sprinkled in for flavor.

I’m learning as I go, and I’m inviting you (and your pet) to learn with me.

Let’s go.

Data Modeling with NoSQL

This was one of my favorite moments from re:Invent 2015, and I thought it was worth re-sharing.

After giving an overview of the DynamoDB service, Rick Houlihan (Principal Solutions Architect with AWS) shared this pithy pearl of wisdom:

“Data in NoSQL is not non-relational.”

“WHAT? I thought we didn’t have to worry about relationships anymore!!”, cried the masses.


This is one of the most important things to remember as you make the pivot from an RDBMS like SQL Server or Oracle to a NoSQL platform like DynamoDB or MongoDB. Relationships still exist in your data; your schema is just more flexible. Failing to account for relationships in your data, even in NoSQL, tends to result in poor performance and substandard design patterns.

Here Houlihan shows us how to model one-to-one, one-to-many, and many-to-many relationships in DynamoDB. The same general concepts can be applied to other NoSQL platforms like MongoDB, though the specific implementation will be different (obviously).
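To make that concrete, here’s a sketch of the single-table style of modeling a one-to-many relationship in DynamoDB (a customer and its orders sharing a partition key). The table layout, key names, and data below are my own illustration, not taken from Houlihan’s talk:

```python
# Sketch: a one-to-many relationship (customer -> orders) in a single
# DynamoDB table. All names and values here are illustrative.

# Every item for a customer shares the same partition key (PK); the
# sort key (SK) distinguishes the customer record from its orders.
items = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE",        "name": "Whiskers Inc."},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2016-001", "total": 19.99},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2016-002", "total": 5.00},
]

def orders_for(customer_pk, table):
    """Simulates Query(PK = :pk AND begins_with(SK, 'ORDER#'))."""
    return [i for i in table
            if i["PK"] == customer_pk and i["SK"].startswith("ORDER#")]

# A single Query fetches the customer's orders together -- the
# relationship lives in the key design, not in a JOIN.
print(len(orders_for("CUSTOMER#42", items)))  # 2
```

The relationship is still there; it has just moved out of foreign keys and into the structure of your keys.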

If the video below doesn’t take you there automatically, the real magic starts at 26m 40s into the talk.


How To Share an Encrypted RDS Snapshot With Another AWS Account

Let the heavens rejoice; we can now share encrypted RDS snapshots between accounts within an AWS region.

Why am I so excited about this? Mainly because it allows us to more readily adopt a microservices approach to AWS account management – something I plan to write about in a future post.

For now, just take this simple piece of advice: Don’t run your prod resources in the same AWS account as your dev/test (non-prod) resources. And since everything but your data is defined by code (it is, right?), it’s easy to deploy your application stack into any AWS account.

But how do you copy your database schema and data from one account to another? In the past, you had to 1) Export the schema manually, 2) Export the data with something like BCP, 3) Transfer the data, 4) Apply the schema to your new instance, 5) Import the data 6) Hope and pray that you didn’t miss any database settings, schema objects, or rows of data in the process.

Each time you wanted to copy data down from prod to dev, this complicated, fragile process had to be repeated. Why? Because restoring from a snapshot wasn’t supported between accounts. The only way around it was to keep your prod and non-prod resources in the same account (DANGER!).

Apparently I wasn’t the only one complaining to Amazon about this shortcoming, and they took a major step forward late last year by announcing support for sharing unencrypted database snapshots between accounts. This solved half the problem, but one major issue remained: What about my encrypted RDS instances? If you’re entrusted with sensitive, personally identifiable information (PII), then end-to-end encryption should be a top priority in the public cloud, including data at rest (yet another reason I argue you should treat your data as pets, not cattle).

Alas, there was still no viable option for a simple snapshot-based data overlay from one account to another.

Just when we (I) had all lost hope, Amazon announced four short months later that they now support sharing encrypted RDS snapshots between accounts!

At long last, we can use an AWS-native method to achieve a database overlay, even with encrypted volumes.

While this is great news, the process isn’t exactly straightforward. There are several moving pieces involved, which is why I decided to provide a walkthrough of the process from start to finish.

I’m going to show you how to accomplish the job through the console because it helps illustrate the concepts, but you should never use the console in production for anything but to verify your code worked. Get comfortable with the steps, then use your favorite AWS SDK or Command Line Tools to complete the same tasks in your projects.

And Now for the Main Event

There are 4 major tasks involved in sharing an encrypted RDS snapshot with another AWS account. They are:

  1. Create and share a custom KMS encryption key.
  2. Create an encrypted RDS instance using the KMS key you created.
  3. Create and share a snapshot of the encrypted RDS instance.
  4. Copy the shared snapshot to the target account.

(Note that you can only share encrypted snapshots within the same region. If you want to share a snapshot across regions, you cannot use encrypted storage.)

Without further ado, let’s get started.

Step 1:  Create and Share a Custom KMS Encryption Key

Before you can share an encrypted RDS snapshot, you must create a new KMS key. You can’t use the default key because you can’t share access to it. If you want to share an existing snapshot of an encrypted instance that uses the default aws/rds KMS key for encryption, you’ll have to make a copy of it using your new custom KMS key first (similar to what you’ll see in step 4).

  1. Go to Identity and Access Management (IAM).
  2. Select “Encryption Keys” from the menu on the left.
  3. Click “Create Key”.
  4. Provide an alias and description for the new key and select “next step”.
  5. Choose the IAM users and roles that can administer the new KMS key through the KMS API.
  6. Choose whether the key administrators you chose in the last step can delete the key.
  7. Choose the IAM users and roles that can use the new KMS key.
  8. IMPORTANT:  Click “Add an External Account” and enter the target account number. This is what allows the target account to decrypt the encrypted snapshot later on.
  9. Preview the key policy JSON document, and click finish.
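If you’re curious what step 8 actually does to the policy you preview in step 9, the heart of it is a statement that grants the external account use of the key. Here’s a hand-built sketch of that statement (the account numbers are placeholders, and this is my approximation, not the exact JSON the wizard emits):

```python
# Sketch of the key-policy statement that lets the TARGET account use
# the custom KMS key. Account IDs are placeholders.
import json

SOURCE_ACCOUNT = "111111111111"   # account that owns the key
TARGET_ACCOUNT = "222222222222"   # account you are sharing with

def external_use_statement(target_account):
    """Key-policy statement granting an external account use of the key."""
    return {
        "Sid": "AllowUseOfTheKeyByExternalAccount",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::" + target_account + ":root"},
        "Action": [
            "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
            "kms:GenerateDataKey*", "kms:DescribeKey",
        ],
        "Resource": "*",
    }

policy = {"Version": "2012-10-17",
          "Statement": [external_use_statement(TARGET_ACCOUNT)]}
print(json.dumps(policy, indent=2))
```

Without that principal in the key policy, the target account has no way to decrypt anything encrypted with the key, and the rest of this walkthrough falls apart.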
Step 2:  Create an Encrypted RDS Instance Using the New KMS Key

Now that your KMS key is created, you can create an RDS instance with encrypted storage using this key.

  1. From the AWS Console, select the RDS service.
  2. Click Launch a DB Instance.
  3. Select your engine and version. Here, I’ll select SQL Server Standard Edition.
  4. If you’re deploying the instance to PROD, select “Production” (which gives you Multi-AZ high availability and Provisioned IOPS). Otherwise select DEV/Test, as I am doing here.
  5. On the “Instance Specifications” page, select all the options you typically would with one exception: The instance class has to be at least db.m3.medium to support encryption (as of this writing).
  6. On the next page, I suggest using a non-default VPC, a private subnet group, a non-default port, and don’t allow the instance to be publicly accessible (but choose your own adventure).
  7. IMPORTANT:  Select “Yes” for enable encryption, and select the new KMS key you created earlier under “Master Key”.
  8. Finalize your parameter choices, and click “Launch DB Instance”.
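Since I told you earlier not to use the console for real work, here’s what that same launch boils down to as API parameters, which you’d hand to boto3’s `rds.create_db_instance()` or the equivalent CLI command. The identifiers, password, and key alias below are placeholders:

```python
# Sketch: launching an encrypted RDS SQL Server instance via the API.
# All identifiers, the password, and the key alias are placeholders.
create_db_instance_params = {
    "DBInstanceIdentifier": "my-encrypted-sql",
    "Engine": "sqlserver-se",               # SQL Server Standard Edition
    "LicenseModel": "license-included",
    "DBInstanceClass": "db.m3.medium",      # minimum class that supports encryption
    "AllocatedStorage": 200,
    "MasterUsername": "admin",
    "MasterUserPassword": "change-me",
    "PubliclyAccessible": False,            # keep it in a private subnet
    "StorageEncrypted": True,               # step 7: enable encryption...
    "KmsKeyId": "alias/my-shared-rds-key",  # ...with your custom KMS key
}
# Then: boto3.client("rds").create_db_instance(**create_db_instance_params)
```

The two lines that matter for this walkthrough are `StorageEncrypted` and `KmsKeyId`; everything else is the usual instance plumbing.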
Step 3:  Create and Share a Snapshot of Your Encrypted RDS Instance

Now the moment we’ve all been waiting for. Let’s create a snapshot and share it with another account.

  1. From the RDS Dashboard, select “Snapshots”.
  2. Click “Create Snapshot”.
  3. Choose your DB Instance and give the snapshot a name, then click “Create”.
  4. Wait for the snapshot to complete. Mine took just over a minute (You might have to hit refresh a few times before it shows as complete).
  5. Go back to the Snapshots tab in the RDS Dashboard and select the snapshot you just created. Click “Share Snapshot”.
  6. Notice that the public option for snapshot visibility is not available for encrypted snapshots. This is because 1) the public at large does not have access to your KMS key (that would defeat the purpose), and 2) why would you want to encrypt data at rest if you intend to make it publicly available?
  7. Add the account number of the account you want to share the snapshot with (the same one you granted permission to access the KMS key earlier in this tutorial).
  8. Log out of the current AWS account, and log back in to the target account (or sign in to the target account using another browser – you can stay logged into both accounts that way).
  9. Navigate to the RDS Dashboard and select the “Snapshots” tab.
  10. Click the drop-down filter list. It should say “Owned by Me” by default. Select “Shared with Me”.
  11. You should now see the snapshot you shared earlier in this list.

So you would think at this point that you can just select the snapshot and click “Restore Snapshot”, right?

Well, not quite. We have one more step.

Step 4:  Create a Copy of the Shared Snapshot on Your Target Instance

Before restoring a shared, encrypted snapshot, you first have to make a copy of the snapshot in the target account.

  1. Select the snapshot and click “Copy Snapshot”.
  2. Note that you cannot share encrypted snapshots across regions, so the Destination Region is pre-selected for you.
  3. Enter in a value for the New DB Snapshot Identifier field, and select an encryption key (you’ll only see keys that are local to the current account).
  4. Click “Copy Snapshot”.
  5. Switch your filter back to “Owned by Me”.
  6. You should now see the snapshot copy in progress. Wait for the copy to complete (took about 2 minutes for me).
  7. Once the copy is complete, you can select the copy of the snapshot and select “Restore Snapshot” to create a new instance from this snapshot.
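For the scripted version, the copy (steps 1 through 4, run from the target account) maps to `rds.copy_db_snapshot()`. Note that a shared snapshot has to be referenced by its full ARN; the ARN, names, and key alias below are placeholders:

```python
# Sketch: copying a shared, encrypted snapshot into the target account.
# The source ARN, snapshot name, and key alias are placeholders.
copy_snapshot_params = {
    # Shared snapshots must be referenced by their full ARN:
    "SourceDBSnapshotIdentifier":
        "arn:aws:rds:us-east-1:111111111111:snapshot:my-encrypted-snapshot",
    "TargetDBSnapshotIdentifier": "my-encrypted-snapshot-copy",
    # Re-encrypt the copy with a key the target account owns:
    "KmsKeyId": "alias/my-local-rds-key",
}
# Then: boto3.client("rds").copy_db_snapshot(**copy_snapshot_params)
```

Once the copy finishes, it’s a local, encrypted snapshot like any other, which is why “Restore Snapshot” finally works.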

Bonus Tip:  If you already have a snapshot that was encrypted using the default KMS key, you can still share it. Just create a new KMS key like we did in step 1, make a copy of the snapshot in the source account using the new KMS key like we did in step 4, then follow steps 3 and 4 above.

And there you have it, four simple steps to sharing an encrypted RDS snapshot. Happy overlaying!