Your Data are Pets (Not Cattle)

You still have a pet in the cloud, and her name is Data (or whatever you decide to name her – she is yours, after all).

Anyone who has been through Amazon’s AWS Cloud 101 (not a real class) has heard the Cattle vs. Pet metaphor, which was coined some time ago by Bill Baker (ironically from Microsoft).

If you’re unacquainted with the concept, the thought process goes something like this: In a traditional IT shop, your servers are treated like pets. They live with you (if you own your datacenter), you give them names, you check in on them to make sure they’re feeling ok, and you tend to them when they’re sick. Bad CPU? Replace it. Blue screen? Troubleshoot it (sometimes for days). You paid tens of thousands of dollars for this block of metal, and you’re not about to let that go to waste.

Enter the cloud. Here, everything that makes your server unique is (hopefully) defined by code. It’s not a block of metal in the basement – it’s a logical object that can be blown away and recreated at will. Here, your servers are cattle: nameless clones that can be easily replaced. If one isn’t performing the way you expect, you don’t nurse it back to health; you put it out of its misery and replace it with another one just like it (except the new one works).

There are two things you need to know about me right off the bat. First, I absolutely buy into this concept. Why spend hours (or days) troubleshooting an issue when you can just kill the server and spin up a new one? Why pay for a permanent DEV environment (or twelve) that sits idle on nights and weekends? Why patch web servers when you can just add new ones to the pool and kill the old ones? Kill them. Kill them all. We can resurrect them later (if you can’t, you’re doing it wrong).

The second thing you need to know about me is that I’m a Database Administrator. We DBAs aren’t known for throwing caution to the wind when it comes to… well, anything. We back up everything. We secure everything. We treat every server as production whether we need to or not. We are the gatekeepers of very valuable data, without which the companies that employ us cannot operate. The result is that we end up owning a lot more of these “pets” than just about any other layer of the technology stack.

It’s probably not surprising, then, that the first time I heard the Cattle vs. Pet argument, I went full Luddite. Instinctively, it just felt wrong. “How dare you suggest I destroy Whiskers every night?!”

Once I was able to get over the mental hurdle of treating my beloved data pets with callous disregard, however, I was able to see the issue with some clarity.

And my initial reaction was half right:  Yes, the vast majority of your infrastructure is now generic, replaceable “cattle”.

— BUT —

There’s one pet that you have to bring with you, even to the cloud: Your production data store.

Your production data should under no circumstances be treated like cattle. The underlying servers and schema are cattle, but the data isn’t. You don’t just blow it away.

Copies of the production data store? Sure – Truncate tables. Mangle the schema. Burn it with fire when you’re done. Copies of the data are absolutely cattle, which means you can spin up and tear down multiple non-prod versions of your app without worrying about keeping the data in a valid state. But your production data store is and will continue to be a beloved pet.

Go ahead, name it.

In fact, your data in the cloud arguably requires more care and feeding than it did on-premises. Backups and patching might be taken care of automatically if you’re using a managed service like Amazon RDS, but now you have to worry about things like SSL and encryption-at-rest (the physical data isn’t in the basement anymore), unencrypted connection strings on web and app servers (which you may or may not have control over), and myriad other aspects of the technology stack that other teams just “took care of” on-premises (public vs. private subnets, VPC peering, firewall rules/security groups, CIDR notation, and the list goes on…).

My goal for this blog is to help other DBAs who are making the transition from on-prem to the cloud, primarily focusing on AWS since that’s where I’m spending all of my time these days. I expect the majority of my posts to be very practical tips and tricks for working with databases in the cloud, and there will most certainly be some opinion sprinkled in for flavor.

I’m learning as I go, and I’m inviting you (and your pet) to learn with me.

Let’s go.

Data Modeling with NoSQL

I was thinking back to one of my favorite moments from re:Invent 2015, and I thought it was worth re-sharing.

After giving an overview of the DynamoDB service, Rick Houlihan (Principal Solutions Architect with AWS) shared this pithy pearl of wisdom:

“Data in NoSQL is not non-relational.”

“WHAT? I thought we didn’t have to worry about relationships anymore!!”, cried the masses.

False.

This is one of the most important things to remember as you make the pivot from an RDBMS like SQL Server or Oracle to a NoSQL platform like DynamoDB or MongoDB. Relationships still exist in your data; your schema is just more flexible. Failing to account for those relationships, even in NoSQL, tends to result in poor performance and substandard design patterns.

Here Houlihan shows us how to model one-to-one, one-to-many, and many-to-many relationships in DynamoDB. The same general concepts can be applied to other NoSQL platforms like MongoDB, though the specific implementation will be different (obviously).
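To make the idea a little more concrete before you watch, here is a rough sketch of how the three relationship types are commonly laid out on a partition key plus sort key. To be clear, this is my own illustration and not the code or the exact designs from the talk. The Table class is a hypothetical in-memory stand-in (not the DynamoDB API), and the key formats like CUSTOMER#1 and ORDER#2015-10-01 are made-up naming conventions.

```python
from collections import defaultdict


class Table:
    """Hypothetical in-memory stand-in for a table keyed by (partition key, sort key)."""

    def __init__(self):
        self._items = defaultdict(dict)  # pk -> {sk: item}

    def put(self, pk, sk, **attrs):
        # Writing the same (pk, sk) pair again overwrites the item.
        self._items[pk][sk] = {"pk": pk, "sk": sk, **attrs}

    def query(self, pk, sk_prefix=""):
        """Return the items in one partition whose sort key starts with
        sk_prefix, ordered by sort key."""
        partition = self._items.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = Table()

# One-to-one: exactly one PROFILE item lives under each customer's partition key.
table.put("CUSTOMER#1", "PROFILE", name="Ada")

# One-to-many: the child items share the parent's partition key and are
# told apart (and ordered) by their sort key.
table.put("CUSTOMER#1", "ORDER#2015-10-01", total=40)
table.put("CUSTOMER#1", "ORDER#2015-10-07", total=15)


def enroll(student, course):
    """Many-to-many: store the relationship once in each direction so that
    either side can be looked up with a single-partition query."""
    table.put(f"STUDENT#{student}", f"COURSE#{course}")
    table.put(f"COURSE#{course}", f"STUDENT#{student}")


enroll("ann", "db101")
enroll("bob", "db101")
enroll("ann", "aws201")

orders = table.query("CUSTOMER#1", "ORDER#")       # all orders for one customer
roster = table.query("COURSE#db101", "STUDENT#")   # all students in one course
schedule = table.query("STUDENT#ann", "COURSE#")   # all courses for one student
```

Writing the many-to-many edge twice is a simplification on my part. In a real DynamoDB table you would more likely let a secondary index with the keys swapped serve the reverse lookup, but the point is the same: the relationship is still there, and you have to decide up front how you are going to query it.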

If the video below doesn’t take you there automatically, the real magic starts at 26m 40s into the talk.

Enjoy!