S01: E05 – Continuous desired server state monitoring & enforcement

Luke Walker : Oct 22, 2020 11:23:00 AM

Server configuration drift can cause significant compliance and security problems. Continuously monitoring server state for drift and autonomously enforcing corporate standards at scale is hard. Luke Walker shows how you can automate continuous checks, enforce compliance and remediate drift for thousands of EC2 instances in just a few clicks.

Read the Transcript

Rick: Enforcing server state is critical to stability and security for Windows and Linux sysadmins and cloud engineers. Yet ensuring that your servers meet your standards is a constant challenge.

I am Rick Hebly, and in this edition of OpsTalk Luke Walker and I are going to show you how you can standardize, continuously sweep and remediate your system configurations, installed software, and operating system and application patches across thousands of EC2 instances, in just a few clicks.

Luke – welcome to OpsTalk, but I have a burning question for you – I thought this was solved with configuration management tools?

Luke: Thanks for having me here, Rick. Dealing with non-standard servers is still a significant problem for any team, both in and outside of cloud. In a perfect world, your images would be built with a well-baked-in configuration, VMs would update themselves, and your Ops team would have perfect control over what gets built in the environment and how.

Sadly, that is far from the truth for customers. We have one such customer that specializes in the field service management space, and they discovered that enforcing desired state configurations created bottlenecks within the CloudOps teams.

Across hundreds of systems, they faced a hard choice: either get buried in their ticketing system because they can’t approve changes to server images fast enough, or give up any guarantee about the build process because those server images are coming from everywhere.

Rick: Effectively, if I was a sysadmin, a cloud engineer, I’m being handed a system built by someone else, and then expected to figure out how to enforce company standards after the fact? How do you even find the time on top of all that to report your compliance status to management?

Luke: It becomes nearly impossible to consistently report that status. Just trying to enforce standards is a painful exercise… because now you have to bake agents in, build additional infrastructure, create central repositories, gain network access to every system – more and more work is required on top. Even in scenarios where SecOps have existing Chef recipes for you to consume, or where your requirements are straightforward, this work doesn’t go away.

Ops teams shouldn’t have to make tradeoffs between delivering image changes, solving build process challenges, or providing consistent reporting.

Rick: So how are you tackling this differently with DAY2?

Luke: Let me bring up my environment and show you how you can do this in under 30 seconds, and all without that build and maintenance process.

Now, our Corporate Standard has two parts.

This is a Windows server. So, let’s say that the corporate standard is that PowerShell 5.0 is always available and that my local admin passwords are scrambled and sent back to Active Directory.

First let’s define the registry check for signs that PowerShell 5.0 exists, and we’ll include a remediation step so that if a given server doesn’t have PowerShell v5 installed, we’ll take care of it automatically because it’s a low impact fix.
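As a rough illustration of what a registry check like this could look like outside the product, here is a minimal Python sketch. It assumes the check reads the `PowerShellVersion` value that Windows PowerShell 3.0 and later records under `HKLM\SOFTWARE\Microsoft\PowerShell\3\PowerShellEngine`. The function names are hypothetical and are not part of DAY2; the demo does not show which key or logic the product actually uses.

```python
from typing import Optional, Tuple

# Registry location where Windows PowerShell 3.0+ records its engine version.
# Which key DAY2 inspects is not stated in the episode; this is an assumption.
ENGINE_KEY = r"SOFTWARE\Microsoft\PowerShell\3\PowerShellEngine"
ENGINE_VALUE = "PowerShellVersion"


def parse_version(raw: str) -> Tuple[int, ...]:
    """Turn a string such as '5.1.14393.0' into (5, 1, 14393, 0)."""
    return tuple(int(part) for part in raw.strip().split("."))


def meets_minimum(raw: Optional[str], minimum: Tuple[int, ...] = (5, 0)) -> bool:
    """Return True when the recorded version is at least `minimum`.

    A missing value (None) counts as a failed check.
    """
    if raw is None:
        return False
    parts = parse_version(raw)
    # Pad short versions such as '5' with zeros so they compare as (5, 0).
    padded = parts + (0,) * (len(minimum) - len(parts))
    return padded >= minimum


def read_engine_version() -> Optional[str]:
    """Read the version string from the local registry (Windows only)."""
    import winreg  # standard library, available only on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ENGINE_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, ENGINE_VALUE)
            return str(value)
    except FileNotFoundError:
        # Key or value is absent, so no suitable PowerShell was found.
        return None
```

On a Windows host, `meets_minimum(read_engine_version())` would run the check locally; a result of `False` is the signal that would trigger the remediation step.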

Next, we’ll create a second rule for LAPS. LAPS is a handy Microsoft solution for scrambling local Administrator account passwords, then populating the details back into AD. The team has already created the deployment script. I’ll drop the script right in here as a PowerShell script. We’ll skip remediation just for this example.

Quick scan of our settings, everything looks good. Let’s now finish creating this policy and then schedule it to run against our server fleet.

Rick: That’s it?

Luke: That’s it – we’re now checking that our servers meet the corporate build standard across the entire server fleet, in three different AWS accounts in this environment. Three steps, and the entire server fleet is under the governance umbrella.

Rick: That was fast… very fast. Can we dive a little deeper into what these rules let you do? I ask because usually agentless solutions give you a simple choice and not much flexibility in what you can check for, or remediation is manual – yet here, I can see we did include remediation steps for PowerShell.

Luke: Desired Server State supports single or multiple rules as you just saw in the corporate standard we enforced. The rule types can vary – you can implement a compliance check with something as simple as checking for the existence of certain registry keys or files.

You can execute Chef recipes, Ansible scripts, PowerShell scripts or shell commands at scale, and there’s no need to upload the script ahead of time or have your VMs access a central repository. The important thing here is that you don’t have to punch holes through your firewalls.

Then there’s the remediation – we accept either a Windows PowerShell script or a Linux shell script, supplied either as a parameter or as the script itself, and it only gets executed on the individual servers that fail the rule check.
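The behaviour described here, several rules per policy with remediation running only where a check fails, can be sketched in a few lines of Python. This is an illustrative model only: `Rule`, `RuleResult` and `enforce` are hypothetical names, server state is faked with dictionaries, and re-running the check after remediation is an assumption rather than something the demo confirms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]                       # True means compliant
    remediate: Optional[Callable[[dict], None]] = None  # optional fix-up step


@dataclass
class RuleResult:
    server: str
    rule: str
    compliant: bool
    remediated: bool = False


def enforce(servers: Dict[str, dict], rules: List[Rule]) -> List[RuleResult]:
    """Evaluate every rule on every server; remediate only failing servers."""
    results = []
    for server_id, state in servers.items():
        for rule in rules:
            if rule.check(state):
                results.append(RuleResult(server_id, rule.name, True))
                continue
            fixed = False
            if rule.remediate is not None:
                rule.remediate(state)
                fixed = rule.check(state)  # assumption: re-check after the fix
            results.append(RuleResult(server_id, rule.name, fixed, fixed))
    return results


def install_powershell5(state: dict) -> None:
    # Stand-in for a real remediation script.
    state["powershell"] = "5.1"


rules = [
    Rule("powershell-5",
         lambda s: s.get("powershell", "").startswith("5."),
         install_powershell5),
    Rule("laps-installed", lambda s: s.get("laps", False)),  # no remediation
]
fleet = {
    "web-1": {"powershell": "5.1", "laps": True},
    "web-2": {"powershell": "4.0", "laps": False},
}
results = enforce(fleet, rules)
for result in results:
    print(result)
```

In this toy fleet, only `web-2` fails the PowerShell rule, so only `web-2` has the remediation applied; its LAPS failure is reported but left alone because that rule carries no remediation.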

The remediation scripts are all executed using AWS’s Systems Manager agent, which comes bundled by default with the Ubuntu 16.04 and 18.04, Amazon Linux 1 and 2, and Windows Server 2008 and later AMIs. So again, you don’t need to roll out agents across your server fleet.
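For readers who want to see what running a script through the Systems Manager agent can look like from the public AWS API, here is a hedged Python sketch built around Run Command. It assumes the AWS-managed documents `AWS-RunPowerShellScript` and `AWS-RunShellScript` and a boto3-style SSM client passed in by the caller. The helper names and the batch size constant are illustrative, and nothing here is taken from how DAY2 itself drives the agent.

```python
from typing import Dict, Iterator, List

# AWS-managed Run Command documents for each platform.
DOCUMENTS = {
    "windows": "AWS-RunPowerShellScript",
    "linux": "AWS-RunShellScript",
}
# send_command limits how many explicit instance IDs one call may carry
# (50 at the time of writing), so larger fleets are sent in batches.
MAX_IDS_PER_CALL = 50


def build_requests(instance_ids: List[str], script: str,
                   platform: str) -> Iterator[Dict]:
    """Yield one send_command keyword-argument dict per batch of instances."""
    document = DOCUMENTS[platform]
    commands = script.splitlines()
    for start in range(0, len(instance_ids), MAX_IDS_PER_CALL):
        yield {
            "InstanceIds": instance_ids[start:start + MAX_IDS_PER_CALL],
            "DocumentName": document,
            "Parameters": {"commands": commands},
        }


def run_remediation(ssm_client, instance_ids: List[str], script: str,
                    platform: str) -> List[str]:
    """Send the script to every instance and return the command IDs."""
    command_ids = []
    for request in build_requests(instance_ids, script, platform):
        response = ssm_client.send_command(**request)
        command_ids.append(response["Command"]["CommandId"])
    return command_ids
```

With boto3 installed and credentials configured, a call might look like `run_remediation(boto3.client("ssm"), failing_ids, script_text, "windows")`, where `failing_ids` holds only the instances that failed the rule check.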

Rick: Once this is created, you only need to schedule when, and to which systems, this policy applies?

Luke: Correct – whether it’s once a day, or multiple checks a day, that choice is always up to you as the operator.

Rick: So, let’s say the compliance check fails, and the remediation fails. How do I get a view not just across my whole server fleet, but of just my application, so I can assess what my priorities are or which application owners I need to engage?

Luke: We have two easy reports for that, let me show them to you now.

This first report lets you select only the applications you want to inspect, in this case it’s the mission critical ERP application. You can now generate a more targeted report that enables you to make a business context decision.

We also have a report that shows the compliance status across the server fleet – all departments, all accounts, all regions. The compliance status takes into account both the desired server state, and also patch management, if you are using DAY2 to scan and patch your instances.

We can even filter just to look specifically at desired server state, and separate out patches, or the inverse.
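The two reports and the filter described above boil down to grouping findings by application and by category. A minimal Python sketch of that idea follows. The `Finding` record, its field names and the category labels `desired_state` and `patch` are assumptions made for illustration, not the product's data model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Finding:
    instance_id: str
    application: str
    category: str    # "desired_state" or "patch" (hypothetical labels)
    compliant: bool


def summarize(findings: Iterable[Finding],
              application: Optional[str] = None,
              category: Optional[str] = None) -> dict:
    """Summarize compliance, optionally narrowed to one app or category."""
    selected = [
        f for f in findings
        if (application is None or f.application == application)
        and (category is None or f.category == category)
    ]
    instances = {f.instance_id for f in selected}
    # An instance is non-compliant if any selected finding on it failed.
    failing = {f.instance_id for f in selected if not f.compliant}
    return {
        "instances": len(instances),
        "compliant": len(instances - failing),
        "non_compliant": sorted(failing),
    }


findings = [
    Finding("i-erp-1", "ERP", "desired_state", True),
    Finding("i-erp-1", "ERP", "patch", False),
    Finding("i-erp-2", "ERP", "desired_state", True),
    Finding("i-web-1", "Website", "desired_state", False),
    Finding("i-web-1", "Website", "patch", True),
]

fleet_report = summarize(findings)                      # everything
erp_report = summarize(findings, application="ERP")     # one application
state_only = summarize(findings, category="desired_state")
```

Treating an instance as non-compliant when any one of its findings fails is one possible design choice; a real report might weight patch severity or rule criticality differently.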

By having all this information in one place, you’re not having to fight your basic or existing tools to extract and report to your CISO or Compliance Auditor what the actual compliance status is, whether for thousands of VMs or for that one application you’re trying to report on.

Rick: Yes, especially as doing the compliance checks is only part of the struggle. Summarizing and reporting back to a CISO or your manager is just as taxing on an engineer’s time. Thank you, Luke, for taking us through this in greater detail – is this something that’s available today?

Luke: We’ve just released this functionality as part of the Well-Managed Server suite. You can try this out now via montycloud.com.

Rick: Thank you Luke.

This concludes today’s OpsTalk. Stay tuned for the next episode. Don’t forget to like and subscribe to this channel!