Server shell access is the most potent tool in a sysadmin’s toolbox. Luke Walker talks about how MontyCloud DAY2 works with AWS Systems Manager to enable one-click shell access to EC2 Windows and Linux VMs.
Rick: In an earlier episode of OpsTalk we talked about task automation.
As CloudOps teams chase automation, one sobering thought stands out. Automation works for predictable, routine tasks. But you can’t predict every scenario, so you can’t automate everything. So how do you deal with such unforeseen tasks on the fly?
That is the question we address today.
SysAdmins and DevOps engineers frequently need shell-level access to their EC2 instances precisely for such on-the-fly tasks. These tasks can be routine maintenance or, more often than not, serious break-glass interventions. Getting shell-level access to servers remotely can be surprisingly complicated, cumbersome, and error-prone, and it sometimes exposes security holes in your infrastructure.
Thank you for joining this session of the OpsTalk podcast. I am Rick Hebly, and with me is Luke Walker, head of Product Management at MontyCloud. Today, we discuss why it is worth the effort to simplify the process of getting shell access with one click.
Rick: Luke, welcome to OpsTalk, thank you for joining.
Luke: Thanks Rick, great to be here.
Rick: This may sound a bit obvious, but haven’t compute infrastructure, operating systems and management tools evolved enough that you don’t need to get shell access to compute nodes anymore? Why do IT admins and DevOps engineers still require shell-level access to their virtual machines, even as they use immutable infrastructure?
Luke: Well Rick, organizations are starting their journey to more deterministic builds, environments where systems are built and deployed to a standard and configuration management is automated and catered for. Even so, the ability to log in directly to a target server and look under the hood while it’s in operation remains one of the most powerful tools in a CloudOps engineer’s toolkit.
Even when you’re managing a large server fleet with config management tools, something as simple as an inadvertent change can result in a bad system state and you need to ‘break glass’ to determine what change to the config is necessary.
Rick: Having to break glass is quite the dramatic moment, but it sounds like you still see this frequently occurring even in the most robust environments?
Luke: More than many realize. One of our customers, LeadSquared, a marketing automation platform powered by AWS, ran into this very scenario not too long ago.
While the application is generally well-managed, they pushed out a change that inadvertently caused the Guest OS to deny access to critical application traffic for about 20 minutes. In most cases you would roll back to the last known good configuration, but in this instance the only way to recover was to get shell access and directly remediate the application and supporting frameworks.
Rick: I hope this was an isolated incident? In such a scenario, why wouldn’t one write a remediation script ahead of time to prevent this in the first place?
Luke: It’s every Ops engineer’s dream to write that one master script that does everything, including brew coffee, but the reality is LeadSquared did not anticipate that a routine software update would result in the Guest OS denying access. Not every scenario can be automated ahead of time. Well, maybe with the exception of brewing coffee, that has been done! But I think you get the picture here.
Rick: If only my own coffee would just magically appear at my table… But given we’re talking about direct access to a VM in this age of automation, data collection, and AI, are there other reasons why direct access is an overlooked but important piece of the Ops toolbox?
Luke: Absolutely. While inadvertent config drift is one reason why you want shell access to a VM, it could also very well be necessary to directly review application and server logs, fine-tune a configuration, or even just terminate a runaway process. These are not always tasks best suited to automation or remediation scripts, because they still require a human in the decision-making process.
Rick: You’re saying that planning for failure doesn’t mean everything can be covered by automation and process. In effect, you’re nowhere without humans holding the right tools. So what makes this so difficult that we’re even talking about it? Surely logging into a server is easier than it was pre-cloud.
Luke: Surprisingly, providing secure, robust remote access into one’s infrastructure can be time-consuming and presents several operational and security risks. As you set up a well-architected AWS application environment with a blueprint, one of the core tenets is Security. You need to ensure that you are establishing controls and protecting your systems from attack. But in order to observe and react to incidents like LeadSquared’s, where human interaction is still required, you need secure access into those target EC2 instances.
For most people today, whether you’re in AWS, Azure or an on-premises environment, that means creating a secure path such as bastion hosts or a VPN, managing SSH keys, or taking an agent-based management approach. All of these mean more infrastructure, which must equally be secured and well-managed.
The remediation scenarios are clear. Now, we are working with another customer who has an environment for data science. One of their top 3 support tickets is to re-generate SSH keys that have been lost or misplaced by users authorized to access instances within the environment.
Rick: If I get you right, that’s really no different than having to replace my house keys every week! A security nightmare. In a more controlled environment, I’m sure that this may be better managed, but especially with the majority of us working remotely in these extraordinary times, how exactly would you go about ensuring secure access?
Luke: Sometimes it just takes a fresh look at the environment to find a better way.
MontyCloud recently shipped a feature that extends AWS Systems Manager’s Session Manager service. Quite the mouthful, but by leveraging this service, DAY2 can open shell access directly into Windows and Linux EC2 instances in a single click. Customers no longer need to build all of the additional security infrastructure that we’ve seen in LeadSquared and other accounts time and time again.
So, this means – no bastion hosts, no VPNs, no more managing SSH keys. Straightforward single-click from a web browser, and you’re right into a PowerShell session or a shell prompt.
Rick: If something sounds almost too simple to be true, you have me looking for the catch. I mean, remote access in my mind usually means directly connecting to a host, so how is this different, and how do you even keep this secure?
Luke: The interactive shell sessions are very secure and AWS native. DAY2 works in concert with Session Manager, so these interactive shell sessions are handled securely through AWS’ services and endpoints, through to the Systems Manager agent that’s bundled by default on nearly every AWS image.
You could have an instance deployed with no inbound rules. Because we work with the Systems Manager agent, we can still ensure a secure path to gaining shell access to that host without compromising your infrastructure.
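The "no inbound rules" point can be checked mechanically. The Systems Manager agent reaches AWS over an outbound connection, so a Session Manager session does not need an open inbound port such as 22 or 3389. The sketch below is illustrative only and is not taken from DAY2: it assumes security group data shaped like the response of the EC2 DescribeSecurityGroups API (a `SecurityGroups` list whose entries carry `GroupId` and `IpPermissions`), and the helper name `groups_with_inbound_rules` and the sample group IDs are hypothetical.

```python
from typing import Any, Dict, List


def groups_with_inbound_rules(response: Dict[str, Any]) -> List[str]:
    """Return the IDs of security groups that still allow some inbound traffic.

    `response` is assumed to look like an EC2 DescribeSecurityGroups response.
    """
    open_groups = []
    for group in response.get("SecurityGroups", []):
        # An empty IpPermissions list means the group has no inbound rules.
        if group.get("IpPermissions"):
            open_groups.append(group["GroupId"])
    return open_groups


# Hypothetical sample data: one locked-down group, one with SSH open to the world.
sample_response = {
    "SecurityGroups": [
        {"GroupId": "sg-locked-down", "IpPermissions": []},
        {
            "GroupId": "sg-ssh-open",
            "IpPermissions": [
                {
                    "IpProtocol": "tcp",
                    "FromPort": 22,
                    "ToPort": 22,
                    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
                }
            ],
        },
    ]
}

print(groups_with_inbound_rules(sample_response))  # ['sg-ssh-open']
```

With a real account, the same function could be fed the dictionary returned by a boto3 EC2 client's `describe_security_groups()` call.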
In addition to a secure path, with DAY2 every session is logged. All the commands you write, and the output the operator sees back, are logged directly into a CloudWatch log trail and retained for 30 days. This means the session details are kept off-host from the original server and secured away in case they need to be inspected.
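A 30-day retention window like the one described can be expressed as a CloudWatch Logs retention policy. The transcript does not say how DAY2 configures this, so the following is a minimal sketch under assumptions: the log group name `/example/session-logs` and the helper `set_session_log_retention` are hypothetical, and a small recording stand-in replaces the real client so the example runs without AWS credentials. The `put_retention_policy` call and its `logGroupName` / `retentionInDays` arguments mirror the boto3 CloudWatch Logs client.

```python
class RecordingLogsClient:
    """Stand-in for a CloudWatch Logs client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def put_retention_policy(self, logGroupName, retentionInDays):
        self.calls.append((logGroupName, retentionInDays))


def set_session_log_retention(logs_client, log_group_name, days=30):
    """Ask CloudWatch Logs to expire events in the group after `days` days."""
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )


client = RecordingLogsClient()
set_session_log_retention(client, "/example/session-logs")  # hypothetical group name
print(client.calls)  # [('/example/session-logs', 30)]
```

Against a real account, `client` would instead be `boto3.client("logs")`, and the log group would be whichever one Session Manager is configured to write session output to.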
Rick: So all of the access with none of the tradeoffs, you still get to keep that powerful tool in your toolbox, and you get to make your infrastructure more secure by not having to build any special or crazy backdoors.
Luke: Correct. CloudOps engineers want to avoid creating complicated environments – I certainly do – and avoid being forced into making trade-offs between responsiveness and security.
Rick: Given that you are leveraging AWS services here, what does DAY2 add to Session Manager? Can I not directly use Session Manager to do the same thing?
Luke: That is a fantastic question, Rick. There are two parts to the puzzle of gaining secure access: configuration, and actually gaining access. Where DAY2 shines is in simplifying the process.
Seeing is believing. I’ll just show you.
DAY2 makes the entire process of configuring Systems Manager and activating the agent present within your EC2 instances amazingly simple. A single task is all it takes. It could be 1 instance, 10, or 1,000 instances; it’s a single, simple task. DAY2 will go ahead and coordinate with AWS to ensure the necessary roles and agent activation requests are submitted and processed accordingly.
Once your hosts are under Systems Manager and DAY2 management, you can view your entire server fleet across all accounts and all regions from a single pane, and gain shell access directly into a Windows or a Linux host with a single click. Within our Infrastructure page on the portal, there’s a Remote Console option directly on the interface, next to each of the servers you can see here. In a single click, up pops the shell dialog, and you’re in.
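To make the single-pane fleet view concrete, here is a rough sketch of how managed instances could be gathered across regions. It is not DAY2's implementation, which the transcript does not describe. It assumes one client per region exposing `describe_instance_information`, as the boto3 SSM client does, with results under `InstanceInformationList` and pagination via `NextToken`. The names `managed_instances`, `fleet_view`, and `FakeSsmClient`, plus the instance IDs, are hypothetical; the fake client exists only so the example runs offline.

```python
from typing import Any, Dict, Iterator, List, Tuple


def managed_instances(ssm_client) -> Iterator[Dict[str, Any]]:
    """Yield every instance record the client reports, following pagination."""
    kwargs: Dict[str, Any] = {}
    while True:
        page = ssm_client.describe_instance_information(**kwargs)
        yield from page.get("InstanceInformationList", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs = {"NextToken": token}


def fleet_view(clients_by_region) -> List[Tuple[str, str, str, str]]:
    """Flatten per-region results into (region, instance, platform, status) rows."""
    rows = []
    for region, client in sorted(clients_by_region.items()):
        for info in managed_instances(client):
            rows.append(
                (region, info["InstanceId"], info["PlatformType"], info["PingStatus"])
            )
    return rows


class FakeSsmClient:
    """Offline stand-in that serves canned pages of instance records."""

    def __init__(self, pages):
        self._pages = pages

    def describe_instance_information(self, NextToken=None):
        index = int(NextToken) if NextToken else 0
        page = {"InstanceInformationList": self._pages[index]}
        if index + 1 < len(self._pages):
            page["NextToken"] = str(index + 1)
        return page


clients = {
    "us-east-1": FakeSsmClient([
        [{"InstanceId": "i-aaa", "PlatformType": "Linux", "PingStatus": "Online"}],
        [{"InstanceId": "i-bbb", "PlatformType": "Windows", "PingStatus": "Online"}],
    ]),
    "eu-west-1": FakeSsmClient([
        [{"InstanceId": "i-ccc", "PlatformType": "Linux", "PingStatus": "ConnectionLost"}],
    ]),
}

for row in fleet_view(clients):
    print(row)
```

With real credentials, each entry in `clients` would be `boto3.client("ssm", region_name=region)`. Spanning multiple accounts would additionally require separate credentials per account, which this sketch does not cover.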
Rick: That’s really amazing to see in action as you talk through it. You said this Remote Console functionality is GA, right? Can I test it myself?
Luke: This is base functionality for DAY2. You can sign up for a free trial today via our website, montycloud.com, or if you’re already a customer, just log in with your account. In the Infrastructure pane you’ll see the Remote Console option for any managed instance discovered within your attached cloud accounts.
Rick: Thanks Luke, and it’s worth mentioning there are more details published on our OpsTalk Blog at montycloud.com/blog, and I’d certainly recommend trying out this feature.
Rick: This concludes today’s OpsTalk. You can find more details on One-Click shell access and other DAY2 automations at montycloud.com.
Subscribe to get the reminder. Thank you for joining us today!