
Application-level alerts, IFTTT style reactions and remediations with DAY2 CloudOps Automation Rules

We announced DAY2 CloudOps Automation Rules in December 2020. With DAY2 CloudOps Automation Rules, IT teams can save time implementing system checks and improve the accuracy of application reliability responses. With “if this then that” conditional logic, DAY2 CloudOps Automation Rules enables IT teams to define the logical sequence of events that should take place based on events such as application spikes. In this blog, Sumant Dubey, Director of Engineering at MontyCloud, writes about the components that power DAY2 CloudOps Automation Rules and how they work together to simplify and automate responses to application-level alerts.

  Sabrinath S. Rao

 

Consider an eCommerce Application that is well provisioned for a normal workload. You have made arrangements to scale the application for planned high-traffic periods, such as Super Bowl Sunday! But what if your ad previews leading up to prime time go viral and there is a sudden, unexpected spike in traffic? Moreover, what if it happens in the off hours?

The purpose of moving to the cloud is so that you don’t have to spend CapEx on capacity for spikes but can scale on demand. It is more efficient to be alerted when such an unexpected spike happens. Not just that, you would want to filter out false alarms as well. For instance, if you know that your users are coming in from a certain geography or time zone, you may want to scale up only in that region, say AWS Region US-West, and automatically scale down as the traffic subsides.

That is not a lot to ask. It is business.

My colleague Luke Walker shared how you can do just that with DAY2 CloudOps Automation Rules. In this blog I will describe the components of the DAY2 platform and how they work together to deliver automated If-This-Then-That (IFTTT) style reactions and remediations to application alerts.

Must Read: Hands off management via automated routine remediations

 

Introduction

DAY2 was born of the Cloud, in the Cloud, and for the Cloud. We built DAY2 using the same technologies that power many of your applications today. It runs on the very same infrastructure that it helps manage. We know first-hand that Cloud workloads can be very complicated and daunting. That is why we are on a mission to simplify Cloud Operations by abstracting all the underlying nuts and bolts, giving you the views that matter in the business context, and simplifying complex operations to just a few button clicks.

Simplicity does not mean limited power. To us, it means giving you both power and flexibility: visibility and control across multiple cloud accounts and regions, tasks simplified to knobs that can be turned to desired levels, systems that provide instant feedback, and hooks into the data streams as they come to life.

Being able to detect changes and react to them as they happen is the core of the DAY2 foundation.

In DAY2, Applications are first-class citizens and always have been. Whether it is an e-Commerce Website Application, a CRM Application, or just a group of Cloud resources put together for an analytics job, DAY2 recognizes them as Applications and enables application-level monitoring, operations, and automation.

DAY2 is an event-driven platform from the ground up. Any state or configuration change in a cloud account is tracked in DAY2 as an event and is available for the platform to observe and react to. DAY2 CloudOps Automation Rules enables you to configure rules based on events, select actions that are automatically triggered when conditions are met, and respond or remediate without having to write any code. These actions could be as simple as alerting the Application Owner or as sophisticated as executing a custom-defined task.

We even jokingly called it our own version of If-This-Then-What! 😉

 

Event Driven Design

DAY2 components are designed as discrete domains. The domains that constitute DAY2 use events to interact with each other and even internally.

Every domain publishes to the event stream and listens to the event streams of other domains. As events occur, the event-message is relayed to the event stream, using an event broker. All subscribers are notified about the availability of the event data in the stream. The subscribers can further filter out and respond or act upon events based on the context and event data.

This keeps the functions decoupled, so we can plug in or remove components as needed without impacting other functions.
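To make the publish/subscribe flow concrete, here is a minimal in-process sketch in Python. It is only an illustration of the pattern, not DAY2's actual implementation: the names `Event`, `EventBroker`, the "discovery" domain, and the event types are all hypothetical, and a real event broker would be a separate, durable service.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    domain: str  # publishing domain (hypothetical name, e.g. "discovery")
    type: str    # event type (hypothetical, e.g. "resource.state_changed")
    data: dict = field(default_factory=dict)


class EventBroker:
    """Relays each event to every subscriber of the publishing domain's stream."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, domain: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[domain].append(handler)

    def publish(self, event: Event) -> None:
        # Iterate over a copy so handlers can subscribe while we dispatch.
        for handler in list(self._subscribers[event.domain]):
            handler(event)


broker = EventBroker()
received: List[str] = []


def on_discovery_event(event: Event) -> None:
    # Subscribers do their own filtering on the event type and data.
    if event.type == "resource.state_changed":
        received.append(event.data["resource_id"])


broker.subscribe("discovery", on_discovery_event)
broker.publish(Event("discovery", "resource.state_changed", {"resource_id": "i-123"}))
broker.publish(Event("discovery", "resource.tagged", {"resource_id": "i-456"}))
print(received)
```

Because the publisher only knows about the broker, a subscriber can be added or removed without touching the domain that emits the events.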

 

The Application Context

The Application context is important because, in the real world, you manage applications deployed in the cloud, not just cloud resources.

Having the application context available for our events is imperative for reacting to these events appropriately and proactively. DAY2’s Discovery and Classification engine performs the job of resource recognition, identification, and classification. This metadata is available in the system and is queried to find the context.

The Context Decorator connects this metadata to the events. The Context Decorator is plugged into the event stream to add the application context to the events as they occur.

Application context is available as a part of every DAY2 event. With this lens it is easier to see what an event means to the Application.
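The idea of decorating events with application context can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the lookup table stands in for the metadata produced by discovery and classification, and the names `RESOURCE_TO_APPLICATION`, `decorate_with_context`, and the event fields are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the classification metadata:
# a mapping from resource ID to the application it belongs to.
RESOURCE_TO_APPLICATION: Dict[str, str] = {
    "i-0abc": "ecommerce-web",
    "db-01": "ecommerce-web",
}


def decorate_with_context(
    event: dict, index: Dict[str, str] = RESOURCE_TO_APPLICATION
) -> dict:
    """Return a copy of the event with an 'application' field added.

    Events for resources that are not classified get application=None.
    """
    application: Optional[str] = index.get(event.get("resource_id"))
    return {**event, "application": application}


raw = {"type": "cpu.high", "resource_id": "i-0abc"}
decorated = decorate_with_context(raw)
print(decorated["application"])
```

Returning a copy rather than mutating the event keeps the original record intact for any other subscriber that reads it.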

 

Actions / Reactions

DAY2 has a built-in library of common actions that you can either execute on demand or schedule for execution. We call them Tasks. Apart from the rich DAY2 Tasks Library, you can also import custom Python Scripts or AWS SSM Automation Documents as custom Tasks and create your own task library.

I will write more on the Task Library in another post. In this case, using DAY2 Tasks to orchestrate actions when certain events occur was a natural choice for us.

So, we added support for all custom, user-imported Tasks and native DAY2 Tasks, along with support for triggering alerts in the form of notifications. Needless to say, the DAY2 Task Library is a fast-growing library of No-Code CloudOps tasks.
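One way to picture a library that holds both built-in and user-imported Tasks is a simple name-to-callable registry. The Python sketch below is an assumption made for illustration, not the DAY2 Task Library API: `TaskLibrary`, `notify_owner`, and `restart_service` are hypothetical names, and the tasks only return strings instead of touching real resources.

```python
from typing import Callable, Dict

TaskFn = Callable[[dict], str]


class TaskLibrary:
    """Hypothetical registry mapping task names to callables."""

    def __init__(self) -> None:
        self._tasks: Dict[str, TaskFn] = {}

    def register(self, name: str, fn: TaskFn) -> None:
        if name in self._tasks:
            raise ValueError(f"task already registered: {name}")
        self._tasks[name] = fn

    def run(self, name: str, params: dict) -> str:
        # Raises KeyError if no task with this name has been registered.
        return self._tasks[name](params)


def notify_owner(params: dict) -> str:
    # Stand-in for a built-in notification task.
    return f"notified {params['owner']} about {params['application']}"


def restart_service(params: dict) -> str:
    # Stand-in for a custom, user-imported script.
    return f"restarted {params['service']}"


library = TaskLibrary()
library.register("notify_owner", notify_owner)
library.register("restart_service", restart_service)

result = library.run("notify_owner", {"owner": "ops-team", "application": "ecommerce-web"})
print(result)
```

With this shape, a rule only needs to store a task name and parameters, so built-in and custom tasks are triggered the same way.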

 

CloudOps Automation Rules – The Important Ingredient

Real time events with application context are great but can also bring in a lot of noise. To make them actionable we added ‘CloudOps Automation Rules’. These rules help you filter out the noise, focus only on the relevant events and take action as appropriate. These rules are driven by a rules-engine we developed for DAY2.

This rules engine can deal with complicated rules. It is a decoupled service that can be plugged into any part of the platform, but on the surface it is simple and easy to use for any user, like everything else in DAY2.

The UX team ensured that we provide the right user experience while the backend team ensured all the heavy lifting happens behind the scenes. We started with simple platform events, and then added the ability to configure metrics-based events for the cloud resources.

Now an IT admin can configure a rule condition that uses real-time metric data of one or more Cloud resources to measure and match, and act if the condition is fulfilled.
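As a rough illustration of "measure, match, and act", the following Python sketch evaluates a rule whose conditions compare metric values against thresholds. It is not the DAY2 rules engine: `Condition`, `Rule`, the metric names, the event fields, and the region value are hypothetical, and a real engine would support far richer expressions.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class Condition:
    metric: str       # hypothetical metric name, e.g. "cpu_percent"
    op: str           # one of the keys in OPERATORS
    threshold: float

    def matches(self, metrics: Dict[str, float]) -> bool:
        # A metric that is missing from the event never matches.
        if self.metric not in metrics:
            return False
        return OPERATORS[self.op](metrics[self.metric], self.threshold)


@dataclass
class Rule:
    application: str
    conditions: List[Condition]
    action: Callable[[dict], None]

    def evaluate(self, event: dict) -> bool:
        """Run the action if the event belongs to this application
        and every condition matches; return whether it fired."""
        if event.get("application") != self.application:
            return False
        metrics = event.get("metrics", {})
        if all(c.matches(metrics) for c in self.conditions):
            self.action(event)
            return True
        return False


scaled_regions: List[str] = []
rule = Rule(
    application="ecommerce-web",
    conditions=[
        Condition("cpu_percent", ">", 80.0),
        Condition("request_rate", ">=", 1000.0),
    ],
    action=lambda event: scaled_regions.append(event["region"]),
)

spike = {
    "application": "ecommerce-web",
    "region": "us-west-2",  # hypothetical region value
    "metrics": {"cpu_percent": 91.5, "request_rate": 1500.0},
}
quiet = {
    "application": "ecommerce-web",
    "region": "us-west-2",
    "metrics": {"cpu_percent": 40.0, "request_rate": 1500.0},
}
fired_on_spike = rule.evaluate(spike)
fired_on_quiet = rule.evaluate(quiet)
print(fired_on_spike, fired_on_quiet, scaled_regions)
```

Requiring every condition to match is what filters out the noise: a busy CPU alone, or high traffic alone, does not trigger the action in this sketch.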

We plugged this rules engine into DAY2, alongside its CloudOps events and application context, and so we have DAY2 CloudOps Automation Rules.

Result – Application-level Alerts & Reactions (If This Then What)

With DAY2 CloudOps Automation Rules, you can handle the unexpected-spike scenario above and much more. All you need to do is use our No-Code CloudOps platform to configure rules for your application, add conditions involving resource metrics or DAY2 events, and set up actions. Once done, DAY2 will ensure that these rules are evaluated in real time and actions are triggered as needed. And that is not magic, just awesome engineering at work.

DAY2 CloudOps Automation Rules is available in MontyCloud’s DAY2 platform today. To learn more about this feature and about MontyCloud’s intelligent Cloud Management Platform, you can click here and request a demo.