Is It Time to Take AIOps Seriously?

Written by Eric Repec

The term AIOps can be traced back to Gartner, which used it to describe the evolution in the I&O monitoring space as vendors and customers began leveraging mature capabilities from Artificial Intelligence and Machine Learning.  These efforts were driven primarily by the desire to lower the total cost of ownership and raise the quality of the monitoring solution.  For now, this remains the main thrust of the technology, but that is changing, ever so slowly. Why?

Currently, Gartner’s Hype Cycle puts AIOps at the "Peak of Inflated Expectations" in 2021.  This means that the industry is still adopting this technology and the vendors are trying to define and push the limits of its capability, i.e., build a groundswell of hype! More than one-third of organizations have implemented AI and, per Enterprise Management Associates, another 40% have implementations actively underway. IDC predicts that, by 2021, 70% of CIOs will have aggressively adopted AIOps platforms.

Why all the fuss? Should you pay attention to AIOps?

Yes! In short, this will be a game-changing capability that early adopters will cash in on by lowering the total cost of doing business.  It also serves as an enabling capability for companies that are adopting the new mantra: hyperfocus on the customer.  Furthermore, it is an important part of the approach that FAANG vendors are using to redefine customer relationships in the modern world(2).  These vendors are truly using AI to make every customer experience a unique one.  What does this mean for I&O monitoring?  It means an exponential increase in the Volume, Velocity, and Variety of data that must be analyzed to derive the health of the applications and infrastructure that the business uses to interact with its customers in this new era.  AIOps will drive the conversation that brings IT Operations to the business table when the business decides, or is forced, to adopt these new FAANG vendor processes (more on this later).  If IT Operations is not at the table, the risk to the business will be extremely high, and the business will miss out on a very rich stream of telemetry that could help it make decisions in this new, customer-first world.


What technologies do you need for a well-rounded solution? 

I'd like to break the collective technologies into what I term monitoring domains.  Each domain comprises technologies, tools, APIs, data sources, etc., that expose telemetry and events containing evidence that can be used to answer questions such as: What is the health of my application and, if it is poor, what is the root cause?

The following four domains are covered by monitoring tools:

    • Wire data
    • Machine data
    • Process data
    • Application data

Next, we need a platform to perform the analysis and execute the actions that provide intelligence when all the telemetry and events are looked at as a whole.  This platform is very similar to a modern-day analytics platform used by data scientists.  You must have the ability to wire up data pipelines from all the domain tools, with capabilities to transform, analyze and augment data in flight as well as the ability to store the raw and processed data. Below are the key categories of data platform tools:

    • Data Platform
    • Data Bus/Data Pipeline
    • Automation
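
To make the idea of a data pipeline a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the event fields (host, domain, latency_s), the OWNERS lookup table, and the in-memory Store are all hypothetical, and a real platform would sit on a message bus and a proper data store. The sketch shows the three capabilities described above: transforming data in flight, augmenting it, and keeping both the raw and the processed copies.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator

# Hypothetical lookup table used to augment events in flight (an assumption).
OWNERS = {"web-01": "storefront", "db-01": "orders"}


@dataclass
class Store:
    """In-memory stand-in for raw and processed storage."""
    raw: list = field(default_factory=list)
    processed: list = field(default_factory=list)


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Normalize field names and units coming from different domain tools."""
    for e in events:
        yield {
            "host": e["host"].lower(),
            "domain": e.get("domain", "machine"),
            "latency_ms": float(e.get("latency_s", 0.0)) * 1000.0,
        }


def augment(events: Iterable[dict]) -> Iterator[dict]:
    """Enrich each event with the service that owns the host."""
    for e in events:
        yield {**e, "service": OWNERS.get(e["host"], "unknown")}


def run_pipeline(events: list, store: Store) -> None:
    """Keep the raw events, then store the transformed and augmented copies."""
    store.raw.extend(events)
    store.processed.extend(augment(transform(events)))


store = Store()
run_pipeline(
    [
        {"host": "WEB-01", "domain": "wire", "latency_s": 0.25},
        {"host": "db-01", "latency_s": 1.5},
    ],
    store,
)
```

Keeping the raw events alongside the processed ones means a new algorithm can be replayed against history later without re-collecting anything from the domain tools.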

Finally, you need a base of algorithms to do the bulk of the work to bring structure to the data. These algorithms can be a part of a vendor’s product or created by the customer.  Both approaches have their pros and cons but ultimately the success of the algorithms is directly related to the maturity of the use case and supporting research. Following are examples of the different algorithmic approaches:

    • Data enrichment
    • Machine learning algorithms
    • Classification
    • Prediction/Projection
    • Grouping
    • Event thinning and deduplication
    • Dependency mapping
    • Event correlation
    • Anomaly detection
    • Root cause analysis
    • Predictive analysis
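
Two of the approaches in this list, event deduplication and anomaly detection, are simple enough to sketch in a few lines of Python. This is a deliberately naive illustration and not how any particular vendor implements them: the event fields (ts, host, check), the 60-second window, and the z-score threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def deduplicate(events, window_s=60):
    """Drop events that repeat the same (host, check) within window_s seconds.

    The window slides: every repeat refreshes the timestamp, so a continuous
    storm of identical events stays suppressed until it pauses.
    """
    last_seen = {}
    kept = []
    for e in sorted(events, key=lambda e: e["ts"]):
        key = (e["host"], e["check"])
        if key not in last_seen or e["ts"] - last_seen[key] > window_s:
            kept.append(e)
        last_seen[key] = e["ts"]
    return kept


def find_anomalies(values, threshold=3.0):
    """Return the indexes of values whose z-score exceeds the threshold."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


events = [
    {"ts": 0, "host": "web-01", "check": "cpu"},
    {"ts": 30, "host": "web-01", "check": "cpu"},    # repeat inside the window
    {"ts": 45, "host": "db-01", "check": "disk"},
    {"ts": 200, "host": "web-01", "check": "cpu"},   # window has expired
]
unique_events = deduplicate(events)

latencies_ms = [10, 11, 9, 10, 12, 10, 11, 9, 10, 95]
anomalies = find_anomalies(latencies_ms, threshold=2.5)
```

A lower threshold is used here because, in a sample this small, a single outlier inflates the standard deviation and caps its own z-score; production systems typically rely on more robust baselines.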

According to Gartner, all of these technologies can be grouped into the following three categories(3):

  • 1) Domain-centric AIOps tools: These tools are an integrated collection of domain tools, together with a data platform that is already configured with algorithms that perform fairly well across a well-defined set of technologies.
  • 2) Domain-agnostic AIOps tools: These are normally management technologies that feed from many domain tools to provide a cross-domain view of the data that addresses a broader range of use cases.
  • 3) DIY AIOps: Here the customer combines vendor tools, open-source, and homegrown technologies to create the needed solution.  These solutions often grow organically over time to solve specific, targeted business situations. They are the best at supporting the company's unique use cases; however, they tend to take a long time to develop and are often immature and expensive to create.

Overall, as companies mature in the AIOps space, they often start with a domain-centric AIOps approach focused on a modern, young application built on newer technologies.  The success of this platform tends to drive further investment into AIOps.  However, when it is rolled out to the enterprise, the original domain-centric solution will not support the added use cases, which often involve legacy or niche technologies. It then has to morph into a domain-agnostic AIOps solution, which includes targeted technologies and integrations to extend support to these poorly supported environments.

Knowing when to move from a domain-centric approach to a wider Domain-agnostic or DIY approach requires experience and an understanding of the limitations of domain-centric systems, as well as the total cost of ownership and ongoing maintenance of the wider approach being considered.

 

Where is all of this Artificial Intelligence going and what is the tie-in with the FAANG vendor movement?

In short, all the data collected from I&O and leveraged to answer questions about the health of the application can have further value in other parts of the business.  For starters, many vendors have already used AIOps data in other areas.  For example, SecOps is using wire data and machine data, as well as events raised by domain tools, to identify security issues.  According to my contacts at Moogsoft, they have documented that, at one of their customers in the hosting services business, over 80% of the security anomalies seen are also exposed as a performance problem in the AIOps platform.

Now let’s get back to the FAANG vendor movement and how this data can help beyond the improvement in customer delight that comes from lowering the incident mean time to resolve.  Remember that one of the monitoring domains above is a feed from the application?  This feed is normally based on transaction instrumentation.  These transactions make up the process that the customer uses to interact with the business.  The feed is, in turn, a real-time view of our customers' usage patterns and how they are working with us.  A company that is looking to take part in the FAANG vendor movement will be faced with the epic hurdle of rewriting all of its customer interfaces to collect this very telemetry.  Why not instrument the existing legacy application and collect it with very little investment and lower risk?  Collecting this data provides a stopgap and a business proof step that lets the business learn from its existing customer usage patterns and build the final, customer-focused solution with less risk and lower cost.
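
As a rough illustration of what instrumenting an existing application can look like, here is a minimal Python sketch. Everything in it is hypothetical: the legacy_checkout function stands in for existing legacy code, the TELEMETRY list stands in for whatever sink an AIOps platform would provide, and commercial APM agents usually do this without code changes at all. The point is only that the legacy logic itself stays untouched while each transaction emits a telemetry event.

```python
import functools
import time

# In-memory stand-in for a telemetry sink (an assumption for illustration).
TELEMETRY = []


def instrument(transaction_name):
    """Wrap an existing function and record one telemetry event per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on success and on failure, so every call is recorded.
                TELEMETRY.append({
                    "transaction": transaction_name,
                    "status": status,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                })
        return wrapper
    return decorator


# Hypothetical legacy function; its body is left exactly as it was.
@instrument("checkout")
def legacy_checkout(cart_total):
    if cart_total <= 0:
        raise ValueError("empty cart")
    return f"order placed for {cart_total:.2f}"


legacy_checkout(42.0)
```

Each recorded event carries the transaction name, its outcome, and its duration, which is the raw material for both health monitoring and the customer usage patterns discussed above.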

From AIOps to FAANG and beyond

Finally, I foresee AIOps as a stepping stone to another industry trend called Continuous Intelligence.  This is when telemetry from the business and from I&O comes together, so that ubiquitous data from several sources can be paired with machine learning algorithms that perform grouping and classification to describe and report on patterns that can be used to predict business outcomes.  These predictions can, in turn, drive automated business decisions and robotic solutions, bringing us ever closer to the autonomously driven business.  The business will become one with the technology it once used as a tool to augment and accelerate itself.  Overall, a business process will become a series of automation steps powered by algorithms that make decisions.  Executives will direct their companies by adjusting the hyperparameters that configure these algorithms.