New Paradigm for SecOps
Atones for the Sins of my Past

I’m an advocate for SIEMs, and have been a staunch believer in correlation rules for the past 15 years. So why did I decide to take the leap and join the Respond Software team?

The simplest explanation is that I joined to atone for the sins of my past. In the words of the great philosopher, Inigo Montoya, “Let me explain…No, there is too much. Let me sum up.”

Coming to terms with the reality of SIEMs

For 15 years I’ve been shouting from the rooftops, “SIEMs will solve all your Security Operations challenges!”  But all my proclamations came into question as soon as I learned about the capabilities of the Respond Analyst.

I’ve held a few different roles during this time, including Sales Engineer, Solutions Architect, and Security Operations Specialist. All of these were pre-sales roles, bound together by one thing: SIEM expertise. I’ve worked with SIEM since it began and I’ve seen it evolve over the years, even working as part of a team that built a Risk Correlation Engine at OpenService/LogMatrix. Make no mistake about it, I’m still a big fan of SIEM and what it can do for an organization. Whether you are using a commercial solution, an open source solution, or one you built yourself, SIEMs still provide a lot of value. For years I helped customers gain visibility into their logs and events, worked with them to meet compliance requirements, and helped them pass their audits with ease. I developed use cases, wrote correlation rules, and firmly believed that every time a correlation rule fired, it would be a true incident worthy of escalation and remediation.

The funny thing about that correlation part: it never really worked out. It became a vicious cycle of tuning and tweaking, filtering and excluding to reduce the number of firings. No matter the approach or the technique, the cycle never ended, and it still goes on today. Organizations used to have one or two people responsible for the SIEM, but it wasn’t usually their full-time job. Now we have analysts, administrators, incident responders, and content teams, and SIEM is just one of the tools these folks use within the SOC. To solve the challenges of SIEM, we have added bodies and layered other solutions on top of it, an approach that is unsustainable for all but the largest of enterprises.

In the back of my mind, I knew there had to be a better way to find the needle in a pile of needles. Eventually, I learned about this company called Respond Software, founded by people like me, who have seen the same challenges, committed the same sins, and who eventually found a better way. I hit their website, read every blog, watched numerous videos, and clicked every available link, learning as much as I could about the company and their solution.

The daily grind of a security analyst: Consoles, false positives, data collection—repeat

I think one of the most interesting things I read on our website was the Cyentia Institute’s 2017 Voice of the Analyst survey. I can’t say I was surprised, but it turns out that analysts spend most of their time monitoring: staring at a console and waiting for something to happen. It’s no surprise that they ranked it as one of their least favorite activities. It reminded me of one of my customers, who had a small SOC with a single analyst on each shift. The analyst assigned to the morning shift found it mind-numbing to stare at a console for most of the day, so to make it a little more exciting, he would start each day by clearing every alert, without fail. When I asked why, he said the alerts were always deemed false positives by the IR team, and no matter how much tuning was done, they were all false positives. At least they were actually using their SIEM for monitoring. I’ve seen multiple companies use their SIEM as an expensive (financially and operationally) log collector, using it only to search logs when an incident was detected through other channels.

My Atonement: Filling the SIEM gaps and helping overworked security analysts

Everything I’ve seen over the years, combined with what I learned about our mission here, made the decision to join Respond Software an easy one. Imagine a world where you don’t have to write rules or stare at consoles all day long. No more guessing what is actionable or ruling out hundreds of false positives. Respond Software has broken that cycle with software that combines the best of human judgment with consistent analysis at scale, building upon facts to make sound decisions. The Respond Analyst works 24×7, never takes a coffee break, never goes on vacation, and allows your security team to do what they do best: respond to incidents rather than chase false positives.

I’ve seen firsthand the limitations of traditional methods of detecting incidents, and the impact those limitations have on security operations and the business as a whole. I’ve also seen how the Respond Analyst brings real value to overwhelmed teams, ending the constant struggle of trying to find the one true incident in a sea of alerts.

If you would like to talk to our team of experts and learn more about how you can integrate Robotic Decision Automation into your security infrastructure, contact us:

The Science of Detection Part 2: The Role of Integrated Reasoning in Security Analysis Software

Today’s blog is part two in my science of detection series, and we’ll look at how integrated reasoning in security analysis software leads to better decisions. Be sure to check back in the coming weeks to see the next blogs in our series. In part three, I’ll be taking an in-depth look at the signal quality of detectors, such as signatures, anomalies, behaviors, and logs.

If you’ve been reading our blogs lately, you’ve seen the term “integrated reasoning” used before, so it’s time to give you a deeper explanation of what it means. Integrated reasoning combines multiple sensors and sensor types for analysis and better detection. Before making a security decision, you must take into account a large number of different factors simultaneously.

What Is Integrated Reasoning?

Interestingly, when we started using the term, Julie from our marketing team Googled it and pointed out that it was the name of a new test section introduced in the Graduate Management Admission Test (GMAT) in 2012. What the GMAT section is designed to test in potential MBA candidates is exactly what we mean when we refer to integrated reasoning. It consists of the following skills:

  • Two-part analysis: The ability to identify multiple answers as most correct.
  • Multi-source reasoning: The ability to reason from multiple sources and types of information.
  • Graphic interpretation: The ability to interpret statistical distributions and other graphical information.
  • Table analysis: The ability to interpret tabular information such as patterns or historical data and to understand how useful distinct information is to a given decision.

All of these skills provide a combination of perspectives that allow you to reason and reach a well-thought-out, accurate conclusion. The same reasons we evaluate potential MBA candidates against this standard are why we would design security analysis software, or if you will, a “virtual” security analyst, to this same standard.

What is an MBA graduate but a decision maker? Fortunately, we are training our future business leaders on integrated reasoning skills, but when the number of factors to be considered increases, humans get worse at making decisions — especially when they need to be made rapidly. Whether from lack of attention, lack of time, bias or a myriad of other reasons, people don’t make rational decisions most of the time.

However, when you’re reasoning and using all of the available information in a systematic manner, you have a much greater chance of identifying the best answer. To put this within a security analysis frame of reference, let’s consider some of the information available to us to make effective security decisions.

What Information Should We Consider?

The most effective security analysis software uses anything that is observable within the environment and reduces the uncertainty about whether any given event should be investigated.

To achieve integrated reasoning, the software should utilize a combination of detectors, including:

  • Signature-based alerts
  • Detection analytics
  • Behaviors
  • Patterns
  • History
  • Threat intelligence
  • Additional contextual information

To make the right decisions, security analysis software should take into account three important factors: sensors, perspective, and context. When you combine different forms of security telemetry, like network security sensors and host-based sensors, you have a greater chance of detecting maliciousness. If you deliberately overlap that diverse suite of sensors, you gain a form of logical triangulation. Add context, and you can understand the importance of each alert. Boom, a good decision!
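To make this concrete, here is a minimal, purely illustrative sketch of logical triangulation, not the Respond Analyst's actual model. The sensor names, weights, and boost factor are invented for illustration:

```python
# Purely illustrative sketch: corroboration across distinct sensor types
# raises confidence, and asset context scales the final priority.
# Sensor names, weights, and the 0.2 boost factor are invented assumptions.

from dataclasses import dataclass

@dataclass
class Observation:
    sensor: str    # sensor type, e.g. "nids", "epp", "proxy"
    weight: float  # hypothetical prior that this sensor's alert is malicious

def triangulate(observations, asset_criticality):
    """Score an alert from multiple observations plus asset context."""
    if not observations:
        return 0.0
    distinct_sensors = {o.sensor for o in observations}
    # Start from the strongest single signal...
    confidence = max(o.weight for o in observations)
    # ...then boost it for each independent corroborating perspective.
    confidence = min(1.0, confidence + 0.2 * (len(distinct_sensors) - 1))
    # Context: how much this asset matters determines final priority.
    return confidence * asset_criticality

# Three different sensor types agree about the same internal host:
alerts = [Observation("nids", 0.4), Observation("epp", 0.5), Observation("proxy", 0.3)]
priority = triangulate(alerts, asset_criticality=0.9)
```

The point of the sketch is the shape of the reasoning: no single sensor is trusted outright, agreement between independent perspectives raises confidence, and context determines how much the result matters.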

Like our theoretical MBA candidate, security analysts have to hold hundreds of relevant factors in their minds simultaneously and are charged with making a number of critical decisions every hour. A tall order for a mere mortal, indeed.

Imagine this: A user receives a phishing email, clicks on the link a week later and is infected by malware. The system anti-virus reports “cleaned,” but it only found one of the four pieces of malware installed. The remaining malware communicates with a command-and-control server and is used as an internal waypoint for low-and-slow lateral exploration. This generates thousands of events over a period of weeks or months, all with varying levels of fidelity. More likely, this is the backstory that an incident responder would eventually assemble, potentially months or years after the fact, to explain a breach.

Integrated reasoning is a must for making sound decisions about which security alerts to escalate for further examination. But with the amount of incoming data increasing by the minute, security teams are having a hard time keeping up. Your best bet is to choose security analysis software, like the Respond Analyst, that has built-in integrated reasoning capabilities to help with decision-making, so teams can focus on highly likely security incidents.

Curious to see how the Respond Analyst’s integrated reasoning capabilities can help your security team make better decisions? Request a demo today.

The Science of Detection Part 1: 5 Essential Elements of a Data Source

I’m passionate about the science of detection. It used to be a black art, like long-distance high-frequency radio communication, but with modern cybersecurity technology, we’re putting the science back in. With that in mind, I plan to write a series of blogs about the science of detection, with an aim to enable more effective and rapid identification of “maliciousness” in our enterprise technology.

In today’s blog, we’ll look at the key elements of a data source to ensure effective detection. Be sure to check back in the coming weeks to see the next blogs in the series. In parts two and three, I’ll be taking an in-depth look at how integrated reasoning will fundamentally change detection technology and the signal quality of detectors, such as signatures, anomalies, behaviors and logs.

In operational security, we monitor various pieces of technology in our network for hackers and malware. We look at logs and specialized security sensors, and we use context and intelligence to try to separate the “bad guys” from the noise. Often, a lot of this work is “best effort” since we’re collecting data from other technology teams who are only using it to troubleshoot performance, capacity and availability issues — not security. It can be a challenge to configure these sources to make them specific to the needs of security, and this greatly reduces our success rate.

When we look at the data sources or telemetries that we monitor, there are five elements that are important for their effectiveness in detecting malicious activity.

1. Visibility

Visibility is one of the most important elements of a data source. What can you see? Is this a network sensor? Are you decrypting traffic so that you can see the patterns or signatures of an attack? Or is this a system log source where stealthy user or administrator behaviors can be captured? When you’re considering visibility, there are two things that are key: the location of the sensor and the tuning of the events, alerts or logs that it generates. For signature-based data sources, it’s tremendously important that you keep them up to date consistently and tuned for maximum signal.

2. Signal Quality

We look at signal quality to help determine the likelihood that any given signature or alarm will reliably indicate the presence of malicious activity. When you consider network intrusion detection and prevention sensors, things get really complicated. I have seen the same IDS signature fire for different hosts in one day, where one instance was a false positive and the other was malicious. How are we supposed to separate the two? Not without deep analysis that considers many additional factors.

3. Architecture

With the advent of autonomous analysis and integrated reasoning, the architecture of your sensor grid can provide significant advantages. The most important is sensor overlap, which means different types of sensors should be implemented in the infrastructure so that attackers must get past more than one detection point.

A good example would be host-based endpoint protection agents in a user environment. By forcing users to also transit a network intrusion detection sensor, and maybe even a URL filter, in order to conduct business on the internet, you end up with three perspectives and three chances to recognize systems that are behaving maliciously. This is why it’s important to deploy internal (east-west) network sensors to corroborate your other sensing platforms, reduce false positives, and produce high-fidelity incidents. You can fool me once, but fooling me twice or a third time gets much harder.

4. Data Management

All of our sensors should be easy to aggregate into a single data platform using common log transport methods. This can be a SIEM or a big data platform. It’s also tremendously important to capture all of the fields that can help us contextualize and understand the alerts we’ve observed. This data platform becomes the incident response and forensic repository for root cause analysis and is a good hunting ground for a hunt team.

5. Event Alignment

Given the complex nature of the modern enterprise, it’s possible for a user’s laptop to have 10 or 15 different IP addresses on any given day. We need to be able to reassemble that information to find the host that’s infected. A good example is to collect the hostname, where it’s available, rather than just the IP address. Proxies, firewalls, and NAT devices can all effectively blind you when looking for malicious internal hosts. In fact, one Security Operations Center I built could not locate 50% of known compromised assets due to a combination of network design and geography.
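As a hedged illustration of the alignment problem, the sketch below (hypothetical field names, not any real product's API) maps a transient IP address back to the hostname that held it at a given time, using time-ordered DHCP lease events:

```python
# Hypothetical sketch: aligning alerts keyed by a transient IP address back
# to the hostname that held that IP at the time, via DHCP lease events.

import bisect

class LeaseHistory:
    def __init__(self):
        self.leases = {}  # ip -> sorted list of (lease_start_time, hostname)

    def record(self, ip, start_time, hostname):
        """Record a DHCP lease event for an IP."""
        self.leases.setdefault(ip, []).append((start_time, hostname))
        self.leases[ip].sort()

    def owner_at(self, ip, when):
        """Return the hostname holding `ip` at time `when`, if known."""
        events = self.leases.get(ip, [])
        # Find the latest lease that started at or before `when`.
        idx = bisect.bisect_right(events, (when, chr(0x10FFFF))) - 1
        return events[idx][1] if idx >= 0 else None

history = LeaseHistory()
history.record("10.0.0.5", 100, "laptop-alice")
history.record("10.0.0.5", 200, "laptop-bob")  # same IP, reassigned later
owner = history.owner_at("10.0.0.5", 150)      # alert at t=150 -> "laptop-alice"
```

The same idea extends to VPN session logs and proxy authentication records: any time-stamped mapping between network identifiers and hosts or users helps reassemble who was really behind an alert.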

A combination of perspectives provides the most effective sensor grid. Leveraging multiple forms of visibility, improving the signal quality of your sources, architecting for sensor overlap and key detection chokepoints, and streaming all of this data into an effective big data management system where it can be analyzed and leveraged across the operational security lifecycle can provide a far more effective security operations capability.

How the Respond Analyst Can Help You

The Respond Analyst is able to understand these telemetries and contextual data sources and considers all factors in real-time. This frees you from monitoring a console of alerts, which allows you to focus on higher-value work. It also frees your detection programs from the volume limitations of human monitoring. Putting all of these elements together provides a massive improvement in your ability to detect intruders before they can do major damage to your enterprise technology. We’re putting machines in front of alerts so that humans can focus on situations.

Neither SIEM nor SOAR–Can Security Decisions be Automated? Patrick Gray and Mike Armistead Discuss

We’ve asked the question before, but we’ll ask it again: how much time does your security team spend staring at monitors? How about investigating false positives escalated from an MSSP? More importantly, how are small security teams expected to cope with the growing amount of security data?

The world of security operations is changing. Extra processing power, combined with faster mathematical computations, means security monitoring and event triage can now be performed at machine scale and speed. With new innovations that leverage decision automation, security organizations can analyze incidents more efficiently than ever before. Security teams no longer have to tune down or ignore low-signal events. Instead, technologies can now recognize patterns to identify malicious attacks that may have otherwise been overlooked.

So how will these new technologies impact security operations moving forward?
Mike Armistead, Respond Software CEO, recently sat down with Patrick Gray of Risky Business to discuss the state of information security today. In the 30-minute podcast, Mike and Patrick shed light on the future of security operations, discussing the limitations of traditional security monitoring and analysis techniques and the power of new technologies, like decision automation, to change security forever.

During this podcast you’ll learn to:

  • Identify the biggest mistakes security teams make today and how to avoid them.
  • Manage the onslaught of data.
  • Increase your team’s capacity.
  • Stop wasting time chasing false-positives.

Listen to the full podcast, here!

Learn more about what the Respond Analyst can do for you!

Mid-sized Enterprises: Want Robust, Sustainable SecOps? Remember 3 C’s

Cybersecurity is tricky business for the mid-sized enterprise.

Attacks targeting mid-sized companies are on the rise, but their security teams are generally resource constrained and have a tough time covering all the potential threats.

There are solutions that provide sustainable security infrastructures, but the vendor landscape is confusing and difficult to navigate. With smaller teams and more than 1,200 cybersecurity vendors in the market, it’s no wonder mid-sized enterprise IT departments often stick with “status quo” solutions that provide bare-minimum coverage. The IT leaders I talk to secretly tell me they know bare-bones security is a calculated risk, but the executive support for more resources often just is not there. These are tradeoffs that smaller security teams should not have to make.

Here’s the good news: building a solid enterprise-scale security program without tradeoffs is possible. To get started, IT leaders should consider the 3 C’s of a sustainable security infrastructure: Coverage, Context, and Cost.

In part 1 of this 3-part blog series, we will deep-dive into the first “C”: Coverage.

When thinking about coverage, there are two challenges to overcome. The first challenge is to achieve broad visibility into your sensors. There is a wide array of security sensors and it’s easy to get overwhelmed by the avalanche of data they generate. Customers often ask me: Do we have to monitor everything? Where do I begin? Are certain sensor alerts better indications of compromise than others?

Take the first step: Achieve visibility with appropriate sensor coverage

To minimize blind spots, start by achieving basic 24 x 7 coverage with continuous monitoring of Network Intrusion Detection & Prevention (NIDS/NIPS) and Endpoint Protection Platform (EPP) activity. NIDS/NIPS solutions leverage signatures to detect a wide variety of threats within your network, alerting on unauthorized inbound, lateral, and outbound network communications. Vendors like Palo Alto Networks, Trend Micro, and Cisco have solid solutions; Suricata and Snort are two popular open-source alternatives. EPP solutions (Symantec, McAfee, Microsoft) also leverage signatures to detect a variety of threats (e.g., Trojans, ransomware, spyware), and their alerts are strong indicators of known malware infections.

Both NIDS/NIPS and EPP technologies use signatures to detect threats and provide broad coverage of a variety of attacks; however, they do not cover everything. To learn more on this topic, read our eBook: 5 Ingredients to Help your Security Team Perform at Enterprise-Scale

To gain deeper visibility, IT departments can eventually pursue advanced coverage.

With advanced coverage, IT teams augment basic 24 x 7 data sensor coverage by monitoring web proxy, URL filtering, and/or endpoint detection and response (EDR) activity. These augmented data sources offer deeper visibility into previously unknown attacks because they report on raw activity and do not rely on attack signatures the way NIDS/NIPS and EPP do. Web proxy and URL filtering solutions log all internal web browsing activity and, as a result, provide in-depth visibility into one of the channels attackers most commonly exploit to compromise internal systems. In addition, EDR solutions act as a DVR on the system, recording every operation performed by the operating system, including all operations initiated by adversaries or malware. Of course, the hurdle to overcome with these advanced coverage solutions is managing the vast amounts of data they produce.

This leads to the second coverage challenge to overcome—obtaining the required expertise and capacity necessary to analyze the mountains of data generated.

As sensor coverage grows, more data is generated, and each sensor type creates data with unique challenges. Some sensors are extremely noisy and generate massive amounts of data. Others generate less data but are highly specialized and require a great deal more skill to analyze. To deal with the volume, a common approach is to ‘tune down’ sensors, which filters out potentially valuable data. This kind of filtering is tempting because it reduces a security team’s workload to a more manageable level. In doing so, however, clues to potential threats stay hidden in the data.

Take the second step: Consider security automation to improve coverage with resource-constrained teams.

Automation effectively offers smaller security teams the same capability that a full-scale Security Operations Center (SOC) team provides a larger organization, at a fraction of the investment and hassle.

Automation improves the status quo and stops the tradeoffs that IT organizations make every day. Smaller teams benefit from advanced security operations. Manual monitoring stops. Teams can keep up with the volume of data and can ensure that the analysis of each and every event is thorough and consistent. Security automation also provides continuous, effective network security monitoring and reduces time to respond. Alert collection, analysis, prioritization, and event escalation decisions can be fully or partially automated.

So to close, more Coverage for smaller security teams is, in fact, possible: First, find the right tools to gain more visibility across the network and endpoints. Second, start to think about solutions that automate the expert analysis of the data that increased visibility produces.

But, remember, ‘Coverage’ is just 1 part of this 3-part puzzle. Be sure to check back next month for part 2 of my 3 C’s (Coverage, Context, Cost) blog series. My blog on “Context” will provide a deeper dive into automation and will demonstrate how mid-sized enterprise organizations can gain more insights from their security data—ultimately finding more credible threats.

In the meantime, please reach out if you’d like to talk to one of our Security Architects to discuss coverage in your environment.

Why It’s Time to Go Back To The Basics of SOC Design

The average SOC is no more prepared to solve its cybersecurity challenges today than it was 10 to 20 years ago. Many security applications have been developed to help protect your network, but SOC design has largely remained the same.

Yes, it’s true we have seen advancements like improved management of data with SIEMs and orchestration of resolutions, but these tools haven’t resolved the fundamental challenges. The data generated from even the most basic security alerts and incidents is overwhelming and still plagues the most advanced security organizations.

Which begs the question: How are smaller, resource-constrained security organizations expected to keep up when even enterprise-sized organizations can’t?

According to a recent article in Computer Weekly, the issue is that most organizations, even with the tools & the know-how, are still getting the basics all wrong.

“Spending on IT security is at an all-time high. The volume of security offerings to cover every possible facet of security is unparalleled…The reason so many organisations suffer breaches is simply down to a failure in doing the very basics of security. It doesn’t matter how much security technology you buy, you will fail. It is time to get back to basics.”

The article mentions that security operations teams need to focus on these four key areas to see a real, positive impact on their SOC design:

  1. Security Strategy
  2. Security Policy
  3. User Awareness
  4. User Change

But is it as simple as this?

The answer is a resounding YES!

There is no question that it’s still possible to cover the basics in security strategy and achieve enterprise security results. Our recommendation? Start with the most tedious and time-sucking part of the security analyst role: the analysis and triage of all collected security data. Let your team focus on higher-priority tasks like cyber threat hunting. It’s where you’ll get the biggest bang for your buck.

How Automating Long Tail Analysis Helps Security Incident Response

Today’s cybersecurity solutions must scale to unparalleled levels, because constantly expanding attack surfaces produce enormous volumes of diverse data to be processed. Scale issues have migrated from just the sheer volume of traffic, such as IoT-driven DDoS attacks and the traffic from multiple devices, to the need for absolute speed in identifying and catching the bad guys.

Long tail analysis comes down to looking for very weak signals from attackers who are technologically savvy enough to stay under your radar and remain undetected.

But what’s the most efficient way to accomplish what can be a time-consuming and highly repetitive task?

What is Long Tail Analysis?

You might be wondering what the theory is behind long tail analysis, even though you’re familiar with the term and may already be performing these actions frequently in your security environment. The term “Long Tail” first emerged in 2004, coined by Wired editor-in-chief Chris Anderson to describe “the new marketplace.” His theory is that our culture and economy are increasingly shifting away from a focus on a relatively small number of “hits” (mainstream products and markets) at the head of the demand curve and toward a huge number of niches in the tail.

In a nutshell and from a visual standpoint, this is how we explain long tail analysis in cybersecurity: you’re threat hunting for the least common events, which will be the most useful in understanding anomalous behaviour in your environments.

Finding Needles in Stacks of Needles

Consider the mountains of data generated by all your security sources. It’s extremely challenging to extract weak signals while avoiding all the false positives. Our traditional attempt to resolve this challenge is to provide analysts with banks of monitors displaying the different dashboards they need to be familiar with in order to detect malicious patterns. As you know, this doesn’t scale. We cannot expect a person to react to these dashboards consistently, nor can we expect them to “do all the things.”

Instead, experienced analysts enjoy digging into the data. They’ll pivot into one of the many security solutions used to combat cybersecurity threats, such as log management solutions, packet analysis platforms, and even some endpoint agents, all designed to record and play back a historical record. We break down common behaviours looking for outliers. We zero in on these ‘niche’ activities and understand them one at a time. Unfortunately, we can’t always get to each permutation, and some are left unresolved.

Four Long Steps of Long Tail Analysis in the SOC

If you are unfamiliar with long tail analysis, here are the four steps a typical analyst works through:

Step 1: First, identify events of interest, like user authentications or website connections. Then determine how to aggregate the events in a way that provides enough meaning for analysis. Example: graph user accounts by the number of authentication events, or web domains by the number of connections.

Step 2: Once the aggregated data is grouped together, the distribution might be skewed in a particular direction, with a long tail to the left or right. You are particularly interested in the objects that fall within that long tail. These are the objects you extract, in table format, for further analysis.

Step 3: Investigate each object as required. For authentications, you would look at the account owner, the number of authentication events, and the purpose of the account, all with the goal of understanding why that specific behaviour is occurring.

Step 4: Decide what actions to take and move on to the next object. Typically, the next steps include working with incident responders or your IT team. Alternatively, you might decide to simply ignore the event, then repeat Steps 3 and 4 with the next object.
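The four steps above can be sketched in a few lines of Python. The event data and tail cutoff here are illustrative assumptions, not a prescribed workflow:

```python
# A minimal sketch of the four long tail analysis steps, using only the
# standard library. Event data and the tail cutoff are illustrative.

from collections import Counter

# Step 1: events of interest -- here, one entry per authentication event.
auth_events = [
    "svc-backup", "alice", "alice", "alice", "bob",
    "bob", "alice", "bob", "alice", "contractor-x",
]
counts = Counter(auth_events)  # aggregate: account -> number of authentications

# Step 2: the distribution is skewed; extract the long-tail objects
# (accounts with unusually few events) as a table for further analysis.
TAIL_CUTOFF = 2  # hypothetical threshold
tail = sorted((n, acct) for acct, n in counts.items() if n < TAIL_CUTOFF)

# Steps 3 and 4: investigate each tail object, then decide what action to take.
for n, acct in tail:
    print(f"{acct}: {n} authentication(s) -- check account owner and purpose")
```

In a real environment the aggregation would run over millions of events from a log management platform, but the shape of the analysis is the same: aggregate, extract the tail, investigate, decide.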

Is There a Better Solution?

At Respond Software, we’re confident that long tail analysis can be automated to make your team more efficient at threat hunting. As we continue to build Respond Analyst modules, we move closer to delivering on that promise — and dramatically improve your ability to defend your business.

PERCEPTION VS. REALITY: The Myth of 100 Security Data Sources

The realities of security monitoring vs. the promise of SIEM

In enterprise IT, data is collected from any number of IT and security devices, and then used to monitor, protect, understand and manage our technology-enabled businesses. Due to the ever-expanding attack surface, the amount of data collected today is overwhelmingly unmanageable, and ironically, we have only a very general idea of what value it should provide. One of the most common objections I hear when talking about security operations is, “I have 100 or more data sources to monitor.” What makes this statement a myth is not that we collect data from more than 100 data sources, but how we use that data in our security operations programs and how it gets reported to executive management.

What’s Your Monitoring Effectiveness?

After decades of direct experience managing security operations teams that collect mountains of data, it’s evident to me that they barely leverage any of it for security monitoring and detection purposes. If you have 100 data sources reporting into your Security Information and Event Management (SIEM) system, but you have rule logic applied to only 5-10 of them (and maybe only one or two rules for most of those), what is your actual monitoring effectiveness? To understand this more deeply, we’ll review all the uses for collected data, and which data applies to which uses within our security programs.

“I have seen the largest, most sophisticated companies on the planet report that they collect 100+ data sources but fail to explain that only 3-5 of them have any form of rule logic or other use that can support the cost of collecting them all.”
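As a back-of-the-envelope illustration of the gap described above, a simple coverage ratio makes the point:

```python
# Back-of-the-envelope sketch: monitoring effectiveness as the share of
# collected data sources that actually have detection rule logic applied.

def monitoring_effectiveness(total_sources, sources_with_rules):
    return sources_with_rules / total_sources

# 100 sources collected, rule logic on only 5-10 of them:
low = monitoring_effectiveness(100, 5)    # -> 0.05
high = monitoring_effectiveness(100, 10)  # -> 0.1
```

In other words, a program that reports "100+ data sources" may be actively monitoring only 5-10% of them, while paying the full cost of collecting everything.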

How Enterprise Data is Used in the Security Environment

The diversity of users needing access to this information presents a key friction point for data aggregation and collection; this is very often seen in shared Splunk instances. Previously, business intelligence strategies called for architecting a data warehouse to collect everything and then creating various data marts specific to each user’s information requirements. Today, big data and data-bus solutions have solved this issue; however, they are not fully adopted or deployed with all users in mind. IT operations and IT security present the highest-conflict situation because they are the most likely to share infrastructure while having very different business requirements. The team responsible for managing the SIEM (or big data solution) and collecting data has the mission to collect everything, since more is better, right? Often the operations team does not have sufficient authority to get engineering to focus narrowly on what matters for detection and response. This is the tail wagging the dog!

Monitoring & Analysis

Looking at data in time sequence, from the moment it is first collected until it is discarded at the end of its information lifecycle, we generally start with analytical monitoring in as close to real time as possible. Within monitoring, we pay attention to performance, health status, availability, configuration, and security. Each of these requires different fields from the logs produced, each is analyzed entirely differently, and all of them are high-volume, low-signal monitoring problems.

Typically, the security signal comes from malicious signatures, traffic patterns, or behaviors. There is, however, another theory: that anomalies are more likely to be malicious than signature matches. I do not agree with this theory, but I merely add anomalies to the mix along with signatures and all other potential indicators of malicious activity. These have historically proven extremely difficult for humans to monitor effectively at scale, which is why automation is the more important requirement.

Reporting & Metrics

One of the next uses in the lifecycle for the data we collect is reporting and metrics. These are designed to answer questions of governance, risk, and compliance (GRC), operations analysis, and the overall effectiveness and efficiency of the business infrastructure. They rarely do; the common top-10 report is an example of “data” without “information.”

There is a ubiquitous statement attributed to W. Edwards Deming: “You get what you measure.” A key challenge in operational security is the inability to secure sufficient budget to support security operations adequately. In these cases, many of the metrics reported are used for budget justification rather than for measuring effectiveness and efficiency. It is tough to prove a “cost avoidance ROI.”

The lack of nuance and the naked agendas that are so common in reporting and metrics result in the “Myth of 100 Data Sources.” I have seen the largest, most sophisticated companies on the planet report that they cover 100+ data sources, but fail to explain that only 3-5 of them have any form of logic or use that can support the cost of collecting them all. The reasoning that “we might use it someday, forensically” is a feeble and expensive justification.


Threat Hunting

Hunting is a relatively new security discipline, and it is positioned to overtake monitoring rapidly. Hunting requires a broader and deeper set of data. It includes tasks such as identifying hyper-current attack methods, locating the appropriate data sources, pivoting and searching from suspicious sources, examining specific slices of time where enterprise activity was deemed suspicious, and visualizing large amounts of data. In the security operation centers I have been involved in building, we always dedicated a four-hour block of time on Friday to a retrospective hunt over the previous week. In those hunts we discovered that, for the most part, more incidents were missed than detected during 24×7 operational monitoring. The highest-quality incidents were found during the retrospective hunt.
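A retrospective hunt like the one described above often boils down to long-tail analysis: tally a week of events and look at the rare ones rather than the noisy ones. Here is a minimal sketch under that assumption; the event records and names are entirely hypothetical:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event records: (timestamp, source_host, event_name).
events = [
    (datetime(2023, 5, 1, 9, 0), "web01", "auth_failure"),
    (datetime(2023, 5, 1, 9, 1), "web01", "auth_failure"),
    (datetime(2023, 5, 2, 3, 14), "db02", "new_service_installed"),
    (datetime(2023, 5, 3, 11, 5), "web01", "auth_failure"),
]

def long_tail(events, since, max_count=1):
    """Return event names seen at most `max_count` times since `since`.

    In a retrospective hunt, the rare one-off events are often the
    interesting ones, so we keep only the bottom of the distribution.
    """
    tally = Counter(name for ts, _, name in events if ts >= since)
    return [name for name, n in tally.items() if n <= max_count]

week_ago = datetime(2023, 5, 4) - timedelta(days=7)
print(long_tail(events, week_ago))  # the one-off 'new_service_installed' stands out
```

The point of the sketch is the shape of the query, not the data: humans are bad at scanning for the absence of repetition, while a tally-and-sort does it trivially.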

Forensic Analysis

Once we’ve identified that a security situation exists, some of the data becomes useful for forensic analysis. This includes determining the extent of an intrusion, performing root-cause analysis, supporting or confirming the conclusions of incident response teams, and supporting investigations as requested by the corporate legal department. Where data is likely to be presented for forensic purposes in a court of law, you must be able to establish that it was either collected as a business record or maintained in a forensically approved data store, to verify that it could not have been maliciously modified.


Process Automation

One of the most critical applications of IT data, especially as data volumes scale beyond human management, is process automation. Appropriate types of automation include the application of logic and algorithms to identify specific issues within the data, the addition of context, business process intelligence for operational improvement, and the automation/orchestration of appropriate response actions based on the security situations detected.

There are many data collection automation opportunities. However, one thing still holds: there is a ton of useless data to sift through when automating from raw telemetry. This fact is one of the guiding principles of what I call the “small data” movement, as opposed to the big data movement: find an actual use for the data before you collect and store a ton of it.


Data Source Categories

The data sources we collect provide visibility across many different technical perspectives. There is network telemetry, security telemetry, application telemetry, host-based telemetry, cloud telemetry, and contextual telemetry, to start. Within each of these categories, we have infrastructure devices that produce logs and sensors that monitor, whether for operations or security.

Even within a category, each vendor or specific implementation puts radically different information in its log files or alert stream. While there are common logging formats and methods, there is very little agreement or commonality in the information contained or how it is used for specific purposes. Logs are highly inconsistent and hard to leverage effectively for any purpose.

Let’s walk through a high-level list of the types of sources we’re talking about:


Network

Network sources include routers, switches, load balancers, and other LAN/WAN network infrastructure equipment. These sources provide visibility into what is transiting your network. Inbound, lateral, and outbound are all interesting viewpoints, but these devices mostly provide performance and availability information and are highly repetitive in their log messages. This category also includes network-specific sensors, such as deep packet inspection and network flow collection, which are used for anomaly detection, performance monitoring, and forensic review.


Security

Among security devices, there are some blended network infrastructure technologies, like firewalls and proxies, as well as dedicated security infrastructure, like Identity and Access Management systems or Intrusion Prevention sensors. Security focuses on sensors that detect potential attack signatures and behaviors at various chokepoints and critical nodes. These sensors look for signatures or anomalies that might indicate suspicious activity in the form of malicious hackers or code. They also have a high noise-to-signal ratio and are hard to analyze.


Application

Application telemetry is the least mature category and the one with the highest attack rate. It runs the gamut from web server logs to user-experience monitoring for custom e-commerce applications. Application telemetry is highly focused on performance and availability, and almost wholly ignores all but very basic security uses. This makes it almost useless for detection monitoring. The main exception is user authentication, but that is a shallow set of use cases.


Host

Host-based telemetry also has multiple subcategories. The native operating system has defensive and logging facilities that can be monitored, though the volume is extremely high, with only a tiny number of events indicating maliciousness. There are also host-based agents: NextGen Antivirus (NGAV), Endpoint Protection Platforms (EPP), Endpoint Detection and Response (EDR), or simply IT operations instrumentation.


Cloud

Cloud is the newest critical component in our business IT strategy, and it too focuses mostly on performance and availability rather than security controls. There are still minimal use cases detected through monitoring with cloud tools. The primary argument is that much of the traditional security is handled by the cloud provider and invisible to the user. This situation led to the development of the Cloud Access Security Broker (CASB), so cloud users could demonstrate compliance and security requirements without resorting to the provider’s controls. Native cloud telemetry is very shallow at the moment, with CASBs filling the gap.


Context

Context is a category that many people fail to think about thoroughly. Maintaining a historical record of context is essential to identifying impacted assets that are transient on the network. For example, what IP address did an offending host have at any given point in time? Can I map back to the hostname of that asset, and how far back in time can I go to locate a malicious incident on a highly mobile endpoint? We commonly refer to this as event alignment. The same problem applies to users, and all of that contextual information is critical to making informed decisions. The criticality of every entity in the enterprise is also key to business-prioritized risk decisions.
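The event alignment question above ("what hostname held this IP at that moment?") reduces to a time-indexed lookup over lease history. A minimal sketch, assuming we have DHCP-lease records sorted by start time (the data and field names are hypothetical):

```python
import bisect

# Hypothetical DHCP-lease history: for each IP, a time-sorted list of
# (lease_start_timestamp, hostname) pairs.
lease_history = {
    "10.0.0.5": [
        (1000, "laptop-alice"),
        (2000, "laptop-bob"),   # the address was re-issued at t=2000
    ],
}

def hostname_at(ip, when):
    """Map an IP address back to the hostname that held it at time `when`."""
    leases = lease_history.get(ip, [])
    # Find the latest lease that started at or before `when`;
    # the sentinel makes the tuple compare after any real hostname.
    i = bisect.bisect_right(leases, (when, chr(0x10FFFF)))
    return leases[i - 1][1] if i else None

print(hostname_at("10.0.0.5", 1500))  # laptop-alice held the address then
print(hostname_at("10.0.0.5", 2500))  # by t=2500 it was laptop-bob
```

Without a record like this, an alert naming only an IP address cannot be reliably tied back to the mobile endpoint that caused it.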


Not surprisingly, data sources are all narcissistic. They only talk about themselves, and they often repeat things that are not valuable. Routers are notorious for talking about route flapping: route-up, route-down, route-flap. If no one cares, why do we collect it? Unfortunately, an agreed-upon set of informational fields across data categories, vendors, and applications does not exist. If it did, we could derive much higher value from all this data.

Moral of the story: don’t get sucked into the myth that collecting 100 data sources into a single data platform provides real value for operational security or any of the other applications. In reality, these efforts are not providing much value at all, and they most likely increase your cost and volume without significantly reducing your risk. Collect only what matters and focus on deriving deeper value from it via automation. And remember, humans should not monitor consoles for high-noise, low-signal use cases.

Ripping off the Bandage: How AI is Changing the SOC Maturity Model

The introduction of virtual analysts, artificial intelligence and other advanced technologies into the Security Operations Center (SOC) is changing how we should think about maturity models. AI is replacing traditional human tasks, and when those tasks are automated the code effectively becomes the procedure. Is that a -1 or a +10 for security operations? Let’s discuss that.

To see the big picture here, we should review what a maturity model is and why we use one for formal security operations. A maturity model is a process methodology that drives good documentation, repeatability, metrics, and continuous improvement, the assumption being that these are a proxy for effectiveness and efficiency. The most common model used in security operations is a variant of the Carnegie Mellon Capability Maturity Model Integration (CMMI). Many process methods focus on defect management; this is even more evident in CMMI, since it originated in the software industry.

In the early 2000s, we started using CMMI at IBM. Big Blue insisted that we couldn’t offer a commercial service that wasn’t on a maturity path, and it had adopted CMMI across the entire company at that point. At the time, we had what seemed like a never-ending series of failures in our security monitoring services, and for each failure a new “bandage” in the form of a process or procedure was applied. After a few years we had an enormous list of processes and procedures, each connected to the other in a PERT chart of SOC formality. Most of these “bandages” were intended to provide guidance and support to analysts as they conducted security monitoring and to prevent predictable failures, so we could offer a consistent, repeatable service across shifts and customers.

To understand this better, let’s look at the 5 levels of the CMMI model:

  1. Initial (ad hoc)
  2. Managed (can be repeated)
  3. Defined (is repeated)
  4. Measured (is appropriately measured)
  5. Self-optimizing (measurement leads to improvement)

This well-defined approach seemed to be perfect. It allowed us to take junior analysts and empower them to deliver a consistent level of service. We could repeat ourselves across customers. We might not deliver the most effective results, but we could at least be reasonably consistent. As it turns out, people don’t like working in such structured roles because there’s little room for creativity or curiosity. Not surprisingly, this gave rise to the 18-24 month security analyst turnover phenomenon. Many early analysts came from help desk positions and were escaping “call resolution” metrics in the first place.

Our application of SOC maturity morphed over the years from solving consistency problems into consistently repeating the wrong things because they could be easily measured. When failures happened, we were now in the habit of applying the same “bandages” over and over. Meanwhile, the bad guys had moved on to new and better attack techniques. I have seen security operations teams follow maturity guidelines right down a black hole, when, for example, a minor SIEM content change can take months instead of the few hours it should.

According to the HPE Security Operations Maturity report, the industry median maturity score is 1.4, or slightly better than ad hoc. I’m only aware of 2 SOCs in the world that are at CMMI level 3. So, while across the industry we are measuring our repeatability and hoping that it equates to effectiveness and efficiency, we are still highly immature, and this is reflected in the almost daily breaches being reported. You can also see this in the multi-year sine wave of SOC capability many organizations experience; it goes something like this:

  1. Breach
  2. Response
  3. New SOC or SOC rebuild
  4. Delivery challenges
  5. Maturity program
  6. Difficulty articulating ROI
  7. Cost reductions
  8. Outsourcing
  9. Breach
  10. Repeat

With a virtual analyst, your SOC can now leap to CMMI level 5 for what was traditionally a human-only task. An AI-based virtual analyst, like the Respond Analyst, conducts deep analysis in a consistent fashion and learns rationally from experience. This approach provides effective monitoring in real time and puts EVERY SINGLE security-relevant event under scrutiny. Not only that, you liberate your people from rigorous process control, and allow them to hunt for novel or persistent attackers using their creativity and curiosity.

This will tip the balance towards the defender and we need all the help we can get!

When Currency is Time, Spend it Threat Hunting

“Time is what we want most, but what we use worst.”
– William Penn

How many valuable cybersecurity tasks have you put aside due to the pressures of time? Time is currency and we spend it every moment we’re protecting our enterprises.

When we are constantly tuning, supporting and maintaining our security controls or chasing down an alert from an MSSP, only to discover it’s yet another false positive, we spend precious currency. When we create new correlation logic in our SIEM or decide which signatures to tune down to lower the volume of events to make it more manageable for our security team, we spend precious currency. When we analyze events from a SIEM to determine if they’re malicious and actionable or if a SIEM rule needs additional refinement, we spend precious currency. When we hire and train new analysts to cover churn, then watch them leave for a new opportunity – we waste currency and the investment hurts.

You can spend your “currency” doing pretty much anything, which is a blessing and a curse. We can (and do) waste an inordinate amount of time going down rabbit holes chasing false positives. We are forced to make choices: do we push back a request while we investigate the MSSP escalations or do we delay an investigation to provide the service agility the enterprise requires?

Both options are important, and both need addressing, forcing us to make a choice. In our gut we think the escalation is another false positive, but as cybersecurity professionals, we wait for the sword of Damocles to fall. It’s only a matter of time before one of these escalations relates to the thing we worry about most in our environments. Either way, something gets delayed... hopefully just lunch.

Basing decisions on what we can neglect is reactive and unsustainable. It’s a matter of time until we choose to postpone the wrong thing.

We need to use our time more wisely.

Organizations need to spend precious “currency” on higher-value tasks, like threat hunting, that motivate their talent and provide value to the organization. But they also need to keep two hands on the wheel for the lower-value tasks that still need attention.

Organizations should implement automation tools to handle the lower-value, repetitive tasks, such as high-volume network security monitoring. Generating and receiving alerts from your security controls is easy; making sense of them and determining whether they’re malicious and actionable is a different story. The decision to escalate events is typically inconsistent and relies heavily on the analyst making it. Factor in the amount of time required to gather supporting evidence before each decision, then imagine doing this an additional 75 times an hour. As a defender, you don’t have enough “currency of time” to make consistent, highly accurate decisions. Tasking security analysts with monitoring high-noise, low-signal event feeds is a misallocation of time that leads only to a lack of job satisfaction and burnout.
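To make the consistency point concrete, here is a minimal sketch of deterministic escalation logic. The evidence keys, weights, and threshold are all hypothetical, and real systems (including the Respond Analyst) are far more sophisticated; the sketch only illustrates that automated logic applies the same judgment to the 75th alert of the hour as to the first:

```python
# A minimal sketch of deterministic alert triage.
# Assumption: each alert is a dict of boolean evidence flags; the
# weights and threshold below are illustrative, not a real product's logic.
ESCALATION_THRESHOLD = 3

EVIDENCE_WEIGHTS = {
    "known_bad_destination": 2,   # outbound traffic to a known-bad host
    "critical_asset": 2,          # alert involves a business-critical system
    "repeated_signature": 1,      # same signature fired more than once
}

def should_escalate(alert: dict) -> bool:
    """Same evidence in, same decision out, no matter how long the shift."""
    score = sum(w for key, w in EVIDENCE_WEIGHTS.items() if alert.get(key))
    return score >= ESCALATION_THRESHOLD

noisy = {"repeated_signature": True}
serious = {"known_bad_destination": True, "critical_asset": True}
print(should_escalate(noisy))    # False: a lone repeat is not enough
print(should_escalate(serious))  # True: two strong indicators together
```

Whatever the scoring scheme, encoding it once removes the analyst-to-analyst variance that makes escalation decisions inconsistent.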

There is another way.

Employing the Respond Analyst is like adding a virtual team of expert, superhuman analysts, and it allows your team to bring their talent and expertise to threat hunting. Adding the Respond Analyst lets your people focus on higher-value tasks and more engaging work, so you can combat analyst burnout, training drains, and churn.
