The Science of Detection Part 1: 5 Essential Elements of a Data Source

I’m passionate about the science of detection. It used to be a black art, like long-distance high-frequency radio communication, but with modern cybersecurity technology, we’re putting the science back in. With that in mind, I plan to write a series of blogs about the science of detection, with the aim of enabling more effective and rapid identification of “maliciousness” in our enterprise technology.

In today’s blog, we’ll look at the key elements a data source needs for effective detection. Be sure to check back in the coming weeks to see the next blogs in the series. In parts two and three, I’ll take an in-depth look at how integrated reasoning will fundamentally change detection technology, and at the signal quality of detectors such as signatures, anomalies, behaviors and logs.

In operational security, we monitor various pieces of technology in our network for hackers and malware. We look at logs and specialized security sensors, and we use context and intelligence to try to separate the “bad guys” from the noise. Often, much of this work is “best effort,” since we’re collecting data from other technology teams who only use it to troubleshoot performance, capacity and availability issues, not security. It can be a challenge to configure these sources for the specific needs of security, and that greatly limits our success rate.

When we look at the data sources, or telemetries, that we monitor, five elements determine how effective they are at detecting malicious activity.

1. Visibility

Visibility is one of the most important elements of a data source. What can you see? Is this a network sensor? Are you decrypting traffic so that you can see the patterns or signatures of an attack? Or is this a system log source where stealthy user or administrator behaviors can be captured? When you’re considering visibility, two things are key: the location of the sensor and the tuning of the events, alerts or logs it generates. For signature-based data sources, it’s tremendously important to keep them consistently up to date and tuned for maximum signal.
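To make “consistently up to date” concrete, here’s a minimal sketch in Python of a freshness audit you might run against a signature inventory. The set names, dates and 14-day threshold are illustrative assumptions, not vendor guidance:

```python
from datetime import date, timedelta

# Hypothetical inventory: signature set name -> date of last rule update.
# In practice this would come from your IDS management console or rule repo.
signature_sets = {
    "emerging-threats-open": date(2024, 1, 8),
    "vendor-balanced": date(2023, 11, 2),
    "custom-internal": date(2023, 6, 15),
}

MAX_AGE = timedelta(days=14)  # illustrative freshness threshold

def stale_sets(inventory, today, max_age=MAX_AGE):
    """Return the signature sets that have not been updated recently."""
    return [name for name, updated in inventory.items()
            if today - updated > max_age]

for name in stale_sets(signature_sets, date(2024, 1, 10)):
    print(f"WARNING: {name} is stale; schedule an update and re-tune.")
```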

2. Signal Quality

We look at signal quality to help determine the likelihood that any given signature or alarm reliably indicates the presence of malicious activity. When you consider network intrusion detection and prevention sensors, things get really complicated. I have seen the same IDS signature fire on two different hosts in the same day, where one instance was a false positive and the other was malicious. How are we supposed to separate those two? Not without deep analysis that considers many additional factors.
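To illustrate the kind of deep analysis that can separate those two instances, here’s a minimal Python sketch that scores a single signature firing against contextual factors. The field names, weights and 0-to-1 scale are illustrative assumptions, not a real scoring model:

```python
def score_alert(alert):
    """Combine a signature's base fidelity with situational context."""
    score = alert["base_fidelity"]            # historical true-positive rate
    if alert["direction"] == "inbound_from_internet":
        score += 0.2                          # internet exposure raises concern
    if alert["asset_criticality"] == "high":
        score += 0.2                          # crown-jewel target
    if alert["dest_port_open"]:
        score += 0.3                          # the attack could actually land
    return min(score, 1.0)

# The same signature, two very different verdicts:
fp = {"base_fidelity": 0.1, "direction": "internal",
      "asset_criticality": "low", "dest_port_open": False}
tp = {"base_fidelity": 0.1, "direction": "inbound_from_internet",
      "asset_criticality": "high", "dest_port_open": True}

print(round(score_alert(fp), 2), round(score_alert(tp), 2))  # 0.1 0.8
```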

3. Architecture

With the advent of autonomous analysis and integrated reasoning, the architecture of your sensor grid can provide significant advantages. The most important is sensor overlap: deploying different types of sensors in the infrastructure so that attackers must get past more than one detection point.

A good example is host-based endpoint protection agents in a user environment. If users must also transit a network intrusion detection sensor, and perhaps a URL filter, to conduct business on the internet, you end up with three perspectives and three chances to recognize systems that are behaving maliciously. That’s why it’s important to deploy internal (east-west) network sensors that corroborate your other sensing platforms, reducing false positives and producing high-fidelity incidents. You can fool me once, but fooling me twice or a third time gets much harder.
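As a sketch of what that corroboration can look like, the Python below flags hosts that multiple independent sensor types have alerted on within a short time window. The sensor names, hostnames and 30-minute window are illustrative assumptions:

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # illustrative corroboration window

alerts = [
    {"host": "wkstn-042", "sensor": "endpoint",   "time": datetime(2024, 1, 10, 9, 0)},
    {"host": "wkstn-042", "sensor": "nids",       "time": datetime(2024, 1, 10, 9, 10)},
    {"host": "wkstn-042", "sensor": "url_filter", "time": datetime(2024, 1, 10, 9, 12)},
    {"host": "wkstn-107", "sensor": "nids",       "time": datetime(2024, 1, 10, 9, 5)},
]

def corroborated_hosts(alerts, window=WINDOW, min_sensors=2):
    """Hosts seen by at least `min_sensors` distinct sensor types in a window."""
    by_host = defaultdict(list)
    for a in alerts:
        by_host[a["host"]].append(a)
    hits = []
    for host, items in by_host.items():
        items.sort(key=lambda a: a["time"])
        for i, first in enumerate(items):
            sensors = {a["sensor"] for a in items[i:]
                       if a["time"] - first["time"] <= window}
            if len(sensors) >= min_sensors:
                hits.append(host)
                break
    return hits

print(corroborated_hosts(alerts))  # ['wkstn-042']: three perspectives agree
```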

4. Data Management

All of our sensors should be easy to aggregate into a single data platform using common log transport methods. This can be a SIEM or a big data platform. It’s equally important to capture all of the fields that help us contextualize and understand the alerts we observe. This data platform becomes the incident response and forensic repository for root cause analysis, and it makes a good hunting ground for a hunt team.
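As a simple illustration, the Python sketch below normalizes raw events from two hypothetical sensor formats into one common schema before loading them into the platform. The raw field names are assumptions; the point is to preserve contextual fields, such as hostname, during normalization:

```python
# Map each sensor's raw format onto one shared schema.

def normalize_nids(raw):
    return {
        "timestamp": raw["ts"],
        "source_ip": raw["src"],
        "dest_ip": raw["dst"],
        "hostname": raw.get("host"),     # keep context fields when present
        "sensor": "nids",
        "signature": raw["sig_name"],
    }

def normalize_proxy(raw):
    return {
        "timestamp": raw["time"],
        "source_ip": raw["client_ip"],
        "dest_ip": raw["server_ip"],
        "hostname": raw.get("client_name"),
        "sensor": "proxy",
        "signature": raw["category"],
    }

events = [
    normalize_nids({"ts": "2024-01-10T09:10:00Z", "src": "10.1.2.3",
                    "dst": "203.0.113.9", "sig_name": "ET TROJAN Beacon"}),
    normalize_proxy({"time": "2024-01-10T09:12:00Z", "client_ip": "10.1.2.3",
                     "server_ip": "203.0.113.9", "category": "malware-c2"}),
]
# `events` now shares one schema and can be indexed, joined and hunted over.
```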

5. Event Alignment

Given the complex nature of the modern enterprise, it’s possible for a user’s laptop to have 10 or 15 different IP addresses on any given day. We need to be able to reassemble that information to find the host that’s infected. A good example is collecting the hostname, where it’s available, rather than just the IP address. Proxies, firewalls and NAT devices can all effectively blind you when you’re looking for malicious internal hosts. In fact, one Security Operations Center I built could not locate 50% of known compromised assets due to a combination of network design and geography.
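As a minimal sketch of event alignment, the Python below resolves an IP address back to a hostname using hypothetical DHCP lease intervals, showing how the same IP can map to two different machines on the same day:

```python
from datetime import datetime

# Hypothetical lease table: (ip, hostname, lease_start, lease_end).
leases = [
    ("10.1.2.3", "wkstn-042", datetime(2024, 1, 10, 8, 0), datetime(2024, 1, 10, 12, 0)),
    ("10.1.2.3", "wkstn-107", datetime(2024, 1, 10, 12, 0), datetime(2024, 1, 10, 18, 0)),
]

def ip_to_host(ip, at, leases=leases):
    """Return the hostname holding `ip` at time `at`, or None if unknown."""
    for lease_ip, host, start, end in leases:
        if lease_ip == ip and start <= at < end:
            return host
    return None

# The same IP, two different machines on the same day:
print(ip_to_host("10.1.2.3", datetime(2024, 1, 10, 9, 30)))  # wkstn-042
print(ip_to_host("10.1.2.3", datetime(2024, 1, 10, 14, 0)))  # wkstn-107
```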

A combination of perspectives provides the most effective sensor grid. Leveraging multiple forms of visibility, improving the signal quality of your sources, architecting for sensor overlap and key detection chokepoints, and streaming all of this data into an effective data management platform where it can be analyzed and leveraged across the operational security lifecycle adds up to a far more effective security operations capability.

How the Respond Analyst Can Help You

The Respond Analyst understands these telemetries and contextual data sources and considers all of these factors in real time. This frees you from monitoring a console of alerts, allowing you to focus on higher-value work. It also frees your detection programs from the volume limitations of human monitoring. Putting all of these elements together provides a massive improvement in your ability to detect intruders before they can do major damage to your enterprise technology. We’re putting machines in front of alerts so that humans can focus on situations.