Making the Implicit, Explicit

Recently, I watched a Facebook Research interview with Dr. Daphne Koller in which she described the communication challenges that subject matter experts and data scientists face. She went on to describe how becoming ‘bilingual’ and open-minded can help in overcoming these challenges. As one of the subject matter experts at Respond Software, I can attest that these challenges are real, and I often find myself thinking about how to bridge that gap.

Historically, in SOC monitoring, we’ve collected sensor data, correlated it deductively, added context, and sent a console alert suggesting a possible conclusion. We left it up to the analyst to investigate further and decide whether the evidence supported further action; 99.999% of the time, it didn’t. The whole process of fusing information from various sources and reasoning to a conclusion happened organically, inside the security analyst’s mind, without our ever knowing exactly why we leaned toward a particular conclusion. Most importantly, we did this on our own, with very little outside help.

Now, all of a sudden, we are working with data scientists, and working alone is no longer possible. We need to be able to identify what we are doing and describe our thought process in detail (who, what, when, where, why and how), so it can later be replicated in probabilistic reasoning models. Simply put, I never thought I’d spend so much time thinking about how I think, or trying to communicate it at such a granular level.

This is the future of Security, and it isn’t as scary or difficult as it sounds.  So, to help you along this journey, here are three main concepts that have helped me.

Understand the nuance between evidence and events

As security analysts, we are often presented with evidence from various sources. These sensors typically report what they ‘saw’ happen. If the evidence is taken as absolute fact, then every aspect of the observed evidence is considered true. However, that is not always the case, because sensors have their own limitations that constrain their competence and credibility in reporting certain attributes of an event. Consequently, the sensor is really providing evidence that an event took place, with attributes that may require additional verification.

Take intrusion detection sensors as an example. There are several implicit, common issues that security analysts know to work through. The first is verifying that the reported signature is not a common false positive; this is a credibility check. The second is when the reported source and destination IP addresses are masked by gateways or other applications. In other words, the IDS reports the IP addresses from its perspective, but they are not necessarily the true addresses.

In both cases, we as analysts understand these limitations and resolve them during our analysis without communicating them.  When working with data scientists, we must be aware of these nuances in order to accurately reflect our confidence in the data and thus model our line of reasoning.
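To make that distinction concrete, here is a minimal sketch (the field names and credibility values are hypothetical, not how any particular product represents alerts) of how reported evidence could carry per-attribute credibility instead of being taken as absolute fact:

```python
from dataclasses import dataclass

@dataclass
class ReportedAttribute:
    value: str
    credibility: float  # 0..1: how much we trust the sensor on this attribute
    note: str = ""

@dataclass
class IDSEvidence:
    signature: ReportedAttribute
    src_ip: ReportedAttribute
    dst_ip: ReportedAttribute

# One hypothetical alert: the event almost certainly happened, but some
# attributes deserve more scrutiny than others.
alert = IDSEvidence(
    signature=ReportedAttribute("ET TROJAN Possible Beacon", 0.6,
                                "signature is a known source of false positives"),
    src_ip=ReportedAttribute("10.0.0.12", 0.9),
    dst_ip=ReportedAttribute("203.0.113.7", 0.5,
                             "address may be a gateway, not the true endpoint"),
)

# Flag the attributes an analyst would want to verify before drawing conclusions.
for name, attr in vars(alert).items():
    if attr.credibility < 0.8:
        print(f"verify {name}: {attr.value} ({attr.note})")
```

The point is not the particular numbers; it is that the verification steps we normally perform in our heads become explicit fields a data scientist can reason about.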

How we reason when combining evidence

When a security analyst is presented with pieces of evidence, they can quickly pull the pieces together and formulate a hypothesis. This happens fast in our minds, can be difficult to communicate or elaborate on, and is highly subject to cognitive bias. I often try to step back and look at the individual pieces of evidence to see how they affect my perception when combined.

For instance, let us take two pieces of evidence:

1. A binary hash check against the National Software Reference Library (NSRL) identifies the file as known-good.

2. A threat intelligence service reports that the same binary hash may be malware.

To a security analyst, these two pieces of evidence immediately contradict each other. Clearly the NSRL evidence carries more weight because it is a more credible source. But we must press further: why is it more credible? This question is much harder for us to answer, but we can establish supporting evidence based on our expert judgement:

1. Although hash collisions can occur, they are difficult to engineer in practice.

2. Threat intelligence services are prone to errors.

In this case, we can say that our expert judgement enhances the credibility of the NSRL while reducing the credibility of the threat intelligence provider. Although it takes practice, identifying these semi-unconscious judgements really helps communicate how we reason toward an analytical conclusion.
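As a rough illustration (the weights and likelihood values below are hypothetical, chosen only to show the shape of the reasoning, not a production model), one way to express this judgement is to combine the evidence in log-odds space, scaling each source's contribution by how much we trust it:

```python
import math

# Prior belief that the binary is malicious, before considering any evidence.
prior = 0.05

# Each piece of evidence: how strongly it points toward "malicious"
# (log-likelihood ratio) and how credible we judge the source to be (0..1).
evidence = [
    {"source": "NSRL known-good hash match", "llr": -4.0, "credibility": 0.95},
    {"source": "Threat intel hash report",   "llr": +2.0, "credibility": 0.50},
]

# Start from the prior in log-odds, then add each credibility-weighted contribution.
log_odds = math.log(prior / (1 - prior))
for e in evidence:
    log_odds += e["credibility"] * e["llr"]

posterior = 1 / (1 + math.exp(-log_odds))
print(f"Posterior probability the binary is malicious: {posterior:.3f}")
```

With the NSRL match weighted heavily and the threat intelligence report discounted, the posterior stays low. Choosing those credibility values is exactly the kind of expert judgement we need to surface for the data scientists.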

Patience and Respect

One thing is for certain: data scientists and security folks do not think alike. We don’t share a vocabulary, and the way we tackle issues and even the way we communicate are drastically different. This was evident from the moment I walked into the office. That’s okay; actually, I wouldn’t want it any other way.

First, I’ll echo Dr. Koller’s advice, the same advice that is often echoed in our office: don’t be afraid to ask questions; and the corollary, don’t chastise your colleagues when they ask seemingly stupid questions.

The challenge between SMEs and data scientists arises because we know so little about each other’s expertise. Meanwhile, digital communication is cumbersome and easily misunderstood, especially under stressful deadlines. Taking the time not only to listen to a question but to understand its meaning and underlying reason is tremendously important for both parties. I like to think that for every question asked, both parties have something to learn. The automation of human expert reasoning is central to the future of autonomous security, and capturing it rationally and clearly is of paramount importance.