This entry continues a blog series designed to offer experience-based best practices for approaching SOC Automation. Missed part one of this article? You can read it here.
In part two of this article, we will spend a little time on perspectives before we start building use cases.
In our experience, most organizations first approach security orchestration and automation from one of these perspectives:
- Event Triaging
- Investigation Support
- Threat Intelligence Assessment
Allow me to explain each in more detail.
Event Triaging
Over the course of my career, in roles ranging from Director of Security Operations to Chief Information Security Officer (CISO), it became laughably predictable that something would raise a critical alert on my first day. It might have been a critical vulnerability, an escalated intrusion alert, or a 1,000-item list of suspicious IP addresses received from a “trusted source.” And guess what? The text or phone call notifying me of this critical alert typically came in at three o’clock in the morning.
So what was my first response? “Do we trust this data? Have we validated the source?” This scenario, or one similar to it, is commonly the genesis for the event triaging perspective.
In the past, team members would be woken up and would begin a frenzied process of manual investigation. Then they would copy and paste their findings into a document and ultimately attach it to an email. All of this effort was spent just to confirm whether we should take an intelligence artifact seriously. Now imagine assessing 1,000 IP addresses using this process. Yuck.
In today’s world, I would automatically create an incident from the critical alert and run the intelligence through my workflow to validate the alert. The workflow would enrich the incident with details from the investigation and then present the results to a human analyst for a decision. The whole process is faster than in the past because it happens before I ever learn of the incident. It’s also smarter, because I don’t get woken up in the middle of the night for false alarms.
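The event triaging flow described above can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not Phantom playbook code. `Incident`, `triage_alert`, and the `lookup_reputation` callable are names invented for this example. The in-memory `REPUTATION` table stands in for whatever threat intelligence source you actually query. The sketch normalizes and deduplicates the IP list, drops malformed entries, and enriches the incident with a verdict per address. It flags the incident for a human only when something is confirmed malicious.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Incident:
    """Hypothetical incident record created automatically from an alert."""
    source: str
    indicators: list[str]
    enrichment: dict[str, str] = field(default_factory=dict)
    needs_analyst: bool = False


def triage_alert(
    source: str,
    raw_ips: list[str],
    lookup_reputation: Callable[[str], Optional[str]],
) -> Incident:
    """Create an incident, validate and enrich each IP, then decide on escalation."""
    valid: list[str] = []
    seen: set[str] = set()
    for raw in raw_ips:
        try:
            # Normalize first so equivalent spellings collapse to one entry.
            ip = str(ipaddress.ip_address(raw.strip()))
        except ValueError:
            continue  # malformed entry: not worth an analyst's time
        if ip not in seen:
            seen.add(ip)
            valid.append(ip)
    incident = Incident(source=source, indicators=valid)
    for ip in valid:
        # The lookup returns a verdict string, or None when it has no data.
        incident.enrichment[ip] = lookup_reputation(ip) or "unknown"
    # Only wake a human when at least one indicator is confirmed malicious.
    incident.needs_analyst = "malicious" in incident.enrichment.values()
    return incident


# Hypothetical stand-in for a real threat intelligence lookup.
REPUTATION = {"203.0.113.7": "malicious", "198.51.100.2": "benign"}

incident = triage_alert(
    "trusted-source-feed",
    ["203.0.113.7", "198.51.100.2", "203.0.113.7", "not-an-ip"],
    REPUTATION.get,
)
print(incident.indicators)     # ['203.0.113.7', '198.51.100.2']
print(incident.needs_analyst)  # True
```

The analyst still makes the final call. The automation only decides whether the alert is worth waking someone up for.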
Investigation Support
Investigation support largely mirrors the event triaging perspective above, with some slight variations. Extend the triage scenario above to any member of the SOC who is tasked with investigating an Indicator of Compromise (IOC). That person must determine the level of risk using vulnerability data, log data, network traffic, user activity, and possibly other sources of telemetry. They must then decide: “Should this event be critical? Should I launch an incident?” Typical users with this perspective are incident responders and Tier-3 analysts.
Threat Intelligence Assessment
The threat intelligence team is often the catalyst for launching a security orchestration and automation platform. The members of this team are usually overwhelmed by the volume of data being pushed at them. They need to store, correlate, validate, and report on serious, business-impacting events. It can take an analyst an average of an hour to perform the initial assessment and reporting on a single threat intelligence alert. Security leadership (e.g., the CISO, Director of Security Operations, or CIO) relies on this team to provide trusted and validated escalations. Missing one critical alert, or escalating the wrong ones, doesn’t end well. As we are seeing, boards and leadership will fire the security leader who slips up (http://blogs.wsj.com/cio/2017/01/25/compliance-failures-breaches-top-fireable-it-issues-survey/). As I often say, “In security operations, you are only as good as your last incident.”
Developing Use Cases
We have seen users of the Phantom Platform start their automation and orchestration journey from each of the above perspectives. The good news is that all of them lead to success. Whichever one you start with, however, don’t think outside your initial box (i.e., scope) during this iteration of the exercise.
So far, you have focused on:
- The level of automation you are starting with
- The focus group
(And you have gathered your smart people together.)
Now you can start smart planning, and that means developing use cases.
Like all good business planning, the process starts with brainstorming to identify the problems. Business problems. I’m not talking about technology, people, or process. I want you to brainstorm about the business risks in security operations. That way, you can be sure you are driving relevant security practices that align with the business. It also means that when you go for funding, you will get “nodding heads” from the business leaders.
I use mind maps to help sort out my thoughts. Notice that I have tried to limit the use of security terms, so really I should place PUPs (potentially unwanted programs) under Helping Protect Users. OK, let’s focus on Helping Protect Users, since that should be one of your core principles.
Now think about the current steps that you use to respond to this use case.
There should be two strategies in your SOC:
- Proactive Defense
- Reactive Response
Most SOC teams suffer from a split of roughly 80% reactive response and 20% proactive defense. Don’t be disheartened: you’re already providing proactive defense. Think of virus definitions, patching, monitoring and alerting, the SIEM console, log monitoring, etc. Still, most SOC teams are not short of work when it comes to reacting to alerts. This is not a bad thing, since security operations is about preparing for and responding to the unanticipated.
The trouble is that for many security teams, the same steps are performed manually for each and every incident, 7 days a week, 24 hours a day. A typical workflow for an incoming event:
- Notice it on one of your consoles
- Assess the source of the event
- Create a ticket in the security operations ticketing system
- Validate the user and/or the device
- Determine if the event is real (not a false positive, and not already blocked by the security systems)
- Determine if the infrastructure is vulnerable by checking IPS, firewalls, endpoints, the vulnerability CMDB, etc.
- Find other IOCs
- Ensure that those indicators of compromise don’t exist anywhere else in the infrastructure
- Worry about those user systems that are offline
- Decide whether to put a rule in place to detect suspicious activity
- Remember to remove the alert x days/hours later
- Update ticket
This is not a complete list, and the actual steps will vary from organization to organization, but the principles should be clear and should resonate with you. Also note that this event didn’t turn into a real incident and was never escalated.
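Some of the steps above are good candidates for automation. These include finding other IOCs, making sure they don’t exist anywhere else in the infrastructure, and worrying about offline systems. Below is a minimal, hypothetical Python sketch of that sweep. `sweep_iocs` and the host-to-indicators mapping are assumptions made for illustration. They stand in for whatever your endpoint, firewall, or log tools actually report. Hosts that have not reported are listed separately, so they are not silently treated as clean.

```python
from typing import Optional


def sweep_iocs(
    iocs: set[str],
    observed: dict[str, Optional[set[str]]],
) -> tuple[dict[str, set[str]], list[str]]:
    """Return hosts where any IOC was seen, plus hosts that could not be checked.

    `observed` maps a host name to the indicators it reported, or to None
    when the host is offline and has not reported yet.
    """
    hits: dict[str, set[str]] = {}
    offline: list[str] = []
    for host, indicators in observed.items():
        if indicators is None:
            offline.append(host)  # revisit when the system comes back online
            continue
        matched = iocs & indicators  # set intersection: IOCs seen on this host
        if matched:
            hits[host] = matched
    return hits, offline


hits, offline = sweep_iocs(
    {"evil.example", "203.0.113.7"},
    {
        "laptop-01": {"203.0.113.7", "intranet.example"},
        "laptop-02": {"intranet.example"},
        "laptop-03": None,
    },
)
print(hits)     # {'laptop-01': {'203.0.113.7'}}
print(offline)  # ['laptop-03']
```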
Now I want you to ask yourself (or your team) these questions, and answer them honestly:
- Do you do this often?
- How often?
- How long does it take?
- How many do you miss or de-prioritize?
- How many times has it re-appeared?
Is this the most common task that is repeated time and time again, every day?
Now, here is a surprise. When you read this, or socialize it in the brainstorm, it might not resonate. Hopefully, it will at least generate conversation, since that is exactly what it is designed to do. I want your team of Smart People to come up with the common tasks used to Protect Users.
Now, for the purpose of this exercise, I am going to assume that the use case I articulated above resonates, and that you decide it is one you should write up for consideration.
Yes, I’m asking you to write something before launching the project. I know that we live in a world of instant gratification, but if we are honest, we know that the best results start with a good foundation. In this case, the foundation is the use case.
So what is a use case? What are the sections? I’m glad you asked. Here is my security version of a use case definition document:
- Name – A short name, but one that means something to the reader immediately. For example, “Processing an Event from Our SIEM.” You shouldn’t have to open the whole document to understand what it covers. No codes or reference IDs. If you really insist, you can add those somewhere else.
- Short Description – A little more text explaining the process, its inputs, and its results, at a high level.
- Actors – Who is involved in this process? In this case, perhaps the event source, a user, a device, and the security analyst. Notice that I’m not trying to describe the whole process of responding to an alert (triage, assessment, alert, response, measure, close). I don’t want to make something too big to build. Remember that with our platform you can connect playbooks together. Starting with smaller, discrete scenarios delivers results faster and provides additional agility.
- Triggers – This should describe how this particular use case is triggered, at a high level rather than a technology level. In our case, it could be a ticket assigned to the security operations queue.
- Preconditions – These are the criteria that scope the use case. For example, it could be limited to endpoint device response. You might want a different response strategy for servers, since risk assessments and actions there generally have a higher impact on the organization as a whole.
- Basic Flow – This is your traditional workflow diagram showing the steps at a high level. It helps the reader understand how data, decisions, and actions flow, in a model that shows the timing and order of the steps.
- Business Risk – This articulates the benefits of optimizing the process. It should be a couple of sentences. If you are showing it to management, include numbers. They like numbers, but make sure the numbers are relevant and real. They will call you on them.
- Use Case Risk – Articulate the risks you anticipate in automating this use case.
- Mitigating Controls – So you have thought through the risks, and you can’t ignore them. How are you going to handle them? Manual intervention? Escalation? Timing SLAs?
- Assumptions – You are going to assume that systems, people, and data will be available. Document those assumptions.
- Manual Process Time Measurement – OK, more numbers. Estimate how long this takes (or would take) to do manually. How many times a day or a week?
- Results – Document your expected results.
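If your team wants to keep use cases consistent and easy to check, the sections above map naturally onto a simple record. This is a hypothetical Python sketch, not a Phantom feature. The `UseCase` class and its `missing_sections` check reflect the advice that no section should be left blank. The `manual_hours_per_week` helper shows one way to fill in the Manual Process Time Measurement section. All of these are illustrative assumptions, and the example numbers are placeholders rather than real measurements.

```python
from dataclasses import dataclass, fields


@dataclass
class UseCase:
    """One field per section of the use case definition document."""
    name: str
    short_description: str
    actors: str
    triggers: str
    preconditions: str
    basic_flow: str
    business_risk: str
    use_case_risk: str
    mitigating_controls: str
    assumptions: str
    manual_process_time_measurement: str
    results: str

    def missing_sections(self) -> list[str]:
        """List sections left blank; ideally this is always empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


def manual_hours_per_week(
    minutes_per_event: float, events_per_day: float, days: int = 7
) -> float:
    """Estimate weekly analyst hours spent handling one use case manually."""
    return minutes_per_event * events_per_day * days / 60


# Placeholder draft: every section "TBD" except an unwritten Results section.
draft = UseCase(**{f.name: "TBD" for f in fields(UseCase)})
draft.name = "Processing an Event from Our SIEM"
draft.results = ""  # not yet written
print(draft.missing_sections())       # ['results']
print(manual_hours_per_week(20, 15))  # 35.0
```

With made-up figures of 20 minutes per event and 15 events a day, the helper reports 35 analyst hours a week. Numbers like this are exactly what management will want to see in the Business Risk section.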
So why build a use case? It makes the team think things through, which is why it is important to cover the whole process. Don’t skip sections. I can’t think of a scenario where a section would be blank.
This is a team effort. It should be built by at least two people, because a use case developed by a single person might miss a risk or an opportunity.
Some team members may say, “I hate writing.” OK, I understand; different people have different strengths. So instead of text, build a diagram, a mind map, a spreadsheet, or anything else that captures the details. But I would ask one thing: keep it consistent. Reading multiple variations of a use case format can be challenging, especially when you are tasked with translating a use case into reality.
Everybody uses different tools, processes, and prioritization methods for the same task. OK, you know this is wrong. Combine everybody’s approaches. Watch out for duplication, but leverage your team’s knowledge and experience. At the end of it, there will be a consistent response model for the same situation across your organization.
So now you understand the process of building a single use case. It is invaluable for starting a security automation and orchestration project successfully. It appears to be a lot of work, but don’t “try to boil the ocean.” Start with the obvious use cases: the ones that are repetitive, take time, and generate the same response the majority of the time.
To give you an idea of the amount of time it takes:
- 2 – 3 hours brainstorming
- 1 hour to list the top ideas
- 1 hour to pick the first two to look at automating
- 2 hours to document the use case
- 1 hour to review and agree to automate
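Adding up the ranges above gives a rough total for getting from brainstorm to an agreed first use case. This is a trivial Python sketch of the arithmetic, using only the figures listed:

```python
# (low, high) hour estimates for each activity listed above.
activities = {
    "brainstorming": (2, 3),
    "list the top ideas": (1, 1),
    "pick the first two to automate": (1, 1),
    "document the use case": (2, 2),
    "review and agree to automate": (1, 1),
}

low = sum(lo for lo, _ in activities.values())
high = sum(hi for _, hi in activities.values())
print(f"{low}-{high} hours")  # 7-8 hours
```

In other words, roughly one working day of team time.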