Advanced Link Analysis: Part 1 - Solving the Challenge of Information Density

Link Analysis is a data analysis approach used to discover relationships and connections between data elements and entities. It is a highly visual and interactive technique that can be performed in the Splunk platform, and it is almost always driven by a person (an analyst or investigator) who needs to understand the data and discover insights specific to the business problem at hand.

The Use Cases for Link Analysis

Link Analysis is frequently used in cybersecurity, fraud analytics, criminal investigations, finance and other areas where it is essential to discover hidden anomalies, unusual connections, suspicious relationships and other insights that matter to the business. The challenge with any interactive data investigation technique or visualization is that the amount of data to be analyzed is growing exponentially, yet the amount of screen real estate remains constant.

To illustrate this problem, here are some examples of poor link analysis implementations:

Examples of poor implementation of Link Analysis
The problem with these layouts is that once you squeeze in more than 100-200 data points and all the connections between them, the view looks like a spider web. This makes it difficult, if not impossible, to understand and analyze.

It’s popular, almost standard practice, to represent each data element as some sort of 2D geometric shape. This method, however, comes with an inherent problem: the shapes must be organized into layouts that are not only easy to understand but also easy to interact with and pivot on.

The 2D shapes (whether circles, rectangles, etc.) occupy too much screen space, and yet important pieces of associated information, such as name, IP address, device identity, URL or email, don’t fit inside them. This ultimately creates a usability struggle for business users, who need to untangle this visual mess.

The Ideal Link Analysis Visualization

The “ideal” link analysis visualization should let business users quickly uncover the most interesting data points, elements, entities and relationships. It should also clearly show velocities, anomalies, densities and connections. Additionally, it needs to be highly interactive, with clear ways to prioritize the data and pivot on any data element into different directions of investigation.

One of our partners, SigBay, released the Link Analysis Visualization app on Splunkbase, which overcomes many of the deficiencies and limitations of typical link analysis tools.

Sigbay Link Analysis
I’ve worked with the SigBay link analysis visualization tool for quite a while and found it to be the most flexible tool for organizing large amounts of information on the screen while keeping complex investigations intuitive and interactive. Instead of throwing a bunch of shapes and circles with connections in all directions onto the screen, SigBay organizes data in columns, where each column represents a specific type of data node (like a column in a table). Everything comes with sensible defaults that prioritize one task: exposing the most interesting data points to the investigator with minimal effort.

Each column provides about a dozen different ways to sort the data: either by custom aggregate functions (like number of events, bandwidth consumed or user accounts accessed) or by the density of connections to nodes in the column to the left or right. SigBay Link Analysis also lets investigators change the order of columns to discover specific relationships.
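To make "density of connections" concrete, here is a minimal Python sketch. It is not SigBay's implementation; it assumes events are plain dicts, that the on-screen column order is a simple list, and that a node's density is the number of distinct nodes it co-occurs with in the adjacent column on the chosen side. `COLUMNS`, `density` and `sort_column` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical left-to-right column order on screen.
COLUMNS = ["country", "src_ip", "user"]

# Hypothetical sample events; each dict is one event.
EVENTS = [
    {"country": "US", "src_ip": "10.0.0.1", "user": "alice"},
    {"country": "US", "src_ip": "10.0.0.1", "user": "bob"},
    {"country": "DE", "src_ip": "10.0.0.2", "user": "alice"},
    {"country": "US", "src_ip": "10.0.0.3", "user": "admin"},
    {"country": "US", "src_ip": "10.0.0.3", "user": "root"},
    {"country": "US", "src_ip": "10.0.0.3", "user": "alice"},
]


def density(events, columns, column, side):
    """Map each node of `column` to the number of distinct nodes it links to
    in the adjacent column on `side` ("left" or "right")."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    idx = columns.index(column)
    neighbor_idx = idx - 1 if side == "left" else idx + 1
    if not 0 <= neighbor_idx < len(columns):
        raise ValueError(f"column {column!r} has no {side} neighbor")
    neighbor = columns[neighbor_idx]
    links = defaultdict(set)
    for event in events:
        links[event[column]].add(event[neighbor])
    return {node: len(linked) for node, linked in links.items()}


def sort_column(events, columns, column, side):
    """Return the nodes of `column`, most densely connected first
    (ties broken alphabetically)."""
    counts = density(events, columns, column, side)
    return sorted(counts, key=lambda node: (-counts[node], node))
```

With this sample data, sorting the `user` column by its left-hand density puts `alice` first, because she was reached from three different IP addresses.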

SigBay Link Analysis includes flexible dynamic searching that really helps during an investigation. For example, say you want to find all attempts to log in to administrator accounts from specific countries. You can just click on a column and start typing “admin”. The visualization searches for matching account names and returns all results. If a specific country is not shown, you search for it the same way and click the [Select All Nodes] button to select the results. Then you can click on any selected node to see the requested result.
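The search-and-select flow described above can be approximated as a case-insensitive substring match over one column, followed by keeping only events that match every selection. This is an illustrative sketch with made-up sample events and hypothetical helper names (`search_column`, `filter_by_selection`); it does not model the app's regular-expression or other search modes.

```python
# Hypothetical sample login events.
EVENTS = [
    {"user": "admin", "country": "RU", "action": "failure"},
    {"user": "Administrator", "country": "CN", "action": "failure"},
    {"user": "alice", "country": "RU", "action": "success"},
    {"user": "sysadmin", "country": "US", "action": "failure"},
    {"user": "admin", "country": "US", "action": "success"},
]


def search_column(events, column, text):
    """Return the distinct nodes of `column` whose value contains `text`,
    ignoring case (a stand-in for typing into a column's search box)."""
    needle = text.lower()
    return sorted({e[column] for e in events if needle in e[column].lower()})


def filter_by_selection(events, selections):
    """Keep events whose value in every selected column is one of the
    selected nodes (roughly, clicking a node after [Select All Nodes])."""
    return [
        e for e in events
        if all(e[column] in nodes for column, nodes in selections.items())
    ]


admins = search_column(EVENTS, "user", "admin")
countries = search_column(EVENTS, "country", "ru")
hits = filter_by_selection(EVENTS, {"user": set(admins), "country": set(countries)})
```

Here typing "admin" matches `admin`, `Administrator` and `sysadmin`, and combining that selection with the `RU` country leaves a single event.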

Most Splunk dashboard visualizations are driven by a specific SPL query. The dashboard developer creates a core, static SPL query to represent a specific data view. If the data needs to be shown differently, the legacy approach is to either create a new dashboard or modify the underlying SPL query.

When analysts perform an investigation, it is hard to predict where the data will take them. SigBay tackled this challenge from a very different angle. In fact, their visualization does not require any predefined SPL query at all. It only asks you to pick a data model name (it feeds off accelerated data models for fast queries) and the fields of interest.
For example, if we’re dealing with web traffic, those would be fields like IP address, country, username, status code, device ID, page URL and the like. You can also use tokens to customize the list of fields of interest and make the visualization more dynamic. In addition, the visualization shows helpful keyboard shortcuts for performing operations.

Link Analysis Config
One of the important advantages of SigBay Link Analysis is support for automatic calculation of many aggregate functions of your choosing.

For example, you may want to know:

  • The total number of events caused by a certain IP address.
  • How many unique usernames a certain IP address tried to log in to.
  • How much bandwidth was consumed by visitors from a certain country.

Instead of rewriting the visualization and rebuilding the dashboard for each of these use cases, the app automatically calculates such aggregates for every data point. In fact, the tool generates a custom SPL query against the accelerated data model every time a user clicks on any node.

This is how the aggregates above are represented by the Link Analysis visualization:

In this example, three aggregate functions are calculated for every single node:

  • “Events”: number of events in which the node is present
  • “BytesIn”: total sum of all bytes transferred in HTTP requests (an indicator of the incoming bandwidth consumed by the actor)
  • “Usernames”: total number of unique usernames touched by the node
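Computing these three aggregates for every node amounts to a group-by over each column. The sketch below does it in memory in Python, assuming each event carries hypothetical `user` and `bytes_in` fields; the post describes the app delegating this work to queries against the accelerated data model rather than computing it client-side, and `node_aggregates` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical sample web events.
EVENTS = [
    {"src_ip": "198.51.100.1", "country": "US", "user": "alice", "bytes_in": 500},
    {"src_ip": "198.51.100.1", "country": "US", "user": "bob", "bytes_in": 300},
    {"src_ip": "198.51.100.2", "country": "FR", "user": "alice", "bytes_in": 1200},
    {"src_ip": "198.51.100.1", "country": "US", "user": "alice", "bytes_in": 100},
]


def node_aggregates(events, columns):
    """Compute Events, BytesIn and Usernames for every node of every column."""
    result = {}
    for column in columns:
        # One running tally per node; usernames are kept as a set for dc().
        stats = defaultdict(lambda: {"Events": 0, "BytesIn": 0, "users": set()})
        for event in events:
            tally = stats[event[column]]
            tally["Events"] += 1
            tally["BytesIn"] += event["bytes_in"]
            tally["users"].add(event["user"])
        result[column] = {
            node: {"Events": t["Events"], "BytesIn": t["BytesIn"], "Usernames": len(t["users"])}
            for node, t in stats.items()
        }
    return result
```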

The value of each aggregate function is displayed above each node and is color-coded.

By default, each column is sorted independently in descending order by the result of the first aggregate function, in this case “Events”.

This way, without doing anything, an investigator immediately sees the most “active” IP addresses, countries, HTTP user agents, status codes, requests, pages and so on. By design, the most interesting and active entities “bubble” up to the top of each column.
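The default ordering is easy to express: each column is sorted on its own, with the highest value of the first aggregate on top. A minimal sketch, assuming the per-column aggregates are already available as nested dicts (the sample numbers are invented) and breaking ties alphabetically as an arbitrary choice; `sort_columns` is a hypothetical name.

```python
# Hypothetical per-column aggregates, e.g. as produced by a group-by step.
AGGREGATES = {
    "src_ip": {
        "10.0.0.1": {"Events": 5, "BytesIn": 9000},
        "10.0.0.2": {"Events": 42, "BytesIn": 1200},
        "10.0.0.3": {"Events": 17, "BytesIn": 64000},
    },
    "status": {
        "200": {"Events": 50, "BytesIn": 70000},
        "404": {"Events": 14, "BytesIn": 4200},
    },
}


def sort_columns(aggregates, key="Events"):
    """Sort every column independently: highest `key` first, ties by name."""
    return {
        column: sorted(nodes, key=lambda node: (-nodes[node][key], node))
        for column, nodes in aggregates.items()
    }
```

Passing a different `key`, such as `"BytesIn"`, mirrors switching a column to sort by another aggregate.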

What if you want to see the “shape” and patterns of potential account takeover attacks? To visualize this activity, we can select all unsuccessful login attempts and see where the traffic originates and which kinds of actors are most active.

One of the most important aspects of link analysis is discovering relationships between entities. Say we need to find out which country is responsible for the majority of possible account takeovers: specifically, which country is attacking the most user accounts, and which country is generating the majority of malicious traffic.

The SigBay Link Analysis app lets us dynamically reposition columns when we need to discover relationships. Then, by sorting by the number of connections, we can find the answers:
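The country question can be sketched as three steps: keep only failed logins, move the `country` column next to `user`, and rank countries both by how many distinct accounts they touched and by how many events they generated. The field names, sample data and helpers (`move_column`, `rank_by_connections`, `rank_by_events`) are hypothetical, and the ranking is a simplified stand-in for the app's connection-count sorting.

```python
from collections import Counter, defaultdict

# Hypothetical login events; field names are illustrative only.
LOGINS = [
    {"src_ip": "192.0.2.10", "country": "RU", "user": "admin", "status": "failure"},
    {"src_ip": "192.0.2.10", "country": "RU", "user": "root", "status": "failure"},
    {"src_ip": "192.0.2.11", "country": "RU", "user": "admin", "status": "failure"},
    {"src_ip": "198.51.100.5", "country": "CN", "user": "admin", "status": "failure"},
    {"src_ip": "198.51.100.5", "country": "CN", "user": "admin", "status": "failure"},
    {"src_ip": "198.51.100.6", "country": "CN", "user": "admin", "status": "failure"},
    {"src_ip": "198.51.100.6", "country": "CN", "user": "admin", "status": "failure"},
    {"src_ip": "203.0.113.9", "country": "US", "user": "alice", "status": "success"},
]


def move_column(columns, column, new_index):
    """Return a new left-to-right column order with `column` at `new_index`."""
    order = [c for c in columns if c != column]
    order.insert(new_index, column)
    return order


def rank_by_connections(events, columns, column):
    """Rank nodes of `column` by the number of distinct nodes they link to
    in the column immediately to its right (assumes such a column exists)."""
    neighbor = columns[columns.index(column) + 1]
    links = defaultdict(set)
    for e in events:
        links[e[column]].add(e[neighbor])
    return sorted(links, key=lambda node: (-len(links[node]), node))


def rank_by_events(events, column):
    """Rank nodes of `column` by raw event count, i.e. traffic volume."""
    counts = Counter(e[column] for e in events)
    return sorted(counts, key=lambda node: (-counts[node], node))


columns = move_column(["src_ip", "user", "country"], "country", 1)
failures = [e for e in LOGINS if e["status"] == "failure"]
most_accounts = rank_by_connections(failures, columns, "country")
most_traffic = rank_by_events(failures, "country")
```

In this toy data the two questions have different answers: RU touches the most distinct accounts, while CN generates the most failed-login events, which is why it helps to look at both orderings.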

We have just touched on a few capabilities of SigBay Link Analysis visualization. 

We haven’t yet discussed:

  • Local, global, math- and regular expression driven searching
  • Custom filtering
  • Multi-selection with include and exclude logic
  • Multi-level undo and redo logic
  • Node appearance and theme customization
  • Filter tokens and Interactions with other panels and visualizations.

I will cover these features in Part 2 of this blog series.

SigBay really delivered a powerhouse with their link analysis visualization, and they continue to develop it.

They elegantly solved the challenge of modern information density, making it possible to interactively represent close to 4,000-5,000 data and information points within a single view.

Besides full integration with Splunk Enterprise, this Link Analysis visualization is also fully vetted for Splunk Cloud and is currently used by multiple customers and other Splunk partners who are building highly capable, competitive data analytics solutions.

According to SigBay, they are in the process of integrating their interactive Link Analysis technology with more databases, repositories and data stores to help customers overcome the limitations of visual investigation tools offered by other data analytics vendors.

Posted by Gleb Esman

Gleb Esman is Sr. Product Manager for Fraud Detection at Splunk.

With a technical background in analytics, security research and development, Gleb helps to guide product development efforts in the areas of fraud detection, analytics and investigations.

With experience in security research and in building fraud detection, analytics and investigation applications at a major financial institution, Gleb helps ensure that Splunk customers get best-of-breed, cutting-edge solutions to tackle costly fraud challenges across multiple industry verticals.

Gleb is an author of patent applications in the areas of deep learning, security and behavioral biometrics.