Where has Splunk’s Business Development Team been lately? This week we were present at TMForum 2013 Americas, held in San Jose, CA. This is THE show to be at when you want to make a splash in the Service Provider or Telecom industry. While we have a number of customers in this space, I wanted to find some creative ways that Splunk was being used in the field, and then demonstrate them to the TMForum audience.
So how creative can you get with Splunk and Service Providers? How about identifying ‘at-risk’ customers in an effort to reduce customer churn, while at the same time being able to compare nodes within a network to allow operators to quickly determine root cause? This is just the tip of the iceberg when it comes to what some are doing with the platform.
I know that personally, I would love it if my provider were able to view my quality of service as an individual and react based upon those results.
So that is exactly what we demonstrated.*
In the screenshot below we can see an example of outages across the network, with those results displayed on a map, as well as how many people are affected, expressed in PPM (parts per million).
Operators typically select an outage to begin their investigation. Once they have an overview of the outage, they can look at the situation either from an ‘operational’ perspective or from a business/customer advocate view. You will notice that at the top of this screen there is a customer identified, and on the right is the node that is in error, along with the number of alerts existing on the parent.
If you’re familiar with networks and infrastructure that have a parent/child relationship, you know that if an error is present on the parent, all of the children will issue an error as well. This has a tendency to create large event storms. Handling event storms has long been the holy grail of root-cause analysis, and there are really two approaches to this. The first is to create and maintain a very detailed and meticulous knowledge base, with a lot of ‘if-this-then-suppress-that’ rules. These knowledge bases require a lot of effort to stand up, are difficult to maintain, and have the potential to be error prone or to hide the error that operators really need to see.
The second method is simply to do a time-based comparison. Why not look at the parent and the child side by side to determine which node was the first to complain?
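As a rough illustration of the time-based comparison, here is a minimal Python sketch. It is not how the demo was built; the event format (node name plus ISO timestamp), the node names, and the function `first_to_complain` are all assumptions made up for this example.

```python
from datetime import datetime


def first_to_complain(events):
    """Return the node whose earliest error has the oldest timestamp.

    `events` is an iterable of (node_name, iso_timestamp) pairs.
    This format is hypothetical, chosen only for the illustration.
    """
    earliest = {}
    for node, ts in events:
        t = datetime.fromisoformat(ts)
        # Keep only the first time each node reported an error.
        if node not in earliest or t < earliest[node]:
            earliest[node] = t
    if not earliest:
        return None
    # The node with the oldest first error is the likely origin.
    return min(earliest, key=earliest.get)


# Made-up events: the parent errors first, then the children follow.
events = [
    ("child-a", "2013-12-05T10:00:07"),
    ("parent-1", "2013-12-05T10:00:02"),
    ("child-b", "2013-12-05T10:00:09"),
    ("parent-1", "2013-12-05T10:00:11"),
]
print(first_to_complain(events))
```

No suppression rules are needed here: ordering the first error per node by time is enough to point the operator at a candidate root cause, which they can then confirm against the logs.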
In the screenshot below we can see that the parent node and child nodes are grouped, and below that grouping, we have:
A) a nice columnar timeline showing us when errors were present on the child, the parent, or both.
B) a comparison of the logs from both the parent and the child.
Now that the operator has been able to take appropriate action in correcting the issue, they are able to turn their attention to the customer. Using the amazing power of Splunk, we can identify the MDNs (the phone numbers) connected to this node at the time of the outage, and whether there were abnormal or non-user-initiated disconnects. Those numbers are then run through a CRM system to identify whether they are ‘At-Risk’. By ‘At-Risk’ I mean that the customer is within 45 days of their two-year contract expiring, or within the 30-day return window of a new contract. Operators can then look at the customer’s complete records (in this case one number was disconnected, but the account actually has four mobile devices and several in-home services such as video-on-demand and internet service).
Whoever is helping the customer can see their quality of service in all areas, as well as whether that customer has had any recent complaints or opened incidents.
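The ‘At-Risk’ rule is simple enough to sketch in a few lines of Python. This is only an illustration of the two conditions described above, not the CRM logic used in the demo; the function name `is_at_risk`, its parameters, and the choice to treat both window boundaries as inclusive are assumptions.

```python
from datetime import date, timedelta

EXPIRY_WINDOW = timedelta(days=45)   # days before contract end
RETURN_WINDOW = timedelta(days=30)   # days after a new contract starts


def is_at_risk(contract_start, contract_end, today):
    """Apply the two 'At-Risk' conditions to one contract.

    Boundaries are treated as inclusive, which is an assumption.
    """
    # Condition 1: the contract expires within the next 45 days.
    near_expiry = timedelta(0) <= contract_end - today <= EXPIRY_WINDOW
    # Condition 2: a new contract is still inside its 30-day return window.
    in_return_window = timedelta(0) <= today - contract_start <= RETURN_WINDOW
    return near_expiry or in_return_window


today = date(2013, 12, 5)
# Made-up contracts: one about to expire, one brand new, one mid-term.
print(is_at_risk(date(2012, 1, 1), date(2014, 1, 1), today))
print(is_at_risk(date(2013, 11, 20), date(2015, 11, 20), today))
print(is_at_risk(date(2013, 1, 1), date(2015, 1, 1), today))
```

In practice the contract dates would come from the CRM lookup for each MDN that saw an abnormal disconnect, so the check runs only against customers actually touched by the outage.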
Why isn’t my provider doing this for me?
*Please note that while the data format is authentic, phone numbers, IP addresses, and other identifying information were all randomly generated by Splunk with our in-house event generators. This data came only from our own machines, with guidance on the data format from our customers and partners. Either way, I have removed anything that could be considered a personal identifier, even if it was never real to begin with.