Introducing New Deep Learning NLP Assistants for DSDL

The Splunk App for Data Science and Deep Learning (DSDL) now includes two new assistants for Natural Language Processing. DSDL has long offered basic natural language processing (NLP) capabilities based on the spaCy library. The new assistants bring additional capabilities to the Splunk platform using the Transformers library, which leverages deep neural networks to deliver intelligent and accurate results for text classification and summarization.

The new DSDL assistants provide an interface that lets any user develop deep learning-based models without writing Python code, and they help standardize the SPL involved. With them, you can apply cutting-edge NLP techniques to your own use cases.

Text Classification

Classical machine learning (ML) techniques classify text using, for example, TF-IDF, PCA, and Random Forest classification. The new Assistant for Deep Learning (DL) Text Classification instead uses Transformers BERT models, achieving a level of text classification that was not possible before. It offers five advantages over traditional ML text classification that can help your business. I'll introduce each by comparing the outcomes of the traditional ML and DL approaches on the task of classifying texts of about 150 English words each into categories. Training used 7,352 texts over ten epochs, and a separate collection of 100 texts was used for testing.


Traditional Machine Learning Text Classification (TFIDF, PCA, and RFC)

Accuracy  Precision  Recall  F1
0.29      0.23       0.26    0.23

Deep Learning Text Classification (Transformers BERT)

Accuracy  Precision  Recall  F1
0.72      0.66       0.57    0.60

Higher accuracy is the first advantage of DL text classification, and accuracy improves further as you train the model more. You also get this accuracy without having to work out which text features are useful for predicting the class: you just specify your text data and its classes in your training data set using the Assistant UI.

The Assistant generates the fine-tuning SPL for you, runs it in the background, and shows its progress. One big difference between DL and ML is that DL takes much longer to train a model. The Assistant estimates how long the fine-tuning will take, so you don't have to wait in front of your display.

(Image: a fine-tuning SPL)

Ability to reduce false positives

Fitting a new model takes a long time in DL. But given the nature of text classification, you probably won't need to fine-tune frequently; once a week or even once a month may be enough, depending on your business use case. And the longer fit duration makes more sense once you know the second advantage: the probability field, which shows how confident each prediction is.

(Image: probability field)

False positives (FP) are a significant issue in most ML implementations. Beyond its higher accuracy, DL text classification can help you reduce FPs: filter out the low-probability predictions to produce a result set with fewer FP predictions. Below is an example showing how much the scores improve when predictions with a probability below 90% are filtered out.

Deep Learning Text Classification (Transformers BERT with probability filter at 90%)

Accuracy  Precision  Recall  F1
0.77      0.71       0.58    0.61
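The filtering idea itself is simple, and you can prototype it outside SPL as well. Below is a minimal Python sketch (the predictions and probabilities are made up for illustration, not real DSDL output) showing how dropping predictions below a 90% probability threshold can raise accuracy on the remaining set:

```python
# Hypothetical predictions: (predicted_class, true_class, probability).
# These rows are illustrative, not real DSDL output.
predictions = [
    ("soccer",   "soccer",   0.97),
    ("baseball", "baseball", 0.95),
    ("soccer",   "baseball", 0.55),  # low-confidence miss
    ("baseball", "soccer",   0.62),  # low-confidence miss
    ("soccer",   "soccer",   0.93),
]

def accuracy(rows):
    """Fraction of rows where the predicted class matches the true class."""
    return sum(pred == true for pred, true, _ in rows) / len(rows)

print(f"all predictions:     {accuracy(predictions):.2f}")  # 3/5 = 0.60

# Keep only high-confidence predictions, as the probability filter does.
confident = [row for row in predictions if row[2] >= 0.90]
print(f"probability >= 0.90: {accuracy(confident):.2f}")    # 3/3 = 1.00
```

The trade-off is the same one the table above shows: you discard some correct low-confidence predictions (hurting recall a little) in exchange for fewer false positives among what remains.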


You will still need to examine the low-probability predictions by eye, and this is where the third advantage of DL text classification shines. A low-probability prediction typically has two or more high scores across different predicted classes. By adjusting the SPL the assistant generates, you can easily compare the high-scoring classes to see why the model failed on that prediction.

(Image: text classification)

The example above shows that the scores for the computer science and statistics classes are very close. By comparing the high-scoring classes, you can see why the probability of the positive prediction was low. Sometimes it would be challenging even for a human to judge which class fits best. You can use such data to reinforce your model.


You can retrain your model with the same data or with new data, much as a human keeps learning a language. The latest training data set influences predictions the most, and older data can be forgotten; however, each round of training builds on what the model has already learned. Retraining on the data your model failed to predict is an effective way to reinforce it.

Multilabel classification

You may have noticed that the SPL in the screenshot looks tricky. Yes, it is; but don't worry, it is not that difficult. DL text classification requires class fields holding 0 or 1 to indicate which classes apply to the text, so you have to convert your data between the two formats shown below.

Text and Class pattern A:

text                                                          class
Each team has 11 players, and they kick the ball.             soccer
Each team has 9 players, and they use a bat to hit the ball.  baseball

Text and Class pattern B:

text                                                          soccer  baseball
Each team has 11 players, and they kick the ball.             1       0
Each team has 9 players, and they use a bat to hit the ball.  0       1

The assistant generates the SPL to convert from pattern B to pattern A.

| eval high_score = 0
| foreach cat1_*
    [ eval high_score = if('<<FIELD>>' > high_score, '<<FIELD>>', high_score)
    | eval cat1 = if('<<FIELD>>' = high_score, "<<FIELD>>", cat1) ]

Rename all the class fields to start with the prefix "cat1_" before you fine-tune or evaluate, so that they match this SPL.

Converting from pattern A to pattern B is much easier in SPL.

| eval {category} = 1
| fillnull value=0
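Outside of SPL, both conversions are easy to express in plain Python as well. Here is a minimal sketch mirroring the two SPL snippets (the field and class names are made up for illustration):

```python
# The set of classes, one 0/1 column per class in pattern B.
classes = ["soccer", "baseball"]

def a_to_b(row):
    """Pattern A -> pattern B: one-hot encode the class column.
    SPL equivalent: | eval {category} = 1 | fillnull value=0"""
    return {"text": row["text"],
            **{c: int(c == row["class"]) for c in classes}}

def b_to_a(row):
    """Pattern B -> pattern A: keep the class with the highest score.
    SPL equivalent: the generated foreach/high_score loop."""
    best = max(classes, key=lambda c: row[c])
    return {"text": row["text"], "class": best}

row_a = {"text": "Each team has 11 players, and they kick the ball.",
         "class": "soccer"}
row_b = a_to_b(row_a)
print(row_b)          # one column per class, 1 for soccer, 0 for baseball
print(b_to_a(row_b))  # round-trips back to pattern A
```

Note that `b_to_a` is an argmax, so it only recovers a single label per row; pattern B is the more general format, which is exactly what enables the multilabel case below.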

You may have realized that this gives you yet another power in text classification, one that is difficult to achieve with traditional ML approaches: it enables multilabel classification, as in the example below.

text                       soccer  baseball
A team sport uses a ball.  1       1

Evaluation of Classification

The evaluation shows you the following scores.

  • Confusion matrix
    A table showing the combinations of actual and predicted values.
  • Accuracy score
    How much of the predicted values are true.
  • Precision score
    How much of the predicted "A" is really "A".
  • Recall score
    How much of the real "A" is predicted as "A".
  • F1 score
    The harmonic mean of precision and recall.

All the scores range from 0 to 1, where 0 is the worst and 1 is the best. If your classes are well-balanced in your data set, you can use accuracy to evaluate your model. If your classes are imbalanced, use F1 instead.
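For intuition, here is how those scores fall out of a confusion matrix, sketched in plain Python with a made-up two-class example (the labels are illustrative only):

```python
# Made-up actual and predicted labels for a two-class example.
actual    = ["A", "A", "A", "B", "B", "B", "B", "B"]
predicted = ["A", "A", "B", "B", "B", "B", "A", "B"]

# Confusion-matrix counts for class "A".
tp = sum(a == "A" and p == "A" for a, p in zip(actual, predicted))  # true positives
fp = sum(a != "A" and p == "A" for a, p in zip(actual, predicted))  # false positives
fn = sum(a == "A" and p != "A" for a, p in zip(actual, predicted))  # false negatives

accuracy  = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
precision = tp / (tp + fp)   # how much of the predicted "A" is really "A"
recall    = tp / (tp + fn)   # how much of the real "A" is predicted as "A"
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

With imbalanced classes, accuracy can stay high even while precision and recall for the minority class collapse, which is why F1 is the better check in that case.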

Text Summarization

Text summarization is the other feature of the new DSDL Assistants. Using Transformers T5 models, it extracts a summary from your text in whatever style your training data teaches.

(Image: Text Summarization)

It is hard to convey its full value in a short example, but I'll do my best with a very small one.

text                                                        summary
I want a TV.                                                Customer wants TV.
I have a radio, and I like the radio. But I want a TV now.  Customer wants TV.
I watched a cute dog on TV. I wish I had a dog.             Customer wants Dog.

If the model has been trained with enough summary data teaching the pattern "Customer wants <...>," it will extract what your customer wants from the text, much as we would when reading it. Classification requires pre-defined classes; summarization doesn't. It can therefore handle use cases where you cannot pre-define the classes for the values you want to extract from your text data, something traditional ML approaches never could.

Evaluation of Summarization

The evaluation shows ROUGE scores, which indicate how well the automatically produced summaries recall the human-produced reference summaries. 0 is the worst, and 1 is the best.
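ROUGE comes in several variants; the simplest, ROUGE-1 recall, is just the fraction of reference-summary words that also appear in the generated summary. Here is a deliberately simplified sketch (real ROUGE implementations also handle n-grams, count clipping, and stemming):

```python
def rouge1_recall(generated, reference):
    """Simplified ROUGE-1 recall: fraction of reference-summary unigrams
    that appear in the generated summary (lowercase word overlap only)."""
    gen_words = set(generated.lower().split())
    ref_words = reference.lower().split()
    hits = sum(w in gen_words for w in ref_words)
    return hits / len(ref_words)

# All three reference words appear in the generated summary.
print(rouge1_recall("customer wants a tv", "customer wants tv"))  # 1.0
# Only "customer" appears, so recall drops to 1/3.
print(rouge1_recall("customer is happy", "customer wants tv"))
```

High recall alone can be gamed by very long summaries, which is why full ROUGE reporting usually pairs it with precision or an F-measure.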

Use cases of DL NLP

While deep learning text classification and summarization can be used for a wide range of use cases, this feature set was initially designed for call center requirements to process call center transcript data and extract three fundamental values.

  • Customer's issues
  • Operator's suggestions
  • Results (Did that answer work?)

Extracting these from the data would be of great value for a call center. I had an opportunity to test this with actual customer data, and the accuracy was so high that the customer is now considering using the outputs to build a knowledge base.

(Image: Call center knowledge base)

The cost of implementing deep learning NLP in Splunk is relatively low, although voice-to-text solutions are still not cheap today. However, call center communication is shifting from phones to text-based channels, so this solution will only become a better fit for call center users.

However, the call center is just one of many use cases for DL NLP. You can apply it to any use cases where you want to extract values from the text.

For instance, in an IT ticketing system, DL text classification can help route a ticket to the best team or person, or add a category for better or even automated routing. In a more advanced use case, it can help triage a ticket by predicting its severity and narrowing down the suspected failing component you should look at first. And summarization can help you write up the solution note in the retrospective.

(Image: IT Service Ticket Triage Automation)

In the security area, the most straightforward idea is phishing email detection. ML-based solutions already exist in this area, but if you are unsatisfied with them, you can try a DL-based solution.

(Image: Phishing Email Detection)

If you are a data scientist familiar with Transformers and Python

You may not need the DSDL assistants at all. Just jump into the JupyterLab notebooks in the Transformers container and refer to the two notebooks below to adapt them to your use case.

  • transformers_classification.ipynb
  • transformers_summarization.ipynb

In closing

The Deep Learning NLP Assistants for DSDL ship with demo data, so you can see how they work before applying your own data. We'd love to learn how they can help your business. If you are interested in the DL NLP solution, download DSDL 5.1 and let us know your use cases.

Posted by Tatsu Murata

Tatsu has worked in the IT industry since 1988, starting as a COBOL programmer for mainframe computers. He then worked as a support, PS, and pre-sales engineer for Sybase, Excite@Home, RiverSoft, Interwoven, Dell, SupportSoft, CA, Webroot, Citrix, and SOASTA before joining Splunk. Those roles gave Tatsu the opportunity to learn deeply about RDBMS, networks, hardware, APM, security, and 4G/5G mobile communication, which helps him truly understand the requirements of Splunk use cases.
