Deep Learning in Security: Text-based Phishing Email Detection with BERT Model

What is a Phishing Email?

Phishing emails are fraudulent or malicious emails that are designed to deceive recipients and trick them into revealing sensitive information, such as login credentials, financial details, or personal data.

Phishing emails typically rely on social engineering techniques to manipulate recipients, and a successful attack can cause significant damage to personal or corporate information security. Detecting phishing emails is therefore an essential security precaution.

Why use text to detect phishing?

In the past, phishing detection algorithms relied on predefined rules and patterns to flag suspicious emails, such as checking sender settings against blacklists, verifying URLs, and analyzing textual features like misspellings, grammatical errors, or specific keywords.

However, as phishing attacks become increasingly sophisticated, detection algorithms need to evolve: they should work on the actual text content and capture the semantic structure of phishing emails. By focusing on the message itself, phishing detectors can adapt rapidly to new tactics, detect emerging threats efficiently, and significantly reduce false positives. This makes them an indispensable tool for safeguarding organizations and individuals against cyber threats.

BERT-based phishing email detector

In this project, a phishing email detection model was constructed using the BERT model (Bidirectional Encoder Representations from Transformers), a neural network architecture widely used in natural language processing (NLP). BERT is pre-trained on a massive corpus, enabling it to learn high-level representations of natural language and be easily fine-tuned for downstream tasks like text classification. The figure below illustrates BERT's multi-layer transformer encoder architecture, comprising 12 transformer blocks. Each block utilizes self-attention layers to model complex bidirectional dependencies between words in sentences, capturing both local and global context.

BERT model

We fine-tuned the BERT model on a dataset of 181,781 email texts labeled as phishing or non-phishing, sourced from various public benchmarks for text classification tasks. During fine-tuning, we updated only the top 3 Transformer layers and the linear layers (as shown in the figure above), trained for 20 epochs, and kept the model with the lowest validation loss.
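The layer selection described above can be sketched without any deep learning framework. The snippet below is a minimal illustration, not the project's training code: it assumes parameter names follow the Hugging Face BERT naming scheme (`bert.encoder.layer.<i>.…` for encoder blocks and `classifier.…` for the head), and the function name `is_trainable` and the example names are hypothetical.

```python
import re

# Assumption: parameter names follow the Hugging Face BERT naming scheme,
# e.g. "bert.encoder.layer.11.output.dense.weight" and "classifier.weight".
LAYER_PATTERN = re.compile(r"encoder\.layer\.(\d+)\.")


def is_trainable(param_name: str, num_layers: int = 12, top_k: int = 3) -> bool:
    """Return True if the parameter belongs to the top_k encoder layers
    or to the classification head; everything else stays frozen."""
    match = LAYER_PATTERN.search(param_name)
    if match:
        # Layers are indexed from 0, so the top 3 of 12 are layers 9, 10, 11.
        return int(match.group(1)) >= num_layers - top_k
    # Hypothetical head name; adjust it to match the actual model definition.
    return param_name.startswith("classifier.")


# Illustrative parameter names only, not a full model listing.
example_names = [
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.8.output.dense.weight",
    "bert.encoder.layer.9.attention.self.query.weight",
    "bert.encoder.layer.11.output.dense.bias",
    "classifier.weight",
]
trainable = [name for name in example_names if is_trainable(name)]
```

With PyTorch, a rule like this would typically be applied by setting `param.requires_grad = is_trainable(name)` while iterating over `model.named_parameters()`, so that the optimizer only updates the selected layers.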

To evaluate the fine-tuned model's real-world performance, we tested it on a separate dataset of 32,681 email texts from sources not used in training. The BERT-based phishing detection model achieved an F1 score of 0.99, with a true positive rate of 99.06% and a true negative rate of 98.5%.

We compared the BERT model with other deep learning models and traditional machine learning algorithms, including DistilBERT, LSTM, Support Vector Machine, Random Forest, and Logistic Regression. We also ran experiments that combined the email text with other features commonly used in conventional methods, such as links and domain names. The results are plotted in the figure below: the BERT model significantly outperformed all the traditional machine learning algorithms and achieved the best performance among the deep learning models. Adding the auxiliary input features did not change the performance of the BERT model, which indicates that text-only input is sufficient for it.
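For readers less familiar with the metrics reported above, the following sketch shows how the true positive rate, true negative rate, and F1 score are derived from confusion-matrix counts, with phishing treated as the positive class. The function name `classification_metrics` is hypothetical and the counts are made up for illustration; they are not the project's results.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute TPR, TNR, precision, and F1 with phishing as the positive class."""
    tpr = tp / (tp + fn)        # true positive rate (recall): phishing caught
    tnr = tn / (tn + fp)        # true negative rate: legitimate mail passed
    precision = tp / (tp + fp)  # share of flagged mail that is really phishing
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "tnr": tnr, "precision": precision, "f1": f1}


# Made-up counts for illustration only; not the project's confusion matrix.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

Reporting the true negative rate alongside F1 is useful here because F1 ignores true negatives, while a phishing filter that blocks legitimate mail is costly in practice.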

F1 scores of various ML and DL algorithms

Operationalize the phishing email detector

With the integration of the Transformers library in the latest release of Splunk App for Data Science and Deep Learning (DSDL), deploying the phishing email detection model has become seamless. The model can now be easily integrated into a container environment and deployed through a Splunk search. The deployment script has been added to the two recent containers: mltk-container-transformers-cpu and mltk-container-transformers-gpu.

The phishing email detection model can be deployed through Splunk DSDL with the following steps:

  1. Make sure the mltk-container-transformers-cpu or mltk-container-transformers-gpu container is running in your Splunk DSDL environment.
  2. Download the model checkpoint from the Hugging Face web interface or with the wget command:
    wget https://huggingface.co/Huaibo/phishing_bert/resolve/main/pytorch_model.pt
    
  3. Create a folder /srv/app/model/data/classification/en/bert_phishing in the container using the JupyterLab interface and upload the downloaded model file under this folder.
  4. Place the input email texts in a field named "text" and deploy the model with a single line of SPL command:
    | fit MLTKContainer algo=bert_phishing text from text into app:bert_phishing as prediction
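Steps 2 and 3 can also be done in one go from a notebook cell in the container's JupyterLab. The snippet below is a minimal sketch rather than part of the official deployment script: it assumes the container has outbound network access and that the checkpoint keeps its original file name, and the helper names `checkpoint_path` and `fetch_checkpoint` are hypothetical. The URL and folder are the ones given in the steps above.

```python
import os
import urllib.request

# URL and target folder as given in the deployment steps.
MODEL_URL = "https://huggingface.co/Huaibo/phishing_bert/resolve/main/pytorch_model.pt"
MODEL_DIR = "/srv/app/model/data/classification/en/bert_phishing"


def checkpoint_path(model_dir: str = MODEL_DIR, url: str = MODEL_URL) -> str:
    """Return the path the checkpoint file should have inside model_dir."""
    return os.path.join(model_dir, os.path.basename(url))


def fetch_checkpoint(model_dir: str = MODEL_DIR, url: str = MODEL_URL) -> str:
    """Create model_dir if needed and download the checkpoint unless it exists."""
    os.makedirs(model_dir, exist_ok=True)
    target = checkpoint_path(model_dir, url)
    if not os.path.exists(target):
        # Assumption: the container can reach huggingface.co directly.
        urllib.request.urlretrieve(url, target)
    return target
```

Calling `fetch_checkpoint()` once inside the container should leave the model file in the expected folder; if the container has no network access, download the file elsewhere and upload it through JupyterLab as described in step 3.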
    

The screenshots show the model's output in the Splunk Search & Reporting app on randomly generated email text inputs. The label field indicates whether each email is phishing or not, and the model's output appears in the prediction field. The model correctly labeled all of the input samples.

Considering that real-world deployment often involves email text data in various formats, we further evaluated the model's performance using web-scraped data. The dataset in the following screenshot contained random tokens and letter repetitions, simulating noisy and less structured inputs. Despite the additional complexity, the model demonstrated its robustness by accurately labeling the samples, showcasing its capability to handle challenging and diverse input texts.

performance using web-scraped data

Conclusion

In this project, we developed an advanced phishing email detector based on the BERT model. Fine-tuning on a large dataset resulted in impressive performance, surpassing other text-based methods. We also successfully demonstrated model deployment through Splunk DSDL, and this feature will be officially supported in the next release of the app.

Note: This project was conducted in collaboration with the Splunk Machine Learning for Security team (SMLS).
