
How Good is ClamAV at Detecting Commodity Malware?

"People tell you who they are, but we ignore it, because we want them to be who we want them to be."
- Don Draper


Earlier this year we announced some security enhancements to how we handle submissions to Splunkbase. Put simply, we are making things faster, cheaper, and better where Splunkbase security is concerned.

Faster in that it takes less time for a developer to get an app into our platform.

Cheaper in that it’s more automated.

Better in that the security controls applied to each submission offer better protection from cyber-nasty than before.

In the course of our endeavors to level up, we discovered something interesting. In the tradition of tabs vs spaces, there was a sharp divide among my colleagues regarding ClamAV. Some were convinced it was no good. Others held that it was free and did a good job of catching low-hanging fruit. Everyone felt confident in their view. Nobody had the data.

So, we conducted a study and got the data.

(Shout out to our awesome-sauce summer intern Neel Bhavsar who did literally all of the hands-on work!)

What Did We Do?

Fundamentally, there were two prevailing opinions about ClamAV in our org. One was that ClamAV wasn’t worth implementing because, like any signature-based AV engine, it lacked the ability to detect modern forms of malware. The other was that ClamAV was free, open-source, and provided a good baseline of protection.

To sort this out we conducted an efficacy study wherein we applied ClamAV to over 400,000 malware samples from MalwareBazaar, bucketed as follows:

| Count | Name | Description |
|---|---|---|
| 106135 | Banking Trojan | Trojans targeted towards stealing financial information |
| 26875 | Botnets | Malware for making the victim a part of a botnet |
| 190371 | Information Stealer | Programs designed to steal client information, e.g. keyloggers |
| 52422 | Loaders | Programs that load one or more other malicious programs. That is, stagers that fetch nasty things directly into memory. |
| 1321 | Miners | Cryptocurrency miners |
| 30251 | RATs | Remote access tools, e.g. backdoors |
| 8273 | Trojan | Generic multipurpose malware that harms the user in different ways. Generally disguises itself and is delivered by tricking the user. |

From there we ran the entire data set through ClamAV, Avast, Defender, and Falcon. Note that Falcon is not the same kind of general-purpose tool that ClamAV and Avast are; it focuses on executables and DLLs. Here are the efficacy results from ClamAV.
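For readers who want to run a similar tally on their own samples, here is a minimal sketch. It is not the tooling used in the study. It assumes the usual `clamscan` report format, one line per file ending in either `OK` or `<signature> FOUND`, and the helper names (`parse_clamscan_lines`, `scan_directory`, `detection_rate`) are hypothetical.

```python
import subprocess
from typing import Dict, Iterable, Optional


def parse_clamscan_lines(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each scanned path to its signature name, or None if reported clean."""
    results: Dict[str, Optional[str]] = {}
    for line in lines:
        # Per-file lines look like "<path>: OK" or "<path>: <signature> FOUND".
        path, sep, verdict = line.strip().rpartition(": ")
        if not sep:
            continue  # blank lines, summary banner, etc.
        if verdict == "OK":
            results[path] = None
        elif verdict.endswith(" FOUND"):
            results[path] = verdict[: -len(" FOUND")]
        # Anything else (summary counters, error lines) is ignored.
    return results


def scan_directory(directory: str) -> Dict[str, Optional[str]]:
    """Run clamscan recursively over a directory (requires ClamAV installed)."""
    proc = subprocess.run(
        ["clamscan", "--recursive", "--no-summary", directory],
        capture_output=True,
        text=True,
        check=False,
    )
    # clamscan exits 0 when nothing is found, 1 when something is found,
    # and 2 when an error occurred.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr)
    return parse_clamscan_lines(proc.stdout.splitlines())


def detection_rate(results: Dict[str, Optional[str]]) -> float:
    """Fraction of scanned files that were flagged."""
    if not results:
        return 0.0
    detected = sum(1 for signature in results.values() if signature is not None)
    return detected / len(results)
```

Calling `detection_rate(scan_directory("samples/"))` on a directory of known-malicious files (the `samples/` path is a placeholder) would give a detection rate comparable in spirit to the figures below. Handle live malware only in an isolated environment.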

Results

Overall Accuracy on All files

All in all, ClamAV detected just under 60% of the malware in the sample: 249,696 of 416,561 files (59.94%), to be exact.
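The arithmetic is easy to check. The short sketch below recomputes the overall rate and also sums the seven bucket counts from the table above. Those buckets account for slightly fewer files than the overall total; the remainder may be the small categories omitted from the breakdown, though that reading is an assumption rather than something the data states.

```python
# Figures taken from the text and the bucket table above.
detected, total = 249_696, 416_561

bucket_counts = {
    "Banking Trojan": 106_135,
    "Botnets": 26_875,
    "Information Stealer": 190_371,
    "Loaders": 52_422,
    "Miners": 1_321,
    "RATs": 30_251,
    "Trojan": 8_273,
}

rate_percent = round(100 * detected / total, 2)
bucketed = sum(bucket_counts.values())
unbucketed = total - bucketed

print(rate_percent)  # 59.94
print(bucketed)      # 415648
print(unbucketed)    # 913
```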

Accuracy By File Type

As the chart above shows, ClamAV did well on docx, DLL, and ELF malware. However, it missed many samples in several important file types, such as exe, xls*, and zip.

Accuracy by Top Level Category 

ClamAV does OK for a few top-level categories like Trojans and Botnets. However, it does poorly on other malware types like Crypto Miners, RATs, and Info Stealers.

Note: Two of the categories are not present in the chart (Adware & Ransomware) due to the limited number of samples present in the dataset.
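Both breakdowns (by file type and by top-level category) are the same computation with a different grouping key. A minimal sketch, assuming each sample has already been reduced to a `(bucket, detected)` pair; the function name and the toy records are made up for illustration and are not study results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rate_by_bucket(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the detection rate per bucket (file type, category, ...)."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for bucket, detected in records:
        totals[bucket] += 1
        if detected:
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Toy input only: five imaginary samples keyed by file extension.
toy_records = [
    ("exe", True),
    ("exe", False),
    ("exe", False),
    ("elf", True),
    ("elf", True),
]
toy_rates = rate_by_bucket(toy_records)
```

Swapping the file extension for the malware family as the first element of each pair yields the per-category view instead.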

So is ClamAV Any Good at Detecting Commodity Malware or What?

As far as this data goes, the answer is “it depends”. The results indicate ClamAV is highly reliable at detecting certain types of malware in certain types of files. If your use case for ClamAV involves inspecting those things, then ClamAV is an amazing, free tool. Conversely, if your use case involves, say, looking exclusively at jar files, the data indicates you’d likely fail to detect quite a lot of nasty.

One additional point worth mentioning in favor of ClamAV is that it’s highly customizable by way of YARA integration. This makes it particularly valuable for security organizations that want to dogfood data from their internal threat hunting activities or that want to supply custom detection rules to detect anomalies.
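As a rough illustration of what that customization can look like, the sketch below writes a trivial YARA rule to disk and builds a `clamscan` command that loads it through the `--database` option. The rule name, the marker string, the file name, and the helper functions are all hypothetical. It also assumes ClamAV treats `.yar` files as YARA rules, which is how its documentation describes the feature; check the docs for your version, since ClamAV supports only a subset of YARA.

```python
from pathlib import Path
from typing import List

# A deliberately simple, made-up rule: flag files containing a marker string.
RULE_TEXT = """\
rule Hypothetical_Internal_Marker
{
    strings:
        $marker = "EXAMPLE-INTERNAL-THREAT-MARKER"
    condition:
        $marker
}
"""


def write_rule(rule_dir: Path) -> Path:
    """Write the example rule to <rule_dir>/internal_marker.yar."""
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_path = rule_dir / "internal_marker.yar"
    rule_path.write_text(RULE_TEXT, encoding="utf-8")
    return rule_path


def clamscan_command(rule_file: Path, target: Path) -> List[str]:
    """Build (but do not run) a clamscan invocation that loads the rule file."""
    return ["clamscan", f"--database={rule_file}", "--recursive", str(target)]
```

As I understand it, pointing `--database` at a single file loads only that file instead of the default signature set, which is convenient for testing a custom rule in isolation. To combine custom rules with the stock signatures, the rule file would instead sit alongside them in the database directory.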

So there's the data. May it help you and your org make efficacious decisions where AV tooling is concerned. And, as always, Splunk Tools are pretty handy for turning data into useful doing. 

Posted by David Holiday

David Holiday is what Watson referred to as a "Wild Duck". He is the embodiment of the deviance from the norm that foments progress. David's professional journey has taken him from Kierkegaard to Dijkstra, Kay to PKD, to Hofstadter and beyond. David is passionate about good writing, loves to code, and views himself as a writer who happens to write things computers like to read.

David has degrees in Philosophy and Computer Science as well as patents involving modeling "what's next" given a corpus of data. He lives in Colorado with his wife Mara and two instupituous children, Story and Hazel.
