Detecting Obfuscated Scripts With Machine Learning Techniques

Loading...
Thumbnail Image
Journal Title
Journal ISSN
Volume Title
Perustieteiden korkeakoulu | Master's thesis
Date
2020-03-19
Department
Major/Subject
Security and Mobile Computing
Mcode
T3011
Degree programme
Master's Degree Programme in Security and Mobile Computing (NordSecMob)
Language
en
Pages
50+5
Series
Abstract
Complex operating system administration tasks can be automated and simplified by using scripting languages. For the Windows operating system, one of the most commonly used scripting languages is PowerShell. The PowerShell scripting language provides vast functionality for the system administrators. At the same time, it leaves a large attack surface for adversaries to bypass the OS protections. Signature and supervised machine learning based intrusion detection systems (IDS) can be used for monitoring and detecting such malicious scripts. However, the detection can be evaded by obfuscating the scripts. As the next step in the defense, we can use obfuscation itself as a reliable sign of malicious code. This thesis investigates the methods of detecting obfuscated PowerShell scripts with machine learning (ML) techniques. We trained the logistic regression, random forest and gradient boosting models on a balanced dataset. To generate the dataset, unobfuscated scripts were taken from open-source projects and they were obfuscated by open-source obfuscators. We then selected the most important independent features for obfuscation detection. The ML methods were compared using their ROC curves and AUC values. The best method turns out to be the gradient boosting model, which has the AUC close to one for the used dataset. Moreover, the model can classify a script faster than in one millisecond. Thus, the model can replace existing approaches to obfuscation detection, and it can be used by antivirus vendors in the process of detecting malicious PowerShell scripts.
Description
Supervisor
Aura, Tuomas
Thesis advisor
Aura, Tuomas
Kraemer, Frank Alexander
Keywords
machine learning, PowerShell, malware, scripting language, obfuscation detection
Other note
Citation