Abstract: In the era of the internet and smart devices, the detection of malware has become crucial for system security. Malware authors increasingly employ obfuscation techniques to evade advanced security solutions, making it challenging to detect and eliminate threats. Obfuscated malware, adept at hiding itself, poses a significant risk to various platforms, including computers, mobile devices, and IoT devices. Conventional methods like heuristic-based or signature-based systems struggle against this type of malware, as it leaves no discernible traces on the system. In this research, we propose a simple and cost-effective obfuscated malware detection system through memory dump analysis, utilizing diverse machine-learning algorithms. The study focuses on the CIC-MalMem-2022 dataset, designed to simulate real-world scenarios and assess memory-based obfuscated malware detection. We evaluate the effectiveness of machine learning algorithms, such as decision trees, ensemble methods, and neural networks, in detecting obfuscated malware within memory dumps. Our analysis spans multiple malware categories, providing insights into algorithmic strengths and limitations. By offering a comprehensive assessment of machine learning algorithms for obfuscated malware detection through memory analysis, this paper contributes to ongoing efforts to enhance cybersecurity and fortify digital ecosystems against evolving and sophisticated malware threats. The source code is made open access for reproducibility and future research endeavors. It can be accessed at https://bit.ly/MalMemCode

I. INTRODUCTION

The rise of internet connectivity and smart devices has transformed various sectors, but it has also led to an evolving threat landscape, including sophisticated malware targeting interconnected systems. Obfuscated malware, adept at concealing itself, presents a significant challenge to conventional cybersecurity methods. Traditional heuristic-based or signature-based systems struggle to identify such elusive threats, necessitating a shift towards innovative and adaptive detection mechanisms.

This paper explores obfuscated malware detection through multiclass classification, aiming to bridge the gap between evolving threats and advanced detection methods using machine learning. We analyze various algorithms, including decision trees, ensemble methods, support vector machines, and neural networks, to uncover their capabilities and limitations in identifying obfuscated malware.

Acknowledging the significance of class imbalance in real-world datasets, especially in malware detection, we investigate two families of techniques to address this challenge: undersampling (Edited Nearest Neighbor Rule, Near Miss Rule, Random Undersampling, and All KNN Undersampling) and synthetic data generation using the ADASYN method.
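To make the idea of undersampling concrete, the following is a minimal sketch of the simplest of the listed techniques, random undersampling, written with only the Python standard library. It is not the paper's implementation: the function name `random_undersample`, the toy feature vectors, and the class labels are all hypothetical and chosen purely for illustration. In practice, the other listed methods (Edited Nearest Neighbours, NearMiss, AllKNN, ADASYN) would more likely come from a library such as imbalanced-learn.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class is reduced to the size of the rarest class.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # The rarest class sets the target size for all classes.
    target = min(len(items) for items in by_class.values())

    balanced_x, balanced_y = [], []
    for y, items in by_class.items():
        kept = rng.sample(items, target)  # sample without replacement
        balanced_x.extend(kept)
        balanced_y.extend([y] * target)
    return balanced_x, balanced_y


# Toy, made-up data: one numeric feature per sample and illustrative class names.
labels = ["Benign"] * 6 + ["Ransomware"] * 3 + ["Spyware"] * 2
samples = [[float(i)] for i in range(len(labels))]

bal_x, bal_y = random_undersample(samples, labels)
print(Counter(labels))  # class counts before balancing
print(Counter(bal_y))   # class counts after balancing
```

Random undersampling discards majority-class samples without looking at the feature space, which is why neighbor-based rules such as Edited Nearest Neighbor and Near Miss are often considered alongside it.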

Our research uses the CIC-MalMem-2022 dataset, which was designed to simulate real-world scenarios for memory-based obfuscated malware detection. By analyzing machine learning algorithms together with data balancing techniques on this dataset, we contribute to fortifying cybersecurity against evolving malware threats.

In the following sections, we delve into our dataset, methodologies, and results, aiming to provide valuable insights that can shape the future of malware detection and cybersecurity strategies amidst the challenges posed by obfuscated malware and class imbalance.

:::info This paper is available on arXiv under CC BY-SA 4.0 DEED license.

:::
