An AI-based malware detection app for Android. The detection engine is implemented in TensorFlow and TFLite, and the on-device malware detection part is implemented in Java.
This project was undertaken during my work at the Devices & Network Security Lab under the National Center for Cyber Security (NCCS) at Air University, Islamabad. It has two major contributions:
- Design and development of a CNN trained on features extracted through static analysis of Android apps
- Deployment of the trained model onto a mobile device for on-device detection of malicious Android apps
In this phase, we used only static features extracted from Android apps. Static feature-based malware detection means that the application is never run on a device; instead, its source code is extracted and inspected for malicious patterns. Many static features can be monitored, but Wei Wang [1] provided a brief summary of them and highlighted permissions and intent filters (both available in the manifest file of the APK package) as the most promising. We therefore used these features to train a neural network model for malware detection.
The next step was to identify a comprehensive dataset for training the AI model. We looked for the following properties in the dataset:
- Publicly/easily available
- Relevant to our target features (permissions and intent filters)
- Large enough to give reliable results
- Covers recent malware as well
This search led us to select CIC's InvesAndMal2019 dataset (released in 2019) for training the AI model.
https://www.unb.ca/cic/datasets/invesandmal2019.html
The authors of the dataset reported results in their paper using machine learning techniques such as Random Forest and Decision Tree. We initially tried to retrain the same models. This succeeded in Colab notebooks (Google's cloud-hosted Python notebooks for experimenting with AI models), but the resulting models could not be converted into TensorFlow Lite files (the files that contain the model and are bundled into the app to make predictions on a given input).
After many failed attempts, we moved on to neural networks (using the Keras library), which were easy to use once the creation of the input tensor was understood. We first used approximately 300 features to train a simple three-layer multilayer perceptron (MLP), which was successfully loaded into the mobile application and exercised with multiple test inputs.
Later, we performed feature reduction (using the chi-square technique), which reduced the dataset from the original 8000+ features to 1024 features. These reduced features were then used to train a different model, a Convolutional Neural Network (CNN), which was also successfully loaded into the mobile app. In parallel, we completed on-device feature extraction at run time, which is discussed in the next section.
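To illustrate the chi-square feature reduction mentioned above, here is a minimal sketch that scores binary features against a binary malware label and keeps the highest-scoring ones. It is written in Java only to match the app's language; the tooling actually used for the reduction is not stated in this document, and library implementations may compute the statistic differently. The class and method names (`ChiSquareSelector`, `score`, `topK`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks binary features with the textbook 2x2 chi-square statistic.
class ChiSquareSelector {

    // Chi-square score of one binary feature column against binary labels (1 = malware).
    static double score(int[] feature, int[] labels) {
        double a = 0, b = 0, c = 0, d = 0;
        for (int i = 0; i < feature.length; i++) {
            if (feature[i] == 1 && labels[i] == 1) a++;   // feature present, malware
            else if (feature[i] == 1) b++;                // feature present, benign
            else if (labels[i] == 1) c++;                 // feature absent, malware
            else d++;                                     // feature absent, benign
        }
        double n = a + b + c + d;
        double denom = (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0) return 0.0; // constant feature or single-class labels: no signal
        double diff = a * d - b * c;
        return n * diff * diff / denom;
    }

    // Returns the indices of the k highest-scoring feature columns, best first.
    static List<Integer> topK(int[][] columns, int[] labels, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int j = 0; j < columns.length; j++) indices.add(j);
        indices.sort(Comparator.comparingDouble((Integer j) -> -score(columns[j], labels)));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }
}
```

Under these assumptions, calling `topK` with `k = 1024` on the full feature matrix would correspond to the reduction from 8000+ features described above.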
On-device feature extraction is required before the AI model can classify an app. The following features are extracted:
Permissions are extracted from the manifest file using the package manager and built-in APIs. The details of the permission extraction mechanism are explained in Android's developer guide. Link: https://developer.android.com/reference/android/content/pm/PackageManager#GET_PERMISSIONS
Extraction of intent filters was relatively tough to implement. There is no built-in API to access the intent filters from the manifest directly. Therefore, we had to resort to extracting the manifest file and then using XML parsers to extract the intent filters. We identified an open-source app that covered extraction of the manifest file and handled the XML parsing. We added the intent-filter extraction part ourselves and generated a list of intent filters for a specific app. Link to open-source app: [Not Available]
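The XML-parsing step can be sketched as follows. This is a minimal illustration, not the code from the app: it assumes the manifest has already been decoded into plain-text XML (inside an APK the manifest is stored in a binary XML format), and it assumes that the features of interest are the `action` names inside each `intent-filter` element. The class name `IntentFilterExtractor` is hypothetical, and the sketch uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects intent-filter action names from decoded manifest XML.
class IntentFilterExtractor {

    static List<String> extractActions(String manifestXml) {
        try {
            // The default factory is not namespace-aware, so "android:name" is read as a plain attribute name.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));

            List<String> actions = new ArrayList<>();
            NodeList filters = doc.getElementsByTagName("intent-filter");
            for (int i = 0; i < filters.getLength(); i++) {
                NodeList children = filters.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "action".equals(child.getNodeName())) {
                        String name = ((Element) child).getAttribute("android:name");
                        // Keep each action once, in document order.
                        if (!name.isEmpty() && !actions.contains(name)) actions.add(name);
                    }
                }
            }
            return actions;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse manifest XML", e);
        }
    }
}
```

The returned list plays the role of the per-app intent-filter list described above.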
The app's front end is very simple: on launch, it displays all non-system apps as list items in a recycler view. The user can tap any list item to compute that app's malicious/benign status and its confidence level. The functionality of the app is briefly discussed later in the document.
The app consists of the following four Java files:
- Main
- List Apps
- List Adapter
- Malware Model
The function of each file is explained below:
The Main activity is the launching pad into the List Apps activity. It includes an AI model and some user interface elements that were used in previous experimentation and are no longer used, except for the button in the lower-left corner. This button opens the List Apps activity.
The List Apps activity hosts the recycler view (the list-generating component in Android) and gets the installed packages. Initially all installed packages are listed, and the list is then narrowed down to non-system apps only. These packages are then handed over to the List Adapter class to generate the list.
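One common way to narrow the package list down to non-system apps is to test the system flag on each app's `ApplicationInfo`. The sketch below shows that bit test in plain Java so that it runs off-device. Whether the app uses exactly this check is an assumption, and `AppFilter` and `AppEntry` are hypothetical stand-ins for the Android types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only packages whose flags do not mark them as system apps.
class AppFilter {

    // Same value as android.content.pm.ApplicationInfo.FLAG_SYSTEM (1 << 0).
    static final int FLAG_SYSTEM = 1;

    // Stand-in for the two ApplicationInfo fields this sketch needs.
    static class AppEntry {
        final String packageName;
        final int flags;

        AppEntry(String packageName, int flags) {
            this.packageName = packageName;
            this.flags = flags;
        }
    }

    static List<String> nonSystemPackages(List<AppEntry> installed) {
        List<String> result = new ArrayList<>();
        for (AppEntry app : installed) {
            // A cleared FLAG_SYSTEM bit means the app is not part of the system image.
            if ((app.flags & FLAG_SYSTEM) == 0) {
                result.add(app.packageName);
            }
        }
        return result;
    }
}
```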
The List Adapter class performs most of the work, including:
- Feature extraction: permissions
- Feature extraction: intent filters
- Creation and update of the view holders (list items)
Note: This class does not load the AI model or perform prediction. There was an issue with placing the load-model function in the list adapter, so it was placed separately in the Malware Model activity. After extracting the features for an app (on click), this class creates an input tensor and hands it over to the next activity, which performs the prediction and obtains the result.
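The input tensor creation described in the note can be sketched as a presence/absence encoding over a fixed feature list. This is an assumption for illustration: the document does not state the encoding, and the names `FeatureVectorBuilder` and `encode` are hypothetical. The sketch also assumes that the vocabulary order matches the feature columns used during training.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: turns extracted permissions and intent filters into a model input vector.
class FeatureVectorBuilder {

    // vocabulary: ordered feature names the model was trained on (assumed).
    // extracted: permissions and intent filters found for one app.
    static float[] encode(List<String> vocabulary, List<String> extracted) {
        Set<String> present = new HashSet<>(extracted);
        float[] vector = new float[vocabulary.size()];
        for (int i = 0; i < vector.length; i++) {
            // 1.0 if the app declares this feature, 0.0 otherwise; unknown features are ignored.
            vector[i] = present.contains(vocabulary.get(i)) ? 1.0f : 0.0f;
        }
        return vector;
    }
}
```

With a 1024-entry vocabulary, the resulting array has one slot per reduced feature and can be passed on to the activity that runs the model.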
The Malware Model activity mainly performs the following tasks:
- Loading the AI model
- Feeding the input tensor to the loaded model
- Performing the prediction
- Obtaining the predicted value and displaying it on screen with a Benign/Malicious label
This activity provides the user with the result/status of an app. Results: the result for an installed app is currently displayed on click.
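As a sketch of how a predicted value might be turned into the label and confidence shown on screen, the code below assumes the model emits a single score in [0, 1], where values of 0.5 or more mean malicious. The actual output shape and threshold are not specified in this document, and `PredictionFormatter` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical helper: maps a single model score to a display string.
class PredictionFormatter {

    static String describe(float score) {
        boolean malicious = score >= 0.5f;                    // assumed decision threshold
        float confidence = malicious ? score : 1.0f - score;  // confidence in the chosen label
        return String.format(Locale.ROOT, "%s (%.1f%% confidence)",
                malicious ? "Malicious" : "Benign", confidence * 100.0f);
    }
}
```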
[Other related information]
Any TFLite model file can be viewed in Netron (not "neutron"), an open-source model viewer. Link: https://github.com/lutzroeder/netron