SEC4ML part-1: Model Stealing Attack on Locally Deployed ML Models
This is the SEC4ML subsection of the Machine Learning series. Here we will discuss potential vulnerabilities in Machine Learning applications. SEC4ML will cover attacks like Adversarial Learning, Model Stealing, Model Inversion, Data Poisoning, etc. Most of these attacks are backed by strong literature from researchers. Some of them have only been shown to work under very specific lab settings, so they might not be considered a direct risk to ML applications today. But with relevant advances they can give rise to real-world risks.
Training ML models requires a considerable amount of economic and human resources. A lot of businesses treat trained models as their intellectual property. Trained models are also at the center of their revenue generation. Users can be charged on a per-query basis or for access to different models. It is almost always assumed that an attacker will never get access to these trained models. You will soon see how easy it can be to steal the trained model from an ML application. Also, you don’t need to be an ML practitioner to understand and perform this attack.
The ML pipeline
The figure above shows a very abstract pipeline of how an ML application works in production. A dataset is used to train the model. Trained models are then evaluated and deployed to be used by the end user. Models can be deployed either on the cloud or on the user’s machine, depending on the use case. Sometimes the application is required to work without internet access, which means the model has to be deployed offline on the user’s device. If the model is deployed online (on the cloud), end users query it through APIs. There can also be a feedback loop which is leveraged to generate better models in the future. Different parts of this pipeline can be abused by an attacker. In this series we will see how prediction APIs and local model deployment can be abused to steal the trained model; this post covers local deployment.
Offline Model Stealing Attack
Here we will look into how the Model Stealing attack works when the model is deployed on the end user’s device. In my experience of testing applications against this attack, a lot of applications do not bother to secure their trained models at all, and it does not take more than a basic knowledge of reverse engineering to get complete access to these models. Just to give you an idea of how the attack works, we will attack a very simple Android application. This is an example app built by TensorFlow, and the details regarding how to build the app are here
Offline Model Stealing Steps
There is no fixed rule on how to approach this attack, but the following is the flow I prefer to follow.
Get the .apk file
You can use adb to pull the .apk of an installed application from an Android device. In our case we have already built the apk, so we can skip this step.
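For an app that is only available as an installed package, the pull step can be sketched roughly as below. This is a minimal sketch, assuming adb is on the PATH and a device is connected; the helper names and the package name `com.example.classifier` are hypothetical and not taken from the target app. It relies on `adb shell pm path <package>`, which prints lines of the form `package:/path/to/base.apk`.

```python
import subprocess


def pm_path_command(package_name):
    """Build the adb command that asks the package manager where the APK lives."""
    return ["adb", "shell", "pm", "path", package_name]


def parse_pm_path(output):
    """Extract on-device APK paths from `pm path` output (lines like 'package:/data/...')."""
    prefix = "package:"
    paths = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            paths.append(line[len(prefix):])
    return paths


def pull_apk(package_name, local_path="target.apk"):
    """Locate the APK of an installed app and copy it to the host machine."""
    result = subprocess.run(pm_path_command(package_name),
                            capture_output=True, text=True, check=True)
    remote_paths = parse_pm_path(result.stdout)
    if not remote_paths:
        raise RuntimeError("package not found on device: " + package_name)
    # Split APKs report several paths; prefer base.apk when it is listed.
    remote = next((p for p in remote_paths if p.endswith("base.apk")), remote_paths[0])
    subprocess.run(["adb", "pull", remote, local_path], check=True)
    return local_path
```

With a device attached, `pull_apk("com.example.classifier")` would leave a copy of the APK in `target.apk` on the host.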
Reverse engineer the application
You can use your favourite tool to reverse engineer the apk. Obfuscation can sometimes make it difficult to reverse engineer but it will not be impossible. I have used jadx.
By looking at the highlighted code, we can learn the following details about the model:
- Input shape accepted by model: (224,224,3)
- Name and Location of trained model: ‘assets/mobilenet.tflite’
- Location of labels: ‘assets/labels.txt’
- What libs are used for model interpretation: tensorflow lite
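Since an APK is an ordinary ZIP archive, the model and label files listed above can also be located and extracted without a decompiler. The following is a rough sketch using only Python's standard library; the function names and the list of extensions are illustrative assumptions (a heuristic, not a complete list of model formats).

```python
import zipfile

# Heuristic list of common serialized-model extensions (an assumption, not exhaustive).
MODEL_EXTENSIONS = (".tflite", ".lite", ".pb", ".onnx", ".pt", ".pth", ".h5")


def find_model_files(apk_path):
    """Return (name, size_in_bytes) for archive entries that look like model files."""
    with zipfile.ZipFile(apk_path) as apk:
        return [(info.filename, info.file_size)
                for info in apk.infolist()
                if info.filename.lower().endswith(MODEL_EXTENSIONS)]


def extract_files(apk_path, names, out_dir="extracted"):
    """Extract the named entries from the APK and return their paths on disk."""
    with zipfile.ZipFile(apk_path) as apk:
        return [apk.extract(name, path=out_dir) for name in names]
```

For the target app, `find_model_files("target.apk")` would be expected to list `assets/mobilenet.tflite`, and `extract_files("target.apk", ["assets/mobilenet.tflite", "assets/labels.txt"])` would copy both files out for offline use.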
Here is how a part of labels.txt file looks.
It contains 1001 lines, and the line number is the index of the corresponding label. This means the model can classify an image into 1001 categories. When we pass an image to the model, we get a tensor (just a fancy name for a list) containing 1001 probability values. If the probability at the 279th index is the highest, it means that the input image contains a red fox.
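The mapping from probabilities to labels can be illustrated with a small, self-contained sketch. The four labels and probabilities below are toy values made up for the example, not output from the target model, and `top_k_labels` is a hypothetical helper name.

```python
def top_k_labels(probabilities, labels, k=5):
    """Return (index, label, probability) for the k most probable classes, best first."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    # Sort the class indices by their probability, highest first.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(i, labels[i], probabilities[i]) for i in ranked[:k]]


# Toy data: a 4-class "model" instead of the real 1001 classes.
toy_labels = ["background", "tench", "goldfish", "great white shark"]
toy_probabilities = [0.01, 0.05, 0.90, 0.04]

for index, label, probability in top_k_labels(toy_probabilities, toy_labels, k=2):
    print(index, label, probability)
```

The index of the highest value selects the label, exactly as the line position in labels.txt does for the real model.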
Analyse the serialized model
You may not always be able to identify which library was used to train the model. In that case, you can look at a hex dump of the serialized model and identify strings that belong to a particular library.
For example, in the following screenshot you can see that the torch library was used to build the target model. In our target application we can clearly identify the library as TensorFlow Lite from the extension of the serialized model file.
Now that we know the required details about the model, we can go ahead with writing a script to load the trained model and run predictions on sample data. You can also consult the documentation to build the script. You don’t need to be an ML expert to pull this off.
Let’s have a look at the code to load the target model and run inference on it, starting with importing the libs.
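The same inspection can be automated with a few byte checks. This is a heuristic sketch, not a reliable detector: it assumes the file starts with a known magic value (TensorFlow Lite FlatBuffers carry the identifier `TFL3` at byte offset 4, HDF5 files used by Keras start with a fixed 8-byte signature, newer PyTorch checkpoints are ZIP archives) or contains a library name as a plain string. The function names are hypothetical.

```python
def guess_framework(data):
    """Return a list of heuristic hints about which library produced the bytes."""
    hints = []
    if data[4:8] == b"TFL3":
        hints.append("tflite")          # FlatBuffer file identifier
    if data[:8] == b"\x89HDF\r\n\x1a\n":
        hints.append("hdf5")            # often a Keras .h5 model
    if data[:4] == b"PK\x03\x04":
        hints.append("zip")             # e.g. newer PyTorch checkpoints
    lowered = data.lower()
    # Fall back to the kind of strings one would spot in a hex dump.
    for marker, name in ((b"torch", "pytorch"), (b"tensorflow", "tensorflow"),
                         (b"onnx", "onnx"), (b"caffe", "caffe")):
        if marker in lowered:
            hints.append(name)
    return hints


def guess_framework_from_file(path, max_bytes=1 << 20):
    """Apply the same heuristics to the first max_bytes of a file on disk."""
    with open(path, "rb") as handle:
        return guess_framework(handle.read(max_bytes))
```

An empty result only means that none of these particular markers were found, so a manual look at the hex dump is still worthwhile.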
    import numpy as np
    import tensorflow as tf
    from keras.preprocessing import image
    from PIL import Image
Then we can load the labels from labels.txt file to a list. It will be used to interpret probabilities predicted by model for given input image.
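    # load labels, one per line; the position of a line is its class index
    with open('assets/labels.txt') as f:
        labels = f.read().splitlines()
Let’s load our input image and convert it to a shape accepted by the target model.
    # load data; target_size is (height, width)
    img = image.load_img('image_to_be_infered.png', target_size=(224, 224))
    # normalize to [0, 1]; this must match the preprocessing used by the app
    img = image.img_to_array(img).astype(np.float32) / 255.
    # add the batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    img = np.expand_dims(img, axis=0)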
I have passed the following image for prediction.
The most important part, loading the model. tf.lite.Interpreter is used to load the .tflite model.
    # Load TFLite model and allocate tensors.
    interpreter = tf.lite.Interpreter(model_path="assets/mobilenet.tflite")
    interpreter.allocate_tensors()
Just to test if we are doing everything right, we can also print the input and output details required by the model before passing the image for prediction.
    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
Below you can see the values of the input_details and output_details variables respectively. Here we can confirm the input and output shapes and their data types.
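When a model has many tensors, the raw dictionaries can be hard to read. The sketch below condenses them into one line per tensor. It assumes each entry carries the `name`, `index`, `shape`, `dtype` and `quantization` keys that `get_input_details()` and `get_output_details()` return; `describe_tensors` is a hypothetical helper, and the sample entry is a hand-written stand-in rather than output from the target model.

```python
def describe_tensors(details):
    """Summarize TFLite tensor detail dictionaries as one readable line each."""
    lines = []
    for entry in details:
        shape = tuple(int(dim) for dim in entry["shape"])
        # dtype is normally a NumPy type such as numpy.float32; fall back to str().
        dtype = getattr(entry["dtype"], "__name__", str(entry["dtype"]))
        # A scale of 0 is taken here to mean the tensor is not quantized.
        scale, _zero_point = entry.get("quantization", (0.0, 0))
        lines.append("{}: index={} shape={} dtype={} quantized={}".format(
            entry["name"], entry["index"], shape, dtype, scale != 0))
    return lines


# Hand-written stand-in for one entry of input_details.
sample_details = [{"name": "input", "index": 0, "shape": [1, 224, 224, 3],
                   "dtype": float, "quantization": (0.0, 0)}]
print("\n".join(describe_tensors(sample_details)))
```

The data type matters for preprocessing: a quantized model typically expects integer pixel values rather than floats scaled to [0, 1], so it is worth checking before feeding the image.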
Now the best part, pass the image to model for prediction. It should print top 5 predictions based on the probabilities generated by the model.
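    # Test model on input image
    # get_input_details() and get_output_details() return lists, so take entry 0
    interpreter.set_tensor(input_details[0]['index'], img)
    interpreter.invoke()
    # the output has a batch dimension; [0] selects the probabilities of our one image
    output_data = interpreter.get_tensor(output_details[0]['index'])[0]
    # indices of the 5 highest probabilities, best first
    output_indices = output_data.argsort()[-5:][::-1]
    print('\033[94mPredictions: ', str([labels[i] for i in output_indices]))
    print('\033[0m')
And here are the predictions generated by our stolen model for the given image. Apparently, the image contains a Persian cat.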
This is how an attacker can steal the model when it is deployed on the user’s machine.
There is no perfect mitigation against this attack. But as developers, we can make it harder for attackers to reverse the application. With traditional code obfuscation methods, it can become pretty difficult for an attacker to identify details like where the model is stored, what input shape is required, how the activations in the output layer are interpreted, etc. One can also encrypt the trained model, but in the end it needs to be decrypted to perform predictions. This is still an open-ended research problem looking for industry-level solutions.
The next obvious question will be: can we steal models if they are deployed remotely, somewhere on the cloud? The answer is YES. Let’s leave this for an upcoming blog post.
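The limitation of encrypting the model can be shown with a toy example. The sketch below uses a repeating-key XOR purely as a stand-in for a real cipher (it is not secure and is not what any particular app uses); the key and the fake model bytes are made up. The point is only that whatever key the app ships with is enough for an attacker to recover the model.

```python
from itertools import cycle


def xor_bytes(data, key):
    """Toy repeating-key XOR; the same call both 'encrypts' and 'decrypts'."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Stand-in for a serialized model: a TFLite-style header followed by dummy bytes.
model_bytes = b"\x1c\x00\x00\x00TFL3" + b"dummy weights"
# Hypothetical key that the application has to carry in order to run the model.
embedded_key = b"hardcoded-key"

encrypted = xor_bytes(model_bytes, embedded_key)   # what sits in the APK
recovered = xor_bytes(encrypted, embedded_key)     # what the app (or attacker) computes

print(encrypted[4:8] != b"TFL3")   # the header is hidden at rest
print(recovered == model_bytes)    # but the key brings the model back
```

Encryption therefore raises the effort needed (the attacker now has to find the key or dump the decrypted model from memory), but it does not remove the risk.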
That is all for this blog. Feel free to leave your suggestions and ideas in the comments. More SEC4ML blogs are coming with juicy attacks!!