
Security and privacy of AI or ML applications - A layman's guide



Machine Learning (ML) is growing rapidly these days. Businesses, academia, and tech enthusiasts are eager to try out ML to solve their problems, and students are driven to learn this cool new tech. Just like every other technology, ML comes with awesome applications along with some serious implications.

Incorrectly implemented ML systems can lead to security and privacy issues, the severity of which depends on how critical the use case is. You would not want face recognition software to authenticate an adversary, whereas a banana misclassified as an apple would not bother you much. Understanding these types of flaws helps us build more secure and privacy-centered ML applications.

Our experience testing ML applications, where we frequently discover low-hanging yet critical flaws, inspired this blog. We will take a look at some commonly occurring flaws in ML systems in layman's language. In upcoming blogs we will discuss every vulnerability while doing justice to its technical details.

Adversarial Learning Attack

An adversarial learning attack involves generating a specially crafted input with the objective of having it misclassified by the target model. Theoretically, it is always possible for an attacker to generate adversarial samples, although the complexity of the attack may differ based on the level of abstraction at which the model operates. Specially designed algorithms are used to perturb a sample input, and these perturbations cause the input to be misclassified into an unwanted class. Adversaries can also choose to perform targeted adversarial learning attacks, where the input is classified into a specific class chosen by the attacker.


The above example targets an image classification model. The stop sign image on the left is modified with a small perturbation, which causes the model to classify the stop sign as a mailbox. It is important to note that adversarial attacks are not limited to the computer vision domain; they have also been demonstrated in audio processing (fooling Google Home and Alexa) and NLP.

Whitebox Adversarial Learning Attacks

In the whitebox setting, it is assumed that attackers have access to the prediction pipeline of the target application: input details, access to layers, model weights, and details regarding how outputs are inferred. This information can make it easier to generate adversarial samples.
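To make the whitebox case concrete, here is a minimal sketch of a gradient-sign perturbation (in the style of the well-known FGSM technique) against a toy logistic regression classifier. The weights, the bias, the sample input, and the `epsilon` budget are all made-up values for illustration; a real target would be a much larger model, but the idea of using known weights to choose the direction of the perturbation is the same.

```python
import math

# Hypothetical toy target model: logistic regression with weights that the
# attacker is assumed to know (whitebox setting). Values are illustrative.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.25


def predict_proba(x):
    """Probability that input x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def gradient_sign_perturb(x, true_label, epsilon):
    """Shift every feature by at most epsilon in the direction that
    increases the cross-entropy loss for true_label."""
    p = predict_proba(x)
    # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    gradient = [(p - true_label) * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


original = [0.5, 0.2, 0.1]  # classified as class 1 by the toy model
adversarial = gradient_sign_perturb(original, true_label=1, epsilon=0.4)

print("original   :", round(predict_proba(original), 3))
print("adversarial:", round(predict_proba(adversarial), 3))
```

No feature moves by more than `epsilon`, yet the predicted class flips from 1 to 0 for this toy model.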

Blackbox Adversarial Learning Attacks

Most of the time the model is deployed in the cloud and the user has only API access to it. Not having direct access to model information does make it difficult to generate adversarial samples, but it's not impossible. Research has shown that adversarial samples are transferable.


Transferability means that an adversarial sample crafted for one model that is similar to the target model can also work against the target model. So the attacker can train a model locally for a similar use case and generate adversarial samples for it, which in turn are potentially effective against the target model.
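The following toy sketch illustrates the transfer idea under strong simplifying assumptions. The "target" is a hypothetical label-only API with hidden linear weights, the attacker's local dataset is a made-up grid labeled by a similar rule, and the surrogate is a small logistic regression trained from scratch. None of this reflects a specific real system.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def target_api(x):
    """Hypothetical black-box target: the attacker only sees the label."""
    hidden_w, hidden_b = [1.5, -2.0], 0.1  # unknown to the attacker
    return int(hidden_w[0] * x[0] + hidden_w[1] * x[1] + hidden_b > 0)


def train_surrogate(samples, labels, epochs=200, lr=0.5):
    """Full-batch gradient descent for a local logistic regression."""
    w, b = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


# Attacker's own data for a similar task: "is feature 0 larger than feature 1?"
grid = [i / 10 for i in range(11)]
local_x = [[a, c] for a in grid for c in grid]
local_y = [int(a > c) for a, c in local_x]
surrogate_w, surrogate_b = train_surrogate(local_x, local_y)


def craft_on_surrogate(x, epsilon):
    """Push a class-1 input toward class 0 using only the surrogate's weights."""
    return [xi - epsilon * math.copysign(1.0, w) for xi, w in zip(x, surrogate_w)]


sample = [0.6, 0.2]
adversarial = craft_on_surrogate(sample, epsilon=0.3)
print("target label before:", target_api(sample))
print("target label after :", target_api(adversarial))
```

The perturbation is computed without ever touching the target's weights; it works here only because the surrogate and the target learned similar decision boundaries, which is the assumption behind transferability.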

Model Stealing

Trained models are the most important assets of ML-based businesses. Generally, users are charged based on the number of requests they make to a trained model. This is enough incentive for adversaries to steal the model and get unlimited free access to it. It can also cause a considerable financial impact on the companies and organizations that depend heavily on these models to generate revenue. Trained models can be deployed in the cloud or on devices, depending on the use case. For example, an antivirus product using deep learning must deploy its classifier model on the device for user convenience. On the other hand, an image classification application may deploy the trained model to the cloud and provide access to the end user via web APIs.

Stealing Locally Deployed Models

Models deployed locally on a user’s device can be extracted by attackers. From our experience, models stored locally almost never have any protection against these attacks. Locally deployed models without any encryption or code obfuscation are very easy for attackers to steal. Simple reverse engineering techniques can be leveraged to understand the input the model requires and the output it infers. More details about this attack are explained in this blog post.

Duplicating Remotely Deployed Models

But you may ask: what if the models are deployed in the cloud? No direct access to the application or model means no model stealing, right? That is certainly not the case.


Models deployed in the cloud provide API access to supporting applications or to end users. Attackers can abuse this API access to generate a labeled dataset, which can then be used to train a duplicate copy of the target model. Yes, this attack has varying complexity based on factors like availability of a dataset, dimensionality of the input space, complexity of the use case, complexity of the model, etc. But one important thing to notice here is that if the model is a simple linear model and the dimensionality of the input is considerably low, the attacker can simply solve a few equations and create an exact replica of the deployed model. Our research demonstrates a method called GDALR that makes it even easier to steal remotely deployed models while optimizing the number of queries and model duplication costs. This attack is certainly not one of the most concerning threats, unless your model has few parameters and is therefore very easy to duplicate.
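The "solve a few equations" case can be sketched as follows. This is an illustration, not the GDALR method: it assumes a hypothetical API that returns the raw probability of a logistic regression model, and all weights are made up. With `n` input features, `n + 1` queries are enough, because the logit of the returned probability is linear in the input.

```python
import math


def make_black_box():
    """Hypothetical remote API that returns a probability and hides its parameters."""
    secret_w, secret_b = [0.8, -1.3, 2.1], -0.4

    def api(x):
        z = sum(w * xi for w, xi in zip(secret_w, x)) + secret_b
        return 1.0 / (1.0 + math.exp(-z))

    return api


def logit(p):
    """Inverse of the sigmoid: recovers w.x + b from a probability."""
    return math.log(p / (1.0 - p))


def steal_linear_model(api, n_features):
    """Recover weights and bias using n_features + 1 queries."""
    # Querying the all-zero input reveals the bias.
    bias = logit(api([0.0] * n_features))
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0
        # Querying the i-th unit vector reveals w_i + bias.
        weights.append(logit(api(unit)) - bias)
    return weights, bias


api = make_black_box()
stolen_w, stolen_b = steal_linear_model(api, n_features=3)


def stolen_predict(x):
    z = sum(w * xi for w, xi in zip(stolen_w, x)) + stolen_b
    return 1.0 / (1.0 + math.exp(-z))


probe = [0.3, -1.2, 0.7]
print("remote:", api(probe))
print("stolen:", stolen_predict(probe))
```

If the API returned only labels, or rounded its probabilities, this exact recovery would no longer work and the attacker would have to fall back on training a duplicate from a larger labeled dataset.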

Data Poisoning/Model Skewing

Some applications use end users’ feedback to retrain the model. This feedback interface can be fed incorrect feedback to skew the target model in the direction the attacker wants.


This attack can be automated by creating bots that simulate user feedback, or by running campaigns to feed the intended feedback to the model. For example, consider an ML application that flags phishing mails but also takes feedback from users to unflag false positives. This feedback can be abused to mark certain mails as “not phishing”. From our experience testing applications against model skewing attacks, the number of abusive feedback submissions needed to skew the model is surprisingly low.
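As a toy illustration of the phishing example, the sketch below uses a hypothetical `FeedbackFilter` class that keeps per-word counts and retrains immediately on every piece of feedback. Real filters are far more sophisticated, and the mails and counts here are invented, but the feedback loop being abused is the same.

```python
from collections import Counter


class FeedbackFilter:
    """Hypothetical phishing filter that retrains on every piece of user feedback."""

    def __init__(self):
        self.phish_counts = Counter()
        self.legit_counts = Counter()

    def learn(self, text, is_phishing):
        counts = self.phish_counts if is_phishing else self.legit_counts
        counts.update(text.lower().split())

    def is_phishing(self, text):
        tokens = text.lower().split()
        # A mail is flagged when its words were seen more often in phishing mails.
        score = sum(self.phish_counts[t] - self.legit_counts[t] for t in tokens)
        return score > 0


# Initial, honest training data.
mail_filter = FeedbackFilter()
for _ in range(5):
    mail_filter.learn("verify your account password now", True)
    mail_filter.learn("meeting notes attached for review", False)

attack_mail = "verify your password now"
flagged_before = mail_filter.is_phishing(attack_mail)

# A bot keeps reporting the attacker's mail as "not phishing" until it passes.
fake_feedback = 0
while mail_filter.is_phishing(attack_mail):
    mail_filter.learn(attack_mail, False)
    fake_feedback += 1

print("flagged before:", flagged_before)
print("fake feedback needed:", fake_feedback)
```

In this toy setup a handful of fake reports is enough to let the attacker's mail through. Typical mitigations, such as rate limiting feedback or reviewing it before retraining, are outside the scope of this sketch.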

Unintended Memorization and Training Data Inference

Trained models can be used to re-generate samples from training sets. Research has shown that face recognition models can be used to generate faces from training sets. Also, natural language processing models have a tendency to overfit or memorize unintended data. These models can be given crafted input to leak sensitive information from the training set. This could be a potential privacy issue. Also, serialized vectorizers in NLP applications can be easily reverse engineered to leak sensitive information that was vectorized.


Image: XKCD 2169
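The point about serialized vectorizers can be sketched with a hypothetical `TinyVectorizer`, a stand-in for the bag-of-words vectorizers found in common ML libraries. The training sentences are invented. A fitted vectorizer has to store its vocabulary, so anyone who obtains the serialized file can read back every token it saw during training.

```python
import pickle


class TinyVectorizer:
    """Hypothetical bag-of-words vectorizer (stand-in for a real library class)."""

    def fit(self, documents):
        tokens = sorted({t for doc in documents for t in doc.lower().split()})
        self.vocabulary_ = {token: index for index, token in enumerate(tokens)}
        return self

    def transform(self, document):
        vector = [0] * len(self.vocabulary_)
        for token in document.lower().split():
            if token in self.vocabulary_:
                vector[self.vocabulary_[token]] += 1
        return vector


# Developer side: fit on sensitive text and ship the serialized state.
training_docs = [
    "patient john doe diagnosed with diabetes",
    "reset code 4821 sent to alice",
]
vectorizer = TinyVectorizer().fit(training_docs)
# A real application would typically pickle the whole object; only the state
# dictionary is pickled here to keep the sketch self-contained.
shipped_blob = pickle.dumps(vars(vectorizer))

# Attacker side: load the blob and list the vocabulary.
recovered_state = pickle.loads(shipped_blob)
leaked_tokens = sorted(recovered_state["vocabulary_"])
print(leaked_tokens)
```

Names, codes, and medical terms from the training text all appear in the recovered vocabulary, even though the original documents were never shipped.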

Miscellaneous Attacks

It is not always necessary to use sophisticated heuristics to generate adversarial samples; we have seen that simple non-heuristic perturbations can also be sufficient to fool the model. Adversarial patches can be developed, physically printed, and used to perform adversarial attacks. Particularly in CNN models, filters can be visualized to gain insight into adjacent classes in hyperspace, and this information can be used to create adversarial samples. Commonly known bias identification frameworks like LIME and SHAP can be used to understand the bias in target models and to craft an input that will be misclassified.

Hybrid Attacks

All of the attacks discussed above can be paired with traditional web/mobile/IoT attacks and may lead to more severe effects. For example, weak encryption used while transmitting the model to the user’s device can be exploited to extract the model, which in turn helps attackers perform targeted adversarial attacks. Zero-day exploits in commonly used frameworks like OpenCV can be leveraged to leak or manipulate the training dataset.

Traditional pen-testing methodology will not cover assessment against these attacks because most of them are highly domain specific. Testing an ML system against the vulnerabilities discussed above requires domain expertise. At Payatu, we have developed ways to test ML systems against these attacks and identify potential security and privacy threats. We also provide hands-on training programs specifically designed to educate security researchers and ML practitioners on the above topics.

In upcoming posts, we will take a deeper and more technical dive into the attacks discussed above. We will also discuss strategies to prevent adversaries from exploiting these flaws. Follow Payatu on social media to get notified about upcoming posts. Feel free to reach out to me or our Payatu team with any queries and suggestions.
