Defense System against backdoor attacks on DNNs

A backdoor detector for BadNets trained on the YouTube Face dataset.

How to run the project

https://drive.google.com/drive/folders/1mf9UHHPq6tg8kZGlFTrFpCJfkg4zhWiB?usp=sharing

Add this Google Drive folder to your own Drive and follow the notebook snippets. In the Google Drive folder you can check the results in:

results folder
repaired-networks folder

Two similar folders (results_we_got and repaired-networks_we_got) contain the results we obtained.

Introduction

This project is about detecting backdoor attacks and defending against them via input filters, neuron pruning, and unlearning. Given a trained DNN model, we have to find out whether there is an input trigger that produces misleading classifications when it is added to an input (i.e. adversarial images).

What is a backdoor attack?

To understand it, first consider what does not fall into this category:

It is not an image-specific modification (so it is not an adversarial attack).
It is not adversarial poisoning (where an incorrect label association is made at training time, or a trained model is modified).

A backdoor attack is one where unexpected results occur when a trigger is added to the input. Without the trigger, the model behaves perfectly normally.

BadNets: the backdoor is created by training the model on both adversarial (triggered) images and actual images, which gives a 99% attack success rate. Another approach, the Trojan Attack, is more recent, far more efficient, and requires less data.
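To make the BadNets idea concrete, here is a minimal sketch of how a training set could be poisoned. It is an illustration only, not the code used in this project: the function names, the white 2x2 patch, the poisoning fraction, and the toy data are all assumptions.

```python
import numpy as np

def stamp_trigger(images, patch, top=0, left=0):
    """Return a copy of images (N, H, W, C) with patch (h, w, C) pasted in."""
    out = images.copy()
    h, w = patch.shape[:2]
    out[:, top:top + h, left:left + w, :] = patch
    return out

def poison_dataset(images, labels, patch, target_label, fraction=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them."""
    rng = np.random.default_rng(seed)
    n_poison = int(round(fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    poisoned_images[idx] = stamp_trigger(images[idx], patch)
    poisoned_labels[idx] = target_label
    return poisoned_images, poisoned_labels, idx

# Toy data (hypothetical): 20 black 8x8 RGB images with labels 0..4.
images = np.zeros((20, 8, 8, 3), dtype=np.float32)
labels = np.arange(20) % 5
patch = np.ones((2, 2, 3), dtype=np.float32)  # hypothetical white 2x2 trigger

poisoned_images, poisoned_labels, idx = poison_dataset(
    images, labels, patch, target_label=3, fraction=0.25
)
```

A model trained on such a mix sees the clean images with their true labels and the stamped images with the target label, which is why it can stay accurate on clean inputs while still reacting to the trigger.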

Defense system against backdoors

Part 1: The given attack model

The given model is a backdoored DNN. It only reveals its trigger (a collection of pixels and their associated colors) when it is used for prediction, which makes the backdoor stealthy.
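A trigger described as "pixels plus colors" is commonly modelled as a mask (which pixels) and a pattern (which colors), in the style of Neural Cleanse. The sketch below shows that blend on a toy image; the function name, shapes, and values are assumptions chosen for illustration, not taken from this project's code.

```python
import numpy as np

def apply_trigger(image, mask, pattern):
    """Blend: pixels with mask=1 take the pattern color, mask=0 keep the image."""
    return (1.0 - mask) * image + mask * pattern

# Toy 4x4 gray RGB image (hypothetical values in [0, 1]).
image = np.full((4, 4, 3), 0.5)

# Mask selecting a single pixel in the bottom-right corner.
mask = np.zeros((4, 4, 1))
mask[3, 3, 0] = 1.0

# Pattern: pure red everywhere; only the masked pixel is actually used.
pattern = np.zeros((4, 4, 3))
pattern[..., 0] = 1.0

triggered = apply_trigger(image, mask, pattern)

# The L1 norm of the mask is a simple measure of the trigger's size.
trigger_size = np.abs(mask).sum()
```

With this parametrization, "finding a minimal trigger" amounts to searching for a mask with a small L1 norm that still flips the model's predictions.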

Part 2: What we are going to accomplish

Detect the backdoor and label it as a separate class.
Identify the trigger used.
Finally, repair the backdoored DNN.
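As a rough illustration of the last goal, one repair strategy is neuron pruning: find the neurons that respond much more strongly to triggered inputs than to clean ones and zero them out. The sketch below works on a single dense layer with made-up activations; the function names, the ranking rule (mean activation gap), and the data are assumptions, not this project's implementation.

```python
import numpy as np

def rank_backdoor_neurons(clean_acts, triggered_acts, k):
    """Return indices of the k neurons with the largest mean activation gap."""
    gap = triggered_acts.mean(axis=0) - clean_acts.mean(axis=0)
    return np.argsort(gap)[::-1][:k]

def prune_neurons(weights, bias, neuron_ids):
    """Zero the weight columns and biases that feed the selected neurons."""
    w, b = weights.copy(), bias.copy()
    w[:, neuron_ids] = 0.0
    b[neuron_ids] = 0.0
    return w, b

# Hypothetical activations of a 4-neuron layer: rows are samples.
clean_acts = np.array([[0.9, 0.1, 0.0, 0.5],
                       [1.1, 0.2, 0.1, 0.4]])
triggered_acts = np.array([[1.0, 0.1, 3.0, 0.5],
                           [1.0, 0.3, 2.8, 0.6]])

suspect = rank_backdoor_neurons(clean_acts, triggered_acts, k=1)

# Dense layer with 3 inputs and 4 output neurons (weights shaped in x out).
weights = np.ones((3, 4))
bias = np.full(4, 0.1)
pruned_weights, pruned_bias = prune_neurons(weights, bias, suspect)
```

In practice the number of pruned neurons is a trade-off: pruning more removes more of the backdoor but can also cost accuracy on clean inputs, so both should be measured after each step.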

How we detect backdoors

First, for a given target label, we find the minimal trigger that causes inputs from all other labels to be misclassified as that target label.
We repeat this for every label and then use outlier detection to find the real trigger: the real trigger is very small compared to the others.
Once we have found which neurons are activated by the trigger, we either remove the neurons related to the backdoor (patching the DNN via neuron pruning) or unlearn the backdoor by adding the reversed trigger to inputs during retraining.
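The outlier-detection step above can be sketched with the median absolute deviation (MAD), the statistic used by Neural Cleanse. The per-label trigger sizes below are invented for illustration, and the threshold of 2 and the consistency constant 1.4826 (which assumes roughly normal data) are the commonly quoted defaults, treated here as assumptions rather than this project's settings.

```python
from statistics import median

def anomaly_indices(trigger_sizes, consistency=1.4826):
    """Map each label to |size - median| / (consistency * MAD)."""
    sizes = list(trigger_sizes.values())
    med = median(sizes)
    mad = median([abs(s - med) for s in sizes])
    if mad == 0:
        raise ValueError("MAD is zero; trigger sizes are too uniform to score")
    return {
        label: abs(size - med) / (consistency * mad)
        for label, size in trigger_sizes.items()
    }

def flag_backdoor_labels(trigger_sizes, threshold=2.0):
    """Return labels whose trigger is an outlier on the small side."""
    med = median(trigger_sizes.values())
    scores = anomaly_indices(trigger_sizes)
    return [
        label
        for label, score in scores.items()
        if score > threshold and trigger_sizes[label] < med
    ]

# Hypothetical L1 norms of the reverse-engineered trigger for each label.
sizes = {0: 95.0, 1: 102.0, 2: 98.0, 3: 12.0, 4: 105.0, 5: 99.0, 6: 101.0}

flagged = flag_backdoor_labels(sizes)
```

Only small-side outliers are flagged because a backdoored label needs an unusually small trigger; an unusually large one is not evidence of a backdoor.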

