Finding the Outliers in a given dataset using the Gaussian Distribution Algorithm

PrateekDey/Anomaly-Detection

Anomaly-Detection

The algorithm is first developed on a small dataset that contains a specified number of outliers. Represented below is a scatter plot of the data values Xi, throughput (mb/s) vs. latency (ms).

[Figure: scatter plot of throughput (mb/s) vs. latency (ms)]

Next, the parameters of the Gaussian curve, i.e. the mean and standard deviation, are estimated from the data values Xi. Represented below is a visual fit of the Gaussian density over the data values, indicating where the density of the values is highest.

[Figure: Gaussian density fitted over the data values]
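The parameter estimation step can be sketched as follows. This is a minimal illustration and not the repository's code: it assumes each feature is modelled by an independent univariate Gaussian, and the function names `estimate_gaussian` and `gaussian_density` and the sample points are hypothetical.

```python
import math

def estimate_gaussian(X):
    """Return per-feature means and variances for a list of equal-length rows."""
    m = len(X)      # number of examples
    n = len(X[0])   # number of features
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    var = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, var

def gaussian_density(x, mu, var):
    """Density of x as a product of independent univariate Gaussians (assumption)."""
    p = 1.0
    for xj, mj, vj in zip(x, mu, var):
        p *= math.exp(-((xj - mj) ** 2) / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return p

# Made-up (latency, throughput) points, for illustration only
X = [[14.0, 15.0], [13.5, 14.5], [14.5, 15.5], [14.0, 14.0], [14.0, 16.0]]
mu, var = estimate_gaussian(X)

p_typical = gaussian_density([14.0, 15.0], mu, var)  # at the mean
p_far = gaussian_density([20.0, 5.0], mu, var)       # far from the bulk of the data
```

A point far from the bulk of the data receives a much lower density than one near the mean, which is what the thresholding step relies on. The standard deviation is the square root of each variance.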

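After this, a threshold value is estimated using the cross-validation data. The probability of each data value is compared with this threshold to decide whether the value is normal or an anomaly, and the detected anomalies are marked with red circles in the graph below.

[Figure: data values with detected anomalies circled in red]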
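One common way to pick the threshold is sketched below. The text does not state which selection criterion was used, so this assumes a labelled cross-validation set (1 = anomaly, 0 = normal) and picks the threshold with the best F1 score. The function name `select_threshold`, the `steps` parameter, and the sample values are hypothetical.

```python
def select_threshold(p_val, y_val, steps=1000):
    """Scan candidate thresholds and return (best_epsilon, best_f1)."""
    lo, hi = min(p_val), max(p_val)
    best_eps, best_f1 = 0.0, 0.0
    for k in range(1, steps + 1):
        eps = lo + (hi - lo) * k / steps
        tp = fp = fn = 0
        for p, y in zip(p_val, y_val):
            flagged = p < eps  # low density means predicted anomaly
            if flagged and y == 1:
                tp += 1
            elif flagged and y == 0:
                fp += 1
            elif not flagged and y == 1:
                fn += 1
        if tp == 0:
            continue  # precision and recall are undefined or zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Made-up densities and labels for a cross-validation set, for illustration only
p_val = [0.20, 0.18, 0.15, 0.001, 0.17, 0.002]
y_val = [0, 0, 0, 1, 0, 1]
eps, f1 = select_threshold(p_val, y_val)

# Indices of the values flagged as outliers under the chosen threshold
outliers = [i for i, p in enumerate(p_val) if p < eps]
```

Counting the flagged indices gives the number of outliers reported for a dataset. F1 is used in this sketch because anomalies are rare, so plain accuracy would favour a threshold that flags nothing.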

Result: a total of 17 outliers are found in the small, low-dimensional dataset.

The same steps are then applied to the high-dimensional dataset: the parameters are estimated and a threshold value is selected using cross-validation. Result: a total of 117 outliers are found in the high-dimensional dataset.