Exploratory Data Analysis of a target variable to a set of predictors

DaymondLing/ExploratoryDataAnalysis.jl

ExploratoryDataAnalysis

The phrase Exploratory Data Analysis can have many meanings; in this context, it means assessing the degree of association between a target and a large number of predictors.

ExploratoryDataAnalysis calculates the following metrics from the contingency table of predictor vs. target:

  • Mutual Information: Kullback-Leibler divergence of the observed joint probabilities from the probabilities expected under independence, constructed as the product of the observed row and column marginals

  • Phi coefficient: ϕ = sqrt(χ² / n), where χ² is Pearson's chi-squared statistic of the table and n is the total number of observations

If the target is binary, it also calculates:

  • Information Value: symmetric Kullback-Leibler divergence between the Class 1 distribution and the Class 0 distribution
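
The definitions above can be sketched in a few lines. This is an illustration in plain Python, not this package's Julia API: the function names, the list-of-rows table layout, and the `[class 0, class 1]` column order are assumptions made for the example. It also assumes every cell of the table is positive, so no logarithm of 0 arises.

```python
import math

def _totals(table):
    """Row totals, column totals and grand total of a table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return row_tot, col_tot, sum(row_tot)

def mutual_information(table):
    """KL divergence (in nats) of observed joint probabilities from the
    product of the row and column marginals. Assumes all counts > 0."""
    row_tot, col_tot, n = _totals(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            p_obs = count / n
            p_ind = (row_tot[i] / n) * (col_tot[j] / n)
            mi += p_obs * math.log(p_obs / p_ind)
    return mi

def phi_coefficient(table):
    """sqrt(chi-squared / n), with chi-squared from Pearson's formula."""
    row_tot, col_tot, n = _totals(table)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (count - expected) ** 2 / expected
    return math.sqrt(chi2 / n)

def information_value(table):
    """Symmetric KL divergence between the class 1 and class 0
    distributions over predictor bins. Each row is [class 0, class 1]."""
    tot0 = sum(row[0] for row in table)
    tot1 = sum(row[1] for row in table)
    iv = 0.0
    for c0, c1 in table:
        p0, p1 = c0 / tot0, c1 / tot1
        iv += (p1 - p0) * math.log(p1 / p0)
    return iv

# Hypothetical 2-bin predictor against a binary target.
table = [[30, 10],
         [20, 40]]
print(mutual_information(table), phi_coefficient(table), information_value(table))
```

All three metrics are 0 when predictor and target are independent, for example for the table `[[10, 20], [20, 40]]`, and grow as the association strengthens.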

Computationally, entropy-based metrics such as Mutual Information and Information Value must handle zero probabilities, because the logarithm of 0 is infinite. Much of the literature and many implementations add a small positive number to the frequency table to avoid taking the log of 0, because the software cannot deal with infinities. Julia, however, handles infinities gracefully, so this package uses infinities when there are zero probabilities. To avoid infinities without artificially adjusting the probabilities, re-bin the data so that the frequency table contains no zeros.
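
A small sketch of the re-binning advice, again in plain Python rather than this package's Julia code. The function names and the merge rule (fold a bin with a zero count into its neighbor, assuming the bins are ordered) are assumptions for illustration. How the package treats a bin that is empty in both classes is not stated here, so the sketch simply skips such bins.

```python
import math

def information_value(table):
    """Information Value of a table whose rows are [class 0, class 1] counts.
    Returns math.inf when a bin has a zero count in exactly one class."""
    tot0 = sum(row[0] for row in table)
    tot1 = sum(row[1] for row in table)
    iv = 0.0
    for c0, c1 in table:
        if c0 == 0 and c1 == 0:
            continue  # empty bin: skipped (an assumption of this sketch)
        if c0 == 0 or c1 == 0:
            return math.inf  # the log term diverges for this bin
        p0, p1 = c0 / tot0, c1 / tot1
        iv += (p1 - p0) * math.log(p1 / p0)
    return iv

def merge_zero_bins(table):
    """Fold any bin containing a zero count into an adjacent bin.
    Assumes the bins are ordered, so neighbors are meaningful to merge."""
    merged = []
    for c0, c1 in table:
        last_has_zero = bool(merged) and 0 in merged[-1]
        if merged and (last_has_zero or c0 == 0 or c1 == 0):
            merged[-1] = [merged[-1][0] + c0, merged[-1][1] + c1]
        else:
            merged.append([c0, c1])
    return merged

# Hypothetical counts: the first bin has no class 0 observations.
table = [[0, 5], [30, 10], [20, 35]]
print(information_value(table))    # infinite because of the zero cell
rebinned = merge_zero_bins(table)  # first two bins are combined
print(rebinned, information_value(rebinned))
```

Unlike adding a small constant to every cell, merging bins leaves the observed counts untouched; it trades resolution in the predictor for a finite metric.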