
Generative models for creating synthetic data from Boston housing dataset


mcoric96/Generative-modelling-Boston-dataset


Generative-modelling-Boston-dataset

Generative models for creating synthetic data from Boston housing dataset.

The Boston dataset is preprocessed in the data_preparation.ipynb notebook.
The preprocessed data is stored in the boston_dataset_data.mat file.

Abstract

The Boston Housing Dataset (https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) is a small dataset commonly used for benchmarking machine learning algorithms.
The dataset contains 506 cases, each with 14 attributes: 13 numerical or categorical predictor variables and 1 target variable (the median value of owner-occupied homes in $1000's).

The second and fourth predictor columns are deleted, and the target variable is joined to the remaining 11 predictors to form the final dataset for generative modelling.
The final dataset, stored in boston_dataset_data.mat, has shape (506, 12).
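
The column selection described above can be sketched as follows. This is only an illustration, not the code from data_preparation.ipynb: the function name `build_modelling_matrix` is hypothetical, the target is assumed to be placed in the last column, and random numbers stand in for the real Boston data.

```python
import numpy as np


def build_modelling_matrix(predictors, target):
    """Drop the 2nd and 4th predictor columns and join the target.

    Assumption: the target becomes the last column of the result.
    """
    predictors = np.asarray(predictors, dtype=float)
    target = np.asarray(target, dtype=float).reshape(-1, 1)
    # Zero-based indices 1 and 3 correspond to the 2nd and 4th columns.
    kept = np.delete(predictors, [1, 3], axis=1)
    return np.hstack([kept, target])


# Stand-in data with the same shape as the raw dataset (NOT the real values).
rng = np.random.default_rng(0)
fake_predictors = rng.normal(size=(506, 13))
fake_target = rng.normal(size=506)

data = build_modelling_matrix(fake_predictors, fake_target)
print(data.shape)  # (506, 12)
```

Removing 2 of the 13 predictors leaves 11 columns, and joining the target gives the 12 columns reported above.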

Load preprocessed data with:

from scipy.io import loadmat
boston_data = loadmat('boston_dataset_data')['boston_dataset_data']

Generative models included:

  • Gaussian mixture models
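
A minimal sketch of how a Gaussian mixture model can generate synthetic rows, assuming scikit-learn is available. The function name `sample_synthetic`, the number of components, the standardization step, and the random stand-in data are all assumptions made for illustration and are not taken from this repository.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def sample_synthetic(data, n_components=3, n_samples=506, seed=0):
    """Fit a GMM on standardized data and draw synthetic rows from it."""
    scaler = StandardScaler()
    scaled = scaler.fit_transform(data)
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",
        random_state=seed,
    )
    gmm.fit(scaled)
    # sample() returns the drawn points and the component each came from.
    synthetic_scaled, _ = gmm.sample(n_samples)
    # Map the samples back to the original units of each variable.
    return scaler.inverse_transform(synthetic_scaled)


# Stand-in for the (506, 12) preprocessed array (NOT the real data).
rng = np.random.default_rng(0)
stand_in = rng.normal(size=(506, 12))

synthetic = sample_synthetic(stand_in)
print(synthetic.shape)
```

With the real data, `stand_in` would be replaced by the array loaded from boston_dataset_data.mat. The number of components is a tuning choice that could be guided by a criterion such as BIC (`GaussianMixture.bic`).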

Dataset

Distributions of the 12 variables used for generative modelling: image

Correlation
image

DBSCAN clustering analysis of preprocessed data
image
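
A DBSCAN analysis of this kind could be sketched as below, assuming scikit-learn. The function name `cluster_with_dbscan`, the `eps` and `min_samples` values, the standardization step, and the two-blob stand-in data are illustrative assumptions. The settings used for the figure above are not stated here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def cluster_with_dbscan(data, eps=2.0, min_samples=5):
    """Standardize the data, run DBSCAN, and summarize the labels."""
    scaled = StandardScaler().fit_transform(data)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)
    # DBSCAN marks noise points with the label -1.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return labels, n_clusters, n_noise


# Stand-in data: two tight, well-separated blobs in 12 dimensions
# (NOT the real preprocessed Boston data).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(250, 12))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(256, 12))
stand_in = np.vstack([blob_a, blob_b])

labels, n_clusters, n_noise = cluster_with_dbscan(stand_in)
print(n_clusters, n_noise)
```

On real data, `eps` usually needs tuning, for example by inspecting the sorted distances to each point's k-th nearest neighbour.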
