The purpose of this project is to promote understanding -- my own and others' -- of fundamental data science and machine learning concepts and tools. It currently consists of one notebook that classifies fruit types based on weight, volume, and image data.

gtrunz/fruit_dataset

Fruit Dataset

Methods Used

  • Machine Learning with K-Nearest Neighbors, Decision Tree, and Random Forest Classification algorithms
  • Cross-validation using pipelines with column transformers
  • Dimensionality reduction with Principal Component Analysis (PCA)
  • Exploratory data analysis and data visualization
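As a rough illustration of how the methods listed above can fit together, the sketch below cross-validates three classifiers inside a pipeline that applies PCA to pixel columns and scaling to weight and volume. It is not taken from classification.ipynb: the data is synthetic, and the column names (`px_*`, `weight`, `volume`, `fruit_type`), the number of PCA components, and the fold count are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset (hypothetical columns and values).
n_per_class, n_pixels = 40, 48
class_params = {"apple": (150.0, 0.3), "banana": (120.0, 0.7), "grape": (5.0, 0.5)}
frames = []
for label, (weight_mean, pixel_mean) in class_params.items():
    frame = pd.DataFrame(
        rng.normal(pixel_mean, 0.1, size=(n_per_class, n_pixels)),
        columns=[f"px_{i}" for i in range(n_pixels)],
    )
    frame["weight"] = rng.normal(weight_mean, 10.0, size=n_per_class)
    frame["volume"] = frame["weight"] * rng.normal(1.1, 0.05, size=n_per_class)
    frame["fruit_type"] = label
    frames.append(frame)
data = pd.concat(frames, ignore_index=True)

pixel_cols = [c for c in data.columns if c.startswith("px_")]
numeric_cols = ["weight", "volume"]
X = data[pixel_cols + numeric_cols]
y = data["fruit_type"]


def build_pipeline(classifier):
    """Scale and reduce the pixel columns with PCA, scale the numeric columns, then classify."""
    pixel_steps = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=5))])
    preprocess = ColumnTransformer(
        [("pixels", pixel_steps, pixel_cols), ("numeric", StandardScaler(), numeric_cols)]
    )
    return Pipeline([("preprocess", preprocess), ("model", classifier)])


models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Preprocessing is refit inside each fold, so no information leaks from the held-out fold.
scores = {
    name: cross_val_score(build_pipeline(clf), X, y, cv=5).mean()
    for name, clf in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.3f}")
```

Keeping the scaler and PCA inside the pipeline, rather than transforming the full dataset up front, is what makes the cross-validated scores an honest estimate of performance on unseen fruit.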

Purpose

The purpose of this project is to promote understanding -- my own and others' -- of fundamental data science and machine learning concepts and tools.

Files

  • data.csv: A dataset consisting of 400 fruits with attributes related to the fruit type, subtype, weight, volume, and a set of 28,224 variables that represent RGB color values of an image of the fruit. The dataset is relatively simple and clean, but rich enough to explore classification, regression, dimensionality reduction, and other tasks.

  • classification.ipynb: A Jupyter notebook analyzing how machine learning models can accurately classify fruits by type from the image, weight, and volume data, with explanations of the methods used and links to external resources along the way.

  • toy.py: A script used in classification.ipynb to illustrate certain concepts with toy data.

Technologies and Packages Used

  • Jupyter Notebook
  • Python (3.10.8)
  • Scikit-Learn (1.1.3)
  • Pandas (1.5.2)
  • NumPy (1.23.5)
  • SciPy (1.9.3)
  • Matplotlib (3.6.2)
  • Seaborn (0.12.1)
  • Bokeh (3.0.3)
  • Pillow (9.3.0)
  • regex (2.2.1)

Status

This project currently consists of one notebook (classification.ipynb); additional notebooks may be added in the future.
