Angione-Lab/Multi-omic_ML_risk_prediction_breast_cancer

Uncovering the breast cancer metabolic landscape through multi-modal machine learning and metabolic modelling

This repository contains the code and data to reproduce the results presented in the paper “Uncovering breast cancer metabolic landscape through multi-modal machine learning and metabolic modelling”.

The framework integrates machine learning with patient-specific metabolic modelling to predict risk for breast cancer patients. The repository contains three main folders (the first three items below), together with a tutorial notebook and the study data:

  • Data preprocessing: the preprocessing code, including feature selection techniques for transcriptomic and fluxomic data;
  • Metabolic modelling: the MATLAB code for the genome-scale metabolic model (GSMM) used to generate patient-specific flux rates (fluxomic data);
  • ML models: the Jupyter notebook with the code to run the machine learning (ML) models. The code can be rerun with different numbers of selected omic features; we provide the ML results for the optimal number of selected omic features for each data modality and their combinations.
  • An end-to-end tutorial in a Google Colab notebook that lets users analyse clinical and transcriptomic data and investigate significant alterations at both the single-cell and spatial levels.
  • Data: the source data can be downloaded from the TCGA website: https://portal.gdc.cancer.gov/. We also provide all data used in this study, including raw and preprocessed clinical data, raw transcriptomic data, and fluxomic data generated by the metabolic model (https://figshare.com/articles/dataset/Data/22337722).

    How to run

    The following steps are required to run the code:

  • Python 3.9.x and R version 4.2.x are required. Check the package versions specified in requirements.txt before running the code.
  • A Jupyter notebook server is required.
  • Ensure all pip dependencies listed in requirements.txt are installed.
  • Run the notebooks in folder order: data preprocessing first, then metabolic modelling, then ML models.
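
The environment checks above can be sketched as a small Python script. This is only an illustrative sketch and is not part of the repository: the helper names (python_version_ok, parse_requirement_names, missing_packages) are hypothetical, the script assumes requirements.txt sits in the current working directory, and it checks only whether each package is installed, not whether the installed version matches the pinned one. It does not check the R installation.

```python
import sys
from importlib import metadata
from pathlib import Path


def python_version_ok(version_info=sys.version_info, required=(3, 9)):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def parse_requirement_names(lines):
    """Extract bare package names from requirements-style lines.

    Comments, blank lines, and option lines (e.g. "-r other.txt") are skipped.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Cut the line at the first version specifier, extra, or marker.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            line = line.split(sep, 1)[0]
        names.append(line.strip())
    return names


def missing_packages(names):
    """Return the subset of `names` with no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: Python 3.9.x is expected, found", sys.version.split()[0])

    req_file = Path("requirements.txt")  # assumed location
    if req_file.exists():
        names = parse_requirement_names(req_file.read_text().splitlines())
        absent = missing_packages(names)
        if absent:
            print("Missing packages:", ", ".join(absent))
        else:
            print("All listed packages are installed.")
    else:
        print("requirements.txt not found in the current directory.")
```

Running the script from the repository root before opening the notebooks gives a quick indication of whether `pip install -r requirements.txt` still needs to be run.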
    License

    This code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Public License for more details.

    Le Minh Thao Doan - May 2024
