Uncovering the breast cancer metabolic landscape through multi-modal machine learning and metabolic modelling

This repository contains the code and data to reproduce the results presented in the paper “Uncovering breast cancer metabolic landscape through multi-modal machine learning and metabolic modelling”.

The framework integrates machine learning with patient-specific metabolic modelling to predict risk for breast cancer patients. The repository contains three main folders, an end-to-end tutorial, and links to the data:

  • Data preprocessing: the preprocessing code, including feature selection techniques for transcriptomic and fluxomic data.
  • Metabolic modelling: the MATLAB code for genome-scale metabolic modelling (GSMM), which generates patient-specific flux rates (fluxomic data).
  • ML models: the Jupyter notebook that runs the machine learning (ML) models. The code can be rerun with different numbers of selected omic features; we provide the ML results for the optimal set of selected omic features for each data modality and for their combinations.
  • Tutorial: an end-to-end Google Colab notebook that lets users easily analyse clinical and transcriptomic data and investigate significant alterations at both the single-cell and spatial levels.
  • Data: the data used in this study can be downloaded from the TCGA website: https://portal.gdc.cancer.gov/. We also provide all data used in this study, including raw and preprocessed clinical data, raw transcriptomic data, and the fluxomic data generated by the metabolic model (https://figshare.com/articles/dataset/Data/22337722).
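As a rough illustration of how per-modality feature selection and modality combinations fit together, the sketch below keeps the highest-variance features of each modality and concatenates them (early fusion) for patients present in every modality. It is a minimal sketch under stated assumptions, not the repository's method: the variance filter is a stand-in for the feature selection techniques actually used, and the nested-dictionary data layout and helper names (`top_variance_features`, `fuse_modalities`, `modality_combinations`) are hypothetical. Only the Python standard library is used.

```python
from itertools import combinations
from statistics import pvariance

# Hypothetical layout for one modality: {patient_id: {feature_name: value}}.
Modality = dict[str, dict[str, float]]


def top_variance_features(data: Modality, top_k: int) -> list[str]:
    """Rank features by variance across patients and keep the top_k.

    The variance filter is an illustrative stand-in (an assumption), not the
    repository's own feature selection technique.
    """
    features = next(iter(data.values())).keys()
    variances = {f: pvariance([row[f] for row in data.values()]) for f in features}
    # Sort by descending variance, breaking ties by name for reproducibility.
    return sorted(variances, key=lambda f: (-variances[f], f))[:top_k]


def fuse_modalities(modalities: dict[str, Modality], top_k: int) -> Modality:
    """Early fusion: select top_k features per modality, then concatenate
    them for the patients present in every modality."""
    shared = set.intersection(*(set(m) for m in modalities.values()))
    if not shared:
        raise ValueError("no patient is present in every modality")
    # Select features using only the patients shared by all modalities.
    selected = {
        name: top_variance_features({p: m[p] for p in shared}, top_k)
        for name, m in modalities.items()
    }
    fused: Modality = {}
    for patient in sorted(shared):
        # Prefix each feature with its modality name to avoid collisions.
        fused[patient] = {
            f"{name}:{feature}": m[patient][feature]
            for name, m in modalities.items()
            for feature in selected[name]
        }
    return fused


def modality_combinations(names: list[str]) -> list[tuple[str, ...]]:
    """Every non-empty subset of modalities: each one alone, then their combinations."""
    return [combo for r in range(1, len(names) + 1) for combo in combinations(names, r)]
```

For example, `modality_combinations(["clinical", "transcriptomic", "fluxomic"])` yields seven subsets (three single modalities, three pairs, and the full triple); each subset could be fused with `fuse_modalities` and passed to a classifier.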

    How to run

    The following steps are required to run the code:

  • Python 3.9.x and R 4.2.x are required, and MATLAB is needed for the metabolic modelling code. Before running the code, check the package versions specified in requirements.txt.
  • A Jupyter Notebook server is required.
  • Ensure all pip dependencies listed in requirements.txt are installed.
  • Run the code folder by folder, in order: Data preprocessing, then Metabolic modelling, then ML models.
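The environment requirements above can be checked with a small pre-flight script before opening the notebooks. This is a minimal sketch using only the Python standard library (`sys`, `importlib.metadata`); it assumes requirements.txt pins packages as plain `name==version` lines (other specifiers such as `>=`, extras, and environment markers are not handled), and the helper names are hypothetical. It does not check the R or MATLAB installations.

```python
import sys
from importlib import metadata


def parse_requirements(lines):
    """Parse 'name==version' lines into {name: version or None}.

    Assumption: only exact '==' pins are used; comments and blank lines are skipped.
    """
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


def check_environment(pins, required_python=(3, 9)):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(required_python):
        problems.append(
            f"Python {required_python[0]}.{required_python[1]}.x expected, "
            f"found {found[0]}.{found[1]}"
        )
    for name, version in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        # Only compare versions when the requirement is pinned.
        if version is not None and installed != version:
            problems.append(f"{name}: {version} pinned, {installed} installed")
    return problems
```

Run from the repository root, `check_environment(parse_requirements(open("requirements.txt")))` returns an empty list when the interpreter and installed packages match the pins; otherwise each entry describes one mismatch.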
    License

    This code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

    Le Minh Thao Doan - May 2024