TBD
We use MLflow to track the performance of all training runs in one place.
To reproduce our examples in your own MLflow store, follow these steps:
- Build the database (remove `--max_pages 20` to build the whole database)
python run_build_database.py --max_pages 20 --experiment_name "BUILD_CHROMA_TEST"
- Evaluate model performance
python run_evaluation.py --experiment_name BUILD_CHROMA_TEST
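
The two steps above can also be chained from Python. The following is a minimal sketch, not part of the repository: the helper names `build_command`, `evaluate_command`, and `run_pipeline` are hypothetical, and only the script names and the `--max_pages` and `--experiment_name` flags are taken from the commands above.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(experiment_name: str, max_pages: Optional[int] = None) -> List[str]:
    """Return the argument list for run_build_database.py (hypothetical helper).

    Leaving max_pages as None omits --max_pages, which builds the whole database.
    """
    cmd = [sys.executable, "run_build_database.py"]
    if max_pages is not None:
        cmd += ["--max_pages", str(max_pages)]
    cmd += ["--experiment_name", experiment_name]
    return cmd


def evaluate_command(experiment_name: str) -> List[str]:
    """Return the argument list for run_evaluation.py (hypothetical helper)."""
    return [sys.executable, "run_evaluation.py", "--experiment_name", experiment_name]


def run_pipeline(experiment_name: str, max_pages: Optional[int] = None) -> None:
    """Build the database, then evaluate under the same experiment name."""
    # check=True raises CalledProcessError, so evaluation is skipped if the build fails.
    subprocess.run(build_command(experiment_name, max_pages), check=True)
    subprocess.run(evaluate_command(experiment_name), check=True)


# Example call, to be run from the repository root:
# run_pipeline("BUILD_CHROMA_TEST", max_pages=20)
```

Passing the same experiment name to both steps mirrors the commands above, where the build and the evaluation both use `BUILD_CHROMA_TEST`.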
Build the complete INSEE dataset from the parquet files stored in the S3 bucket (requires S3 credentials and SSP Cloud access):
cd llm-open-data-insee
pip install -r requirements.txt
pre-commit install
python src/db_building/insee_data_processing.py
mc cp -r s3/projet-llm-insee-open-data/data/chroma_database/chroma_db/ data/chroma_db
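
After the copy, it can be useful to confirm that the local directory actually received files. This is a minimal sketch, assuming the destination path `data/chroma_db` from the command above. The helper name `chroma_db_is_populated` is hypothetical, and the check only looks for a non-empty directory, not for a valid Chroma store.

```python
from pathlib import Path


def chroma_db_is_populated(path: str = "data/chroma_db") -> bool:
    """Return True if path is an existing directory with at least one entry.

    Hypothetical helper: it does not validate the contents of the database.
    """
    directory = Path(path)
    return directory.is_dir() and any(directory.iterdir())


# Example call, to be run from the repository root after the mc cp command:
# print(chroma_db_is_populated())
```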