InseeFrLab/llm-open-data-insee

SSPCloud open data chatbot

Running the app locally

TBD

Evaluating the best-performing model

We use MLflow to centralize all training performance metrics. To reproduce our examples in your own MLflow store, follow these steps:

  1. Build the database (remove --max_pages 20 if you want to build the whole database)
python run_build_database.py --max_pages 20 --experiment_name "BUILD_CHROMA_TEST"
  2. Evaluate model performance
python run_evaluation.py --experiment_name BUILD_CHROMA_TEST
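
The two steps above can also be chained from Python, which helps when the same experiment name must be passed to both scripts. This is only a minimal sketch: the helper names `build_commands` and `run_pipeline` are hypothetical and not part of the repository, and it assumes that it runs from the repository root with the same interpreter that has the requirements installed.

```python
import subprocess
import sys
from typing import List, Optional


def build_commands(experiment_name: str, max_pages: Optional[int] = None) -> List[List[str]]:
    """Return the build and evaluation command lines as argument lists."""
    build = [sys.executable, "run_build_database.py"]
    if max_pages is not None:
        # Leave --max_pages out entirely to build the whole database.
        build += ["--max_pages", str(max_pages)]
    build += ["--experiment_name", experiment_name]
    evaluate = [sys.executable, "run_evaluation.py", "--experiment_name", experiment_name]
    return [build, evaluate]


def run_pipeline(experiment_name: str, max_pages: Optional[int] = None) -> None:
    """Run the build step, then the evaluation step, stopping at the first failure."""
    for command in build_commands(experiment_name, max_pages):
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("BUILD_CHROMA_TEST", max_pages=20)` would mirror the two commands shown above; passing `max_pages=None` drops the page limit.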

Build the complete INSEE dataset from the Parquet files stored in the S3 bucket (requires S3 credentials and SSP Cloud access)

cd llm-open-data-insee
pip install -r requirements.txt
pre-commit install

python src/db_building/insee_data_processing.py

To load a first version of the vector database from the S3 bucket

mc cp -r s3/projet-llm-insee-open-data/data/chroma_database/chroma_db/ data/chroma_db
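
After the copy, it can be useful to confirm that files actually landed in `data/chroma_db` before pointing the app at it. The following is a minimal sketch using only the standard library; the helper name `summarize_directory` is hypothetical, and it makes no assumption about which files Chroma stores in that directory.

```python
from pathlib import Path
from typing import Dict


def summarize_directory(path: str = "data/chroma_db") -> Dict[str, int]:
    """Map each file under `path` (relative name) to its size in bytes."""
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; did the mc cp command run?")
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

An empty result, or a `FileNotFoundError`, suggests the copy did not complete or targeted a different directory.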
