CONTEXI

Contexi lets you interact with an entire codebase or dataset, with full context, using an LLM that runs locally on your system.

Contexi uses:

Multi-Prompt, Contextually Guided Retrieval-Augmented Generation
Self-Critique & Self-Correction using Chain-of-Thought
Document Re-Ranking

techniques to provide the most relevant context-aware responses to questions about your code/data.
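
To make the idea of document re-ranking concrete, here is a minimal sketch. It is not Contexi's actual implementation: the `tokenize` and `rerank` helpers are hypothetical, and the scoring uses plain Jaccard word overlap, whereas a real pipeline would more likely use embeddings or a cross-encoder. It only illustrates the step of re-ordering retrieved chunks by relevance before they are passed to the LLM.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def rerank(query: str, documents: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Score each document by Jaccard overlap with the query; return the best top_k.

    Hypothetical helper for illustration only (assumption, not part of Contexi).
    """
    query_tokens = tokenize(query)
    scored = []
    for doc in documents:
        doc_tokens = tokenize(doc)
        union = query_tokens | doc_tokens
        score = len(query_tokens & doc_tokens) / len(union) if union else 0.0
        scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep their retrieval order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

Calling `rerank("What does the Login class do?", retrieved_chunks, top_k=2)` would keep only the two chunks that share the most vocabulary with the question.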

Key Features

✅ Analyzes and understands your entire codebase and data, not just isolated code snippets.
✅ Answers questions about potential security vulnerabilities anywhere in the code.
✅ Imports code from a Git URL for analysis.
✅ Learns from follow-up questions and keeps answering in the context of the chat history.
✅ Runs entirely on your local machine for free. No Internet connection is required.

Web UI

How does it work?

🚀 Get started with Wiki

Pre-requisites

Ollama - Preferred model: qwen2.5 (for more precise results)
16 GB of RAM recommended, plus plenty of free disk space
Python 3.7+
Various Python dependencies (see requirements.txt)

Supported Programming Languages/Data:

Tested on a Java codebase. (You can configure config.yml to load other code and file formats.)

Installation

We recommend installing the app in a Python virtual environment.

Clone this repository:

git clone https://github.com/AI-Security-Research-Group/Contexi.git
cd Contexi

Install the required Python packages:
```
pip install -r requirements.txt
```
Edit the parameters in config.yml to suit your requirements.
Run the application:
```
python3 main.py
```

Usage

After launching main.py, select one of the modes below:

(venv) coder@system ~/home/Contexi $

Welcome to Contexi!
Please select a mode to run:
1. Interactive session
2. UI
3. API
Enter your choice (1, 2, or 3):

You are ready to use the magic stick. 🪄

API Mode

Send POST requests to http://localhost:8000/ask with your questions.

Example using curl:

curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" -d '{"question": "What is the purpose of the Login class?"}'

Response format:

{
  "answer": "The Login class is responsible for handling user authentication..."
}
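
The same request can be sent from Python. The sketch below uses only the standard library and assumes the endpoint, request body, and `answer` field shown above; the helper names `build_ask_request` and `ask` and the timeout value are illustrative choices, not part of Contexi.

```python
import json
import urllib.request


def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for the /ask endpoint without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000", timeout: float = 120.0) -> str:
    """Send the question to a running Contexi API and return the answer text.

    The generous timeout is an assumption: local LLM inference can be slow.
    """
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["answer"]
```

With the API mode running, `print(ask("What is the purpose of the Login class?"))` should print the generated answer.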

Open an issue if you have problems installing or running this script. (The script has been tested on macOS.)

Customization

You can customize various aspects of the script:

Adjust the chunk_size and chunk_overlap in the split_documents_into_chunks function to change how documents are split.
Modify the PROMPT_TEMPLATE to alter how the LLM interprets queries and generates responses.
Change the max_iterations in perform_crag to adjust how many times the system will attempt to refine an answer.
Modify num_ctx in initialize_llm to adjust the LLM context window for better results.
Adjust the n_ideas parameter to control how thorough and complete the answers should be.
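
To show how chunk_size and chunk_overlap interact, here is a minimal character-based sketch. It is not the project's split_documents_into_chunks function: `split_into_chunks` is a hypothetical stand-in, and the real splitter may break on separators rather than fixed offsets. Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share `chunk_overlap` characters.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size chunks that overlap by chunk_overlap characters.

    Hypothetical helper for illustration only (assumption, not Contexi's code).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

A larger chunk_overlap preserves more context across chunk boundaries but produces more chunks, which increases indexing time and memory use.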

Troubleshooting

If you encounter memory issues, try reducing the chunk_size and num_ctx or the number of documents processed at once.
Ensure that Ollama is running and that the correct model name is set in config.yml.
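
One way to check both points is to query Ollama's local HTTP API, which lists installed models at `/api/tags`. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`; the helper names are illustrative, and the model name to look for is whatever you set in config.yml.

```python
import json
import urllib.request
from typing import List


def installed_models(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def model_is_available(wanted: str, tags_payload: dict) -> bool:
    """Return True if `wanted` matches an installed model, with or without its tag."""
    for name in installed_models(tags_payload):
        if name == wanted or name.split(":")[0] == wanted:
            return True
    return False


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of installed models; raises URLError if Ollama is not reachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `model_is_available("qwen2.5", fetch_tags())` returns False if the model has not been pulled, and `fetch_tags()` raises an error if the Ollama server is not running.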

Use Cases

Codebase Analysis: Understand and explore large code repositories by asking natural language questions.
Security Auditing: Identify potential security vulnerabilities by querying specific endpoints or functions.
Educational Tools: Help new developers understand codebases by providing detailed answers to their questions.
Documentation Generation: Generate explanations or documentation for code segments.

To-Do List for Contributors

Security Workflow (To-Do)

Use Semgrep to identify potential vulnerabilities based on patterns.
Pass the identified snippets to a data flow analysis tool to determine if the input is user-controlled.
Provide the LLM with the code snippet, data flow information, and any relevant AST representations.
Ask the LLM to assess the risk based on this enriched context.
Use the LLM's output to prioritize vulnerabilities, focusing on those where user input reaches dangerous functions.
Optionally, perform dynamic analysis or manual code review on high-risk findings to confirm exploitability.
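
As a starting point for contributors, the sketch below shows how the first steps of this workflow could be wired together. Nothing here exists in Contexi yet: `load_findings` and `build_risk_prompt` are hypothetical helpers, the data flow note is assumed to come from a separate analysis tool, and the field names follow the JSON report produced by `semgrep --json` (`results`, `check_id`, `path`, `start.line`, `extra.message`, `extra.lines`).

```python
import json
from typing import Dict, List


def load_findings(semgrep_json: str) -> List[Dict]:
    """Parse the text produced by `semgrep --json` and return its findings."""
    return json.loads(semgrep_json).get("results", [])


def build_risk_prompt(finding: Dict, data_flow_note: str) -> str:
    """Combine one finding with data flow context into a single LLM prompt.

    Hypothetical helper (assumption): the prompt wording is only an example.
    """
    extra = finding.get("extra", {})
    start_line = finding.get("start", {}).get("line", "?")
    return "\n".join([
        "You are reviewing a potential vulnerability.",
        f"Rule: {finding.get('check_id', 'unknown')}",
        f"Location: {finding.get('path', 'unknown')}:{start_line}",
        f"Scanner message: {extra.get('message', '')}",
        "Code:",
        extra.get("lines", ""),
        f"Data flow: {data_flow_note}",
        "Is user-controlled input reaching a dangerous function? "
        "Answer with a risk level (high, medium, low) and a short justification.",
    ])
```

The resulting prompt could then be sent to the local LLM, and the returned risk levels used to decide which findings deserve dynamic analysis or manual review.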

Contributing

Contributions to Contexi are welcome! Please submit pull requests or open issues on the GitHub repository.

Acknowledgments

This project uses Ollama for local LLM inference.
Built with LangChain, Streamlit and FastAPI.
