
Spike/PoC utilizing local LLM to enable better understanding OpenTDF #239

Open
jrschumacher opened this issue Jul 25, 2024 · 1 comment · May be fixed by #236

Comments

@jrschumacher (Member)

As we approach running a Spike/PoC, we want to break it down into steps.

Hypothesis

We believe that if we can embed a CPU-based LLM into the CLI, we can give users tailored support for running the OpenTDF platform. This support will enable them to deploy and administer the platform quickly without needing specific guidance from a human.

The benefit of this approach is that it lets people with limited prior knowledge quickly learn a process without investing large amounts of time reading or scouring resources. This is especially true for platforms with limited documentation and/or examples that may not fit the exact problem at hand. Additionally, running the model locally satisfies environmental constraints such as air-gapped environments, need-to-know limitations, and limited connectivity.

Solution

Implement an LLM solution based on https://github.com/ollama/ollama that loads a user-provided, pre-installed model.

Approach

  1. Using langchaingo and ollama, get a chat interface working in the CLI (a minimal sketch follows this list)
  2. Implement some simple prompt engineering to focus the LLM
  3. Investigate RAG with an embedded vector DB
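For step 1, a minimal sketch of what the chat wiring could look like, assuming the langchaingo ollama client and a llama3 model already pulled locally via ollama; this is an illustration of the approach, not the actual otdfctl code:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	ctx := context.Background()

	// Connect to the locally running ollama server and select a pre-installed model.
	llm, err := ollama.New(ollama.WithModel("llama3"))
	if err != nil {
		log.Fatal(err)
	}

	// Single-turn prompt; a real CLI chat would loop over stdin and keep history.
	resp, err := llms.GenerateFromSinglePrompt(ctx, llm, "How do I deploy the OpenTDF platform locally?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp)
}

A real chat command would wrap this in a read/generate loop, keep conversation history, and layer the prompt engineering from step 2 on top of the user's input.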
@jrschumacher linked a pull request (#236) on Jul 25, 2024 that will close this issue

@andrewrust-virtru commented Jul 30, 2024

Current Status

Currently, ollama models are functional and accessible from the otdfctl chat command.

Configuration is managed in a chat_config.json file located in the home directory and loaded via chat_config.go. This is temporary, as there is certainly a more graceful way of storing and managing chat parameters. Currently the parameters are:

{
    "model": "llama3",
    "verbose": true,
    "apiURL": "http://localhost:11434/api/generate"
}

This could technically be model-agnostic so long as the model runs on the same port and URL and exposes the same REST-like query structure that ollama supports. Verbosity could also be changed to a string ("high", "med", "low"), but it is simpler to start with a bool.

Currently, verbose controls whether the entire sanitized prompt is shown to the user before a response. It can and should cover much more, especially during initial startup.
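For illustration, a minimal sketch of loading that config from the home directory and sending a single query to the apiURL; the struct fields mirror the JSON above, but the helper names and the hard-coded prompt are assumptions rather than the actual chat_config.go implementation:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
)

// ChatConfig mirrors chat_config.json; field names here are illustrative.
type ChatConfig struct {
	Model   string `json:"model"`
	Verbose bool   `json:"verbose"`
	APIURL  string `json:"apiURL"`
}

func loadConfig() (ChatConfig, error) {
	var cfg ChatConfig
	home, err := os.UserHomeDir()
	if err != nil {
		return cfg, err
	}
	data, err := os.ReadFile(filepath.Join(home, "chat_config.json"))
	if err != nil {
		return cfg, err
	}
	return cfg, json.Unmarshal(data, &cfg)
}

func main() {
	cfg, err := loadConfig()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not load chat config:", err)
		os.Exit(1)
	}

	// ollama's /api/generate takes a model name and prompt; stream=false
	// asks for a single JSON object instead of a stream of chunks.
	reqBody, _ := json.Marshal(map[string]any{
		"model":  cfg.Model,
		"prompt": "What is OpenTDF?",
		"stream": false,
	})
	resp, err := http.Post(cfg.APIURL, "application/json", bytes.NewReader(reqBody))
	if err != nil {
		fmt.Fprintln(os.Stderr, "is ollama running?", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Fprintln(os.Stderr, "unexpected response:", err)
		os.Exit(1)
	}
	fmt.Println(out.Response)
}

Keeping the transport this thin is what makes the setup model-agnostic: anything that answers the same REST-like /api/generate contract on that URL could be swapped in.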

TODOs:

  • Graceful startup: initial loading of configs via chat_config.go
  • Graceful exits and additional error checks for when the model is not running or other trivial issues occur
  • Test a secondary model (e.g., Gemma, TinyChatEngine) and refine configurations to make the implementation more model-agnostic
  • Organize sanitization prompts for different levels of user familiarity
  • Collect and disseminate generalized Q&As to quality-test our prompt-engineering efforts (vibe-check the model and prompting for our use case)
  • Open-ended: investigate improved prompt-engineering techniques for ollama models
  • Open-ended: benchmark both response quality and speed across model types, and more specifically across prompt-engineering variants

Links to explore for improved prompt engineering:
llama2 prompt engineering
How to prompt llama3
