Skip to content

The Pictionary app uses LLaMA 3.1 to generate random drawing prompts and LLaMA 3.2 Vision to predict and judge user drawings based on these prompts. It provides an interactive and fun way to test your drawing skills within a set time limit.

License

Notifications You must be signed in to change notification settings

AdritPal08/Multimodal-Pictionary-using-LLaMA-3.1-and-LLaMA-3.2-Vision

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Multimodal Project: A Pictionary App Using Python, Streamlit, Llama 3.1, and Llama 3.2 Vision

Welcome to the Pictionary app! When you open the app, you’ll receive a drawing prompt (for example: “Draw this: Smiling sun”). This prompt is generated using the LLM model - Llama 3.1.

On the left side, you’ll find various drawing tools. You can choose any tool based on your needs, adjust the stroke width, select stroke colors, and even change the background color of your drawing canvas.

You will have a set amount of time to complete your drawing. If you finish before the timer runs out, you can press the ‘Predict Image’ button to proceed to the next step. Here, the LLM Vision Model - Llama 3.2 Vision will predict what the image is.

Based on the Vision Model’s prediction and the original drawing prompt, the LLM model - Llama 3.1 will judge whether you have drawn the image correctly. If your drawing matches the prompt, you’ll see a “PASS” message. If it doesn’t, you’ll see a “FAIL” message.

App Demo

Key Features:

  1. Random Drawing Prompts : Generates random, easy-to-draw image concepts using the LLM model - Llama 3.1.
  2. Customizable Drawing Tools : Provides a variety of drawing tools, including options to adjust stroke width, stroke color, and background color.
  3. Timer Functionality : Includes a countdown timer to ensure drawings are completed within a set time limit.
  4. Image Prediction : Uses the LLM Vision Model - Llama 3.2 Vision to predict what the drawn image represents.
  5. Judging Accuracy : Compares the Vision Model’s prediction with the original drawing prompt using Llama 3.1 to determine if the drawing matches the prompt. Displays a “PASS” message if the drawing matches the prompt and a “FAIL” message if it does not.
  6. Interactive and User-Friendly Interface : Features an intuitive layout with easy-to-use drawing tools and clear instructions.

It leverages the following technologies:

Python: Python is a popular, versatile programming language known for its simplicity and readability. It is widely used for various applications, including web development, data analysis, machine learning, and automation tasks. Python's extensive ecosystem of libraries and frameworks makes it a powerful tool for developers.

LLaMA 3.1 (70b): LLaMA (Lean Large-Language Model) is a family of large language models developed by Meta AI. The 3.1 (70b) version refers to a specific model variant with 70 billion parameters. Large language models like LLaMA are trained on vast amounts of text data, allowing them to understand and generate human-like text for various natural language processing tasks.

LLaMA 3.2 Vision: LLaMA (Lean Large-Language Model) Vision is an advanced variant of the LLaMA family developed by Meta AI, specifically designed for multimodal tasks. The 3.2 Vision model integrates visual understanding capabilities with natural language processing, enabling it to interpret and generate descriptions for images. This model leverages extensive training on both text and image data, allowing it to perform tasks such as image captioning, visual question answering, and image-based predictions with high accuracy.

Groq API: Groq API provides access to Groq's powerful AI inference platform. It enables developers to leverage their advanced hardware and software for rapid and efficient AI model execution.

Streamlit: Streamlit is an open-source Python library that simplifies the process of building interactive data visualization and machine learning web applications. It allows developers to create user interfaces by writing Python scripts, making it easier to share data-driven applications with others.

License :

GNU GENERAL PUBLIC LICENSE Version 3

Follow Me :

GitHub LinkedIn Kaggle

  • If you like my work and it helped you in anyway then please do ⭐ the repository it will motivate me to make more amazing projects

About

The Pictionary app uses LLaMA 3.1 to generate random drawing prompts and LLaMA 3.2 Vision to predict and judge user drawings based on these prompts. It provides an interactive and fun way to test your drawing skills within a set time limit.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages