Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
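For context, Ray's core API boils down to two primitives, `@ray.remote` and `ray.get`. A minimal sketch (the `square` function is illustrative, not from the Ray docs):

```python
import ray

ray.init()  # start a local Ray runtime; connects to an existing cluster if present

@ray.remote
def square(x):
    # Runs as a distributed task on any available worker.
    return x * x

# Launch tasks in parallel and gather the results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```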
A high-throughput and memory-efficient inference and serving engine for LLMs
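This tagline matches vLLM; assuming that is the project, its offline batch-inference API looks roughly like this (a sketch, with a placeholder model ID):

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM manages KV-cache memory for high throughput.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["What is batched inference?"], params)
for out in outputs:
    print(out.outputs[0].text)
```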
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
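Because such servers speak the OpenAI wire protocol, the standard `openai` Python client can talk to them by overriding the base URL. A sketch (the endpoint URL and model name are placeholders for whatever the server hosts):

```python
from openai import OpenAI

# Point the official OpenAI client at a self-hosted, OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # the model the server is serving
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```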
The easiest way to serve AI apps and models: build reliable inference APIs, LLM apps, multi-model chains, RAG services, and much more!
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data. Includes streaming inference, scalable model hosting, training, and vector search.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
AICI: Prompts as (Wasm) Programs
RayLLM - LLMs on Ray
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
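"Dynamic batching" here means the server briefly buffers incoming requests and runs them through the model together. A framework-agnostic asyncio sketch of the idea (all names and constants are illustrative, not this project's API):

```python
import asyncio

MAX_BATCH = 8      # flush when this many requests are queued
MAX_WAIT_S = 0.01  # or when the first request has waited this long

async def batch_worker(queue: asyncio.Queue, model_fn):
    """Buffer requests briefly, then run one batched forward pass."""
    while True:
        # Block until at least one request arrives.
        batch = [await queue.get()]
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_S
        # Keep collecting until the batch is full or the wait budget is spent.
        while len(batch) < MAX_BATCH:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        inputs = [inp for inp, _ in batch]
        # One batched call amortizes model overhead across all waiting requests.
        for (_, fut), out in zip(batch, model_fn(inputs)):
            fut.set_result(out)

async def infer(queue: asyncio.Queue, x):
    """Client-side helper: enqueue a request and await its result."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut
```

The timeout trades a little latency on the first request for much higher GPU utilization when traffic is bursty.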
A throughput-oriented high-performance serving framework for LLMs
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
LLM (Large Language Model) Fine-Tuning
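As a sketch of one common fine-tuning recipe, LoRA via Hugging Face PEFT trains small low-rank adapters instead of the full weights (the model name and hyperparameters below are placeholder choices, not this repo's code):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Wrap attention projections with trainable low-rank adapters.
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```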
Efficient AI Inference & Serving
A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.
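For instance, batch transforms over a dataset scale out with Ray Data's `map_batches` (a sketch; the doubling function stands in for real preprocessing or inference):

```python
import ray

ds = ray.data.range(10_000)  # a dataset with a single "id" column
# map_batches fans the function out across the cluster's CPUs/GPUs.
ds = ds.map_batches(lambda batch: {"id": batch["id"] * 2}, batch_format="numpy")
print(ds.take(3))
```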
Multi-node production AI stack. Run the best of open-source AI easily on your own servers. Create your own AI by fine-tuning open-source models. Integrate LLMs with APIs. Run gptscript securely on the server.
Finetune LLMs on K8s by using Runbooks
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added gradually. Contributions welcome!