
neulab/lti-bloom-deployment

Fast Inference Solutions for BLOOM

This repo provides demos and packages for fast inference with BLOOM. Some of the solutions live in their own repos, in which case a link to the corresponding repo is provided instead.

Some of the solutions offer both half-precision and int8-quantized variants.
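The repo itself does not show the quantization code here, but the idea behind int8 weight quantization can be sketched in plain NumPy: store weights as int8 plus a float scale, cutting memory 4x versus fp32 at the cost of a bounded rounding error. This is a minimal symmetric per-tensor sketch, not the actual scheme used by the linked solutions (which typically use more sophisticated per-channel or outlier-aware methods such as LLM.int8()).

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print(q.nbytes / w.nbytes)                      # 0.25 -> int8 is 4x smaller than fp32
print(float(np.abs(w - w_hat).max()) < scale)   # True -> error bounded by one step
```

Half-precision (fp16/bf16) is simpler still: the weights keep a floating-point format, so no scale is needed, at 2x rather than 4x memory savings.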

Client-side solutions

Solutions developed to perform large batch inference locally:

PyTorch:

JAX:
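The client-side solutions above process a fixed set of prompts in large batches. The framework-specific code is in the linked repos, but the basic control flow can be sketched framework-agnostically: split the prompts into fixed-size batches and run a generation function over each. The `generate_fn` here is a hypothetical stand-in for a real model call (e.g. a BLOOM pipeline), not an API from this repo.

```python
from typing import Callable, List

def batched_generate(prompts: List[str],
                     generate_fn: Callable[[List[str]], List[str]],
                     batch_size: int = 8) -> List[str]:
    """Run a generation function over prompts in fixed-size batches.

    generate_fn maps a list of prompts to a list of completions;
    any real model call (PyTorch, JAX, ...) can be dropped in.
    """
    outputs: List[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        outputs.extend(generate_fn(batch))
    return outputs

# Toy stand-in "model": echoes each prompt with a suffix.
fake_model = lambda batch: [p + " -> done" for p in batch]
results = batched_generate([f"prompt {i}" for i in range(10)], fake_model, batch_size=4)
print(len(results))   # 10
print(results[0])     # prompt 0 -> done
```

Fixed batch sizes work well offline because all inputs are known up front; the server solutions below cannot assume that.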

Server solutions

Solutions developed to be used in server mode (i.e., varying batch sizes and request rates):

PyTorch:

Rust:
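Serving differs from offline inference because requests arrive at unpredictable rates, so batches must be formed dynamically: flush a batch when it is full, or when the oldest request has waited too long. The sketch below illustrates that policy with explicit timestamps; all names are hypothetical, and real servers (including the linked ones) layer threading, queues, and scheduling on top of this idea.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DynamicBatcher:
    """Group incoming requests into batches for a model server.

    A batch is released when it reaches max_batch requests, or when
    the oldest pending request has waited at least max_wait seconds.
    Time is passed in explicitly to keep the sketch deterministic.
    """
    max_batch: int = 4
    max_wait: float = 0.05
    _pending: List[Tuple[float, str]] = field(default_factory=list)

    def submit(self, now: float, payload: str) -> List[str]:
        """Enqueue a request; return a batch if one is ready, else []."""
        self._pending.append((now, payload))
        return self.poll(now)

    def poll(self, now: float) -> List[str]:
        """Check whether the pending batch should be flushed."""
        if not self._pending:
            return []
        full = len(self._pending) >= self.max_batch
        stale = now - self._pending[0][0] >= self.max_wait
        if full or stale:
            batch = [payload for _, payload in self._pending]
            self._pending = []
            return batch
        return []

b = DynamicBatcher(max_batch=3, max_wait=0.05)
print(b.submit(0.00, "a"))  # [] - waiting for more requests
print(b.submit(0.01, "b"))  # []
print(b.submit(0.02, "c"))  # ['a', 'b', 'c'] - batch full
print(b.submit(0.03, "d"))  # []
print(b.poll(0.10))         # ['d'] - deadline passed, flush a partial batch
```

The `max_wait` knob trades latency for throughput: a larger value produces fuller batches (better GPU utilization) at the cost of slower responses when traffic is light.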
