This repo provides demos and packages for fast inference of BLOOM. Some of the solutions have their own repos, in which case a link to the corresponding repo is provided instead.
Some of the solutions provide both half-precision and int8-quantized variants.
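The half-precision and int8-quantized variants differ in how the model weights are stored. As a rough, illustrative sketch of what symmetric int8 weight quantization does (this is not the code used by any of the solutions listed here, just the general idea):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: the scale maps the largest
    # absolute weight to 127, then each weight is rounded to int8.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float32 tensor from the int8 codes.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Storing `q` instead of `w` halves memory again relative to fp16, at the cost of a small rounding error bounded by half the scale.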
Solutions developed to perform large batch inference locally:
Pytorch:
Thomas Wang is working on a Custom Fused Kernel solution - a link will be added once it's ready for general use.
JAX:
Solutions developed to be used in server mode (i.e. varied batch size, varied request rate):
Pytorch:
Rust:
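As a rough illustration of the "varied batch size" constraint that server-mode solutions handle (this is a minimal dynamic-batching sketch, not taken from any of the solutions above):

```python
from collections import deque

def dynamic_batches(requests, max_batch_size):
    # Group whatever requests are currently pending into batches of at
    # most max_batch_size, so the effective batch size varies with load.
    pending = deque(requests)
    while pending:
        batch = [pending.popleft()
                 for _ in range(min(max_batch_size, len(pending)))]
        yield batch

batches = list(dynamic_batches(["a", "b", "c", "d", "e"], 2))
# → [["a", "b"], ["c", "d"], ["e"]]
```

A real server additionally caps how long a request may wait before a partial batch is flushed, trading latency against throughput.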