Optimize open LLMs using GPTQ and Hugging Face Optimum
This technical tutorial explains how to apply GPTQ post-training quantization to open-source large language models (LLMs) using the Hugging Face Optimum and AutoGPTQ libraries. It covers setting up the environment, preparing a calibration dataset, loading and quantizing a model, and testing accuracy and inference speed, enabling models to run on smaller hardware with minimal performance loss.