From 78057ccf7cbb69ada19b3ad6cc30387a0c7e7271 Mon Sep 17 00:00:00 2001
From: Hongji Zhu
Date: Fri, 17 Jan 2025 15:19:21 +0800
Subject: [PATCH] readme add usage

---
 README.md | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/README.md b/README.md
index 6c2d795..13eac9f 100644
--- a/README.md
+++ b/README.md
@@ -28,3 +28,42 @@ tags:
 
 This is the int4 quantized version of [**MiniCPM-o 2.6**](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6).
 Running the int4 version uses less GPU memory (about 9 GB).
+### Prepare code and install AutoGPTQ
+
+We are submitting a PR to officially support MiniCPM-o 2.6 inference in AutoGPTQ. Until it is merged, install from the `minicpmo` branch:
+
+```bash
+git clone https://github.com/OpenBMB/AutoGPTQ.git && cd AutoGPTQ
+git checkout minicpmo
+
+# install AutoGPTQ from source
+pip install -vvv --no-build-isolation -e .
+```
+
+### Usage of **MiniCPM-o-2_6-int4**
+
+Change the model initialization to `AutoGPTQForCausalLM.from_quantized`:
+
+```python
+import torch
+from transformers import AutoTokenizer
+from auto_gptq import AutoGPTQForCausalLM
+
+# load the int4-quantized model (exllama kernels disabled)
+model = AutoGPTQForCausalLM.from_quantized(
+    'openbmb/MiniCPM-o-2_6-int4',
+    torch_dtype=torch.bfloat16,
+    device="cuda:0",
+    trust_remote_code=True,
+    disable_exllama=True,
+    disable_exllamav2=True
+)
+tokenizer = AutoTokenizer.from_pretrained(
+    'openbmb/MiniCPM-o-2_6-int4',
+    trust_remote_code=True
+)
+
+model.init_tts()  # initialize the text-to-speech module
+```
+
+For detailed usage, refer to [MiniCPM-o-2_6#usage](https://huggingface.co/openbmb/MiniCPM-o-2_6#usage)
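
For context on the patch above: once the model and tokenizer are initialized as shown, inference follows the `chat` interface described in the linked MiniCPM-o-2_6 usage guide. Below is a minimal sketch of a single-image chat call under that assumption; the image path and question are placeholders, not values from the patch:

```python
from PIL import Image

# placeholder inputs -- substitute your own image and question
image = Image.open('example.jpg').convert('RGB')
question = 'What is in this image?'
msgs = [{'role': 'user', 'content': [image, question]}]

# `model` and `tokenizer` are the objects created in the patched README's
# snippet; `chat` is the interface documented in the linked usage guide
answer = model.chat(
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```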