Mirror of https://www.modelscope.cn/OpenBMB/MiniCPM-o-2_6-int4.git, synced 2025-08-18 14:05:52 +08:00.

Commit 78057ccf7c (parent 47d04c823c): "readme add usage" (README.md: 39 lines added).

This is the int4 quantized version of [**MiniCPM-o 2.6**](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6).

The int4 version needs less GPU memory than the full-precision model: about 9 GB.
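
You can check ahead of time whether a GPU has room for the model. The minimal sketch below compares free device memory with the roughly 9 GB figure quoted above. It rests on a few assumptions:

- PyTorch is installed. It is imported lazily so that `torch.cuda.mem_get_info` is only called when needed.
- "9 GB" is treated as 9 GiB, which is slightly conservative.
- The helper names `has_enough_gpu_memory` and `free_gpu_bytes` are hypothetical.

Actual usage will vary with inputs and with which modules are enabled.

```python
GIB = 1024**3

# Assumption: treat the README's "about 9 GB" as 9 GiB (slightly conservative).
REQUIRED_GIB = 9.0


def has_enough_gpu_memory(free_bytes: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if `free_bytes` of free GPU memory covers `required_gib` GiB."""
    return free_bytes >= required_gib * GIB


def free_gpu_bytes(device_index: int = 0) -> int:
    """Return free memory (in bytes) on a CUDA device, queried via PyTorch."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device available")
    free, _total = torch.cuda.mem_get_info(device_index)
    return free


if __name__ == "__main__":
    try:
        free = free_gpu_bytes(0)
    except (ImportError, RuntimeError) as exc:
        print(f"Could not query GPU memory: {exc}")
    else:
        verdict = "enough" if has_enough_gpu_memory(free) else "probably not enough"
        print(f"cuda:0 free memory: {free / GIB:.1f} GiB ({verdict} for the int4 model)")
```

Free memory is only a rough guide. Other processes and PyTorch's caching allocator also hold device memory.
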
### Prepare code and install AutoGPTQ

We are submitting a PR to AutoGPTQ to officially support MiniCPM-o 2.6 inference. In the meantime, install AutoGPTQ from the `minicpmo` branch of the OpenBMB fork:

```shell
git clone https://github.com/OpenBMB/AutoGPTQ.git && cd AutoGPTQ
git checkout minicpmo

# install AutoGPTQ
pip install -vvv --no-build-isolation -e .
```
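
After the editable install, a quick sanity check can confirm that the package is importable before you load the model. This minimal sketch uses only the standard library. The helper names are hypothetical. It assumes the OpenBMB fork installs the `auto_gptq` package under the distribution name `auto_gptq`, as upstream AutoGPTQ does.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def module_available(name: str) -> bool:
    """Return True if a top-level module named `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the fork keeps upstream's package/distribution name "auto_gptq".
    if module_available("auto_gptq"):
        print("auto_gptq is importable, version:", installed_version("auto_gptq"))
    else:
        print("auto_gptq not found; re-run the AutoGPTQ installation")
```
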
### Usage of **MiniCPM-o-2_6-int4**

In the model initialization code, load the model with `AutoGPTQForCausalLM.from_quantized`:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the int4 (GPTQ) checkpoint onto the first GPU,
# with the ExLlama and ExLlamaV2 kernels disabled.
model = AutoGPTQForCausalLM.from_quantized(
    'openbmb/MiniCPM-o-2_6-int4',
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
    disable_exllama=True,
    disable_exllamav2=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-o-2_6-int4',
    trust_remote_code=True,
)

# Initialize the text-to-speech (TTS) module
model.init_tts()
```

For the rest of the usage, see [MiniCPM-o-2_6#usage](https://huggingface.co/openbmb/MiniCPM-o-2_6#usage).