readme add usage

parent 47d04c823c
commit 78057ccf7c

README.md (+39 lines)

@@ -28,3 +28,42 @@ tags:

This is the int4 quantized version of [**MiniCPM-o 2.6**](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6).

Running the int4 version uses less GPU memory (about 9 GB).
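
If you want to verify the footprint on your own hardware, one rough way is to read back PyTorch's allocator statistics once the model is loaded (a minimal sketch; `report_vram` is a hypothetical helper, and exact numbers vary with driver, dtype, and context length):

```python
import torch

# Hypothetical helper: report GPU memory currently held by tensors, in GiB.
def report_vram(device: str = "cuda:0") -> None:
    allocated = torch.cuda.memory_allocated(device) / 1024**3
    peak = torch.cuda.max_memory_allocated(device) / 1024**3
    print(f"allocated: {allocated:.1f} GiB (peak: {peak:.1f} GiB)")

# Call report_vram() after the model below is on the GPU to check the ~9 GB figure.
```
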
### Prepare code and install AutoGPTQ

We are submitting a PR to officially support MiniCPM-o 2.6 inference.

```bash
git clone https://github.com/OpenBMB/AutoGPTQ.git && cd AutoGPTQ
git checkout minicpmo

# install AutoGPTQ
pip install -vvv --no-build-isolation -e .
```
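
To confirm the editable install resolved to the local `minicpmo` checkout rather than a PyPI build, a quick sanity check (the path shown in the comment is illustrative):

```python
# Sanity check: the package should import from your AutoGPTQ clone,
# not from site-packages.
import auto_gptq

print(auto_gptq.__file__)  # should point inside the cloned AutoGPTQ directory
```
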
### Usage of **MiniCPM-o-2_6-int4**

In the usage code, change the model initialization to `AutoGPTQForCausalLM.from_quantized`:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the int4-quantized weights via AutoGPTQ instead of AutoModel
model = AutoGPTQForCausalLM.from_quantized(
    'openbmb/MiniCPM-o-2_6-int4',
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
    disable_exllama=True,
    disable_exllamav2=True
)
tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-o-2_6-int4',
    trust_remote_code=True
)

# Initialize the TTS module before any audio generation
model.init_tts()
```
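
After `init_tts()`, inference follows the regular MiniCPM-o chat interface (see the usage link below). A minimal single-image sketch, assuming a local `example.jpg` and the `chat` signature from the linked reference:

```python
from PIL import Image

# Placeholder image path; any RGB image works.
image = Image.open('example.jpg').convert('RGB')
msgs = [{'role': 'user', 'content': [image, 'Describe this image.']}]

# chat() as documented in the MiniCPM-o 2.6 usage reference below.
answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)
```
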
For detailed usage, refer to [MiniCPM-o-2_6#usage](https://huggingface.co/openbmb/MiniCPM-o-2_6#usage).