Update modeling_internlm3.py (#18)

- Update modeling_internlm3.py (94cd46f35e87e1b3b2b82df73230bdb5275cd652)
- Update tokenization_internlm3.py (0f3d7019880c0b6f7a9d35b392d21cbfca07478b)
ai-modelscope 2025-02-26 21:06:53 +08:00
parent 2ecb8953b0
commit 03ffaab065

README.md (148 lines changed)

@@ -1,7 +1,7 @@
---
license: apache-2.0
pipeline_tag: text-generation
---
# InternLM
@@ -23,7 +23,7 @@ license: apache-2.0
[![evaluation](https://github.com/InternLM/InternLM/assets/22529082/f80a2a58-5ddf-471a-8da4-32ab65c8fd3b)](https://github.com/internLM/OpenCompass/)
[💻Github Repo](https://github.com/InternLM/InternLM) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) • [📜Technical Report](https://arxiv.org/abs/2403.17297)
[💻Github Repo](https://github.com/InternLM/InternLM) • [🤗Demo](https://huggingface.co/spaces/internlm/internlm3-8b-instruct) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) • [📜Technical Report](https://arxiv.org/abs/2403.17297)
</div>
@@ -48,25 +48,26 @@ InternLM3 supports both the deep thinking mode for solving complicated reasoning
We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capability: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the evaluation results are shown below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
| Benchmark | | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(close source) |
| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | ------------------------- |
| General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0 |
| | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7 |
| | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1 |
| Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9 |
| | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2 |
| | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5 |
| | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2 |
| MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0 |
| | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3 |
| Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8 |
| | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6 |
| Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7 |
| Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7 |
| Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7 |
| | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3 |
| | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87 |
| | Benchmark | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(closed source) |
| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | -------------------------- |
| General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0 |
| | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7 |
| | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1 |
| Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9 |
| | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2 |
| | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5 |
| | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2 |
| MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0 |
| | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3 |
| Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8 |
| | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6 |
| Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7 |
| Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7 |
| Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7 |
| | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3 |
| | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87 |
- Values in bold indicate the **highest** among the open-source models.
- The evaluation results were obtained from [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with * were evaluated in deep thinking mode), and the evaluation configurations can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
- Scores may differ across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest [OpenCompass](https://github.com/internLM/OpenCompass/) evaluation results.
@@ -85,8 +86,9 @@ To load the InternLM3 8B Instruct model using Transformers, use the following co
```python
import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
from transformers import AutoTokenizer, AutoModelForCausalLM
model_dir = "internlm/internlm3-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
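# --- a minimal usage sketch (assumed continuation; the prompt and token budget are illustrative) ---
messages = [
    {"role": "system", "content": "You are an AI assistant whose name is InternLM."},
    {"role": "user", "content": "Please tell me five scenic spots in Shanghai."},
]
# apply_chat_template formats the conversation with the model's chat template
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
generated_ids = model.generate(tokenized_chat, max_new_tokens=1024)
# decode only the newly generated tokens, not the prompt
response = tokenizer.decode(generated_ids[0][tokenized_chat.shape[1]:], skip_special_tokens=True)
print(response)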
@@ -161,6 +163,7 @@ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.i
#### Ollama inference
First install Ollama:
```bash
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
@@ -199,17 +202,14 @@ stream = ollama.chat(
for chunk in stream:
print(chunk['message']['content'], end='', flush=True)
```
#### vLLM inference
We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
Refer to [installation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest vLLM code.
```bash
git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
cd vllm
python use_existing_torch.py
pip install -r requirements-build.txt
pip install -e . --no-build-isolation
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
inference code:
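The inference code itself is elided from this hunk; a minimal offline sketch with vLLM's `LLM` API (the prompt and sampling values are illustrative assumptions):
```python
from vllm import LLM, SamplingParams

# trust_remote_code is required for InternLM3's custom modeling code.
llm = LLM(model="internlm/internlm3-8b-instruct", trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=1024)

# chat() applies the model's chat template before generating.
outputs = llm.chat(
    [{"role": "user", "content": "Please tell me five scenic spots in Shanghai."}],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```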
@@ -306,8 +306,9 @@ Focus on clear, logical progression of ideas and thorough explanation of your ma
#### Transformers inference
```python
import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
from transformers import AutoTokenizer, AutoModelForCausalLM
model_dir = "internlm/internlm3-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
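# --- a hedged streaming sketch (assumes a tokenized prompt such as `tokenized_chat` above) ---
# Long thinking-mode outputs are easier to follow when streamed token by token:
# from transformers import TextStreamer
# streamer = TextStreamer(tokenizer, skip_prompt=True)
# model.generate(tokenized_chat, max_new_tokens=8192, streamer=streamer)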
@@ -403,14 +404,10 @@ for chunk in stream:
#### vLLM inference
We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
Refer to [installation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest vLLM code.
```bash
git clone https://github.com/RunningLeon/vllm.git
# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
cd vllm
python use_existing_torch.py
pip install -r requirements-build.txt
pip install -e . --no-build-isolation
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
inference code
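The inference code is elided here as well; as a hedged alternative to the offline API, the same wheel can serve an OpenAI-compatible endpoint (the serve command `vllm serve internlm/internlm3-8b-instruct --trust-remote-code` and port 8000 below are assumptions, not taken from this diff):
```python
from openai import OpenAI

# vLLM's server speaks the OpenAI chat API; the key is a placeholder for local use.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",
    messages=[{"role": "user", "content": "Please tell me five scenic spots in Shanghai."}],
)
print(resp.choices[0].message.content)
```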
@@ -474,25 +471,26 @@ InternLM3 supports a deep thinking mode for solving complex reasoning tasks via long chains of thought
We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/), covering five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
| Benchmark\Model | | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(close source) |
| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | ------------------------- |
| General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0 |
| | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7 |
| | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1 |
| Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9 |
| | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2 |
| | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5 |
| | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2 |
| MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0 |
| | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3 |
| Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8 |
| | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6 |
| Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7 |
| Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7 |
| Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7 |
| | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3 |
| | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87 |
| | Benchmark\Model | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(closed source) |
| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | -------------------------- |
| General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0 |
| | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7 |
| | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1 |
| Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9 |
| | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2 |
| | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5 |
| | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2 |
| MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0 |
| | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3 |
| Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8 |
| | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6 |
| Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7 |
| Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7 |
| Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7 |
| | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3 |
| | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87 |
- Values in bold indicate the highest among the compared open-source models.
- The evaluation results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with `*` were evaluated in deep thinking mode); see the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/) for test details.
- Scores may differ across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest [OpenCompass](https://github.com/internLM/OpenCompass/) evaluation results.
@@ -515,8 +513,9 @@ transformers >= 4.48
```python
import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
from transformers import AutoTokenizer, AutoModelForCausalLM
model_dir = "internlm/internlm3-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
@@ -634,17 +633,13 @@ for chunk in stream:
####
##### vLLM inference
We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest vLLM code.
```python
git clone https://github.com/RunningLeon/vllm.git
# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
cd vllm
python use_existing_torch.py
pip install -r requirements-build.txt
pip install -e . --no-build-isolation
```bash
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
Inference code
@@ -740,11 +735,12 @@ Focus on clear, logical progression of ideas and thorough explanation of your ma
```python
import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
from transformers import AutoTokenizer, AutoModelForCausalLM
model_dir = "internlm/internlm3-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.float16).cuda()
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
# InternLM3 8B in 4bit will cost nearly 8GB GPU memory.
# pip install -U bitsandbytes
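# A hedged 4-bit loading sketch (assumes a recent transformers + bitsandbytes; values are illustrative):
# from transformers import BitsAndBytesConfig
# quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
# model = AutoModelForCausalLM.from_pretrained(
#     model_dir, trust_remote_code=True, quantization_config=quant_config, device_map="auto"
# )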
@@ -837,15 +833,10 @@ for chunk in stream:
##### vLLM inference
We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest vLLM code.
```python
git clone https://github.com/RunningLeon/vllm.git
# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
cd vllm
python use_existing_torch.py
pip install -r requirements-build.txt
pip install -e . --no-build-isolation
```bash
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
Inference code
@@ -895,5 +886,4 @@ print(outputs)
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```