From 03ffaab065ed738d400302ed1142e2f90704f1a1 Mon Sep 17 00:00:00 2001
From: ai-modelscope
Date: Wed, 26 Feb 2025 21:06:53 +0800
Subject: [PATCH] Update modeling_internlm3.py (#18)

- Update modeling_internlm3.py (94cd46f35e87e1b3b2b82df73230bdb5275cd652)
- Update tokenization_internlm3.py (0f3d7019880c0b6f7a9d35b392d21cbfca07478b)
---
 README.md | 148 +++++++++++++++++++++++++-----------------------
 1 file changed, 69 insertions(+), 79 deletions(-)

diff --git a/README.md b/README.md
index 10227fb..58ca276 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
+pipeline_tag: text-generation
 ---
-
 # InternLM

@@ -23,7 +23,7 @@ license: apache-2.0
 [![evaluation](https://github.com/InternLM/InternLM/assets/22529082/f80a2a58-5ddf-471a-8da4-32ab65c8fd3b)](https://github.com/internLM/OpenCompass/)
 
-[💻Github Repo](https://github.com/InternLM/InternLM) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) • [📜Technical Report](https://arxiv.org/abs/2403.17297)
+[💻Github Repo](https://github.com/InternLM/InternLM) • [🤗Demo](https://huggingface.co/spaces/internlm/internlm3-8b-instruct) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) • [📜Technical Report](https://arxiv.org/abs/2403.17297)

@@ -48,25 +48,26 @@ InternLM3 supports both the deep thinking mode for solving complicated reasoning
 We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results; you can visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
 
-| Benchmark    |                                 | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(close source) |
-| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | ------------------------- |
-| General      | CMMLU(0-shot)                   | **83.1**              | 75.8                | 53.9                 | 66.0                      |
-|              | MMLU(0-shot)                    | 76.6                  | **76.8**            | 71.8                 | 82.7                      |
-|              | MMLU-Pro(0-shot)                | **57.6**              | 56.2                | 48.1                 | 64.1                      |
-| Reasoning    | GPQA-Diamond(0-shot)            | **37.4**              | 33.3                | 24.2                 | 42.9                      |
-|              | DROP(0-shot)                    | **83.1**              | 80.4                | 81.6                 | 85.2                      |
-|              | HellaSwag(10-shot)              | **91.2**              | 85.3                | 76.7                 | 89.5                      |
-|              | KOR-Bench(0-shot)               | **56.4**              | 44.6                | 47.7                 | 58.2                      |
-| MATH         | MATH-500(0-shot)                | **83.0***             | 72.4                | 48.4                 | 74.0                      |
-|              | AIME2024(0-shot)                | **20.0***             | 16.7                | 6.7                  | 13.3                      |
-| Coding       | LiveCodeBench(2407-2409 Pass@1) | **17.8**              | 16.8                | 12.9                 | 21.8                      |
-|              | HumanEval(Pass@1)               | 82.3                  | **85.4**            | 72.0                 | 86.6                      |
-| Instrunction | IFEval(Prompt-Strict)           | **79.3**              | 71.7                | 75.2                 | 79.7                      |
-| Long Context | RULER(4-128K Average)           | 87.9                  | 81.4                | **88.5**             | 90.7                      |
-| Chat         | AlpacaEval 2.0(LC WinRate)      | **51.1**              | 30.3                | 25.0                 | 50.7                      |
-|              | WildBench(Raw Score)            | **33.1**              | 23.3                | 1.5                  | 40.3                      |
-|              | MT-Bench-101(Score 1-10)        | **8.59**              | 8.49                | 8.37                 | 8.87                      |
+|              | Benchmark                       | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(closed source) |
+| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | -------------------------- |
+| General      | CMMLU(0-shot)                   | **83.1**              | 75.8                | 53.9                 | 66.0                       |
+|              | MMLU(0-shot)                    | 76.6                  | **76.8**            | 71.8                 | 82.7                       |
+|              | MMLU-Pro(0-shot)                | **57.6**              | 56.2                | 48.1                 | 64.1                       |
+| Reasoning    | GPQA-Diamond(0-shot)            | **37.4**              | 33.3                | 24.2                 | 42.9                       |
+|              | DROP(0-shot)                    | **83.1**              | 80.4                | 81.6                 | 85.2                       |
+|              | HellaSwag(10-shot)              | **91.2**              | 85.3                | 76.7                 | 89.5                       |
+|              | KOR-Bench(0-shot)               | **56.4**              | 44.6                | 47.7                 | 58.2                       |
+| MATH         | MATH-500(0-shot)                | **83.0***             | 72.4                | 48.4                 | 74.0                       |
+|              | AIME2024(0-shot)                | **20.0***             | 16.7                | 6.7                  | 13.3                       |
+| Coding       | LiveCodeBench(2407-2409 Pass@1) | **17.8**              | 16.8                | 12.9                 | 21.8                       |
+|              | HumanEval(Pass@1)               | 82.3                  | **85.4**            | 72.0                 | 86.6                       |
+| Instruction  | IFEval(Prompt-Strict)           | **79.3**              | 71.7                | 75.2                 | 79.7                       |
+| Long Context | RULER(4-128K Average)           | 87.9                  | 81.4                | **88.5**             | 90.7                       |
+| Chat         | AlpacaEval 2.0(LC WinRate)      | **51.1**              | 30.3                | 25.0                 | 50.7                       |
+|              | WildBench(Raw Score)            | **33.1**              | 23.3                | 1.5                  | 40.3                       |
+|              | MT-Bench-101(Score 1-10)        | **8.59**              | 8.49                | 8.37                 | 8.87                       |
+- Values in bold indicate the **highest** scores among the open-source models.
 - The evaluation results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (results marked with * were evaluated in deep thinking mode); the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
 - Scores may differ across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to its latest evaluation results.
 
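+For a quick local sanity check before reproducing any of these numbers, the model can be exercised through the high-level `transformers` pipeline (a minimal sketch, not the official evaluation setup; it assumes `transformers >= 4.48`, `accelerate`, and a GPU with enough memory for bf16 weights):
+
+```python
+import torch
+from transformers import pipeline
+
+# chat-style text-generation pipeline; trust_remote_code pulls in the
+# custom InternLM3 modeling code from the hub
+pipe = pipeline(
+    "text-generation",
+    model="internlm/internlm3-8b-instruct",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+messages = [{"role": "user", "content": "What is 15 * 17?"}]
+out = pipe(messages, max_new_tokens=64)
+# the pipeline echoes the conversation; the last message is the model's reply
+print(out[0]["generated_text"][-1]["content"])
+```
+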
@@ -85,8 +86,9 @@ To load the InternLM3 8B Instruct model using Transformers, use the following co
 ```python
 import torch
-from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
-model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_dir = "internlm/internlm3-8b-instruct"
 tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
 # Set `torch_dtype=torch.float16` to load the model in float16; otherwise it will be loaded as float32 and might cause an OOM error.
 model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
@@ -161,6 +163,7 @@ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.i
 #### Ollama inference
 
 First install Ollama:
+
 ```bash
 # install ollama
 curl -fsSL https://ollama.com/install.sh | sh
@@ -199,17 +202,14 @@ stream = ollama.chat(
 for chunk in stream:
     print(chunk['message']['content'], end='', flush=True)
 ```
+
+
 #### vLLM inference
 
-We are still working on merging the PR(https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
+Refer to the [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest version of vLLM.
 
 ```bash
-git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
-# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
-cd vllm
-python use_existing_torch.py
-pip install -r requirements-build.txt
-pip install -e . --no-build-isolatio
+pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 inference code:
 
@@ -306,8 +306,9 @@ Focus on clear, logical progression of ideas and thorough explanation of your ma
 #### Transformers inference
 ```python
 import torch
-from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
-model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_dir = "internlm/internlm3-8b-instruct"
 tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
 # Set `torch_dtype=torch.float16` to load the model in float16; otherwise it will be loaded as float32 and might cause an OOM error.
 model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
@@ -403,14 +404,10 @@ for chunk in stream:
 #### vLLM inference
 
-We are still working on merging the PR(https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
+Refer to the [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest version of vLLM.
+
 ```bash
-git clone https://github.com/RunningLeon/vllm.git
-# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
-cd vllm
-python use_existing_torch.py
-pip install -r requirements-build.txt
-pip install -e . --no-build-isolatio
+pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
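+Once installed, vLLM can also expose the model through an OpenAI-compatible HTTP server as an alternative to offline inference (a sketch; `vllm serve` and the default `http://localhost:8000/v1` endpoint are standard vLLM behavior, but check the vLLM documentation for your version):
+
+```python
+# first, start the server in a separate shell:
+#   vllm serve internlm/internlm3-8b-instruct --trust-remote-code
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+response = client.chat.completions.create(
+    model="internlm/internlm3-8b-instruct",
+    messages=[{"role": "user", "content": "Please tell me five scenic spots in Shanghai"}],
+    max_tokens=256,
+)
+print(response.choices[0].message.content)
+```
+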
 inference code:
 
@@ -474,25 +471,26 @@ InternLM3 supports the deep thinking mode for solving complex reasoning tasks via long chains of thought
 We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/), covering five capability dimensions: disciplinary, language, knowledge, reasoning, and comprehension. Some of the results are shown below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
 
-| Benchmark\Model |                                 | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(close source) |
-| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | ------------------------- |
-| General      | CMMLU(0-shot)                   | **83.1**              | 75.8                | 53.9                 | 66.0                      |
-|              | MMLU(0-shot)                    | 76.6                  | **76.8**            | 71.8                 | 82.7                      |
-|              | MMLU-Pro(0-shot)                | **57.6**              | 56.2                | 48.1                 | 64.1                      |
-| Reasoning    | GPQA-Diamond(0-shot)            | **37.4**              | 33.3                | 24.2                 | 42.9                      |
-|              | DROP(0-shot)                    | **83.1**              | 80.4                | 81.6                 | 85.2                      |
-|              | HellaSwag(10-shot)              | **91.2**              | 85.3                | 76.7                 | 89.5                      |
-|              | KOR-Bench(0-shot)               | **56.4**              | 44.6                | 47.7                 | 58.2                      |
-| MATH         | MATH-500(0-shot)                | **83.0***             | 72.4                | 48.4                 | 74.0                      |
-|              | AIME2024(0-shot)                | **20.0***             | 16.7                | 6.7                  | 13.3                      |
-| Coding       | LiveCodeBench(2407-2409 Pass@1) | **17.8**              | 16.8                | 12.9                 | 21.8                      |
-|              | HumanEval(Pass@1)               | 82.3                  | **85.4**            | 72.0                 | 86.6                      |
-| Instrunction | IFEval(Prompt-Strict)           | **79.3**              | 71.7                | 75.2                 | 79.7                      |
-| LongContext  | RULER(4-128K Average)           | 87.9                  | 81.4                | **88.5**             | 90.7                      |
-| Chat         | AlpacaEval 2.0(LC WinRate)      | **51.1**              | 30.3                | 25.0                 | 50.7                      |
-|              | WildBench(Raw Score)            | **33.1**              | 23.3                | 1.5                  | 40.3                      |
-|              | MT-Bench-101(Score 1-10)        | **8.59**              | 8.49                | 8.37                 | 8.87                      |
+|              | Benchmark\Model                 | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(closed source) |
+| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | -------------------------- |
+| General      | CMMLU(0-shot)                   | **83.1**              | 75.8                | 53.9                 | 66.0                       |
+|              | MMLU(0-shot)                    | 76.6                  | **76.8**            | 71.8                 | 82.7                       |
+|              | MMLU-Pro(0-shot)                | **57.6**              | 56.2                | 48.1                 | 64.1                       |
+| Reasoning    | GPQA-Diamond(0-shot)            | **37.4**              | 33.3                | 24.2                 | 42.9                       |
+|              | DROP(0-shot)                    | **83.1**              | 80.4                | 81.6                 | 85.2                       |
+|              | HellaSwag(10-shot)              | **91.2**              | 85.3                | 76.7                 | 89.5                       |
+|              | KOR-Bench(0-shot)               | **56.4**              | 44.6                | 47.7                 | 58.2                       |
+| MATH         | MATH-500(0-shot)                | **83.0***             | 72.4                | 48.4                 | 74.0                       |
+|              | AIME2024(0-shot)                | **20.0***             | 16.7                | 6.7                  | 13.3                       |
+| Coding       | LiveCodeBench(2407-2409 Pass@1) | **17.8**              | 16.8                | 12.9                 | 21.8                       |
+|              | HumanEval(Pass@1)               | 82.3                  | **85.4**            | 72.0                 | 86.6                       |
+| Instruction  | IFEval(Prompt-Strict)           | **79.3**              | 71.7                | 75.2                 | 79.7                       |
+| Long Context | RULER(4-128K Average)           | 87.9                  | 81.4                | **88.5**             | 90.7                       |
+| Chat         | AlpacaEval 2.0(LC WinRate)      | **51.1**              | 30.3                | 25.0                 | 50.7                       |
+|              | WildBench(Raw Score)            | **33.1**              | 23.3                | 1.5                  | 40.3                       |
+|              | MT-Bench-101(Score 1-10)        | **8.59**              | 8.49                | 8.37                 | 8.87                       |
+- Values in bold indicate the **highest** scores among the open-source models compared.
 - The evaluation results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (results marked with * were evaluated in deep thinking mode); the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
 - Scores may differ across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to its latest evaluation results.
 
@@ -515,8 +513,9 @@ transformers >= 4.48
 ```python
 import torch
-from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
-model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_dir = "internlm/internlm3-8b-instruct"
 tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
 # Set `torch_dtype=torch.float16` to load the model in float16; otherwise it will be loaded as float32 and might cause an OOM error.
 model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
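+# --- Optional: stream the reply token by token (a sketch added for
+# illustration; it reuses the `model` and `tokenizer` loaded above and
+# transformers' built-in TextIteratorStreamer) ---
+from threading import Thread
+from transformers import TextIteratorStreamer
+
+messages = [{"role": "user", "content": "Please tell me five scenic spots in Shanghai"}]
+inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+# run generate in a background thread so the stream can be consumed as it arrives
+Thread(target=model.generate, kwargs=dict(input_ids=inputs, streamer=streamer, max_new_tokens=256)).start()
+for text in streamer:
+    print(text, end="", flush=True)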
@@ -634,17 +633,13 @@ for chunk in stream:
     print(chunk['message']['content'], end='', flush=True)
 ```
 
 ####
+
 ##### vLLM inference
 
-We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. For now, please install it manually from the PR branch below.
+Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest version of vLLM.
 
-```python
-git clone https://github.com/RunningLeon/vllm.git
-# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
-cd vllm
-python use_existing_torch.py
-pip install -r requirements-build.txt
-pip install -e . --no-build-isolatio
+```bash
+pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 Inference code:
 
@@ -740,11 +735,12 @@ Focus on clear, logical progression of ideas and thorough explanation of your ma
 ```python
 import torch
-from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
-model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm3-8b-instruct')
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_dir = "internlm/internlm3-8b-instruct"
 tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
 # Set `torch_dtype=torch.float16` to load the model in float16; otherwise it will be loaded as float32 and might cause an OOM error.
-model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.float16).cuda()
+model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
 # (Optional) On low-resource devices, you can load the model in 4-bit or 8-bit via bitsandbytes to further save GPU memory.
 # InternLM3 8B in 4-bit takes nearly 8GB of GPU memory.
 # pip install -U bitsandbytes
@@ -837,15 +833,10 @@ for chunk in stream:
 ##### vLLM inference
 
-We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. For now, please install it manually from the PR branch below.
+Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest version of vLLM.
 
-```python
-git clone https://github.com/RunningLeon/vllm.git
-# and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
-cd vllm
-python use_existing_torch.py
-pip install -r requirements-build.txt
-pip install -e . --no-build-isolatio
+```bash
+pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 Inference code:
 
@@ -895,5 +886,4 @@ print(outputs)
   archivePrefix={arXiv},
   primaryClass={cs.CL}
 }
-```
-
+```
\ No newline at end of file