Updated max_tokens descriptions (#6751)

### What problem does this PR solve?

#6721 

### Type of change


- [x] Documentation Update
writinwaters 2025-04-02 13:56:55 +08:00 committed by GitHub
parent fc02929946
commit 2471a6e115
6 changed files with 30 additions and 17 deletions

View File

@@ -33,7 +33,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
@@ -52,9 +52,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
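The **Freedom** presets in the hunk above bundle the four sampler settings into a single control. As a rough sketch only: the **Precise** row below reuses the defaults documented in this file (0.1 / 0.3 / 0.4 / 0.7, since **Precise** is the default preset), while the **Improvise** and **Balance** values are hypothetical placeholders, not RAGFlow's actual numbers.

```python
# Illustrative "Freedom" preset mapping. Only the "Precise" row is grounded in
# the documented defaults; the other two rows are invented for illustration.
FREEDOM_PRESETS = {
    "Improvise": {"temperature": 0.9, "top_p": 0.9, "presence_penalty": 0.2, "frequency_penalty": 0.3},
    "Precise":   {"temperature": 0.1, "top_p": 0.3, "presence_penalty": 0.4, "frequency_penalty": 0.7},
    "Balance":   {"temperature": 0.5, "top_p": 0.6, "presence_penalty": 0.3, "frequency_penalty": 0.5},
}

def sampler_settings(preset: str) -> dict:
    """Resolve a Freedom preset to the sampler settings it bundles."""
    return FREEDOM_PRESETS[preset]
```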

View File

@@ -34,7 +34,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
@@ -53,9 +53,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
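For intuition on the two penalty settings described above, here is a minimal sketch of the widely used logit-adjustment rule (the formula OpenAI documents for its samplers; the providers RAGFlow connects to may implement penalties differently).

```python
from collections import Counter

def penalize(logits: dict[str, float], generated: list[str],
             presence_penalty: float = 0.4, frequency_penalty: float = 0.7) -> dict[str, float]:
    """Lower the scores of already-emitted tokens before the next sampling step.

    The frequency penalty grows with every repetition of a token, while the
    presence penalty is a flat, one-time cost once a token has appeared at all.
    """
    counts = Counter(generated)
    return {
        token: score
        - frequency_penalty * counts[token]        # scales with repetition count
        - presence_penalty * (counts[token] > 0)   # applied once per seen token
        for token, score in logits.items()
    }
```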

View File

@@ -32,7 +32,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
@@ -51,9 +51,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
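Among the settings bundled under **Freedom**, **Temperature** is the most direct randomness control. A minimal, framework-agnostic sketch of the standard logit-scaling it refers to (not RAGFlow-specific code): note how a temperature of zero collapses to always picking the top token, which is why it yields repeatable output for the same prompt.

```python
import math

def probabilities(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into sampling probabilities at a given temperature."""
    if temperature == 0:  # degenerate case: greedy and fully deterministic
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]   # low T sharpens, high T flattens
    peak = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```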

View File

@@ -48,10 +48,25 @@ You start an AI conversation by creating an assistant.
 4. Update **Model Setting**:
    - In **Model**: you select the chat model. Though you have selected the default chat model in **System Model Settings**, RAGFlow allows you to choose an alternative chat model for your dialogue.
-   - **Preset configurations** refers to the level that the LLM improvises. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
-   - **Temperature**: Level of the prediction randomness of the LLM. The higher the value, the more creative the LLM is.
-   - **Top P** is also known as "nucleus sampling". See [here](https://en.wikipedia.org/wiki/Top-p_sampling) for more information.
-   - **Max Tokens**: The maximum length of the LLM's responses. Note that the responses may be curtailed if this value is set too low.
+   - **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+     This parameter has three options:
+     - **Improvise**: Produces more creative responses.
+     - **Precise**: (Default) Produces more conservative responses.
+     - **Balance**: A middle ground between **Improvise** and **Precise**.
+   - **Temperature**: The randomness level of the model's output.
+     Defaults to 0.1.
+     - Lower values lead to more deterministic and predictable outputs.
+     - Higher values lead to more creative and varied outputs.
+     - A temperature of zero results in the same output for the same prompt.
+   - **Top P**: Nucleus sampling.
+     - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*.
+     - Defaults to 0.3.
+   - **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response.
+     - A higher **presence penalty** value results in the model being more likely to generate tokens not yet included in the generated text.
+     - Defaults to 0.4.
+   - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
+     - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
+     - Defaults to 0.7.
 5. Now, let's start the show:
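The **Top P** bullet added above describes standard nucleus sampling: rank tokens by probability, keep the smallest set whose cumulative probability exceeds the threshold *P*, and sample only from that set. A minimal sketch, independent of any particular provider:

```python
def nucleus(probs: dict[str, float], top_p: float = 0.3) -> dict[str, float]:
    """Restrict sampling to the smallest set of tokens whose cumulative
    probability exceeds top_p, then renormalize over that set."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:  # threshold P reached: cut off the tail
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}
```

With the default of 0.3, only a handful of the highest-probability tokens typically survive the cutoff, which matches the conservative behaviour of the **Precise** preset.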

View File

@@ -39,4 +39,4 @@ _After accepting the team invite, you should be able to view and update the team
 ## Leave a joined team
-![Image](https://github.com/user-attachments/assets/4e4c6971-131b-490b-85d8-b362e0811b86)
+![quit](https://github.com/user-attachments/assets/a9d812a9-382d-4913-83b9-d72cb5e7c953)

View File

@@ -11,6 +11,13 @@ Key features, improvements and bug fixes in the latest releases.
 Released on March 13, 2025.
 
+### Compatibility changes
+
+- Removes the **Max_tokens** setting from **Chat configuration**.
+- Removes the **Max_tokens** setting from the **Generate**, **Rewrite**, **Categorize**, and **Keyword** agent components.
+
+From this release onwards, if you still see RAGFlow's responses being cut short or truncated, check the **Max_tokens** setting of your model provider.
+
 ### Improvements
 - Adds OpenAI-compatible APIs.
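Since **Max_tokens** no longer lives in RAGFlow's own configuration, truncated responses are now addressed at the provider. A hedged sketch, assuming an OpenAI-compatible endpoint; the base URL, API key, and model name below are placeholders, not real values:

```python
from openai import OpenAI  # pip install openai

# Placeholder endpoint and credentials -- substitute your provider's values.
client = OpenAI(base_url="https://api.your-provider.example/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="your-chat-model",  # hypothetical model name
    messages=[{"role": "user", "content": "Explain presence penalty in one sentence."}],
    max_tokens=1024,  # provider-side output cap; raise it if replies are cut short
)
print(response.choices[0].message.content)
```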