DOC: Miscellaneous UI and editorial updates (#7324)

### What problem does this PR solve?



### Type of change


- [x] Documentation Update
writinwaters 2025-04-27 11:44:08 +08:00 committed by GitHub
parent 3da8776a3c
commit dadd8d9f94
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
16 changed files with 97 additions and 68 deletions


@ -99,7 +99,7 @@ def update():
    if req.get("parser_id", "") == "tag" and os.environ.get('DOC_ENGINE', "elasticsearch") == "infinity":
        return get_json_result(
            data=False,
            message='The chunking method Tag has not been supported by Infinity yet.',
            code=settings.RetCode.OPERATING_ERROR
        )


@ -26,6 +26,38 @@ The "garbage in garbage out" status quo remains unchanged despite the fact that
---
### Differences between RAGFlow full edition and RAGFlow slim edition?
Each RAGFlow release is available in two editions:
- **Slim edition**: excludes built-in embedding models and is identified by a **-slim** suffix added to the version name. Example: `infiniflow/ragflow:v0.18.0-slim`
- **Full edition**: includes built-in embedding models and has no suffix added to the version name. Example: `infiniflow/ragflow:v0.18.0`
---
### Which embedding models can be deployed locally?
RAGFlow offers two Docker image editions, `v0.18.0-slim` and `v0.18.0`:
- `infiniflow/ragflow:v0.18.0-slim` (default): The RAGFlow Docker image without embedding models.
- `infiniflow/ragflow:v0.18.0`: The RAGFlow Docker image with embedding models including:
- Built-in embedding models:
- `BAAI/bge-large-zh-v1.5`
- `BAAI/bge-reranker-v2-m3`
- `maidalun1020/bce-embedding-base_v1`
- `maidalun1020/bce-reranker-base_v1`
- Embedding models that will be downloaded once you select them in the RAGFlow UI:
- `BAAI/bge-base-en-v1.5`
- `BAAI/bge-large-en-v1.5`
- `BAAI/bge-small-en-v1.5`
- `BAAI/bge-small-zh-v1.5`
- `jinaai/jina-embeddings-v2-base-en`
- `jinaai/jina-embeddings-v2-small-en`
- `nomic-ai/nomic-embed-text-v1.5`
- `sentence-transformers/all-MiniLM-L6-v2`
---
### Where to find the version of RAGFlow? How to interpret it?
You can find the RAGFlow version number on the **System** page of the UI:
@ -55,6 +87,14 @@ Where:
---
### Differences between demo.ragflow.io and a locally deployed open-source RAGFlow service?
demo.ragflow.io demonstrates the capabilities of RAGFlow Enterprise. Its DeepDoc models are pre-trained using proprietary data and it offers much more sophisticated team permission controls. Essentially, demo.ragflow.io serves as a preview of RAGFlow's forthcoming SaaS (Software as a Service) offering.
You can deploy an open-source RAGFlow service and call it from a Python client or through RESTful APIs. However, this is not supported on demo.ragflow.io.
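As a minimal illustration of calling a locally deployed service over its RESTful API (the `/api/v1/datasets` path and `Bearer` header follow the HTTP API reference; the base URL and API key below are placeholders, and the helper name is ours, not part of the SDK):

```python
# Sketch only: composes the request for listing datasets on a local deployment.
# Pass the result to requests.get(**req) or any HTTP client of your choice.
def build_list_datasets_request(base_url: str, api_key: str) -> dict:
    return {
        "url": f"{base_url.rstrip('/')}/api/v1/datasets",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }

req = build_list_datasets_request("http://127.0.0.1", "YOUR_API_KEY")
```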
---
### Why does it take longer for RAGFlow to parse a document than LangChain?
We put painstaking effort into document pre-processing tasks like layout analysis, table structure recognition, and OCR (Optical Character Recognition) using our vision models. This contributes to the additional time required.
@ -73,29 +113,6 @@ We officially support x86 CPU and nvidia GPU. While we also test RAGFlow on ARM6
---
### Which embedding models can be deployed locally?
RAGFlow offers two Docker image editions, `v0.18.0-slim` and `v0.18.0`:
- `infiniflow/ragflow:v0.18.0-slim` (default): The RAGFlow Docker image without embedding models.
- `infiniflow/ragflow:v0.18.0`: The RAGFlow Docker image with embedding models including:
- Built-in embedding models:
- `BAAI/bge-large-zh-v1.5`
- `BAAI/bge-reranker-v2-m3`
- `maidalun1020/bce-embedding-base_v1`
- `maidalun1020/bce-reranker-base_v1`
- Embedding models that will be downloaded once you select them in the RAGFlow UI:
- `BAAI/bge-base-en-v1.5`
- `BAAI/bge-large-en-v1.5`
- `BAAI/bge-small-en-v1.5`
- `BAAI/bge-small-zh-v1.5`
- `jinaai/jina-embeddings-v2-base-en`
- `jinaai/jina-embeddings-v2-small-en`
- `nomic-ai/nomic-embed-text-v1.5`
- `sentence-transformers/all-MiniLM-L6-v2`
---
### Do you offer an API for integration with third-party applications?
The corresponding APIs are now available. See the [RAGFlow HTTP API Reference](./references/http_api_reference.md) or the [RAGFlow Python API Reference](./references/python_api_reference.md) for more information.


@ -71,7 +71,7 @@ As mentioned earlier, the **Begin** component is indispensable for an agent. Sti
### Is the uploaded file in a knowledge base?
No. Files uploaded to an agent as input are not stored in a knowledge base and hence will not be processed using RAGFlow's built-in OCR, DLR or TSR models, or chunked using RAGFlow's built-in chunking methods.
### How to upload a webpage or file from a URL?


@ -22,22 +22,22 @@ _Each time a knowledge base is created, a folder with the same name is generated
## Configure knowledge base
The following screenshot shows the configuration page of a knowledge base. A proper configuration of your knowledge base is crucial for future AI chats. For example, choosing the wrong embedding model or chunking method would cause unexpected semantic loss or mismatched answers in chats.
![knowledge base configuration](https://github.com/infiniflow/ragflow/assets/93570324/384c671a-8b9c-468c-b1c9-1401128a9b65)
This section covers the following topics:
- Select chunking method
- Select embedding model
- Upload file
- Parse file
- Intervene with file parsing results
- Run retrieval testing
### Select chunking method
RAGFlow offers multiple chunking templates to facilitate chunking files of different layouts and ensure semantic integrity. In **Chunking method**, you can choose the default template that suits the layouts and formats of your files. The following table shows the descriptions and the compatible file formats of each supported chunking template:
| **Template** | Description | File format |
|--------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
@ -54,9 +54,9 @@ RAGFlow offers multiple chunking template to facilitate chunking files of differ
| One | Each document is chunked in its entirety (as one). | DOCX, XLSX, XLS (Excel97~2003), PDF, TXT |
| Tag | The knowledge base functions as a tag set for the others. | XLSX, CSV/TXT |
You can also change a file's chunking method on the **Datasets** page.
![change chunking method](https://github.com/infiniflow/ragflow/assets/93570324/ac116353-2793-42b2-b181-65e7082bed42)
### Select embedding model
@ -76,13 +76,13 @@ While uploading files directly to a knowledge base seems more convenient, we *hi
### Parse file
File parsing is a crucial topic in knowledge base configuration. The meaning of file parsing in RAGFlow is twofold: chunking files based on file layout and building embedding and full-text (keyword) indexes on these chunks. After selecting the chunking method and embedding model, you can start parsing a file:
![parse file](https://github.com/infiniflow/ragflow/assets/93570324/5311f166-6426-447f-aa1f-bd488f1cfc7b)
- Click the play button next to **UNSTART** to start file parsing.
- If your file parsing stalls for a long time, click the red-cross icon and then refresh.
- As shown above, RAGFlow allows you to use a different chunking method for a particular file, offering flexibility beyond the default method.
- As shown above, RAGFlow allows you to enable or disable individual files, offering finer control over knowledge base-based AI chats.
### Intervene with file parsing results


@ -9,7 +9,7 @@ Generate a knowledge graph for your knowledge base.
---
To enhance multi-hop question-answering, RAGFlow adds a knowledge graph construction step between data extraction and indexing, as illustrated below. This step creates additional chunks from existing ones generated by your specified chunking method.
![Image](https://github.com/user-attachments/assets/1ec21d8e-f255-4d65-9918-69b72dfa142b)


@ -67,8 +67,8 @@ It defaults to 0.1, with a maximum limit of 1. A higher **Threshold** means fewe
### Max cluster
The maximum number of clusters to create. Defaults to 64, with a maximum limit of 1024.
### Random seed
A random seed. Click **+** to change the seed value.


@ -11,7 +11,7 @@ Conduct a retrieval test on your knowledge base to check whether the intended ch
After your files are uploaded and parsed, it is recommended that you run a retrieval test before proceeding with the chat assistant configuration. Running a retrieval test is *not* an unnecessary or superfluous step at all! Just like fine-tuning a precision instrument, RAGFlow requires careful tuning to deliver optimal question answering performance. Your knowledge base settings, chat assistant configurations, and the specified large and small models can all significantly impact the final results. Running a retrieval test verifies whether the intended chunks can be recovered, allowing you to quickly identify areas for improvement or pinpoint any issue that needs addressing. For instance, when debugging your question answering system, if you know that the correct chunks can be retrieved, you can focus your efforts elsewhere. For example, in issue [#5627](https://github.com/infiniflow/ragflow/issues/5627), the problem was found to be due to the LLM's limitations.
During a retrieval test, chunks created from your specified chunking method are retrieved using a hybrid search. This search combines weighted keyword similarity with either weighted vector cosine similarity or a weighted reranking score, depending on your settings:
- If no rerank model is selected, weighted keyword similarity will be combined with weighted vector cosine similarity.
- If a rerank model is selected, weighted keyword similarity will be combined with the weighted reranking score.
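The weighted blend used by the hybrid search can be sketched as follows (the 0.7/0.3 split is an illustrative default, not RAGFlow's fixed setting; the actual weights come from your retrieval configuration):

```python
def hybrid_score(keyword_sim: float, semantic_sim: float,
                 keyword_weight: float = 0.7) -> float:
    """Blend weighted keyword similarity with either vector cosine similarity
    (no reranker) or a reranking score (reranker selected), both in [0, 1]."""
    return keyword_weight * keyword_sim + (1.0 - keyword_weight) * semantic_sim
```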


@ -32,7 +32,7 @@ The page rank value must be an integer. Range: [0,100]
If you set the page rank value to a non-integer, say 1.7, it will be rounded down to the nearest integer, which in this case is 1.
:::
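The rounding-down and range behavior above amounts to (hypothetical helper name, illustrating the documented [0,100] integer range):

```python
def normalize_page_rank(value: float) -> int:
    """Round a page rank down to the nearest integer (e.g. 1.7 -> 1) and
    clamp it to the accepted range [0, 100]."""
    return max(0, min(100, int(value)))
```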
## Scoring mechanism
If you configure a chat assistant's **similarity threshold** to 0.2, only chunks with a hybrid score greater than 0.2 x 100 = 20 will be retrieved and sent to the chat model for content generation. This initial filtering step is crucial for narrowing down relevant information.
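The initial filtering step can be sketched as follows (the chunk structure and `score` field name are illustrative, not RAGFlow's internal representation):

```python
def filter_by_threshold(chunks: list[dict], similarity_threshold: float) -> list[dict]:
    """Keep only chunks whose hybrid score (on a 0-100 scale) exceeds
    similarity_threshold x 100, e.g. 0.2 -> 20."""
    cutoff = similarity_threshold * 100
    return [c for c in chunks if c["score"] > cutoff]
```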


@ -42,7 +42,7 @@ As a rule of thumb, consider including the following entries in your tag table:
### Create a tag set
1. Click **+ Create knowledge base** to create a knowledge base.
2. Navigate to the **Configuration** page of the created knowledge base and choose **Tag** as the default chunking method.
3. Navigate to the **Dataset** page and upload and parse your table file in XLSX, CSV, or TXT formats.
_A tag cloud appears under the **Tag view** section, indicating the tag set is created:_
![Image](https://github.com/user-attachments/assets/abefbcbf-c130-4abe-95e1-267b0d2a0505)


@ -229,7 +229,9 @@ This section provides instructions on setting up the RAGFlow server on Linux. If
* Running on all addresses (0.0.0.0)
```
:::danger IMPORTANT
If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a `network anomaly` error because, at that moment, your RAGFlow may not be fully initialized.
:::
5. In your web browser, enter the IP address of your server and log in to RAGFlow.
@ -285,7 +287,7 @@ To create your first knowledge base:
![knowledge base configuration](https://github.com/infiniflow/ragflow/assets/93570324/384c671a-8b9c-468c-b1c9-1401128a9b65)
3. RAGFlow offers multiple chunking templates that cater to different document layouts and file formats. Select the embedding model and chunking method (template) for your knowledge base.
:::danger IMPORTANT
Once you have selected an embedding model and used it to parse a file, you are no longer allowed to change it. The obvious reason is that we must ensure that all files in a specific knowledge base are parsed using the *same* embedding model (ensure that they are being compared in the same embedding space).


@ -519,7 +519,7 @@ A `Document` object contains the following attributes:
- `name`: The document name. Defaults to `""`.
- `thumbnail`: The thumbnail image of the document. Defaults to `None`.
- `dataset_id`: The dataset ID associated with the document. Defaults to `None`.
- `chunk_method`: The chunking method name. Defaults to `"naive"`.
- `source_type`: The source type of the document. Defaults to `"local"`.
- `type`: Type or category of the document. Defaults to `""`. Reserved for future use.
- `created_by`: `str` The creator of the document. Defaults to `""`.


@ -7,14 +7,24 @@ slug: /release_notes
Key features, improvements and bug fixes in the latest releases.
:::info
Each RAGFlow release is available in two editions:
- **Slim edition**: excludes built-in embedding models and is identified by a **-slim** suffix added to the version name. Example: `infiniflow/ragflow:v0.18.0-slim`
- **Full edition**: includes built-in embedding models and has no suffix added to the version name. Example: `infiniflow/ragflow:v0.18.0`
:::
## v0.18.0
Released on April 23, 2025.
### Compatibility changes
From this release onwards, built-in rerank models have been removed because they have minimal impact on retrieval rates but significantly increase retrieval time.
### New features
- MCP server: enables access to RAGFlow's knowledge bases via MCP.
- DeepDoc supports using a VLM in its document layout recognition pipeline, enabling in-depth analysis of images in PDF and DOCX files.
- OpenAI-compatible APIs: Agents can be called via OpenAI-compatible APIs.
- User registration control: administrators can enable or disable user registration through an environment variable.
- Team collaboration: Agents can be shared with team members.
@ -54,7 +64,7 @@ From this release onwards, if you still see RAGFlow's responses being cut short
- Accelerates knowledge graph extraction.
- Enables Tavily-based web search in the **Retrieval** agent component.
- Adds Tongyi-Qianwen QwQ models (OpenAI-compatible).
- Supports CSV files in the **General** chunking method.
### Fixed issues
@ -317,7 +327,7 @@ Released on October 31, 2024.
- Adds the team management functionality for all users.
- Updates the Agent UI to improve usability.
- Adds support for Markdown chunking in the **General** chunking method.
- Introduces an **invoke** tool within the Agent UI.
- Integrates support for Dify's knowledge base API.
- Adds support for GLM4-9B and Yi-Lightning models.
@ -349,7 +359,7 @@ Released on September 30, 2024.
- Improves the results of multi-round dialogues.
- Enables users to remove added LLM vendors.
- Adds support for **OpenTTS** and **SparkTTS** models.
- Implements an **Excel to HTML** toggle in the **General** chunking method, allowing users to parse a spreadsheet into either HTML tables or key-value pairs by row.
- Adds agent tools **YahooFinance** and **Jin10**.
- Adds an investment advisor agent template.
@ -410,7 +420,7 @@ Released on August 6, 2024.
### New features
- Supports GraphRAG as a chunking method.
- Introduces Agent component **Keyword** and search tools, including **Baidu**, **DuckDuckGo**, **PubMed**, **Wikipedia**, **Bing**, and **Google**.
- Supports speech-to-text recognition for audio files.
- Supports model vendors **Gemini** and **Groq**.
@ -425,8 +435,8 @@ Released on July 8, 2024.
- Supports Agentic RAG, enabling graph-based workflow construction for RAG and agents.
- Supports model vendors **Mistral**, **MiniMax**, **Bedrock**, and **Azure OpenAI**.
- Supports DOCX files in the MANUAL chunking method.
- Supports DOCX, MD, and PDF files in the Q&A chunking method.
## v0.7.0
@ -438,7 +448,7 @@ Released on May 31, 2024.
- Integrates reranker and embedding models: [BCE](https://github.com/netease-youdao/BCEmbedding), [BGE](https://github.com/FlagOpen/FlagEmbedding), and [Jina](https://jina.ai/embeddings/).
- Supports LLMs Baichuan and VolcanoArk.
- Implements [RAPTOR](https://arxiv.org/html/2401.18059v1) for improved text retrieval.
- Supports HTML files in the GENERAL chunking method.
- Provides HTTP and Python APIs for deleting documents by ID.
- Supports ARM64 platforms.
@ -467,7 +477,7 @@ Released on May 21, 2024.
- Supports streaming output.
- Provides HTTP and Python APIs for retrieving document chunks.
- Supports monitoring of system components, including Elasticsearch, MySQL, Redis, and MinIO.
- Supports disabling **Layout Recognition** in the GENERAL chunking method to reduce file chunking time.
### Related APIs


@ -100,7 +100,7 @@ export default {
webCrawl: 'Web Crawl',
chunkNumber: 'Chunk Number',
uploadDate: 'Upload Date',
chunkMethod: 'Chunking method',
enabled: 'Enable',
disabled: 'Disable',
action: 'Action',
@ -166,7 +166,7 @@ export default {
delimiterTip:
'A delimiter or separator can consist of one or multiple special characters. If it is multiple characters, ensure they are enclosed in backticks( ``). For example, if you configure your delimiters like this: \\n`##`;, then your texts will be separated at line breaks, double hash symbols (##), and semicolons.',
html4excel: 'Excel to HTML',
html4excelTip: `Use with the General chunking method. When disabled, spreadsheets (XLSX or XLS(Excel97~2003)) in the knowledge base will be parsed into key-value pairs. When enabled, they will be parsed into HTML tables, splitting every 12 rows if the original table has more than 12 rows.`,
autoKeywords: 'Auto-keyword',
autoKeywordsTip: `Automatically extract N keywords for each chunk to increase their ranking for queries containing those keywords. Be aware that extra tokens will be consumed by the chat model specified in 'System model settings'. You can check or update the added keywords for a chunk from the chunk list. `,
autoQuestions: 'Auto-question',
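The delimiter rule from `delimiterTip` above (single characters are delimiters on their own; backtick-enclosed runs are multi-character delimiters) can be sketched in Python (function names are illustrative, not RAGFlow's implementation):

```python
import re

def parse_delimiters(config: str) -> list[str]:
    """Parse a delimiter config such as '\n`##`;' into ['\n', '##', ';']."""
    return [multi or single
            for multi, single in re.findall(r"`([^`]+)`|(.)", config, flags=re.S)]

def split_text(text: str, config: str) -> list[str]:
    """Split text at any configured delimiter, dropping empty pieces."""
    pattern = "|".join(re.escape(d) for d in parse_delimiters(config))
    return [piece for piece in re.split(pattern, text) if piece]
```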
@ -201,7 +201,7 @@ export default {
},
knowledgeConfiguration: {
titleDescription:
'Update your knowledge base configuration here, particularly the chunking method.',
name: 'Knowledge base name',
photo: 'Knowledge base photo',
description: 'Description',
@ -218,19 +218,19 @@ export default {
"If it is set to 'Team', all your team members will be able to manage the knowledge base.", "If it is set to 'Team', all your team members will be able to manage the knowledge base.",
chunkTokenNumberTip: chunkTokenNumberTip:
'It kind of sets the token threshold for a creating a chunk. A segment with fewer tokens than this threshold will be combined with the following segments until the token count exceeds the threshold, at which point a chunk is created. No new chunk is created unless a delimiter is encountered, even if the threshold is exceeded.', 'It kind of sets the token threshold for a creating a chunk. A segment with fewer tokens than this threshold will be combined with the following segments until the token count exceeds the threshold, at which point a chunk is created. No new chunk is created unless a delimiter is encountered, even if the threshold is exceeded.',
chunkMethod: 'Chunk method', chunkMethod: 'Chunking method',
chunkMethodTip: 'View the tips on the right.', chunkMethodTip: 'View the tips on the right.',
upload: 'Upload', upload: 'Upload',
english: 'English', english: 'English',
chinese: 'Chinese', chinese: 'Chinese',
portugueseBr: 'Portuguese (Brazil)', portugueseBr: 'Portuguese (Brazil)',
embeddingModelPlaceholder: 'Please select a embedding model.', embeddingModelPlaceholder: 'Please select a embedding model.',
chunkMethodPlaceholder: 'Please select a chunking method.',
save: 'Save',
me: 'Only me',
team: 'Team',
cancel: 'Cancel',
methodTitle: 'Chunking method description',
methodExamples: 'Examples',
methodExamplesDescription:
'The following screenshots are provided for clarity.',
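The chunkTokenNumberTip string above describes a merge-until-threshold loop: delimiter-split segments are appended to the current chunk until its token count exceeds the threshold, at which point the chunk is emitted. A minimal TypeScript sketch of that behavior, using a crude whitespace count as a stand-in for a real tokenizer (function and parameter names here are mine, not RAGFlow's):

```typescript
// Sketch of the merge-until-threshold chunking the tip describes.
// countTokens is a whitespace-based approximation, not a real tokenizer.
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

function mergeSegments(segments: string[], tokenThreshold: number): string[] {
  const chunks: string[] = [];
  let current = '';
  let tokens = 0;
  for (const seg of segments) {
    // Segments only ever join at delimiter boundaries, so no new chunk
    // is created mid-segment even once the threshold is exceeded.
    current += seg;
    tokens += countTokens(seg);
    if (tokens > tokenThreshold) {
      chunks.push(current);
      current = '';
      tokens = 0;
    }
  }
  if (current) chunks.push(current); // flush the trailing partial chunk
  return chunks;
}
```

With a threshold of 3, the segments `['aa bb ', 'cc dd ', 'ee']` merge into two chunks: the first two segments combine (4 tokens exceeds 3), and the trailing segment is flushed on its own.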
@@ -258,10 +258,10 @@ export default {
However, it also increases the context for AI conversations and adds to the computational cost for the LLM. So during a conversation, consider reducing the value of <b>topN</b>.</p>`,
presentation: `<p>Supported file formats are <b>PDF</b>, <b>PPTX</b>.</p><p>
Every page in the slides is treated as a chunk, with its thumbnail image stored.</p><p>
<i>This chunking method is automatically applied to all uploaded PPT files, so you do not need to specify it manually.</i></p>`,
qa: `
<p>
This chunking method supports <b>XLSX</b> and <b>CSV/TXT</b> file formats.
</p>
<li>
If a file is in <b>XLSX</b> or <b>XLS (Excel97~2003)</b> format, it should contain two columns without headers: one for questions and the other for answers, with the question column preceding the answer column. Multiple sheets are
@@ -314,8 +314,8 @@ export default {
<p>This approach chunks files using the 'naive'/'General' method. It splits a document into segments and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number for text', at which point a chunk is created.</p>
<p>The chunks are then fed to the LLM to extract entities and relationships for a knowledge graph and a mind map.</p>
<p>Ensure that you set the <b>Entity types</b>.</p>`,
tag: `<p>A knowledge base using the 'Tag' chunking method functions as a tag set. Other knowledge bases can use it to tag their own chunks, and queries to these knowledge bases will also be tagged using this tag set.</p>
<p>A knowledge base using 'Tag' as its chunking method will <b>NOT</b> be involved in a Retrieval-Augmented Generation (RAG) process.</p>
<p>Each chunk in this knowledge base is an independent description-tag pair.</p>
<p>Supported file formats include <b>XLSX</b> and <b>CSV/TXT</b>:</p>
<p>If a file is in <b>XLSX</b> format, it should contain two columns without headers: one for tag descriptions and the other for tag names, with the Description column preceding the Tag column. Multiple sheets are acceptable, provided the columns are properly structured.</p>
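The tag string above says each chunk in a 'Tag' knowledge base is an independent description-tag pair, with the description column preceding the tag column. A hypothetical TypeScript reader for such pairs might look like the following; the tab delimiter and the function name are assumptions for illustration, not RAGFlow's documented CSV/TXT format:

```typescript
// Hypothetical parser for description-tag pair rows, one pair per line,
// description first, tag second. Tab separation is an assumption here.
type TagPair = { description: string; tag: string };

function parseTagPairs(text: string): TagPair[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0) // skip blank lines
    .map((line) => {
      const [description, tag] = line.split('\t');
      return { description, tag };
    });
}
```

For example, a two-line file yields two pairs, each carrying its description and tag name as separate fields.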
@@ -1216,7 +1216,7 @@ This delimiter is used to split the input text into several text pieces each of
}`,
datatype: 'MIME type of the HTTP request',
insertVariableTip: `Enter / Insert variables`,
historyversion: 'Version history',
filename: 'File name',
version: {
created: 'Created',
@@ -1226,14 +1226,14 @@ This delimiter is used to split the input text into several text pieces each of
version: 'Version',
select: 'No version selected',
},
setting: 'Settings',
settings: {
agentSetting: 'Agent settings',
title: 'title',
description: 'description',
upload: 'Upload',
photo: 'Photo',
permissions: 'Permissions',
permissionsTip: 'You can set the permissions of the team members here.',
me: 'me',
team: 'Team',
View File
@@ -162,7 +162,7 @@ export default {
topKTip: `與 Rerank 模型配合使用,用於設定傳給 Rerank 模型的文本塊數量。`,
delimiter: `文字分段標識符`,
delimiterTip:
'支持多字符作為分隔符,多字符用兩個反引號 \\`\\` 分隔符包裹。若配置成:\\n`##`; 系統將首先使用換行符、兩個#號以及分號先對文本進行分割,隨後再對分得的小文本塊按照「建议文本块大小」設定的大小進行拼裝。在设置文本分段標識符之前,請確保您已理解上述文本分段切片機制。',
html4excel: '表格轉HTML',
html4excelTip: `與 General 切片方法配合使用。未開啟狀態下,表格檔案(XLSX、XLS(Excel97~2003))會按行解析為鍵值對。開啟後,表格檔案會被解析為 HTML 表格。若原始表格超過 12 行,系統會自動按每 12 行拆分為多個 HTML 表格。`,
autoKeywords: '自動關鍵字',
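The delimiterTip string above explains that multi-character delimiters are wrapped in a pair of backticks, so a configuration such as `\n`##`;` splits text on newlines, on `##`, and on semicolons before the resulting pieces are reassembled by token size. A hedged TypeScript sketch of that delimiter parsing and splitting (the parsing details are assumptions for illustration, not RAGFlow's actual implementation):

```typescript
// Parse a delimiter config: backtick-wrapped runs are multi-character
// delimiters; every other character is a single-character delimiter.
function parseDelimiters(config: string): string[] {
  const delimiters: string[] = [];
  const re = /`([^`]+)`|(.)/gs; // 's' flag lets '.' match newlines too
  let m: RegExpExecArray | null;
  while ((m = re.exec(config)) !== null) {
    delimiters.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return delimiters;
}

// Split input text on any of the configured delimiters.
function splitText(text: string, config: string): string[] {
  const pattern = parseDelimiters(config)
    .map((d) => d.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape regex metachars
    .join('|');
  return text.split(new RegExp(pattern)).filter((s) => s.length > 0);
}
```

With the config `"\n`##`;"`, the text `a##b;c\nd` splits into four pieces, which matches the newline / `##` / semicolon behavior the tip describes.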
View File
@@ -162,7 +162,7 @@ export default {
topKTip: `与 Rerank 模型配合使用,用于设置传给 Rerank 模型的文本块数量。`,
delimiter: `文本分段标识符`,
delimiterTip:
'支持多字符作为分隔符,多字符用两个反引号 \\`\\` 分隔符包裹。若配置成:\\n`##`; 系统将首先使用换行符、两个#号以及分号先对文本进行分割,随后再对分得的小文本块按照「建议文本块大小」设定的大小进行拼装。在设置文本分段标识符前,请确保理解上述文本分段切片机制。',
html4excel: '表格转HTML',
html4excelTip: `与 General 切片方法配合使用。未开启状态下,表格文件(XLSX、XLS(Excel97~2003))会按行解析为键值对。开启后,表格文件会被解析为 HTML 表格。若原始表格超过 12 行,系统会自动按每 12 行拆分为多个 HTML 表格。`,
autoKeywords: '自动关键词提取',
View File
@@ -12,7 +12,7 @@ export const useGetPageTitle = (): string => {
[ProfileSettingRouteKey.Api]: 'API',
[ProfileSettingRouteKey.Team]: 'Team management',
[ProfileSettingRouteKey.Prompt]: 'Prompt management',
[ProfileSettingRouteKey.Chunk]: 'Chunking method',
[ProfileSettingRouteKey.Logout]: 'Logout',
};