Miscellaneous edits to RAGFlow's UI (#3337)

### What problem does this PR solve?



### Type of change

- [x] Documentation Update
writinwaters 2024-11-11 19:29:34 +08:00 committed by GitHub
parent 88072b1e90
commit 8536335e63
6 changed files with 53 additions and 47 deletions


@@ -1,7 +1,7 @@
 {
   "id": 8,
   "title": "Intelligent investment advisor",
-  "description": "An intelligent investment advisor that can answer your financial questions based on real-time domestic financial data and financial information.",
+  "description": "An intelligent investment advisor that answers your financial questions using real-time domestic financial data.",
   "canvas_type": "chatbot",
   "dsl": {
     "answer": [],


@@ -1,7 +1,7 @@
 {
   "id": 7,
   "title": "Medical consultation",
-  "description": "Medical Consultation Assistant, can provide you with some professional consultation suggestions for your reference. Please note that the content provided by the medical assistant is for reference only and may not be authentic or available. Knowledge Base Content Reference: <a href = 'https://huggingface.co/datasets/InfiniFlow/medical_QA/tree/main'> Medical Knowledge Base Reference</a>",
+  "description": "A consultant that offers medical suggestions using an internal QA dataset and PubMed search results. Note that this agent's answers are for reference only and may not be valid. The dataset can be found at https://huggingface.co/datasets/InfiniFlow/medical_QA/tree/main",
   "canvas_type": "chatbot",
   "dsl": {
     "answer": [],


@@ -410,7 +410,7 @@ def queue_raptor_tasks(doc):
         "doc_id": doc["id"],
         "from_page": 0,
         "to_page": -1,
-        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing For Tree-Organized Retrieval)."
+        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)."
     }
     task = new_task()
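For context, the lines above build a task payload inside `queue_raptor_tasks`. A minimal sketch of how such a RAPTOR task dict might be assembled and enqueued; `new_task()` and the in-memory queue below are hypothetical stand-ins, not RAGFlow's actual implementation:

```python
# Hedged sketch: the field names mirror the diff above; new_task() and the
# list-based queue are hypothetical stand-ins for RAGFlow's internals.
import uuid


def new_task():
    # A task record with a fresh ID; RAGFlow's real new_task() differs.
    return {"id": uuid.uuid4().hex}


def queue_raptor_tasks(doc, queue):
    task = new_task()
    task.update({
        "doc_id": doc["id"],
        "from_page": 0,
        "to_page": -1,  # -1 conventionally means "through the last page"
        "progress_msg": "Start to do RAPTOR (Recursive Abstractive "
                        "Processing for Tree-Organized Retrieval).",
    })
    queue.append(task)  # stand-in for the real task queue
    return task
```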


@@ -136,37 +136,44 @@ If you cannot download the RAGFlow Docker image, try the following mirrors.
 [service_conf.yaml](https://github.com/infiniflow/ragflow/blob/main/docker/service_conf.yaml) specifies the system-level configuration for RAGFlow and is used by its API server and task executor.
-- `ragflow`
-  - `host`: The API server's IP address inside the Docker container. Defaults to `0.0.0.0`.
-  - `port`: The API server's serving port inside the Docker container. Defaults to `9380`.
-- `mysql`
-  - `name`: The MySQL database name. Defaults to `rag_flow`.
-  - `user`: The username for MySQL.
-  - `password`: The password for MySQL. When updated, you must revise the `MYSQL_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `port`: The MySQL serving port inside the Docker container. Defaults to `3306`.
-  - `max_connections`: The maximum number of concurrent connections to the MySQL database. Defaults to `100`.
-  - `stale_timeout`: Timeout in seconds.
-- `minio`
-  - `user`: The username for MinIO. When updated, you must revise the `MINIO_USER` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `password`: The password for MinIO. When updated, you must revise the `MINIO_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `host`: The MinIO serving IP *and* port inside the Docker container. Defaults to `minio:9000`.
-- `oauth`
-  The OAuth configuration for signing up or signing in to RAGFlow using a third-party account. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
-  - `github`: The GitHub authentication settings for your application. Visit the [Github Developer Settings](https://github.com/settings/developers) page to obtain your client_id and secret_key.
-- `user_default_llm`
-  The default LLM to use for a new RAGFlow user. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
-  - `factory`: The LLM supplier. Available options:
-    - `"OpenAI"`
-    - `"DeepSeek"`
-    - `"Moonshot"`
-    - `"Tongyi-Qianwen"`
-    - `"VolcEngine"`
-    - `"ZHIPU-AI"`
-  - `api_key`: The API key for the specified LLM. You will need to apply for your model API key online.
+### `ragflow`
+
+- `host`: The API server's IP address inside the Docker container. Defaults to `0.0.0.0`.
+- `port`: The API server's serving port inside the Docker container. Defaults to `9380`.
+
+### `mysql`
+
+- `name`: The MySQL database name. Defaults to `rag_flow`.
+- `user`: The username for MySQL.
+- `password`: The password for MySQL. When updated, you must revise the `MYSQL_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `port`: The MySQL serving port inside the Docker container. Defaults to `3306`.
+- `max_connections`: The maximum number of concurrent connections to the MySQL database. Defaults to `100`.
+- `stale_timeout`: Timeout in seconds.
+
+### `minio`
+
+- `user`: The username for MinIO. When updated, you must revise the `MINIO_USER` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `password`: The password for MinIO. When updated, you must revise the `MINIO_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `host`: The MinIO serving IP *and* port inside the Docker container. Defaults to `minio:9000`.
+
+### `oauth`
+
+The OAuth configuration for signing up or signing in to RAGFlow using a third-party account. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
+
+- `github`: The GitHub authentication settings for your application. Visit the [Github Developer Settings](https://github.com/settings/developers) page to obtain your client_id and secret_key.
+
+### `user_default_llm`
+
+The default LLM to use for a new RAGFlow user. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
+
+- `factory`: The LLM supplier. Available options:
+  - `"OpenAI"`
+  - `"DeepSeek"`
+  - `"Moonshot"`
+  - `"Tongyi-Qianwen"`
+  - `"VolcEngine"`
+  - `"ZHIPU-AI"`
+- `api_key`: The API key for the specified LLM. You will need to apply for your model API key online.
 :::tip NOTE
 If you do not set the default LLM here, configure the default LLM on the **Settings** page in the RAGFlow UI.
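Because `mysql.password` and the MinIO credentials in `service_conf.yaml` must be kept in sync with their `.env` counterparts, a small consistency check can catch drift between the two files. This is a hypothetical helper, not part of RAGFlow; the `.env` parser below handles only simple `KEY=value` lines:

```python
# Hypothetical helper: verify that credentials in service_conf.yaml stay in
# sync with docker/.env, as the documentation above requires.

def parse_env(text):
    """Parse simple KEY=value lines, ignoring blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# Which service_conf fields must match which .env variables.
SYNC_RULES = [
    (("mysql", "password"), "MYSQL_PASSWORD"),
    (("minio", "user"), "MINIO_USER"),
    (("minio", "password"), "MINIO_PASSWORD"),
]


def find_mismatches(conf, env):
    """Return .env variable names whose values differ from service_conf."""
    bad = []
    for (section, field), env_key in SYNC_RULES:
        if conf.get(section, {}).get(field) != env.get(env_key):
            bad.append(env_key)
    return bad
```

A real deployment would load `conf` with a YAML parser; here it is passed in as a plain dict to keep the sketch dependency-free.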


@@ -52,13 +52,13 @@ RAGFlow offers multiple chunking template to facilitate chunking files of differ
 | Picture | | JPEG, JPG, PNG, TIF, GIF |
 | One | The entire document is chunked as one. | DOCX, EXCEL, PDF, TXT |
 You can also change the chunk template for a particular file on the **Datasets** page.
 ![change chunk method](https://github.com/infiniflow/ragflow/assets/93570324/ac116353-2793-42b2-b181-65e7082bed42)
 ### Select embedding model
-An embedding model builds vector index on file chunks. Once you have chosen an embedding model and used it to parse a file, you are no longer allowed to change it. To switch to a different embedding model, you *must* delete all completed file chunks in the knowledge base. The obvious reason is that we must *ensure* that all files in a specific knowledge base are parsed using the *same* embedding model (ensure that they are compared in the same embedding space).
+An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all chunks in the knowledge base. The obvious reason is that we *must* ensure that files in a specific knowledge base are converted to embeddings using the *same* embedding model (ensure that they are compared in the same embedding space).
 The following embedding models can be deployed locally:
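The "same embedding space" requirement is easy to see with cosine similarity: scores are only meaningful between vectors produced by the same model, and vectors from different models may not even share a dimension. A minimal illustration in pure Python, with no particular embedding model assumed:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; only meaningful if a and b come from the same model."""
    if len(a) != len(b):
        # Different embedding models often have different dimensions, in
        # which case the comparison is not even defined.
        raise ValueError("vectors live in different embedding spaces")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Even when two models happen to share a dimension, their scores are not comparable, which is why the knowledge base enforces a single model.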


@@ -157,14 +157,14 @@ export default {
     delimiter: `Delimiter`,
     html4excel: 'Excel to HTML',
     html4excelTip: `Excel will be parsed into HTML table or not. If it's FALSE, every row in Excel will be formed as a chunk.`,
-    autoKeywords: 'Auto keywords',
-    autoKeywordsTip: `Extract N keywords for every chunk to boost their rank score while querying such keywords. Extra tokens will be comsumed for LLM that you set in 'System model settings'. You can check the result in the chunk list.`,
-    autoQuestions: 'Auto questions',
-    autoQuestionsTip: `Extract N questions for every chunk to boost their rank score while querying such questions. Extra tokens will be comsumed for LLM that you set in 'System model settings'. You can check the result in the chunk list. This function will not destroy the entire chunking process if errors occur except adding empty result to the original chunk.`,
+    autoKeywords: 'Auto-keyword',
+    autoKeywordsTip: `Extract N keywords for each chunk to improve their ranking for queries containing those keywords. You can check or update the added keywords for a chunk from the chunk list. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
+    autoQuestions: 'Auto-question',
+    autoQuestionsTip: `Extract N questions for each chunk to improve their ranking for queries containing those questions. You can check or update the added questions for a chunk from the chunk list. This feature will not disrupt the chunking process if an error occurs, except that it may add an empty result to the original chunk. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
   },
   knowledgeConfiguration: {
     titleDescription:
-      'Update your knowledge base details especially parsing method here.',
+      'Update your knowledge base configurations here, particularly the chunk method.',
     name: 'Knowledge base name',
     photo: 'Knowledge base photo',
     description: 'Description',
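The auto-keyword tip above describes boosting a chunk's rank when the query contains one of its extracted keywords. A hedged sketch of that scoring idea; the keyword extraction itself is done by an LLM in RAGFlow, so here the keywords are given, and the base score and the 0.1 boost per match are arbitrary illustrative weights:

```python
# Sketch of keyword-based rank boosting as described in autoKeywordsTip.
# The 0.1 boost per matched keyword is illustrative, not RAGFlow's weight.

def boosted_score(base_score, chunk_keywords, query):
    """Boost a chunk's retrieval score for each keyword found in the query."""
    query_terms = set(query.lower().split())
    hits = sum(1 for kw in chunk_keywords if kw.lower() in query_terms)
    return base_score * (1.0 + 0.1 * hits)
```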
@@ -176,13 +176,13 @@ export default {
     chunkTokenNumber: 'Chunk token number',
     chunkTokenNumberMessage: 'Chunk token number is required',
     embeddingModelTip:
-      "The embedding model used to embedding chunks. It's unchangable once the knowledgebase has chunks. You need to delete all the chunks if you want to change it.",
+      "The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all chunks in the knowledge base.",
     permissionsTip:
-      "If the permission is 'Team', all the team member can manipulate the knowledgebase.",
+      "If set to 'Team', all team members will be able to manage the knowledge base.",
     chunkTokenNumberTip:
-      'It determine the token number of a chunk approximately.',
+      'It sets the token threshold for a chunk. A paragraph with fewer tokens than this threshold will be combined with the following paragraph until the token count exceeds the threshold, at which point a chunk is created.',
     chunkMethod: 'Chunk method',
-    chunkMethodTip: 'The instruction is at right.',
+    chunkMethodTip: 'Tips are on the right.',
     upload: 'Upload',
     english: 'English',
     chinese: 'Chinese',
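The revised `chunkTokenNumberTip` describes a merge-until-threshold loop: paragraphs accumulate until the running token count exceeds the threshold, then a chunk is emitted. A rough sketch of that behavior, using whitespace word count as a stand-in for a real tokenizer:

```python
# Sketch of threshold-based chunking as described in chunkTokenNumberTip.
# Whitespace word count stands in for RAGFlow's actual tokenizer.

def count_tokens(text):
    return len(text.split())


def chunk_paragraphs(paragraphs, threshold=128):
    """Combine consecutive paragraphs until the token count exceeds threshold."""
    chunks, buffer, tokens = [], [], 0
    for para in paragraphs:
        buffer.append(para)
        tokens += count_tokens(para)
        if tokens > threshold:
            # Threshold exceeded: emit accumulated paragraphs as one chunk.
            chunks.append("\n".join(buffer))
            buffer, tokens = [], 0
    if buffer:
        chunks.append("\n".join(buffer))  # trailing remainder
    return chunks
```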
@@ -192,11 +192,11 @@ export default {
     me: 'Only me',
     team: 'Team',
     cancel: 'Cancel',
-    methodTitle: 'Chunking Method Description',
+    methodTitle: 'Chunk method description',
     methodExamples: 'Examples',
     methodExamplesDescription:
-      'The following screenshots are presented to facilitate understanding.',
-    dialogueExamplesTitle: 'Dialogue Examples',
+      'The following screenshots are provided for clarity.',
+    dialogueExamplesTitle: 'Dialogue examples',
     methodEmpty:
       'This will display a visual explanation of the knowledge base categories',
     book: `<p>Supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.</p><p>
@@ -208,8 +208,7 @@ export default {
       The chunk granularity is consistent with 'ARTICLE', and all the upper level text will be included in the chunk.
     </p>`,
     manual: `<p>Only <b>PDF</b> is supported.</p><p>
-      We assume manual has hierarchical section structure. We use the lowest section titles as pivots to slice documents.
-      So, the figures and tables in the same section will not be sliced apart, and chunk size might be large.
+      We assume that the manual has a hierarchical section structure, using the lowest section titles as basic unit for chunking documents. Therefore, figures and tables in the same section will not be separated, which may result in larger chunk sizes.
     </p>`,
     naive: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML, HTML</b>.</p>
     <p>This method apply the naive ways to chunk files: </p>
@@ -292,7 +291,7 @@ Successive text will be sliced into pieces each of which is around 512 token num
       Mind the entiry type you need to specify.</p>`,
     useRaptor: 'Use RAPTOR to enhance retrieval',
     useRaptorTip:
-      'Recursive Abstractive Processing for Tree-Organized Retrieval, please refer to https://huggingface.co/papers/2401.18059',
+      'Recursive Abstractive Processing for Tree-Organized Retrieval, see https://huggingface.co/papers/2401.18059 for more information',
     prompt: 'Prompt',
     promptTip: 'LLM prompt used for summarization.',
     promptMessage: 'Prompt is required',