Added a guide on running a retrieval test, with and without knowledge graph (#5200)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-08-12 17:19:02 +08:00 · 2025-02-21 19:36:20 +08:00 · 2025-02-21 19:36:20 +08:00 · 217caecfda
commit 217caecfda
parent ef8847eda7
9 changed files with 114 additions and 21 deletions
--- a/agent/component/retrieval.py
+++ b/agent/component/retrieval.py
@ -43,7 +43,7 @@ class RetrievalParam(ComponentParamBase):

    def check(self):
        self.check_decimal_float(self.similarity_threshold, "[Retrieval] Similarity threshold")
-        self.check_decimal_float(self.keywords_similarity_weight, "[Retrieval] Keywords similarity weight")
+        self.check_decimal_float(self.keywords_similarity_weight, "[Retrieval] Keyword similarity weight")
        self.check_positive_number(self.top_n, "[Retrieval] Top N")


--- a/docs/guides/agent/text2sql_agent.md
+++ b/docs/guides/agent/text2sql_agent.md
@ -383,7 +383,7 @@ Since version 0.15.0, ragflow has introduced step-by-step execution for Agent co
 Find all customers who has bought a mobile phone
 ```
 ![](https://github.com/user-attachments/assets/a6270188-72af-4be7-a192-efddb611f3a4)
-3. As the image shows, no matching information was retrieved from the Q->SQL knowledge base, yet a similar question exists within the database. Adjust the Rerank model, "Similarity threshold," or "Keywords similarity weight" accordingly to return relevant content.
+3. As the image shows, no matching information was retrieved from the Q->SQL knowledge base, yet a similar question exists within the database. Adjust the Rerank model, "Similarity threshold," or "Keyword similarity weight" accordingly to return relevant content.
 ![](https://github.com/user-attachments/assets/0592c45b-9276-465d-93d3-2530b2fb81c0)
 ![](https://github.com/user-attachments/assets/9e72be3a-41af-4ef2-863d-03757ddfdde6)

--- a/docs/guides/configure_knowledge_base/configure_knowledge_base.md
+++ b/docs/guides/configure_knowledge_base/configure_knowledge_base.md
@ -52,7 +52,7 @@ RAGFlow offers multiple chunking template to facilitate chunking files of differ
 | Picture      |                                                                       | JPEG, JPG, PNG, TIF, GIF                             |
 | One          | The entire document is chunked as one.                                | DOCX, EXCEL, PDF, TXT                                |

-You can also change the chunk template for a particular file on the **Datasets** page.
+You can also change a file's chunk method on the **Datasets** page.

 ![change chunk method](https://github.com/infiniflow/ragflow/assets/93570324/ac116353-2793-42b2-b181-65e7082bed42)

--- a/docs/guides/configure_knowledge_base/construct_knowledge_graph.md
+++ b/docs/guides/configure_knowledge_base/construct_knowledge_graph.md
@ -11,7 +11,7 @@ Generate a knowledge graph for your knowledge base.

 To enhance multi-hop question-answering, RAGFlow adds a knowledge graph construction step between data extraction and indexing, as illustrated below. This step creates additional chunks from existing ones generated by your specified chunk method.

-![Image](https://github.com/user-attachments/assets/edf0528d-cb46-46fc-aef4-edb98996949b)
+![Image](https://github.com/user-attachments/assets/1ec21d8e-f255-4d65-9918-69b72dfa142b)

 As of v0.16.0, RAGFlow supports constructing a knowledge graph on a knowledge base, allowing you to construct a *unified* graph across multiple files within your knowledge base. When a newly uploaded file starts parsing, the generated graph will automatically update.

@ -73,4 +73,12 @@ In a knowledge graph, a community is a cluster of entities linked by relationshi

 ### Can I have different knowledge graph settings for different files in my knowledge base?

-Yes, you can. Just one graph is generated per knowledge base. The smaller graphs of your files will be *combined* into one big, unified graph at the end of the graph extraction process.
+Yes, you can. Just one graph is generated per knowledge base. The smaller graphs of your files will be *combined* into one big, unified graph at the end of the graph extraction process.
+
+### Does the knowledge graph automatically update when I remove a related file?
+
+No. The knowledge graph does *not* update automatically when a file is removed; it updates only when a newly uploaded file is parsed.
+
+### How do I remove a generated knowledge graph?
+
+To remove the generated knowledge graph, delete all related files in your knowledge base. Although the **Knowledge Graph** entry will still be visible, the graph has actually been deleted.
--- a/docs/guides/configure_knowledge_base/run_retrieval_test.md
+++ b/docs/guides/configure_knowledge_base/run_retrieval_test.md
@ -0,0 +1,82 @@
+---
+sidebar_position: 10
+slug: /run_retrieval_test
+---
+
+# Run retrieval test
+
+Conduct a retrieval test on your knowledge base to check whether the intended chunks can be retrieved.
+
+---
+
+After your files are uploaded and parsed, it is recommended that you run a retrieval test before proceeding with the chat assistant configuration. Like a precision instrument, RAGFlow requires careful tuning to deliver optimal question answering performance. Your knowledge base settings, chat assistant configurations, and the specified large and small models can all significantly impact the final results. Running a retrieval test verifies whether the intended chunks can be retrieved, allowing you to quickly identify areas for improvement or pinpoint any issue that needs addressing. For instance, when debugging your question answering system, if you know that the correct chunks can be retrieved, you can focus your efforts elsewhere.
+
+During a retrieval test, chunks created from your specified chunk method are retrieved using a hybrid search. This search combines weighted keyword similarity with either weighted vector cosine similarity or a weighted reranking score, depending on your settings:
+
+- If no rerank model is selected, weighted keyword similarity will be combined with weighted vector cosine similarity.
+- If a rerank model is selected, weighted keyword similarity will be combined with weighted reranking score.
+
+In contrast, chunks created from [knowledge graph construction](./construct_knowledge_graph.md) are retrieved solely using vector cosine similarity.
+
+## Prerequisites
+
+- Your files must be uploaded and successfully parsed before you run a retrieval test.
+- A knowledge graph must be successfully built before enabling **Use knowledge graph**.
+
+## Configurations
+
+### Similarity threshold
+
+This sets the bar for retrieving chunks: chunks with similarities below the threshold will be filtered out. By default, the threshold is set to 0.2.
+
+### Keyword similarity weight
+
+This sets the weight of keyword similarity in the combined similarity score, whether used with vector cosine similarity or a reranking score. By default, it is set to 0.7, making the weight of the other component 0.3 (1 - 0.7).
+
+### Rerank model
+
+- If left empty, RAGFlow will use a combination of weighted keyword similarity and weighted vector cosine similarity.
+- If a rerank model is selected, weighted keyword similarity will be combined with weighted reranking score.
+
+:::danger IMPORTANT
+Using a rerank model will significantly increase the time to receive a response.
+:::
+
+### Use knowledge graph
+
+In a knowledge graph, an entity description, a relationship description, or a community report each exists as an independent chunk. This switch indicates whether to add these chunks to the retrieval.
+
+The switch is disabled by default. When enabled, RAGFlow performs the following during a retrieval test:
+
+1. Extract entities and entity types from your query using the LLM.
+2. Retrieve top N entities based on their PageRank values, using the extracted entity types.
+3. Find similar entities and their N-hop relationships from the graph using the embeddings of the extracted query entities.
+4. Retrieve similar relationships from the graph using the query embedding.
+5. Rank these retrieved entities and relationships by multiplying each one's PageRank value with its similarity score to the query, returning the top N as the final retrieval.
+6. Use the entities in the final retrieval to retrieve the top 1 community report, that is, the report for the community involving the most entities in the final retrieval.  
+   *The retrieved entity descriptions, relationship descriptions, and the top 1 community report are sent to the LLM for content generation.*
+
+:::danger IMPORTANT
+Using a knowledge graph in a retrieval test will significantly increase the time to receive a response.
+:::
+
+### Test text
+
+Enter your test query in this field.
+
+## Procedure
+
+1. Navigate to the **Retrieval testing** page of your knowledge base, enter your query in **Test text**, and click **Testing** to run the test.
+2. If the results are unsatisfactory, keep tuning the options listed in the Configurations section.
+
+   *The following is a screenshot of a retrieval test conducted without using a knowledge graph. It demonstrates a hybrid search combining weighted keyword similarity and weighted vector cosine similarity. The overall similarity score is 28.56, calculated as 25.17 x 0.7 + 36.49 x 0.3:*  
+   ![Image](https://github.com/user-attachments/assets/541554d4-3f3e-44e1-954b-0ae77d7372c6)
+
+   *The following is a screenshot of a retrieval test conducted using a knowledge graph. It shows that only vector similarity is used:*  
+   ![Image](https://github.com/user-attachments/assets/30a03091-0f7b-4058-901a-f4dc5ca5aa6b)
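The weighted combination and threshold filter described in the guide above can be sketched in a few lines of Python. This is an illustrative sketch only: it is not part of this commit and not RAGFlow's actual implementation. The function names (`combined_similarity`, `filter_chunks`) and the chunk dictionary keys are hypothetical, and the sketch assumes scores on a 0 to 1 scale (the screenshot displays them as percentages).

```python
def combined_similarity(keyword_sim, other_sim, keyword_weight=0.7):
    """Weighted sum of keyword similarity and a second score.

    `other_sim` stands for either the vector cosine similarity or the
    reranking score, depending on whether a rerank model is selected.
    The second weight is always 1 - keyword_weight.
    """
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0 and 1")
    return keyword_weight * keyword_sim + (1.0 - keyword_weight) * other_sim


def filter_chunks(chunks, threshold=0.2, keyword_weight=0.7):
    """Score each chunk, drop those below the threshold, best first.

    `chunks` is a list of dicts with the hypothetical keys
    'id', 'keyword_sim', and 'other_sim'.
    """
    scored = []
    for chunk in chunks:
        score = combined_similarity(
            chunk["keyword_sim"], chunk["other_sim"], keyword_weight
        )
        if score >= threshold:
            scored.append((score, chunk["id"]))
    return sorted(scored, reverse=True)


# The arithmetic from the screenshot caption, in percentage units:
# 25.17 x 0.7 + 36.49 x 0.3
screenshot_score = combined_similarity(25.17, 36.49)
print(screenshot_score)

# Two made-up chunks on a 0-1 scale; the second falls below 0.2.
example_chunks = [
    {"id": "chunk-a", "keyword_sim": 0.5, "other_sim": 0.4},
    {"id": "chunk-b", "keyword_sim": 0.1, "other_sim": 0.2},
]
print(filter_chunks(example_chunks))
```

Raising the keyword similarity weight makes exact term overlap dominate the ranking, while lowering it favors the vector or reranking score; the threshold is applied to the combined score, not to either component.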
+
+## Frequently asked questions
+
+### Is an LLM used when the Use knowledge graph switch is enabled?
+
+Yes, your LLM is used to analyze your query and extract the related entities and relationships from the knowledge graph. This also explains why additional tokens and time will be consumed.
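Steps 5 and 6 of the **Use knowledge graph** procedure in the guide above (ranking by PageRank multiplied by similarity, then picking one community report) can be sketched as follows. This is a hedged illustration, not part of this commit and not RAGFlow's actual code: the function names, the dictionary keys, and the sample data are all hypothetical, and the LLM entity extraction and embedding lookups of steps 1 to 4 are assumed to have already produced the inputs.

```python
def rank_graph_hits(hits, top_n=5):
    """Rank retrieved entities/relationships by pagerank * similarity.

    `hits` is a list of dicts with the hypothetical keys
    'name', 'pagerank', and 'similarity' (similarity to the query).
    Returns the top_n hits, best first.
    """
    ranked = sorted(
        hits, key=lambda h: h["pagerank"] * h["similarity"], reverse=True
    )
    return ranked[:top_n]


def pick_community(final_entity_names, communities):
    """Pick the community that involves the most retrieved entities.

    `communities` maps a community id to the set of entity names it
    contains. Returns None when there are no communities.
    """
    final = set(final_entity_names)
    return max(
        communities,
        key=lambda cid: len(communities[cid] & final),
        default=None,
    )


# Made-up sample data for illustration only.
sample_hits = [
    {"name": "A", "pagerank": 0.5, "similarity": 0.2},
    {"name": "B", "pagerank": 0.1, "similarity": 0.9},
    {"name": "C", "pagerank": 0.4, "similarity": 0.6},
]
sample_communities = {
    "community-1": {"A", "B"},
    "community-2": {"A", "C", "D"},
}

top_hits = rank_graph_hits(sample_hits, top_n=2)
top_names = [h["name"] for h in top_hits]
print(top_names)
print(pick_community(top_names, sample_communities))
```

Multiplying the two values means an entity needs both graph importance and query relevance to rank highly: in the sample data, entity B is very similar to the query but has a low PageRank value, so it drops out of the top two.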
--- a/docs/references/agent_component_reference/retrieval.mdx
+++ b/docs/references/agent_component_reference/retrieval.mdx
@ -30,7 +30,7 @@ RAGFlow employs a combination of weighted keyword similarity and weighted vector

 Defaults to 0.2.

-### Keywords similarity weight
+### Keyword similarity weight

 This parameter sets the weight of keyword similarity in the combined similarity score. The total of the two weights must equal 1.0. Its default value is 0.7, which means the weight of vector similarity in the combined search is 1 - 0.7 = 0.3.

--- a/docs/references/http_api_reference.md
+++ b/docs/references/http_api_reference.md
@ -2178,11 +2178,11 @@ Creates a session with an agent.
 - Body:
  - the required parameters:`str`
  - other parameters:
-    The parameters set in the **Begin** component.
+    The parameters specified in the **Begin** component.

 ##### Request example

-If the **Begin** component in your agent does not have required parameters:
+If the **Begin** component in your agent does not take required parameters:

 ```bash
 curl --request POST \
@ -2193,7 +2193,7 @@ curl --request POST \
     }'
 ```

-If the **Begin** component in your agent has required parameters:
+If the **Begin** component in your agent takes required parameters:

 ```bash
 curl --request POST \
@ -2206,7 +2206,7 @@ curl --request POST \
     }'
 ```

-If the **Begin** component in your agent has required file parameters:
+If the **Begin** component in your agent takes required file parameters:

 ```bash
 curl --request POST \
@ -2220,7 +2220,7 @@ curl --request POST \

 - `agent_id`: (*Path parameter*)  
  The ID of the associated agent.
- `user_id`: (*Filter parameter*), string
+- `user_id`: (*Filter parameter*)
  The optional user-defined ID for parsing docs (especially images) when creating a session while uploading files.

 #### Response
@ -2373,7 +2373,7 @@ Asks a specified agent a question to start an AI-powered conversation.
  - `"user_id"`: `string`(optional)
  - other parameters: `string`
 ##### Request example
-Ifthe **Begin** component doesn't have parameters, the following code will create a session.
+If the **Begin** component does not take parameters, the following code will create a session.
 ```bash 
 curl --request POST \
     --url http://{address}/api/v1/agents/{agent_id}/completions \
@ -2383,7 +2383,7 @@ curl --request POST \
     {
     }'
 ```
-Ifthe **Begin** component have parameters, the following code will create a session.
+If the **Begin** component takes parameters, the following code will create a session.
 ```bash
 curl --request POST \
     --url http://{address}/api/v1/agents/{agent_id}/completions \
@ -2427,7 +2427,7 @@ curl --request POST \
  Parameters specified in the **Begin** component.

 #### Response
-success without `session_id` provided and with no parameters inthe **Begin** component:
+Success without `session_id` provided and with no parameters specified in the **Begin** component:
 ```json
 data:{
    "code": 0,
@ -2445,7 +2445,8 @@ data:{
    "data": true
 }
 ```
-Success without `session_id` provided and with parameters inthe **Begin** component:
+
+Success without `session_id` provided and with parameters specified in the **Begin** component:

 ```json
 data:{
@ -2481,7 +2482,7 @@ data:{
 }
 data:
 ```
-Success with parameters inthe **Begin** component:
+Success with parameters specified in the **Begin** component:
 ```json
 data:{
    "code": 0,
@ -2560,7 +2561,6 @@ data:{
 }
 ```

-
 Failure:

 ```json
--- a/docs/release_notes.md
+++ b/docs/release_notes.md
@ -14,10 +14,10 @@ Released on February 6, 2025.
 ### New features

 - Supports DeepSeek R1 and DeepSeek V3.
- GraphRAG refactor: Knowledge graph is dynamically built on an entire knowledge base (dataset) rather than on an individual file, and automatically updated when files are added or removed. See [here](https://ragflow.io/docs/dev/construct_knowledge_graph).
- Adds an **Iteration** agent component and a **Research report generator** agent template.
+- GraphRAG refactor: Knowledge graph is dynamically built on an entire knowledge base (dataset) rather than on an individual file, and automatically updated when a newly uploaded file starts parsing. See [here](https://ragflow.io/docs/dev/construct_knowledge_graph).
+- Adds an **Iteration** agent component and a **Research report generator** agent template. See [here](https://ragflow.io/docs/dev/iteration_component).
 - New UI language: Portuguese.
- Allows setting metadata for a specific file in a knowledge base to support AI-powered chats.
+- Allows setting metadata for a specific file in a knowledge base to enhance AI-powered chats. See [here](https://ragflow.io/docs/dev/set_metada).
 - Upgrades RAGFlow's document engine [Infinity](https://github.com/infiniflow/infinity) to v0.6.0.dev3.
 - Supports GPU acceleration for DeepDoc (see [docker-compose-gpu.yml](https://github.com/infiniflow/ragflow/blob/main/docker/docker-compose-gpu.yml)).
 - Supports creating and referencing a **Tag** knowledge base as a key milestone towards bridging the semantic gap between query and response.
@ -30,6 +30,8 @@ The **Tag knowledge base** feature is *unavailable* on the [Infinity](https://gi

 #### Added documents

+- [Construct knowledge graph](https://ragflow.io/docs/dev/construct_knowledge_graph)
+- [Set metadata](https://ragflow.io/docs/dev/set_metada)
 - [Begin component](https://ragflow.io/docs/dev/begin_component)
 - [Generate component](https://ragflow.io/docs/dev/generate_component)
 - [Interact component](https://ragflow.io/docs/dev/interact_component)
@ -44,6 +46,7 @@ The **Tag knowledge base** feature is *unavailable* on the [Infinity](https://gi
 - [Iteration component](https://ragflow.io/docs/dev/iteration_component)
 - [Note component](https://ragflow.io/docs/dev/note_component)

+
 ## v0.15.1

 Released on December 25, 2024.
--- a/web/src/locales/en.ts
+++ b/web/src/locales/en.ts
@ -112,7 +112,7 @@ export default {
      similarityThreshold: 'Similarity threshold',
      similarityThresholdTip:
        'RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. This parameter sets the threshold for similarities between the user query and chunks. Any chunk with a similarity score below this threshold will be excluded from the results.',
-      vectorSimilarityWeight: 'Keywords similarity weight',
+      vectorSimilarityWeight: 'Keyword similarity weight',
      vectorSimilarityWeightTip:
        'This sets the weight of keyword similarity in the combined similarity score, either used with vector cosine similarity or with reranking score. The total of the two weights must equal 1.0.',
      testText: 'Test text',