ragflow

AI/ragflow

mirror of https://git.mirrors.martin98.com/https://github.com/infiniflow/ragflow.git synced 2025-06-04 11:24:00 +08:00

Author	SHA1	Message	Date
alkscr	ab27609a64	Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151) ### What problem does this PR solve? When any document is removed from a knowledge base that uses a knowledge graph, the graph's `removed_kwd` is set to "Y". However, in `graphrag.utils.get_graph`, the `rebuild_graph` call is skipped and the function returns `None` directly when `removed_kwd=Y`, so the remaining part of the graph is abandoned (while the old entity data still exists in the database). In addition, the Infinity instance skips deleting the graph components' `source_id` when a document is removed, which may produce a wrong graph after a rebuild. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-30 09:43:17 +08:00
aniaan	8b8a2f2949	fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106)

## Problem Description

Multiple files in the RAGFlow project contain closure trap issues when using lambda functions with `trio.open_nursery()`. This problem causes concurrent tasks created in loops to reference the same variable, resulting in all tasks processing the same data (the data from the last iteration) rather than each task processing its corresponding data from the loop.

## Issue Details

When using a `lambda` to create a closure function and passing it to `nursery.start_soon()` within a loop, the lambda function captures a reference to the loop variable rather than its value. For example:

```python
# Problematic code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```

In this pattern, when concurrent tasks begin execution, `d` has already become the value after the loop ends (typically the last element), causing all tasks to use the same data.

## Fix Solution

Changed the way concurrent tasks are created with `nursery.start_soon()` by leveraging Trio's API design to directly pass the function and its arguments separately:

```python
# Fixed code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```

This way, each task uses the parameter values at the time of the function call, rather than references captured through closures.

## Fixed Files

Fixed closure traps in the following files:

1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving community report extraction

## Potential Impact

This fix resolves a serious concurrency issue that could have caused:

- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)

After the fix, all concurrent tasks should correctly process their respective data, improving system correctness and reliability.	2025-04-18 18:00:20 +08:00
BUJIQI	627fd002ae	Update utils.py (#7091) ### What problem does this PR solve? When there are multiple entities, the variable `v` may be a list, which leads to this error: ``` | File "/mnt/d/wrf/ragflow/ragflow/graphrag/utils.py", line 59, in replace_all | result = result.replace(f"{{{k}}}", v) | TypeError: replace() argument 2 must be str, not list ``` This PR converts `v` to a str. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-17 17:17:09 +08:00
Zhichang Yu	fdc410e743	Fix set_graph on non-existing edge (#6777) ### What problem does this PR solve? Fix set_graph on non-existing edge ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-03 11:09:04 +08:00
Yue-Lyu123	ece59034f7	fix: Resolve KnowledgeGraph entity resolution errors (#6653) (#6691) ### Related Issue: #6653 ### Environment: Using nightly version, Elasticsearch database ### Bug Description: Clicking the "Entity Resolution" button in KnowledgeGraph raised the following errors: graphrag/entity_resolution.py ``` list(sub_connect_graph.nodes) AttributeError ``` graphrag/general/extractor.py ``` node0_attrs[attr] = sorted(set(node0_attrs[attr].extend(node1_attrs[attr]))) TypeError: 'NoneType' object is not iterable ``` ``` for attr in ["keywords", "source_id"]: KeyError I think attribute "keywords" is in edges not nodes ``` graphrag/utils.py ``` settings.docStoreConn.delete() # Sync function called as async ``` ### Changes Made: Fixed the AttributeError in entity_resolution.py by properly handling graph nodes. Fixed the TypeError and KeyError in extractor.py by separating the operations. Corrected the async/sync mismatch in the document deletion call.	2025-03-31 22:31:35 +08:00
Zhichang Yu	fe0396bbb9	Introduced delete_knowledge_graph (#6605) ### What problem does this PR solve? Introduced delete_knowledge_graph ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] Documentation Update	2025-03-27 17:16:48 +08:00
Zhichang Yu	6bf26e2a81	Optimize graphrag again (#6513) ### What problem does this PR solve? Removed set_entity and set_relation to avoid accessing doc engine during graph computation. Introduced GraphChange to avoid writing unchanged chunks. ### Type of change - [x] Performance Improvement	2025-03-26 15:34:42 +08:00
Zhichang Yu	e213873852	Optimize graphrag cache get entity (#6018) ### What problem does this PR solve? Optimize graphrag cache get entity ### Type of change - [x] Performance Improvement	2025-03-13 14:37:59 +08:00
Zhichang Yu	939e668096	Optimized graphrag again (#5927) ### What problem does this PR solve? Optimized graphrag again ### Type of change - [x] Performance Improvement	2025-03-11 18:36:10 +08:00
Zhichang Yu	6ec6ca6971	Refactor graphrag to remove redis lock (#5828) ### What problem does this PR solve? Refactor graphrag to remove redis lock ### Type of change - [x] Refactoring	2025-03-10 15:15:06 +08:00
Kevin Hu	1919780880	Refa: reduce default value of MAX_CONCURRENT_CHATS (#5821) ### What problem does this PR solve? #5786 ### Type of change - [x] Refactoring	2025-03-10 11:22:06 +08:00
Zhichang Yu	c813c1ff4c	Made task_executor async to speedup parsing (#5530) ### What problem does this PR solve? Made task_executor async to speedup parsing ### Type of change - [x] Performance Improvement	2025-03-03 18:59:49 +08:00
Kevin Hu	39b96849a9	Fix window size issue of ES. (#5175) ### What problem does this PR solve? #5152 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-20 12:54:29 +08:00
Kevin Hu	7c90b87715	Fix window size of ES issue. (#5026) ### What problem does this PR solve? #5015 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-17 12:48:56 +08:00
Kevin Hu	f29da49893	Fix keyerror issue while rebuilding graph. (#5022) ### What problem does this PR solve? #4995 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-17 12:02:44 +08:00
Kevin Hu	6f30397bb5	Infinity adapt to graphrag. (#4663) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-27 18:35:18 +08:00
Kevin Hu	71c132f76d	Make infinity adapt (#4635) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-24 17:45:04 +08:00
Kevin Hu	86892959a0	Rebuild graph when it's out of time. (#4607) ### What problem does this PR solve? #4543 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-01-23 17:26:20 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Kevin Hu	c5da3cdd97	Tagging (#4426) ### What problem does this PR solve? #4367 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-09 17:07:21 +08:00
Kevin Hu	8fb18f37f6	Code refactor. (#4291) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-12-30 18:38:51 +08:00
Kevin Hu	6379a934ff	Fix redis get error. (#4140) ### What problem does this PR solve? #4126 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-20 10:39:50 +08:00
Kevin Hu	fddac1345d	Fix raptor reusable issue. (#4063) ### What problem does this PR solve? #4045 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-17 15:28:35 +08:00
Kevin Hu	cb6e9ce164	Cache the result from llm for graphrag and raptor (#4051) ### What problem does this PR solve? #4045 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-17 09:48:03 +08:00
Zhichang Yu	4413683898	Introduced beartype (#3460) ### What problem does this PR solve? Introduced [beartype](https://github.com/beartype/beartype) for runtime type-checking. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-11-18 17:38:17 +08:00
yqkcn	604061c4a5	Fix mutable default argument (#2635) ### What problem does this PR solve? A mutable object should not be used as the default value of a Python function parameter. The default is created once, when the function is defined, so modifying the parameter inside the function permanently changes the default for all later calls. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-09-29 10:16:00 +08:00
Jin Hai	99af1cbeac	Update license (#2086) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2024-08-25 18:58:20 +08:00
Kevin Hu	152072f900	Add graphrag (#1793) ### What problem does this PR solve? #1594 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-08-02 18:51:14 +08:00
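
The closure trap described in commit 8b8a2f2949 (#7106) can be reproduced without Trio. The sketch below is illustrative only: `process` is a hypothetical stand-in for a per-document task such as `doc_keyword_extraction`, and `functools.partial` plays the role that passing arguments to `nursery.start_soon()` plays in the actual fix.

```python
from functools import partial


def process(doc):
    # Hypothetical stand-in for a per-document task.
    return f"processed {doc}"


docs = ["a", "b", "c"]

# Late binding: each lambda looks up `d` when it is called,
# not when it is created, so all of them see the final value.
deferred = []
for d in docs:
    deferred.append(lambda: process(d))
late_results = [task() for task in deferred]

# Early binding: the value of `d` is stored with the callable at
# creation time, analogous to start_soon(fn, arg) in the fix.
bound = []
for d in docs:
    bound.append(partial(process, d))
bound_results = [task() for task in bound]

print(late_results)   # ['processed c', 'processed c', 'processed c']
print(bound_results)  # ['processed a', 'processed b', 'processed c']
```

Passing the arguments separately avoids the closure entirely, so the choice of loop variable name or scheduling order no longer matters.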

28 Commits
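
The mutable default argument problem fixed in commit 604061c4a5 (#2635) is easy to demonstrate in isolation. This is a minimal sketch with hypothetical function names, not code from the RAGFlow repository; it contrasts a shared list default with the usual `None` sentinel pattern.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at definition time,
    # and is shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A None sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_shared("a")
shared_second = append_shared("b")
fresh_first = append_fresh("a")
fresh_second = append_fresh("b")

print(shared_first is shared_second)  # True: both names refer to the default list
print(shared_second)                  # ['a', 'b']
print(fresh_first, fresh_second)      # ['a'] ['b']
```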