ragflow

AI/ragflow

mirror of https://git.mirrors.martin98.com/https://github.com/infiniflow/ragflow.git synced 2025-06-04 11:24:00 +08:00

Author	SHA1	Message	Date
aniaan	8b8a2f2949	fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106 ) ## Problem Description Multiple files in the RAGFlow project contain closure trap issues when using lambda functions with `trio.open_nursery()`. This problem causes concurrent tasks created in loops to reference the same variable, resulting in all tasks processing the same data (the data from the last iteration) rather than each task processing its corresponding data from the loop. ## Issue Details When using a `lambda` to create a closure function and passing it to `nursery.start_soon()` within a loop, the lambda function captures a reference to the loop variable rather than its value. For example: ```python # Problematic code async with trio.open_nursery() as nursery: for d in docs: nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn)) ``` In this pattern, when concurrent tasks begin execution, `d` has already become the value after the loop ends (typically the last element), causing all tasks to use the same data. ## Fix Solution Changed the way concurrent tasks are created with `nursery.start_soon()` by leveraging Trio's API design to directly pass the function and its arguments separately: ```python # Fixed code async with trio.open_nursery() as nursery: for d in docs: nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn) ``` This way, each task uses the parameter values at the time of the function call, rather than references captured through closures. ## Fixed Files Fixed closure traps in the following files: 1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword extraction, question generation, and tag processing 2. `rag/raptor.py`: 1 fix, involving document summarization 3. `graphrag/utils.py`: 2 fixes, involving graph node and edge processing 4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution and graph node merging 5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document processing 6. `graphrag/general/extractor.py`: 3 fixes, involving content processing and graph node/edge merging 7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving community report extraction ## Potential Impact This fix resolves a serious concurrency issue that could have caused: - Data processing errors (processing duplicate data) - Performance degradation (all tasks working on the same data) - Inconsistent results (some data not being processed) After the fix, all concurrent tasks should correctly process their respective data, improving system correctness and reliability.	2025-04-18 18:00:20 +08:00
Yue-Lyu123	67330833af	fix: correct [AttributeError: 'set' object has no attribute 'nodes' T… (#6699 ) ### Related Issue: https://github.com/infiniflow/ragflow/issues/6653 ### Environment: Using nightly version [ece5903] Elasticsearch database Thanks for the review! My fault! I realize my initial testing wasn't passed. In graphrag/entity_resolution.py `sub_connect_graph` is a set like` {'HELLO', 'Hi', 'How are you'}`, Neither accessing `.nodes` nor `.nodes()` will work, it still causes `AttributeError: 'set' object has no attribute 'nodes'` In graphrag/general/extractor.py The `list.extend() `method performs an in-place operation, directly modifying the original list and returning ‘None’ rather than the modified list. Neither accessing `sorted(set(node0_attrs[attr].extend(node1_attrs.get(attr, []))))` nor `sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))` will work, it still causes `TypeError: 'NoneType' object is not iterable` ### Type of change - [ ] Bug Fix AttributeError: graphrag/entity_resolution.py - [ ] Bug Fix TypeError: graphrag/general/extractor.py	2025-04-01 09:38:21 +08:00
Yue-Lyu123	ece59034f7	fix: Resolve KnowledgeGraph entity resolution errors (#6653 ) (#6691 ) ### Related Issue: #6653 ### Environment: Using nightly version Elasticsearch database ### Bug Description: When clicking the "Entity Resolution" button in KnowledgeGraph, encountered the following errors: graphrag/entity_resolution.py ``` list(sub_connect_graph.nodes) AttributeError ``` graphrag/general/extractor.py ``` node0_attrs[attr] = sorted(set(node0_attrs[attr].extend(node1_attrs[attr]))) TypeError: 'NoneType' object is not iterable ``` ``` for attr in ["keywords", "source_id"]: KeyError I think attribute "keywords" is in edges not nodes ``` graphrag/utils.py ``` settings.docStoreConn.delete() # Sync function called as async ``` ### Changes Made: Fixed AttributeError in entity_resolution.py by properly handling graph nodes Fixed TypeError and KeyError in extractor.py by separate operations Corrected async/sync mismatch in document deletion call	2025-03-31 22:31:35 +08:00
Zhichang Yu	36b62e0fab	EntityResolution batch. Close #6570 (#6602 ) ### What problem does this PR solve? EntityResolution batch ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-27 16:40:36 +08:00
Zhichang Yu	6bf26e2a81	Optimize graphrag again (#6513 ) ### What problem does this PR solve? Removed set_entity and set_relation to avoid accessing doc engine during graph computation. Introduced GraphChange to avoid writing unchanged chunks. ### Type of change - [x] Performance Improvement	2025-03-26 15:34:42 +08:00
Kevin Hu	9ed004e90d	Refa: control the simi for entity resolution. (#6386 ) ### What problem does this PR solve? #6352 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-21 13:16:34 +08:00
Zhichang Yu	939e668096	Optimized graphrag again (#5927 ) ### What problem does this PR solve? Optimized graphrag again ### Type of change - [x] Performance Improvement	2025-03-11 18:36:10 +08:00
Zhichang Yu	f65c3ae62b	Refactored DocumentService.update_progress (#5642 ) ### What problem does this PR solve? Refactored DocumentService.update_progress ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-05 14:48:03 +08:00
yihong	148a7e7002	fix: issue #5600 (#5620 ) ### What problem does this PR solve? close issue #5600 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-03-05 11:10:04 +08:00
Zhichang Yu	4d6484b03e	Fix nursery.start_soon. Close #5575 (#5591 ) ### What problem does this PR solve? Fix nursery.start_soon. Close #5575 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-04 14:46:54 +08:00
Zhichang Yu	c813c1ff4c	Made task_executor async to speedup parsing (#5530 ) ### What problem does this PR solve? Made task_executor async to speedup parsing ### Type of change - [x] Performance Improvement	2025-03-03 18:59:49 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585 ) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Kevin Hu	cb6e9ce164	Cache the result from llm for graphrag and raptor (#4051 ) ### What problem does this PR solve? #4045 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-17 09:48:03 +08:00
Zhichang Yu	30f6421760	Use consistent log file names, introduced initLogger (#3403 ) ### What problem does this PR solve? Use consistent log file names, introduced initLogger ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2024-11-14 17:13:48 +08:00
lidp	d6db805885	Refactoring entity_resolution (#2692 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-09-30 17:18:09 +08:00
yqkcn	ae5a877ed4	Simplify the usage of dict (#2681 ) ### What problem does this PR solve? Simplify the usage of dictionaries ### Type of change - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2024-09-30 16:54:25 +08:00
H	c6e723f2ee	Fix graphrag : "role" user (#2273 ) ### What problem does this PR solve? #2270 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-09-06 10:04:01 +08:00
Jin Hai	99af1cbeac	Update license (#2086 ) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2024-08-25 18:58:20 +08:00
Kevin Hu	152072f900	Add graphrag (#1793 ) ### What problem does this PR solve? #1594 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-08-02 18:51:14 +08:00

19 Commits