### What problem does this PR solve?
Whether to apply graph resolution and community extraction is stored in
`task["kb_parser_config"]`. However, the previous code read `graphrag_conf`
from `task["parser_config"]`, so `with_resolution` and `with_community` were
always false.
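A minimal sketch of the corrected lookup, assuming the task dict layout described above (the exact nesting of the graphrag settings inside `kb_parser_config` is hypothetical):
```python
# Minimal sketch; the "graphrag" sub-key and flag names inside
# kb_parser_config are assumptions for illustration.
def get_graphrag_flags(task: dict) -> tuple[bool, bool]:
    # Read from kb_parser_config (the knowledge-base settings), not parser_config.
    graphrag_conf = task.get("kb_parser_config", {}).get("graphrag", {}) or {}
    with_resolution = bool(graphrag_conf.get("resolution", False))
    with_community = bool(graphrag_conf.get("community", False))
    return with_resolution, with_community
```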
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Two cases arise when the local ES tag search returns results that are then filtered out by score:
1. The doc ends up with empty tags and the LLM is never visited.
2. The code may use empty examples in the prompt for the LLM tag search.
Co-authored-by: huangfuqunze <huangfuqunze.hfqz@alibaba-inc.com>
### What problem does this PR solve?
When parsing documents containing images, the current code uses a
single-threaded approach to call the VL model, resulting in extremely
slow parsing speed (e.g., parsing a Word document with dozens of images
takes over 20 minutes).
By switching to a multithreaded approach to call the VL model, the
parsing speed can be improved to an acceptable level.
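A minimal sketch of the idea, assuming a hypothetical `vision_model.describe(image)` call (the real VL client in the parser will differ):
```python
# Minimal sketch; vision_model.describe and images are illustrative names.
from concurrent.futures import ThreadPoolExecutor

def describe_images(vision_model, images, max_workers=8):
    # Each VL request is network-bound, so a thread pool lets dozens of
    # images be described in parallel instead of one by one.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(vision_model.describe, images))
```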
### Type of change
- [x] Performance Improvement
---------
Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
### What problem does this PR solve?
Fix the Redis lock always timing out (change the logic order: release the
lock first).
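A minimal sketch of the ordering, assuming a redis-py lock guarding task pickup (lock and queue names are illustrative):
```python
# Minimal sketch; the lock and queue names are illustrative. The point is
# the ordering: the lock only covers the short critical section, so its
# timeout is no longer hit by the long-running work.
import redis

r = redis.Redis()

def fetch_and_process(process_task):
    lock = r.lock("task_executor_lock", timeout=30)
    if not lock.acquire(blocking=True):
        return
    try:
        task = r.lpop("task_queue")  # short critical section
    finally:
        lock.release()               # release first ...
    if task:
        process_task(task)           # ... then do the slow work unlocked
```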
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The lock is not released correctly when the task executor exits abnormally.
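A minimal sketch of one way to guarantee release, using the lock as a context manager (task and handler names are illustrative):
```python
# Minimal sketch; task/handler names are illustrative. The with-block
# releases the lock on return, on an exception, or on interruption.
import redis

r = redis.Redis()

def run_task(task, handler):
    with r.lock(f"task:{task['id']}", timeout=600):
        handler(task)
```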
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
If you deploy RAGFlow using Kubernetes, the hostname will change during
a rolling update. This causes the consumer name of the task executor to
change, making it impossible to schedule tasks that were previously in a
pending state.
To address this, I introduced a recovery task that scans these pending
messages and re-publishes them, allowing the tasks to continue being
processed.
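A minimal sketch of the recovery idea, assuming the task queue is a Redis stream with a consumer group (stream and group names and the idle threshold are illustrative):
```python
# Minimal sketch; stream/group names and the 60s idle threshold are
# illustrative. Pending entries delivered to old consumers (e.g. previous
# pod hostnames) are re-published and the stale delivery is acknowledged.
import redis

r = redis.Redis()

def recover_pending(stream="task_stream", group="task_group", idle_ms=60_000):
    pending = r.xpending_range(stream, group, min="-", max="+", count=100)
    for p in pending:
        if p["time_since_delivered"] < idle_ms:
            continue
        entries = r.xrange(stream, min=p["message_id"], max=p["message_id"])
        if entries:
            _, fields = entries[0]
            r.xadd(stream, fields)              # re-publish for a live consumer
        r.xack(stream, group, p["message_id"])  # drop the stale pending entry
```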
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
## Problem Description
Multiple files in the RAGFlow project contain closure trap issues when
using lambda functions with `trio.open_nursery()`. This problem causes
concurrent tasks created in loops to reference the same variable,
resulting in all tasks processing the same data (the data from the last
iteration) rather than each task processing its corresponding data from
the loop.
## Issue Details
When using a `lambda` to create a closure function and passing it to
`nursery.start_soon()` within a loop, the lambda function captures a
reference to the loop variable rather than its value. For example:
```python
# Problematic code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```
In this pattern, when concurrent tasks begin execution, `d` has already
become the value after the loop ends (typically the last element),
causing all tasks to use the same data.
## Fix Solution
Changed the way concurrent tasks are created with `nursery.start_soon()`
by leveraging Trio's API design to directly pass the function and its
arguments separately:
```python
# Fixed code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```
This way, each task uses the parameter values at the time of the
function call, rather than references captured through closures.
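Where keyword arguments are also needed, the same fix can be written with `functools.partial`, which binds the loop variable's current value and is likewise accepted by `start_soon` (a hedged sketch reusing the example names above):
```python
# Equivalent fix using functools.partial: the current value of d is bound
# when the partial is created, not when the task runs.
from functools import partial

async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(partial(doc_keyword_extraction, chat_mdl, d, topn))
```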
## Fixed Files
Fixed closure traps in the following files:
1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword
extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge
processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution
and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document
processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content
processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving
community report extraction
## Potential Impact
This fix resolves a serious concurrency issue that could have caused:
- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)
After the fix, all concurrent tasks should correctly process their
respective data, improving system correctness and reliability.
### What problem does this PR solve?
Removed set_entity and set_relation to avoid accessing the doc engine during
graph computation.
Introduced GraphChange to avoid writing unchanged chunks.
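A minimal sketch of the idea behind `GraphChange` (the field names here are assumptions, not the actual class): track exactly which nodes and edges changed so only the affected chunks are rewritten.
```python
# Minimal sketch; field names are hypothetical. Only chunks touched by one
# of these sets need to be written back.
from dataclasses import dataclass, field

@dataclass
class GraphChange:
    added_updated_nodes: set = field(default_factory=set)
    removed_nodes: set = field(default_factory=set)
    added_updated_edges: set = field(default_factory=set)
    removed_edges: set = field(default_factory=set)

    def has_changes(self) -> bool:
        return bool(self.added_updated_nodes or self.removed_nodes
                    or self.added_updated_edges or self.removed_edges)
```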
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
When using the LLM for auto-tagging, if there are no examples, the tag format
generated by the LLM may be wrong, which causes Elasticsearch insert errors.
Adding basic examples avoids this problem.
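A minimal sketch of the fallback, using a hypothetical `build_tag_prompt` helper and an illustrative tag format:
```python
# Minimal sketch; helper name, example text, and tag format are illustrative.
# Always include at least one well-formed example so the LLM sees the
# expected output shape even when no real examples exist.
DEFAULT_TAG_EXAMPLES = [
    {"text": "Quarterly revenue grew 12% year over year.",
     "tags": {"finance": 8, "growth": 6}},
]

def build_tag_prompt(content: str, examples: list) -> str:
    examples = examples or DEFAULT_TAG_EXAMPLES
    shots = "\n".join(f"Text: {e['text']}\nTags: {e['tags']}" for e in examples)
    return f"{shots}\nText: {content}\nTags:"
```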
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: signal.SIGUSR1 and signal.SIGUSR2 cannot be used on Windows, so do not
bind signal.SIGUSR1 and signal.SIGUSR2 in a Windows environment.
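A minimal sketch of the platform guard:
```python
# Minimal sketch: only bind SIGUSR1/SIGUSR2 where the platform defines them
# (Windows does not).
import signal

def install_optional_signals(handler):
    if hasattr(signal, "SIGUSR1"):
        signal.signal(signal.SIGUSR1, handler)
    if hasattr(signal, "SIGUSR2"):
        signal.signal(signal.SIGUSR2, handler)
```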
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: tangyu <1@1.com>
### What problem does this PR solve?
Refactored DocumentService.update_progress
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Close #5277 by making sure the file is closed.
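A minimal illustrative sketch of the pattern (not the actual call site):
```python
# Minimal sketch: a with-block guarantees the file handle is closed even if
# reading raises, instead of relying on garbage collection.
def read_blob(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()
```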
### Type of change
- [x] Performance Improvement
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
This patch adds a signal handler for Ctrl+C so the process can exit cleanly,
because the code base uses daemon threads that cannot exit cleanly once
started.
How to reproduce:
1. docker-compose -f docker/docker-compose-base.yml up
2. In another window, run `bash docker/launch_backend_service.sh`
3. Stop 1 first
4. Try to stop 2; the two threads cannot exit and must be killed with `kill pid`
This patch fixes that and should fix most of the related issues in the issue
tracker.
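A minimal sketch of the idea (a hedged illustration, not the actual handler in the patch): install SIGINT/SIGTERM handlers that set a stop flag so worker loops can exit instead of requiring `kill pid`.
```python
# Minimal sketch; the stop flag and handler names are illustrative.
import signal
import sys
import threading

stop_event = threading.Event()

def _graceful_exit(signum, frame):
    stop_event.set()  # worker loops check this flag and return
    sys.exit(0)

signal.signal(signal.SIGINT, _graceful_exit)
signal.signal(signal.SIGTERM, _graceful_exit)
```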
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Run keyword_extraction, question_proposal, content_tagging in threads
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Ignore the millisecond and microsecond values.
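A minimal illustrative sketch of the truncation:
```python
# Minimal sketch: drop the sub-second part before storing or comparing.
from datetime import datetime

def truncate_to_seconds(dt: datetime) -> datetime:
    return dt.replace(microsecond=0)
```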
### Type of change
- [x] Refactoring
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix xinfo_groups returning an unexpected result. Close #3545
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Try to reuse existing chunks. Close #3793
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Rename page_num_list, top_list, position_list to page_num_int, top_int,
position_int
### Type of change
- [x] Refactoring
### What problem does this PR solve?
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>