ragflow

AI/ragflow

mirror of https://git.mirrors.martin98.com/https://github.com/infiniflow/ragflow.git synced 2025-07-30 08:12:02 +08:00

Author	SHA1	Message	Date
Kevin Hu	93f5df716f	Fix: order chunks from docx by positions. (#7979) ### What problem does this PR solve? #7934 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 17:20:53 +08:00
Yongteng Lei	bd4678bca6	Fix: Unnecessary truncation in markdown parser (#7972) ### What problem does this PR solve? Fix unnecessary truncation in markdown parser. So that markdown can work perfectly like [this](https://github.com/infiniflow/ragflow/issues/7824#issuecomment-2921312576) in #7824, supporting multiple special delimiters. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 15:04:21 +08:00
Yongteng Lei	46963ab1ca	Fix: add advanced delimiter detection for naive merge (#7941) ### What problem does this PR solve? Add advanced delimiter detection for naive merge. #7824 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 16:17:22 +08:00
Yongteng Lei	0c562f0a9f	Refa: change citation mark as [ID:n] (#7923) ### What problem does this PR solve? Change citation mark as [ID:n], it's easier for LLMs to follow the instruction :) #7904 ### Type of change - [x] Refactoring	2025-05-29 10:03:51 +08:00
Sol	0d7cfce6e1	Update rag/nlp/query.py (#7816) ### What problem does this PR solve? Fix tokenizer resulting in low recall ![37743d3a495f734aa69f1e173fa77457](https://github.com/user-attachments/assets/1394757e-8fcb-4f87-96af-a92716144884) ![4aba633a17f34269a4e17e84fafb34c4](https://github.com/user-attachments/assets/a1828e32-3e17-4394-a633-ba3f09bd506d) ![image](https://github.com/user-attachments/assets/61308f32-2a4f-44d5-a034-d65bbec554ef) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-05-23 17:13:37 +08:00
Stephen Hu	db4371c745	Fix: Improve First Chunk Size (#7806) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7790 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-23 14:30:19 +08:00
Emmanuel Ferdman	d4a123d6dd	Fix: resolve regex library warnings (#7782) ### What problem does this PR solve? This small PR resolves the regex library warnings showing in Python3.11: `DeprecationWarning: 'count' is passed as positional argument` ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-05-22 10:06:28 +08:00
Kevin Hu	321a280031	Feat: add image preview to retrieval test. (#7610) ### What problem does this PR solve? #7608 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-13 14:30:36 +08:00
Stephen Hu	573d46a4ef	FIX:ZeroDivisionError when using large page_size in client.retrieve() (#7595) ### What problem does this PR solve? Close #7592 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-13 10:46:31 +08:00
Kevin Hu	a14865e6bb	Fix: empty query issue. (#7551) ### What problem does this PR solve? #5214 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-09 12:20:19 +08:00
Kevin Hu	c7310f7fb2	Refa: similarity calculations. (#7381) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-04-28 19:17:11 +08:00
Stephen Hu	1662c7eda3	Feat: Markdown add image (#7124) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/6984 1. Markdown parser supports get pictures 2. For Native, when handling Markdown, it will handle images 3. improve merge and ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-25 18:35:28 +08:00
Yongteng Lei	67dee2d74e	Fix: fix retrieval tesing wrong pagination (#7174) ### What problem does this PR solve? Fix retrieval testing wrong pagination. #7171 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-22 15:16:04 +08:00
alulala	d9266ed65a	Fix: incorrect total chunks count in retrieval function after similarity filtering (#6741) (#6932) ### Related Issue: https://github.com/infiniflow/ragflow/issues/6741 ### Environment: Using nightly version Commit version: 6051abb4a3 ### Bug Description: The retrieval function in rag/nlp/search.py returns the original total chunks number even after chunks are filtered by similarity_threshold. This creates inconsistency between the actual returned chunks and the reported total. ### Changes Made: Added code to count how many search results actually meet or exceed the configured similarity threshold Positioned the calculation after the doc_ids conditional logic to ensure special cases are handled correctly Updated the ranks["total"] value to store this filtered count instead of using the raw search result count Using NumPy leverages optimized C-level batch operations to optimize speed	2025-04-11 12:31:36 +08:00
kaiyuan Zhang	ead5f7aba9	Fix infinite recursion in RagTokenizer when processing repetitive characters (#6109) ### What problem does this PR solve? fix #6085 RagTokenizer's dfs_() function falls into infinite recursion when processing text with repetitive Chinese characters (e.g., "一一一一一十一十一十一..." or "一一一一一一十十十十十十十二十二十二..."), causing memory leaks. ### Type of change Implemented three optimizations to the dfs_() function: 1.Added memoization with _memo dictionary to cache computed results 2.Added recursion depth limiting with _depth parameter (max 10 levels) 3.Implemented special handling for repetitive character sequences - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-01 13:59:52 +08:00
Kevin Hu	0758c04941	Refa: token similarity calculations. (#6614) ### What problem does this PR solve? #6507 ### Type of change - [x] Performance Improvement	2025-03-28 09:33:08 +08:00
Kevin Hu	cc8029a732	Fix: uploading in chat box issue. (#6547) ### What problem does this PR solve? #6228 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-26 15:37:48 +08:00
Kevin Hu	ee5aa51d43	Fix: point in tag issue. (#6436) ### What problem does this PR solve? #6414 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 10:45:29 +08:00
Kevin Hu	a087d13ccb	Feat: text file support position retaining. (#6231) ### What problem does this PR solve? #5832 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-18 16:55:11 +08:00
Kevin Hu	6e8d0e3177	Fix: rank feat issue. (#6225) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 16:07:29 +08:00
Kevin Hu	1333d3c02a	Fix: float transfer exception. (#6197) ### What problem does this PR solve? #6177 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 11:13:44 +08:00
Kevin Hu	fabc5e9259	Refa: fix re-rank scope. (#6152) ### What problem does this PR solve? #6140 ### Type of change - [x] Refactoring	2025-03-17 13:26:29 +08:00
Kevin Hu	e5a8b23684	Fix: empty tag field issue. (#6103) ### What problem does this PR solve? #6102 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-14 17:35:57 +08:00
Kevin Hu	485bc7d7d6	Fix: limit the depth of DFS (#6101) ### What problem does this PR solve? #6085 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-14 17:10:38 +08:00
Kevin Hu	e05cdc2f9c	Fix: encode detect error. (#6006) ### What problem does this PR solve? #5967 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-13 10:47:58 +08:00
Kevin Hu	15736c57c3	Fix: empty query issue. (#5830) ### What problem does this PR solve? #5214 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-10 13:56:56 +08:00
Kevin Hu	c190086707	Fix: bad case for tokenizer. (#5543) ### What problem does this PR solve? #5492 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-03 15:36:16 +08:00
Kevin Hu	4f40f685d9	Code refactor (#5371) ### What problem does this PR solve? #5173 ### Type of change - [x] Refactoring	2025-02-26 15:40:52 +08:00
Kevin Hu	53b9e7b52f	Add tavily as web searh tool. (#5349) ### What problem does this PR solve? #5198 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-02-26 10:21:04 +08:00
Kevin Hu	daddfc9e1b	Remove dup gb2312, solve currupt error. (#5326) ### What problem does this PR solve? #5252 #5325 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-25 12:22:37 +08:00
Kevin Hu	3444cb15e3	Refine search query. (#5235) ### What problem does this PR solve? #5173 #5214 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-21 18:32:32 +08:00
Kevin Hu	cdb3e6434a	Fix empty question issue. (#5225) ### What problem does this PR solve? #5241 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-21 15:47:39 +08:00
Kevin Hu	7b3d700d5f	Apply agentic searching. (#5196) ### What problem does this PR solve? #5173 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-02-20 17:41:01 +08:00
Kevin Hu	e6c024f8bf	Fix too many clause while searching. (#5119) ### What problem does this PR solve? #5100 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-19 13:18:39 +08:00
ubbg	29a59ed7e2	Fix: Use self.dataStore.indexExist in all_tags method of Dealer (#5108) ### What problem does this PR solve? This PR fixes an AttributeError in the all_tags method of the Dealer class. Previously, the method incorrectly called self.docStoreConn.indexExist instead of self.dataStore.indexExist. Since self.docStoreConn was never set (and self.dataStore is already initialized in __init__), this resulted in an error when attempting to check if the index exists. This change ensures that the proper connector is used for the index existence check, thereby resolving the issue. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-19 11:50:57 +08:00
Kevin Hu	9ff825f39d	Ignore exceptions when no index ahead. (#5047) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-02-18 09:09:22 +08:00
Mathias Panzenböck	9bcccadebd	Remove use of eval() from search.py (#4887) Use `json.loads()` instead. ### What problem does this PR solve? Using `eval()` can lead to code injections. I think this loads a JSON field, right? If yes, why is this done via `eval()` and not `json.loads()`? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-12 13:15:38 +08:00
Kevin Hu	f374dd38b6	Fix divided by zero issue. (#4784) ### What problem does this PR solve? #4779 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-08 10:36:26 +08:00
Kevin Hu	448fa1c4d4	Robust for abnormal response from LLMs. (#4747) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-06 17:34:53 +08:00
Kevin Hu	6f2c3a3c3c	Fix too long query exception. (#4729) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-06 10:11:52 +08:00
Kevin Hu	4011c8f68c	Fix potential error. (#4650) ### What problem does this PR solve? #4622 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-26 12:38:32 +08:00
Kevin Hu	86892959a0	Rebuild graph when it's out of time. (#4607) ### What problem does this PR solve? #4543 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-01-23 17:26:20 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Kevin Hu	c5da3cdd97	Tagging (#4426) ### What problem does this PR solve? #4367 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-09 17:07:21 +08:00
Kevin Hu	d9a4e4cc3b	Fix page size error. (#4401) ### What problem does this PR solve? #4400 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-07 19:06:31 +08:00
Kevin Hu	f948c0d9f1	Clean query. (#4259) ### What problem does this PR solve? #4239 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-27 14:25:03 +08:00
Kevin Hu	7e063283ba	Removing invisible chars before tokenization. (#4233) ### What problem does this PR solve? #4223 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-26 11:48:16 +08:00
Bo Liu	321e9f3719	fix: stop rerank by model when search result is empty (#4203) ### What problem does this PR solve? stop rerank by model when search result is empty, otherwise rerank may raise an error (qwen). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: 刘博 <liubo@ynby.cn>	2024-12-24 14:33:46 +08:00
Kevin Hu	c373dba0bc	Fix raptor bug. (#4192) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-23 18:59:48 +08:00
Kevin Hu	31d67c850e	Fetch chunk by batches. (#4177) ### What problem does this PR solve? #4173 ### Type of change - [x] Performance Improvement	2024-12-23 12:12:15 +08:00
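
Commit d9266ed65a describes setting ranks["total"] to the number of search results whose similarity meets or exceeds similarity_threshold, counted with NumPy instead of reporting the raw hit count. A minimal sketch of that counting step follows; `filtered_total`, the sample scores and the dict layout of `ranks` are hypothetical and are not copied from rag/nlp/search.py.

```python
import numpy as np


def filtered_total(similarities, similarity_threshold: float) -> int:
    """Count results whose similarity meets or exceeds the threshold.

    Hypothetical helper: the comparison builds a boolean mask in one vectorized
    NumPy operation, and count_nonzero counts its True entries.
    """
    sim = np.asarray(similarities, dtype=float)
    return int(np.count_nonzero(sim >= similarity_threshold))


# Hypothetical result structure: overwrite the raw count with the filtered one.
ranks = {"total": 5, "chunks": []}
scores = [0.91, 0.42, 0.18, 0.77, 0.05]
ranks["total"] = filtered_total(scores, 0.2)
print(ranks["total"])  # 3
```

Using `>=` keeps results that sit exactly on the threshold, matching the "meet or exceed" wording in the commit message.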
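
Commits 485bc7d7d6 and ead5f7aba9 bound the recursion in RagTokenizer's dfs_(). The sketch below illustrates the two general techniques ead5f7aba9 names, memoization and a recursion-depth cap, on a toy dictionary segmenter. `segment`, `MAX_DEPTH`, the fewest-tokens scoring rule and the per-character fallback are assumptions for illustration only; this does not reproduce RagTokenizer.dfs_() and omits its special handling for repetitive character sequences.

```python
from typing import Dict, List, Optional, Set, Tuple

MAX_DEPTH = 10  # recursion cap, echoing the "max 10 levels" in the commit message


def segment(text: str, vocab: Set[str], max_word_len: int = 4) -> List[str]:
    """Split `text` into vocabulary words, preferring the fewest tokens.

    Hypothetical illustration of a memoized, depth-limited DFS; this is not
    RagTokenizer.dfs_().
    """
    memo: Dict[int, Tuple[int, List[str]]] = {}

    def dfs(start: int, depth: int) -> Tuple[int, List[str]]:
        if start == len(text):
            return 0, []  # consumed the whole string
        if start in memo:
            return memo[start]  # reuse the cached result for this suffix
        if depth >= MAX_DEPTH:
            # Depth cap: stop branching and emit the rest one character at a time.
            # This fallback result is not memoized.
            rest = list(text[start:])
            return len(rest), rest
        best: Optional[Tuple[int, List[str]]] = None
        for end in range(start + 1, min(len(text), start + max_word_len) + 1):
            piece = text[start:end]
            # Single characters are always allowed, so every input can be segmented.
            if piece in vocab or end == start + 1:
                count, tail = dfs(end, depth + 1)
                if best is None or count + 1 < best[0]:
                    best = (count + 1, [piece] + tail)
        assert best is not None  # the single-character branch always runs
        # A cached entry may contain a capped (per-character) tail, so the cap
        # trades optimality on long inputs for bounded recursion.
        memo[start] = best
        return best

    return dfs(0, 0)[1]


print(segment("abcd", {"ab", "abc", "cd"}))
print(len(segment("一" * 40, {"一一"})))
```

Even on a long run of a repeated character, the nesting of `dfs` calls never exceeds MAX_DEPTH + 1 frames, and the returned tokens still concatenate back to the input.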
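
Commit 9bcccadebd replaces `eval()` with `json.loads()` for a stored field. A minimal sketch of why that is safer, assuming the field holds JSON text; `parse_field` and the sample strings are hypothetical and not taken from search.py.

```python
import json
from typing import Any


def parse_field(raw: str) -> Any:
    """Decode a JSON-encoded field.

    Unlike eval(), json.loads() only builds data (dicts, lists, strings,
    numbers, booleans, None) and never executes the input as code.
    """
    return json.loads(raw)


print(parse_field('{"tags": ["a", "b"], "score": 0.5}'))

try:
    # An expression that eval() would run is simply invalid JSON here.
    parse_field("__import__('os').getcwd()")
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

One trade-off: JSON requires double-quoted strings and `true`/`false`/`null`, so a value that was written with Python's `repr()` (single quotes, `True`, `None`) raises `JSONDecodeError`; the writer side has to serialize with `json.dumps()` for this to work.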


183 Commits