AI/ragflow

mirror of https://git.mirrors.martin98.com/https://github.com/infiniflow/ragflow.git synced 2025-08-04 07:40:37 +08:00

Author	SHA1	Message	Date
Kevin Hu	c7310f7fb2	Refa: similarity calculations. (#7381 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-04-28 19:17:11 +08:00
Stephen Hu	1662c7eda3	Feat: Markdown add image (#7124 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/6984 1. Markdown parser supports get pictures 2. For Native, when handling Markdown, it will handle images 3. improve merge and ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-25 18:35:28 +08:00
Yongteng Lei	67dee2d74e	Fix: fix retrieval testing wrong pagination (#7174 ) ### What problem does this PR solve? Fix retrieval testing wrong pagination. #7171 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-22 15:16:04 +08:00
alulala	d9266ed65a	Fix: incorrect total chunks count in retrieval function after similarity filtering (#6741 ) (#6932 ) ### Related Issue: https://github.com/infiniflow/ragflow/issues/6741 ### Environment: Using nightly version. Commit version: 6051abb4a3 ### Bug Description: The retrieval function in rag/nlp/search.py returns the original total chunk count even after chunks are filtered by similarity_threshold. This creates an inconsistency between the chunks actually returned and the reported total. ### Changes Made: Added code to count how many search results actually meet or exceed the configured similarity threshold. Positioned the calculation after the doc_ids conditional logic to ensure special cases are handled correctly. Updated the ranks["total"] value to store this filtered count instead of the raw search result count. Used NumPy so the count runs as an optimized C-level batch operation.	2025-04-11 12:31:36 +08:00
kaiyuan Zhang	ead5f7aba9	Fix infinite recursion in RagTokenizer when processing repetitive characters (#6109 ) ### What problem does this PR solve? fix #6085 RagTokenizer's dfs_() function falls into infinite recursion when processing text with repetitive Chinese characters (e.g., "一一一一一十一十一十一..." or "一一一一一一十十十十十十十二十二十二..."), causing memory leaks. ### Type of change Implemented three optimizations to the dfs_() function: 1. Added memoization with _memo dictionary to cache computed results 2. Added recursion depth limiting with _depth parameter (max 10 levels) 3. Implemented special handling for repetitive character sequences - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-01 13:59:52 +08:00
Kevin Hu	0758c04941	Refa: token similarity calculations. (#6614 ) ### What problem does this PR solve? #6507 ### Type of change - [x] Performance Improvement	2025-03-28 09:33:08 +08:00
Kevin Hu	cc8029a732	Fix: uploading in chat box issue. (#6547 ) ### What problem does this PR solve? #6228 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-26 15:37:48 +08:00
Kevin Hu	ee5aa51d43	Fix: point in tag issue. (#6436 ) ### What problem does this PR solve? #6414 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 10:45:29 +08:00
Kevin Hu	a087d13ccb	Feat: text file support position retaining. (#6231 ) ### What problem does this PR solve? #5832 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-18 16:55:11 +08:00
Kevin Hu	6e8d0e3177	Fix: rank feat issue. (#6225 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 16:07:29 +08:00
Kevin Hu	1333d3c02a	Fix: float transfer exception. (#6197 ) ### What problem does this PR solve? #6177 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 11:13:44 +08:00
Kevin Hu	fabc5e9259	Refa: fix re-rank scope. (#6152 ) ### What problem does this PR solve? #6140 ### Type of change - [x] Refactoring	2025-03-17 13:26:29 +08:00
Kevin Hu	e5a8b23684	Fix: empty tag field issue. (#6103 ) ### What problem does this PR solve? #6102 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-14 17:35:57 +08:00
Kevin Hu	485bc7d7d6	Fix: limit the depth of DFS (#6101 ) ### What problem does this PR solve? #6085 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-14 17:10:38 +08:00
Kevin Hu	e05cdc2f9c	Fix: encode detect error. (#6006 ) ### What problem does this PR solve? #5967 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-13 10:47:58 +08:00
Kevin Hu	15736c57c3	Fix: empty query issue. (#5830 ) ### What problem does this PR solve? #5214 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-10 13:56:56 +08:00
Kevin Hu	c190086707	Fix: bad case for tokenizer. (#5543 ) ### What problem does this PR solve? #5492 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-03 15:36:16 +08:00
Kevin Hu	4f40f685d9	Code refactor (#5371 ) ### What problem does this PR solve? #5173 ### Type of change - [x] Refactoring	2025-02-26 15:40:52 +08:00
Kevin Hu	53b9e7b52f	Add tavily as web search tool. (#5349 ) ### What problem does this PR solve? #5198 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-02-26 10:21:04 +08:00
Kevin Hu	daddfc9e1b	Remove dup gb2312, solve corrupt error. (#5326 ) ### What problem does this PR solve? #5252 #5325 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-25 12:22:37 +08:00
Kevin Hu	3444cb15e3	Refine search query. (#5235 ) ### What problem does this PR solve? #5173 #5214 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-21 18:32:32 +08:00
Kevin Hu	cdb3e6434a	Fix empty question issue. (#5225 ) ### What problem does this PR solve? #5241 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-21 15:47:39 +08:00
Kevin Hu	7b3d700d5f	Apply agentic searching. (#5196 ) ### What problem does this PR solve? #5173 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-02-20 17:41:01 +08:00
Kevin Hu	e6c024f8bf	Fix too many clause while searching. (#5119 ) ### What problem does this PR solve? #5100 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-19 13:18:39 +08:00
ubbg	29a59ed7e2	Fix: Use self.dataStore.indexExist in all_tags method of Dealer (#5108 ) ### What problem does this PR solve? This PR fixes an AttributeError in the all_tags method of the Dealer class. Previously, the method incorrectly called self.docStoreConn.indexExist instead of self.dataStore.indexExist. Since self.docStoreConn was never set (and self.dataStore is already initialized in __init__), this resulted in an error when attempting to check if the index exists. This change ensures that the proper connector is used for the index existence check, thereby resolving the issue. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-19 11:50:57 +08:00
Kevin Hu	9ff825f39d	Ignore exceptions when no index ahead. (#5047 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-02-18 09:09:22 +08:00
Mathias Panzenböck	9bcccadebd	Remove use of eval() from search.py (#4887 ) Use `json.loads()` instead. ### What problem does this PR solve? Using `eval()` can lead to code injections. I think this loads a JSON field, right? If yes, why is this done via `eval()` and not `json.loads()`? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-12 13:15:38 +08:00
Kevin Hu	f374dd38b6	Fix divided by zero issue. (#4784 ) ### What problem does this PR solve? #4779 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-08 10:36:26 +08:00
Kevin Hu	448fa1c4d4	Robust for abnormal response from LLMs. (#4747 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-06 17:34:53 +08:00
Kevin Hu	6f2c3a3c3c	Fix too long query exception. (#4729 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-06 10:11:52 +08:00
Kevin Hu	4011c8f68c	Fix potential error. (#4650 ) ### What problem does this PR solve? #4622 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-26 12:38:32 +08:00
Kevin Hu	86892959a0	Rebuild graph when it's out of time. (#4607 ) ### What problem does this PR solve? #4543 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-01-23 17:26:20 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585 ) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Kevin Hu	c5da3cdd97	Tagging (#4426 ) ### What problem does this PR solve? #4367 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-09 17:07:21 +08:00
Kevin Hu	d9a4e4cc3b	Fix page size error. (#4401 ) ### What problem does this PR solve? #4400 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-07 19:06:31 +08:00
Kevin Hu	f948c0d9f1	Clean query. (#4259 ) ### What problem does this PR solve? #4239 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-27 14:25:03 +08:00
Kevin Hu	7e063283ba	Removing invisible chars before tokenization. (#4233 ) ### What problem does this PR solve? #4223 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-26 11:48:16 +08:00
Bo Liu	321e9f3719	fix: stop rerank by model when search result is empty (#4203 ) ### What problem does this PR solve? stop rerank by model when search result is empty, otherwise rerank may raise an error (qwen). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: 刘博 <liubo@ynby.cn>	2024-12-24 14:33:46 +08:00
Kevin Hu	c373dba0bc	Fix raptor bug. (#4192 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-23 18:59:48 +08:00
Kevin Hu	31d67c850e	Fetch chunk by batches. (#4177 ) ### What problem does this PR solve? #4173 ### Type of change - [x] Performance Improvement	2024-12-23 12:12:15 +08:00
Jin Hai	50c2b9d562	Refactor trie load and construct (#4083 ) ### What problem does this PR solve? 1. Fix initial build and load trie 2. Update comment ### Type of change - [x] Refactoring Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-12-18 12:52:56 +08:00
Kevin Hu	000cd6d615	Fix position lost issue. (#4068 ) ### What problem does this PR solve? #4040 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-17 16:31:58 +08:00
Luo Pan	68d46b2a1e	Fix bug in hierarchical_merge function (#4006 ) ### What problem does this PR solve? Fix hierarchical_merge function. From idx vs. actual value to actual value vs. actual value. Related issue #4003 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: luopan <luopan@example.com>	2024-12-13 08:50:58 +08:00
Zhichang Yu	03f00c9e6f	Rename page_num_list, top_list, position_list (#3940 ) ### What problem does this PR solve? Rename page_num_list, top_list, position_list to page_num_int, top_int, position_int ### Type of change - [x] Refactoring	2024-12-10 16:32:58 +08:00
Kevin Hu	927873bfa6	Fix syn error. (#3953 ) ### What problem does this PR solve? Close #3696 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-10 10:54:54 +08:00
Zhichang Yu	7a6bf4326e	Fixed log not displaying (#3946 ) ### What problem does this PR solve? Fixed log not displaying ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-10 09:36:59 +08:00
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918 ) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	2024-12-08 14:21:12 +08:00
Kevin Hu	56f473b680	Feat: Add question parameter to edit chunk modal (#3875 ) ### What problem does this PR solve? Close #3873 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-05 14:51:19 +08:00
Kevin Hu	1b817a5b4c	Refine synonym query. (#3855 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2024-12-04 17:20:12 +08:00
Jin Hai	6657ca7cde	Change default error message to English (#3838 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2024-12-04 09:34:49 +08:00
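
Commit d9266ed65a in the table above describes replacing the raw search-result count with the number of results that meet or exceed the similarity threshold. The following is a minimal sketch of that idea only, not the code from rag/nlp/search.py: the function name `count_above_threshold` and the sample scores are hypothetical, and plain Python is used here to stay dependency-free even though the commit message mentions NumPy.

```python
def count_above_threshold(similarities, threshold):
    """Return how many similarity scores meet or exceed the threshold."""
    # With NumPy, an equivalent would be:
    #   int((np.asarray(similarities) >= threshold).sum())
    return sum(1 for score in similarities if score >= threshold)


# Hypothetical similarity scores for five retrieved chunks.
sim = [0.91, 0.42, 0.77, 0.15, 0.60]

# Before the fix (as described): total reflects every raw search hit.
ranks = {"total": len(sim)}

# After the fix (as described): total reflects only chunks that pass the filter.
ranks["total"] = count_above_threshold(sim, threshold=0.6)
```

Using `>=` rather than `>` matches the commit's wording of "meet or exceed", so a score exactly equal to the threshold is counted.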

173 Commits
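
Commit 9bcccadebd in the log replaces `eval()` with `json.loads()` in search.py to avoid code injection. As a rough illustration of why that swap matters, the sketch below parses a stored field with `json.loads()` and rejects anything that is not valid JSON. The helper name `parse_field` and the sample strings are assumptions made for this example and do not come from the repository.

```python
import json


def parse_field(raw):
    """Parse a JSON-encoded field; return None if it is not valid JSON."""
    try:
        # json.loads only builds data (lists, dicts, numbers, strings);
        # unlike eval(), it never executes the input as Python code.
        return json.loads(raw)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers non-string input such as None.
        return None


# A well-formed stored value parses into ordinary Python data.
positions = parse_field("[[1, 10, 20, 30, 40]]")

# An expression that eval() would have executed is simply rejected here.
rejected = parse_field("__import__('os').getcwd()")
```

Returning `None` on failure is one possible design choice; a caller could equally let the exception propagate if malformed data should be treated as an error.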