AI/ragflow

mirror of https://git.mirrors.martin98.com/https://github.com/infiniflow/ragflow.git synced 2025-08-05 14:50:41 +08:00

Author	SHA1	Message	Date
QuintinTao	1b4016317e	fix bug chunking: expected string or bytes-like object (#7116 ) … bytes-like object ### What problem does this PR solve? fix bug #6990 internal server error while chunking: expected string or bytes-like object ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>	2025-04-18 14:42:36 +08:00
Kevin Hu	ed5f81b02e	Fix: abnormal cell merging. (#6991 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-14 11:00:11 +08:00
dylan	5aae73c230	Make error messages during PPT processing clearer. (#6980 ) ### What problem does this PR solve? Sometimes a slide may trigger a Proxy error (ArgumentException: Parameter is not valid) due to issues in the original file, and this error message can be confusing for users. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [x] Other (please describe):	2025-04-14 10:10:20 +08:00
Kevin Hu	14a3efd756	Fix: docx image exceptions. (#6839 ) ### What problem does this PR solve? Close #6784 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-07 12:33:34 +08:00
Kevin Hu	ee5aa51d43	Fix: point in tag issue. (#6436 ) ### What problem does this PR solve? #6414 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 10:45:29 +08:00
fansir	0e0ebaac5f	Feat: Adds hierarchical title path tracking for tables in DOCX documents to improve context association (#6374 ) ### What problem does this PR solve? Adds hierarchical title path tracking for tables in DOCX documents to improve context association. Previously, extracted tables lacked positional context within document structure. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-21 18:42:36 +08:00
Kevin Hu	95497b4aab	Fix: adapt to old configurations. (#6321 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-20 14:50:59 +08:00
Yongteng Lei	9611185eb4	Feat: add VLM-boosted DocX parser (#6307 ) ### What problem does this PR solve? Add VLM-boosted DocX parser ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-20 11:24:44 +08:00
Yongteng Lei	e4380843c4	Feat: add fallback for PDF figure parser (#6305 ) ### What problem does this PR solve? Add fallback for PDF figure parser ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-20 10:48:38 +08:00
Yongteng Lei	1d6760dd84	Feat: add VLM-boosted PDF parser (#6278 ) ### What problem does this PR solve? Add VLM-boosted PDF parser if VLM is set. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-20 09:39:32 +08:00
Yongteng Lei	5cf610af40	Feat: add vision LLM PDF parser (#6173 ) ### What problem does this PR solve? Add vision LLM PDF parser ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-03-18 14:52:20 +08:00
Kevin Hu	1333d3c02a	Fix: float transfer exception. (#6197 ) ### What problem does this PR solve? #6177 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 11:13:44 +08:00
Kevin Hu	3a99c2b5f4	Refa: PARALLEL_DEVICES is a static parameter. (#6168 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-03-17 16:49:54 +08:00
Kevin Hu	bfa8d342b3	Fix: retrieval debug mode issue. (#6150 ) ### What problem does this PR solve? #6139 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-17 13:07:13 +08:00
Debug Doctor	3e19044dee	Feat: add OCR's multi-GPU and parallel processing support (#5972 ) ### What problem does this PR solve? Add OCR's multi-GPU and parallel processing support ### Type of change - [x] New Feature (non-breaking change which adds functionality) @yuzhichang I've tried to resolve the comments in #5697. OCR jobs can now be done on both CPU and GPU. ( By the way, I've encountered a “Generate embedding error” issue #5954 that might be due to my outdated GPUs? idk. ) Please review it and give me suggestions. GPU: ![gpu_ocr](https://github.com/user-attachments/assets/0ee2ecfb-a665-4e50-8bc7-15941b9cd80e) ![smi](https://github.com/user-attachments/assets/a2312f8c-cf24-443d-bf89-bec50503546d) CPU: ![cpu_ocr](https://github.com/user-attachments/assets/1ba6bb0b-94df-41ea-be79-790096da4bf1)	2025-03-17 11:58:40 +08:00
Yongteng Lei	4ff609b6a8	Fix: optimize OCR garbage identification to reduce unnecessary filtering (#6027 ) ### What problem does this PR solve? Optimize OCR garbage identification to reduce unnecessary filtering. #5713 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-13 18:48:32 +08:00
Yongteng Lei	7cd37c37cd	Feat: add CSV file parsing support (#5989 ) ### What problem does this PR solve? Add CSV file parsing support #4552, #5849, #5870 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-12 19:20:50 +08:00
hy89	b0c21b00d9	Refactor: Optimize error handling and support parsing of XLS (Excel 97-2003) files. (#5633 ) Optimize error handling and support parsing of XLS (Excel 97-2003) files.	2025-03-05 11:55:27 +08:00
Kevin Hu	b418ce5643	Fix table parser issue. (#5482 ) ### What problem does this PR solve? #1475 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-28 16:09:12 +08:00
Kevin Hu	4f40f685d9	Code refactor (#5371 ) ### What problem does this PR solve? #5173 ### Type of change - [x] Refactoring	2025-02-26 15:40:52 +08:00
Kevin Hu	c28bc41a96	Fix docx table issue. (#5117 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-19 12:40:06 +08:00
Kevin Hu	c24137bd11	Fix too long integer for `Table`. (#4651 ) ### What problem does this PR solve? #4594 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-26 12:54:58 +08:00
Kevin Hu	9d717f0b6e	Fix csv reader exception. (#4628 ) ### What problem does this PR solve? #4552 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-24 14:47:19 +08:00
Kevin Hu	13f04b7cca	Fix pdf applying Q&A issue. (#4599 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-23 12:30:46 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585 ) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Jin Hai	3894de895b	Update comments (#4569 ) ### What problem does this PR solve? Add license statement. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-01-21 20:52:28 +08:00
Kevin Hu	f556f0239c	Fix dify retrieval issue. (#4473 ) ### What problem does this PR solve? #4464 #4469 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-14 13:16:05 +08:00
Kevin Hu	e098fcf6ad	Fix csv for TAG. (#4454 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-13 12:03:18 +08:00
Kevin Hu	c5da3cdd97	Tagging (#4426 ) ### What problem does this PR solve? #4367 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-09 17:07:21 +08:00
Yingfeng	50f209204e	Synchronize with enterprise version (#4325 ) ### Type of change - [x] Refactoring	2025-01-02 13:44:44 +08:00
Kevin Hu	8fb18f37f6	Code refactor. (#4291 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-12-30 18:38:51 +08:00
TeslaZY	dd13a5d05c	Fix some bugs in text2sql.(#4279 )(#4281 ) (#4280 ) Fix some bugs in text2sql.(#4279)(#4281) ### What problem does this PR solve? - Incorrect results when parsing CSV files of the QA knowledge base in the text2sql scenario. Process CSV files using the csv library and decouple CSV parsing from TXT parsing. - Most LLMs return results in markdown format ```sql query ```; fix the execution error caused by LLM output SQL in markdown format. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-30 10:32:19 +08:00
ly0303521	101b8ff813	fix chunk method "Table" losing content when the Excel file has multiple sheets (#4123 ) ### What problem does this PR solve? discussed in https://github.com/infiniflow/ragflow/pull/4102 - In excel_parser.py, `total` means the total number of rows in the Excel file, but it is returned on the first iteration, which leads to the wrong `to_page`. - In table.py, when the Excel file has multiple sheets, it is divided into multiple parts of size 3000 each; `data` may be empty because it was already recorded in the previous iteration. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-19 17:30:26 +08:00
liuhua	1d65299791	Fix rerank_model bug in chat and markdown bug (#4061 ) ### What problem does this PR solve? Fix rerank_model bug in chat and markdown bug #4000 #3992 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>	2024-12-17 16:03:37 +08:00
Zhichang Yu	03f00c9e6f	Rename page_num_list, top_list, position_list (#3940 ) ### What problem does this PR solve? Rename page_num_list, top_list, position_list to page_num_int, top_int, position_int ### Type of change - [x] Refactoring	2024-12-10 16:32:58 +08:00
Kevin Hu	927873bfa6	Fix syn error. (#3953 ) ### What problem does this PR solve? Close #3696 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-10 10:54:54 +08:00
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918 ) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	2024-12-08 14:21:12 +08:00
Jin Hai	821fdf02b4	Fix parsing JSON file error (#3829 ) ### What problem does this PR solve? Close issue: #3828 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-12-03 19:02:03 +08:00
Jin Hai	08c1a5e1e8	Refactor parse progress (#3781 ) ### What problem does this PR solve? Refactor parse file progress ### Type of change - [x] Refactoring Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-12-01 22:28:00 +08:00
Jin Hai	e079656473	Update progress info and start welcome info (#3768 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring --------- Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-11-30 18:48:06 +08:00
kuschzzp	e678819f70	Fix RGBA error (#3707 ) ### What problem does this PR solve? The image passed to cv_mdl.describe() is not RGB-converted ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-28 13:09:02 +08:00
Zhichang Yu	bc701d7b4c	Edit chunk shall update instead of insert it (#3709 ) ### What problem does this PR solve? Edit chunk shall update instead of insert it. Close #3679 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-28 13:00:38 +08:00
Kevin Hu	609236f5c1	Let 'One' applicable for tables in docx (#3619 ) ### What problem does this PR solve? #3598 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Performance Improvement	2024-11-25 09:57:54 +08:00
Zhichang Yu	482c1b59c8	Check tika.parser return result (#3564 ) ### What problem does this PR solve? Check tika.parser return result. Close #3229 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2024-11-22 11:05:06 +08:00
Michal Masrna	c4f2464935	fix: laws.py added missing import logging (#3501 ) ### What problem does this PR solve? _Choosing Laws Chunk Method results in an error when parsing a document. The error is caused by a missing import in the `laws.py` file._ ``` Traceback (most recent call last): File "/ragflow/rag/svr/task_executor.py", line 445, in handle_task do_handle_task(task) File "/ragflow/rag/svr/task_executor.py", line 384, in do_handle_task cks = build(r) ^^^^^^^^ File "/ragflow/rag/svr/task_executor.py", line 196, in build cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"], ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/ragflow/rag/app/laws.py", line 161, in chunk for txt, poss in pdf_parser(filename if not binary else binary, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/ragflow/rag/app/laws.py", line 124, in __call__ logging.debug("layouts:".format( ^^^^^^^ NameError: name 'logging' is not defined. Did you forget to import 'logging' ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Michal Masrna <m.marna1@gmail.com>	2024-11-20 20:52:05 +08:00
Zhichang Yu	30f6421760	Use consistent log file names, introduced initLogger (#3403 ) ### What problem does this PR solve? Use consistent log file names, introduced initLogger ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2024-11-14 17:13:48 +08:00
Kevin Hu	83c6b1f308	set DLA active for KG (#3386 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-11-13 16:59:19 +08:00
Zhichang Yu	a2a5631da4	Rework logging (#3358 ) Unified all log files into one. ### What problem does this PR solve? Unified all log files into one. ### Type of change - [x] Refactoring	2024-11-12 17:35:13 +08:00
Zhichang Yu	f4c52371ab	Integration with Infinity (#2894 ) ### What problem does this PR solve? Integration with Infinity - Replaced ELASTICSEARCH with dataStoreConn - Renamed deleteByQuery with delete - Renamed bulk to upsertBulk - getHighlight, getAggregation - Fix KGSearch.search - Moved Dealer.sql_retrieval to es_conn.py ### Type of change - [x] Refactoring	2024-11-12 14:59:41 +08:00
Kevin Hu	f86826b7a0	refactor error message of qwen (#3074 ) ### What problem does this PR solve? #3055 ### Type of change - [x] Refactoring	2024-10-29 10:08:08 +08:00

156 Commits