ragflow

AI/ragflow

mirror of https://git.mirrors.martin98.com/https://github.com/infiniflow/ragflow.git synced 2025-08-04 12:10:50 +08:00

Author	SHA1	Message	Date
Stephen Hu	3da8776a3c	Fix: Creating Knowledge Base Support Enter Key (#7258 ) ### What problem does this PR solve? [https://github.com/infiniflow/ragflow/issues/7180](https://github.com/infiniflow/ragflow/issues/7180) When creating a knowledge base, support the enter key ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-25 18:53:52 +08:00
balibabu	3052006ba8	Feat: Save document metadata #3221 (#7323 ) ### What problem does this PR solve? Feat: Save document metadata #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-25 18:38:15 +08:00
Stephen Hu	1662c7eda3	Feat: Markdown add image (#7124 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/6984 1. Markdown parser supports get pictures 2. For Native, when handling Markdown, it will handle images 3. improve merge and ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-25 18:35:28 +08:00
balibabu	fef44a71c5	Feat: Save the configuration information of the knowledge base document #3221 (#7317 ) ### What problem does this PR solve? Feat: Save the configuration information of the knowledge base document #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-25 17:31:28 +08:00
Kevin Hu	b271cc34b3	Fix: LLM generated tag issue. (#7301 ) ### What problem does this PR solve? #7298 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-25 14:38:34 +08:00
alulala	eead838353	Fix pymysql interface error (#7295 ) ### What problem does this PR solve? Following [Rucongzhang](https://github.com/Rucongzhang)'s [comment](https://github.com/infiniflow/ragflow/pull/7057#issuecomment-2827410047), I added a DB reconnection strategy to the function `update_by_id`.	2025-04-25 13:29:47 +08:00
balibabu	02cc867c06	Feat: Display the document configuration dialog with shadcn #3221 (#7302 ) ### What problem does this PR solve? Feat: Display the document configuration dialog with shadcn #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-25 11:15:44 +08:00
writinwaters	6e98cd311c	Doc: Updated sharing behavior in the open-source editions. (#7293 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-04-25 09:21:33 +08:00
Yongteng Lei	97a13ef1ab	Fix: Qwen-vl-plus url error (#7281 ) ### What problem does this PR solve? Fix Qwen-vl-* url error. #7277 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-25 09:20:10 +08:00
Neal Davis	7e1464a950	Feat: Replace the logo of novita (#7287 ) ### What problem does this PR solve? Replace the logo of novita ### Type of change - [x] Other (please describe): Update logo	2025-04-24 21:20:36 +08:00
writinwaters	e6a4d6bcf0	Docs: How to disable user registration (#7265 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-04-24 18:02:32 +08:00
pyyuhao	c8c3b756b0	Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140 ) ### What problem does this PR solve? This PR adds support for the latest OpenSearch2.19.1 as a store engine and search engine option for RAGFlow. ### Main Benefit 1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License], which is much better than Elasticsearch's license 2. For search, OpenSearch2.19.1 supports full-text search, vector_search, and hybrid_search, which are similar to Elasticsearch in schema 3. For storage, OpenSearch2.19.1 stores text and vectors, which are quite similar to Elasticsearch in schema ### Changes - Support opensearch_python_connetor. I made many adaptations because the schema and API methods differ between ES and OpenSearch in many ways (knn_search in particular differs significantly): rag/utils/opensearch_coon.py - Support static config adaptations by changing: conf/service_conf.yaml, api/settings.py, rag/settings.py - Support some store and search schema changes between OpenSearch and ES: conf/os_mapping.json - Support the OpenSearch Python SDK: pyproject.toml - Support Docker config for OpenSearch2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template ### How to use - I didn't change the default: ES remains the default doc/search engine. OpenSearch is used only if DOC_ENGINE=${DOC_ENGINE:-opensearch} is set in docker/.env. ### Others Our team tested a lot of docs in our environment using OpenSearch as the vector database, and it works very well. All of the config for OpenSearch is necessary. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2025-04-24 16:03:31 +08:00
balibabu	9a8dda8fc7	Feat: Delete and rename files in the knowledge base #3221 (#7268 ) ### What problem does this PR solve? Feat: Delete and rename files in the knowledge base #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-24 14:51:20 +08:00
balibabu	ff442c48b5	Feat: Display document parsing status #3221 (#7241 ) ### What problem does this PR solve? Feat: Display document parsing status #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-24 11:45:37 +08:00
benni82	216cd7474b	fix: task_executor bug fix (#7253 ) ### What problem does this PR solve? The lock is not released correctly when task_executor is in an abnormal state ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-24 11:44:34 +08:00
WhiteBear	2c62652ea8	<think> tag is missing. (#7256 ) ### What problem does this PR solve? Some models force thinking, resulting in the absence of the think tag in the returned content ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-24 11:44:10 +08:00
Wanderson Pinto dos Santos	4e8fd73a20	chore: adds pre-commit (#7242 ) ### What problem does this PR solve? Sometimes, after we commit the code and open the PR, the CI pipeline fails in the Ruff checks. With pre-commit included, we can identify this problem early and avoid losing time. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [X] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-24 11:25:33 +08:00
Pierre Lafievre	19931cd9ed	Fix: mcp server start (#7251 ) ### What problem does this PR solve? Fix the entrypoint file from the docker container to solve #7249 Here is the important part from the logs: ``` docker logs -f ragflow-server ... usage: server.py [-h] [--base_url BASE_URL] [--host HOST] [--port PORT] [--mode MODE] [--api_key API_KEY] server.py: error: unrecognized arguments: ... ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-24 10:52:12 +08:00
Stephen Hu	0b460a9a12	Fix: improve retrieval API (#6744 ) ### What problem does this PR solve? Get the highlight parameter from the request to keep consistency with the document > - Method: POST - URL: `/api/v1/retrieval` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer <YOUR_API_KEY>'` - Body: - `"question"`: `string` - `"dataset_ids"`: `list[string]` - `"document_ids"`: `list[string]` - `"page"`: `integer` - `"page_size"`: `integer` - `"similarity_threshold"`: `float` - `"vector_similarity_weight"`: `float` - `"top_k"`: `integer` - `"rerank_id"`: `string` - `"keyword"`: `boolean` - `"highlight"`: `boolean` > ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-24 09:29:00 +08:00
yxkong	4e31eea55f	Fix/mcp doc (#7239 ) ### What problem does this PR solve? This PR fixes an issue with the MCP server configuration in RAGFlow's Docker deployment where: 1. Incorrect parameter naming (`--mcp--host-api-key` with double hyphens) caused server startup failures 2. Port binding conflicts occurred due to unexposed MCP ports in Docker 3. Inconsistent host addressing between `0.0.0.0` and `127.0.0.1` led to connectivity issues The changes ensure proper MCP server initialization and reliable inter-service communication. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Key Changes 1. Parameter Correction: - Fixed `--mcp--host-api-key` → `--mcp-host-api-key`	2025-04-24 09:20:26 +08:00
balibabu	1366712560	Feat: Deleting files in batches. #3221 (#7234 ) ### What problem does this PR solve? Feat: Deleting files in batches. #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-23 19:03:02 +08:00
balibabu	51d9bde5a3	Feat: Create a folder #3221 (#7228 ) ### What problem does this PR solve? Feat: Create a folder #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-23 15:21:09 +08:00
Yongteng Lei	94181a990b	Refa: knowledge_graph chunk method is deprecated (#7220 ) ### What problem does this PR solve? The knowledge_graph chunk method is deprecated and should no longer be used. #7184. ### Type of change - [x] Refactoring v0.18.0	2025-04-23 13:01:46 +08:00
liu an	03672df691	Docs: update for v0.18.0 (#7223 ) ### What problem does this PR solve? update for v0.18.0 ### Type of change - [x] Documentation Update	2025-04-23 12:02:50 +08:00
writinwaters	e9669e7fb1	Updated v0.18.0 release notes (#7221 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-04-23 11:12:14 +08:00
writinwaters	9a1ac8020d	v0.18.0 release notes (#7185 ) ### What problem does this PR solve? _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [x] Documentation Update	2025-04-23 10:41:58 +08:00
balibabu	b44bbd11b8	Feat: Upload document #3221 (#7209 ) ### What problem does this PR solve? Feat: Upload document #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-23 10:39:09 +08:00
writinwaters	1e91318445	Added a RAPTOR guide (#7211 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-04-22 20:56:30 +08:00
Yongteng Lei	f35ff65c36	[BREAKING CHANGE] GET to POST: enhance kb list capability (#7205 ) ### What problem does this PR solve? Enhance capability of `list_kbs`. Breaking change: change method from `GET` to `POST`. ### Type of change - [x] Refactoring - [x] Enhancement with breaking change	2025-04-22 17:54:12 +08:00
balibabu	ba0e363d5a	Feat: Show the owner of this knowledge base on the list card. #3221 (#7204 ) ### What problem does this PR solve? Feat: Show the owner of this knowledge base on the list card. #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-22 16:46:13 +08:00
balibabu	dde8c26feb	Feat: Even if the knowledge base has slices, the chunk method can be changed #7115 (#7201 ) ### What problem does this PR solve? Feat: Even if the knowledge base has slices, the chunk method can be changed #7115 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-22 16:04:49 +08:00
balibabu	64dd187498	Fix: Knowledge Graph Extraction Conflict Between Dataset-Level and File-Specific Configurations #7198 (#7199 ) ### What problem does this PR solve? Fix: Knowledge Graph Extraction Conflict Between Dataset-Level and File-Specific Configurations #7198 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-22 16:00:55 +08:00
Yongteng Lei	67dee2d74e	Fix: fix retrieval testing wrong pagination (#7174 ) ### What problem does this PR solve? Fix retrieval testing wrong pagination. #7171 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-22 15:16:04 +08:00
balibabu	bcac195a0c	Put the knowledge base list related hooks into use-knowledge-request.ts #3221 (#7197 ) ### What problem does this PR solve? Put the knowledge base list related hooks into use-knowledge-request.ts #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-22 15:01:35 +08:00
balibabu	8fca8faa7d	Feat: Move langfuse configuration to api page #6155 (#7196 ) ### What problem does this PR solve? Feat: Move langfuse configuration to api page #6155 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-22 14:08:20 +08:00
balibabu	1cc17eb611	Feat: Filter the knowledge base list using owner #3221 (#7191 ) ### What problem does this PR solve? Feat: Filter the knowledge base list using owner #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-22 13:44:41 +08:00
Wanderson Pinto dos Santos	c8194f5fd0	refactor: Update Redis configuration to use StatefulSet instead of deployment with pvc (#7187 ) ### What problem does this PR solve? This PR changes Redis to be a StatefulSet. In some situations, when the Redis pod gets rescheduled to another node, it gets stuck in the Pending state because the PVC is attached to a different Kubernetes node. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [X] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-22 12:53:30 +08:00
Kevin Hu	f2c9ffc056	Fix: KG search issue. (#7186 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-22 12:10:30 +08:00
zhudongwork	10432a1be7	Refa: Optimize pptx shape extraction to reduce content loss (#6703 ) ### What problem does this PR solve? When parsing pptx files, some shapes do not contain the `shape_type` attribute, which causes the original code to throw an exception during extraction, leading to failure in content extraction. This optimization introduces handling logic for such anomalous shapes, providing a safer and more robust processing mechanism. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [x] Performance Improvement - [ ] Other (please describe):	2025-04-22 10:16:24 +08:00
balibabu	e7f83b13ca	Feat: Rename a dataset #3221 (#7162 ) ### What problem does this PR solve? Feat: Rename a dataset #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-22 10:09:41 +08:00
Yongteng Lei	ad220a0a3c	Feat: add mcp self-host mode (#7157 ) ### What problem does this PR solve? Add mcp self-host mode, a complement of #7084. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-22 10:04:21 +08:00
Yongteng Lei	91c5a5c08f	Docs: add mcp self-host mode (#7163 ) ### What problem does this PR solve? Add mcp self-host mode documentation, a complement of #7141. ### Type of change - [x] Documentation Update --------- Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>	2025-04-22 10:03:38 +08:00
Song Fuchang	8362ab405c	Fix: don't modify S3 file name when not using prefix_path (#7152 ) ### What problem does this PR solve? Hello, I encountered a problem when trying to use a S3 backend (seaweedfs) for storage in RAGFlow: when calling `STORAGE_IMPL.get("bucket", "key")`, the actual request sent to S3 is `bucket/bucket/key`, causing a `NoSuchKey` error. I compared the code in `s3_conn.py` to `minio_conn.py` and `oss_conn.py`, then decided to remove the `else` branch in `use_prefix_path` method, and it works. I didn't configure `prefix_path` or `bucket` in `s3` section of the `service_conf.yaml`. I think this is a bug, but not sure. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-21 11:55:50 +08:00
Yongteng Lei	68b9dae6c0	Feat: mcp server (#7084 ) ### What problem does this PR solve? Add MCP support with a client example. Issue link: #4344 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-21 09:43:20 +08:00
Yongteng Lei	9b956ac1a9	Docs: MCP server (#7141 ) ### What problem does this PR solve? Documentation for MCP server ### Type of change - [x] Documentation Update --------- Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>	2025-04-21 09:42:32 +08:00
liuzhenghua	d4dbdfb61d	feat: Recover pending tasks while pod restart. (#7073 ) ### What problem does this PR solve? If you deploy Ragflow using Kubernetes, the hostname will change during a rolling update. This causes the consumer name of the task executor to change, making it impossible to schedule tasks that were previously in a pending state. To address this, I introduced a recovery task that scans these pending messages and re-publishes them, allowing the tasks to continue being processed. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>	2025-04-19 16:18:51 +08:00
Kevin Hu	487aed419e	Fix: cite disfunction for G component. (#7117 ) ### What problem does this PR solve? #7097 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-18 18:05:26 +08:00
aniaan	8b8a2f2949	fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106 ) ## Problem Description Multiple files in the RAGFlow project contain closure trap issues when using lambda functions with `trio.open_nursery()`. This problem causes concurrent tasks created in loops to reference the same variable, resulting in all tasks processing the same data (the data from the last iteration) rather than each task processing its corresponding data from the loop. ## Issue Details When using a `lambda` to create a closure function and passing it to `nursery.start_soon()` within a loop, the lambda function captures a reference to the loop variable rather than its value. For example: ```python # Problematic code async with trio.open_nursery() as nursery: for d in docs: nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn)) ``` In this pattern, when concurrent tasks begin execution, `d` has already become the value after the loop ends (typically the last element), causing all tasks to use the same data. ## Fix Solution Changed the way concurrent tasks are created with `nursery.start_soon()` by leveraging Trio's API design to directly pass the function and its arguments separately: ```python # Fixed code async with trio.open_nursery() as nursery: for d in docs: nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn) ``` This way, each task uses the parameter values at the time of the function call, rather than references captured through closures. ## Fixed Files Fixed closure traps in the following files: 1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword extraction, question generation, and tag processing 2. `rag/raptor.py`: 1 fix, involving document summarization 3. `graphrag/utils.py`: 2 fixes, involving graph node and edge processing 4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution and graph node merging 5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document processing 6. `graphrag/general/extractor.py`: 3 fixes, involving content processing and graph node/edge merging 7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving community report extraction ## Potential Impact This fix resolves a serious concurrency issue that could have caused: - Data processing errors (processing duplicate data) - Performance degradation (all tasks working on the same data) - Inconsistent results (some data not being processed) After the fix, all concurrent tasks should correctly process their respective data, improving system correctness and reliability.	2025-04-18 18:00:20 +08:00
balibabu	42e236f464	Feat: Rendering a search test list with real data #3221 (#7138 ) ### What problem does this PR solve? Feat: Rendering a search test list with real data #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-18 16:29:41 +08:00
QuintinTao	1b4016317e	fix bug chunking: expected string or bytes-like object (#7116 ) … bytes-like object ### What problem does this PR solve? fix bug #6990 internal server error while chunking: expected string or bytes-like object _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>	2025-04-18 14:42:36 +08:00
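
The closure trap described in commit 8b8a2f2949 (#7106) can be reproduced without Trio. The following is a minimal sketch, assuming plain deferred callables stand in for tasks passed to `nursery.start_soon()`; `process` and `docs` are hypothetical names, and `functools.partial` is used here only to show early binding, not as the fix the commit applied (the commit passes the function and its arguments to `start_soon` separately).

```python
from functools import partial


def process(doc):
    # Hypothetical stand-in for a per-document task such as keyword extraction.
    return f"processed {doc}"


docs = ["a", "b", "c"]

late_bound = []   # tasks built with a lambda that closes over the loop variable
early_bound = []  # tasks whose argument is bound when the task is created

for d in docs:
    # The lambda looks up `d` only when it is finally called.
    late_bound.append(lambda: process(d))
    # partial stores the current value of `d` immediately.
    early_bound.append(partial(process, d))

# The tasks run after the loop has finished, as scheduled tasks typically do.
late_results = [task() for task in late_bound]
early_results = [task() for task in early_bound]

print(late_results)   # every entry refers to the last document
print(early_results)  # each entry refers to its own document
```

Passing the arguments separately, as the fixed code in the commit does, has the same effect as the early-bound variant: the value is captured at scheduling time instead of at execution time.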


2850 Commits
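
The retrieval request documented in commit 0b460a9a12 (#6744) could be assembled roughly as follows. This is a sketch under stated assumptions: the method, path, headers, and body field names come from the commit message, while `build_retrieval_request`, the base URL `http://ragflow.example`, and the sample values are hypothetical placeholders. The request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_retrieval_request(base_url, api_key, question, dataset_ids,
                            highlight=False, **optional):
    """Build (but do not send) a POST request for /api/v1/retrieval.

    `optional` may carry other body fields listed in the commit message,
    such as page, page_size, similarity_threshold, or top_k.
    """
    body = {
        "question": question,
        "dataset_ids": dataset_ids,
        "highlight": highlight,
    }
    body.update(optional)
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/retrieval",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder host, key, and dataset ID; substitute real values before sending.
req = build_retrieval_request(
    "http://ragflow.example",
    "<YOUR_API_KEY>",
    "What is RAGFlow?",
    ["dataset-id-1"],
    highlight=True,
    page=1,
    page_size=10,
)
payload = json.loads(req.data)
```

To actually send it, the request object could be passed to `urllib.request.urlopen(req)` against a running server.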