2850 Commits

Author SHA1 Message Date
Stephen Hu
3da8776a3c
Fix: Creating Knowledge Base Support Enter Key (#7258)
### What problem does this PR solve?


[https://github.com/infiniflow/ragflow/issues/7180](https://github.com/infiniflow/ragflow/issues/7180)
When creating a knowledge base, support the enter key
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-25 18:53:52 +08:00
balibabu
3052006ba8
Feat: Save document metadata #3221 (#7323)
### What problem does this PR solve?

Feat: Save document metadata #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-25 18:38:15 +08:00
Stephen Hu
1662c7eda3
Feat: Markdown add image (#7124)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/6984

1. Markdown parser supports get pictures
2. For Native, when handling Markdown, it will handle images
3. improve merge and 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-25 18:35:28 +08:00
balibabu
fef44a71c5
Feat: Save the configuration information of the knowledge base document #3221 (#7317)
### What problem does this PR solve?

Feat: Save the configuration information of the knowledge base document
#3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-25 17:31:28 +08:00
Kevin Hu
b271cc34b3
Fix: LLM generated tag issue. (#7301)
### What problem does this PR solve?
#7298

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-25 14:38:34 +08:00
alulala
eead838353
Fix pymysql interface error (#7295)
### What problem does this PR solve?

According to the
[[Rucongzhang](https://github.com/Rucongzhang)](https://github.com/infiniflow/ragflow/pull/7057#issuecomment-2827410047)
I added DB reconnection strategy in function `update_by_id`
2025-04-25 13:29:47 +08:00
balibabu
02cc867c06
Feat: Display the document configuration dialog with shadcn #3221 (#7302)
### What problem does this PR solve?

Feat: Display the document configuration dialog with shadcn #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-25 11:15:44 +08:00
writinwaters
6e98cd311c
Doc: Updated sharing behavior in the open-source editions. (#7293)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-04-25 09:21:33 +08:00
Yongteng Lei
97a13ef1ab
Fix: Qwen-vl-plus url error (#7281)
### What problem does this PR solve?

Fix Qwen-vl-* url error. #7277

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-25 09:20:10 +08:00
Neal Davis
7e1464a950
Feat: Replace the logo of novita (#7287)
### What problem does this PR solve?

Replace the logo of novita

### Type of change

- [x] Other (please describe): Update logo
2025-04-24 21:20:36 +08:00
writinwaters
e6a4d6bcf0
DocsHow to disable user registration (#7265)
### What problem does this PR solve?



### Type of change


- [x] Documentation Update
2025-04-24 18:02:32 +08:00
pyyuhao
c8c3b756b0
Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140)
### What problem does this PR solve?

This PR adds the support for latest OpenSearch2.19.1 as the store engine
& search engine option for RAGFlow.

### Main Benefit

1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License] which is
much better than Elasticsearch
2. For search, OpenSearch2.19.1 supports full-text
search、vector_search、hybrid_search those are similar with Elasticsearch
on schema
3. For store, OpenSearch2.19.1 stores text、vector those are quite
simliar with Elasticsearch on schema

### Changes

- Support opensearch_python_connetor. I make a lot of adaptions since
the schema and api/method between ES and Opensearch differs in many
ways(especially the knn_search has a significant gap) :
rag/utils/opensearch_coon.py
- Support static config adaptions by changing:
conf/service_conf.yaml、api/settings.py、rag/settings.py
- Supprt some store&search schema changes between OpenSearch and ES:
conf/os_mapping.json
- Support OpenSearch python sdk : pyproject.toml
- Support docker config for OpenSearch2.19.1 :
docker/.env、docker/docker-compose-base.yml、docker/service_conf.yaml.template

### How to use
- I didn't change the priority that ES as the default doc/search engine.
Only if in docker/.env , we set DOC_ENGINE=${DOC_ENGINE:-opensearch}, it
will work.


### Others
Our team tested a lot of docs in our environment by using OpenSearch as
the vector database ,it works very well.
All the conifg for OpenSearch is necessary.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-04-24 16:03:31 +08:00
balibabu
9a8dda8fc7
Feat: Delete and rename files in the knowledge base #3221 (#7268)
### What problem does this PR solve?

Feat: Delete and rename files in the knowledge base #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-24 14:51:20 +08:00
balibabu
ff442c48b5
Feat: Display document parsing status #3221 (#7241)
### What problem does this PR solve?

Feat: Display document parsing status #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-24 11:45:37 +08:00
benni82
216cd7474b
fix: task_executor bug fix (#7253)
### What problem does this PR solve?

The lock is not released correctly when task_exectuor is abnormal

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-24 11:44:34 +08:00
WhiteBear
2c62652ea8
<think> tag is missing. (#7256)
### What problem does this PR solve?

Some models force thinking, resulting in the absence of the think tag in
the returned content

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-24 11:44:10 +08:00
Wanderson Pinto dos Santos
4e8fd73a20
chore: adds pre-commit (#7242)
### What problem does this PR solve?

Sometimes after we commit the code and open the PR the CI pipeline fails
in Ruff checks. Including a pre-commit we can identify this problem
early and avoid time loss.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [X] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-24 11:25:33 +08:00
Pierre Lafievre
19931cd9ed
Fix: mcp server start (#7251)
### What problem does this PR solve?

Fix the entrypoint file from the docker container to solve #7249 

Here is the important part from the logs:
```
docker logs -f ragflow-server
...
usage: server.py [-h] [--base_url BASE_URL] [--host HOST] [--port PORT]
[--mode MODE] [--api_key API_KEY]
server.py: error: unrecognized arguments:
...
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-24 10:52:12 +08:00
Stephen Hu
0b460a9a12
Fix: improve retrieval API (#6744)
### What problem does this PR solve?

Get the highlight parameter from the request to keep consistency with
the document

> 
- Method: POST
- URL: `/api/v1/retrieval`
- Headers:
  - `'content-Type: application/json'`
  - `'Authorization: Bearer <YOUR_API_KEY>'`
- Body:
  - `"question"`: `string`  
  - `"dataset_ids"`: `list[string]`  
  - `"document_ids"`: `list[string]`
  - `"page"`: `integer`  
  - `"page_size"`: `integer`  
  - `"similarity_threshold"`: `float`  
  - `"vector_similarity_weight"`: `float`  
  - `"top_k"`: `integer`  
  - `"rerank_id"`: `string`  
  - `"keyword"`: `boolean`  
  - `"highlight"`: `boolean`
>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-24 09:29:00 +08:00
yxkong
4e31eea55f
Fix/mcp doc (#7239)
### What problem does this PR solve?

This PR fixes an issue with the MCP server configuration in RAGFlow's
Docker deployment where:
1. Incorrect parameter naming (`--mcp--host-api-key` with double
hyphens) caused server startup failures
2. Port binding conflicts occurred due to unexposed MCP ports in Docker
3. Inconsistent host addressing between `0.0.0.0` and `127.0.0.1` led to
connectivity issues

The changes ensure proper MCP server initialization and reliable
inter-service communication.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Key Changes

1. **Parameter Correction**:
   - Fixed `--mcp--host-api-key` → `--mcp-host-api-key`
2025-04-24 09:20:26 +08:00
balibabu
1366712560
Feat: Deleting files in batches. #3221 (#7234)
### What problem does this PR solve?
Feat: Deleting files in batches. #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-23 19:03:02 +08:00
balibabu
51d9bde5a3
Feat: Create a folder #3221 (#7228)
### What problem does this PR solve?

Feat: Create a folder #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-23 15:21:09 +08:00
Yongteng Lei
94181a990b
Refa: knowledge_graph chunk method is deprecated (#7220)
### What problem does this PR solve?

The knowledge_graph chunk method is deprecated and should no longer be
used. #7184.

### Type of change

- [x] Refactoring
v0.18.0
2025-04-23 13:01:46 +08:00
liu an
03672df691
Docs: update for v0.18.0 (#7223)
### What problem does this PR solve?

update for v0.18.0

### Type of change

- [x] Documentation Update
2025-04-23 12:02:50 +08:00
writinwaters
e9669e7fb1
Updated v0.18.0 release notes (#7221)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-04-23 11:12:14 +08:00
writinwaters
9a1ac8020d
v0.18.0 release notes (#7185)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2025-04-23 10:41:58 +08:00
balibabu
b44bbd11b8
Feat: Upload document #3221 (#7209)
### What problem does this PR solve?

Feat: Upload document #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-23 10:39:09 +08:00
writinwaters
1e91318445
Added a RAPTOR guide (#7211)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-04-22 20:56:30 +08:00
Yongteng Lei
f35ff65c36
[BREAKING CHANGE] GET to POST: enhance kb list capability (#7205)
### What problem does this PR solve?

Enhance capability of `list_kbs`.

Breaking change: change method from `GET` to `POST`.

### Type of change

- [x] Refactoring
- [x] Enhancement with breaking change
2025-04-22 17:54:12 +08:00
balibabu
ba0e363d5a
Feat: Show the owner of this knowledge base on the list card. #3221 (#7204)
### What problem does this PR solve?

Feat: Show the owner of this knowledge base on the list card. #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 16:46:13 +08:00
balibabu
dde8c26feb
Feat: Even if the knowledge base has slices, the chunk method can be changed #7115 (#7201)
### What problem does this PR solve?

Feat: Even if the knowledge base has slices, the chunk method can be
changed #7115

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 16:04:49 +08:00
balibabu
64dd187498
Fix: Knowledge Graph Extraction Conflict Between Dataset-Level and File-Specific Configurations #7198 (#7199)
### What problem does this PR solve?

Fix: Knowledge Graph Extraction Conflict Between Dataset-Level and
File-Specific Configurations #7198

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-22 16:00:55 +08:00
Yongteng Lei
67dee2d74e
Fix: fix retrieval tesing wrong pagination (#7174)
### What problem does this PR solve?

Fix retrieval testing wrong pagination. #7171 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-22 15:16:04 +08:00
balibabu
bcac195a0c
Put the knowledge base list related hooks into use-knowledge-request.ts #3221 (#7197)
### What problem does this PR solve?

Put the knowledge base list related hooks into use-knowledge-request.ts
#3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 15:01:35 +08:00
balibabu
8fca8faa7d
Feat: Move langfuse configuration to api page #6155 (#7196)
### What problem does this PR solve?

Feat: Move langfuse configuration to api page #6155

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 14:08:20 +08:00
balibabu
1cc17eb611
Feat: Filter the knowledge base list using owner #3221 (#7191)
### What problem does this PR solve?

Feat: Filter the knowledge base list using owner #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 13:44:41 +08:00
Wanderson Pinto dos Santos
c8194f5fd0
refactor: Update Redis configuration to use StatefulSet instead of deployment with pvc (#7187)
### What problem does this PR solve?

This PR changes Redis to be a statefulset. In some situation when we
Redis pod gets rescheduled to another Node, it gets stuck in pending
state due to the PVC attached to another Kubernetes node.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [X] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-22 12:53:30 +08:00
Kevin Hu
f2c9ffc056
Fix: KG search issue. (#7186)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-22 12:10:30 +08:00
zhudongwork
10432a1be7
Refa: Optimize pptx shape extraction to reduce content loss (#6703)
### What problem does this PR solve?

When parsing pptx files, some shapes do not contain the `shape_type`
attribute, which causes the original code to throw an exception during
extraction, leading to failure in content extraction. This optimization
introduces handling logic for such anomalous shapes, providing a safer
and more robust processing mechanism.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
2025-04-22 10:16:24 +08:00
balibabu
e7f83b13ca
Feat: Rename a dataset #3221 (#7162)
### What problem does this PR solve?

Feat: Rename a dataset #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 10:09:41 +08:00
Yongteng Lei
ad220a0a3c
Feat: add mcp self-host mode (#7157)
### What problem does this PR solve?

Add mcp self-host mode, a complement of #7084.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 10:04:21 +08:00
Yongteng Lei
91c5a5c08f
Docs: add mcp self-host mode (#7163)
### What problem does this PR solve?

Add mcp self-host mode documentation, a complement of #7141.

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-22 10:03:38 +08:00
Song Fuchang
8362ab405c
Fix: don't modify S3 file name when not using prefix_path (#7152)
### What problem does this PR solve?

Hello, I encountered a problem when trying to use a S3 backend
(seaweedfs) for storage in RAGFlow: when calling
`STORAGE_IMPL.get("bucket", "key")`, the actual request sent to S3 is
`bucket/bucket/key`, causing a `NoSuchKey` error.

I compared the code in `s3_conn.py` to `minio_conn.py` and
`oss_conn.py`, then decided to remove the `else` branch in
`use_prefix_path` method, and it works. I didn't configure `prefix_path`
or `bucket` in `s3` section of the `service_conf.yaml`.

I think this is a bug, but not sure.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-21 11:55:50 +08:00
Yongteng Lei
68b9dae6c0
Feat: mcp server (#7084)
### What problem does this PR solve?

Add MCP support with a client example.

Issue link: #4344

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-21 09:43:20 +08:00
Yongteng Lei
9b956ac1a9
Docs: MCP server (#7141)
### What problem does this PR solve?

Documentation for MCP server

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-21 09:42:32 +08:00
liuzhenghua
d4dbdfb61d
feat: Recover pending tasks while pod restart. (#7073)
### What problem does this PR solve?

If you deploy Ragflow using Kubernetes, the hostname will change during
a rolling update. This causes the consumer name of the task executor to
change, making it impossible to schedule tasks that were previously in a
pending state.
To address this, I introduced a recovery task that scans these pending
messages and re-publishes them, allowing the tasks to continue being
processed.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
2025-04-19 16:18:51 +08:00
Kevin Hu
487aed419e
Fix: cite disfunction for G component. (#7117)
### What problem does this PR solve?

#7097

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-18 18:05:26 +08:00
aniaan
8b8a2f2949
fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106)
## Problem Description
Multiple files in the RAGFlow project contain closure trap issues when
using lambda functions with `trio.open_nursery()`. This problem causes
concurrent tasks created in loops to reference the same variable,
resulting in all tasks processing the same data (the data from the last
iteration) rather than each task processing its corresponding data from
the loop.

## Issue Details
When using a `lambda` to create a closure function and passing it to
`nursery.start_soon()` within a loop, the lambda function captures a
reference to the loop variable rather than its value. For example:

```python
# Problematic code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```

In this pattern, when concurrent tasks begin execution, `d` has already
become the value after the loop ends (typically the last element),
causing all tasks to use the same data.

## Fix Solution
Changed the way concurrent tasks are created with `nursery.start_soon()`
by leveraging Trio's API design to directly pass the function and its
arguments separately:

```python
# Fixed code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```

This way, each task uses the parameter values at the time of the
function call, rather than references captured through closures.

## Fixed Files
Fixed closure traps in the following files:

1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword
extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge
processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution
and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document
processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content
processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving
community report extraction

## Potential Impact
This fix resolves a serious concurrency issue that could have caused:
- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)

After the fix, all concurrent tasks should correctly process their
respective data, improving system correctness and reliability.
2025-04-18 18:00:20 +08:00
balibabu
42e236f464
Feat: Rendering a search test list with real data #3221 (#7138)
### What problem does this PR solve?

Feat: Rendering a search test list with real data #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-18 16:29:41 +08:00
QuintinTao
1b4016317e
fix bug chunking:expected string or bytes-like object (#7116)
… bytes-like object

### What problem does this PR solve?
fix bug #6990 internal server error ehile chunking:expected string or
bytes-like object
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>
2025-04-18 14:42:36 +08:00