2827 Commits

Author SHA1 Message Date
liu an
03672df691
Docs: update for v0.18.0 (#7223)
### What problem does this PR solve?

Update documentation for v0.18.0.

### Type of change

- [x] Documentation Update
2025-04-23 12:02:50 +08:00
writinwaters
e9669e7fb1
Updated v0.18.0 release notes (#7221)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-04-23 11:12:14 +08:00
writinwaters
9a1ac8020d
v0.18.0 release notes (#7185)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2025-04-23 10:41:58 +08:00
balibabu
b44bbd11b8
Feat: Upload document #3221 (#7209)
### What problem does this PR solve?

Feat: Upload document #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-23 10:39:09 +08:00
writinwaters
1e91318445
Added a RAPTOR guide (#7211)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-04-22 20:56:30 +08:00
Yongteng Lei
f35ff65c36
[BREAKING CHANGE] GET to POST: enhance kb list capability (#7205)
### What problem does this PR solve?

Enhance capability of `list_kbs`.

Breaking change: change method from `GET` to `POST`.

### Type of change

- [x] Refactoring
- [x] Enhancement with breaking change
2025-04-22 17:54:12 +08:00
balibabu
ba0e363d5a
Feat: Show the owner of this knowledge base on the list card. #3221 (#7204)
### What problem does this PR solve?

Feat: Show the owner of this knowledge base on the list card. #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 16:46:13 +08:00
balibabu
dde8c26feb
Feat: Even if the knowledge base has slices, the chunk method can be changed #7115 (#7201)
### What problem does this PR solve?

Feat: Even if the knowledge base has slices, the chunk method can be
changed #7115

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 16:04:49 +08:00
balibabu
64dd187498
Fix: Knowledge Graph Extraction Conflict Between Dataset-Level and File-Specific Configurations #7198 (#7199)
### What problem does this PR solve?

Fix: Knowledge Graph Extraction Conflict Between Dataset-Level and
File-Specific Configurations #7198

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-22 16:00:55 +08:00
Yongteng Lei
67dee2d74e
Fix: fix retrieval testing wrong pagination (#7174)
### What problem does this PR solve?

Fix wrong pagination in retrieval testing. #7171

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-22 15:16:04 +08:00
balibabu
bcac195a0c
Put the knowledge base list related hooks into use-knowledge-request.ts #3221 (#7197)
### What problem does this PR solve?

Put the knowledge base list related hooks into use-knowledge-request.ts
#3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 15:01:35 +08:00
balibabu
8fca8faa7d
Feat: Move langfuse configuration to api page #6155 (#7196)
### What problem does this PR solve?

Feat: Move langfuse configuration to api page #6155

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 14:08:20 +08:00
balibabu
1cc17eb611
Feat: Filter the knowledge base list using owner #3221 (#7191)
### What problem does this PR solve?

Feat: Filter the knowledge base list using owner #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 13:44:41 +08:00
Wanderson Pinto dos Santos
c8194f5fd0
refactor: Update Redis configuration to use StatefulSet instead of deployment with pvc (#7187)
### What problem does this PR solve?

This PR changes Redis to a StatefulSet. In some situations, when the
Redis pod is rescheduled to another node, it gets stuck in the Pending
state because its PVC is still attached to a different Kubernetes node.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [X] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-22 12:53:30 +08:00
Kevin Hu
f2c9ffc056
Fix: KG search issue. (#7186)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-22 12:10:30 +08:00
zhudongwork
10432a1be7
Refa: Optimize pptx shape extraction to reduce content loss (#6703)
### What problem does this PR solve?

When parsing pptx files, some shapes do not contain the `shape_type`
attribute, which causes the original code to throw an exception during
extraction, leading to failure in content extraction. This optimization
introduces handling logic for such anomalous shapes, providing a safer
and more robust processing mechanism.
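
The handling described above can be sketched as follows. This is a minimal illustration, not the PR's code: `safe_shape_type`, `extract_texts`, and the stand-in shape classes are hypothetical, and the set of exceptions a real shape object raises is an assumption.

```python
def safe_shape_type(shape, default=None):
    """Return shape.shape_type, or `default` when it is missing or unsupported."""
    try:
        return shape.shape_type
    except (AttributeError, NotImplementedError):
        # Assumption: anomalous shapes fail in one of these two ways.
        # Treat them as "unknown" instead of aborting the whole extraction.
        return default


class NormalShape:  # hypothetical stand-in for a regular shape
    shape_type = 17
    text = "hello"


class OddShape:  # hypothetical shape whose shape_type access fails
    text = "still has text"

    @property
    def shape_type(self):
        raise NotImplementedError("no shape_type for this element")


class BareShape:  # hypothetical shape with no shape_type attribute at all
    text = "no attribute at all"


def extract_texts(shapes):
    """Collect (shape_type, text) pairs without failing on anomalous shapes."""
    return [(safe_shape_type(s), s.text) for s in shapes]


results = extract_texts([NormalShape(), OddShape(), BareShape()])
```

With this approach the text of an anomalous shape is still collected, only its type is reported as unknown.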

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
2025-04-22 10:16:24 +08:00
balibabu
e7f83b13ca
Feat: Rename a dataset #3221 (#7162)
### What problem does this PR solve?

Feat: Rename a dataset #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 10:09:41 +08:00
Yongteng Lei
ad220a0a3c
Feat: add mcp self-host mode (#7157)
### What problem does this PR solve?

Add mcp self-host mode, a complement of #7084.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 10:04:21 +08:00
Yongteng Lei
91c5a5c08f
Docs: add mcp self-host mode (#7163)
### What problem does this PR solve?

Add mcp self-host mode documentation, a complement of #7141.

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-22 10:03:38 +08:00
Song Fuchang
8362ab405c
Fix: don't modify S3 file name when not using prefix_path (#7152)
### What problem does this PR solve?

Hello, I encountered a problem when trying to use an S3 backend
(seaweedfs) for storage in RAGFlow: when calling
`STORAGE_IMPL.get("bucket", "key")`, the actual request sent to S3 is
`bucket/bucket/key`, causing a `NoSuchKey` error.

I compared the code in `s3_conn.py` with `minio_conn.py` and
`oss_conn.py`, then removed the `else` branch in the
`use_prefix_path` method, and it works. I didn't configure `prefix_path`
or `bucket` in the `s3` section of `service_conf.yaml`.

I think this is a bug, but not sure.
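
To illustrate the intended behavior, here is a hypothetical decorator sketch. It is not the code in `s3_conn.py`: the `FakeS3` class, the decorator body, and the key layout used when a prefix is configured are all assumptions made for illustration.

```python
import functools


def use_prefix_path(method):
    """Hypothetical decorator: rewrite the key only when a prefix path is set."""
    @functools.wraps(method)
    def wrapper(self, bucket, key, *args, **kwargs):
        if self.prefix_path:
            # Assumed layout: objects live under "<prefix_path>/<key>".
            key = f"{self.prefix_path}/{key}"
        # No else branch: without a prefix, bucket and key pass through unchanged.
        return method(self, bucket, key, *args, **kwargs)
    return wrapper


class FakeS3:
    """Stand-in storage client that just reports the path it would request."""

    def __init__(self, prefix_path=None):
        self.prefix_path = prefix_path

    @use_prefix_path
    def get(self, bucket, key):
        return f"{bucket}/{key}"


plain = FakeS3().get("bucket", "key")
prefixed = FakeS3(prefix_path="tenant-a").get("bucket", "key")
```

The point of the sketch is the unprefixed case: the request stays `bucket/key` rather than becoming `bucket/bucket/key`.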

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-21 11:55:50 +08:00
Yongteng Lei
68b9dae6c0
Feat: mcp server (#7084)
### What problem does this PR solve?

Add MCP support with a client example.

Issue link: #4344

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-21 09:43:20 +08:00
Yongteng Lei
9b956ac1a9
Docs: MCP server (#7141)
### What problem does this PR solve?

Documentation for MCP server

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-21 09:42:32 +08:00
liuzhenghua
d4dbdfb61d
feat: Recover pending tasks while pod restart. (#7073)
### What problem does this PR solve?

If you deploy RAGFlow using Kubernetes, the hostname will change during
a rolling update. This causes the consumer name of the task executor to
change, making it impossible to schedule tasks that were previously in a
pending state.
To address this, I introduced a recovery task that scans these pending
messages and re-publishes them, allowing the tasks to continue being
processed.
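
The recovery idea can be sketched with an in-memory model. This is only an illustration under stated assumptions: the `pending` mapping, the `live_consumers` set, and the rule "re-publish anything owned by a consumer that no longer exists" are hypothetical and may differ from how the PR selects messages.

```python
from collections import deque


def recover_pending(pending, live_consumers, queue):
    """Re-publish messages whose owning consumer no longer exists.

    pending: dict mapping message_id -> (consumer_name, payload)
    live_consumers: set of consumer names that are currently running
    queue: deque that live consumers read new work from
    """
    recovered = []
    for msg_id, (consumer, payload) in list(pending.items()):
        if consumer not in live_consumers:
            queue.append(payload)  # re-publish so any live consumer can take it
            del pending[msg_id]    # drop the orphaned pending entry
            recovered.append(msg_id)
    return recovered


# The old pod's hostname-based consumer name disappeared after a rolling update.
pending = {"1": ("pod-old", "task-a"), "2": ("pod-new", "task-b")}
queue = deque()
recovered = recover_pending(pending, {"pod-new"}, queue)
```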

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
2025-04-19 16:18:51 +08:00
Kevin Hu
487aed419e
Fix: cite dysfunction for G component. (#7117)
### What problem does this PR solve?

#7097

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-18 18:05:26 +08:00
aniaan
8b8a2f2949
fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106)
## Problem Description
Multiple files in the RAGFlow project contain closure trap issues when
using lambda functions with `trio.open_nursery()`. This problem causes
concurrent tasks created in loops to reference the same variable,
resulting in all tasks processing the same data (the data from the last
iteration) rather than each task processing its corresponding data from
the loop.

## Issue Details
When using a `lambda` to create a closure function and passing it to
`nursery.start_soon()` within a loop, the lambda function captures a
reference to the loop variable rather than its value. For example:

```python
# Problematic code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```

In this pattern, when concurrent tasks begin execution, `d` has already
become the value after the loop ends (typically the last element),
causing all tasks to use the same data.

## Fix Solution
Changed the way concurrent tasks are created with `nursery.start_soon()`
by leveraging Trio's API design to directly pass the function and its
arguments separately:

```python
# Fixed code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```

This way, each task uses the parameter values at the time of the
function call, rather than references captured through closures.
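
The same late-binding behavior can be reproduced in plain Python without Trio. In this minimal sketch, task scheduling is simulated by deferring every call until after the loop has finished; `process` and the two lists are hypothetical names used only for illustration.

```python
docs = ["a", "b", "c"]


def process(prefix, doc):
    return f"{prefix}:{doc}"


# Closure trap: each lambda looks up `d` when it is finally called,
# which happens after the loop has ended.
deferred_buggy = []
for d in docs:
    deferred_buggy.append(lambda: process("kw", d))

# Fix: keep the function and its arguments separately, so each entry
# holds the value of `d` from its own iteration.
deferred_fixed = []
for d in docs:
    deferred_fixed.append((process, "kw", d))

buggy_results = [call() for call in deferred_buggy]
fixed_results = [fn(*args) for fn, *args in deferred_fixed]
```

Here `buggy_results` contains the last document three times, while `fixed_results` contains one entry per document.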

## Fixed Files
Fixed closure traps in the following files:

1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword
extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge
processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution
and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document
processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content
processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving
community report extraction

## Potential Impact
This fix resolves a serious concurrency issue that could have caused:
- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)

After the fix, all concurrent tasks should correctly process their
respective data, improving system correctness and reliability.
2025-04-18 18:00:20 +08:00
balibabu
42e236f464
Feat: Rendering a search test list with real data #3221 (#7138)
### What problem does this PR solve?

Feat: Rendering a search test list with real data #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-18 16:29:41 +08:00
QuintinTao
1b4016317e
fix bug chunking:expected string or bytes-like object (#7116)
### What problem does this PR solve?
Fix bug #6990: internal server error while chunking: expected string or
bytes-like object.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>
2025-04-18 14:42:36 +08:00
Stephen Hu
b1798bafb0
Fix: handle sometimes graph index will miss explanation (#7127)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7053

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-18 14:24:36 +08:00
balibabu
86f76df586
Feat: Retrieval test #3221 (#7121)
### What problem does this PR solve?

Feat: Retrieval test #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-17 19:03:55 +08:00
Stephen Hu
db82c15de4
Fix: wrong “available” property when list chunk (#7093)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7083

Internally, fields returned from ES are converted to str, so the
bool conversion does not work as expected.
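
The pitfall is that any non-empty string is truthy in Python, so `bool("0")` is `True`. A minimal sketch of a safer conversion follows; `to_bool` and the accepted string values are assumptions, not the PR's code.

```python
def to_bool(value):
    """Interpret a flag that may arrive as an int, a bool, or a str such as "0" or "1"."""
    if isinstance(value, str):
        # Assumption: only "1" and "true" (case-insensitive) mean enabled.
        return value.strip().lower() in {"1", "true"}
    return bool(value)


naive = bool("0")        # non-empty string, so this is truthy
converted = to_bool("0")
```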

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-17 17:17:35 +08:00
BUJIQI
627fd002ae
Update utils.py (#7091)
### What problem does this PR solve?

When there are multiple entities, the variable `v` may be a list, which
leads to this error:
```
| File "/mnt/d/wrf/ragflow/ragflow/graphrag/utils.py", line 59, in replace_all
| result = result.replace(f"{{{k}}}", v)
| TypeError: replace() argument 2 must be str, not list
```
This PR converts `v` to a str.
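
A minimal sketch of such a coercion is shown below. The loop mirrors the `result.replace(f"{{{k}}}", v)` line from the traceback, but how the PR actually turns a list into a str is not stated, so joining with `", "` is an assumption.

```python
def replace_all(template, values):
    """Fill "{key}" placeholders, coercing non-string values to str first."""
    result = template
    for k, v in values.items():
        if isinstance(v, (list, tuple)):
            # Assumed formatting: join multiple entities with ", ".
            v = ", ".join(map(str, v))
        elif not isinstance(v, str):
            v = str(v)
        result = result.replace(f"{{{k}}}", v)
    return result


filled = replace_all("Entities: {names}; n={n}", {"names": ["A", "B"], "n": 2})
```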

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-17 17:17:09 +08:00
alkscr
9e7d052c8d
Fix: knowledge graph resolution with infinity raise error tokenizing in specific situations (#7048)
### What problem does this PR solve?

When running graph resolution with Infinity, if a single quotation mark
appears in the name of an entity to be deleted, sqlglot may raise a
tokenizing error after Infinity is called.

For example:
```
INFINITY delete table ragflow_xxx, filter knowledge_graph_kwd IN ('entity') AND entity_kwd IN ('86 IMAGES FROM PREVIOUS CONTESTS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION')
```
may raise the error
```
Error tokenizing 'TS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION''
```
and cause document parsing to fail.

Replacing each single quotation mark with two single quotation marks
lets sqlglot tokenize the entity name correctly.
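
The escaping rule can be sketched as follows. The helper names `quote_sql_literal` and `in_clause` are hypothetical and are not part of RAGFlow or Infinity.

```python
def quote_sql_literal(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def in_clause(column, values):
    """Build a `column IN (...)` filter from string values."""
    quoted = ", ".join(quote_sql_literal(v) for v in values)
    return f"{column} IN ({quoted})"


clause = in_clause("entity_kwd", ["ADAM OPTIMIZATION", "BACKGROUND'ESTIMATION"])
```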

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-17 16:15:21 +08:00
long Cai
d9927f5185
Fix: Error in sending placeholder words in Chinese and Chinese-Traditional (#7094)
### What problem does this PR solve?

The assistant message placeholder was incorrect. I have corrected it for
both Chinese and Traditional Chinese.

### Type of change


- [x] Bug Fix
2025-04-17 15:52:03 +08:00
alulala
5d253e0a34
Fix: pymysql.err.InterfaceError: (0, '') during long time streaming chat responses (#6548) (#7057)
### Related Issue:
https://github.com/infiniflow/ragflow/issues/6548

### Related PR:
https://github.com/infiniflow/ragflow/pull/6861


### Environment:
Commit version:
[[48730e0](48730e00a8)]

### Bug Description:
An unexpected `pymysql.err.InterfaceError: (0, '')` occurs when using Peewee +
PyMySQL + PooledMySQLDatabase after a long-running `chat_streamly`
operation.

This is a common issue with Peewee + PyMySQL + connection pooling: you
end up using a connection that was silently closed by the server, but
Peewee doesn't realize it's dead.

**I found that the error only occurs during longer streaming outputs**
and is unrelated to the database connection context, so it's likely
because:

- The prolonged streaming response caused the database connection to
time out

- The original database connection might have been disconnected by the
server during the streaming process

### Why This Happens
This error happens even when using `@DB.connection_context()` after the
stream is done. After investigation, I found this is caused by pooled
MySQL connections that appear to be open but are actually dead (expired
due to `wait_timeout`).

1. `@DB.connection_context()` (as a decorator or context manager) pulls
a connection from the pool.

2. If this connection was idle and expired on the MySQL server (e.g.,
due to `wait_timeout`), but not closed in Python, it will still be
considered “open” (`DB.is_closed() == False`).

3. The real error occurs only when a SQL command (such as
`.get_or_none()`) is executed and PyMySQL tries to send it to the server
via a broken socket.


### Changes Made:

1. I implemented manual connection checks before executing SQL:
```python
try:
    # Cheap round trip to verify the pooled connection is still alive.
    DB.execute_sql("SELECT 1")
except Exception:
    print("Connection dead, reconnecting...")
    DB.close()
    DB.connect()
```
2. Delayed the token count update until after the streaming response is
completed to ensure the streaming output isn't interrupted by database
operations.
```
        total_tokens = 0 
        for txt in chat_streamly(system, history, gen_conf):
            if isinstance(txt, int):
                total_tokens = txt
......
                break
......
        if total_tokens > 0:
            if not TenantLLMService.increase_usage(self.tenant_id, self.llm_type, txt, self.llm_name):
                logging.error("LLMBundle.chat_streamly can't update token usage for {}/CHAT llm_name: {}, content: {}".format(self.tenant_id, self.llm_name, txt))
```
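
The deferral in step 2 can be sketched end to end with stand-ins. This is an illustration under stated assumptions, not the PR's implementation: `fake_stream` and `stream_then_record` are hypothetical, and the only detail taken from the snippet above is that the stream ends with an int carrying the token count.

```python
def fake_stream():
    """Stand-in for a streaming chat call: text chunks, then the total token count."""
    yield "Hello"
    yield ", world"
    yield 7  # the final int item is the token count


def stream_then_record(stream, record_usage):
    """Yield text chunks as they arrive and record token usage once, afterwards."""
    total_tokens = 0
    for item in stream:
        if isinstance(item, int):
            total_tokens = item
            break
        yield item  # pass text through without touching the database
    if total_tokens > 0:
        record_usage(total_tokens)  # single bookkeeping call after streaming


usage_log = []
chunks = list(stream_then_record(fake_stream(), usage_log.append))
```

Because the usage update runs only after the last chunk has been yielded, a slow or failing database call cannot interrupt the streamed output.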
2025-04-16 19:15:35 +08:00
balibabu
de5727f90a
Fix: Files being parsed are not allowed to be deleted in batches #7065 (#7066)
### What problem does this PR solve?

Fix: Files being parsed are not allowed to be deleted in batches #7065

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-16 16:46:24 +08:00
writinwaters
9c2dd70839
Miscellaneous editorial updates. (#7047)
### What problem does this PR solve?

#6910 

### Type of change

- [x] Documentation Update
2025-04-16 10:31:10 +08:00
liu an
e0e78112a2
Docs: Change DELETE to POST in Related Questions curl example (#7054)
### What problem does this PR solve?

docs(api): Fix request method in Related Questions example (DELETE→POST)

### Type of change

- [x] Documentation Update
2025-04-16 10:29:59 +08:00
Kevin Hu
48730e00a8
Docs: updates. (#7042)
### What problem does this PR solve?

#7019

### Type of change

- [x] Documentation Update
2025-04-15 17:45:52 +08:00
liu an
e5f9d148e7
Test: Added test cases for Delete Sessions With Chat Assistant HTTP API (#7025)
### What problem does this PR solve?

Cover the [Delete chat assistant's sessions](https://ragflow.io/docs/dev/http_api_reference#delete-chat-assistants-sessions)
endpoint.

### Type of change

- [x] Add test cases
2025-04-15 14:54:26 +08:00
Stephen Hu
f6b280e372
Fix: when remove document do not delete the file in storage if the source is not knowledge base (#7005)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/6905

When deleting a document, check its source before removing the file from storage.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-15 12:11:41 +08:00
Kevin Hu
5af2d57086
Refa. (#7022)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-04-15 10:20:33 +08:00
Yongteng Lei
7a34159737
Fix: add fallback for bad citation output (#7014)
### What problem does this PR solve?

Add fallback for bad citation output. #6948

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-15 09:33:53 +08:00
Franck Kerbiriou
b1fa5a0754
Fix Helm Ingress template (#7018)
### What problem does this PR solve?

Fix the Helm Ingress template, which tried to access a global variable
within a loop.
Fixes #6191

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-15 09:19:37 +08:00
Yongteng Lei
018ff4dd0a
Refa: update llms (#7007)
### What problem does this PR solve?

Update LLM models

### Type of change

- [x] Refactoring
2025-04-15 09:19:07 +08:00
balibabu
ed352710ec
Feat: Remove the rotation state of the button that parses the document #7008 (#7009)
### What problem does this PR solve?

Feat: Remove the rotation state of the button that parses the document
#7008
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-14 18:50:11 +08:00
Kevin Hu
0a0c1edce3
Docs: readme updating. (#7002)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-14 14:45:37 +08:00
balibabu
18eb76f6b8
Fix: The selected state of the TreeView node cannot be seen on Mac #7000 (#7001)
### What problem does this PR solve?

Fix: The selected state of the TreeView node cannot be seen on Mac #7000

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 14:23:26 +08:00
Kevin Hu
ed5f81b02e
Fix: abnormal cell merging. (#6991)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 11:00:11 +08:00
Zhichang Yu
23c5ce48d1
Fix update_progress issue (#6992)
### What problem does this PR solve?

Fix update_progress issue introduced by #6975 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 10:23:13 +08:00
balibabu
de766ba628
Fix: Fix api page translation issue. #3221 (#6993)
### What problem does this PR solve?

Fix: Fix api page translation issue. #3221

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 10:23:00 +08:00