371 Commits

Author SHA1 Message Date
Jin Hai
31f4d44c73
Update upload filename length limit from 128 to 256, which is aligned with os (#7971)
### What problem does this PR solve?

Change filename length limit from 128 to 256

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-05-30 14:25:59 +08:00
Qidi Cao
f0879563d0
fix: resolve residual image files issue after document deletion (#7964)
### What problem does this PR solve?

When deleting knowledge base documents in RAGFlow, the current process
only removes the block texts in Elasticsearch and the original files in
MinIO, but it leaves behind many binary images and thumbnails generated
during chunking. This pull request improves the deletion process by
querying the block information in Elasticsearch to ensure a more
thorough and complete cleanup.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 12:56:33 +08:00
Stephen Hu
a31ad7f960
Fix: File selection in Retrieval testing causes other options to disappear (#7759)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7753

The internal is due to when the selected row keys change will trigger a
testing, but I do not know why.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 09:38:50 +08:00
Yongteng Lei
0c562f0a9f
Refa: change citation mark as [ID:n] (#7923)
### What problem does this PR solve?

Change citation mark as [ID:n], it's easier for LLMs to follow the
instruction :) #7904

### Type of change

- [x] Refactoring
2025-05-29 10:03:51 +08:00
sinopec
243ed4bc35
Feat: Surpport dynamically add knowledge basees for retrieval while u… (#7915)
…sing the SDK chat API

### What problem does this PR solve?

When using the SDK for chat, you can include the IDs of additional
knowledge bases you want to use in the request. This way, you don’t need
to repeatedly create new assistants to support various combinations of
knowledge bases. This is especially useful when there are many knowledge
bases with different content. If users clearly know which knowledge base
contains the information they need and select accordingly, the recall
accuracy will be greatly improved.

Users only need to add an extra field, a kb_ids array, in the HTTP
request. The content of this field can be determined by the client
fetching the list of knowledge bases and letting the user select from
it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Li Ye <liye@unittec.com>
2025-05-28 19:16:16 +08:00
liu an
ff0e82988f
Fix: patch regex vulnerability in filename handling (#7887)
### What problem does this PR solve?

[Regular Expression Injection leading to Denial of Service
(ReDoS)](https://github.com/infiniflow/ragflow/security/advisories/GHSA-wqq6-x8g9-f7mh)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-27 16:35:37 +08:00
Yongteng Lei
453287b06b Feat: more robust fallbacks for citations (#7801)
### What problem does this PR solve?

Add more robust fallbacks for citations

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-05-23 18:24:55 +08:00
Yongteng Lei
42f4d4dbc8 Fix: wrong type hint (#7738)
### What problem does this PR solve?

Wrong hint type. #7729.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-23 18:21:06 +08:00
Yongteng Lei
e8e2a95165
Refa: more fallbacks for bad citation format (#7710)
### What problem does this PR solve?

More fallbacks for bad citation format

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-05-19 19:34:05 +08:00
Yongteng Lei
0ebf05440e
Feat: repair corrupted PDF files on upload automatically (#7693)
### What problem does this PR solve?

Try the best to repair corrupted PDF files on upload automatically.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-19 14:54:06 +08:00
Song Fuchang
a1f06a4fdc
Feat: Support tool calling in Generate component (#7572)
### What problem does this PR solve?

Hello, our use case requires LLM agent to invoke some tools, so I made a
simple implementation here.

This PR does two things:

1. A simple plugin mechanism based on `pluginlib`:

This mechanism lives in the `plugin` directory. It will only load
plugins from `plugin/embedded_plugins` for now.

A sample plugin `bad_calculator.py` is placed in
`plugin/embedded_plugins/llm_tools`, it accepts two numbers `a` and `b`,
then give a wrong result `a + b + 100`.

In the future, it can load plugins from external location with little
code change.

Plugins are divided into different types. The only plugin type supported
in this PR is `llm_tools`, which must implement the `LLMToolPlugin`
class in the `plugin/llm_tool_plugin.py`.
More plugin types can be added in the future.

2. A tool selector in the `Generate` component:

Added a tool selector to select one or more tools for LLM:


![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc)

And with the `bad_calculator` tool, it results this with the `qwen-max`
model:


![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94)


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-05-16 16:32:19 +08:00
Stephen Hu
2fa8e3309f
Fix: file name length limit mismtach (#7630)
### What problem does this PR solve?

Close #7597

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-14 10:13:03 +08:00
liu an
6bd7d572ec
Perf: Increase database connection pool size (#7559)
### What problem does this PR solve?

1. The MySQL instance is configured with max_connections=1000,
but our connection pool was limited to max_connections: 100.
This mismatch caused connection pool exhaustion during performance
testing.

2.  Increase stale_timeout to resolve #6548

### Type of change

- [x] Performance Improvement
2025-05-09 17:52:03 +08:00
Kevin Hu
2ccec93d71
Feat: support cross-lang search. (#7557)
### What problem does this PR solve?

#7376
#4503
#5710 
#7470

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-09 15:32:02 +08:00
Yongteng Lei
b781207752
Feat: KB detail supports document total size (#7546)
### What problem does this PR solve?

Kb detail supports return document total size now.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-09 11:48:54 +08:00
Kevin Hu
9849230a04
Fix: remove deprecated novitaAI. (#7511)
### What problem does this PR solve?

#7484

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-07 19:36:16 +08:00
Stephen Hu
27ffc0ed74
Feat: Improve 'user_canvan_version' delete and 'document' delete performance (#6553)
### What problem does this PR solve?

1.  Add delete_by_ids method
2. Add get_doc_ids_by_doc_names
3. Improve user_canvan_version's logic (avoid O(n) db IO)
4. Improve document delete logic (avoid O(n) db IO)

### Type of change

- [x] Performance Improvement
2025-05-07 10:55:08 +08:00
Kevin Hu
75b24ba02a
Fix: chat solo issue. (#7479)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-06 19:30:00 +08:00
Yongteng Lei
f29a5de9f5
Fix: filed_map was incorrectly persisted (#7443)
### What problem does this PR solve?

Fix `filed_map` was incorrectly persisted. #7412 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-06 09:44:38 +08:00
Stephen Hu
2dbcc0a1bf
Fix: Tried to fix the fid mis match under some cases (#7426)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7407

Based on this context, I think there should be some reasons that let
some LLMs have a mismatch (add the wrong "@xxx"),
So I think when use fid can not fetch llm then tried to just use name
should can fetch it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 14:55:21 +08:00
alkscr
ab27609a64
Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151)
### What problem does this PR solve?

When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).

Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 09:43:17 +08:00
Yongteng Lei
a4be6c50cf
[BREAKING CHANGE] GET to POST: enhance document list capability (#7349)
### What problem does this PR solve?

Enhance capability of `list_docs`.

Breaking change: change method from `GET` to `POST`.

### Type of change

- [x] Refactoring
- [x] Enhancement with breaking change
2025-04-27 16:48:27 +08:00
alulala
eead838353
Fix pymysql interface error (#7295)
### What problem does this PR solve?

According to the
[[Rucongzhang](https://github.com/Rucongzhang)](https://github.com/infiniflow/ragflow/pull/7057#issuecomment-2827410047)
I added DB reconnection strategy in function `update_by_id`
2025-04-25 13:29:47 +08:00
WhiteBear
2c62652ea8
<think> tag is missing. (#7256)
### What problem does this PR solve?

Some models force thinking, resulting in the absence of the think tag in
the returned content

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-24 11:44:10 +08:00
Yongteng Lei
f35ff65c36
[BREAKING CHANGE] GET to POST: enhance kb list capability (#7205)
### What problem does this PR solve?

Enhance capability of `list_kbs`.

Breaking change: change method from `GET` to `POST`.

### Type of change

- [x] Refactoring
- [x] Enhancement with breaking change
2025-04-22 17:54:12 +08:00
alulala
5d253e0a34
Fix: pymysql.err.InterfaceError: (0, '') during long time streaming chat responses (#6548) (#7057)
### Related Issue:
https://github.com/infiniflow/ragflow/issues/6548

### Related PR:
https://github.com/infiniflow/ragflow/pull/6861


### Environment:
Commit version:
[[48730e0](48730e00a8)]

### Bug Description:
Unexpected `pymysql.err.InterfaceError: (0, '') `when using Peewee +
PyMySQL + PooledMySQLDatabase after a long-running `chat streamly`
operation.

This is a common issue with Peewee + PyMySQL + connection pooling: you
end up using a connection that was silently closed by the server, but
Peewee doesn't realize it's dead.

**I found that the error only occurs during longer streaming outputs**
and is unrelated to the database connection context, so it's likely
because:

- The prolonged streaming response caused the database connection to
time out

- The original database connection might have been disconnected by the
server during the streaming process

### Why This Happens
This error happens even when using `@DB.connection_context() `after the
stream is done. After investigation, I found this is caused by MySQL
connection pools that appear to be open but are actually dead (expired
due to` wait_timeout`).

1. `@DB.connection_context()` (as a decorator or context manager) pulls
a connection from the pool.

2. If this connection was idle and expired on the MySQL server (e.g.,
due to `wait_timeout`), but not closed in Python, it will still be
considered “open” (`DB.is_closed() == False`).

3. The real error will occur only when I execute a SQL command (such as
.`get_or_none()`), and PyMySQL tries to send it to the server via a
broken socket.


### Changes Made:

1. I implemented manual connection checks before executing SQL:
```
    try:
        DB.execute_sql("SELECT 1")
    except Exception:
        print("Connection dead, reconnecting...")
        DB.close()
        DB.connect()
```
2. Delayed the token count update until after the streaming response is
completed to ensure the streaming output isn't interrupted by database
operations.
```
        total_tokens = 0 
        for txt in chat_streamly(system, history, gen_conf):
            if isinstance(txt, int):
                total_tokens = txt
......
                break
......
        if total_tokens > 0:
            if not TenantLLMService.increase_usage(self.tenant_id, self.llm_type, txt, self.llm_name):
                logging.error("LLMBundle.chat_streamly can't update token usage for {}/CHAT llm_name: {}, content: {}".format(self.tenant_id, self.llm_name, txt))
```
2025-04-16 19:15:35 +08:00
Kevin Hu
5af2d57086
Refa. (#7022)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-04-15 10:20:33 +08:00
Yongteng Lei
7a34159737
Fix: add fallback for bad citation output (#7014)
### What problem does this PR solve?

Add fallback for bad citation output. #6948

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-15 09:33:53 +08:00
Yongteng Lei
98670c3755
Fix: KB update_time changed whenever system relaunched (#6959)
### What problem does this PR solve?

Fix KB update_time changed whenever system relaunched. #6953 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-11 20:10:49 +08:00
Yongteng Lei
dc2c74b249
Feat: add primitive support for function calls (#6840)
### What problem does this PR solve?

This PR introduces ​**​primitive support for function calls​**​,
enabling the system to handle basic function call capabilities.
However, this feature is currently experimental and ​**​not yet enabled
for general use​**​, as it is only supported by a subset of models,
namely, Qwen and OpenAI models.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-08 16:09:03 +08:00
caiming100
a20439bf81
fix: add exception handling for get_by_id method (#6861)
### What problem does this PR solve?

Fixes #6548 

Add exception handling to prevent exceptions from propagating back to
the web, which may lead to failure in displaying conversation content.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: cm <caiming@sict.ac.cn>
2025-04-08 16:06:57 +08:00
so95
cded812b97
Feat: add OpenAI compatible API for agent (#6329)
### What problem does this PR solve?
add openai agent
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-03 16:51:37 +08:00
Kevin Hu
9ecc78feeb
Refa: copywriting refinement. (#6779)
### What problem does this PR solve?

Close #6762

### Type of change

- [x] Refactoring
2025-04-03 11:38:02 +08:00
Song Fuchang
d4a3e9a7cc
Fix table migration on non-exist-yet indexed columns. (#6666)
### What problem does this PR solve?

Fix #6334

Hello, I encountered the same problem in #6334. In the
`api/db/db_models.py`, it calls `obj.create_table()` unconditionally in
`init_database_tables`, before the `migrate_db()`. Specially for the
`permission` field of `user_canvas` table, it has `index=True`, which
causes `peewee` to issue a SQL trying to create the index when the field
does not exist (the `user_canvas` table already exists), so
`psycopg2.errors.UndefinedColumn: column "permission" does not exist`
occurred.

I've added a judgement in the code, to only call `create_table()` when
the table does not exist, delegate the migration process to
`migrate_db()`.

Then another problem occurs: the `migrate_db()` actually does nothing
because it failed on the first migration! The `playhouse` blindly issue
DDLs without things like `IF NOT EXISTS`, so it fails... even if the
exception is `pass`, the transaction is still rolled back. So I removed
the transaction in `migrate_db()` to make it work.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-31 11:27:20 +08:00
Zhichang Yu
65a8cd1772
Fix knowledge_graph_kwd on infinity. Close #6476 and #6624 (#6651)
### What problem does this PR solve?

Fix knowledge_graph_kwd on infinity. Close #6476 and #6624

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-28 22:05:40 +08:00
Kevin Hu
ecc9605a32
Fix: team doc deletion issue. (#6589)
### What problem does this PR solve?

#6557

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-27 13:26:38 +08:00
Yongteng Lei
df3890827d
Refa: change LLM chat output from full to delta (incremental) (#6534)
### What problem does this PR solve?

Change LLM chat output from full to delta (incremental)

### Type of change

- [x] Refactoring
2025-03-26 19:33:14 +08:00
Chenzy
735d9dd949
Feat: add "tools" to llm_factories.json (#6552)
### What problem does this PR solve?



### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Chenzy <chenzy901@gmail.com>
2025-03-26 17:31:18 +08:00
Kevin Hu
bf483fdf02
Fix: describe parameter error. (#6519)
### What problem does this PR solve?
#6228

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-26 09:02:48 +08:00
liwenju0
814a210f5d
Fix: failed to acquire lock exception with retry mechanism for postgres and mysql (#6483)
Added the with_retry decorator in db_models.py to add a retry mechanism
for database operations. Applied the retry mechanism to the lock and
unlock methods of the PostgresDatabaseLock and MysqlDatabaseLock classes
to enhance the reliability of lock operations.

### What problem does this PR solve?
resolve failed to acquire lock exception with retry mechanism

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-03-25 15:09:56 +08:00
Yongteng Lei
5e0a77df2b
Feat: add Langfuse APIs (#6460)
### What problem does this PR solve?

Add Langfuse APIs

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-24 18:25:43 +08:00
Yongteng Lei
66e557b6c0
Fix: Langfuse update model has no fields attribute (#6453)
### What problem does this PR solve?

Langfuse update model has no fields attribute

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 15:37:14 +08:00
Roccat
200b6f55c6
Fix: NameError: free variable 'langfuse_generation' referenced before assignment in enclosing scope (#6451)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: lizheng@ssc-hn.com <lizheng@ssc-hn.com>
2025-03-24 15:14:36 +08:00
Yongteng Lei
85eb367ede
Feat: add basic Langfuse support for LLM module (#6443)
### What problem does this PR solve?

#6155

Add basic Langfuse support for LLM module.

A trace example:

<img width="755" alt="image"
src="https://github.com/user-attachments/assets/25c1f852-5116-486c-a47f-6097187142ca"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-24 13:18:47 +08:00
lgphone
7f80d7304d
Fix: Optimized the get_by_id method to resolve the issue of missing exceptions and improve query performance (#6320)
Fix: Optimized the get_by_id method to resolve the issue of missing
exceptions and improve query performance

### What problem does this PR solve?

Optimized the get_by_id method to resolve the issue of missing
exceptions and improve query performance.
Optimization details:
1. The original method used a custom query method that required
concatenating SQL, which impacted performance.
2. The query method returned a list, which needed to be accessed by
index, posing a risk of index out-of-bounds errors.
3. The original method used except Exception to catch all errors, which
is not a best practice in Python programming and may lead to missing
exceptions. The get_or_none method accurately catches DoesNotExist
errors while allowing other errors to be raised normally.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Performance Improvement
2025-03-20 23:23:48 +08:00
so95
344727f9ba
Feat: add agent share team viewer (#6222)
### What problem does this PR solve?
Allow member view agent  
#  Canvas editor

![image](https://github.com/user-attachments/assets/042af36d-5fd1-43e2-acf7-05869220a1c1)
# List agent

![image](https://github.com/user-attachments/assets/8b9c7376-780b-47ff-8f5c-6c0e7358158d)
# Setting 

![image](https://github.com/user-attachments/assets/6cb7d12a-7a66-4dd7-9acc-5b53ff79a10a)
 
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-19 19:04:13 +08:00
so95
53ac27c3ff
Feat: support agent version history. (#6130)
### What problem does this PR solve?
Add history version save
- Allows users to view and download agent files by version revision
history

![image](https://github.com/user-attachments/assets/c300375d-8b97-4230-9fc4-83d148137132)

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-19 15:22:53 +08:00
Kevin Hu
e689532e6e
Fix: long api key issue. (#6267)
### What problem does this PR solve?

#6248

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-19 13:30:40 +08:00
Yongteng Lei
5cf610af40
Feat: add vision LLM PDF parser (#6173)
### What problem does this PR solve?

Add vision LLM PDF parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-18 14:52:20 +08:00
Zhichang Yu
89a69eed72
Introduced task priority (#6118)
### What problem does this PR solve?

Introduced task priority

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-14 23:43:46 +08:00