469 Commits

Author SHA1 Message Date
Jin Hai
8f070c3d56
Fix 'SCORE' not found bug (#4178)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-23 14:50:12 +08:00
Kevin Hu
31d67c850e
Fetch chunk by batches. (#4177)
### What problem does this PR solve?

#4173

### Type of change

- [x] Performance Improvement
2024-12-23 12:12:15 +08:00
Kevin Hu
2cbe064080
Add Llama3.3 (#4174)
### What problem does this PR solve?

#4168

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-23 11:18:01 +08:00
Kenny Dizi
f13f503952
Use s3 configuration from settings module (#4167)
### What problem does this PR solve?

Fix the issue by retrieving AWS credentials from the S3 configuration
in the settings module instead of from the environment variables.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-12-23 10:22:45 +08:00
Kevin Hu
cb45431412
Fix Voyage re-rank model. Limit file name length. (#4171)
### What problem does this PR solve?

#4152 
#4154

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-23 10:03:50 +08:00
Zhichang Yu
85083ad400
Validate returned chunk at list_chunks and add_chunk (#4153)
### What problem does this PR solve?

Validate returned chunk at list_chunks and add_chunk

### Type of change

- [x] Refactoring
2024-12-20 22:55:45 +08:00
Zhichang Yu
a0dc9e1bdf
Fix position_int on infinity (#4144)
### What problem does this PR solve?

Fix position_int on infinity

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-20 11:30:33 +08:00
ly0303521
101b8ff813
fix chunk method "Table" losing content when the Excel file has multiple sheets (#4123)

### What problem does this PR solve?
Discussed in https://github.com/infiniflow/ragflow/pull/4102
- In excel_parser.py, `total` means the total number of rows in the Excel
file, but it was returned in the first iteration, which led to the wrong
`to_page`.
- In table.py, when the Excel file has multiple sheets, it is divided
into multiple parts of 3000 rows each, and `data` may be empty because
it was already recorded in the previous iteration.
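The row-counting bug described in the first bullet can be illustrated with a small sketch. This is not the actual excel_parser.py or table.py code. The following are all assumptions made for illustration:

- the function names,
- the dict-of-sheets workbook model,
- the `page_ranges` helper.

The sketch shows how returning the running total inside the per-sheet loop undercounts rows. That undercount then produces a `to_page` that is too small.

```python
# Hypothetical sketch: NOT the actual excel_parser.py / table.py code.
# A workbook is modeled as a dict mapping sheet name -> list of rows.


def row_count_buggy(workbook: dict[str, list[list[str]]]) -> int:
    """Returns inside the loop, so only the first sheet is counted."""
    total = 0
    for rows in workbook.values():
        total += len(rows)
        return total  # bug: exits after the first sheet
    return total


def row_count_fixed(workbook: dict[str, list[list[str]]]) -> int:
    """Accumulates rows over every sheet before returning."""
    total = 0
    for rows in workbook.values():
        total += len(rows)
    return total


def page_ranges(total_rows: int, part_size: int = 3000) -> list[tuple[int, int]]:
    """Split [0, total_rows) into consecutive (from_page, to_page) parts."""
    return [
        (start, min(start + part_size, total_rows))
        for start in range(0, total_rows, part_size)
    ]
```

Consider two sheets with 2,500 and 2,000 rows:

- The fixed count gives 4,500 rows and two parts, (0, 3000) and (3000, 4500).
- The buggy count stops at 2,500 rows and produces a single part (0, 2500), so the second sheet's rows are never covered.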
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-19 17:30:26 +08:00
Kevin Hu
d8fca43017
Make fast embed and default embed mutually exclusive. (#4121)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2024-12-19 17:27:09 +08:00
Kevin Hu
7474348394
Fix fastembed reloading issue. (#4117)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-19 16:18:18 +08:00
Zhichang Yu
57c99dd811
Fixed infinity exception SCORE() / SCORE_FACTORS() requires Fusion or MATCH TEXT or MATCH TENSOR (#4110)
### What problem does this PR solve?

Fixed infinity exception SCORE() / SCORE_FACTORS() requires Fusion or
MATCH TEXT or MATCH TENSOR. Close #4109

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-19 13:49:36 +08:00
Jin Hai
50c2b9d562
Refactor trie load and construct (#4083)
### What problem does this PR solve?

1. Fix the initial building and loading of the trie
2. Update comment

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-18 12:52:56 +08:00
Kevin Hu
000cd6d615
Fix position lost issue. (#4068)
### What problem does this PR solve?

#4040

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-17 16:31:58 +08:00
liuhua
1d65299791
Fix rerank_model bug in chat and markdown bug (#4061)
### What problem does this PR solve?

Fix the rerank_model bug in chat and a Markdown bug
#4000
#3992
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-12-17 16:03:37 +08:00
Zhichang Yu
bcccaccc2b
Added pagerank support to infinity (#4059)
### What problem does this PR solve?

Added pagerank support to infinity

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-17 15:45:01 +08:00
Kevin Hu
cb6e9ce164
Cache the result from llm for graphrag and raptor (#4051)
### What problem does this PR solve?

#4045

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-17 09:48:03 +08:00
Zhichang Yu
9b2ef62aee
Fix xinfo_groups returns unexpected result (#4026)
### What problem does this PR solve?

Fix xinfo_groups returning an unexpected result. Close #3545

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-13 17:31:15 +08:00
Zhichang Yu
86507af770
Set task progress on exception (#4025)
### What problem does this PR solve?

Set task progress on exception

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-13 17:15:08 +08:00
Zhichang Yu
0bca46ac3a
Migrate infinity at startup (#3858)
### What problem does this PR solve?

Migrate infinity at startup

#3809
https://github.com/infiniflow/infinity/issues/2321

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-13 13:43:56 +08:00
Luo Pan
68d46b2a1e
Fix bug in hierarchical_merge function (#4006)
### What problem does this PR solve?

Fix the hierarchical_merge function: it compared an index against an
actual value; it now compares actual value against actual value.
Related issue #4003 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: luopan <luopan@example.com>
2024-12-13 08:50:58 +08:00
Zhichang Yu
c8b1a564aa
Replaced md5 with xxhash64 for chunk id (#4009)
### What problem does this PR solve?

Replaced md5 with xxhash64 for chunk id

### Type of change

- [x] Refactoring
2024-12-12 17:47:39 +08:00
Zhichang Yu
301f95837c
Try to reuse existing chunks (#3983)
### What problem does this PR solve?

Try to reuse existing chunks. Close #3793
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-12 16:38:03 +08:00
Kevin Hu
6d19294ddc
Support debug components. (#3994)
### What problem does this PR solve?

#3993

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-11 19:23:59 +08:00
Zhichang Yu
9a6d976252
Add back beartype (#3967)
### What problem does this PR solve?

Add back beartype

### Type of change

- [x] Refactoring
2024-12-10 18:43:43 +08:00
Zhichang Yu
3d76f10a91
Fixed retrieval TypeError: unhashable type: 'list' (#3966)
### What problem does this PR solve?

Fixed retrieval TypeError: unhashable type: 'list'

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-10 18:28:56 +08:00
Zhichang Yu
03f00c9e6f
Rename page_num_list, top_list, position_list (#3940)
### What problem does this PR solve?

Rename page_num_list, top_list, position_list to page_num_int, top_int,
position_int

### Type of change

- [x] Refactoring
2024-12-10 16:32:58 +08:00
so95
d5a322a352
Theme switch support (#3568)
### What problem does this PR solve?
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2024-12-10 11:42:04 +08:00
Kevin Hu
927873bfa6
Fix syn error. (#3953)
### What problem does this PR solve?

Close #3696
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-10 10:54:54 +08:00
Zhichang Yu
7a6bf4326e
Fixed log not displaying (#3946)
### What problem does this PR solve?

Fixed log not displaying

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-10 09:36:59 +08:00
Kevin Hu
044afa83d1
Fix transformers dependencies for slim. (#3934)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-09 14:21:37 +08:00
Kevin Hu
215e9361ea
Fix field missing issue. (#3931)
### What problem does this PR solve?

#3905
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-09 13:20:58 +08:00
Zhichang Yu
0d68a6cd1b
Fix errors detected by Ruff (#3918)
### What problem does this PR solve?

Fix errors detected by Ruff

### Type of change

- [x] Refactoring
2024-12-08 14:21:12 +08:00
Kevin Hu
56f473b680
Feat: Add question parameter to edit chunk modal (#3875)
### What problem does this PR solve?

Close #3873

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-05 14:51:19 +08:00
Kevin Hu
593ffc4067
Fix HuggingFace model error. (#3870)
### What problem does this PR solve?

#3865

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-05 13:28:42 +08:00
Kevin Hu
78601ee1bd
Fix open AI compatible rerank issue. (#3866)
### What problem does this PR solve?
#3700
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-05 10:26:21 +08:00
Kevin Hu
1b817a5b4c
Refine synonym query. (#3855)
### What problem does this PR solve?

### Type of change

- [x] Performance Improvement
2024-12-04 17:20:12 +08:00
Kevin Hu
3f3469130b
Fix preview issue in file manager. (#3846)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-04 11:53:23 +08:00
Jin Hai
6657ca7cde
Change default error message to English (#3838)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-12-04 09:34:49 +08:00
Jin Hai
821fdf02b4
Fix parsing JSON file error (#3829)
### What problem does this PR solve?

Close issue: #3828

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-03 19:02:03 +08:00
Zhichang Yu
92ab7ef659
Refactor embedding batch_size (#3825)
### What problem does this PR solve?

Refactor embedding batch_size. Close #3657

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2024-12-03 16:22:39 +08:00
Kevin Hu
74b28ef1b0
Add pagerank to KB. (#3809)
### What problem does this PR solve?

#3794

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-03 14:30:35 +08:00
Kevin Hu
6a0583f5ad
Fix voyage embedding. (#3818)
### What problem does this PR solve?

#3816 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-03 09:33:54 +08:00
Zhichang Yu
c5f13629af
Set Log level by env (#3798)
### What problem does this PR solve?

Set Log level by env

### Type of change

- [x] Refactoring
2024-12-02 17:24:39 +08:00
Zhichang Yu
d19f059f34
Detect invalid response from api.siliconflow.cn (#3792)
### What problem does this PR solve?

Detect invalid response from api.siliconflow.cn. Close #2643

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-02 12:55:05 +08:00
devMls
59a5813f1b
add jina new models in jina connector (#3770)
### What problem does this PR solve?

Add new models to the Jina connector to allow the use of multilingual
models.

### Type of change

- [X] Other (please describe): new connectors, no breaking change
2024-12-02 10:06:39 +08:00
Jin Hai
08c1a5e1e8
Refactor parse progress (#3781)
### What problem does this PR solve?

Refactor parse file progress

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-01 22:28:00 +08:00
Jin Hai
ea84cc2e33
Update file parsing progress info (#3780)
### What problem does this PR solve?

Refine the file parsing progress info

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-01 17:03:00 +08:00
Jin Hai
e079656473
Update progress info and start welcome info (#3768)
### What problem does this PR solve?


### Type of change

- [x] Refactoring

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-30 18:48:06 +08:00
Jin Hai
d00297a763
Fix chunk creation using Infinity (#3763)
### What problem does this PR solve?

1. Fix the wrong type being stored in Infinity
2. Fix incorrect position list values read from Infinity

Fix issue: #3729

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-30 00:10:14 +08:00
Kevin Hu
0f08b0f053
Weight up title and keywords for chunks in terms of retrieval (#3750)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2024-11-29 16:39:55 +08:00