92 Commits

Author SHA1 Message Date
KevinHuSh
aa71462a9f
fix bug #502 (#504)
### What problem does this PR solve?

#502 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-23 16:01:46 +08:00
chrysanthemum-boy
72384b191d
Add .doc file parser. (#497)
### What problem does this PR solve?
Add `.doc` file parser, using tika.
```
pip install tika
```
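```
from io import BytesIO

from tika import parser


def extract_text_from_doc_bytes(doc_bytes):
    """Extract plain text from the raw bytes of a .doc file using Tika."""
    # Wrap the raw bytes in a file-like object before handing them to Tika.
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    # The "content" key holds the extracted text.
    return parsed["content"]
```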
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-23 15:31:43 +08:00
KevinHuSh
0dfc8ddc0f
enlarge docker memory usage (#501)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-23 14:41:10 +08:00
KevinHuSh
a38e163035
remove doc from supported processing types (#488)
### What problem does this PR solve?
#474 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-22 15:46:09 +08:00
KevinHuSh
3610e1e5b4
fix ollama issue push (#486)
### What problem does this PR solve?

#477 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-22 15:13:01 +08:00
Shaun
11949f9f2e
feat: support markdown files (#483)
parse markdown files as txt

### What problem does this PR solve?

support markdown files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:43:36 +08:00
KevinHuSh
b8e58fe27a
add redis to accelerate access of minio (#482)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:11:09 +08:00
KevinHuSh
7e41b4bc94
change readme for 0.3.0 release (#459)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-04-19 18:19:15 +08:00
KevinHuSh
ed6081845a
Support many encodings for text files. (#458)
### What problem does this PR solve?

#384
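
A minimal sketch of what trying several encodings in turn could look like. This is not the code from this commit: the helper name `decode_with_fallback` and its default encoding list are hypothetical, chosen only for illustration.

```python
def decode_with_fallback(data, encodings=("utf-8", "gb2312", "gbk", "gb18030")):
    """Decode bytes with the first encoding in `encodings` that succeeds.

    Hypothetical helper: the encoding list and its order are assumptions.
    """
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    # Nothing matched: decode as UTF-8 and substitute undecodable bytes.
    return data.decode("utf-8", errors="replace")
```

Ordering matters here: strict encodings such as UTF-8 go first, because more permissive ones can decode the wrong bytes without raising an error.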

### Type of change

- [x] Performance Improvement
2024-04-19 18:02:53 +08:00
KevinHuSh
453c29170f
make sure the models will not be loaded twice (#422)
### What problem does this PR solve?

#381 
### Type of change

- [x] Refactoring
2024-04-18 09:37:23 +08:00
YC
e8570da856
Update table.py to convert columns to string (#414)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 19:48:11 +08:00
KevinHuSh
800b5c7aaa
fix bulk error for table method (#407)
### What problem does this PR solve?


Issue link:#366

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 12:17:14 +08:00
KevinHuSh
d4e0bfc8a5
fix gb2312 encoding issue (#394)
### What problem does this PR solve?

Issue link:#384
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-16 19:45:14 +08:00
KevinHuSh
890561703b
Add bce-embedding and fastembed (#383)
### What problem does this PR solve?


Issue link:#326

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 16:42:19 +08:00
Anush
826ad6a33a
feat: FastEmbed embedding support (#291)
### Description

Following up on https://github.com/infiniflow/ragflow/pull/275, this PR
adds support for FastEmbed model configurations.

The options are not exhaustive. You can find the full list
[here](https://qdrant.github.io/fastembed/examples/Supported_Models/).

P.S. I ran into OOM issues when building the image.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-04-15 15:58:06 +08:00
KevinHuSh
c39b751600
conversation API backend update (#360)
### What problem does this PR solve?


Issue link:#345

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-15 14:43:44 +08:00
KevinHuSh
8ffc09cb5c
Support Xinference (#321)
### What problem does this PR solve?

Issue link:#299

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-11 18:25:37 +08:00
KevinHuSh
91068edf16
Support Xinference (#320)
### What problem does this PR solve?

Issue link:#299

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-11 18:22:25 +08:00
KevinHuSh
f6c7204002
refine log format (#312)
### What problem does this PR solve?

Issue link:#264
### Type of change


- [x] Documentation Update
- [x] Refactoring
2024-04-11 10:13:43 +08:00
KevinHuSh
24c15daaed
fix es exception (#298)
### What problem does this PR solve?

Issue link:#295
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-10 16:00:48 +08:00
KevinHuSh
a0a480b708
continue adding layout model for 'laws' (#292)
### What problem does this PR solve?

Issue link:#289

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-10 14:06:36 +08:00
KevinHuSh
243de6ac90
add a new model for 'Laws' (#290)
### What problem does this PR solve?

Issue link:#289
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-10 11:59:00 +08:00
KevinHuSh
923cbe488e
fix #258 task_executor occupies too much CPU (#288)
### What problem does this PR solve?

Issue link:#285

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-10 10:11:22 +08:00
KevinHuSh
653c759d8d
resolve issue calling ZHIPUAI (#277)
### What problem does this PR solve?

Issue link:#265

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-09 16:16:10 +08:00
KevinHuSh
6748872b0b
remove presence_penalty for chatglm (#268)
### What problem does this PR solve?

Issue link:#265

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-09 09:24:08 +08:00
KevinHuSh
3708b97db9
Support Ollama (#261)
### What problem does this PR solve?

Issue link:#221

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-08 19:20:57 +08:00
KevinHuSh
bb96180e77
Add more information on vm map count setting (#241)
### What problem does this PR solve?

Issue link:#236

### Type of change

- [x] Documentation Update
2024-04-07 09:41:53 +08:00
KevinHuSh
23b448cf96
fix docker compose issue (#238)
### What problem does this PR solve?

Issue link:#226

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-07 09:04:32 +08:00
KevinHuSh
392e515c3f
fix bug about reloading knowledgebase configuration (#210)
### What problem does this PR solve?

Issue link:#209

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-03 11:00:50 +08:00
KevinHuSh
f89c6c9d59
rm useless warning message (#201)
### What problem does this PR solve?

Issue link:#200

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-02 13:50:05 +08:00
KevinHuSh
572e5b1ff1
Let task continue dispatching when meeting unexpected doc formats (#199)
### What problem does this PR solve?

Issue link:#198

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-02 11:39:01 +08:00
KevinHuSh
36f2d7b797
Avoid assertion when there are no rows in Excel (#197)
### What problem does this PR solve?

Issue link:#196

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-02 10:51:21 +08:00
KevinHuSh
121c7a5681
refine error response, add set api-key MD (#178) 2024-03-31 19:09:42 +08:00
KevinHuSh
e876f58b4c
refine readme (#170) 2024-03-29 14:38:15 +08:00
KevinHuSh
05298d5626
refine README (#169) 2024-03-29 10:48:29 +08:00
KevinHuSh
38e5737067
add base url for OpenAI (#166) 2024-03-28 19:15:16 +08:00
KevinHuSh
a5384446e3
let's load model from local (#163) 2024-03-28 16:10:47 +08:00
KevinHuSh
f3477202fe
refine citation (#161) 2024-03-28 11:45:50 +08:00
KevinHuSh
bf2e3d7fc1
refine OpenAI API (#159) 2024-03-27 17:55:45 +08:00
KevinHuSh
37185466e2
README refined (#156) 2024-03-27 13:14:36 +08:00
KevinHuSh
fd7fcb5baf
apply PEP 8 formatting (#155) 2024-03-27 11:33:46 +08:00
KevinHuSh
979b3a5b4b
support snapshot download from local (#153)
* support snapshot download from local

* let snapshot download from local
2024-03-27 09:53:42 +08:00
KevinHuSh
da21320b88
fix plainPdf bugs (#152) 2024-03-26 15:11:07 +08:00
KevinHuSh
71fe314955
refine page ranges (#147) 2024-03-25 13:11:57 +08:00
KevinHuSh
f6aee7f230
add use layout or not option (#145)
* add use layout or not option

* trivial
2024-03-22 19:21:09 +08:00
KevinHuSh
6c6b144de2
refine manual parser (#140) 2024-03-21 18:17:32 +08:00
KevinHuSh
5875c8ba08
Add 'One' chunk method (#137) 2024-03-20 18:57:22 +08:00
KevinHuSh
6999598101
refine for English corpus (#135) 2024-03-20 16:56:16 +08:00
KevinHuSh
9a843667b3
fix github account login issue (#132) 2024-03-19 15:31:47 +08:00
KevinHuSh
9da671b951
refine manual parser (#131) 2024-03-19 12:26:04 +08:00