Kevin Hu
d29fd52e14
fix divide-by-zero bug ( #1482 )
...
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 12:59:56 +08:00
Yuhao Tsui
7f4c63d102
fix: Delete hardcode ( #1464 )
...
### What problem does this PR solve?
After the language of the PDF has been detected, one line hardcodes the
language to Chinese.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 15:41:31 +08:00
H
2290c2a2f0
fix pdf_parser char content confusion ( #1462 )
...
### What problem does this PR solve?
#1407
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 14:37:55 +08:00
H
dbb8f7b77b
fix pdf_parser content confusion ( #1458 )
...
### What problem does this PR solve?
#1407
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 12:36:55 +08:00
Zhedong Cen
a95c1d45f0
Support table for markdown file in general parser ( #1278 )
...
### What problem does this PR solve?
Support extracting tables from Markdown files in the general parser
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 14:38:35 +08:00
Zhedong Cen
45853505bb
Fix occasional errors in pdf table recognition ( #1277 )
...
### What problem does this PR solve?
Fix occasional errors in pdf table recognition
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-27 14:37:58 +08:00
Wang Baoling
18f4a6b35c
feat: support json file ( #1217 )
...
### What problem does this PR solve?
feat: support json file.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-06-21 10:42:29 +08:00
KevinHuSh
e35f7610e7
fix too long query exception ( #1195 )
...
### What problem does this PR solve?
#1161
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-18 09:50:59 +08:00
KevinHuSh
4454ba7a1e
add self-rag ( #1070 )
...
### What problem does this PR solve?
#1069
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 11:13:39 +08:00
Jin Hai
cdea1d0a85
Update readme and add license ( #1018 )
...
### What problem does this PR solve?
- Update readme
- Add license
### Type of change
- [x] Documentation Update
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-01 16:24:10 +08:00
KevinHuSh
843720f958
fix bug in pdf parser ( #986 )
...
### What problem does this PR solve?
#963
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 11:47:36 +08:00
KevinHuSh
0171082cc5
fix create dialog bug ( #982 )
...
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 09:25:05 +08:00
Zhedong Cen
8dd45459be
Add support for HTML file ( #973 )
...
### What problem does this PR solve?
Add support for HTML file
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-05-30 09:12:55 +08:00
KevinHuSh
7eee193956
fix #917 #915 ( #946 )
...
### What problem does this PR solve?
#917
#915
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-28 11:13:02 +08:00
xinzhuang
3bbdf3b770
fix bug in computing the 'not concatenating' feature ( #896 )
...
### What problem does this PR solve?
When the PDF parser calls the `_naive_vertical_merge` method, one of the
"not concatenating" features is meant to be computed from the difference
between the `layoutno` of `b` and `b_`, but the code actually compares `b`
with `b`. This looks like a bug, so this PR fixes it. Please check again.
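The effect of the mix-up described above can be illustrated with a minimal sketch. It assumes each text box is a dict with a `layoutno` key; the function names `layout_differs_buggy` and `layout_differs_fixed` are hypothetical and are not taken from the parser's code.

```python
def layout_differs_buggy(b: dict, b_: dict) -> bool:
    # Hypothetical reconstruction of the bug: b is compared with itself,
    # so the result is always False and the feature never fires.
    return b.get("layoutno", 0) != b.get("layoutno", 0)


def layout_differs_fixed(b: dict, b_: dict) -> bool:
    # Intended comparison: the current box b against the following box b_.
    return b.get("layoutno", 0) != b_.get("layoutno", 0)


# Two boxes that belong to different layout regions (assumed data shape).
upper = {"text": "first block", "layoutno": 1}
lower = {"text": "second block", "layoutno": 2}

print(layout_differs_buggy(upper, lower))
print(layout_differs_fixed(upper, lower))
```

With the self-comparison, boxes from different layout regions are never flagged as "do not concatenate", which is why the comparison has to use `b_`.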
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-23 14:29:42 +08:00
KevinHuSh
a12fcf9156
fix minio health bug ( #850 )
...
### What problem does this PR solve?
#643
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-20 19:35:30 +08:00
GYH
c27c02ea67
Split Excel file into different chunks ( #847 )
...
### What problem does this PR solve?
Split Excel files into different chunks
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-05-20 18:35:15 +08:00
KevinHuSh
99be226c7c
fix coordinate error ( #686 )
...
### What problem does this PR solve?
#683
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-08 20:00:14 +08:00
KevinHuSh
7013d7f620
refine text decode ( #657 )
...
### What problem does this PR solve?
#651
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-07 12:25:47 +08:00
KevinHuSh
cab274f560
remove PyMuPDF ( #618 )
...
### What problem does this PR solve?
#613
### Type of change
- [x] Other (please describe):
2024-04-30 12:38:09 +08:00
KevinHuSh
8c07992b6c
refine code ( #595 )
...
### What problem does this PR solve?
### Type of change
- [x] Refactoring
2024-04-28 19:13:33 +08:00
KevinHuSh
d589b0f568
fix exception in pdf parser ( #584 )
...
### What problem does this PR solve?
#451
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-28 14:23:53 +08:00
KevinHuSh
9d60a84958
refactor code ( #583 )
...
### What problem does this PR solve?
### Type of change
- [x] Refactoring
2024-04-28 13:19:54 +08:00
KevinHuSh
66f8d35632
Refactor ( #537 )
...
### What problem does this PR solve?
### Type of change
- [x] Refactoring
2024-04-25 14:14:28 +08:00
KevinHuSh
0dfc8ddc0f
enlarge docker memory usage ( #501 )
...
### What problem does this PR solve?
### Type of change
- [x] Refactoring
2024-04-23 14:41:10 +08:00
KevinHuSh
ed6081845a
Support many encodings for text files. ( #458 )
...
### What problem does this PR solve?
#384
### Type of change
- [x] Performance Improvement
2024-04-19 18:02:53 +08:00
KevinHuSh
962c66714e
fix divide by zero bug ( #447 )
...
### What problem does this PR solve?
#445
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-19 11:26:38 +08:00
加帆
39f1feaccb
Bug fix pdf parse index out of range ( #440 )
...
### What problem does this PR solve?
Fix a bug that occurs when parsing some PDF files (#436)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-19 08:44:51 +08:00
KevinHuSh
0499a3f621
rm page number exception for pdf parser ( #424 )
...
### What problem does this PR solve?
#423
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-18 12:09:56 +08:00
KevinHuSh
453c29170f
make sure the models will not be loaded twice ( #422 )
...
### What problem does this PR solve?
#381
### Type of change
- [x] Refactoring
2024-04-18 09:37:23 +08:00
jie yang
a7be5d4e8b
build ragflow image from scratch ( #376 )
...
### What problem does this PR solve?
issue: #205
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 12:29:58 +08:00
KevinHuSh
36f2d7b797
Avoid assertion when an Excel file has no rows ( #197 )
...
### What problem does this PR solve?
Issue link: [#196](https://github.com/infiniflow/ragflow/issues/196)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-02 10:51:21 +08:00
KevinHuSh
a5384446e3
let's load model from local ( #163 )
2024-03-28 16:10:47 +08:00
KevinHuSh
fd7fcb5baf
apply PEP 8 formatting ( #155 )
2024-03-27 11:33:46 +08:00
KevinHuSh
979b3a5b4b
support snapshot download from local ( #153 )
...
* support snapshot download from local
* let snapshot download from local
2024-03-27 09:53:42 +08:00
KevinHuSh
da21320b88
fix plainPdf bugs ( #152 )
2024-03-26 15:11:07 +08:00
KevinHuSh
71fe314955
refine page ranges ( #147 )
2024-03-25 13:11:57 +08:00
KevinHuSh
f6aee7f230
add use layout or not option ( #145 )
...
* add use layout or not option
* trivial
2024-03-22 19:21:09 +08:00
KevinHuSh
6c6b144de2
refine manual parser ( #140 )
2024-03-21 18:17:32 +08:00
KevinHuSh
6999598101
refine for English corpus ( #135 )
2024-03-20 16:56:16 +08:00
KevinHuSh
9a843667b3
fix github account login issue ( #132 )
2024-03-19 15:31:47 +08:00
KevinHuSh
9da671b951
refine manual parser ( #131 )
2024-03-19 12:26:04 +08:00
KevinHuSh
675a9f8d9a
add dockerfile for cuda environment. Refine table search strategy ( #123 )
2024-03-14 19:45:29 +08:00
KevinHuSh
0feb085c88
refine table parser ( #120 )
2024-03-12 18:56:04 +08:00
KevinHuSh
f1f09df901
add local llm implementation ( #119 )
2024-03-12 11:57:08 +08:00
KevinHuSh
8f86ab9f7f
refine pdf parser, add time zone to userinfo ( #112 )
2024-03-08 11:24:24 +08:00
KevinHuSh
602038ac49
fix task canceling bug ( #98 )
2024-03-05 16:33:47 +08:00
KevinHuSh
8a57f2afd5
change callback strategy, add timezone to docker ( #96 )
2024-03-05 12:08:41 +08:00
KevinHuSh
7bfaf0df29
fix position extraction bug ( #93 )
...
* fix position extraction bug
* remove delimiter for naive parser
2024-03-04 17:08:35 +08:00
KevinHuSh
685b4d8a95
fix table desc bugs, add positions to chunks ( #91 )
2024-03-04 14:42:26 +08:00