39 Commits

Author SHA1 Message Date
Zhedong Cen
45853505bb
Fix occasional errors in pdf table recognition (#1277)
### What problem does this PR solve?

Fix occasional errors in pdf table recognition

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-27 14:37:58 +08:00
KevinHuSh
4454ba7a1e
add self-rag (#1070)
### What problem does this PR solve?

#1069 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 11:13:39 +08:00
Jin Hai
cdea1d0a85
Update readme and add license (#1018)
### What problem does this PR solve?

- Update readme
- Add license

### Type of change

- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-01 16:24:10 +08:00
KevinHuSh
843720f958
fix bug in pdf parser (#986)
### What problem does this PR solve?

#963 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 11:47:36 +08:00
KevinHuSh
7eee193956
fix #917 #915 (#946)
### What problem does this PR solve?

#917 
#915

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-28 11:13:02 +08:00
xinzhuang
3bbdf3b770
Fix bug in computing the 'not concatenating' feature (#896)
### What problem does this PR solve?

When the PDF parser calls the `_naive_vertical_merge` method, one of the
"not concatenating" features is meant to be computed from the difference
between the `layoutno` of `b` and `b_`, but the code actually compares `b`
with `b`. I think this is a bug, so this PR fixes it. Please check again.
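
The effect of the described mistake can be illustrated with a minimal sketch. The helper names and the dictionary layout below are hypothetical and are not taken from the parser's source. The only detail assumed from the description above is that each text box carries a `layoutno` value.

```python
# Hypothetical sketch: text boxes are plain dicts with a "layoutno" key.

def layout_differs_buggy(b, b_):
    # Bug: compares b with itself, so the result is always False.
    return b.get("layoutno", 0) != b.get("layoutno", 0)


def layout_differs_fixed(b, b_):
    # Fix: compares the current box b with the following box b_.
    return b.get("layoutno", 0) != b_.get("layoutno", 0)


upper = {"text": "Section title", "layoutno": 1}
lower = {"text": "Body text", "layoutno": 2}

buggy_result = layout_differs_buggy(upper, lower)
fixed_result = layout_differs_fixed(upper, lower)
```

With the self-comparison the feature can never signal a layout change, so it never prevents two boxes from being merged.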

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-23 14:29:42 +08:00
KevinHuSh
99be226c7c
fix coordinate error (#686)
### What problem does this PR solve?

#683 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-08 20:00:14 +08:00
KevinHuSh
cab274f560
remove PyMuPDF (#618)
### What problem does this PR solve?
#613 

### Type of change


- [x] Other (please describe):
2024-04-30 12:38:09 +08:00
KevinHuSh
8c07992b6c
refine code (#595)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-28 19:13:33 +08:00
KevinHuSh
d589b0f568
fix exception in pdf parser (#584)
### What problem does this PR solve?
#451 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-28 14:23:53 +08:00
KevinHuSh
9d60a84958
refactor code (#583)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-28 13:19:54 +08:00
KevinHuSh
66f8d35632
Refactor (#537)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-25 14:14:28 +08:00
KevinHuSh
0dfc8ddc0f
enlarge docker memory usage (#501)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-23 14:41:10 +08:00
KevinHuSh
962c66714e
fix divide by zero bug (#447)
### What problem does this PR solve?

#445 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-19 11:26:38 +08:00
加帆
39f1feaccb
Bug fix pdf parse index out of range (#440)
### What problem does this PR solve?

Fix a bug that occurs when parsing some PDF files: #436

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-19 08:44:51 +08:00
KevinHuSh
0499a3f621
rm page number exception for pdf parser (#424)
### What problem does this PR solve?

#423 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-18 12:09:56 +08:00
KevinHuSh
453c29170f
make sure the models will not be loaded twice (#422)
### What problem does this PR solve?

#381 
### Type of change

- [x] Refactoring
2024-04-18 09:37:23 +08:00
KevinHuSh
a5384446e3
let's load model from local (#163) 2024-03-28 16:10:47 +08:00
KevinHuSh
fd7fcb5baf
apply pep8 formatting (#155) 2024-03-27 11:33:46 +08:00
KevinHuSh
979b3a5b4b
support snapshot download from local (#153)
* support snapshot download from local

* let snapshot download from local
2024-03-27 09:53:42 +08:00
KevinHuSh
da21320b88
fix plainPdf bugs (#152) 2024-03-26 15:11:07 +08:00
KevinHuSh
71fe314955
refine page ranges (#147) 2024-03-25 13:11:57 +08:00
KevinHuSh
f6aee7f230
add use layout or not option (#145)
* add use layout or not option

* trivial
2024-03-22 19:21:09 +08:00
KevinHuSh
6c6b144de2
refine manual parser (#140) 2024-03-21 18:17:32 +08:00
KevinHuSh
6999598101
refine for English corpus (#135) 2024-03-20 16:56:16 +08:00
KevinHuSh
9a843667b3
fix github account login issue (#132) 2024-03-19 15:31:47 +08:00
KevinHuSh
9da671b951
refine manual parser (#131) 2024-03-19 12:26:04 +08:00
KevinHuSh
675a9f8d9a
add dockerfile for cuda environment. Refine table search strategy (#123) 2024-03-14 19:45:29 +08:00
KevinHuSh
8f86ab9f7f
refine pdf parser, add time zone to userinfo (#112) 2024-03-08 11:24:24 +08:00
KevinHuSh
602038ac49
fix task canceling bug (#98) 2024-03-05 16:33:47 +08:00
KevinHuSh
8a57f2afd5
change callback strategy, add timezone to docker (#96) 2024-03-05 12:08:41 +08:00
KevinHuSh
7bfaf0df29
fix position extraction bug (#93)
* fix position extraction bug

* remove delimiter for naive parser
2024-03-04 17:08:35 +08:00
KevinHuSh
685b4d8a95
fix table desc bugs, add positions to chunks (#91) 2024-03-04 14:42:26 +08:00
KevinHuSh
8a726fb04b
solve task execution issues (#90) 2024-03-01 19:48:01 +08:00
KevinHuSh
3d4315c42a
resolve the issue of naive parser (#87) 2024-02-29 18:53:02 +08:00
KevinHuSh
0429107e80
fix user login issue (#85) 2024-02-29 14:03:07 +08:00
KevinHuSh
4568a4b2cb
refine admin initialization (#75) 2024-02-27 14:57:34 +08:00
KevinHuSh
d32322c081
rename vision, add layout and tsr recognizer (#70)
* rename vision, add layout and tsr recognizer

* trivial fixing
2024-02-22 19:11:37 +08:00
KevinHuSh
cacd36c5e1
use onnx models, new deepdoc (#68) 2024-02-21 16:32:38 +08:00