mirror of
https://git.mirrors.martin98.com/https://github.com/infiniflow/ragflow.git
synced 2025-06-04 11:24:00 +08:00
fix chunk method "Table" losing content when the Excel file has multiple sheets (#4123)

### What problem does this PR solve?

Discussed in https://github.com/infiniflow/ragflow/pull/4102

- In excel_parser.py, `total` is meant to be the total number of rows in the Excel file, but the function returned during the first iteration over the sheets, which led to a wrong `to_page`.
- In table.py, when the Excel file has multiple sheets, it is divided into multiple parts of 3000 rows each. `data` may be empty for a sheet, because its rows were already recorded in an earlier iteration.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
parent d8fca43017
commit 101b8ff813
@@ -90,7 +90,7 @@ class RAGFlowExcelParser:
             for sheetname in wb.sheetnames:
                 ws = wb[sheetname]
                 total += len(list(ws.rows))
-                return total
+            return total

         if fnm.split(".")[-1].lower() in ["csv", "txt"]:
             encoding = find_codec(binary)
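The effect of moving `return total` out of the loop can be sketched without openpyxl. In this minimal illustration the `sheets` dictionary is a hypothetical stand-in for a workbook (sheet name mapped to a list of rows), and both function names are made up for the comparison; neither is part of RAGFlow.

```python
# Hypothetical stand-in for a workbook: sheet name -> list of rows.
sheets = {
    "Sheet1": [["id", "name"], [1, "a"], [2, "b"]],
    "Sheet2": [["id", "name"], [3, "c"]],
}


def row_number_before(sheets):
    """Mimics the old behavior: returns inside the loop."""
    total = 0
    for rows in sheets.values():
        total += len(rows)
        return total  # exits after the first sheet only
    return total  # reached only when there are no sheets


def row_number_after(sheets):
    """Mimics the fixed behavior: returns after all sheets are counted."""
    total = 0
    for rows in sheets.values():
        total += len(rows)
    return total


print(row_number_before(sheets))  # counts Sheet1 only
print(row_number_after(sheets))   # counts every sheet
```

With the early return, only the first sheet's rows are counted, so any page range derived from that total stops short of the later sheets.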
@@ -66,6 +66,8 @@ class Excel(ExcelParser):
                     continue
                 data.append(row)
                 done += 1
+            if np.array(data).size == 0:
+                continue
             res.append(pd.DataFrame(np.array(data), columns=headers))

         callback(0.3, ("Extract records: {}~{}".format(from_page + 1, min(to_page, from_page + rn)) + (
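Why `data` can be empty is easier to see with a small model of the row window. The sketch below is an assumption-laden simplification, not the code in table.py: `collect_window` is a hypothetical helper, `sheets` stands in for the workbook, plain lists replace the DataFrame, and `if not data` plays the role of `np.array(data).size == 0`. It assumes, as the diff suggests, that `rn` keeps counting data rows across sheets while `from_page` and `to_page` bound the current part.

```python
# Hypothetical stand-in for a workbook: sheet name -> list of rows,
# where the first row of each sheet holds the headers.
sheets = {
    "Sheet1": [["id", "name"], [1, "a"], [2, "b"]],
    "Sheet2": [["id", "name"], [3, "c"]],
}


def collect_window(sheets, from_page, to_page):
    """Collect data rows whose global index falls in [from_page, to_page)."""
    res = []
    rn = 0  # running count of data rows across all sheets
    for rows in sheets.values():
        if not rows:
            continue
        headers = rows[0]
        data = []
        for r in rows[1:]:
            rn += 1
            if rn - 1 < from_page:
                continue  # row belongs to an earlier part
            if rn - 1 >= to_page:
                break  # row belongs to a later part
            data.append(r)
        if not data:
            continue  # sheet contributes nothing to this part, so skip it
        res.append((headers, data))
    return res


# Second part of the file: Sheet1 lies entirely before the window.
print(collect_window(sheets, from_page=2, to_page=5))
```

Here Sheet1 yields no rows for the second part, so the guard skips it instead of building a table from an empty list.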