diff --git a/docs/guides/dataset/configure_knowledge_base.md b/docs/guides/dataset/configure_knowledge_base.md
index c18976c7d..5c5c8487e 100644
--- a/docs/guides/dataset/configure_knowledge_base.md
+++ b/docs/guides/dataset/configure_knowledge_base.md
@@ -41,7 +41,7 @@ RAGFlow offers multiple chunking template to facilitate chunking files of differ
 | **Template** | Description | File format |
 |--------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
-| General | Files are consecutively chunked based on a preset chunk token number. | DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML |
+| General | Files are consecutively chunked based on a preset chunk token number. | MD, MDX, DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML |
 | Q&A | | XLSX, XLS (Excel 97-2003), CSV/TXT |
 | Resume | Enterprise edition only. You can also try it out on demo.ragflow.io. | DOCX, PDF, TXT |
 | Manual | | PDF |
diff --git a/docs/guides/dataset/enable_excel2html.md b/docs/guides/dataset/enable_excel2html.md
index ae01f1998..531a673cc 100644
--- a/docs/guides/dataset/enable_excel2html.md
+++ b/docs/guides/dataset/enable_excel2html.md
@@ -9,7 +9,7 @@ Convert complex Excel spreadsheets into HTML tables.
 ---
-When using the General chunking method, you can enable the **Excel to HTML** toggle to convert spreadsheet files into HTML tables. If it is disabled, spreadsheet tables will be represented as key-value pairs. For complex tables that cannot be simply represented this way, you must enable this feature.
+When using the **General** chunking method, you can enable the **Excel to HTML** toggle to convert spreadsheet files into HTML tables. If it is disabled, spreadsheet tables will be represented as key-value pairs. For complex tables that cannot be simply represented this way, you must enable this feature.
 :::caution WARNING
 The feature is disabled by default. If your knowledge base contains spreadsheets with complex tables and you do not enable this feature, RAGFlow will not throw an error but your tables are likely to be garbled.
@@ -22,7 +22,7 @@ Works with complex tables that cannot be represented as key-value pairs. Example
 ## Considerations
 - The Excel2HTML feature applies only to spreadsheet files (XLSX or XLS (Excel 97-2003)).
-- This feature is associated with the General chunking method. In other words, it is available *only when* you select the General chunking method.
+- This feature is associated with the **General** chunking method. In other words, it is available *only when* you select the **General** chunking method.
 - When this feature is enabled, spreadsheet tables with more than 12 rows will be split into chunks of 12 rows each.
 ## Procedure
diff --git a/docs/guides/dataset/enable_raptor.md b/docs/guides/dataset/enable_raptor.md
index 701b60113..4beab3dea 100644
--- a/docs/guides/dataset/enable_raptor.md
+++ b/docs/guides/dataset/enable_raptor.md
@@ -47,7 +47,7 @@ The RAPTOR feature is disabled by default. To enable it, manually switch on the
 ### Prompt
-The following prompt will be applied recursively for cluster summarization, with `{cluster_content}` serving as an internal parameter. We recommend that you keep it as-is for now. The design will be updated in due course.
+The following prompt will be applied *recursively* for cluster summarization, with `{cluster_content}` serving as an internal parameter. We recommend that you keep it as-is for now. The design will be updated in due course.
 ```
 Please summarize the following paragraphs...
 Paragraphs as following:
diff --git a/docs/guides/dataset/use_tag_sets.md b/docs/guides/dataset/use_tag_sets.md
index 4d713dcc8..a4675b93f 100644
--- a/docs/guides/dataset/use_tag_sets.md
+++ b/docs/guides/dataset/use_tag_sets.md
@@ -5,7 +5,7 @@ slug: /use_tag_sets
 # Use tag set
-Use a tag set to tag chunks in your datasets.
+Use a tag set to auto-tag chunks in your datasets.
 ---
diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx
index 75fc5e6bc..42f1cd70d 100644
--- a/docs/quickstart.mdx
+++ b/docs/quickstart.mdx
@@ -287,7 +287,7 @@ To add and configure an LLM:
 ## Create your first knowledge base
-You are allowed to upload files to a knowledge base in RAGFlow and parse them into datasets. A knowledge base is virtually a collection of datasets. Question answering in RAGFlow can be based on a particular knowledge base or multiple knowledge bases. File formats that RAGFlow supports include documents (PDF, DOC, DOCX, TXT, MD), tables (CSV, XLSX, XLS), pictures (JPEG, JPG, PNG, TIF, GIF), and slides (PPT, PPTX).
+You are allowed to upload files to a knowledge base in RAGFlow and parse them into datasets. A knowledge base is virtually a collection of datasets. Question answering in RAGFlow can be based on a particular knowledge base or multiple knowledge bases. File formats that RAGFlow supports include documents (PDF, DOC, DOCX, TXT, MD, MDX), tables (CSV, XLSX, XLS), pictures (JPEG, JPG, PNG, TIF, GIF), and slides (PPT, PPTX).
 To create your first knowledge base:
diff --git a/web/src/locales/de.ts b/web/src/locales/de.ts
index 738017b4a..e2c9007e6 100644
--- a/web/src/locales/de.ts
+++ b/web/src/locales/de.ts
@@ -255,7 +255,7 @@ export default {
 manual: `
Nur PDF wird unterstützt.
Wir gehen davon aus, dass das Handbuch eine hierarchische Abschnittsstruktur aufweist und verwenden die Titel der untersten Abschnitte als Grundeinheit für die Aufteilung der Dokumente. Daher werden Abbildungen und Tabellen im selben Abschnitt nicht getrennt, was zu größeren Chunk-Größen führen kann.
 `,
- naive: `Unterstützte Dateiformate sind DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
+ naive: `Unterstützte Dateiformate sind MD, MDX, DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
Diese Methode teilt Dateien mit einer 'naiven' Methode auf:
Only PDF is supported.
We assume that the manual has a hierarchical section structure, using the lowest section titles as basic unit for chunking documents. Therefore, figures and tables in the same section will not be separated, which may result in larger chunk sizes.
 `,
- naive: `Supported file formats are DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
+ naive: `Supported file formats are MD, MDX, DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
This method chunks files using a 'naive' method:
Format file yang didukung adalah DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
+ naive: `Format file yang didukung adalah MD, MDX, DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
Metode ini menerapkan cara naif untuk memotong file:
対応するのはPDFのみです。
マニュアルは階層的なセクション構造を持つと仮定され、最下位のセクションタイトルを基にチャンク分割を行います。そのため、同じセクション内の図表は分割されませんが、大きなチャンクサイズになる可能性があります。
 `,
- naive: `対応ファイル形式はDOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTMLです。
+ naive: `対応ファイル形式はMD, MDX, DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTMLです。
この方法では、'ナイーブ'な方法でファイルを分割します:
Apenas PDF é suportado.
Assumimos que o manual tem uma estrutura hierárquica de seções, usando os títulos das seções inferiores como unidade básica para fragmentação. Assim, figuras e tabelas na mesma seção não serão separadas, o que pode resultar em fragmentos maiores.
 `,
- naive: `Os formatos de arquivo suportados são DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
+ naive: `Os formatos de arquivo suportados são MD, MDX, DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
Este método fragmenta arquivos de maneira 'simples':
Các định dạng tệp được hỗ trợ là DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
`, +Các định dạng tệp được hỗ trợ là MD, MDX, DOCX, XLSX, XLS (Excel 97-2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML.
`, paper: `Chỉ hỗ trợ tệp PDF.
Bài báo sẽ được chia theo các phần, chẳng hạn như tóm tắt, 1.1, 1.2.
Cách tiếp cận này cho phép LLM tóm tắt bài báo hiệu quả hơn và cung cấp các phản hồi toàn diện, dễ hiểu hơn.
diff --git a/web/src/locales/zh-traditional.ts b/web/src/locales/zh-traditional.ts
index a44e8597e..d11c40d14 100644
--- a/web/src/locales/zh-traditional.ts
+++ b/web/src/locales/zh-traditional.ts
@@ -246,7 +246,7 @@ export default {
 我們假設手冊具有分層部分結構。我們使用最低的部分標題作為對文檔進行切片的樞軸。
 因此,同一部分中的圖和表不會被分割,並且塊大小可能會很大。
 `,
- naive: `支持的文件格式為DOCX、XLSX、XLS (Excel 97-2003)、PPT、PDF、TXT、JPEG、JPG、PNG、TIF、GIF、CSV、JSON、EML、HTML。
+ naive: `支持的文件格式為MD、MDX、DOCX、XLSX、XLS (Excel 97-2003)、PPT、PDF、TXT、JPEG、JPG、PNG、TIF、GIF、CSV、JSON、EML、HTML。
此方法將簡單的方法應用於塊文件:
支持的文件格式为DOCX、XLSX、XLS (Excel 97-2003)、PPT、PDF、TXT、JPEG、JPG、PNG、TIF、GIF、CSV、JSON、EML、HTML。
+ naive: `支持的文件格式为MD、MDX、DOCX、XLSX、XLS (Excel 97-2003)、PPT、PDF、TXT、JPEG、JPG、PNG、TIF、GIF、CSV、JSON、EML、HTML。
此方法将简单的方法应用于块文件: