Feat: Change “Document parser” to "PDF parser" #6072 (#6073)

### What problem does this PR solve?

Feat: Change “Document parser” to "PDF parser" #6072

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 07:31:58 +08:00 · 2025-03-14 12:03:35 +08:00 · 2025-03-14 12:03:35 +08:00 · c85b468b8d
commit c85b468b8d
parent 7463241896
3 changed files with 12 additions and 12 deletions
--- a/web/src/locales/en.ts
+++ b/web/src/locales/en.ts
@@ -138,7 +138,7 @@ export default {
      fromMessage: 'Missing start page number',
      toPlaceholder: 'to',
      toMessage: 'Missing end page number (excluded)',
-      layoutRecognize: 'Document parser',
+      layoutRecognize: 'PDF parser',
      layoutRecognizeTip:
        'Use a visual model for PDF layout analysis to effectively locate document titles, text blocks, images, and tables. If the naive option is chosen, only the plain text in the PDF will be retrieved. Please note that this option currently works ONLY for PDF documents.',
      taskPageSize: 'Task page size',
@@ -161,7 +161,7 @@ export default {
      rerankTip: `If left empty, RAGFlow will use a combination of weighted keyword similarity and weighted vector cosine similarity; if a rerank model is selected, a weighted reranking score will replace the weighted vector cosine similarity. Please be aware that using a rerank model will significantly increase the system's response time.`,
      topK: 'Top-K',
      topKTip: `K chunks will be sent into the rerank model.`,
-      delimiter: `Delimiters`,
+      delimiter: `Delimiter for text`,
      delimiterTip:
        'A delimiter or separator can consist of one or multiple special characters. If it is multiple characters, ensure they are enclosed in backticks( ``). For example, if you configure your delimiters like this: \n`##`;, then your texts will be separated at line breaks, double hash symbols (##), or semicolons.',
      html4excel: 'Excel to HTML',
@@ -209,8 +209,8 @@ export default {
      languagePlaceholder: 'Please input your language!',
      permissions: 'Permissions',
      embeddingModel: 'Embedding model',
-      chunkTokenNumber: 'Chunk token number',
-      chunkTokenNumberMessage: 'Chunk token number is required',
+      chunkTokenNumber: 'Chunk token number for text',
+      chunkTokenNumberMessage: 'Chunk token number for text is required',
      embeddingModelTip:
        'The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all existing chunks in the knowledge base.',
      permissionsTip:
@@ -250,7 +250,7 @@ export default {
      <p>This method chunks files using a 'naive' method: </p>
      <p>
      <li>Use vision detection model to split the texts into smaller segments.</li>
-      <li>Then, combine adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</li></p>`,
+      <li>Then, combine adjacent segments until the token count exceeds the threshold specified by 'Chunk token number for text', at which point a chunk is created.</li></p>`,
      paper: `<p>Only <b>PDF</b> file is supported.</p><p>
      Papers will be split by section, such as <i>abstract, 1.1, 1.2</i>. </p><p>
      This approach enables the LLM to summarize the paper more effectively and to provide more comprehensive, understandable responses. 
@@ -312,7 +312,7 @@ export default {
    </p>`,
      knowledgeGraph: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML</b>

-<p>This approach chunks files using the 'naive'/'General' method. It splits a document into segments and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</p>
+<p>This approach chunks files using the 'naive'/'General' method. It splits a document into segments and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number for text', at which point a chunk is created.</p>
 <p>The chunks are then fed to the LLM to extract entities and relationships for a knowledge graph and a mind map.</p>
 <p>Ensure that you set the <b>Entity types</b>.</p>`,
      tag: `<p>Knowledge base using 'Tag' as a chunking method is supposed to be used by other knowledge bases to add tags to their chunks, queries to which will also be with tags too.</p>
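The 'naive' and `knowledgeGraph` tooltips in `en.ts` describe combining adjacent segments until the running token count exceeds the 'Chunk token number for text' threshold, at which point a chunk is created. A minimal TypeScript sketch of that merge step, assuming the segments have already been split; `naiveMerge` and the whitespace-based `countTokens` are hypothetical stand-ins, not RAGFlow's implementation or tokenizer.

```typescript
// Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Combine adjacent segments into chunks. Segments are appended to the current
// chunk; as soon as its token count exceeds `chunkTokenNumber`, the chunk is
// closed and the next segment starts a new one.
function naiveMerge(segments: string[], chunkTokenNumber: number): string[] {
  const chunks: string[] = [];
  let parts: string[] = [];
  let tokens = 0;

  for (const segment of segments) {
    const n = countTokens(segment);
    if (n === 0) continue; // skip empty or whitespace-only segments
    parts.push(segment.trim());
    tokens += n;
    if (tokens > chunkTokenNumber) {
      chunks.push(parts.join(" "));
      parts = [];
      tokens = 0;
    }
  }
  // Flush the remainder, which stayed at or below the threshold.
  if (parts.length > 0) {
    chunks.push(parts.join(" "));
  }
  return chunks;
}
```

With a threshold of 4, `naiveMerge(["a b", "c d e", "f", "g h"], 4)` yields `["a b c d e", "f g h"]`: the second segment pushes the first chunk past 4 tokens, so that chunk is closed, and the leftover segments are flushed as a final, smaller chunk.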
--- a/web/src/locales/zh-traditional.ts
+++ b/web/src/locales/zh-traditional.ts
@@ -137,7 +137,7 @@ export default {
      fromMessage: '缺少起始頁碼',
      toPlaceholder: '到',
      toMessage: '缺少結束頁碼（不包含）',
-      layoutRecognize: '文件解析器',
+      layoutRecognize: 'PDF解析器',
      layoutRecognizeTip:
        '使用視覺模型進行 PDF 布局分析，以更好地識別文檔結構，找到標題、文字塊、圖像和表格的位置。若選擇 Naive 選項，則只能取得 PDF 的純文字。請注意此功能僅適用於 PDF 文檔，對其他文檔不生效。',
      taskPageSize: '任務頁面大小',
@@ -159,7 +159,7 @@ export default {
      rerankTip: `如果是空的。它使用查詢和塊的嵌入來構成矢量餘弦相似性。否則，它使用rerank評分代替矢量餘弦相似性。`,
      topK: 'Top-K',
      topKTip: `K塊將被送入Rerank型號。`,
-      delimiter: `分段標識符`,
+      delimiter: `文字分段標識符`,
      delimiterTip:
        '支援多字元作為分隔符，多字元分隔符用`包裹。如配置成這樣：\n`##`;那麼就會用換行，兩個#以及分號先對文字進行分割，然後按照「 token number」大小進行拼裝。',
      html4excel: '表格轉HTML',
@@ -206,7 +206,7 @@ export default {
      languagePlaceholder: '請輸入語言',
      permissions: '權限',
      embeddingModel: '嵌入模型',
-      chunkTokenNumber: '塊Token數',
+      chunkTokenNumber: '文字的區塊標記編號',
      chunkTokenNumberMessage: '塊Token數是必填項',
      embeddingModelTip:
        '用於嵌入塊的嵌入模型。一旦知識庫有了塊，它就無法更改。如果你想改變它，你需要刪除所有的塊。',
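The `delimiterTip` strings describe how the delimiter setting is read: a multi-character delimiter is wrapped in backticks, so the example setting (a line break, `##` in backticks, then a semicolon) splits text at line breaks, double hash symbols, or semicolons. A minimal TypeScript sketch of that parsing, assuming every character outside backticks is its own single-character delimiter; `parseDelimiters`, `escapeRegExp`, and `splitByDelimiters` are hypothetical names, not RAGFlow functions.

```typescript
// Parse a delimiter setting such as "\n`##`;" into individual delimiters.
// Assumed semantics (from the tooltip): text inside a pair of backticks is one
// multi-character delimiter; every other character is a single-character one.
function parseDelimiters(setting: string): string[] {
  const delimiters: string[] = [];
  let i = 0;
  while (i < setting.length) {
    const ch = setting.charAt(i);
    if (ch === "`") {
      const end = setting.indexOf("`", i + 1);
      if (end !== -1) {
        const multi = setting.slice(i + 1, end);
        if (multi !== "") delimiters.push(multi);
        i = end + 1;
        continue;
      }
      // No closing backtick: treat this backtick as a literal delimiter.
    }
    delimiters.push(ch);
    i += 1;
  }
  return delimiters;
}

// Escape regex metacharacters so each delimiter is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split text at any of the configured delimiters, dropping empty pieces.
function splitByDelimiters(text: string, setting: string): string[] {
  const delimiters = parseDelimiters(setting);
  if (delimiters.length === 0) return [text];
  // Longer delimiters first, so "##" is matched before a lone "#".
  const pattern = delimiters
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  return text.split(new RegExp(pattern)).filter((piece) => piece.trim() !== "");
}
```

For example, `splitByDelimiters("intro\nA##B;C", "\n`##`;")` returns `["intro", "A", "B", "C"]`. Per the Chinese tooltips, the pieces produced by this split are then reassembled into chunks by token count.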
--- a/web/src/locales/zh.ts
+++ b/web/src/locales/zh.ts
@@ -137,7 +137,7 @@ export default {
      fromMessage: '缺少起始页码',
      toPlaceholder: '到',
      toMessage: '缺少结束页码（不包含）',
-      layoutRecognize: '文档解析器',
+      layoutRecognize: 'PDF解析器',
      layoutRecognizeTip:
        '使用视觉模型进行 PDF 布局分析，以更好地识别文档结构，找到标题、文本块、图像和表格的位置。 如果选择 Naive 选项，则只能获取 PDF 的纯文本。请注意该功能只适用于 PDF 文档，对其他文档不生效。',
      taskPageSize: '任务页面大小',
@@ -159,7 +159,7 @@ export default {
      rerankTip: `非必选项：若不选择 rerank 模型，系统将默认采用关键词相似度与向量余弦相似度相结合的混合查询方式；如果设置了 rerank 模型，则混合查询中的向量相似度部分将被 rerank 打分替代。请注意：采用 rerank 模型会非常耗时。`,
      topK: 'Top-K',
      topKTip: `K块将被送入Rerank型号。`,
-      delimiter: `分段标识符`,
+      delimiter: `文本分段标识符`,
      delimiterTip:
        '支持多字符作为分隔符，多字符分隔符用`包裹。如配置成这样：\n`##`;那么就会用换行，两个#以及分号先对文本进行分割，然后按照“ token number”大小进行拼装。',
      html4excel: '表格转HTML',
@@ -206,7 +206,7 @@ export default {
      languagePlaceholder: '请输入语言',
      permissions: '权限',
      embeddingModel: '嵌入模型',
-      chunkTokenNumber: '块Token数',
+      chunkTokenNumber: '文本的块标记编号',
      chunkTokenNumberMessage: '块Token数是必填项',
      embeddingModelTip:
        '用于嵌入块的嵌入模型。 一旦知识库有了块，它就无法更改。 如果你想改变它，你需要删除所有的块。',
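The `rerankTip` and `topKTip` strings describe two scoring modes: without a rerank model, a weighted keyword similarity is combined with a weighted vector cosine similarity; with one, the reranking score replaces the vector term, and K chunks are sent to the rerank model. A minimal TypeScript sketch of that logic, assuming both similarities are normalized to [0, 1], a single `keywordWeight` whose complement goes to the vector (or rerank) term, and a hypothetical `Reranker` callback; `rankChunks` is illustrative, not RAGFlow's retrieval code.

```typescript
interface Candidate {
  id: string;
  keywordSim: number; // keyword similarity, assumed normalized to [0, 1]
  vectorSim: number; // vector cosine similarity, assumed normalized to [0, 1]
}

interface Scored {
  id: string;
  score: number;
}

// Hypothetical rerank model interface: one score per candidate, same order.
type Reranker = (query: string, candidates: Candidate[]) => number[];

function rankChunks(
  query: string,
  candidates: Candidate[],
  keywordWeight: number, // the vector (or rerank) term gets 1 - keywordWeight
  topK: number,
  rerank?: Reranker,
): Scored[] {
  const vectorWeight = 1 - keywordWeight;
  // Without a rerank model: weighted keyword + weighted vector cosine similarity.
  const hybrid = candidates
    .map((c) => ({ c, score: keywordWeight * c.keywordSim + vectorWeight * c.vectorSim }))
    .sort((a, b) => b.score - a.score);

  if (!rerank) {
    return hybrid.map(({ c, score }) => ({ id: c.id, score }));
  }

  // With a rerank model: send the top-K candidates to it and let its score
  // replace the vector cosine similarity term.
  const top = hybrid.slice(0, topK).map(({ c }) => c);
  const rerankScores = rerank(query, top);
  return top
    .map((c, i) => ({ id: c.id, score: keywordWeight * c.keywordSim + vectorWeight * rerankScores[i] }))
    .sort((a, b) => b.score - a.score);
}
```

Limiting the rerank model to the top-K hybrid candidates is one way to bound the extra response time the tooltip warns about, since the model scores K chunks rather than the whole candidate set.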