Add docs for tag sets (#5890)
### What problem does this PR solve?

#5716, #5529

### Type of change

- [x] Documentation Update
parent 715e2b48ca
commit bd3fa317e7
@ -9,7 +9,7 @@ Initiate an AI-powered chat with a configured chat assistant.
---
Knowledge base, hallucination-free chat, and file management are the three pillars of RAGFlow. Chats in RAGFlow are based on a particular knowledge base or multiple knowledge bases. Once you have created your knowledge base and finished file parsing, you can go ahead and start an AI conversation.
Knowledge base, hallucination-free chat, and file management are the three pillars of RAGFlow. Chats in RAGFlow are based on a particular knowledge base or multiple knowledge bases. Once you have created your knowledge base, finished file parsing, and [run a retrieval test](../dataset/run_retrieval_test.md), you can go ahead and start an AI conversation.
## Start an AI chat
@ -124,6 +124,8 @@ RAGFlow uses multiple recall of both full-text search and vector search in its c
- Similarity threshold: Chunks with similarities below the threshold will be filtered. By default, it is set to 0.2.
- Vector similarity weight: The percentage by which vector similarity contributes to the overall score. By default, it is set to 0.3.
See [Run retrieval test](./run_retrieval_test.md) for details.
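To make the interplay between these two settings concrete, here is a minimal sketch that blends the two scores with a plain weighted sum and then applies the threshold. It assumes both similarities are normalized to the 0-1 range and is an illustration only, not RAGFlow's actual scoring code.

```python
# Illustrative sketch only; not RAGFlow's internal implementation.
# Assumes keyword and vector similarities are both normalized to [0, 1].

def hybrid_score(keyword_sim: float, vector_sim: float, vector_weight: float = 0.3) -> float:
    """Blend keyword similarity and vector cosine similarity using the configured weight."""
    return (1 - vector_weight) * keyword_sim + vector_weight * vector_sim

def keep_chunk(score: float, similarity_threshold: float = 0.2) -> bool:
    """Chunks scoring below the similarity threshold are filtered out."""
    return score >= similarity_threshold

# A chunk with keyword similarity 0.1 and vector similarity 0.6 scores
# 0.7 * 0.1 + 0.3 * 0.6 = 0.25, which clears the default 0.2 threshold.
print(keep_chunk(hybrid_score(0.1, 0.6)))  # True
```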

## Search for knowledge base
@ -13,7 +13,7 @@ To enhance multi-hop question-answering, RAGFlow adds a knowledge graph construc
As of v0.17.0, RAGFlow supports constructing a knowledge graph on a knowledge base, allowing you to construct a *unified* graph across multiple files within your knowledge base. When a newly uploaded file starts parsing, the generated graph will automatically update.
From v0.16.0 onward, RAGFlow supports constructing a knowledge graph on a knowledge base, allowing you to construct a *unified* graph across multiple files within your knowledge base. When a newly uploaded file starts parsing, the generated graph will automatically update.
:::danger WARNING
Constructing a knowledge graph requires significant memory, computational resources, and tokens.
@ -9,7 +9,7 @@ Conduct a retrieval test on your knowledge base to check whether the intended ch
---
After your files are uploaded and parsed, it is recommended that you run a retrieval test before proceeding with the chat assistant configuration. Just like fine-tuning a precision instrument, RAGFlow requires careful tuning to deliver optimal question answering performance. Your knowledge base settings, chat assistant configurations, and the specified large and small models can all significantly impact the final results. Running a retrieval test verifies whether the intended chunks can be recovered, allowing you to quickly identify areas for improvement or pinpoint any issue that needs addressing. For instance, when debugging your question answering system, if you know that the correct chunks can be retrieved, you can focus your efforts elsewhere.
After your files are uploaded and parsed, it is recommended that you run a retrieval test before proceeding with the chat assistant configuration. Running a retrieval test is *not* a superfluous step! Just like fine-tuning a precision instrument, RAGFlow requires careful tuning to deliver optimal question answering performance. Your knowledge base settings, chat assistant configurations, and the specified large and small models can all significantly impact the final results. Running a retrieval test verifies whether the intended chunks can be recovered, allowing you to quickly identify areas for improvement or pinpoint any issue that needs addressing. For instance, when debugging your question answering system, knowing that the correct chunks can be retrieved lets you focus your efforts elsewhere; in issue [#5627](https://github.com/infiniflow/ragflow/issues/5627), for example, the problem turned out to be caused by the LLM's limitations.
During a retrieval test, chunks created from your specified chunk method are retrieved using a hybrid search. This search combines weighted keyword similarity with either weighted vector cosine similarity or a weighted reranking score, depending on your settings:
103 docs/guides/dataset/use_tag_sets.md Normal file
@ -0,0 +1,103 @@
---
sidebar_position: 6
slug: /use_tag_sets
---

# Use tag set

Use a tag set to tag chunks in your datasets.

---
Retrieval accuracy is the touchstone of a production-ready RAG framework. In addition to retrieval-enhancing approaches like auto-keyword, auto-question, and knowledge graph, RAGFlow introduces an auto-tagging feature to address semantic gaps. The auto-tagging feature automatically maps tags from your user-defined tag sets to relevant chunks in your knowledge base, based on each chunk's similarity to the tag entries. This automation mechanism lets you apply an additional "layer" of domain-specific knowledge to existing datasets, which is particularly useful when dealing with a large number of chunks.
To use this feature, ensure you have at least one properly configured tag set, specify the tag set(s) on the **Configuration** page of your knowledge base (dataset), and then re-parse your documents to initiate the auto-tag process. During this process, each chunk in your dataset is compared with every entry in the specified tag set(s), and tags are automatically applied based on similarity.
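Conceptually, the comparison works like the sketch below: each chunk is matched against every tag-set entry, and the tags of sufficiently similar entries are attached to the chunk. The embedding inputs, the similarity cutoff, and the function names are assumptions made for illustration; this is not RAGFlow's internal API.

```python
# Conceptual sketch of similarity-based auto-tagging; not RAGFlow's internal code.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def auto_tag(chunk_vectors: dict[str, list[float]],
             tag_entries: list[tuple[list[float], list[str]]],
             cutoff: float = 0.5) -> dict[str, set[str]]:
    """Attach the tags of every tag-set entry whose description is similar enough to a chunk.

    chunk_vectors: chunk id -> embedding of the chunk text.
    tag_entries:   (embedding of a Description cell, tags from its Tag cell) pairs.
    cutoff:        hypothetical similarity cutoff, chosen here purely for illustration.
    """
    tagged: dict[str, set[str]] = {}
    for chunk_id, chunk_vec in chunk_vectors.items():
        tagged[chunk_id] = set()
        for entry_vec, tags in tag_entries:
            if cosine(chunk_vec, entry_vec) >= cutoff:
                tagged[chunk_id].update(tags)
    return tagged
```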
## Scenarios
Auto-tagging applies in situations where chunks are so similar to each other that the intended chunks cannot be distinguished from the rest. For example, when you have a few chunks about the iPhone itself and a majority about iPhone cases or iPhone accessories, it becomes difficult to retrieve the iPhone-specific chunks without additional information.
## Create tag set
You can think of a tag set as a closed set: the tags attached to the chunks in your dataset (knowledge base) come *exclusively* from the specified tag set. You use a tag set to "inform" RAGFlow which chunks to tag and which tags to apply.
### Prepare a tag table file
A tag set can comprise one or multiple table files in XLSX, CSV, or TXT formats. Each table file in the tag set contains two columns, **Description** and **Tag**:
- The first column provides descriptions of the tags listed in the second column. These descriptions can be example chunks or example queries. Similarity will be calculated between each entry in this column and every chunk in your dataset.
- The **Tag** column includes tags to pair with the description entries. Multiple tags should be separated by a comma (,).
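For example, a minimal CSV tag table could look like the following. The rows are made-up samples echoing the iPhone scenario above; only the two-column layout and the comma-separated tags matter, so adapt the header row to your own file.

```csv
Description,Tag
"The iPhone 15 Pro features a titanium frame and the A17 Pro chip.","iPhone,smartphone"
"Which iPhone models support wireless charging?","iPhone,charging"
"A slim protective case designed for the iPhone 15 series.","iPhone case,accessory"
```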
:::tip NOTE
As a rule of thumb, consider including the following entries in your tag table:
- Descriptions of intended chunks, along with their corresponding tags.
- User queries that fail to retrieve the correct responses using other methods, ensuring their tags match the intended chunks in your dataset.
:::
### Create a tag set
1. Click **+ Create knowledge base** to create a knowledge base.
2. Navigate to the **Configuration** page of the created knowledge base and choose **Tag** as the default chunk method.
3. Navigate to the **Dataset** page, then upload and parse your table file in XLSX, CSV, or TXT format.
_A tag cloud appears under the **Tag view** section, indicating the tag set is created:_

4. Click the **Table** tab to view the tag frequency table:

:::danger IMPORTANT
A tag set is *not* involved in document indexing or retrieval. Do not specify a tag set when configuring your chat assistant or agent.
:::
## Tag chunks
Once a tag set is created, you can apply it to your dataset:
1. Navigate to the **Configuration** page of your knowledge base (dataset).
2. Select the tag set from the **Tag sets** dropdown and click **Save** to confirm.
:::tip NOTE
If the tag set is missing from the dropdown, check that it has been created or configured correctly.
:::
3. Re-parse your documents to start the auto-tagging process.
_In an AI chat scenario using auto-tagged datasets, each query will be tagged using the corresponding tag set(s) and chunks with these tags will have a higher chance to be retrieved._
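To see why matching tags matter, here is a toy sketch of the idea: a chunk whose tags overlap the query's tags receives a small bonus on top of its base similarity score. The `boost_per_tag` value and the additive form are assumptions for illustration, not RAGFlow's actual ranking formula.

```python
# Toy illustration of tag-based boosting during retrieval; not RAGFlow's ranking code.

def boosted_score(base_score: float, query_tags: set[str],
                  chunk_tags: set[str], boost_per_tag: float = 0.05) -> float:
    """Add a small bonus for every tag shared between the query and the chunk."""
    return base_score + boost_per_tag * len(query_tags & chunk_tags)

# Two chunks with the same base similarity: the one tagged "iPhone" outranks the other
# when the query itself has been tagged "iPhone".
print(round(boosted_score(0.42, {"iPhone"}, {"iPhone", "smartphone"}), 2))      # 0.47
print(round(boosted_score(0.42, {"iPhone"}, {"iPhone case", "accessory"}), 2))  # 0.42
```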
## Update tag set
Creating a tag set is *not* a one-off task. Oftentimes, you may find it necessary to update or delete existing tags or add new entries.
- You can update the existing tag set in the tag frequency table.
- To add new entries, you can add and parse new table files in XLSX, CSV, or TXT formats.
### Update tag set in tag frequency table
1. Navigate to the **Configuration** page of your tag set.
2. Click the **Table** tab under **Tag view** to view the tag frequency table, where you can update tag names or delete tags.
:::danger IMPORTANT
When a tag set is updated, you must re-parse the documents in your dataset so that their tags can be updated accordingly.
:::
### Add new table files
1. Navigate to the **Configuration** page of your tag set.
2. Navigate to the **Dataset** page, then upload and parse your new table file in XLSX, CSV, or TXT format.
:::danger IMPORTANT
If you add new table files to your tag set, whether to re-parse the documents in your datasets is at your discretion.
:::
## Frequently asked questions
### Can I reference more than one tag set?
Yes, you can. Usually one tag set suffices. When using multiple tag sets, ensure they are independent of each other; otherwise, consider merging your tag sets.
### What is the difference between a tag set and a standard knowledge base?
A standard knowledge base is a dataset. It will be searched by RAGFlow's document engine and the retrieved chunks will be fed to the LLM. In contrast, a tag set is used solely to attach tags to chunks within your dataset. It does not directly participate in the retrieval process, and you should not choose a tag set when selecting datasets for your chat assistant or agent.
### What is the difference between auto-tag and auto-keyword?
Both features enhance retrieval in RAGFlow. The auto-keyword feature relies on the LLM and consumes a significant number of tokens, whereas the auto-tag feature is based on vector similarity and predefined tag set(s). You can view the keywords applied by the auto-keyword feature as an open set, as they are generated by the LLM. In contrast, a tag set is a user-defined closed set, which requires you to upload tag set(s) in the specified formats before use.
@ -91,7 +91,7 @@ export default {
namePlaceholder: 'Please input name!',
doc: 'Docs',
datasetDescription:
'😉 Please wait for your file to finish parsing before starting an AI-powered chat.',
'😉 Please wait for your files to finish parsing before starting an AI-powered chat.',
addFile: 'Add file',
searchFiles: 'Search your files',
localFiles: 'Local files',
@ -223,8 +223,8 @@ export default {
english: 'English',
chinese: 'Chinese',
portugueseBr: 'Portuguese (Brazil)',
embeddingModelPlaceholder: 'Please select a embedding model',
chunkMethodPlaceholder: 'Please select a chunk method',
embeddingModelPlaceholder: 'Please select a embedding model.',
chunkMethodPlaceholder: 'Please select a chunk method.',
save: 'Save',
me: 'Only me',
team: 'Team',
@ -233,7 +233,7 @@ export default {
methodExamples: 'Examples',
methodExamplesDescription:
'The following screenshots are provided for clarity.',
dialogueExamplesTitle: 'Dialogue examples',
dialogueExamplesTitle: 'view',
methodEmpty:
'This will display a visual explanation of the knowledge base categories',
book: `<p>Supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.</p><p>
@ -430,7 +430,7 @@ This auto-tag feature enhances retrieval by adding another layer of domain-speci
knowledgeBasesMessage: 'Please select',
knowledgeBasesTip:
'Select the knowledge bases to associate with this chat assistant.',
system: 'System',
system: 'System prompt',
systemInitialValue: `You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence "The answer you are looking for is not found in the knowledge base!" Answers need to consider chat history.
Here is the knowledge base:
{knowledge}
@ -441,7 +441,7 @@ This auto-tag feature enhances retrieval by adding another layer of domain-speci
topN: 'Top N',
topNTip: `Not all chunks with similarity score above the 'similarity threshold' will be sent to the LLM. This selects 'Top N' chunks from the retrieved ones.`,
variable: 'Variable',
variableTip: `Variables can assist in developing more flexible strategies, particularly when you are using our chat assistant management APIs. These variables will be used by 'System' as part of the prompts for the LLM. The variable {knowledge} is a reserved special variable representing your selected knowledge base(s), and all variables should be enclosed in curly braces {}.`,
variableTip: `Variables can assist in developing more flexible strategies, particularly when you are using our chat assistant management APIs. These variables will be used by 'System prompt' as part of the prompts for the LLM. The variable {knowledge} is a reserved special variable representing your selected knowledge base(s), and all variables should be enclosed in curly braces {}.`,
add: 'Add',
key: 'Key',
optional: 'Optional',
@ -185,7 +185,7 @@ export default {
knowledgeBases: 'Bases de conocimiento',
knowledgeBasesMessage: 'Por favor selecciona',
knowledgeBasesTip: 'Selecciona las bases de conocimiento asociadas.',
system: 'Sistema',
system: 'prompt del sistema',
systemInitialValue: `Eres un asistente inteligente. Por favor resume el contenido de la base de conocimiento para responder la pregunta. Enumera los datos en la base de conocimiento y responde con detalle. Cuando todo el contenido de la base de conocimiento sea irrelevante para la pregunta, tu respuesta debe incluir la frase "¡La respuesta que buscas no se encuentra en la base de conocimiento!". Las respuestas necesitan considerar el historial de chat.
Aquí está la base de conocimiento:
{knowledge}
@ -197,9 +197,9 @@ export default {
topNTip: `No todos los fragmentos cuya puntuación de similitud esté por encima del "umbral de similitud" serán enviados a los LLMs. Los LLMs solo pueden ver estos "Top N" fragmentos.`,
variable: 'Variable',
variableTip: `Si usas APIs de diálogo, las variables pueden ayudarte a chatear con tus clientes usando diferentes estrategias.
Las variables se utilizan para completar la parte "Sistema" del prompt para darle una pista al LLM.
Las variables se utilizan para completar la parte "prompt del sistema" del prompt para darle una pista al LLM.
La "base de conocimiento" es una variable muy especial que se completará con los fragmentos recuperados.
Todas las variables en "Sistema" deben estar entre llaves.`,
Todas las variables en "prompt del sistema" deben estar entre llaves.`,
add: 'Agregar',
key: 'Clave',
optional: 'Opcional',
@ -355,7 +355,7 @@ export default {
knowledgeBases: 'Basis Pengetahuan',
knowledgeBasesMessage: 'Silakan pilih',
knowledgeBasesTip: 'Pilih basis pengetahuan yang terkait.',
system: 'Sistem',
system: 'Prompt Sistem',
systemInitialValue: `Anda adalah asisten cerdas. Silakan rangkum konten basis pengetahuan untuk menjawab pertanyaan. Silakan daftar data di basis pengetahuan dan jawab secara detail. Ketika semua konten basis pengetahuan tidak relevan dengan pertanyaan, jawaban Anda harus menyertakan kalimat "Jawaban yang Anda cari tidak ditemukan di basis pengetahuan!" Jawaban perlu mempertimbangkan riwayat obrolan.
Berikut adalah basis pengetahuan:
{knowledge}
@ -367,9 +367,9 @@ export default {
topNTip: `Tidak semua potongan yang skor kesamaannya di atas 'ambang kesamaan' akan diberikan ke LLM. LLM hanya dapat melihat potongan 'Top N' ini.`,
variable: 'Variabel',
variableTip: `Jika Anda menggunakan API dialog, variabel mungkin membantu Anda berbicara dengan klien Anda dengan strategi yang berbeda.
Variabel digunakan untuk mengisi bagian 'Sistem' dalam prompt untuk memberikan petunjuk kepada LLM.
Variabel digunakan untuk mengisi bagian 'Prompt Sistem' dalam prompt untuk memberikan petunjuk kepada LLM.
'knowledge' adalah variabel yang sangat khusus yang akan diisi dengan potongan yang diambil.
Semua variabel dalam 'Sistem' harus diberi kurung kurawal.`,
Semua variabel dalam 'Prompt Sistem' harus diberi kurung kurawal.`,
add: 'Tambah',
key: 'Kunci',
optional: 'Opsional',
@ -154,7 +154,7 @@ export default {
cancel: 'キャンセル',
rerankModel: 'リランキングモデル',
rerankPlaceholder: '選択してください',
rerankTip: `リランキングモデルを選択しない場合、RAGFlowはデフォルトの重み付きベクトルコサイン類似度を使用します。`,
rerankTip: `オプション:Rerankモデルを選択しない場合、システムはデフォルトでキーワードの類似度とベクトルのコサイン類似度を組み合わせたハイブリッド検索方式を採用します。Rerankモデルを設定した場合、ハイブリッド検索のベクトル類似度部分はrerankのスコアに置き換えられます。`,
topK: 'トップK',
topKTip: `Kチャンクがリランキングモデルに供給されます。`,
delimiter: `区切り文字`,
@ -353,7 +353,7 @@ export default {
knowledgeBases: 'ナレッジベース',
knowledgeBasesMessage: '選択してください',
knowledgeBasesTip: '関連付けるナレッジベースを選択してください。',
system: 'システム',
system: 'システムプロンプト',
systemInitialValue: `あなたはインテリジェントなアシスタントです。質問に答えるためにナレッジベースの内容を要約してください。ナレッジベースのデータをリストし、詳細に答えてください。すべてのナレッジベースの内容が質問に関連しない場合、回答には「ナレッジベースにはお探しの回答が見つかりません!」という文を含める必要があります。回答はチャット履歴を考慮する必要があります。
こちらがナレッジベースです:
{knowledge}
@ -364,9 +364,9 @@ export default {
topNTip: `類似度スコアがしきい値を超えるチャンクのうち、上位N件のみがLLMに供給されます。`,
variable: '変数',
variableTip: `ダイアログAPIを使用する場合、変数は異なる戦略でクライアントとチャットするのに役立ちます。
変数はプロンプトの'システム'部分を埋めるために使用され、LLMにヒントを与えます。
変数はプロンプトの'システムプロンプト'部分を埋めるために使用され、LLMにヒントを与えます。
'ナレッジ'は取得されたチャンクで埋められる非常に特別な変数です。
'システム'のすべての変数は中括弧で囲む必要があります。`,
'システムプロンプト'のすべての変数は中括弧で囲む必要があります。`,
add: '追加',
key: 'キー',
optional: 'オプション',
@ -342,15 +342,15 @@ export default {
tagSet: '標籤庫',
topnTags: 'Top-N 標籤',
tagSetTip: `
<p> 選擇「標籤」知識庫有助於標記每個區塊。 </p>
<p>對這些區塊的查詢也將帶有標籤。
此過程將透過向資料集添加更多資訊來提高檢索精度,特別是當存在大量區塊時。
<p>標籤和關鍵字的差異:</p>
<ul>
<li>標籤是一個閉集,由使用者定義和操作,而關鍵字是一個開集。
<li>您需要在使用前上傳包含範例的標籤集。
<li>關鍵字由 LLM 生成,既昂貴又耗時。
</ul>
<p>請選擇一個或多個標籤集或標籤知識庫,用於對知識庫中的每個文本塊進行標記。</p>
<p>對這些文本塊的查詢也將自動關聯相應標籤。</p>
<p>此功能基於文本相似度,能夠為數據集的文本塊批量添加更多領域知識,從而顯著提高檢索準確性。該功能還能提升大量文本塊的操作效率。</p>
<p>為了更好地理解標籤集的作用,以下是標籤集和關鍵詞之間的主要區別:</p>
<ul>
<li>標籤集是一個由用戶定義和管理的封閉集,而自動生成的關鍵詞屬於開放集合。</li>
<li>在給你的知識庫文本塊批量打標籤之前,你需要先生成標籤集作為樣本。</li>
<li>自動關鍵詞功能中的關鍵詞由 LLM 生成,此過程相對耗時,並且會產生一定的 Token 消耗。</li>
</ul>
`,
tags: '標籤',
addTag: '增加標籤',
@ -413,7 +413,7 @@ export default {
knowledgeBases: '知識庫',
knowledgeBasesMessage: '請選擇',
knowledgeBasesTip: '選擇關聯的知識庫。',
system: '系統',
system: '系統提示词',
systemInitialValue: `你是一個智能助手,請總結知識庫的內容來回答問題,請列舉知識庫中的數據詳細回答。當所有知識庫內容都與問題無關時,你的回答必須包括“知識庫中未找到您要的答案!”這句話。回答需要考慮聊天歷史。
以下是知識庫:
{knowledge}
@ -425,9 +425,9 @@ export default {
topNTip: `並非所有相似度得分高於“相似度閾值”的塊都會被提供給法學碩士。LLM 只能看到這些“Top N”塊。`,
variable: '變量',
variableTip: `如果您使用对话 API,变量可能会帮助您使用不同的策略与客户聊天。
这些变量用于填写提示中的“系统”部分,以便给LLM一个提示。
这些变量用于填写提示中的“系统提示词”部分,以便给LLM一个提示。
“知识”是一个非常特殊的变量,它将用检索到的块填充。
“System”中的所有变量都应该用大括号括起来。`,
“系统提示词”中的所有变量都应该用大括号括起来。`,
add: '新增',
key: '關鍵字',
optional: '可選的',
@ -435,7 +435,7 @@ export default {
model: '模型',
modelTip: '大語言聊天模型',
modelMessage: '請選擇',
freedom: '自由',
freedom: '自由度',
improvise: '即興創作',
precise: '精確',
balance: '平衡',
@ -156,7 +156,7 @@ export default {
cancel: '取消',
rerankModel: 'Rerank模型',
rerankPlaceholder: '请选择',
rerankTip: `如果是空的。它使用查询和块的嵌入来构成矢量余弦相似性。否则,它使用rerank评分代替矢量余弦相似性。`,
rerankTip: `非必选项:若不选择 rerank 模型,系统将默认采用关键词相似度与向量余弦相似度相结合的混合查询方式;如果设置了 rerank 模型,则混合查询中的向量相似度部分将被 rerank 打分替代。请注意:采用 rerank 模型会非常耗时。`,
topK: 'Top-K',
topKTip: `K块将被送入Rerank型号。`,
delimiter: `分段标识符`,
@ -355,17 +355,17 @@ export default {
searchTags: '搜索标签',
tagCloud: '云',
tagTable: '表',
tagSet: '标签库',
tagSet: '标签集',
topnTags: 'Top-N 标签',
tagSetTip: `
<p> 选择“标签”知识库有助于标记每个块。 </p>
<p>对这些块的查询也将带有标签。 </p>
此过程将通过向数据集添加更多信息来提高检索的准确性,尤其是在存在大量块的情况下。
<p>标签和关键字之间的区别:</p>
<p> 请选择一个或多个标签集或标签知识库,用于对知识库中的每个文本块进行标记。 </p>
<p>对这些文本块的查询也将自动关联相应标签。 </p>
<p>此功能基于文本相似度,能够为数据集的文本块批量添加更多领域知识,从而显著提高检索准确性。该功能还能提升大量文本块的操作效率。</p>
<p>为了更好地理解标签集的作用,以下是标签集和关键词之间的主要区别:</p>
<ul>
<li>标签是一个由用户定义和操作的封闭集,而关键字是一个开放集。 </li>
<li>您需要在使用前上传带有样本的标签集。 </li>
<li>关键字由 LLM 生成,这既昂贵又耗时。 </li>
<li>标签集是一个由用户定义和管理的封闭集,而自动生成的关键词属于开放集合。 </li>
<li>在给你的知识库文本块批量打标签之前,你需要先生成标签集作为样本。 </li>
<li>自动关键词功能中的关键词由 LLM 生成,此过程相对耗时,并且会产生一定的 Token 消耗。 </li>
</ul>
`,
tags: '标签',
@ -429,7 +429,7 @@ General:实体和关系提取提示来自 GitHub - microsoft/graphrag:基于
knowledgeBases: '知识库',
knowledgeBasesMessage: '请选择',
knowledgeBasesTip: '选择关联的知识库。',
system: '系统',
system: '系统提示词',
systemInitialValue: `你是一个智能助手,请总结知识库的内容来回答问题,请列举知识库中的数据详细回答。当所有知识库内容都与问题无关时,你的回答必须包括“知识库中未找到您要的答案!”这句话。回答需要考虑聊天历史。
以下是知识库:
{knowledge}
@ -441,9 +441,9 @@ General:实体和关系提取提示来自 GitHub - microsoft/graphrag:基于
topNTip: `并非所有相似度得分高于“相似度阈值”的块都会被提供给大语言模型。 LLM 只能看到这些“Top N”块。`,
variable: '变量',
variableTip: `如果您使用对话 API,变量可能会帮助您使用不同的策略与客户聊天。
这些变量用于填写提示中的“系统”部分,以便给LLM一个提示。
这些变量用于填写提示中的“系统提示词”部分,以便给LLM一个提示。
“知识”是一个非常特殊的变量,它将用检索到的块填充。
“System”中的所有变量都应该用大括号括起来。`,
“系统提示词”中的所有变量都应该用大括号括起来。`,
add: '新增',
key: '关键字',
optional: '可选的',
@ -451,7 +451,7 @@ General:实体和关系提取提示来自 GitHub - microsoft/graphrag:基于
model: '模型',
modelTip: '大语言聊天模型',
modelMessage: '请选择',
freedom: '自由',
freedom: '自由度',
improvise: '即兴创作',
precise: '精确',
balance: '平衡',