1301 lines
42 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

import { CodeGroup } from '@/app/components/develop/code.tsx'
import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from '@/app/components/develop/md.tsx'
# Knowledge API
<div>
### Authentication
Service API of Dify authenticates using an `API-Key`.
It is suggested that developers store the `API-Key` in the backend instead of sharing or storing it in the client side to avoid the leakage of the `API-Key`, which may lead to property loss.
All API requests should include your `API-Key` in the **`Authorization`** HTTP Header, as shown below:
<CodeGroup title="Code">
```javascript
Authorization: Bearer {API_KEY}
```
</CodeGroup>
</div>
---
<Heading
url='/datasets/{dataset_id}/document/create_by_text'
method='POST'
title='Create a document from text'
name='#create_by_text'
/>
<Row>
<Col>
This api is based on an existing Knowledge and creates a new document through text based on this Knowledge.
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Document name
</Property>
<Property name='text' type='string' key='text'>
Document content
</Property>
<Property name='indexing_technique' type='string' key='indexing_technique'>
Index mode
- <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
- <code>economy</code> Economy: Build using inverted index of Keyword Table Index
</Property>
<Property name='process_rule' type='object' key='process_rule'>
Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/document/create_by_text"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_text' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"name": "text","text": "text","indexing_technique": "high_quality","process_rule": {"mode": "automatic"}}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "text",
"text": "text",
"indexing_technique": "high_quality",
"process_rule": {
"mode": "automatic"
}
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "text.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695690280,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": ""
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/document/create_by_file'
method='POST'
title='Create documents from files'
name='#create_by_file'
/>
<Row>
<Col>
This api is based on an existing Knowledge and creates a new document through a file based on this Knowledge.
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='data' type='multipart/form-data json string' key='data'>
- original_document_id Source document ID (optional)
- Used to re-upload the document or modify the document cleaning and segmentation configuration. The missing information is copied from the source document
- The source document cannot be an archived document
- When original_document_id is passed in, the update operation is performed on behalf of the document. process_rule is a fillable item. If not filled in, the segmentation method of the source document will be used by default
- When original_document_id is not passed in, the new operation is performed on behalf of the document, and process_rule is required
- indexing_technique Index mode
- <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
- <code>economy</code> Economy: Build using inverted index of Keyword Table Index
- process_rule Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
<Property name='file' type='multipart/form-data' key='file'>
Files that need to be uploaded.
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/document/create_by_file"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_file' \\\n--header 'Authorization: Bearer {api_key}' \\\n--form 'data="{"indexing_technique":"high_quality","process_rule":{"rules":{"pre_processing_rules":[{"id":"remove_extra_spaces","enabled":true},{"id":"remove_urls_emails","enabled":true}],"segmentation":{"separator":"###","max_tokens":500}},"mode":"custom"}}";type=text/plain' \\\n--form 'file=@"/path/to/file"'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{\"name\":\"Dify\",\"indexing_technique\":\"high_quality\",\"process_rule\":{\"rules\":{\"pre_processing_rules\":[{\"id\":\"remove_extra_spaces\",\"enabled\":true},{\"id\":\"remove_urls_emails\",\"enabled\":true}],\"segmentation\":{\"separator\":\"###\",\"max_tokens\":500}},\"mode\":\"custom\"}}";type=text/plain' \
--form 'file=@"/path/to/file"'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "Dify.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695308667,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": ""
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets'
method='POST'
title='Create an empty Knowledge'
name='#create_empty_dataset'
/>
<Row>
<Col>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Knowledge name
</Property>
<Property name='permission' type='string' key='permission'>
Permission
- <code>only_me</code> Only me
- <code>all_team_members</code> All team members
- <code>partial_members</code> Partial members
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"name": "name", "permission": "only_me"}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${apiBaseUrl}/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "name",
"permission": "only_me"
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"id": "",
"name": "name",
"description": null,
"provider": "vendor",
"permission": "only_me",
"data_source_type": null,
"indexing_technique": null,
"app_count": 0,
"document_count": 0,
"word_count": 0,
"created_by": "",
"created_at": 1695636173,
"updated_by": "",
"updated_at": 1695636173,
"embedding_model": null,
"embedding_model_provider": null,
"embedding_available": null
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets'
method='GET'
title='Knowledge list'
name='#dataset_list'
/>
<Row>
<Col>
### Query
<Properties>
<Property name='page' type='string' key='page'>
Page number
</Property>
<Property name='limit' type='string' key='limit'>
Number of items returned, default 20, range 1-100
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets?page=1&limit=20' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets?page=1&limit=20' \
--header 'Authorization: Bearer {api_key}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [
{
"id": "",
"name": "name",
"description": "desc",
"permission": "only_me",
"data_source_type": "upload_file",
"indexing_technique": "",
"app_count": 2,
"document_count": 10,
"word_count": 1200,
"created_by": "",
"created_at": "",
"updated_by": "",
"updated_at": ""
},
...
],
"has_more": true,
"limit": 20,
"total": 50,
"page": 1
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}'
method='DELETE'
title='Delete knowledge'
name='#delete_dataset'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="DELETE"
label="/datasets/{dataset_id}"
targetCode={`curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}' \
--header 'Authorization: Bearer {api_key}'
```
</CodeGroup>
<CodeGroup title="Response">
```text {{ title: 'Response' }}
204 No Content
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/update_by_text'
method='POST'
title='Update document via text'
name='#update_by_text'
/>
<Row>
<Col>
This api is based on an existing Knowledge and updates the document through text based on this Knowledge.
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Document name (optional)
</Property>
<Property name='text' type='string' key='text'>
Document content (optional)
</Property>
<Property name='process_rule' type='object' key='process_rule'>
Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/documents/{document_id}/update_by_text"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/update_by_text' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"name": "name","text": "text"}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/update_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "name",
"text": "text"
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "name.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695308667,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": ""
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/update_by_file'
method='POST'
title='Update a document from a file'
name='#update_by_file'
/>
<Row>
<Col>
This api is based on an existing Knowledge, and updates documents through files based on this Knowledge
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Document name (optional)
</Property>
<Property name='file' type='multipart/form-data' key='file'>
Files to be uploaded
</Property>
<Property name='process_rule' type='object' key='process_rule'>
Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/documents/{document_id}/update_by_file"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/update_by_file' \\\n--header 'Authorization: Bearer {api_key}' \\\n--form 'data="{"name":"Dify","indexing_technique":"high_quality","process_rule":{"rules":{"pre_processing_rules":[{"id":"remove_extra_spaces","enabled":true},{"id":"remove_urls_emails","enabled":true}],"segmentation":{"separator":"###","max_tokens":500}},"mode":"custom"}}";type=text/plain' \\\n--form 'file=@"/path/to/file"'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/update_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{\"name\":\"Dify\",\"indexing_technique\":\"high_quality\",\"process_rule\":{\"rules\":{\"pre_processing_rules\":[{\"id\":\"remove_extra_spaces\",\"enabled\":true},{\"id\":\"remove_urls_emails\",\"enabled\":true}],\"segmentation\":{\"separator\":\"###\",\"max_tokens\":500}},\"mode\":\"custom\"}}";type=text/plain' \
--form 'file=@"/path/to/file"'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "Dify.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695308667,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": "20230921150427533684"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{batch}/indexing-status'
method='GET'
title='Get document embedding status (progress)'
name='#indexing_status'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='batch' type='string' key='batch'>
Batch number of uploaded documents
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="GET"
label="/datasets/{dataset_id}/documents/{batch}/indexing-status"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{batch}/indexing-status' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}' \
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data":[{
"id": "",
"indexing_status": "indexing",
"processing_started_at": 1681623462.0,
"parsing_completed_at": 1681623462.0,
"cleaning_completed_at": 1681623462.0,
"splitting_completed_at": 1681623462.0,
"completed_at": null,
"paused_at": null,
"error": null,
"stopped_at": null,
"completed_segments": 24,
"total_segments": 100
}]
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}'
method='DELETE'
title='Delete document'
name='#delete_document'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="DELETE"
label="/datasets/{dataset_id}/documents/{document_id}"
targetCode={`curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}' \
--header 'Authorization: Bearer {api_key}' \
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"result": "success"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents'
method='GET'
title='Knowledge document list'
name='#dataset_document_list'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
</Properties>
### Query
<Properties>
<Property name='keyword' type='string' key='keyword'>
Search keywords, currently only search document names(optional)
</Property>
<Property name='page' type='string' key='page'>
Page number(optional)
</Property>
<Property name='limit' type='string' key='limit'>
Number of items returned, default 20, range 1-100(optional)
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="GET"
label="/datasets/{dataset_id}/documents"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents' \
--header 'Authorization: Bearer {api_key}' \
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [
{
"id": "",
"position": 1,
"data_source_type": "file_upload",
"data_source_info": null,
"dataset_process_rule_id": null,
"name": "dify",
"created_from": "",
"created_by": "",
"created_at": 1681623639,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false
},
],
"has_more": false,
"limit": 20,
"total": 9,
"page": 1
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/segments'
method='POST'
title='Add segment'
name='#create_new_segment'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='segments' type='object list' key='segments'>
- <code>content</code> (text) Text content/question content, required
- <code>answer</code> (text) Answer content, if the mode of the Knowledge is qa mode, pass the value(optional)
- <code>keywords</code> (list) Keywords(optional)
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/documents/{document_id}/segments"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"segments": [{"content": "1","answer": "1","keywords": ["a"]}]}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"segments": [
{
"content": "1",
"answer": "1",
"keywords": ["a"]
}
]
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [{
"id": "",
"position": 1,
"document_id": "",
"content": "1",
"answer": "1",
"word_count": 25,
"tokens": 0,
"keywords": [
"a"
],
"index_node_id": "",
"index_node_hash": "",
"hit_count": 0,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"status": "completed",
"created_by": "",
"created_at": 1695312007,
"indexing_at": 1695312007,
"completed_at": 1695312007,
"error": null,
"stopped_at": null
}],
"doc_form": "text_model"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/segments'
method='GET'
title='get documents segments'
name='#get_segment'
/>
<Row>
<Col>
### Path
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Query
<Properties>
<Property name='keyword' type='string' key='keyword'>
keywordchoosable
</Property>
<Property name='status' type='string' key='status'>
Search statuscompleted
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="GET"
label="/datasets/{dataset_id}/documents/{document_id}/segments"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [{
"id": "",
"position": 1,
"document_id": "",
"content": "1",
"answer": "1",
"word_count": 25,
"tokens": 0,
"keywords": [
"a"
],
"index_node_id": "",
"index_node_hash": "",
"hit_count": 0,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"status": "completed",
"created_by": "",
"created_at": 1695312007,
"indexing_at": 1695312007,
"completed_at": 1695312007,
"error": null,
"stopped_at": null
}],
"doc_form": "text_model"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}'
method='DELETE'
title='delete document segment'
name='#delete_segment'
/>
<Row>
<Col>
### Path
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
<Property name='segment_id' type='string' key='segment_id'>
Document Segment ID
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="DELETE"
label="/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}"
targetCode={`curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/segments/{segment_id}' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json'`}
>
```bash {{ title: 'cURL' }}
curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/segments/{segment_id}' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"result": "success"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}'
method='POST'
title='update document segment'
name='#update_segment'
/>
<Row>
<Col>
### POST
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Knowledge ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
<Property name='segment_id' type='string' key='segment_id'>
Document Segment ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='segment' type='object' key='segment'>
- <code>content</code> (text) text content/question contentrequired
- <code>answer</code> (text) Answer content, not required, passed if the Knowledge is in qa mode
- <code>keywords</code> (list) keyword, not required
- <code>enabled</code> (bool) false/true, not required
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json'\\\n--data-raw '{\"segment\": {\"content\": \"1\",\"answer\": \"1\", \"keywords\": [\"a\"], \"enabled\": false}}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}' \
--header 'Content-Type: application/json' \
--data-raw '{
"segment": {
"content": "1",
"answer": "1",
"keywords": ["a"],
"enabled": false
}
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [{
"id": "",
"position": 1,
"document_id": "",
"content": "1",
"answer": "1",
"word_count": 25,
"tokens": 0,
"keywords": [
"a"
],
"index_node_id": "",
"index_node_hash": "",
"hit_count": 0,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"status": "completed",
"created_by": "",
"created_at": 1695312007,
"indexing_at": 1695312007,
"completed_at": 1695312007,
"error": null,
"stopped_at": null
}],
"doc_form": "text_model"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/hit_testing'
method='POST'
title='Dataset hit testing'
name='#dataset_hit_testing'
/>
<Row>
<Col>
### Path
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='query' type='string' key='query'>
retrieval keywordc
</Property>
<Property name='retrieval_model' type='object' key='retrieval_model'>
retrieval keyword(Optional, if not filled, it will be recalled according to the default method)
- <code>search_method</code> (text) Search method: One of the following four keywords is required
- <code>keyword_search</code> Keyword search
- <code>semantic_search</code> Semantic search
- <code>full_text_search</code> Full-text search
- <code>hybrid_search</code> Hybrid search
- <code>reranking_enable</code> (bool) Whether to enable reranking, optional, required if the search mode is semantic_search or hybrid_search
- <code>reranking_mode</code> (object) Rerank model configuration, optional, required if reranking is enabled
- <code>reranking_provider_name</code> (string) Rerank model provider
- <code>reranking_model_name</code> (string) Rerank model name
- <code>weights</code> (double) Semantic search weight setting in hybrid search mode
- <code>top_k</code> (integer) Number of results to return, optional
- <code>score_threshold_enabled</code> (bool) Whether to enable score threshold
- <code>score_threshold</code> (double) Score threshold
</Property>
<Property name='external_retrieval_model' type='object' key='external_retrieval_model'>
Unused field
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/hit_testing"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/hit_testing' \\\n--header 'Authorization: Bearer {api_key}'\\\n--header 'Content-Type: application/json'\\\n--data-raw '{
"query": "test",
"retrieval_model": {
"search_method": "keyword_search",
"reranking_enable": false,
"reranking_mode": null,
"reranking_model": {
"reranking_provider_name": "",
"reranking_model_name": ""
},
"weights": null,
"top_k": 1,
"score_threshold_enabled": false,
"score_threshold": null
}
}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/hit_testing' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": "test",
"retrieval_model": {
"search_method": "keyword_search",
"reranking_enable": false,
"reranking_mode": null,
"reranking_model": {
"reranking_provider_name": "",
"reranking_model_name": ""
},
"weights": null,
"top_k": 2,
"score_threshold_enabled": false,
"score_threshold": null
}
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"query": {
"content": "test"
},
"records": [
{
"segment": {
"id": "7fa6f24f-8679-48b3-bc9d-bdf28d73f218",
"position": 1,
"document_id": "a8c6c36f-9f5d-4d7a-8472-f5d7b75d71d2",
"content": "Operation guide",
"answer": null,
"word_count": 847,
"tokens": 280,
"keywords": [
"install",
"java",
"base",
"scripts",
"jdk",
"manual",
"internal",
"opens",
"add",
"vmoptions"
],
"index_node_id": "39dd8443-d960-45a8-bb46-7275ad7fbc8e",
"index_node_hash": "0189157697b3c6a418ccf8264a09699f25858975578f3467c76d6bfc94df1d73",
"hit_count": 0,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"status": "completed",
"created_by": "dbcb1ab5-90c8-41a7-8b78-73b235eb6f6f",
"created_at": 1728734540,
"indexing_at": 1728734552,
"completed_at": 1728734584,
"error": null,
"stopped_at": null,
"document": {
"id": "a8c6c36f-9f5d-4d7a-8472-f5d7b75d71d2",
"data_source_type": "upload_file",
"name": "readme.txt",
"doc_type": null
}
},
"score": 3.730463140527718e-05,
"tsne_position": null
}
]
}
```
</CodeGroup>
</Col>
</Row>
---
<Row>
<Col>
### Error message
<Properties>
<Property name='code' type='string' key='code'>
Error code
</Property>
</Properties>
<Properties>
<Property name='status' type='number' key='status'>
Error status
</Property>
</Properties>
<Properties>
<Property name='message' type='string' key='message'>
Error message
</Property>
</Properties>
</Col>
<Col>
<CodeGroup title="Example">
```json {{ title: 'Response' }}
{
"code": "no_file_uploaded",
"message": "Please upload your file.",
"status": 400
}
```
</CodeGroup>
</Col>
</Row>
<table className="max-w-auto border-collapse border border-slate-400" style={{ maxWidth: 'none', width: 'auto' }}>
<thead style={{ background: '#f9fafc' }}>
<tr>
<th className="p-2 border border-slate-300">code</th>
<th className="p-2 border border-slate-300">status</th>
<th className="p-2 border border-slate-300">message</th>
</tr>
</thead>
<tbody>
<tr>
<td className="p-2 border border-slate-300">no_file_uploaded</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">Please upload your file.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">too_many_files</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">Only one file is allowed.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">file_too_large</td>
<td className="p-2 border border-slate-300">413</td>
<td className="p-2 border border-slate-300">File size exceeded.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">unsupported_file_type</td>
<td className="p-2 border border-slate-300">415</td>
<td className="p-2 border border-slate-300">File type not allowed.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">high_quality_dataset_only</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">Current operation only supports 'high-quality' datasets.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">dataset_not_initialized</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">The dataset is still being initialized or indexing. Please wait a moment.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">archived_document_immutable</td>
<td className="p-2 border border-slate-300">403</td>
<td className="p-2 border border-slate-300">The archived document is not editable.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">dataset_name_duplicate</td>
<td className="p-2 border border-slate-300">409</td>
<td className="p-2 border border-slate-300">The dataset name already exists. Please modify your dataset name.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">invalid_action</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">Invalid action.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">document_already_finished</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">The document has been processed. Please refresh the page or go to the document details.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">document_indexing</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">The document is being processed and cannot be edited.</td>
</tr>
<tr>
<td className="p-2 border border-slate-300">invalid_metadata</td>
<td className="p-2 border border-slate-300">400</td>
<td className="p-2 border border-slate-300">The metadata content is incorrect. Please check and verify.</td>
</tr>
</tbody>
</table>
<div className="pb-4" />