From f4cb939317544308ada0949ff3ed5abc040982db Mon Sep 17 00:00:00 2001
From: writinwaters <93570324+writinwaters@users.noreply.github.com>
Date: Tue, 29 Oct 2024 19:56:46 +0800
Subject: [PATCH] Updated HTTP API reference and Python API reference based on
 test results (#3090)

### What problem does this PR solve?


### Type of change

- [x] Documentation Update
---
 api/http_api_reference.md   | 16 +++++++++-------
 api/python_api_reference.md |  9 ++++-----
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/api/http_api_reference.md b/api/http_api_reference.md
index 57a2d3743..156b8c230 100644
--- a/api/http_api_reference.md
+++ b/api/http_api_reference.md
@@ -94,8 +94,10 @@ curl --request POST \
   The configuration settings for the dataset parser, a JSON object containing the following attributes:
   - `"chunk_token_count"`: Defaults to `128`.
   - `"layout_recognize"`: Defaults to `true`.
+  - `"html4excel"`: Indicates whether to convert Excel documents into HTML format. Defaults to `false`.
   - `"delimiter"`: Defaults to `"\n!?。;!?"`.
-  - `"task_page_size"`: Defaults to `12`.
+  - `"task_page_size"`: Defaults to `12`. For PDF only.
+  - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
 
 ### Response
 
@@ -177,7 +179,7 @@ curl --request DELETE \
 
 #### Request parameters
 
-- `"ids"`: (*Body parameter*), `list[string]`
+- `"ids"`: (*Body parameter*), `list[string]`  
   The IDs of the datasets to delete. If it is not specified, all datasets will be deleted.
 
 ### Response
@@ -241,7 +243,7 @@ curl --request PUT \
 - `"embedding_model"`: (*Body parameter*), `string`  
   The updated embedding model name.  
   - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
-- `"chunk_method"`: (*Body parameter*), `enum`
+- `"chunk_method"`: (*Body parameter*), `enum`  
   The chunking method for the dataset. Available options:
   - `"naive"`: General
   - `"manual`: Manual
@@ -510,12 +512,12 @@ curl --request PUT \
   - `"one"`: One
   - `"knowledge_graph"`: Knowledge Graph
   - `"email"`: Email
-- `"parser_config"`: (*Body parameter*), `object`
+- `"parser_config"`: (*Body parameter*), `object`  
   The parsing configuration for the document:
   - `"chunk_token_count"`: Defaults to `128`.
   - `"layout_recognize"`: Defaults to `true`.
   - `"delimiter"`: Defaults to `"\n!?。;!?"`.
-  - `"task_page_size"`: Defaults to `12`.
+  - `"task_page_size"`: Defaults to `12`. For PDF only.
 
 ### Response
 
@@ -718,7 +720,7 @@ curl --request DELETE \
 
 - `dataset_id`: (*Path parameter*)  
   The associated dataset ID.
-- `"ids"`: (*Body parameter*), `list[string]`
+- `"ids"`: (*Body parameter*), `list[string]`  
   The IDs of the documents to delete. If it is not specified, all documents in the specified dataset will be deleted.
 
 ### Response
@@ -1169,7 +1171,7 @@ Failure:
 
 ## Retrieve chunks
 
-**GET** `/api/v1/retrieval`
+**POST** `/api/v1/retrieval`
 
 Retrieves chunks from specified datasets.
 
diff --git a/api/python_api_reference.md b/api/python_api_reference.md
index 12c0573e7..bc6f4d306 100644
--- a/api/python_api_reference.md
+++ b/api/python_api_reference.md
@@ -1253,7 +1253,7 @@ Asks a question to start an AI-powered conversation.
 
 #### question: `str` *Required*
 
-The question to start an AI chat.
+The question to start an AI-powered conversation.
 
 #### stream: `bool`
 
@@ -1286,7 +1286,7 @@ A list of `Chunk` objects representing references to the message, each containin
 - `content` `str`  
   The content of the chunk.
 - `image_id` `str`  
-  The ID of the snapshot of the chunk.
+  The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file.
 - `document_id` `str`  
   The ID of the referenced document.
 - `document_name` `str`  
@@ -1295,14 +1295,13 @@ A list of `Chunk` objects representing references to the message, each containin
   The location information of the chunk within the referenced document.
 - `dataset_id` `str`  
   The ID of the dataset to which the referenced document belongs.
-- `similarity` `float`
-  A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity.
+- `similarity` `float`  
+  A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity. It is the weighted sum of `vector_similarity` and `term_similarity`.
 - `vector_similarity` `float`  
   A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings.
 - `term_similarity` `float`  
   A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords.
 
-
 ### Examples
 
 ```python