
### What problem does this PR solve?

This PR adds support for the latest OpenSearch 2.19.1 as a store engine and search engine option for RAGFlow.

### Main benefits

1. OpenSearch 2.19.1 is licensed under the Apache v2.0 License, which is much more permissive than Elasticsearch's license.
2. For search, OpenSearch 2.19.1 supports full-text search, vector search, and hybrid search, which are similar to Elasticsearch's in schema.
3. For storage, OpenSearch 2.19.1 stores text and vectors in a schema quite similar to Elasticsearch's.

### Changes

- Support the OpenSearch Python connector. I made a lot of adaptations, since the schema and APIs/methods differ between ES and OpenSearch in many ways (knn_search in particular has a significant gap): rag/utils/opensearch_coon.py
- Support static config adaptations: conf/service_conf.yaml, api/settings.py, rag/settings.py
- Support the store & search schema differences between OpenSearch and ES: conf/os_mapping.json
- Support the OpenSearch Python SDK: pyproject.toml
- Support Docker config for OpenSearch 2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template

### How to use

- I didn't change the default: ES remains the default doc/search engine. OpenSearch is used only if you set DOC_ENGINE=${DOC_ENGINE:-opensearch} in docker/.env, as shown in the sketch below.

### Others

Our team tested a large number of docs in our environment using OpenSearch as the vector database, and it works very well. All of the config for OpenSearch is necessary.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
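A minimal docker/.env sketch for the "How to use" step above (it assumes the stock .env layout, where `DOC_ENGINE` defaults to `elasticsearch`):

```bash
# docker/.env: switch the doc/search engine from the Elasticsearch default to OpenSearch
DOC_ENGINE=${DOC_ENGINE:-opensearch}
```

After changing the value, recreate the containers (for example with `docker compose -f docker-compose.yml up -d`) so the new engine takes effect.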
# README
## 📗 Table of Contents

- 🐳 Docker Compose
- 🐬 Docker environment variables
- 🐋 Service configuration
## 🐳 Docker Compose

- **docker-compose.yml**
  Sets up the environment for RAGFlow and its dependencies.
- **docker-compose-base.yml**
  Sets up the environment for RAGFlow's dependencies: Elasticsearch/Infinity, MySQL, MinIO, and Redis.

> [!CAUTION]
> We do not actively maintain **docker-compose-CN-oc9.yml**, **docker-compose-gpu-CN-oc9.yml**, or **docker-compose-gpu.yml**, so use them at your own risk. However, you are welcome to file a pull request to improve any of them.
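For orientation, a typical way to bring the stack up from the `docker/` directory looks like the following sketch (it assumes the default container name `ragflow-server` used by the standard compose files):

```bash
# Start RAGFlow and all of its dependencies in the background
docker compose -f docker-compose.yml up -d

# Tail the server logs to confirm the stack came up cleanly
docker logs -f ragflow-server
```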
## 🐬 Docker environment variables

The `.env` file contains important environment variables for Docker.
### Elasticsearch

- `STACK_VERSION`
  The version of Elasticsearch. Defaults to `8.11.3`.
- `ES_PORT`
  The port used to expose the Elasticsearch service to the host machine, allowing external access to the service running inside the Docker container. Defaults to `1200`.
- `ELASTIC_PASSWORD`
  The password for Elasticsearch.
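For reference, the corresponding block of `docker/.env` looks roughly like this (a sketch built from the defaults above; the password value is a placeholder, not the project default):

```bash
STACK_VERSION=8.11.3
ES_PORT=1200
ELASTIC_PASSWORD=<your_elastic_password>   # placeholder: set your own password
```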
### Kibana

- `KIBANA_PORT`
  The port used to expose the Kibana service to the host machine, allowing external access to the service running inside the Docker container. Defaults to `6601`.
- `KIBANA_USER`
  The username for Kibana. Defaults to `rag_flow`.
- `KIBANA_PASSWORD`
  The password for Kibana. Defaults to `infini_rag_flow`.
### Resource management

- `MEM_LIMIT`
  The maximum amount of memory, in bytes, that a specific Docker container can use while running. Defaults to `8073741824`.
### MySQL

- `MYSQL_PASSWORD`
  The password for MySQL.
- `MYSQL_PORT`
  The port used to expose the MySQL service to the host machine, allowing external access to the MySQL database running inside the Docker container. Defaults to `5455`.
### MinIO

- `MINIO_CONSOLE_PORT`
  The port used to expose the MinIO console interface to the host machine, allowing external access to the web-based console running inside the Docker container. Defaults to `9001`.
- `MINIO_PORT`
  The port used to expose the MinIO API service to the host machine, allowing external access to the MinIO object storage service running inside the Docker container. Defaults to `9000`.
- `MINIO_USER`
  The username for MinIO.
- `MINIO_PASSWORD`
  The password for MinIO.
### Redis

- `REDIS_PORT`
  The port used to expose the Redis service to the host machine, allowing external access to the Redis service running inside the Docker container. Defaults to `6379`.
- `REDIS_PASSWORD`
  The password for Redis.
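Pulling the MySQL, MinIO, and Redis variables above together, the storage-related part of `docker/.env` looks roughly like this (a sketch using the documented default ports; all credentials are placeholders):

```bash
MYSQL_PASSWORD=<your_mysql_password>
MYSQL_PORT=5455

MINIO_CONSOLE_PORT=9001
MINIO_PORT=9000
MINIO_USER=<your_minio_user>
MINIO_PASSWORD=<your_minio_password>

REDIS_PORT=6379
REDIS_PASSWORD=<your_redis_password>
```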
### RAGFlow

- `SVR_HTTP_PORT`
  The port used to expose RAGFlow's HTTP API service to the host machine, allowing external access to the service running inside the Docker container. Defaults to `9380`.
- `RAGFLOW_IMAGE`
  The Docker image edition. Available editions:
  - `infiniflow/ragflow:v0.18.0-slim` (default): The RAGFlow Docker image without embedding models.
  - `infiniflow/ragflow:v0.18.0`: The RAGFlow Docker image with built-in embedding models:
    - `BAAI/bge-large-zh-v1.5`
    - `maidalun1020/bce-embedding-base_v1`
> [!TIP]
> If you cannot download the RAGFlow Docker image, try the following mirrors.
>
> - For the `nightly-slim` edition:
>   `RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly-slim` or
>   `RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly-slim`.
> - For the `nightly` edition:
>   `RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly` or
>   `RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly`.
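For example, to switch from the slim default to the full edition with built-in embedding models, only the image line in `docker/.env` changes (a sketch; pin the tag that matches your installed release):

```bash
RAGFLOW_IMAGE=infiniflow/ragflow:v0.18.0
```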
### Timezone

- `TIMEZONE`
  The local time zone. Defaults to `'Asia/Shanghai'`.
### Hugging Face mirror site

- `HF_ENDPOINT`
  The mirror site for huggingface.co. It is disabled by default. You can uncomment this line if you have limited access to the primary Hugging Face domain.
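To enable it, uncomment the line in `docker/.env` and point it at a mirror that is reachable from your network (the URL below is one commonly used mirror, shown for illustration):

```bash
HF_ENDPOINT=https://hf-mirror.com
```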
### MacOS

- `MACOS`
  Optimizations for macOS. It is disabled by default. You can uncomment this line if your OS is macOS.
### Maximum file size

- `MAX_CONTENT_LENGTH`
  The maximum file size for each uploaded file, in bytes. You can uncomment this line if you wish to change the 128M file size limit. After making the change, ensure you update `client_max_body_size` in nginx/nginx.conf correspondingly.
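For example, raising the cap to 1 GB would look like this in `docker/.env` (a sketch; 1073741824 bytes is 1 GB), together with a matching `client_max_body_size 1024m;` in nginx/nginx.conf:

```bash
# Uncommented and raised to 1 GB (value is in bytes)
MAX_CONTENT_LENGTH=1073741824
```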
## 🐋 Service configuration

`service_conf.yaml` specifies the system-level configuration for RAGFlow and is used by its API server and task executor. In a dockerized setup, this file is automatically created based on the `service_conf.yaml.template` file (replacing all environment variables with their values). A consolidated example fragment appears at the end of this section.
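As an illustration of the substitution mechanism (a hypothetical fragment for illustration only; the actual keys and defaults in `service_conf.yaml.template` may differ):

```yaml
# service_conf.yaml.template: values reference environment variables
mysql:
  password: '${MYSQL_PASSWORD:-infini_rag_flow}'
---
# Generated service_conf.yaml: the reference is replaced with the resolved value
mysql:
  password: 'infini_rag_flow'
```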
- `ragflow`
  - `host`: The API server's IP address inside the Docker container. Defaults to `0.0.0.0`.
  - `port`: The API server's serving port inside the Docker container. Defaults to `9380`.
- `mysql`
  - `name`: The MySQL database name. Defaults to `rag_flow`.
  - `user`: The username for MySQL.
  - `password`: The password for MySQL.
  - `port`: The MySQL serving port inside the Docker container. Defaults to `3306`.
  - `max_connections`: The maximum number of concurrent connections to the MySQL database. Defaults to `100`.
  - `stale_timeout`: Timeout in seconds.
- `minio`
  - `user`: The username for MinIO.
  - `password`: The password for MinIO.
  - `host`: The MinIO serving IP and port inside the Docker container. Defaults to `minio:9000`.
- `oss`
  - `access_key`: The access key ID used to authenticate requests to the OSS service.
  - `secret_key`: The secret access key used to authenticate requests to the OSS service.
  - `endpoint_url`: The URL of the OSS service endpoint.
  - `region`: The OSS region where the bucket is located.
  - `bucket`: The name of the OSS bucket where files will be stored. This item is required if you want to store all files in a specified bucket.
  - `prefix_path`: Optional. A prefix path to prepend to file names in the OSS bucket, which can help organize files within the bucket.
- `s3`
  - `access_key`: The access key ID used to authenticate requests to the S3 service.
  - `secret_key`: The secret access key used to authenticate requests to the S3 service.
  - `endpoint_url`: The URL of the S3-compatible service endpoint. This is necessary when using an S3-compatible protocol instead of the default AWS S3 endpoint.
  - `bucket`: The name of the S3 bucket where files will be stored. This item is required if you want to store all files in a specified bucket.
  - `region`: The AWS region where the S3 bucket is located. This is important for directing requests to the correct data center.
  - `signature_version`: Optional. The version of the signature to use for authenticating requests. Common versions include `v4`.
  - `addressing_style`: Optional. The style of addressing to use for the S3 endpoint. This can be `path` or `virtual`.
  - `prefix_path`: Optional. A prefix path to prepend to file names in the S3 bucket, which can help organize files within the bucket.
- `oauth`
  The OAuth configuration for signing up or signing in to RAGFlow using a third-party account. It is disabled by default. To enable this feature, uncomment the corresponding lines in service_conf.yaml.template.
  - `github`: The GitHub authentication settings for your application. Visit the GitHub Developer Settings page to obtain your client_id and secret_key.
- `user_default_llm`
  The default LLM to use for a new RAGFlow user. It is disabled by default. To enable this feature, uncomment the corresponding lines in service_conf.yaml.template.
  - `factory`: The LLM supplier. Available options:
    - `"OpenAI"`
    - `"DeepSeek"`
    - `"Moonshot"`
    - `"Tongyi-Qianwen"`
    - `"VolcEngine"`
    - `"ZHIPU-AI"`
  - `api_key`: The API key for the specified LLM. You will need to apply for your model API key online.
> [!TIP]
> If you do not set the default LLM here, configure the default LLM on the **Settings** page in the RAGFlow UI.
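Putting the pieces above together, a minimal `service_conf.yaml` fragment could look like the following sketch (placeholder credentials and an example `stale_timeout` value; the authoritative layout is `service_conf.yaml.template`):

```yaml
ragflow:
  host: 0.0.0.0
  port: 9380

mysql:
  name: rag_flow
  user: <your_mysql_user>          # placeholder
  password: <your_mysql_password>  # placeholder
  port: 3306
  max_connections: 100
  stale_timeout: 30                # example value

minio:
  user: <your_minio_user>          # placeholder
  password: <your_minio_password>  # placeholder
  host: minio:9000

user_default_llm:
  factory: "Tongyi-Qianwen"        # any of the suppliers listed above
  api_key: <your_api_key>          # placeholder
```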