Merge branch 'main' into feat/llm-struct-output

Joel 2025-03-31 14:43:07 +08:00
commit 1193ab12fc
251 changed files with 6179 additions and 2319 deletions

View File

@@ -6,8 +6,8 @@ on:
       - "main"
       - "deploy/dev"
       - "deploy/enterprise"
-  release:
-    types: [published]
+    tags:
+      - "*"

 concurrency:
   group: build-push-${{ github.head_ref || github.run_id }}

View File

@@ -76,7 +76,6 @@ jobs:
           milvus-standalone
           pgvecto-rs
           pgvector
-          opengauss
           chroma
           elasticsearch

View File

@@ -31,25 +31,24 @@ jobs:
         uses: tj-actions/changed-files@v45
         with:
           files: web/**

-      # to run pnpm, should install package canvas, but it always install failed on amd64 under ubuntu-latest
-      # - name: Install pnpm
-      #   uses: pnpm/action-setup@v4
-      #   with:
-      #     version: 10
-      #     run_install: false
+      - name: Install pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 10
+          run_install: false

-      # - name: Setup Node.js
-      #   uses: actions/setup-node@v4
-      #   if: steps.changed-files.outputs.any_changed == 'true'
-      #   with:
-      #     node-version: 20
-      #     cache: pnpm
-      #     cache-dependency-path: ./web/package.json
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        if: steps.changed-files.outputs.any_changed == 'true'
+        with:
+          node-version: 20
+          cache: pnpm
+          cache-dependency-path: ./web/package.json

-      # - name: Install dependencies
-      #   if: steps.changed-files.outputs.any_changed == 'true'
-      #   run: pnpm install --frozen-lockfile
+      - name: Install dependencies
+        if: steps.changed-files.outputs.any_changed == 'true'
+        run: pnpm install --frozen-lockfile

-      # - name: Run tests
-      #   if: steps.changed-files.outputs.any_changed == 'true'
-      #   run: pnpm test
+      - name: Run tests
+        if: steps.changed-files.outputs.any_changed == 'true'
+        run: pnpm test

.gitignore vendored
View File

@@ -103,6 +103,7 @@ celerybeat.pid
 # Environments
 .env
+.env-local
 .venv
 env/
 venv/

View File

@@ -18,7 +18,7 @@ Need to update an existing model runtime, tool, or squash some bugs? Head over t
 Join the fun, contribute, and let's build something awesome together! 💡✨

-Don't forget to link an existing issue or open an new issue in the PR's description.
+Don't forget to link an existing issue or open a new issue in the PR's description.

 ### Bug reports
@@ -68,7 +68,7 @@ How we prioritize:
 4. Please add tests for your changes accordingly
 5. Ensure your code passes the existing tests
 6. Please link the issue in the PR description, `fixes #<issue_number>`
-7. Get merrged!
+7. Get merged!

 ### Setup the project

 #### Frontend
@@ -90,4 +90,4 @@ We recommend reviewing this document carefully before proceeding with the setup,
 Feel free to reach out if you encounter any issues during the setup process.

 ## Getting Help
-If you ever get stuck or got a burning question while contributing, simply shoot your queries our way via the related GitHub issue, or hop onto our [Discord](https://discord.gg/8Tpq4AcN9c) for a quick chat.
+If you ever get stuck or get a burning question while contributing, simply shoot your queries our way via the related GitHub issue, or hop onto our [Discord](https://discord.gg/8Tpq4AcN9c) for a quick chat.

CONTRIBUTING_ES.md Normal file
View File

@@ -0,0 +1,93 @@
# CONTRIBUIR
Así que estás buscando contribuir a Dify - eso es fantástico, estamos ansiosos por ver lo que haces. Como una startup con personal y financiación limitados, tenemos grandes ambiciones de diseñar el flujo de trabajo más intuitivo para construir y gestionar aplicaciones LLM. Cualquier ayuda de la comunidad cuenta, realmente.
Necesitamos ser ágiles y enviar rápidamente dado donde estamos, pero también queremos asegurarnos de que colaboradores como tú obtengan una experiencia lo más fluida posible al contribuir. Hemos elaborado esta guía de contribución con ese propósito, con el objetivo de familiarizarte con la base de código y cómo trabajamos con los colaboradores, para que puedas pasar rápidamente a la parte divertida.
Esta guía, como Dify mismo, es un trabajo en constante progreso. Agradecemos mucho tu comprensión si a veces se queda atrás del proyecto real, y damos la bienvenida a cualquier comentario para que podamos mejorar.
En términos de licencia, por favor tómate un minuto para leer nuestro breve [Acuerdo de Licencia y Colaborador](./LICENSE). La comunidad también se adhiere al [código de conducta](https://github.com/langgenius/.github/blob/main/CODE_OF_CONDUCT.md).
## Antes de empezar
¿Buscas algo en lo que trabajar? Explora nuestros [buenos primeros issues](https://github.com/langgenius/dify/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) y elige uno para comenzar.
¿Tienes un nuevo modelo o herramienta genial para añadir? Abre un PR en nuestro [repositorio de plugins](https://github.com/langgenius/dify-plugins) y muéstranos lo que has construido.
¿Necesitas actualizar un modelo existente, herramienta o corregir algunos errores? Dirígete a nuestro [repositorio oficial de plugins](https://github.com/langgenius/dify-official-plugins) y haz tu magia.
¡Únete a la diversión, contribuye y construyamos algo increíble juntos! 💡✨
No olvides vincular un issue existente o abrir uno nuevo en la descripción del PR.
### Informes de errores
> [!IMPORTANT]
> Por favor, asegúrate de incluir la siguiente información al enviar un informe de error:
- Un título claro y descriptivo
- Una descripción detallada del error, incluyendo cualquier mensaje de error
- Pasos para reproducir el error
- Comportamiento esperado
- **Logs**, si están disponibles, para problemas del backend, esto es realmente importante, puedes encontrarlos en los logs de docker-compose
- Capturas de pantalla o videos, si es aplicable
Cómo priorizamos:
| Tipo de Issue | Prioridad |
| ------------------------------------------------------------ | --------------- |
| Errores en funciones principales (servicio en la nube, no poder iniciar sesión, aplicaciones que no funcionan, fallos de seguridad) | Crítica |
| Errores no críticos, mejoras de rendimiento | Prioridad Media |
| Correcciones menores (errores tipográficos, UI confusa pero funcional) | Prioridad Baja |
### Solicitudes de funcionalidades
> [!NOTE]
> Por favor, asegúrate de incluir la siguiente información al enviar una solicitud de funcionalidad:
- Un título claro y descriptivo
- Una descripción detallada de la funcionalidad
- Un caso de uso para la funcionalidad
- Cualquier otro contexto o capturas de pantalla sobre la solicitud de funcionalidad
Cómo priorizamos:
| Tipo de Funcionalidad | Prioridad |
| ------------------------------------------------------------ | --------------- |
| Funcionalidades de alta prioridad etiquetadas por un miembro del equipo | Prioridad Alta |
| Solicitudes populares de funcionalidades de nuestro [tablero de comentarios de la comunidad](https://github.com/langgenius/dify/discussions/categories/feedbacks) | Prioridad Media |
| Funcionalidades no principales y mejoras menores | Prioridad Baja |
| Valiosas pero no inmediatas | Futura-Funcionalidad |
## Enviando tu PR
### Proceso de Pull Request
1. Haz un fork del repositorio
2. Antes de redactar un PR, por favor crea un issue para discutir los cambios que quieres hacer
3. Crea una nueva rama para tus cambios
4. Por favor añade pruebas para tus cambios en consecuencia
5. Asegúrate de que tu código pasa las pruebas existentes
6. Por favor vincula el issue en la descripción del PR, `fixes #<número_del_issue>`
7. ¡Fusiona tu código!
### Configuración del proyecto
#### Frontend
Para configurar el servicio frontend, por favor consulta nuestra [guía completa](https://github.com/langgenius/dify/blob/main/web/README.md) en el archivo `web/README.md`. Este documento proporciona instrucciones detalladas para ayudarte a configurar el entorno frontend correctamente.
#### Backend
Para configurar el servicio backend, por favor consulta nuestras [instrucciones detalladas](https://github.com/langgenius/dify/blob/main/api/README.md) en el archivo `api/README.md`. Este documento contiene una guía paso a paso para ayudarte a poner en marcha el backend sin problemas.
#### Otras cosas a tener en cuenta
Recomendamos revisar este documento cuidadosamente antes de proceder con la configuración, ya que contiene información esencial sobre:
- Requisitos previos y dependencias
- Pasos de instalación
- Detalles de configuración
- Consejos comunes de solución de problemas
No dudes en contactarnos si encuentras algún problema durante el proceso de configuración.
## Obteniendo Ayuda
Si alguna vez te quedas atascado o tienes una pregunta urgente mientras contribuyes, simplemente envíanos tus consultas a través del issue relacionado de GitHub, o únete a nuestro [Discord](https://discord.gg/8Tpq4AcN9c) para una charla rápida.

CONTRIBUTING_FR.md Normal file
View File

@@ -0,0 +1,93 @@
# CONTRIBUER
Vous cherchez donc à contribuer à Dify - c'est fantastique, nous avons hâte de voir ce que vous allez faire. En tant que startup avec un personnel et un financement limités, nous avons de grandes ambitions pour concevoir le flux de travail le plus intuitif pour construire et gérer des applications LLM. Toute aide de la communauté compte, vraiment.
Nous devons être agiles et livrer rapidement compte tenu de notre position, mais nous voulons aussi nous assurer que des contributeurs comme vous obtiennent une expérience aussi fluide que possible lors de leur contribution. Nous avons élaboré ce guide de contribution dans ce but, visant à vous familiariser avec la base de code et comment nous travaillons avec les contributeurs, afin que vous puissiez rapidement passer à la partie amusante.
Ce guide, comme Dify lui-même, est un travail en constante évolution. Nous apprécions grandement votre compréhension si parfois il est en retard par rapport au projet réel, et nous accueillons tout commentaire pour nous aider à nous améliorer.
En termes de licence, veuillez prendre une minute pour lire notre bref [Accord de Licence et de Contributeur](./LICENSE). La communauté adhère également au [code de conduite](https://github.com/langgenius/.github/blob/main/CODE_OF_CONDUCT.md).
## Avant de vous lancer
Vous cherchez quelque chose à réaliser ? Parcourez nos [problèmes pour débutants](https://github.com/langgenius/dify/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) et choisissez-en un pour commencer !
Vous avez un nouveau modèle ou un nouvel outil à ajouter ? Ouvrez une PR dans notre [dépôt de plugins](https://github.com/langgenius/dify-plugins) et montrez-nous ce que vous avez créé.
Vous devez mettre à jour un modèle existant, un outil ou corriger des bugs ? Rendez-vous sur notre [dépôt officiel de plugins](https://github.com/langgenius/dify-official-plugins) et faites votre magie !
Rejoignez l'aventure, contribuez, et construisons ensemble quelque chose d'extraordinaire ! 💡✨
N'oubliez pas de lier un problème existant ou d'ouvrir un nouveau problème dans la description de votre PR.
### Rapports de bugs
> [!IMPORTANT]
> Veuillez vous assurer d'inclure les informations suivantes lors de la soumission d'un rapport de bug :
- Un titre clair et descriptif
- Une description détaillée du bug, y compris tous les messages d'erreur
- Les étapes pour reproduire le bug
- Comportement attendu
- **Logs**, si disponibles, pour les problèmes de backend, c'est vraiment important, vous pouvez les trouver dans les logs de docker-compose
- Captures d'écran ou vidéos, si applicable
Comment nous priorisons :
| Type de Problème | Priorité |
| ------------------------------------------------------------ | --------------- |
| Bugs dans les fonctions principales (service cloud, impossibilité de se connecter, applications qui ne fonctionnent pas, failles de sécurité) | Critique |
| Bugs non critiques, améliorations de performance | Priorité Moyenne |
| Corrections mineures (fautes de frappe, UI confuse mais fonctionnelle) | Priorité Basse |
### Demandes de fonctionnalités
> [!NOTE]
> Veuillez vous assurer d'inclure les informations suivantes lors de la soumission d'une demande de fonctionnalité :
- Un titre clair et descriptif
- Une description détaillée de la fonctionnalité
- Un cas d'utilisation pour la fonctionnalité
- Tout autre contexte ou captures d'écran concernant la demande de fonctionnalité
Comment nous priorisons :
| Type de Fonctionnalité | Priorité |
| ------------------------------------------------------------ | --------------- |
| Fonctionnalités hautement prioritaires étiquetées par un membre de l'équipe | Priorité Haute |
| Demandes populaires de fonctionnalités de notre [tableau de feedback communautaire](https://github.com/langgenius/dify/discussions/categories/feedbacks) | Priorité Moyenne |
| Fonctionnalités non essentielles et améliorations mineures | Priorité Basse |
| Précieuses mais non immédiates | Fonctionnalité Future |
## Soumettre votre PR
### Processus de Pull Request
1. Forkez le dépôt
2. Avant de rédiger une PR, veuillez créer un problème pour discuter des changements que vous souhaitez apporter
3. Créez une nouvelle branche pour vos changements
4. Veuillez ajouter des tests pour vos changements en conséquence
5. Assurez-vous que votre code passe les tests existants
6. Veuillez lier le problème dans la description de la PR, `fixes #<numéro_du_problème>`
7. Faites fusionner votre code !
### Configuration du projet
#### Frontend
Pour configurer le service frontend, veuillez consulter notre [guide complet](https://github.com/langgenius/dify/blob/main/web/README.md) dans le fichier `web/README.md`. Ce document fournit des instructions détaillées pour vous aider à configurer correctement l'environnement frontend.
#### Backend
Pour configurer le service backend, veuillez consulter nos [instructions détaillées](https://github.com/langgenius/dify/blob/main/api/README.md) dans le fichier `api/README.md`. Ce document contient un guide étape par étape pour vous aider à faire fonctionner le backend sans problème.
#### Autres choses à noter
Nous recommandons de revoir attentivement ce document avant de procéder à la configuration, car il contient des informations essentielles sur :
- Prérequis et dépendances
- Étapes d'installation
- Détails de configuration
- Conseils courants de dépannage
N'hésitez pas à nous contacter si vous rencontrez des problèmes pendant le processus de configuration.
## Obtenir de l'aide
Si jamais vous êtes bloqué ou avez une question urgente en contribuant, envoyez-nous simplement vos questions via le problème GitHub concerné, ou rejoignez notre [Discord](https://discord.gg/8Tpq4AcN9c) pour une discussion rapide.

CONTRIBUTING_KR.md Normal file
View File

@@ -0,0 +1,93 @@
# 기여하기
Dify에 기여하려고 하시는군요 - 정말 멋집니다, 당신이 무엇을 할지 기대가 됩니다. 인력과 자금이 제한된 스타트업으로서, 우리는 LLM 애플리케이션을 구축하고 관리하기 위한 가장 직관적인 워크플로우를 설계하고자 하는 큰 야망을 가지고 있습니다. 커뮤니티의 모든 도움은 정말 중요합니다.
우리는 현재 상황에서 민첩하게 빠르게 배포해야 하지만, 동시에 당신과 같은 기여자들이 기여하는 과정에서 최대한 원활한 경험을 얻을 수 있도록 하고 싶습니다. 우리는 이러한 목적으로 이 기여 가이드를 작성했으며, 여러분이 코드베이스와 우리가 기여자들과 어떻게 협업하는지에 대해 친숙해질 수 있도록 돕고, 빠르게 재미있는 부분으로 넘어갈 수 있도록 하고자 합니다.
이 가이드는 Dify 자체와 마찬가지로 끊임없이 진행 중인 작업입니다. 때로는 실제 프로젝트보다 뒤처질 수 있다는 점을 이해해 주시면 감사하겠으며, 개선을 위한 피드백은 언제든지 환영합니다.
라이센스 측면에서, 간략한 [라이센스 및 기여자 동의서](./LICENSE)를 읽어보는 시간을 가져주세요. 커뮤니티는 또한 [행동 강령](https://github.com/langgenius/.github/blob/main/CODE_OF_CONDUCT.md)을 준수합니다.
## 시작하기 전에
처리할 작업을 찾고 계신가요? [초보자를 위한 이슈](https://github.com/langgenius/dify/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22)를 살펴보고 시작할 것을 선택하세요!
추가할 새로운 모델 런타임이나 도구가 있나요? 우리의 [플러그인 저장소](https://github.com/langgenius/dify-plugins)에 PR을 열고 당신이 만든 것을 보여주세요.
기존 모델 런타임, 도구를 업데이트하거나 버그를 수정해야 하나요? 우리의 [공식 플러그인 저장소](https://github.com/langgenius/dify-official-plugins)로 가서 당신의 마법을 펼치세요!
함께 즐기고, 기여하고, 멋진 것을 함께 만들어 봅시다! 💡✨
PR 설명에 기존 이슈를 연결하거나 새 이슈를 여는 것을 잊지 마세요.
### 버그 보고
> [!IMPORTANT]
> 버그 보고서를 제출할 때 다음 정보를 포함해 주세요:
- 명확하고 설명적인 제목
- 오류 메시지를 포함한 버그에 대한 상세한 설명
- 버그를 재현하는 단계
- 예상되는 동작
- 가능한 경우 **로그**, 백엔드 이슈의 경우 매우 중요합니다. docker-compose 로그에서 찾을 수 있습니다
- 해당되는 경우 스크린샷 또는 비디오
우선순위 결정 방법:
| 이슈 유형 | 우선순위 |
| ------------------------------------------------------------ | --------------- |
| 핵심 기능의 버그(클라우드 서비스, 로그인 불가, 애플리케이션 작동 불능, 보안 취약점) | 중대 |
| 비중요 버그, 성능 향상 | 중간 우선순위 |
| 사소한 수정(오타, 혼란스럽지만 작동하는 UI) | 낮은 우선순위 |
### 기능 요청
> [!NOTE]
> 기능 요청을 제출할 때 다음 정보를 포함해 주세요:
- 명확하고 설명적인 제목
- 기능에 대한 상세한 설명
- 해당 기능의 사용 사례
- 기능 요청에 관한 기타 컨텍스트 또는 스크린샷
우선순위 결정 방법:
| 기능 유형 | 우선순위 |
| ------------------------------------------------------------ | --------------- |
| 팀 구성원에 의해 레이블이 지정된 고우선순위 기능 | 높은 우선순위 |
| 우리의 [커뮤니티 피드백 보드](https://github.com/langgenius/dify/discussions/categories/feedbacks)에서 인기 있는 기능 요청 | 중간 우선순위 |
| 비핵심 기능 및 사소한 개선 | 낮은 우선순위 |
| 가치 있지만 즉시 필요하지 않은 기능 | 미래 기능 |
## PR 제출하기
### Pull Request 프로세스
1. 저장소를 포크하세요
2. PR을 작성하기 전에, 변경하고자 하는 내용에 대해 논의하기 위한 이슈를 생성해 주세요
3. 변경 사항을 위한 새 브랜치를 만드세요
4. 변경 사항에 대한 테스트를 적절히 추가해 주세요
5. 코드가 기존 테스트를 통과하는지 확인하세요
6. PR 설명에 이슈를 연결해 주세요, `fixes #<이슈_번호>`
7. 병합 완료!
### 프로젝트 설정하기
#### 프론트엔드
프론트엔드 서비스를 설정하려면, `web/README.md` 파일에 있는 우리의 [종합 가이드](https://github.com/langgenius/dify/blob/main/web/README.md)를 참조하세요. 이 문서는 프론트엔드 환경을 적절히 설정하는 데 도움이 되는 자세한 지침을 제공합니다.
#### 백엔드
백엔드 서비스를 설정하려면, `api/README.md` 파일에 있는 우리의 [상세 지침](https://github.com/langgenius/dify/blob/main/api/README.md)을 참조하세요. 이 문서는 백엔드를 원활하게 실행하는 데 도움이 되는 단계별 가이드를 포함하고 있습니다.
#### 기타 참고 사항
설정을 진행하기 전에 이 문서를 주의 깊게 검토하는 것을 권장합니다. 다음과 같은 필수 정보가 포함되어 있습니다:
- 필수 조건 및 종속성
- 설치 단계
- 구성 세부 정보
- 일반적인 문제 해결 팁
설정 과정에서 문제가 발생하면 언제든지 연락해 주세요.
## 도움 받기
기여하는 동안 막히거나 긴급한 질문이 있으면, 관련 GitHub 이슈를 통해 질문을 보내거나, 빠른 대화를 위해 우리의 [Discord](https://discord.gg/8Tpq4AcN9c)에 참여하세요.

CONTRIBUTING_PT.md Normal file
View File

@@ -0,0 +1,93 @@
# CONTRIBUINDO
Então você está procurando contribuir para o Dify - isso é incrível, mal podemos esperar para ver o que você vai fazer. Como uma startup com equipe e financiamento limitados, temos grandes ambições de projetar o fluxo de trabalho mais intuitivo para construir e gerenciar aplicações LLM. Qualquer ajuda da comunidade conta, verdadeiramente.
Precisamos ser ágeis e entregar rapidamente considerando onde estamos, mas também queremos garantir que colaboradores como você tenham uma experiência o mais tranquila possível ao contribuir. Montamos este guia de contribuição com esse propósito, visando familiarizá-lo com a base de código e como trabalhamos com os colaboradores, para que você possa rapidamente passar para a parte divertida.
Este guia, como o próprio Dify, é um trabalho em constante evolução. Agradecemos muito a sua compreensão se às vezes ele ficar atrasado em relação ao projeto real, e damos as boas-vindas a qualquer feedback para que possamos melhorar.
Em termos de licenciamento, por favor, dedique um minuto para ler nosso breve [Acordo de Licença e Contribuidor](./LICENSE). A comunidade também adere ao [código de conduta](https://github.com/langgenius/.github/blob/main/CODE_OF_CONDUCT.md).
## Antes de começar
Procurando algo para resolver? Navegue por nossos [problemas para iniciantes](https://github.com/langgenius/dify/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) e escolha um para começar!
Tem um novo modelo ou ferramenta para adicionar? Abra um PR em nosso [repositório de plugins](https://github.com/langgenius/dify-plugins) e mostre-nos o que você construiu.
Precisa atualizar um modelo existente, ferramenta ou corrigir alguns bugs? Vá para nosso [repositório oficial de plugins](https://github.com/langgenius/dify-official-plugins) e faça sua mágica!
Junte-se à diversão, contribua e vamos construir algo incrível juntos! 💡✨
Não se esqueça de vincular um problema existente ou abrir um novo problema na descrição do PR.
### Relatórios de bugs
> [!IMPORTANT]
> Por favor, certifique-se de incluir as seguintes informações ao enviar um relatório de bug:
- Um título claro e descritivo
- Uma descrição detalhada do bug, incluindo quaisquer mensagens de erro
- Passos para reproduzir o bug
- Comportamento esperado
- **Logs**, se disponíveis, para problemas de backend, isso é realmente importante, você pode encontrá-los nos logs do docker-compose
- Capturas de tela ou vídeos, se aplicável
Como priorizamos:
| Tipo de Problema | Prioridade |
| ------------------------------------------------------------ | --------------- |
| Bugs em funções centrais (serviço em nuvem, não conseguir fazer login, aplicações não funcionando, falhas de segurança) | Crítica |
| Bugs não críticos, melhorias de desempenho | Prioridade Média |
| Correções menores (erros de digitação, interface confusa mas funcional) | Prioridade Baixa |
### Solicitações de recursos
> [!NOTE]
> Por favor, certifique-se de incluir as seguintes informações ao enviar uma solicitação de recurso:
- Um título claro e descritivo
- Uma descrição detalhada do recurso
- Um caso de uso para o recurso
- Qualquer outro contexto ou capturas de tela sobre a solicitação de recurso
Como priorizamos:
| Tipo de Recurso | Prioridade |
| ------------------------------------------------------------ | --------------- |
| Recursos de alta prioridade conforme rotulado por um membro da equipe | Prioridade Alta |
| Solicitações populares de recursos do nosso [quadro de feedback da comunidade](https://github.com/langgenius/dify/discussions/categories/feedbacks) | Prioridade Média |
| Recursos não essenciais e melhorias menores | Prioridade Baixa |
| Valiosos mas não imediatos | Recurso Futuro |
## Enviando seu PR
### Processo de Pull Request
1. Faça um fork do repositório
2. Antes de elaborar um PR, por favor crie um problema para discutir as mudanças que você quer fazer
3. Crie um novo branch para suas alterações
4. Por favor, adicione testes para suas alterações conforme apropriado
5. Certifique-se de que seu código passa nos testes existentes
6. Por favor, vincule o problema na descrição do PR, `fixes #<número_do_problema>`
7. Faça o merge do seu código!
### Configurando o projeto
#### Frontend
Para configurar o serviço frontend, por favor consulte nosso [guia abrangente](https://github.com/langgenius/dify/blob/main/web/README.md) no arquivo `web/README.md`. Este documento fornece instruções detalhadas para ajudá-lo a configurar o ambiente frontend adequadamente.
#### Backend
Para configurar o serviço backend, por favor consulte nossas [instruções detalhadas](https://github.com/langgenius/dify/blob/main/api/README.md) no arquivo `api/README.md`. Este documento contém um guia passo a passo para ajudá-lo a colocar o backend em funcionamento sem problemas.
#### Outras coisas a observar
Recomendamos revisar este documento cuidadosamente antes de prosseguir com a configuração, pois ele contém informações essenciais sobre:
- Pré-requisitos e dependências
- Etapas de instalação
- Detalhes de configuração
- Dicas comuns de solução de problemas
Sinta-se à vontade para entrar em contato se encontrar quaisquer problemas durante o processo de configuração.
## Obtendo Ajuda
Se você ficar preso ou tiver uma dúvida urgente enquanto contribui, simplesmente envie suas perguntas através do problema relacionado no GitHub, ou entre no nosso [Discord](https://discord.gg/8Tpq4AcN9c) para uma conversa rápida.

CONTRIBUTING_TR.md Normal file
View File

@@ -0,0 +1,93 @@
# KATKIDA BULUNMAK
Demek Dify'a katkıda bulunmak istiyorsunuz - bu harika, ne yapacağınızı görmek için sabırsızlanıyoruz. Sınırlı personel ve finansmana sahip bir startup olarak, LLM uygulamaları oluşturmak ve yönetmek için en sezgisel iş akışını tasarlama konusunda büyük hedeflerimiz var. Topluluktan gelen her türlü yardım gerçekten önemli.
Bulunduğumuz noktada çevik olmamız ve hızlı hareket etmemiz gerekiyor, ancak sizin gibi katkıda bulunanların mümkün olduğunca sorunsuz bir deneyim yaşamasını da sağlamak istiyoruz. Bu katkı rehberini bu amaçla hazırladık; sizi kod tabanıyla ve katkıda bulunanlarla nasıl çalıştığımızla tanıştırmayı, böylece hızlıca eğlenceli kısma geçebilmenizi hedefliyoruz.
Bu rehber, Dify'ın kendisi gibi, sürekli gelişen bir çalışmadır. Bazen gerçek projenin gerisinde kalırsa anlayışınız için çok minnettarız ve gelişmemize yardımcı olacak her türlü geri bildirimi memnuniyetle karşılıyoruz.
Lisanslama konusunda, lütfen kısa [Lisans ve Katkıda Bulunan Anlaşmamızı](./LICENSE) okumak için bir dakikanızı ayırın. Topluluk ayrıca [davranış kurallarına](https://github.com/langgenius/.github/blob/main/CODE_OF_CONDUCT.md) da uyar.
## Başlamadan Önce
Üzerinde çalışacak bir şey mi arıyorsunuz? [İlk katkıda bulunanlar için iyi sorunlarımıza](https://github.com/langgenius/dify/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) göz atın ve başlamak için birini seçin!
Eklenecek harika bir yeni model runtime'ı veya aracınız mı var? [Eklenti depomuzda](https://github.com/langgenius/dify-plugins) bir PR açın ve ne yaptığınızı bize gösterin.
Mevcut bir model runtime'ını, aracı güncellemek veya bazı hataları düzeltmek mi istiyorsunuz? [Resmi eklenti depomuza](https://github.com/langgenius/dify-official-plugins) gidin ve sihrinizi gösterin!
Eğlenceye katılın, katkıda bulunun ve birlikte harika bir şeyler inşa edelim! 💡✨
PR açıklamasında mevcut bir sorunu bağlamayı veya yeni bir sorun açmayı unutmayın.
### Hata Raporları
> [!IMPORTANT]
> Lütfen bir hata raporu gönderirken aşağıdaki bilgileri dahil ettiğinizden emin olun:
- Net ve açıklayıcı bir başlık
- Hata mesajları dahil hatanın ayrıntılı bir açıklaması
- Hatayı tekrarlamak için adımlar
- Beklenen davranış
- Mümkünse **Loglar**, backend sorunları için, bu gerçekten önemlidir, bunları docker-compose loglarında bulabilirsiniz
- Uygunsa ekran görüntüleri veya videolar
Nasıl önceliklendiriyoruz:
| Sorun Türü | Öncelik |
| ------------------------------------------------------------ | --------------- |
| Temel işlevlerdeki hatalar (bulut hizmeti, giriş yapamama, çalışmayan uygulamalar, güvenlik açıkları) | Kritik |
| Kritik olmayan hatalar, performans artışları | Orta Öncelik |
| Küçük düzeltmeler (yazım hataları, kafa karıştırıcı ama çalışan UI) | Düşük Öncelik |
### Özellik İstekleri
> [!NOTE]
> Lütfen bir özellik isteği gönderirken aşağıdaki bilgileri dahil ettiğinizden emin olun:
- Net ve açıklayıcı bir başlık
- Özelliğin ayrıntılı bir açıklaması
- Özellik için bir kullanım durumu
- Özellik isteği hakkında diğer bağlamlar veya ekran görüntüleri
Nasıl önceliklendiriyoruz:
| Özellik Türü | Öncelik |
| ------------------------------------------------------------ | --------------- |
| Bir ekip üyesi tarafından etiketlenen Yüksek Öncelikli Özellikler | Yüksek Öncelik |
| [Topluluk geri bildirim panosundan](https://github.com/langgenius/dify/discussions/categories/feedbacks) popüler özellik istekleri | Orta Öncelik |
| Temel olmayan özellikler ve küçük geliştirmeler | Düşük Öncelik |
| Değerli ama acil olmayan | Gelecek-Özellik |
## PR'nizi Göndermek
### Pull Request Süreci
1. Depoyu fork edin
2. Bir PR taslağı oluşturmadan önce, yapmak istediğiniz değişiklikleri tartışmak için lütfen bir sorun oluşturun
3. Değişiklikleriniz için yeni bir dal oluşturun
4. Lütfen değişiklikleriniz için uygun testler ekleyin
5. Kodunuzun mevcut testleri geçtiğinden emin olun
6. Lütfen PR açıklamasında sorunu bağlayın, `fixes #<sorun_numarası>`
7. Kodunuzu birleştirin!
### Projeyi Kurma
#### Frontend
Frontend hizmetini kurmak için, lütfen `web/README.md` dosyasındaki kapsamlı [rehberimize](https://github.com/langgenius/dify/blob/main/web/README.md) bakın. Bu belge, frontend ortamını düzgün bir şekilde kurmanıza yardımcı olacak ayrıntılı talimatlar sağlar.
#### Backend
Backend hizmetini kurmak için, lütfen `api/README.md` dosyasındaki detaylı [talimatlarımıza](https://github.com/langgenius/dify/blob/main/api/README.md) bakın. Bu belge, backend'i sorunsuz bir şekilde çalıştırmanıza yardımcı olacak adım adım bir kılavuz içerir.
#### Dikkat Edilecek Diğer Şeyler
Kuruluma geçmeden önce bu belgeyi dikkatlice incelemenizi öneririz, çünkü şunlar hakkında temel bilgiler içerir:
- Ön koşullar ve bağımlılıklar
- Kurulum adımları
- Yapılandırma detayları
- Yaygın sorun giderme ipuçları
Kurulum süreci sırasında herhangi bir sorunla karşılaşırsanız bizimle iletişime geçmekten çekinmeyin.
## Yardım Almak
Katkıda bulunurken takılırsanız veya yanıcı bir sorunuz olursa, sorularınızı ilgili GitHub sorunu aracılığıyla bize gönderin veya hızlı bir sohbet için [Discord'umuza](https://discord.gg/8Tpq4AcN9c) katılın.

View File

@@ -10,8 +10,6 @@ a. Multi-tenant service: Unless explicitly authorized by Dify in writing, you ma
 b. LOGO and copyright information: In the process of using Dify's frontend, you may not remove or modify the LOGO or copyright information in the Dify console or applications. This restriction is inapplicable to uses of Dify that do not involve its frontend.
 - Frontend Definition: For the purposes of this license, the "frontend" of Dify includes all components located in the `web/` directory when running Dify from the raw source code, or the "web" image when running Dify with Docker.

-Please contact business@dify.ai by email to inquire about licensing matters.
-
 2. As a contributor, you should agree that:

 a. The producer can adjust the open-source agreement to be more strict or relaxed as deemed necessary.

View File

@@ -137,7 +137,7 @@ WEB_API_CORS_ALLOW_ORIGINS=http://127.0.0.1:3000,*
 CONSOLE_CORS_ALLOW_ORIGINS=http://127.0.0.1:3000,*

 # Vector database configuration
-# support: weaviate, qdrant, milvus, myscale, relyt, pgvecto_rs, pgvector, pgvector, chroma, opensearch, tidb_vector, couchbase, vikingdb, upstash, lindorm, oceanbase, opengauss
+# support: weaviate, qdrant, milvus, myscale, relyt, pgvecto_rs, pgvector, pgvector, chroma, opensearch, tidb_vector, couchbase, vikingdb, upstash, lindorm, oceanbase, opengauss, tablestore
 VECTOR_STORE=weaviate

 # Weaviate configuration
@@ -212,6 +212,12 @@ PGVECTOR_DATABASE=postgres
 PGVECTOR_MIN_CONNECTION=1
 PGVECTOR_MAX_CONNECTION=5

+# TableStore Vector configuration
+TABLESTORE_ENDPOINT=https://instance-name.cn-hangzhou.ots.aliyuncs.com
+TABLESTORE_INSTANCE_NAME=instance-name
+TABLESTORE_ACCESS_KEY_ID=xxx
+TABLESTORE_ACCESS_KEY_SECRET=xxx
+
 # Tidb Vector configuration
 TIDB_VECTOR_HOST=xxx.eu-central-1.xxx.aws.tidbcloud.com
 TIDB_VECTOR_PORT=4000
@@ -297,6 +303,7 @@ OCEANBASE_VECTOR_USER=root@test
 OCEANBASE_VECTOR_PASSWORD=difyai123456
 OCEANBASE_VECTOR_DATABASE=test
 OCEANBASE_MEMORY_LIMIT=6G
+OCEANBASE_ENABLE_HYBRID_SEARCH=false

 # openGauss configuration
 OPENGAUSS_HOST=127.0.0.1

View File

@@ -276,6 +276,7 @@ def migrate_knowledge_vector_database():
             VectorType.ORACLE,
             VectorType.ELASTICSEARCH,
             VectorType.OPENGAUSS,
+            VectorType.TABLESTORE,
         }
         lower_collection_vector_types = {
             VectorType.ANALYTICDB,

View File

@@ -1,6 +1,6 @@
 from typing import Optional

-from pydantic import Field, NonNegativeInt, computed_field
+from pydantic import Field, NonNegativeInt
 from pydantic_settings import BaseSettings

View File

@@ -33,6 +33,7 @@ from .vdb.pgvector_config import PGVectorConfig
 from .vdb.pgvectors_config import PGVectoRSConfig
 from .vdb.qdrant_config import QdrantConfig
 from .vdb.relyt_config import RelytConfig
+from .vdb.tablestore_config import TableStoreConfig
 from .vdb.tencent_vector_config import TencentVectorDBConfig
 from .vdb.tidb_on_qdrant_config import TidbOnQdrantConfig
 from .vdb.tidb_vector_config import TiDBVectorConfig
@@ -283,5 +284,6 @@ class MiddlewareConfig(
     OceanBaseVectorConfig,
     BaiduVectorDBConfig,
     OpenGaussConfig,
+    TableStoreConfig,
 ):
     pass

View File

@@ -33,3 +33,9 @@ class OceanBaseVectorConfig(BaseSettings):
         description="Name of the OceanBase Vector database to connect to",
         default=None,
     )
+
+    OCEANBASE_ENABLE_HYBRID_SEARCH: bool = Field(
+        description="Enable hybrid search features (requires OceanBase >= 4.3.5.1). Set to false for compatibility "
+        "with older versions",
+        default=False,
+    )

View File

@@ -43,3 +43,8 @@ class OpenGaussConfig(BaseSettings):
         description="Max connection of the OpenGauss database",
         default=5,
     )
+
+    OPENGAUSS_ENABLE_PQ: bool = Field(
+        description="Enable openGauss PQ acceleration feature",
+        default=False,
+    )

View File

@@ -0,0 +1,30 @@
from typing import Optional

from pydantic import Field
from pydantic_settings import BaseSettings


class TableStoreConfig(BaseSettings):
    """
    Configuration settings for TableStore.
    """

    TABLESTORE_ENDPOINT: Optional[str] = Field(
        description="Endpoint address of the TableStore server (e.g. 'https://instance-name.cn-hangzhou.ots.aliyuncs.com')",
        default=None,
    )

    TABLESTORE_INSTANCE_NAME: Optional[str] = Field(
        description="Instance name to access TableStore server (eg. 'instance-name')",
        default=None,
    )

    TABLESTORE_ACCESS_KEY_ID: Optional[str] = Field(
        description="AccessKey id for the instance name",
        default=None,
    )

    TABLESTORE_ACCESS_KEY_SECRET: Optional[str] = Field(
        description="AccessKey secret for the instance name",
        default=None,
    )
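For illustration, a minimal usage sketch (not part of this commit), assuming the standard pydantic-settings behavior of populating fields from identically named environment variables and falling back to the declared defaults:

import os

# Assuming TableStoreConfig (defined above) is importable in scope.
# Hypothetical values, for illustration only.
os.environ["TABLESTORE_ENDPOINT"] = "https://instance-name.cn-hangzhou.ots.aliyuncs.com"
os.environ["TABLESTORE_INSTANCE_NAME"] = "instance-name"

config = TableStoreConfig()
# Variables left unset keep their declared defaults (None here).
print(config.TABLESTORE_ENDPOINT, config.TABLESTORE_ACCESS_KEY_ID)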

View File

@@ -9,7 +9,7 @@ class PackagingInfo(BaseSettings):
     CURRENT_VERSION: str = Field(
         description="Dify version",
-        default="1.1.2",
+        default="1.1.3",
     )

     COMMIT_SHA: str = Field(

View File

@@ -1,5 +1,3 @@
-from typing import Optional
-
 from pydantic import Field

 from .apollo import ApolloSettingsSourceInfo

View File

@@ -646,7 +646,6 @@ class DatasetRetrievalSettingApi(Resource):
                 | VectorType.BAIDU
                 | VectorType.VIKINGDB
                 | VectorType.UPSTASH
-                | VectorType.OCEANBASE
             ):
                 return {"retrieval_method": [RetrievalMethod.SEMANTIC_SEARCH.value]}
             case (
@@ -664,6 +663,8 @@ class DatasetRetrievalSettingApi(Resource):
                 | VectorType.COUCHBASE
                 | VectorType.MILVUS
                 | VectorType.OPENGAUSS
+                | VectorType.OCEANBASE
+                | VectorType.TABLESTORE
             ):
                 return {
                     "retrieval_method": [
@@ -692,7 +693,6 @@ class DatasetRetrievalSettingMockApi(Resource):
                 | VectorType.BAIDU
                 | VectorType.VIKINGDB
                 | VectorType.UPSTASH
-                | VectorType.OCEANBASE
             ):
                 return {"retrieval_method": [RetrievalMethod.SEMANTIC_SEARCH.value]}
             case (
@@ -708,6 +708,8 @@ class DatasetRetrievalSettingMockApi(Resource):
                 | VectorType.PGVECTOR
                 | VectorType.LINDORM
                 | VectorType.OPENGAUSS
+                | VectorType.OCEANBASE
+                | VectorType.TABLESTORE
             ):
                 return {
                     "retrieval_method": [

View File

@@ -6,6 +6,7 @@ from controllers.console.wraps import setup_required
 from controllers.inner_api import api
 from controllers.inner_api.wraps import enterprise_inner_api_only
 from events.tenant_event import tenant_was_created
+from extensions.ext_database import db
 from models.account import Account
 from services.account_service import TenantService
@@ -19,7 +20,7 @@ class EnterpriseWorkspace(Resource):
         parser.add_argument("owner_email", type=str, required=True, location="json")
         args = parser.parse_args()

-        account = Account.query.filter_by(email=args["owner_email"]).first()
+        account = db.session.query(Account).filter_by(email=args["owner_email"]).first()

         if account is None:
             return {"message": "owner account not found."}, 404

View File

@@ -27,6 +27,7 @@ from core.model_runtime.errors.invoke import InvokeError
 from extensions.ext_database import db
 from fields.workflow_app_log_fields import workflow_app_log_pagination_fields
 from libs import helper
+from libs.helper import TimestampField
 from models.model import App, AppMode, EndUser
 from models.workflow import WorkflowRun, WorkflowRunStatus
 from services.app_generate_service import AppGenerateService
@@ -44,8 +45,8 @@ workflow_run_fields = {
     "error": fields.String,
     "total_steps": fields.Integer,
     "total_tokens": fields.Integer,
-    "created_at": fields.DateTime,
-    "finished_at": fields.DateTime,
+    "created_at": TimestampField,
+    "finished_at": TimestampField,
     "elapsed_time": fields.Float,
 }
@@ -53,7 +54,7 @@ workflow_run_fields = {
 class WorkflowRunDetailApi(Resource):
     @validate_app_token
     @marshal_with(workflow_run_fields)
-    def get(self, app_model: App, workflow_id: str):
+    def get(self, app_model: App, workflow_run_id: str):
         """
         Get a workflow task running detail
         """
@@ -61,7 +62,7 @@ class WorkflowRunDetailApi(Resource):
         if app_mode != AppMode.WORKFLOW:
             raise NotWorkflowAppError()

-        workflow_run = db.session.query(WorkflowRun).filter(WorkflowRun.id == workflow_id).first()
+        workflow_run = db.session.query(WorkflowRun).filter(WorkflowRun.id == workflow_run_id).first()

         return workflow_run
@@ -162,6 +163,6 @@ class WorkflowAppLogApi(Resource):

 api.add_resource(WorkflowRunApi, "/workflows/run")
-api.add_resource(WorkflowRunDetailApi, "/workflows/run/<string:workflow_id>")
+api.add_resource(WorkflowRunDetailApi, "/workflows/run/<string:workflow_run_id>")
 api.add_resource(WorkflowTaskStopApi, "/workflows/tasks/<string:task_id>/stop")
 api.add_resource(WorkflowAppLogApi, "/workflows/logs")
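The switch from fields.DateTime to TimestampField changes how created_at and finished_at are serialized: as Unix timestamps rather than datetime strings. As a hedged sketch only (the real helper lives in libs.helper and may differ), a flask-restful field with that behavior can be as small as:

from flask_restful import fields


class TimestampField(fields.Raw):
    # Marshal a datetime as an integer Unix timestamp (seconds) instead of an ISO string.
    def format(self, value):
        return int(value.timestamp())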

View File

@@ -142,6 +142,7 @@ class DatasetApi(DatasetApiResource):
         Deletes a dataset given its ID.

         Args:
+            _: ignore
             dataset_id (UUID): The ID of the dataset to be deleted.

         Returns:

View File

@@ -1,3 +1,4 @@
+from flask import request
 from flask_login import current_user  # type: ignore
 from flask_restful import marshal, reqparse  # type: ignore
 from werkzeug.exceptions import NotFound
@@ -13,10 +14,20 @@ from core.errors.error import LLMBadRequestError, ProviderTokenNotInitError
 from core.model_manager import ModelManager
 from core.model_runtime.entities.model_entities import ModelType
 from extensions.ext_database import db
-from fields.segment_fields import segment_fields
-from models.dataset import Dataset, DocumentSegment
+from fields.segment_fields import child_chunk_fields, segment_fields
+from models.dataset import Dataset
 from services.dataset_service import DatasetService, DocumentService, SegmentService
 from services.entities.knowledge_entities.knowledge_entities import SegmentUpdateArgs
+from services.errors.chunk import (
+    ChildChunkDeleteIndexError,
+    ChildChunkIndexingError,
+)
+from services.errors.chunk import (
+    ChildChunkDeleteIndexError as ChildChunkDeleteIndexServiceError,
+)
+from services.errors.chunk import (
+    ChildChunkIndexingError as ChildChunkIndexingServiceError,
+)


 class SegmentApi(DatasetApiResource):
@@ -70,10 +81,12 @@ class SegmentApi(DatasetApiResource):
             return {"error": "Segments is required"}, 400

     def get(self, tenant_id, dataset_id, document_id):
-        """Create single segment."""
+        """Get segments."""
         # check dataset
         dataset_id = str(dataset_id)
         tenant_id = str(tenant_id)
+        page = request.args.get("page", default=1, type=int)
+        limit = request.args.get("limit", default=20, type=int)
         dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
         if not dataset:
             raise NotFound("Dataset not found.")
@@ -107,19 +120,23 @@ class SegmentApi(DatasetApiResource):
         status_list = args["status"]
         keyword = args["keyword"]

-        query = DocumentSegment.query.filter(
-            DocumentSegment.document_id == str(document_id), DocumentSegment.tenant_id == current_user.current_tenant_id
+        segments, total = SegmentService.get_segments(
+            document_id=document_id,
+            tenant_id=current_user.current_tenant_id,
+            status_list=args["status"],
+            keyword=args["keyword"],
         )

-        if status_list:
-            query = query.filter(DocumentSegment.status.in_(status_list))
+        response = {
+            "data": marshal(segments, segment_fields),
+            "doc_form": document.doc_form,
+            "total": total,
+            "has_more": len(segments) == limit,
+            "limit": limit,
+            "page": page,
+        }

-        if keyword:
-            query = query.where(DocumentSegment.content.ilike(f"%{keyword}%"))
-
-        total = query.count()
-        segments = query.order_by(DocumentSegment.position).all()
-
-        return {"data": marshal(segments, segment_fields), "doc_form": document.doc_form, "total": total}, 200
+        return response, 200


 class DatasetSegmentApi(DatasetApiResource):
@@ -138,9 +155,8 @@ class DatasetSegmentApi(DatasetApiResource):
         if not document:
             raise NotFound("Document not found.")
         # check segment
-        segment = DocumentSegment.query.filter(
-            DocumentSegment.id == str(segment_id), DocumentSegment.tenant_id == current_user.current_tenant_id
-        ).first()
+        segment_id = str(segment_id)
+        segment = SegmentService.get_segment_by_id(segment_id=segment_id, tenant_id=current_user.current_tenant_id)
         if not segment:
             raise NotFound("Segment not found.")
         SegmentService.delete_segment(segment, document, dataset)
@@ -179,9 +195,7 @@ class DatasetSegmentApi(DatasetApiResource):
             raise ProviderNotInitializeError(ex.description)
         # check segment
         segment_id = str(segment_id)
-        segment = DocumentSegment.query.filter(
-            DocumentSegment.id == str(segment_id), DocumentSegment.tenant_id == current_user.current_tenant_id
-        ).first()
+        segment = SegmentService.get_segment_by_id(segment_id=segment_id, tenant_id=current_user.current_tenant_id)
         if not segment:
             raise NotFound("Segment not found.")

@@ -190,12 +204,200 @@ class DatasetSegmentApi(DatasetApiResource):
         parser.add_argument("segment", type=dict, required=False, nullable=True, location="json")
         args = parser.parse_args()

-        SegmentService.segment_create_args_validate(args["segment"], document)
-        segment = SegmentService.update_segment(SegmentUpdateArgs(**args["segment"]), segment, document, dataset)
-        return {"data": marshal(segment, segment_fields), "doc_form": document.doc_form}, 200
+        updated_segment = SegmentService.update_segment(
+            SegmentUpdateArgs(**args["segment"]), segment, document, dataset
+        )
+        return {"data": marshal(updated_segment, segment_fields), "doc_form": document.doc_form}, 200
+
+
+class ChildChunkApi(DatasetApiResource):
+    """Resource for child chunks."""
+
+    @cloud_edition_billing_resource_check("vector_space", "dataset")
+    @cloud_edition_billing_knowledge_limit_check("add_segment", "dataset")
+    def post(self, tenant_id, dataset_id, document_id, segment_id):
+        """Create child chunk."""
+        # check dataset
+        dataset_id = str(dataset_id)
+        tenant_id = str(tenant_id)
+        dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
+        if not dataset:
+            raise NotFound("Dataset not found.")
+        # check document
+        document_id = str(document_id)
+        document = DocumentService.get_document(dataset.id, document_id)
+        if not document:
+            raise NotFound("Document not found.")
+        # check segment
+        segment_id = str(segment_id)
+        segment = SegmentService.get_segment_by_id(segment_id=segment_id, tenant_id=current_user.current_tenant_id)
+        if not segment:
+            raise NotFound("Segment not found.")
+        # check embedding model setting
+        if dataset.indexing_technique == "high_quality":
+            try:
+                model_manager = ModelManager()
+                model_manager.get_model_instance(
+                    tenant_id=current_user.current_tenant_id,
+                    provider=dataset.embedding_model_provider,
+                    model_type=ModelType.TEXT_EMBEDDING,
+                    model=dataset.embedding_model,
+                )
+            except LLMBadRequestError:
+                raise ProviderNotInitializeError(
+                    "No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
+                )
+            except ProviderTokenNotInitError as ex:
+                raise ProviderNotInitializeError(ex.description)
+        # validate args
+        parser = reqparse.RequestParser()
+        parser.add_argument("content", type=str, required=True, nullable=False, location="json")
+        args = parser.parse_args()
+        try:
+            child_chunk = SegmentService.create_child_chunk(args.get("content"), segment, document, dataset)
+        except ChildChunkIndexingServiceError as e:
+            raise ChildChunkIndexingError(str(e))
+        return {"data": marshal(child_chunk, child_chunk_fields)}, 200
+
+    def get(self, tenant_id, dataset_id, document_id, segment_id):
+        """Get child chunks."""
+        # check dataset
+        dataset_id = str(dataset_id)
+        tenant_id = str(tenant_id)
+        dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
+        if not dataset:
+            raise NotFound("Dataset not found.")
+        # check document
+        document_id = str(document_id)
+        document = DocumentService.get_document(dataset.id, document_id)
+        if not document:
+            raise NotFound("Document not found.")
+        # check segment
+        segment_id = str(segment_id)
+        segment = SegmentService.get_segment_by_id(segment_id=segment_id, tenant_id=current_user.current_tenant_id)
+        if not segment:
+            raise NotFound("Segment not found.")
+        parser = reqparse.RequestParser()
+        parser.add_argument("limit", type=int, default=20, location="args")
+        parser.add_argument("keyword", type=str, default=None, location="args")
+        parser.add_argument("page", type=int, default=1, location="args")
+        args = parser.parse_args()
+        page = args["page"]
+        limit = min(args["limit"], 100)
+        keyword = args["keyword"]
+        child_chunks = SegmentService.get_child_chunks(segment_id, document_id, dataset_id, page, limit, keyword)
+        return {
+            "data": marshal(child_chunks.items, child_chunk_fields),
+            "total": child_chunks.total,
+            "total_pages": child_chunks.pages,
+            "page": page,
+            "limit": limit,
+        }, 200
+
+
+class DatasetChildChunkApi(DatasetApiResource):
+    """Resource for updating child chunks."""
+
+    @cloud_edition_billing_knowledge_limit_check("add_segment", "dataset")
+    def delete(self, tenant_id, dataset_id, document_id, segment_id, child_chunk_id):
+        """Delete child chunk."""
+        # check dataset
+        dataset_id = str(dataset_id)
+        tenant_id = str(tenant_id)
+        dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
+        if not dataset:
+            raise NotFound("Dataset not found.")
+        # check document
+        document_id = str(document_id)
+        document = DocumentService.get_document(dataset.id, document_id)
+        if not document:
+            raise NotFound("Document not found.")
+        # check segment
+        segment_id = str(segment_id)
+        segment = SegmentService.get_segment_by_id(segment_id=segment_id, tenant_id=current_user.current_tenant_id)
+        if not segment:
+            raise NotFound("Segment not found.")
+        # check child chunk
+        child_chunk_id = str(child_chunk_id)
+        child_chunk = SegmentService.get_child_chunk_by_id(
+            child_chunk_id=child_chunk_id, tenant_id=current_user.current_tenant_id
+        )
+        if not child_chunk:
+            raise NotFound("Child chunk not found.")
+        try:
+            SegmentService.delete_child_chunk(child_chunk, dataset)
+        except ChildChunkDeleteIndexServiceError as e:
+            raise ChildChunkDeleteIndexError(str(e))
+        return {"result": "success"}, 200
+
+    @cloud_edition_billing_resource_check("vector_space", "dataset")
+    @cloud_edition_billing_knowledge_limit_check("add_segment", "dataset")
+    def patch(self, tenant_id, dataset_id, document_id, segment_id, child_chunk_id):
+        """Update child chunk."""
+        # check dataset
+        dataset_id = str(dataset_id)
+        tenant_id = str(tenant_id)
+        dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
+        if not dataset:
+            raise NotFound("Dataset not found.")
+        # get document
+        document = DocumentService.get_document(dataset_id, document_id)
+        if not document:
+            raise NotFound("Document not found.")
+        # get segment
+        segment = SegmentService.get_segment_by_id(segment_id=segment_id, tenant_id=current_user.current_tenant_id)
+        if not segment:
+            raise NotFound("Segment not found.")
+        # get child chunk
+        child_chunk = SegmentService.get_child_chunk_by_id(
+            child_chunk_id=child_chunk_id, tenant_id=current_user.current_tenant_id
+        )
+        if not child_chunk:
+            raise NotFound("Child chunk not found.")
+        # validate args
+        parser = reqparse.RequestParser()
+        parser.add_argument("content", type=str, required=True, nullable=False, location="json")
+        args = parser.parse_args()
+        try:
+            child_chunk = SegmentService.update_child_chunk(
+                args.get("content"), child_chunk, segment, document, dataset
+            )
+        except ChildChunkIndexingServiceError as e:
+            raise ChildChunkIndexingError(str(e))
+        return {"data": marshal(child_chunk, child_chunk_fields)}, 200
 api.add_resource(SegmentApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/segments")
 api.add_resource(
     DatasetSegmentApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/segments/<uuid:segment_id>"
 )
+api.add_resource(
+    ChildChunkApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/segments/<uuid:segment_id>/child_chunks"
+)
+api.add_resource(
+    DatasetChildChunkApi,
+    "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/segments/<uuid:segment_id>/child_chunks/<uuid:child_chunk_id>",
+)
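For illustration, a hedged client-side sketch of the child-chunk endpoints registered above; the base URL, the placeholder IDs, and the Bearer auth header are assumptions rather than values taken from this commit:

import requests

BASE = "https://api.example.com/v1"  # assumed service API base URL
HEADERS = {"Authorization": "Bearer <dataset-api-key>"}  # assumed auth scheme
SEGMENT = f"{BASE}/datasets/<dataset_id>/documents/<document_id>/segments/<segment_id>"

# Create a child chunk under a segment (ChildChunkApi.post expects a JSON "content" field).
created = requests.post(f"{SEGMENT}/child_chunks", headers=HEADERS, json={"content": "A new child chunk."})

# List child chunks with the paging arguments accepted by ChildChunkApi.get.
listed = requests.get(f"{SEGMENT}/child_chunks", headers=HEADERS, params={"page": 1, "limit": 20})

print(created.status_code, listed.json().get("total"))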

View File

@@ -332,7 +332,7 @@ class BaseAgentRunner(AppRunner):
             agent_thought = updated_agent_thought

         if thought:
-            agent_thought.thought = thought
+            agent_thought.thought += thought

         if tool_name:
             agent_thought.tool = tool_name

View File

@@ -16,7 +16,6 @@ class ModelConfigConverter:
         """
         Convert app model config dict to entity.
         :param app_config: app config
-        :param skip_check: skip check
         :raises ProviderTokenNotInitError: provider token not init error
         :return: app orchestration config entity
         """

View File

@@ -88,7 +88,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
         :param user: account or end user
         :param args: request args
         :param invoke_from: invoke from source
-        :param stream: is stream
+        :param streaming: is stream
         """
         if not args.get("query"):
             raise ValueError("query is required")
@@ -181,10 +181,10 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
         :param app_model: App
         :param workflow: Workflow
+        :param node_id: the node id
         :param user: account or end user
         :param args: request args
-        :param invoke_from: invoke from source
-        :param stream: is stream
+        :param streaming: is streamed
         """
         if not node_id:
             raise ValueError("node_id is required")
@@ -238,10 +238,10 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
         :param app_model: App
         :param workflow: Workflow
+        :param node_id: the node id
         :param user: account or end user
         :param args: request args
-        :param invoke_from: invoke from source
-        :param stream: is stream
+        :param streaming: is stream
         """
         if not node_id:
             raise ValueError("node_id is required")

View File

@@ -80,7 +80,7 @@ class AgentChatAppGenerator(MessageBasedAppGenerator):
         :param user: account or end user
         :param args: request args
         :param invoke_from: invoke from source
-        :param stream: is stream
+        :param streaming: is stream
         """
         if not streaming:
             raise ValueError("Agent Chat App does not support blocking mode")

View File

@@ -157,6 +157,7 @@ class AppRunner:
         :param files: files
         :param query: query
         :param memory: memory
+        :param image_detail_config: the image quality config
         :return:
         """
         # get prompt without memory and context

View File

@@ -76,7 +76,7 @@ class ChatAppGenerator(MessageBasedAppGenerator):
         :param user: account or end user
         :param args: request args
         :param invoke_from: invoke from source
-        :param stream: is stream
+        :param streaming: is stream
         """
         if not args.get("query"):
             raise ValueError("query is required")

View File

@@ -74,7 +74,7 @@ class CompletionAppGenerator(MessageBasedAppGenerator):
         :param user: account or end user
         :param args: request args
         :param invoke_from: invoke from source
-        :param stream: is stream
+        :param streaming: is stream
         """
         query = args["query"]
         if not isinstance(query, str):

View File

@ -148,6 +148,13 @@ class MessageBasedAppGenerator(BaseAppGenerator):
# get conversation introduction # get conversation introduction
introduction = self._get_conversation_introduction(application_generate_entity) introduction = self._get_conversation_introduction(application_generate_entity)
# get conversation name
if isinstance(application_generate_entity, AdvancedChatAppGenerateEntity):
query = application_generate_entity.query or "New conversation"
else:
query = next(iter(application_generate_entity.inputs.values()), "New conversation")
conversation_name = (query[:20] + "…") if len(query) > 20 else query
if not conversation: if not conversation:
conversation = Conversation( conversation = Conversation(
app_id=app_config.app_id, app_id=app_config.app_id,
@ -156,7 +163,7 @@ class MessageBasedAppGenerator(BaseAppGenerator):
model_id=model_id, model_id=model_id,
override_model_configs=json.dumps(override_model_configs) if override_model_configs else None, override_model_configs=json.dumps(override_model_configs) if override_model_configs else None,
mode=app_config.app_mode.value, mode=app_config.app_mode.value,
name="New conversation", name=conversation_name,
inputs=application_generate_entity.inputs, inputs=application_generate_entity.inputs,
introduction=introduction, introduction=introduction,
system_instruction="", system_instruction="",

View File

@ -158,7 +158,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
:param user: account or end user :param user: account or end user
:param application_generate_entity: application generate entity :param application_generate_entity: application generate entity
:param invoke_from: invoke from source :param invoke_from: invoke from source
:param stream: is stream :param streaming: is stream
:param workflow_thread_pool_id: workflow thread pool id :param workflow_thread_pool_id: workflow thread pool id
""" """
# init queue manager # init queue manager
@ -208,10 +208,10 @@ class WorkflowAppGenerator(BaseAppGenerator):
:param app_model: App :param app_model: App
:param workflow: Workflow :param workflow: Workflow
:param node_id: the node id
:param user: account or end user :param user: account or end user
:param args: request args :param args: request args
:param invoke_from: invoke from source :param streaming: is streamed
:param stream: is stream
""" """
if not node_id: if not node_id:
raise ValueError("node_id is required") raise ValueError("node_id is required")
@ -264,10 +264,10 @@ class WorkflowAppGenerator(BaseAppGenerator):
:param app_model: App :param app_model: App
:param workflow: Workflow :param workflow: Workflow
:param node_id: the node id
:param user: account or end user :param user: account or end user
:param args: request args :param args: request args
:param invoke_from: invoke from source :param streaming: is streamed
:param stream: is stream
""" """
if not node_id: if not node_id:
raise ValueError("node_id is required") raise ValueError("node_id is required")

View File

@ -44,9 +44,6 @@ class WorkflowAppRunner(WorkflowBasedAppRunner):
def run(self) -> None: def run(self) -> None:
""" """
Run application Run application
:param application_generate_entity: application generate entity
:param queue_manager: application queue manager
:return:
""" """
app_config = self.application_generate_entity.app_config app_config = self.application_generate_entity.app_config
app_config = cast(WorkflowAppConfig, app_config) app_config = cast(WorkflowAppConfig, app_config)

View File

@ -48,7 +48,7 @@ class MessageCycleManage:
def _generate_conversation_name(self, *, conversation_id: str, query: str) -> Optional[Thread]: def _generate_conversation_name(self, *, conversation_id: str, query: str) -> Optional[Thread]:
""" """
Generate conversation name. Generate conversation name.
:param conversation: conversation :param conversation_id: conversation id
:param query: query :param query: query
:return: thread :return: thread
""" """

View File

@ -44,6 +44,7 @@ from core.app.entities.task_entities import (
WorkflowFinishStreamResponse, WorkflowFinishStreamResponse,
WorkflowStartStreamResponse, WorkflowStartStreamResponse,
) )
from core.app.task_pipeline.exc import WorkflowRunNotFoundError
from core.file import FILE_MODEL_IDENTITY, File from core.file import FILE_MODEL_IDENTITY, File
from core.model_runtime.utils.encoders import jsonable_encoder from core.model_runtime.utils.encoders import jsonable_encoder
from core.ops.entities.trace_entity import TraceTaskName from core.ops.entities.trace_entity import TraceTaskName
@ -66,8 +67,6 @@ from models.workflow import (
WorkflowRunStatus, WorkflowRunStatus,
) )
from .exc import WorkflowRunNotFoundError
class WorkflowCycleManage: class WorkflowCycleManage:
def __init__( def __init__(
@ -154,7 +153,7 @@ class WorkflowCycleManage:
) -> WorkflowRun: ) -> WorkflowRun:
""" """
Workflow run success Workflow run success
:param workflow_run: workflow run :param workflow_run_id: workflow run id
:param start_at: start time :param start_at: start time
:param total_tokens: total tokens :param total_tokens: total tokens
:param total_steps: total steps :param total_steps: total steps
@ -166,7 +165,7 @@ class WorkflowCycleManage:
outputs = WorkflowEntry.handle_special_values(outputs) outputs = WorkflowEntry.handle_special_values(outputs)
workflow_run.status = WorkflowRunStatus.SUCCEEDED.value workflow_run.status = WorkflowRunStatus.SUCCEEDED
workflow_run.outputs = json.dumps(outputs or {}) workflow_run.outputs = json.dumps(outputs or {})
workflow_run.elapsed_time = time.perf_counter() - start_at workflow_run.elapsed_time = time.perf_counter() - start_at
workflow_run.total_tokens = total_tokens workflow_run.total_tokens = total_tokens
@ -201,7 +200,7 @@ class WorkflowCycleManage:
workflow_run = self._get_workflow_run(session=session, workflow_run_id=workflow_run_id) workflow_run = self._get_workflow_run(session=session, workflow_run_id=workflow_run_id)
outputs = WorkflowEntry.handle_special_values(dict(outputs) if outputs else None) outputs = WorkflowEntry.handle_special_values(dict(outputs) if outputs else None)
workflow_run.status = WorkflowRunStatus.PARTIAL_SUCCESSED.value workflow_run.status = WorkflowRunStatus.PARTIAL_SUCCEEDED.value
workflow_run.outputs = json.dumps(outputs or {}) workflow_run.outputs = json.dumps(outputs or {})
workflow_run.elapsed_time = time.perf_counter() - start_at workflow_run.elapsed_time = time.perf_counter() - start_at
workflow_run.total_tokens = total_tokens workflow_run.total_tokens = total_tokens
@ -237,7 +236,7 @@ class WorkflowCycleManage:
) -> WorkflowRun: ) -> WorkflowRun:
""" """
Workflow run failed Workflow run failed
:param workflow_run: workflow run :param workflow_run_id: workflow run id
:param start_at: start time :param start_at: start time
:param total_tokens: total tokens :param total_tokens: total tokens
:param total_steps: total steps :param total_steps: total steps

View File

@ -4,12 +4,10 @@ import time
from typing import Optional from typing import Optional
from configs import dify_config from configs import dify_config
from constants import IMAGE_EXTENSIONS
from core.helper.url_signer import UrlSigner from core.helper.url_signer import UrlSigner
from extensions.ext_storage import storage from extensions.ext_storage import storage
IMAGE_EXTENSIONS = ["jpg", "jpeg", "png", "webp", "gif", "svg"]
IMAGE_EXTENSIONS.extend([ext.upper() for ext in IMAGE_EXTENSIONS])
class UploadFileParser: class UploadFileParser:
@classmethod @classmethod
@ -38,7 +36,7 @@ class UploadFileParser:
""" """
get signed url from upload file get signed url from upload file
:param upload_file: UploadFile object :param upload_file_id: the id of UploadFile object
:return: :return:
""" """
base_url = dify_config.FILES_URL base_url = dify_config.FILES_URL

View File

@ -60,6 +60,7 @@ class CodeExecutor:
""" """
Execute code Execute code
:param language: code language :param language: code language
:param preload: the preload script
:param code: code :param code: code
:return: :return:
""" """

View File

@ -53,7 +53,7 @@ def pin_position_map(original_position_map: dict[str, int], pin_list: list[str])
""" """
Pin the items in the pin list to the beginning of the position map. Pin the items in the pin list to the beginning of the position map.
Overall logic: exclude > include > pin Overall logic: exclude > include > pin
:param position_map: the position map to be sorted and filtered :param original_position_map: the position map to be sorted and filtered
:param pin_list: the list of pins to be put at the beginning :param pin_list: the list of pins to be put at the beginning
:return: the sorted position map :return: the sorted position map
""" """

View File

@ -38,12 +38,7 @@ class ToolParameterCache:
return None return None
def set(self, parameters: dict) -> None: def set(self, parameters: dict) -> None:
""" """Cache model provider credentials."""
Cache model provider credentials.
:param credentials: provider credentials
:return:
"""
redis_client.setex(self.cache_key, 86400, json.dumps(parameters)) redis_client.setex(self.cache_key, 86400, json.dumps(parameters))
def delete(self) -> None: def delete(self) -> None:
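The trimmed docstring reflects what set() actually does: serialize the parameters and cache them for 24 hours. A minimal usage sketch (the cache-key string is illustrative, not the real key format; redis_client is Dify's shared Redis handle):

import json
from extensions.ext_redis import redis_client

cache_key = "tool_parameters:example-tenant:example-tool"  # illustrative key
redis_client.setex(cache_key, 86400, json.dumps({"api_key": "***"}))  # 86400 s = 24 h TTL
cached = redis_client.get(cache_key)
parameters = json.loads(cached) if cached else None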

View File

@ -187,7 +187,7 @@ class IndexingRunner:
}, },
) )
if dataset_document.doc_form == IndexType.PARENT_CHILD_INDEX: if dataset_document.doc_form == IndexType.PARENT_CHILD_INDEX:
child_chunks = document_segment.child_chunks child_chunks = document_segment.get_child_chunks()
if child_chunks: if child_chunks:
child_documents = [] child_documents = []
for child_chunk in child_chunks: for child_chunk in child_chunks:

View File

@ -1,6 +1,6 @@
# Written by YORKI MINAKO🤡, Edited by Xiaoyi # Written by YORKI MINAKO🤡, Edited by Xiaoyi
CONVERSATION_TITLE_PROMPT = """You need to decompose the user's input into "subject" and "intention" in order to accurately figure out what the user's input language actually is. CONVERSATION_TITLE_PROMPT = """You need to decompose the user's input into "subject" and "intention" in order to accurately figure out what the user's input language actually is.
Notice: the language type user use could be diverse, which can be English, Chinese, Español, Arabic, Japanese, French, and etc. Notice: the language type user use could be diverse, which can be English, Chinese, Italian, Español, Arabic, Japanese, French, and etc.
MAKE SURE your output is the SAME language as the user's input! MAKE SURE your output is the SAME language as the user's input!
Your output is restricted only to: (Input language) Intention + Subject(short as possible) Your output is restricted only to: (Input language) Intention + Subject(short as possible)
Your output MUST be a valid JSON. Your output MUST be a valid JSON.

View File

@ -38,7 +38,6 @@ class TTSModel(AIModel):
:param credentials: model credentials :param credentials: model credentials
:param voice: model timbre :param voice: model timbre
:param content_text: text content to be translated :param content_text: text content to be translated
:param streaming: output is streaming
:param user: unique user id :param user: unique user id
:return: translated audio file :return: translated audio file
""" """

View File

@ -1,170 +0,0 @@
from collections.abc import Mapping
from typing import Optional
import openai
from httpx import Timeout
from openai import OpenAI
from openai.types import ModerationCreateResponse
from core.model_runtime.entities.model_entities import ModelPropertyKey
from core.model_runtime.errors.invoke import (
InvokeAuthorizationError,
InvokeBadRequestError,
InvokeConnectionError,
InvokeError,
InvokeRateLimitError,
InvokeServerUnavailableError,
)
from core.model_runtime.errors.validate import CredentialsValidateFailedError
from core.model_runtime.model_providers.__base.moderation_model import ModerationModel
class OpenAIModerationModel(ModerationModel):
"""
Model class for OpenAI text moderation model.
"""
def _invoke(self, model: str, credentials: dict, text: str, user: Optional[str] = None) -> bool:
"""
Invoke moderation model
:param model: model name
:param credentials: model credentials
:param text: text to moderate
:param user: unique user id
:return: false if text is safe, true otherwise
"""
# transform credentials to kwargs for model instance
credentials_kwargs = self._to_credential_kwargs(credentials)
# init model client
client = OpenAI(**credentials_kwargs)
# chars per chunk
length = self._get_max_characters_per_chunk(model, credentials)
text_chunks = [text[i : i + length] for i in range(0, len(text), length)]
max_text_chunks = self._get_max_chunks(model, credentials)
chunks = [text_chunks[i : i + max_text_chunks] for i in range(0, len(text_chunks), max_text_chunks)]
for text_chunk in chunks:
moderation_result = self._moderation_invoke(model=model, client=client, texts=text_chunk)
for result in moderation_result.results:
if result.flagged is True:
return True
return False
def validate_credentials(self, model: str, credentials: dict) -> None:
"""
Validate model credentials
:param model: model name
:param credentials: model credentials
:return:
"""
try:
# transform credentials to kwargs for model instance
credentials_kwargs = self._to_credential_kwargs(credentials)
client = OpenAI(**credentials_kwargs)
# call moderation model
self._moderation_invoke(
model=model,
client=client,
texts=["ping"],
)
except Exception as ex:
raise CredentialsValidateFailedError(str(ex))
def _moderation_invoke(self, model: str, client: OpenAI, texts: list[str]) -> ModerationCreateResponse:
"""
Invoke moderation model
:param model: model name
:param client: model client
:param texts: texts to moderate
:return: false if text is safe, true otherwise
"""
# call moderation model
moderation_result = client.moderations.create(model=model, input=texts)
return moderation_result
def _get_max_characters_per_chunk(self, model: str, credentials: dict) -> int:
"""
Get max characters per chunk
:param model: model name
:param credentials: model credentials
:return: max characters per chunk
"""
model_schema = self.get_model_schema(model, credentials)
if model_schema and ModelPropertyKey.MAX_CHARACTERS_PER_CHUNK in model_schema.model_properties:
max_characters_per_chunk: int = model_schema.model_properties[ModelPropertyKey.MAX_CHARACTERS_PER_CHUNK]
return max_characters_per_chunk
return 2000
def _get_max_chunks(self, model: str, credentials: dict) -> int:
"""
Get max chunks for given embedding model
:param model: model name
:param credentials: model credentials
:return: max chunks
"""
model_schema = self.get_model_schema(model, credentials)
if model_schema and ModelPropertyKey.MAX_CHUNKS in model_schema.model_properties:
max_chunks: int = model_schema.model_properties[ModelPropertyKey.MAX_CHUNKS]
return max_chunks
return 1
def _to_credential_kwargs(self, credentials: Mapping) -> dict:
"""
Transform credentials to kwargs for model instance
:param credentials:
:return:
"""
credentials_kwargs = {
"api_key": credentials["openai_api_key"],
"timeout": Timeout(315.0, read=300.0, write=10.0, connect=5.0),
"max_retries": 1,
}
if credentials.get("openai_api_base"):
openai_api_base = credentials["openai_api_base"].rstrip("/")
credentials_kwargs["base_url"] = openai_api_base + "/v1"
if "openai_organization" in credentials:
credentials_kwargs["organization"] = credentials["openai_organization"]
return credentials_kwargs
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
"""
Map model invoke error to unified error
The key is the error type thrown to the caller
The value is the error type thrown by the model,
which needs to be converted into a unified error type for the caller.
:return: Invoke error mapping
"""
return {
InvokeConnectionError: [openai.APIConnectionError, openai.APITimeoutError],
InvokeServerUnavailableError: [openai.InternalServerError],
InvokeRateLimitError: [openai.RateLimitError],
InvokeAuthorizationError: [openai.AuthenticationError, openai.PermissionDeniedError],
InvokeBadRequestError: [
openai.BadRequestError,
openai.NotFoundError,
openai.UnprocessableEntityError,
openai.APIError,
],
}

View File

@ -1,22 +0,0 @@
- claude-3-haiku@20240307
- claude-3-opus@20240229
- claude-3-sonnet@20240229
- claude-3-5-sonnet-v2@20241022
- claude-3-5-sonnet@20240620
- gemini-1.0-pro-vision-001
- gemini-1.0-pro-002
- gemini-1.5-flash-001
- gemini-1.5-flash-002
- gemini-1.5-pro-001
- gemini-1.5-pro-002
- gemini-2.0-flash-001
- gemini-2.0-flash-exp
- gemini-2.0-flash-lite-preview-02-05
- gemini-2.0-flash-thinking-exp-01-21
- gemini-2.0-flash-thinking-exp-1219
- gemini-2.0-pro-exp-02-05
- gemini-exp-1114
- gemini-exp-1121
- gemini-exp-1206
- gemini-flash-experimental
- gemini-pro-experimental

View File

@ -1,41 +0,0 @@
model: gemini-2.0-flash-001
label:
en_US: Gemini 2.0 Flash 001
model_type: llm
features:
- agent-thought
- vision
- tool-call
- stream-tool-call
- document
- video
- audio
model_properties:
mode: chat
context_size: 1048576
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,41 +0,0 @@
model: gemini-2.0-flash-lite-preview-02-05
label:
en_US: Gemini 2.0 Flash Lite Preview 0205
model_type: llm
features:
- agent-thought
- vision
- tool-call
- stream-tool-call
- document
- video
- audio
model_properties:
mode: chat
context_size: 1048576
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,39 +0,0 @@
model: gemini-2.0-flash-thinking-exp-01-21
label:
en_US: Gemini 2.0 Flash Thinking Exp 0121
model_type: llm
features:
- agent-thought
- vision
- document
- video
- audio
model_properties:
mode: chat
context_size: 32767
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,39 +0,0 @@
model: gemini-2.0-flash-thinking-exp-1219
label:
en_US: Gemini 2.0 Flash Thinking Exp 1219
model_type: llm
features:
- agent-thought
- vision
- document
- video
- audio
model_properties:
mode: chat
context_size: 32767
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,37 +0,0 @@
model: gemini-2.0-pro-exp-02-05
label:
en_US: Gemini 2.0 Pro Exp 0205
model_type: llm
features:
- agent-thought
- document
model_properties:
mode: chat
context_size: 2000000
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
en_US: Top k
type: int
help:
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: presence_penalty
use_template: presence_penalty
- name: frequency_penalty
use_template: frequency_penalty
- name: max_output_tokens
use_template: max_tokens
required: true
default: 8192
min: 1
max: 8192
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,41 +0,0 @@
model: gemini-exp-1114
label:
en_US: Gemini exp 1114
model_type: llm
features:
- agent-thought
- vision
- tool-call
- stream-tool-call
- document
- video
- audio
model_properties:
mode: chat
context_size: 32767
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,41 +0,0 @@
model: gemini-exp-1121
label:
en_US: Gemini exp 1121
model_type: llm
features:
- agent-thought
- vision
- tool-call
- stream-tool-call
- document
- video
- audio
model_properties:
mode: chat
context_size: 32767
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,41 +0,0 @@
model: gemini-exp-1206
label:
en_US: Gemini exp 1206
model_type: llm
features:
- agent-thought
- vision
- tool-call
- stream-tool-call
- document
- video
- audio
model_properties:
mode: chat
context_size: 2097152
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@ -1,66 +0,0 @@
model: glm-4-air-0111
label:
en_US: glm-4-air-0111
model_type: llm
features:
- multi-tool-call
- agent-thought
- stream-tool-call
model_properties:
mode: chat
context_size: 131072
parameter_rules:
- name: temperature
use_template: temperature
default: 0.95
min: 0.0
max: 1.0
help:
zh_Hans: 采样温度,控制输出的随机性,必须为正数取值范围是:(0.0,1.0],不能等于 0,默认值为 0.95 值越大,会使输出更随机,更具创造性;值越小,输出会更加稳定或确定建议您根据应用场景调整 top_p 或 temperature 参数,但不要同时调整两个参数。
en_US: Sampling temperature, controls the randomness of the output, must be a positive number. The value range is (0.0,1.0], which cannot be equal to 0. The default value is 0.95. The larger the value, the more random and creative the output will be; the smaller the value, The output will be more stable or certain. It is recommended that you adjust the top_p or temperature parameters according to the application scenario, but do not adjust both parameters at the same time.
- name: top_p
use_template: top_p
default: 0.7
help:
zh_Hans: 用温度取样的另一种方法,称为核取样取值范围是:(0.0, 1.0) 开区间,不能等于 0 或 1默认值为 0.7 模型考虑具有 top_p 概率质量tokens的结果例如0.1 意味着模型解码器只考虑从前 10% 的概率的候选集中取 tokens 建议您根据应用场景调整 top_p 或 temperature 参数,但不要同时调整两个参数。
en_US: Another method of temperature sampling is called kernel sampling. The value range is (0.0, 1.0) open interval, which cannot be equal to 0 or 1. The default value is 0.7. The model considers the results with top_p probability mass tokens. For example 0.1 means The model decoder only considers tokens from the candidate set with the top 10% probability. It is recommended that you adjust the top_p or temperature parameters according to the application scenario, but do not adjust both parameters at the same time.
- name: do_sample
label:
zh_Hans: 采样策略
en_US: Sampling strategy
type: boolean
help:
zh_Hans: do_sample 为 true 时启用采样策略do_sample 为 false 时采样策略 temperature、top_p 将不生效。默认值为 true。
en_US: When `do_sample` is set to true, the sampling strategy is enabled. When `do_sample` is set to false, the sampling strategies such as `temperature` and `top_p` will not take effect. The default value is true.
default: true
- name: max_tokens
use_template: max_tokens
default: 1024
min: 1
max: 4095
- name: web_search
type: boolean
label:
zh_Hans: 联网搜索
en_US: Web Search
default: false
help:
zh_Hans: 模型内置了互联网搜索服务,该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索,模型会将搜索结果作为文本生成过程中的参考信息,但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
- name: response_format
label:
zh_Hans: 回复格式
en_US: Response Format
type: string
help:
zh_Hans: 指定模型必须输出的格式
en_US: specifying the format that the model must output
required: false
options:
- text
- json_object
pricing:
input: '0.0005'
output: '0.0005'
unit: '0.001'
currency: RMB

View File

@ -8,6 +8,7 @@ from datetime import timedelta
from typing import Any, Optional, Union from typing import Any, Optional, Union
from uuid import UUID, uuid4 from uuid import UUID, uuid4
from cachetools import LRUCache
from flask import current_app from flask import current_app
from sqlalchemy import select from sqlalchemy import select
from sqlalchemy.orm import Session from sqlalchemy.orm import Session
@ -70,6 +71,8 @@ provider_config_map: dict[str, dict[str, Any]] = {
class OpsTraceManager: class OpsTraceManager:
ops_trace_instances_cache: LRUCache = LRUCache(maxsize=128)
@classmethod @classmethod
def encrypt_tracing_config( def encrypt_tracing_config(
cls, tenant_id: str, tracing_provider: str, tracing_config: dict, current_trace_config=None cls, tenant_id: str, tracing_provider: str, tracing_config: dict, current_trace_config=None
@ -204,29 +207,33 @@ class OpsTraceManager:
return None return None
app_ops_trace_config = json.loads(app.tracing) if app.tracing else None app_ops_trace_config = json.loads(app.tracing) if app.tracing else None
if app_ops_trace_config is None: if app_ops_trace_config is None:
return None return None
if not app_ops_trace_config.get("enabled"):
return None
tracing_provider = app_ops_trace_config.get("tracing_provider") tracing_provider = app_ops_trace_config.get("tracing_provider")
if tracing_provider is None or tracing_provider not in provider_config_map: if tracing_provider is None or tracing_provider not in provider_config_map:
return None return None
# decrypt_token # decrypt_token
decrypt_trace_config = cls.get_decrypted_tracing_config(app_id, tracing_provider) decrypt_trace_config = cls.get_decrypted_tracing_config(app_id, tracing_provider)
if app_ops_trace_config.get("enabled"): if not decrypt_trace_config:
return None
trace_instance, config_class = ( trace_instance, config_class = (
provider_config_map[tracing_provider]["trace_instance"], provider_config_map[tracing_provider]["trace_instance"],
provider_config_map[tracing_provider]["config_class"], provider_config_map[tracing_provider]["config_class"],
) )
if not decrypt_trace_config: decrypt_trace_config_key = str(decrypt_trace_config)
return None tracing_instance = cls.ops_trace_instances_cache.get(decrypt_trace_config_key)
if tracing_instance is None:
# create new tracing_instance and update the cache if it absent
tracing_instance = trace_instance(config_class(**decrypt_trace_config)) tracing_instance = trace_instance(config_class(**decrypt_trace_config))
cls.ops_trace_instances_cache[decrypt_trace_config_key] = tracing_instance
logging.info(f"new tracing_instance for app_id: {app_id}")
return tracing_instance return tracing_instance
return None
@classmethod @classmethod
def get_app_config_through_message_id(cls, message_id: str): def get_app_config_through_message_id(cls, message_id: str):
app_model_config = None app_model_config = None
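The cache keyed by the decrypted config avoids rebuilding a trace client on every request; a condensed sketch of the lookup-or-create pattern used above (names follow the diff, the free function itself is illustrative):

from cachetools import LRUCache

ops_trace_instances_cache: LRUCache = LRUCache(maxsize=128)

def get_or_create_tracing_instance(decrypt_trace_config: dict, trace_instance, config_class):
    # Apps that share the same decrypted credentials reuse a single client instance.
    key = str(decrypt_trace_config)
    instance = ops_trace_instances_cache.get(key)
    if instance is None:
        instance = trace_instance(config_class(**decrypt_trace_config))
        ops_trace_instances_cache[key] = instance
    return instance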

View File

@ -97,6 +97,7 @@ class RetrievalService:
all_documents=all_documents, all_documents=all_documents,
retrieval_method=retrieval_method, retrieval_method=retrieval_method,
exceptions=exceptions, exceptions=exceptions,
document_ids_filter=document_ids_filter,
) )
) )
concurrent.futures.wait(futures, timeout=30, return_when=concurrent.futures.ALL_COMPLETED) concurrent.futures.wait(futures, timeout=30, return_when=concurrent.futures.ALL_COMPLETED)
@ -222,6 +223,7 @@ class RetrievalService:
all_documents: list, all_documents: list,
retrieval_method: str, retrieval_method: str,
exceptions: list, exceptions: list,
document_ids_filter: Optional[list[str]] = None,
): ):
with flask_app.app_context(): with flask_app.app_context():
try: try:
@ -231,7 +233,9 @@ class RetrievalService:
vector_processor = Vector(dataset=dataset) vector_processor = Vector(dataset=dataset)
documents = vector_processor.search_by_full_text(cls.escape_query_for_search(query), top_k=top_k) documents = vector_processor.search_by_full_text(
cls.escape_query_for_search(query), top_k=top_k, document_ids_filter=document_ids_filter
)
if documents: if documents:
if ( if (
reranking_model reranking_model

View File

@ -102,8 +102,6 @@ class LindormVectorStore(BaseVector):
if response["errors"]: if response["errors"]:
for item in response["items"]: for item in response["items"]:
print(f"{item['index']['status']}: {item['index']['error']['type']}") print(f"{item['index']['status']}: {item['index']['error']['type']}")
else:
self.refresh()
def get_ids_by_metadata_field(self, key: str, value: str): def get_ids_by_metadata_field(self, key: str, value: str):
query: dict[str, Any] = { query: dict[str, Any] = {
@ -167,7 +165,7 @@ class LindormVectorStore(BaseVector):
if not all(isinstance(x, float) for x in query_vector): if not all(isinstance(x, float) for x in query_vector):
raise ValueError("All elements in query_vector should be floats") raise ValueError("All elements in query_vector should be floats")
top_k = kwargs.get("top_k", 10) top_k = kwargs.get("top_k", 3)
document_ids_filter = kwargs.get("document_ids_filter") document_ids_filter = kwargs.get("document_ids_filter")
filters = [] filters = []
if document_ids_filter: if document_ids_filter:
@ -210,7 +208,7 @@ class LindormVectorStore(BaseVector):
must_not = kwargs.get("must_not") must_not = kwargs.get("must_not")
should = kwargs.get("should") should = kwargs.get("should")
minimum_should_match = kwargs.get("minimum_should_match", 0) minimum_should_match = kwargs.get("minimum_should_match", 0)
top_k = kwargs.get("top_k", 10) top_k = kwargs.get("top_k", 3)
filters = kwargs.get("filter", []) filters = kwargs.get("filter", [])
document_ids_filter = kwargs.get("document_ids_filter") document_ids_filter = kwargs.get("document_ids_filter")
if document_ids_filter: if document_ids_filter:
@ -295,7 +293,7 @@ class LindormVectorStore(BaseVector):
def default_text_mapping(dimension: int, method_name: str, **kwargs: Any) -> dict: def default_text_mapping(dimension: int, method_name: str, **kwargs: Any) -> dict:
excludes_from_source = kwargs.get("excludes_from_source") excludes_from_source = kwargs.get("excludes_from_source", False)
analyzer = kwargs.get("analyzer", "ik_max_word") analyzer = kwargs.get("analyzer", "ik_max_word")
text_field = kwargs.get("text_field", Field.CONTENT_KEY.value) text_field = kwargs.get("text_field", Field.CONTENT_KEY.value)
engine = kwargs["engine"] engine = kwargs["engine"]
@ -356,12 +354,12 @@ def default_text_mapping(dimension: int, method_name: str, **kwargs: Any) -> dic
if excludes_from_source: if excludes_from_source:
# e.g. {"excludes": ["vector_field"]} # e.g. {"excludes": ["vector_field"]}
mapping["mappings"]["_source"] = {"excludes": excludes_from_source} mapping["mappings"]["_source"] = {"excludes": [vector_field]}
if using_ugc and method_name == "ivfpq": if using_ugc and method_name == "ivfpq":
mapping["settings"]["index"]["knn_routing"] = True mapping["settings"]["index"]["knn_routing"] = True
mapping["settings"]["index"]["knn.offline.construction"] = True mapping["settings"]["index"]["knn.offline.construction"] = True
elif using_ugc and method_name == "hnsw" or using_ugc and method_name == "flat": elif (using_ugc and method_name == "hnsw") or (using_ugc and method_name == "flat"):
mapping["settings"]["index"]["knn_routing"] = True mapping["settings"]["index"]["knn_routing"] = True
return mapping return mapping
@ -458,7 +456,7 @@ def default_vector_search_query(
"query": {"knn": {vector_field: {"vector": query_vector, "k": k}}}, "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
} }
if filters is not None: if filters is not None and len(filters) > 0:
# when using filter, transform filter from List[Dict] to Dict as valid format # when using filter, transform filter from List[Dict] to Dict as valid format
filter_dict = {"bool": {"must": filters}} if len(filters) > 1 else filters[0] filter_dict = {"bool": {"must": filters}} if len(filters) > 1 else filters[0]
search_query["query"]["knn"][vector_field]["filter"] = filter_dict # filter should be Dict search_query["query"]["knn"][vector_field]["filter"] = filter_dict # filter should be Dict

View File

@ -231,8 +231,8 @@ class MilvusVector(BaseVector):
document_ids_filter = kwargs.get("document_ids_filter") document_ids_filter = kwargs.get("document_ids_filter")
filter = "" filter = ""
if document_ids_filter: if document_ids_filter:
document_ids = ", ".join(f"'{id}'" for id in document_ids_filter) document_ids = ", ".join(f'"{id}"' for id in document_ids_filter)
filter = f'metadata["document_id"] in ({document_ids})' filter = f'metadata["document_id"] in [{document_ids}]'
results = self._client.search( results = self._client.search(
collection_name=self._collection_name, collection_name=self._collection_name,
data=[query_vector], data=[query_vector],
@ -259,7 +259,7 @@ class MilvusVector(BaseVector):
filter = "" filter = ""
if document_ids_filter: if document_ids_filter:
document_ids = ", ".join(f"'{id}'" for id in document_ids_filter) document_ids = ", ".join(f"'{id}'" for id in document_ids_filter)
filter = f'metadata["document_id"] in ({document_ids})' filter = f'metadata["document_id"] in [{document_ids}]'
results = self._client.search( results = self._client.search(
collection_name=self._collection_name, collection_name=self._collection_name,
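The quoting change switches the Milvus expression to `in [...]` with double-quoted ids; a minimal sketch of the filter builder (the helper name is illustrative):

def build_document_id_expr(document_ids_filter: list[str] | None) -> str:
    # Returns an empty string when no filter is requested.
    if not document_ids_filter:
        return ""
    # Double quotes around each id avoid clashing with the single quotes
    # used by the surrounding expression string.
    document_ids = ", ".join(f'"{doc_id}"' for doc_id in document_ids_filter)
    return f'metadata["document_id"] in [{document_ids}]'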

View File

@ -31,6 +31,7 @@ class OceanBaseVectorConfig(BaseModel):
user: str user: str
password: str password: str
database: str database: str
enable_hybrid_search: bool = False
@model_validator(mode="before") @model_validator(mode="before")
@classmethod @classmethod
@ -57,6 +58,7 @@ class OceanBaseVector(BaseVector):
password=self._config.password, password=self._config.password,
db_name=self._config.database, db_name=self._config.database,
) )
self._hybrid_search_enabled = self._check_hybrid_search_support() # Check if hybrid search is supported
def get_type(self) -> str: def get_type(self) -> str:
return VectorType.OCEANBASE return VectorType.OCEANBASE
@ -98,6 +100,16 @@ class OceanBaseVector(BaseVector):
columns=cols, columns=cols,
vidxs=vidx_params, vidxs=vidx_params,
) )
try:
if self._hybrid_search_enabled:
self._client.perform_raw_text_sql(f"""ALTER TABLE {self._collection_name}
ADD FULLTEXT INDEX fulltext_index_for_col_text (text) WITH PARSER ik""")
except Exception as e:
raise Exception(
"Failed to add fulltext index to the target table, your OceanBase version must be 4.3.5.1 or above "
+ "to support fulltext index and vector index in the same table",
e,
)
vals = [] vals = []
params = self._client.perform_raw_text_sql("SHOW PARAMETERS LIKE '%ob_vector_memory_limit_percentage%'") params = self._client.perform_raw_text_sql("SHOW PARAMETERS LIKE '%ob_vector_memory_limit_percentage%'")
for row in params: for row in params:
@ -116,6 +128,27 @@ class OceanBaseVector(BaseVector):
) )
redis_client.set(collection_exist_cache_key, 1, ex=3600) redis_client.set(collection_exist_cache_key, 1, ex=3600)
def _check_hybrid_search_support(self) -> bool:
"""
Check if the current OceanBase version supports hybrid search.
Returns True if the version is >= 4.3.5.1, otherwise False.
"""
if not self._config.enable_hybrid_search:
return False
try:
from packaging import version
# return OceanBase_CE 4.3.5.1 (r101000042025031818-bxxxx) (Built Mar 18 2025 18:13:36)
result = self._client.perform_raw_text_sql("SELECT @@version_comment AS version")
ob_full_version = result.fetchone()[0]
ob_version = ob_full_version.split()[1]
logger.debug("Current OceanBase version is %s", ob_version)
return version.parse(ob_version).base_version >= version.parse("4.3.5.1").base_version
except Exception as e:
logger.warning(f"Failed to check OceanBase version: {str(e)}. Disabling hybrid search.")
return False
def add_texts(self, documents: list[Document], embeddings: list[list[float]], **kwargs): def add_texts(self, documents: list[Document], embeddings: list[list[float]], **kwargs):
ids = self._get_uuids(documents) ids = self._get_uuids(documents)
for id, doc, emb in zip(ids, documents, embeddings): for id, doc, emb in zip(ids, documents, embeddings):
@ -130,7 +163,7 @@ class OceanBaseVector(BaseVector):
) )
def text_exists(self, id: str) -> bool: def text_exists(self, id: str) -> bool:
cur = self._client.get(table_name=self._collection_name, id=id) cur = self._client.get(table_name=self._collection_name, ids=id)
return bool(cur.rowcount != 0) return bool(cur.rowcount != 0)
def delete_by_ids(self, ids: list[str]) -> None: def delete_by_ids(self, ids: list[str]) -> None:
@ -139,9 +172,12 @@ class OceanBaseVector(BaseVector):
self._client.delete(table_name=self._collection_name, ids=ids) self._client.delete(table_name=self._collection_name, ids=ids)
def get_ids_by_metadata_field(self, key: str, value: str) -> list[str]: def get_ids_by_metadata_field(self, key: str, value: str) -> list[str]:
from sqlalchemy import text
cur = self._client.get( cur = self._client.get(
table_name=self._collection_name, table_name=self._collection_name,
where_clause=f"metadata->>'$.{key}' = '{value}'", ids=None,
where_clause=[text(f"metadata->>'$.{key}' = '{value}'")],
output_column_name=["id"], output_column_name=["id"],
) )
return [row[0] for row in cur] return [row[0] for row in cur]
@ -151,19 +187,65 @@ class OceanBaseVector(BaseVector):
self.delete_by_ids(ids) self.delete_by_ids(ids)
def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]: def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
if not self._hybrid_search_enabled:
return []
try:
top_k = kwargs.get("top_k", 5)
if not isinstance(top_k, int) or top_k <= 0:
raise ValueError("top_k must be a positive integer")
document_ids_filter = kwargs.get("document_ids_filter")
where_clause = ""
if document_ids_filter:
document_ids = ", ".join(f"'{id}'" for id in document_ids_filter)
where_clause = f" AND metadata->>'$.document_id' IN ({document_ids})"
full_sql = f"""SELECT metadata, text, MATCH (text) AGAINST (:query) AS score
FROM {self._collection_name}
WHERE MATCH (text) AGAINST (:query) > 0
{where_clause}
ORDER BY score DESC
LIMIT {top_k}"""
with self._client.engine.connect() as conn:
with conn.begin():
from sqlalchemy import text
result = conn.execute(text(full_sql), {"query": query})
rows = result.fetchall()
docs = []
for row in rows:
metadata_str, _text, score = row
try:
metadata = json.loads(metadata_str)
except json.JSONDecodeError:
print(f"Invalid JSON metadata: {metadata_str}")
metadata = {}
metadata["score"] = score
docs.append(Document(page_content=_text, metadata=metadata))
return docs
except Exception as e:
logger.warning(f"Failed to fulltext search: {str(e)}.")
return [] return []
def search_by_vector(self, query_vector: list[float], **kwargs: Any) -> list[Document]: def search_by_vector(self, query_vector: list[float], **kwargs: Any) -> list[Document]:
document_ids_filter = kwargs.get("document_ids_filter") document_ids_filter = kwargs.get("document_ids_filter")
where_clause = None _where_clause = None
if document_ids_filter: if document_ids_filter:
document_ids = ", ".join(f"'{id}'" for id in document_ids_filter) document_ids = ", ".join(f"'{id}'" for id in document_ids_filter)
where_clause = f"metadata->>'$.document_id' in ({document_ids})" where_clause = f"metadata->>'$.document_id' in ({document_ids})"
from sqlalchemy import text
_where_clause = [text(where_clause)]
ef_search = kwargs.get("ef_search", self._hnsw_ef_search) ef_search = kwargs.get("ef_search", self._hnsw_ef_search)
if ef_search != self._hnsw_ef_search: if ef_search != self._hnsw_ef_search:
self._client.set_ob_hnsw_ef_search(ef_search) self._client.set_ob_hnsw_ef_search(ef_search)
self._hnsw_ef_search = ef_search self._hnsw_ef_search = ef_search
topk = kwargs.get("top_k", 10) topk = kwargs.get("top_k", 10)
try:
cur = self._client.ann_search( cur = self._client.ann_search(
table_name=self._collection_name, table_name=self._collection_name,
vec_column_name="vector", vec_column_name="vector",
@ -172,15 +254,17 @@ class OceanBaseVector(BaseVector):
distance_func=func.l2_distance, distance_func=func.l2_distance,
output_column_names=["text", "metadata"], output_column_names=["text", "metadata"],
with_dist=True, with_dist=True,
where_clause=where_clause, where_clause=_where_clause,
) )
except Exception as e:
raise Exception("Failed to search by vector. ", e)
docs = [] docs = []
for text, metadata, distance in cur: for _text, metadata, distance in cur:
metadata = json.loads(metadata) metadata = json.loads(metadata)
metadata["score"] = 1 - distance / math.sqrt(2) metadata["score"] = 1 - distance / math.sqrt(2)
docs.append( docs.append(
Document( Document(
page_content=text, page_content=_text,
metadata=metadata, metadata=metadata,
) )
) )
@ -212,5 +296,6 @@ class OceanBaseVectorFactory(AbstractVectorFactory):
user=dify_config.OCEANBASE_VECTOR_USER or "", user=dify_config.OCEANBASE_VECTOR_USER or "",
password=(dify_config.OCEANBASE_VECTOR_PASSWORD or ""), password=(dify_config.OCEANBASE_VECTOR_PASSWORD or ""),
database=dify_config.OCEANBASE_VECTOR_DATABASE or "", database=dify_config.OCEANBASE_VECTOR_DATABASE or "",
enable_hybrid_search=dify_config.OCEANBASE_ENABLE_HYBRID_SEARCH or False,
), ),
) )
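The hybrid-search gate parses the server's version comment and requires at least 4.3.5.1; a standalone sketch of that check (comparing parsed versions directly, rather than their string forms, keeps the comparison numeric; the helper name is illustrative):

from packaging import version

def supports_hybrid_search(version_comment: str) -> bool:
    # version_comment looks like: "OceanBase_CE 4.3.5.1 (r10100...) (Built ...)"
    ob_version = version_comment.split()[1]
    return version.parse(ob_version) >= version.parse("4.3.5.1")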

View File

@ -25,6 +25,7 @@ class OpenGaussConfig(BaseModel):
database: str database: str
min_connection: int min_connection: int
max_connection: int max_connection: int
enable_pq: bool = False # Enable PQ acceleration
@model_validator(mode="before") @model_validator(mode="before")
@classmethod @classmethod
@ -57,6 +58,11 @@ CREATE TABLE IF NOT EXISTS {table_name} (
); );
""" """
SQL_CREATE_INDEX_PQ = """
CREATE INDEX IF NOT EXISTS embedding_{table_name}_pq_idx ON {table_name}
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64, enable_pq=on, pq_m={pq_m});
"""
SQL_CREATE_INDEX = """ SQL_CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS embedding_cosine_{table_name}_idx ON {table_name} CREATE INDEX IF NOT EXISTS embedding_cosine_{table_name}_idx ON {table_name}
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64); USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
@ -68,6 +74,7 @@ class OpenGauss(BaseVector):
super().__init__(collection_name) super().__init__(collection_name)
self.pool = self._create_connection_pool(config) self.pool = self._create_connection_pool(config)
self.table_name = f"embedding_{collection_name}" self.table_name = f"embedding_{collection_name}"
self.pq_enabled = config.enable_pq
def get_type(self) -> str: def get_type(self) -> str:
return VectorType.OPENGAUSS return VectorType.OPENGAUSS
@ -97,7 +104,26 @@ class OpenGauss(BaseVector):
def create(self, texts: list[Document], embeddings: list[list[float]], **kwargs): def create(self, texts: list[Document], embeddings: list[list[float]], **kwargs):
dimension = len(embeddings[0]) dimension = len(embeddings[0])
self._create_collection(dimension) self._create_collection(dimension)
return self.add_texts(texts, embeddings) self.add_texts(texts, embeddings)
self._create_index(dimension)
def _create_index(self, dimension: int):
index_cache_key = f"vector_index_{self._collection_name}"
lock_name = f"{index_cache_key}_lock"
with redis_client.lock(lock_name, timeout=60):
index_exist_cache_key = f"vector_index_{self._collection_name}"
if redis_client.get(index_exist_cache_key):
return
with self._get_cursor() as cur:
if dimension <= 2000:
if self.pq_enabled:
cur.execute(SQL_CREATE_INDEX_PQ.format(table_name=self.table_name, pq_m=int(dimension / 4)))
cur.execute("SET hnsw_earlystop_threshold = 320")
if not self.pq_enabled:
cur.execute(SQL_CREATE_INDEX.format(table_name=self.table_name))
redis_client.set(index_exist_cache_key, 1, ex=3600)
def add_texts(self, documents: list[Document], embeddings: list[list[float]], **kwargs): def add_texts(self, documents: list[Document], embeddings: list[list[float]], **kwargs):
values = [] values = []
@ -151,7 +177,6 @@ class OpenGauss(BaseVector):
Search the nearest neighbors to a vector. Search the nearest neighbors to a vector.
:param query_vector: The input vector to search for similar items. :param query_vector: The input vector to search for similar items.
:param top_k: The number of nearest neighbors to return, default is 5.
:return: List of Documents that are nearest to the query vector. :return: List of Documents that are nearest to the query vector.
""" """
top_k = kwargs.get("top_k", 4) top_k = kwargs.get("top_k", 4)
@ -211,8 +236,6 @@ class OpenGauss(BaseVector):
with self._get_cursor() as cur: with self._get_cursor() as cur:
cur.execute(SQL_CREATE_TABLE.format(table_name=self.table_name, dimension=dimension)) cur.execute(SQL_CREATE_TABLE.format(table_name=self.table_name, dimension=dimension))
if dimension <= 2000:
cur.execute(SQL_CREATE_INDEX.format(table_name=self.table_name))
redis_client.set(collection_exist_cache_key, 1, ex=3600) redis_client.set(collection_exist_cache_key, 1, ex=3600)
@ -236,5 +259,6 @@ class OpenGaussFactory(AbstractVectorFactory):
database=dify_config.OPENGAUSS_DATABASE or "dify", database=dify_config.OPENGAUSS_DATABASE or "dify",
min_connection=dify_config.OPENGAUSS_MIN_CONNECTION, min_connection=dify_config.OPENGAUSS_MIN_CONNECTION,
max_connection=dify_config.OPENGAUSS_MAX_CONNECTION, max_connection=dify_config.OPENGAUSS_MAX_CONNECTION,
enable_pq=dify_config.OPENGAUSS_ENABLE_PQ or False,
), ),
) )
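Index creation now happens after the initial inserts, guarded by a Redis lock, and picks a PQ-enabled HNSW index when configured; a sketch of the SQL selection (the statements follow the diff, the helper is illustrative):

SQL_CREATE_INDEX_PQ = """
CREATE INDEX IF NOT EXISTS embedding_{table_name}_pq_idx ON {table_name}
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64, enable_pq=on, pq_m={pq_m});
"""

SQL_CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS embedding_cosine_{table_name}_idx ON {table_name}
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
"""

def opengauss_index_statement(table_name: str, dimension: int, pq_enabled: bool) -> str | None:
    # hnsw indexes are only built for dimensions up to 2000.
    if dimension > 2000:
        return None
    if pq_enabled:
        # The diff derives the pq_m subvector count from the embedding dimension.
        return SQL_CREATE_INDEX_PQ.format(table_name=table_name, pq_m=int(dimension / 4))
    return SQL_CREATE_INDEX.format(table_name=table_name)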

View File

@ -197,7 +197,6 @@ class OracleVector(BaseVector):
Search the nearest neighbors to a vector. Search the nearest neighbors to a vector.
:param query_vector: The input vector to search for similar items. :param query_vector: The input vector to search for similar items.
:param top_k: The number of nearest neighbors to return, default is 5.
:return: List of Documents that are nearest to the query vector. :return: List of Documents that are nearest to the query vector.
""" """
top_k = kwargs.get("top_k", 4) top_k = kwargs.get("top_k", 4)

View File

@ -167,7 +167,6 @@ class PGVector(BaseVector):
Search the nearest neighbors to a vector. Search the nearest neighbors to a vector.
:param query_vector: The input vector to search for similar items. :param query_vector: The input vector to search for similar items.
:param top_k: The number of nearest neighbors to return, default is 5.
:return: List of Documents that are nearest to the query vector. :return: List of Documents that are nearest to the query vector.
""" """
top_k = kwargs.get("top_k", 4) top_k = kwargs.get("top_k", 4)
@ -177,7 +176,7 @@ class PGVector(BaseVector):
where_clause = "" where_clause = ""
if document_ids_filter: if document_ids_filter:
document_ids = ", ".join(f"'{id}'" for id in document_ids_filter) document_ids = ", ".join(f"'{id}'" for id in document_ids_filter)
where_clause = f" WHERE metadata->>'document_id' in ({document_ids}) " where_clause = f" WHERE meta->>'document_id' in ({document_ids}) "
with self._get_cursor() as cur: with self._get_cursor() as cur:
cur.execute( cur.execute(
@ -205,7 +204,7 @@ class PGVector(BaseVector):
where_clause = "" where_clause = ""
if document_ids_filter: if document_ids_filter:
document_ids = ", ".join(f"'{id}'" for id in document_ids_filter) document_ids = ", ".join(f"'{id}'" for id in document_ids_filter)
where_clause = f" AND metadata->>'document_id' in ({document_ids}) " where_clause = f" AND meta->>'document_id' in ({document_ids}) "
if self.pg_bigm: if self.pg_bigm:
cur.execute("SET pg_bigm.similarity_limit TO 0.000001") cur.execute("SET pg_bigm.similarity_limit TO 0.000001")
cur.execute( cur.execute(
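Both fixes point the JSON operator at the table's `meta` column (the column the embeddings table actually uses); a sketch of the where-clause builder (the helper and its `prefix` parameter are illustrative):

def document_id_where_clause(document_ids_filter: list[str] | None, prefix: str = "WHERE") -> str:
    # Document metadata lives in a JSONB column named "meta", not "metadata".
    if not document_ids_filter:
        return ""
    document_ids = ", ".join(f"'{doc_id}'" for doc_id in document_ids_filter)
    return f" {prefix} meta->>'document_id' in ({document_ids}) "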

View File

@ -0,0 +1,295 @@
import json
import logging
from typing import Any, Optional
import tablestore # type: ignore
from pydantic import BaseModel, model_validator
from configs import dify_config
from core.rag.datasource.vdb.field import Field
from core.rag.datasource.vdb.vector_base import BaseVector
from core.rag.datasource.vdb.vector_factory import AbstractVectorFactory
from core.rag.datasource.vdb.vector_type import VectorType
from core.rag.embedding.embedding_base import Embeddings
from core.rag.models.document import Document
from extensions.ext_redis import redis_client
from models import Dataset
class TableStoreConfig(BaseModel):
access_key_id: Optional[str] = None
access_key_secret: Optional[str] = None
instance_name: Optional[str] = None
endpoint: Optional[str] = None
@model_validator(mode="before")
@classmethod
def validate_config(cls, values: dict) -> dict:
if not values["access_key_id"]:
raise ValueError("config ACCESS_KEY_ID is required")
if not values["access_key_secret"]:
raise ValueError("config ACCESS_KEY_SECRET is required")
if not values["instance_name"]:
raise ValueError("config INSTANCE_NAME is required")
if not values["endpoint"]:
raise ValueError("config ENDPOINT is required")
return values
class TableStoreVector(BaseVector):
def __init__(self, collection_name: str, config: TableStoreConfig):
super().__init__(collection_name)
self._config = config
self._tablestore_client = tablestore.OTSClient(
config.endpoint,
config.access_key_id,
config.access_key_secret,
config.instance_name,
)
self._table_name = f"{collection_name}"
self._index_name = f"{collection_name}_idx"
self._tags_field = f"{Field.METADATA_KEY.value}_tags"
def get_type(self) -> str:
return VectorType.TABLESTORE
def create(self, texts: list[Document], embeddings: list[list[float]], **kwargs):
dimension = len(embeddings[0])
self._create_collection(dimension)
self.add_texts(documents=texts, embeddings=embeddings, **kwargs)
def add_texts(self, documents: list[Document], embeddings: list[list[float]], **kwargs):
uuids = self._get_uuids(documents)
for i in range(len(documents)):
self._write_row(
primary_key=uuids[i],
attributes={
Field.CONTENT_KEY.value: documents[i].page_content,
Field.VECTOR.value: embeddings[i],
Field.METADATA_KEY.value: documents[i].metadata,
},
)
return uuids
def text_exists(self, id: str) -> bool:
_, return_row, _ = self._tablestore_client.get_row(
table_name=self._table_name, primary_key=[("id", id)], columns_to_get=["id"]
)
return return_row is not None
def delete_by_ids(self, ids: list[str]) -> None:
if not ids:
return
for id in ids:
self._delete_row(id=id)
def get_ids_by_metadata_field(self, key: str, value: str):
return self._search_by_metadata(key, value)
def delete_by_metadata_field(self, key: str, value: str) -> None:
ids = self.get_ids_by_metadata_field(key, value)
self.delete_by_ids(ids)
def search_by_vector(self, query_vector: list[float], **kwargs: Any) -> list[Document]:
top_k = kwargs.get("top_k", 4)
return self._search_by_vector(query_vector, top_k)
def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
return self._search_by_full_text(query)
def delete(self) -> None:
self._delete_table_if_exist()
def _create_collection(self, dimension: int):
lock_name = f"vector_indexing_lock_{self._collection_name}"
with redis_client.lock(lock_name, timeout=20):
collection_exist_cache_key = f"vector_indexing_{self._collection_name}"
if redis_client.get(collection_exist_cache_key):
logging.info(f"Collection {self._collection_name} already exists.")
return
self._create_table_if_not_exist()
self._create_search_index_if_not_exist(dimension)
redis_client.set(collection_exist_cache_key, 1, ex=3600)
def _create_table_if_not_exist(self) -> None:
table_list = self._tablestore_client.list_table()
if self._table_name in table_list:
logging.info("Tablestore system table[%s] already exists", self._table_name)
return None
schema_of_primary_key = [("id", "STRING")]
table_meta = tablestore.TableMeta(self._table_name, schema_of_primary_key)
table_options = tablestore.TableOptions()
reserved_throughput = tablestore.ReservedThroughput(tablestore.CapacityUnit(0, 0))
self._tablestore_client.create_table(table_meta, table_options, reserved_throughput)
logging.info("Tablestore create table[%s] successfully.", self._table_name)
def _create_search_index_if_not_exist(self, dimension: int) -> None:
search_index_list = self._tablestore_client.list_search_index(table_name=self._table_name)
if self._index_name in [t[1] for t in search_index_list]:
logging.info("Tablestore system index[%s] already exists", self._index_name)
return None
field_schemas = [
tablestore.FieldSchema(
Field.CONTENT_KEY.value,
tablestore.FieldType.TEXT,
analyzer=tablestore.AnalyzerType.MAXWORD,
index=True,
enable_sort_and_agg=False,
store=False,
),
tablestore.FieldSchema(
Field.VECTOR.value,
tablestore.FieldType.VECTOR,
vector_options=tablestore.VectorOptions(
data_type=tablestore.VectorDataType.VD_FLOAT_32,
dimension=dimension,
metric_type=tablestore.VectorMetricType.VM_COSINE,
),
),
tablestore.FieldSchema(
Field.METADATA_KEY.value,
tablestore.FieldType.KEYWORD,
index=True,
store=False,
),
tablestore.FieldSchema(
self._tags_field,
tablestore.FieldType.KEYWORD,
index=True,
store=False,
is_array=True,
),
]
index_meta = tablestore.SearchIndexMeta(field_schemas)
self._tablestore_client.create_search_index(self._table_name, self._index_name, index_meta)
logging.info("Tablestore create system index[%s] successfully.", self._index_name)
def _delete_table_if_exist(self):
search_index_list = self._tablestore_client.list_search_index(table_name=self._table_name)
for resp_tuple in search_index_list:
self._tablestore_client.delete_search_index(resp_tuple[0], resp_tuple[1])
logging.info("Tablestore delete index[%s] successfully.", self._index_name)
self._tablestore_client.delete_table(self._table_name)
logging.info("Tablestore delete system table[%s] successfully.", self._index_name)
def _delete_search_index(self) -> None:
self._tablestore_client.delete_search_index(self._table_name, self._index_name)
logging.info("Tablestore delete index[%s] successfully.", self._index_name)
def _write_row(self, primary_key: str, attributes: dict[str, Any]) -> None:
pk = [("id", primary_key)]
tags = []
for key, value in attributes[Field.METADATA_KEY.value].items():
tags.append(str(key) + "=" + str(value))
attribute_columns = [
(Field.CONTENT_KEY.value, attributes[Field.CONTENT_KEY.value]),
(Field.VECTOR.value, json.dumps(attributes[Field.VECTOR.value])),
(
Field.METADATA_KEY.value,
json.dumps(attributes[Field.METADATA_KEY.value]),
),
(self._tags_field, json.dumps(tags)),
]
row = tablestore.Row(pk, attribute_columns)
self._tablestore_client.put_row(self._table_name, row)
def _delete_row(self, id: str) -> None:
primary_key = [("id", id)]
row = tablestore.Row(primary_key)
self._tablestore_client.delete_row(self._table_name, row, None)
logging.info("Tablestore delete row successfully. id:%s", id)
def _search_by_metadata(self, key: str, value: str) -> list[str]:
query = tablestore.SearchQuery(
tablestore.TermQuery(self._tags_field, str(key) + "=" + str(value)),
limit=100,
get_total_count=False,
)
search_response = self._tablestore_client.search(
table_name=self._table_name,
index_name=self._index_name,
search_query=query,
columns_to_get=tablestore.ColumnsToGet(return_type=tablestore.ColumnReturnType.ALL_FROM_INDEX),
)
return [row[0][0][1] for row in search_response.rows]
def _search_by_vector(self, query_vector: list[float], top_k: int) -> list[Document]:
ots_query = tablestore.KnnVectorQuery(
field_name=Field.VECTOR.value,
top_k=top_k,
float32_query_vector=query_vector,
)
sort = tablestore.Sort(sorters=[tablestore.ScoreSort(sort_order=tablestore.SortOrder.DESC)])
search_query = tablestore.SearchQuery(ots_query, limit=top_k, get_total_count=False, sort=sort)
search_response = self._tablestore_client.search(
table_name=self._table_name,
index_name=self._index_name,
search_query=search_query,
columns_to_get=tablestore.ColumnsToGet(return_type=tablestore.ColumnReturnType.ALL_FROM_INDEX),
)
logging.info(
"Tablestore search successfully. request_id:%s",
search_response.request_id,
)
return self._to_query_result(search_response)
def _to_query_result(self, search_response: tablestore.SearchResponse) -> list[Document]:
documents = []
for row in search_response.rows:
documents.append(
Document(
page_content=row[1][2][1],
vector=json.loads(row[1][3][1]),
metadata=json.loads(row[1][0][1]),
)
)
return documents
def _search_by_full_text(self, query: str) -> list[Document]:
search_query = tablestore.SearchQuery(
query=tablestore.MatchQuery(text=query, field_name=Field.CONTENT_KEY.value),
sort=tablestore.Sort(sorters=[tablestore.ScoreSort(sort_order=tablestore.SortOrder.DESC)]),
limit=100,
)
search_response = self._tablestore_client.search(
table_name=self._table_name,
index_name=self._index_name,
search_query=search_query,
columns_to_get=tablestore.ColumnsToGet(return_type=tablestore.ColumnReturnType.ALL_FROM_INDEX),
)
return self._to_query_result(search_response)
class TableStoreVectorFactory(AbstractVectorFactory):
def init_vector(self, dataset: Dataset, attributes: list, embeddings: Embeddings) -> TableStoreVector:
if dataset.index_struct_dict:
class_prefix: str = dataset.index_struct_dict["vector_store"]["class_prefix"]
collection_name = class_prefix
else:
dataset_id = dataset.id
collection_name = Dataset.gen_collection_name_by_id(dataset_id)
dataset.index_struct = json.dumps(self.gen_index_struct_dict(VectorType.TABLESTORE, collection_name))
return TableStoreVector(
collection_name=collection_name,
config=TableStoreConfig(
endpoint=dify_config.TABLESTORE_ENDPOINT,
instance_name=dify_config.TABLESTORE_INSTANCE_NAME,
access_key_id=dify_config.TABLESTORE_ACCESS_KEY_ID,
access_key_secret=dify_config.TABLESTORE_ACCESS_KEY_SECRET,
),
)
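
The metadata filter above relies on flattening each metadata key/value pair into a "key=value" keyword tag, so an exact TermQuery on the tags field can stand in for structured filtering. A minimal sketch of that encoding and the matching query, reusing the tablestore SDK objects that appear in this file (the tags_field name is just a placeholder):

import tablestore


def encode_metadata_tags(metadata: dict) -> list[str]:
    # {"document_id": "42"} -> ["document_id=42"], matching what _write_row stores
    return [f"{key}={value}" for key, value in metadata.items()]


def build_metadata_term_query(tags_field: str, key: str, value: str) -> tablestore.SearchQuery:
    # exact match on one encoded tag, as in _search_by_metadata
    return tablestore.SearchQuery(
        tablestore.TermQuery(tags_field, f"{key}={value}"),
        limit=100,
        get_total_count=False,
    )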

View File

@ -1,8 +1,9 @@
import json import json
import math
from typing import Any, Optional from typing import Any, Optional
from pydantic import BaseModel from pydantic import BaseModel
from tcvectordb import VectorDBClient # type: ignore from tcvectordb import RPCVectorDBClient, VectorDBException # type: ignore
from tcvectordb.model import document, enum # type: ignore from tcvectordb.model import document, enum # type: ignore
from tcvectordb.model import index as vdb_index # type: ignore from tcvectordb.model import index as vdb_index # type: ignore
from tcvectordb.model.document import Filter # type: ignore from tcvectordb.model.document import Filter # type: ignore
@ -27,6 +28,7 @@ class TencentConfig(BaseModel):
metric_type: str = "L2" metric_type: str = "L2"
shard: int = 1 shard: int = 1
replicas: int = 2 replicas: int = 2
max_upsert_batch_size: int = 128
def to_tencent_params(self): def to_tencent_params(self):
return {"url": self.url, "username": self.username, "key": self.api_key, "timeout": self.timeout} return {"url": self.url, "username": self.username, "key": self.api_key, "timeout": self.timeout}
@ -41,19 +43,10 @@ class TencentVector(BaseVector):
def __init__(self, collection_name: str, config: TencentConfig): def __init__(self, collection_name: str, config: TencentConfig):
super().__init__(collection_name) super().__init__(collection_name)
self._client_config = config self._client_config = config
self._client = VectorDBClient(**self._client_config.to_tencent_params()) self._client = RPCVectorDBClient(**self._client_config.to_tencent_params())
self._db = self._init_database()
def _init_database(self): def _init_database(self):
exists = False return self._client.create_database_if_not_exists(database_name=self._client_config.database)
for db in self._client.list_databases():
if db.database_name == self._client_config.database:
exists = True
break
if exists:
return self._client.database(self._client_config.database)
else:
return self._client.create_database(database_name=self._client_config.database)
def get_type(self) -> str: def get_type(self) -> str:
return VectorType.TENCENT return VectorType.TENCENT
@ -62,8 +55,11 @@ class TencentVector(BaseVector):
return {"type": self.get_type(), "vector_store": {"class_prefix": self._collection_name}} return {"type": self.get_type(), "vector_store": {"class_prefix": self._collection_name}}
def _has_collection(self) -> bool: def _has_collection(self) -> bool:
collections = self._db.list_collections() return bool(
return any(collection.collection_name == self._collection_name for collection in collections) self._client.exists_collection(
database_name=self._client_config.database, collection_name=self.collection_name
)
)
def _create_collection(self, dimension: int) -> None: def _create_collection(self, dimension: int) -> None:
lock_name = "vector_indexing_lock_{}".format(self._collection_name) lock_name = "vector_indexing_lock_{}".format(self._collection_name)
@ -75,7 +71,6 @@ class TencentVector(BaseVector):
if self._has_collection(): if self._has_collection():
return return
self.delete()
index_type = None index_type = None
for k, v in enum.IndexType.__members__.items(): for k, v in enum.IndexType.__members__.items():
if k == self._client_config.index_type: if k == self._client_config.index_type:
@ -89,6 +84,31 @@ class TencentVector(BaseVector):
if metric_type is None: if metric_type is None:
raise ValueError("unsupported metric_type") raise ValueError("unsupported metric_type")
params = vdb_index.HNSWParams(m=16, efconstruction=200) params = vdb_index.HNSWParams(m=16, efconstruction=200)
index = vdb_index.Index(
vdb_index.FilterIndex(self.field_id, enum.FieldType.String, enum.IndexType.PRIMARY_KEY),
vdb_index.VectorIndex(
self.field_vector,
dimension,
index_type,
metric_type,
params,
),
vdb_index.FilterIndex(self.field_text, enum.FieldType.String, enum.IndexType.FILTER),
vdb_index.FilterIndex(self.field_metadata, enum.FieldType.Json, enum.IndexType.FILTER),
)
try:
self._client.create_collection(
database_name=self._client_config.database,
collection_name=self._collection_name,
shard=self._client_config.shard,
replicas=self._client_config.replicas,
description="Collection for Dify",
index=index,
)
except VectorDBException as e:
if "fieldType:json" not in e.message:
raise e
# vdb version not support json, use string
index = vdb_index.Index( index = vdb_index.Index(
vdb_index.FilterIndex(self.field_id, enum.FieldType.String, enum.IndexType.PRIMARY_KEY), vdb_index.FilterIndex(self.field_id, enum.FieldType.String, enum.IndexType.PRIMARY_KEY),
vdb_index.VectorIndex( vdb_index.VectorIndex(
@ -101,9 +121,9 @@ class TencentVector(BaseVector):
vdb_index.FilterIndex(self.field_text, enum.FieldType.String, enum.IndexType.FILTER), vdb_index.FilterIndex(self.field_text, enum.FieldType.String, enum.IndexType.FILTER),
vdb_index.FilterIndex(self.field_metadata, enum.FieldType.String, enum.IndexType.FILTER), vdb_index.FilterIndex(self.field_metadata, enum.FieldType.String, enum.IndexType.FILTER),
) )
self._client.create_collection(
self._db.create_collection( database_name=self._client_config.database,
name=self._collection_name, collection_name=self._collection_name,
shard=self._client_config.shard, shard=self._client_config.shard,
replicas=self._client_config.replicas, replicas=self._client_config.replicas,
description="Collection for Dify", description="Collection for Dify",
@ -119,8 +139,13 @@ class TencentVector(BaseVector):
texts = [doc.page_content for doc in documents] texts = [doc.page_content for doc in documents]
metadatas = [doc.metadata for doc in documents] metadatas = [doc.metadata for doc in documents]
total_count = len(embeddings) total_count = len(embeddings)
batch_size = self._client_config.max_upsert_batch_size
batch = math.ceil(total_count / batch_size)
for j in range(batch):
docs = [] docs = []
for i in range(0, total_count): start_idx = j * batch_size
end_idx = min(total_count, (j + 1) * batch_size)
for i in range(start_idx, end_idx):
if metadatas is None: if metadatas is None:
continue continue
metadata = metadatas[i] or {} metadata = metadatas[i] or {}
@ -128,13 +153,20 @@ class TencentVector(BaseVector):
id=metadata.get("doc_id"), id=metadata.get("doc_id"),
vector=embeddings[i], vector=embeddings[i],
text=texts[i], text=texts[i],
metadata=json.dumps(metadata), metadata=metadata,
) )
docs.append(doc) docs.append(doc)
self._db.collection(self._collection_name).upsert(docs, self._client_config.timeout) self._client.upsert(
database_name=self._client_config.database,
collection_name=self.collection_name,
documents=docs,
timeout=self._client_config.timeout,
)
def text_exists(self, id: str) -> bool: def text_exists(self, id: str) -> bool:
docs = self._db.collection(self._collection_name).query(document_ids=[id]) docs = self._client.query(
database_name=self._client_config.database, collection_name=self.collection_name, document_ids=[id]
)
if docs and len(docs) > 0: if docs and len(docs) > 0:
return True return True
return False return False
@ -142,17 +174,25 @@ class TencentVector(BaseVector):
def delete_by_ids(self, ids: list[str]) -> None: def delete_by_ids(self, ids: list[str]) -> None:
if not ids: if not ids:
return return
self._db.collection(self._collection_name).delete(document_ids=ids) self._client.delete(
database_name=self._client_config.database, collection_name=self.collection_name, document_ids=ids
)
def delete_by_metadata_field(self, key: str, value: str) -> None: def delete_by_metadata_field(self, key: str, value: str) -> None:
self._db.collection(self._collection_name).delete(filter=Filter(Filter.In(f"metadata.{key}", [value]))) self._client.delete(
database_name=self._client_config.database,
collection_name=self.collection_name,
filter=Filter(Filter.In(f"metadata.{key}", [value])),
)
def search_by_vector(self, query_vector: list[float], **kwargs: Any) -> list[Document]: def search_by_vector(self, query_vector: list[float], **kwargs: Any) -> list[Document]:
document_ids_filter = kwargs.get("document_ids_filter") document_ids_filter = kwargs.get("document_ids_filter")
filter = None filter = None
if document_ids_filter: if document_ids_filter:
filter = Filter(Filter.In("metadata.document_id", document_ids_filter)) filter = Filter(Filter.In("metadata.document_id", document_ids_filter))
res = self._db.collection(self._collection_name).search( res = self._client.search(
database_name=self._client_config.database,
collection_name=self.collection_name,
vectors=[query_vector], vectors=[query_vector],
filter=filter, filter=filter,
params=document.HNSWSearchParams(ef=kwargs.get("ef", 10)), params=document.HNSWSearchParams(ef=kwargs.get("ef", 10)),
@ -173,8 +213,6 @@ class TencentVector(BaseVector):
for result in res[0]: for result in res[0]:
meta = result.get(self.field_metadata) meta = result.get(self.field_metadata)
if meta is not None:
meta = json.loads(meta)
score = 1 - result.get("score", 0.0) score = 1 - result.get("score", 0.0)
if score > score_threshold: if score > score_threshold:
meta["score"] = score meta["score"] = score
@ -184,7 +222,7 @@ class TencentVector(BaseVector):
return docs return docs
def delete(self) -> None: def delete(self) -> None:
self._db.drop_collection(name=self._collection_name) self._client.drop_collection(database_name=self._client_config.database, collection_name=self.collection_name)
class TencentVectorFactory(AbstractVectorFactory): class TencentVectorFactory(AbstractVectorFactory):
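
The upsert path above now writes documents in chunks of max_upsert_batch_size instead of one oversized request. A minimal sketch of that batching, with a generic upsert callable standing in for the RPC client call:

import math
from collections.abc import Callable, Sequence
from typing import Any


def upsert_in_batches(documents: Sequence[Any], upsert: Callable[[list[Any]], None], batch_size: int = 128) -> None:
    total_count = len(documents)
    for batch_index in range(math.ceil(total_count / batch_size)):
        start_idx = batch_index * batch_size
        end_idx = min(total_count, (batch_index + 1) * batch_size)
        upsert(list(documents[start_idx:end_idx]))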

View File

@ -22,7 +22,6 @@ class TidbService:
:param iam_url: The URL of the TiDB Cloud IAM API (required). :param iam_url: The URL of the TiDB Cloud IAM API (required).
:param public_key: The public key for the API (required). :param public_key: The public key for the API (required).
:param private_key: The private key for the API (required). :param private_key: The private key for the API (required).
:param display_name: The user-friendly display name of the cluster (required).
:param region: The region where the cluster will be created (required). :param region: The region where the cluster will be created (required).
:return: The response from the API. :return: The response from the API.
@ -149,13 +148,12 @@ class TidbService:
): ):
""" """
Update the status of a new TiDB Serverless cluster. Update the status of a new TiDB Serverless cluster.
:param tidb_serverless_list: The TiDB serverless list (required).
:param project_id: The project ID of the TiDB Cloud project (required). :param project_id: The project ID of the TiDB Cloud project (required).
:param api_url: The URL of the TiDB Cloud API (required). :param api_url: The URL of the TiDB Cloud API (required).
:param iam_url: The URL of the TiDB Cloud IAM API (required). :param iam_url: The URL of the TiDB Cloud IAM API (required).
:param public_key: The public key for the API (required). :param public_key: The public key for the API (required).
:param private_key: The private key for the API (required). :param private_key: The private key for the API (required).
:param display_name: The user-friendly display name of the cluster (required).
:param region: The region where the cluster will be created (required).
:return: The response from the API. :return: The response from the API.
""" """
@ -186,12 +184,12 @@ class TidbService:
) -> list[dict]: ) -> list[dict]:
""" """
Creates a new TiDB Serverless cluster. Creates a new TiDB Serverless cluster.
:param batch_size: The batch size (required).
:param project_id: The project ID of the TiDB Cloud project (required). :param project_id: The project ID of the TiDB Cloud project (required).
:param api_url: The URL of the TiDB Cloud API (required). :param api_url: The URL of the TiDB Cloud API (required).
:param iam_url: The URL of the TiDB Cloud IAM API (required). :param iam_url: The URL of the TiDB Cloud IAM API (required).
:param public_key: The public key for the API (required). :param public_key: The public key for the API (required).
:param private_key: The private key for the API (required). :param private_key: The private key for the API (required).
:param display_name: The user-friendly display name of the cluster (required).
:param region: The region where the cluster will be created (required). :param region: The region where the cluster will be created (required).
:return: The response from the API. :return: The response from the API.

View File

@ -152,6 +152,10 @@ class Vector:
from core.rag.datasource.vdb.opengauss.opengauss import OpenGaussFactory from core.rag.datasource.vdb.opengauss.opengauss import OpenGaussFactory
return OpenGaussFactory return OpenGaussFactory
case VectorType.TABLESTORE:
from core.rag.datasource.vdb.tablestore.tablestore_vector import TableStoreVectorFactory
return TableStoreVectorFactory
case _: case _:
raise ValueError(f"Vector store {vector_type} is not supported.") raise ValueError(f"Vector store {vector_type} is not supported.")

View File

@ -25,3 +25,4 @@ class VectorType(StrEnum):
TIDB_ON_QDRANT = "tidb_on_qdrant" TIDB_ON_QDRANT = "tidb_on_qdrant"
OCEANBASE = "oceanbase" OCEANBASE = "oceanbase"
OPENGAUSS = "opengauss" OPENGAUSS = "opengauss"
TABLESTORE = "tablestore"

View File

@ -226,7 +226,6 @@ class WeaviateVector(BaseVector):
Args: Args:
query: Text to look up documents similar to. query: Text to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
Returns: Returns:
List of Documents most similar to the query. List of Documents most similar to the query.

View File

@ -7,11 +7,10 @@ class FirecrawlWebExtractor(BaseExtractor):
""" """
Crawl and scrape websites and return content in clean llm-ready markdown. Crawl and scrape websites and return content in clean llm-ready markdown.
Args: Args:
url: The URL to scrape. url: The URL to scrape.
api_key: The API key for Firecrawl. job_id: The crawl job id.
base_url: The base URL for the Firecrawl API. Defaults to 'https://api.firecrawl.dev'. tenant_id: The tenant id.
mode: The mode of operation. Defaults to 'scrape'. Options are 'crawl', 'scrape' and 'crawl_return_urls'. mode: The mode of operation. Defaults to 'scrape'. Options are 'crawl', 'scrape' and 'crawl_return_urls'.
only_main_content: Only return the main content of the page excluding headers, navs, footers, etc. only_main_content: Only return the main content of the page excluding headers, navs, footers, etc.
""" """

View File

@ -1,6 +1,8 @@
import logging import logging
from typing import Optional from typing import Optional
import pypandoc # type: ignore
from core.rag.extractor.extractor_base import BaseExtractor from core.rag.extractor.extractor_base import BaseExtractor
from core.rag.models.document import Document from core.rag.models.document import Document
@ -34,6 +36,7 @@ class UnstructuredEpubExtractor(BaseExtractor):
else: else:
from unstructured.partition.epub import partition_epub from unstructured.partition.epub import partition_epub
pypandoc.download_pandoc()
elements = partition_epub(filename=self._file_path, xml_keep_tags=True) elements = partition_epub(filename=self._file_path, xml_keep_tags=True)
from unstructured.chunking.title import chunk_by_title from unstructured.chunking.title import chunk_by_title

View File

@ -14,15 +14,6 @@ class UnstructuredMarkdownExtractor(BaseExtractor):
Args: Args:
file_path: Path to the file to load. file_path: Path to the file to load.
remove_hyperlinks: Whether to remove hyperlinks from the text.
remove_images: Whether to remove images from the text.
encoding: File encoding to use. If `None`, the file will be loaded
with the default system encoding.
autodetect_encoding: Whether to try to autodetect the file encoding
if the specified encoding fails.
""" """
def __init__(self, file_path: str, api_url: Optional[str] = None, api_key: str = ""): def __init__(self, file_path: str, api_url: Optional[str] = None, api_key: str = ""):

View File

@ -1,7 +1,7 @@
from enum import Enum from enum import Enum, StrEnum
class BuiltInField(str, Enum): class BuiltInField(StrEnum):
document_name = "document_name" document_name = "document_name"
uploader = "uploader" uploader = "uploader"
upload_date = "upload_date" upload_date = "upload_date"

View File

@ -1,7 +1,7 @@
from enum import Enum from enum import StrEnum
class IndexType(str, Enum): class IndexType(StrEnum):
PARAGRAPH_INDEX = "text_model" PARAGRAPH_INDEX = "text_model"
QA_INDEX = "qa_model" QA_INDEX = "qa_model"
PARENT_CHILD_INDEX = "hierarchical_model" PARENT_CHILD_INDEX = "hierarchical_model"
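
Both enums above move from the str/Enum mixin to StrEnum (Python 3.11+), whose members render as their plain values. A small illustration of the difference that motivates the switch; the legacy class here exists only for comparison:

from enum import Enum, StrEnum


class IndexType(StrEnum):
    PARAGRAPH_INDEX = "text_model"


class LegacyIndexType(str, Enum):
    PARAGRAPH_INDEX = "text_model"


assert str(IndexType.PARAGRAPH_INDEX) == "text_model"          # plain value
assert IndexType.PARAGRAPH_INDEX == "text_model"               # still comparable as a string
assert str(LegacyIndexType.PARAGRAPH_INDEX) == "LegacyIndexType.PARAGRAPH_INDEX"  # qualified name leaks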

View File

@ -39,6 +39,8 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
all_documents = [] # type: ignore all_documents = [] # type: ignore
if rules.parent_mode == ParentMode.PARAGRAPH: if rules.parent_mode == ParentMode.PARAGRAPH:
# Split the text documents into nodes. # Split the text documents into nodes.
if not rules.segmentation:
raise ValueError("No segmentation found in rules.")
splitter = self._get_splitter( splitter = self._get_splitter(
processing_rule_mode=process_rule.get("mode"), processing_rule_mode=process_rule.get("mode"),
max_tokens=rules.segmentation.max_tokens, max_tokens=rules.segmentation.max_tokens,

View File

@ -100,6 +100,7 @@ class DatasetRetrieval:
:param hit_callback: hit callback :param hit_callback: hit callback
:param message_id: message id :param message_id: message id
:param memory: memory :param memory: memory
:param inputs: inputs
:return: :return:
""" """
dataset_ids = config.dataset_ids dataset_ids = config.dataset_ids
@ -610,7 +611,11 @@ class DatasetRetrieval:
if dataset.indexing_technique == "economy": if dataset.indexing_technique == "economy":
# use keyword table query # use keyword table query
documents = RetrievalService.retrieve( documents = RetrievalService.retrieve(
retrieval_method="keyword_search", dataset_id=dataset.id, query=query, top_k=top_k retrieval_method="keyword_search",
dataset_id=dataset.id,
query=query,
top_k=top_k,
document_ids_filter=document_ids_filter,
) )
if documents: if documents:
all_documents.extend(documents) all_documents.extend(documents)
@ -730,6 +735,7 @@ class DatasetRetrieval:
Calculate keywords scores Calculate keywords scores
:param query: search query :param query: search query
:param documents: documents for reranking :param documents: documents for reranking
:param top_k: top k
:return: :return:
""" """
@ -846,8 +852,9 @@ class DatasetRetrieval:
) )
if automatic_metadata_filters: if automatic_metadata_filters:
conditions = [] conditions = []
for filter in automatic_metadata_filters: for sequence, filter in enumerate(automatic_metadata_filters):
self._process_metadata_filter_func( self._process_metadata_filter_func(
sequence,
filter.get("condition"), # type: ignore filter.get("condition"), # type: ignore
filter.get("metadata_name"), # type: ignore filter.get("metadata_name"), # type: ignore
filter.get("value"), filter.get("value"),
@ -867,14 +874,18 @@ class DatasetRetrieval:
elif metadata_filtering_mode == "manual": elif metadata_filtering_mode == "manual":
if metadata_filtering_conditions: if metadata_filtering_conditions:
metadata_condition = MetadataCondition(**metadata_filtering_conditions.model_dump()) metadata_condition = MetadataCondition(**metadata_filtering_conditions.model_dump())
for condition in metadata_filtering_conditions.conditions: # type: ignore for sequence, condition in enumerate(metadata_filtering_conditions.conditions): # type: ignore
metadata_name = condition.name metadata_name = condition.name
expected_value = condition.value expected_value = condition.value
if expected_value is not None or condition.comparison_operator in ("empty", "not empty"): if expected_value is not None or condition.comparison_operator in ("empty", "not empty"):
if isinstance(expected_value, str): if isinstance(expected_value, str):
expected_value = self._replace_metadata_filter_value(expected_value, inputs) expected_value = self._replace_metadata_filter_value(expected_value, inputs)
filters = self._process_metadata_filter_func( filters = self._process_metadata_filter_func(
condition.comparison_operator, metadata_name, expected_value, filters sequence,
condition.comparison_operator,
metadata_name,
expected_value,
filters,
) )
else: else:
raise ValueError("Invalid metadata filtering mode") raise ValueError("Invalid metadata filtering mode")
@ -896,7 +907,10 @@ class DatasetRetrieval:
return str(inputs.get(key, f"{{{{{key}}}}}")) return str(inputs.get(key, f"{{{{{key}}}}}"))
pattern = re.compile(r"\{\{(\w+)\}\}") pattern = re.compile(r"\{\{(\w+)\}\}")
return pattern.sub(replacer, text) output = pattern.sub(replacer, text)
if isinstance(output, str):
output = re.sub(r"[\r\n\t]+", " ", output).strip()
return output
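
The template substitution above now also collapses stray newlines and tabs before the value reaches a metadata filter. A small, self-contained version of the same behaviour (unknown placeholders are deliberately left intact):

import re


def replace_metadata_filter_value(text_value: str, inputs: dict) -> str:
    def replacer(match: re.Match) -> str:
        key = match.group(1)
        return str(inputs.get(key, f"{{{{{key}}}}}"))

    output = re.compile(r"\{\{(\w+)\}\}").sub(replacer, text_value)
    return re.sub(r"[\r\n\t]+", " ", output).strip()


assert replace_metadata_filter_value("{{author}}\n", {"author": "alice"}) == "alice"
assert replace_metadata_filter_value("{{missing}}", {}) == "{{missing}}"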
def _automatic_metadata_filter_func( def _automatic_metadata_filter_func(
self, dataset_ids: list, query: str, tenant_id: str, user_id: str, metadata_model_config: ModelConfig self, dataset_ids: list, query: str, tenant_id: str, user_id: str, metadata_model_config: ModelConfig
@ -953,26 +967,36 @@ class DatasetRetrieval:
return None return None
return automatic_metadata_filters return automatic_metadata_filters
def _process_metadata_filter_func(self, condition: str, metadata_name: str, value: Optional[Any], filters: list): def _process_metadata_filter_func(
self, sequence: int, condition: str, metadata_name: str, value: Optional[Any], filters: list
):
key = f"{metadata_name}_{sequence}"
key_value = f"{metadata_name}_{sequence}_value"
match condition: match condition:
case "contains": case "contains":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key LIKE :value")).params(key=metadata_name, value=f"%{value}%") (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
**{key: metadata_name, key_value: f"%{value}%"}
)
) )
case "not contains": case "not contains":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key NOT LIKE :value")).params( (text(f"documents.doc_metadata ->> :{key} NOT LIKE :{key_value}")).params(
key=metadata_name, value=f"%{value}%" **{key: metadata_name, key_value: f"%{value}%"}
) )
) )
case "start with": case "start with":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key LIKE :value")).params(key=metadata_name, value=f"{value}%") (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
**{key: metadata_name, key_value: f"{value}%"}
)
) )
case "end with": case "end with":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key LIKE :value")).params(key=metadata_name, value=f"%{value}") (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
**{key: metadata_name, key_value: f"%{value}"}
)
) )
case "is" | "=": case "is" | "=":
if isinstance(value, str): if isinstance(value, str):
@ -996,7 +1020,7 @@ class DatasetRetrieval:
filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) < value) filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) < value)
case "after" | ">": case "after" | ">":
filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) > value) filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) > value)
case "" | ">=": case "" | "<=":
filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) <= value) filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) <= value)
case "" | ">=": case "" | ">=":
filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) >= value) filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) >= value)
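
The filter construction above now derives a unique bind-parameter name from the metadata field plus the condition's position, so two conditions on the same field no longer overwrite each other's bound values. A minimal sketch that mirrors the text().params() usage from this file; combining the resulting clauses with and_()/or_() is left to the caller, as in the source:

from sqlalchemy import text


def add_contains_filter(filters: list, sequence: int, metadata_name: str, value: str) -> None:
    key = f"{metadata_name}_{sequence}"
    key_value = f"{metadata_name}_{sequence}_value"
    filters.append(
        text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}").params(
            **{key: metadata_name, key_value: f"%{value}%"}
        )
    )


filters: list = []
add_contains_filter(filters, 0, "author", "alice")
add_contains_filter(filters, 1, "author", "bob")  # binds author_1 / author_1_value, distinct from author_0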
@ -1009,8 +1033,6 @@ class DatasetRetrieval:
) -> tuple[ModelInstance, ModelConfigWithCredentialsEntity]: ) -> tuple[ModelInstance, ModelConfigWithCredentialsEntity]:
""" """
Fetch model config Fetch model config
:param node_data: node data
:return:
""" """
if model is None: if model is None:
raise ValueError("single_retrieval_config is required") raise ValueError("single_retrieval_config is required")

View File

@ -235,6 +235,7 @@ class ReactMultiDatasetRouter:
tools: List of tools the agent will have access to, used to format the tools: List of tools the agent will have access to, used to format the
prompt. prompt.
prefix: String to put before the list of tools. prefix: String to put before the list of tools.
format_instructions: The format instruction prompt.
Returns: Returns:
A PromptTemplate with the template assembled from the pieces here. A PromptTemplate with the template assembled from the pieces here.
""" """

View File

@ -29,9 +29,7 @@ class Tool(ABC):
def fork_tool_runtime(self, runtime: ToolRuntime) -> "Tool": def fork_tool_runtime(self, runtime: ToolRuntime) -> "Tool":
""" """
fork a new tool with meta data fork a new tool with metadata
:param meta: the meta data of a tool call processing, tenant_id is required
:return: the new tool :return: the new tool
""" """
return self.__class__( return self.__class__(
@ -206,6 +204,7 @@ class Tool(ABC):
create a blob message create a blob message
:param blob: the blob :param blob: the blob
:param meta: the meta info of blob object
:return: the blob message :return: the blob message
""" """
return ToolInvokeMessage( return ToolInvokeMessage(

View File

@ -35,7 +35,7 @@ class BuiltinToolProviderController(ToolProviderController):
provider_yaml["credentials_for_provider"][credential_name]["name"] = credential_name provider_yaml["credentials_for_provider"][credential_name]["name"] = credential_name
credentials_schema = [] credentials_schema = []
for credential in provider_yaml.get("credentials_for_provider", {}): for credential in provider_yaml.get("credentials_for_provider", {}).values():
credentials_schema.append(credential) credentials_schema.append(credential)
super().__init__( super().__init__(
@ -153,7 +153,7 @@ class BuiltinToolProviderController(ToolProviderController):
""" """
validate the credentials of the provider validate the credentials of the provider
:param tool_name: the name of the tool, defined in `get_tools` :param user_id: use id
:param credentials: the credentials of the tool :param credentials: the credentials of the tool
""" """
# validate credentials format # validate credentials format
@ -167,7 +167,7 @@ class BuiltinToolProviderController(ToolProviderController):
""" """
validate the credentials of the provider validate the credentials of the provider
:param tool_name: the name of the tool, defined in `get_tools` :param user_id: use id
:param credentials: the credentials of the tool :param credentials: the credentials of the tool
""" """
pass pass
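
The loop above previously iterated the credentials_for_provider dict directly, which yields only the keys; the fix iterates .values() so the schema objects themselves are collected. A tiny illustration:

credentials_for_provider = {"api_key": {"name": "api_key", "type": "secret-input"}}

assert [c for c in credentials_for_provider] == ["api_key"]                          # keys only
assert [c for c in credentials_for_provider.values()][0]["type"] == "secret-input"   # full schemas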

View File

@ -12,4 +12,4 @@ identity:
icon: icon.svg icon: icon.svg
tags: tags:
- productivity - productivity
credentials_for_provider: [] credentials_for_provider: {}

View File

@ -28,9 +28,7 @@ class BuiltinTool(Tool):
def fork_tool_runtime(self, runtime: ToolRuntime) -> "BuiltinTool": def fork_tool_runtime(self, runtime: ToolRuntime) -> "BuiltinTool":
""" """
fork a new tool with meta data fork a new tool with metadata
:param meta: the meta data of a tool call processing, tenant_id is required
:return: the new tool :return: the new tool
""" """
return self.__class__( return self.__class__(
@ -43,7 +41,7 @@ class BuiltinTool(Tool):
""" """
invoke model invoke model
:param model_config: the model config :param user_id: the user id
:param prompt_messages: the prompt messages :param prompt_messages: the prompt messages
:param stop: the stop words :param stop: the stop words
:return: the model result :return: the model result
@ -64,7 +62,6 @@ class BuiltinTool(Tool):
""" """
get max tokens get max tokens
:param model_config: the model config
:return: the max tokens :return: the max tokens
""" """
if self.runtime is None: if self.runtime is None:

View File

@ -145,7 +145,6 @@ class ApiToolProviderController(ToolProviderController):
""" """
fetch tools from database fetch tools from database
:param user_id: the user id
:param tenant_id: the tenant id :param tenant_id: the tenant id
:return: the tools :return: the tools
""" """

View File

@ -35,9 +35,7 @@ class ApiTool(Tool):
def fork_tool_runtime(self, runtime: ToolRuntime): def fork_tool_runtime(self, runtime: ToolRuntime):
""" """
fork a new tool with meta data fork a new tool with metadata
:param meta: the meta data of a tool call processing, tenant_id is required
:return: the new tool :return: the new tool
""" """
if self.api_bundle is None: if self.api_bundle is None:
@ -195,7 +193,12 @@ class ApiTool(Tool):
properties = body_schema.get("properties", {}) properties = body_schema.get("properties", {})
for name, property in properties.items(): for name, property in properties.items():
if name in parameters: if name in parameters:
if property.get("format") == "binary": # multiple file upload: if the type is array and the items have format as binary
if property.get("type") == "array" and property.get("items", {}).get("format") == "binary":
# parameters[name] should be a list of file objects.
for f in parameters[name]:
files.append((name, (f.filename, download(f), f.mime_type)))
elif property.get("format") == "binary":
f = parameters[name] f = parameters[name]
files.append((name, (f.filename, download(f), f.mime_type))) files.append((name, (f.filename, download(f), f.mime_type)))
elif "$ref" in property: elif "$ref" in property:
@ -226,6 +229,13 @@ class ApiTool(Tool):
else: else:
body = body body = body
# if there is a file upload, remove the Content-Type header
# so that httpx can automatically generate the boundary header required for multipart/form-data.
# issue: https://github.com/langgenius/dify/issues/13684
# reference: https://stackoverflow.com/questions/39280438/fetch-missing-boundary-in-multipart-form-data-post
if files:
headers.pop("Content-Type", None)
if method in { if method in {
"get", "get",
"head", "head",

View File

@ -264,7 +264,7 @@ class ToolParameter(PluginParameter):
:param name: the name of the parameter :param name: the name of the parameter
:param llm_description: the description presented to the LLM :param llm_description: the description presented to the LLM
:param type: the type of the parameter :param typ: the type of the parameter
:param required: if the parameter is required :param required: if the parameter is required
:param options: the options of the parameter :param options: the options of the parameter
""" """

View File

@ -313,7 +313,6 @@ class ToolEngine:
""" """
Create message file Create message file
:param messages: messages
:return: message file ids :return: message file ids
""" """
result = [] result = []

View File

@ -161,8 +161,11 @@ class ToolManager:
get the tool runtime get the tool runtime
:param provider_type: the type of the provider :param provider_type: the type of the provider
:param provider_name: the name of the provider :param provider_id: the id of the provider
:param tool_name: the name of the tool :param tool_name: the name of the tool
:param tenant_id: the tenant id
:param invoke_from: invoke from
:param tool_invoke_from: the tool invoke from
:return: the tool :return: the tool
""" """
@ -427,8 +430,6 @@ class ToolManager:
get the absolute path of the icon of the hardcoded provider get the absolute path of the icon of the hardcoded provider
:param provider: the name of the provider :param provider: the name of the provider
:param tenant_id: the id of the tenant
:return: the absolute path of the icon, the mime type of the icon :return: the absolute path of the icon, the mime type of the icon
""" """
# get provider # get provider
@ -672,7 +673,8 @@ class ToolManager:
""" """
get the api provider get the api provider
:param provider_name: the name of the provider :param tenant_id: the id of the tenant
:param provider_id: the id of the provider
:return: the provider controller, the credentials :return: the provider controller, the credentials
""" """

View File

@ -84,12 +84,8 @@ class ModelInvocationUtils:
:param user_id: user id :param user_id: user id
:param tenant_id: tenant id, the tenant id of the creator of the tool :param tenant_id: tenant id, the tenant id of the creator of the tool
:param tool_provider: tool provider :param tool_type: tool type
:param tool_id: tool id
:param tool_name: tool name :param tool_name: tool name
:param provider: model provider
:param model: model name
:param model_parameters: model parameters
:param prompt_messages: prompt messages :param prompt_messages: prompt messages
:return: AssistantPromptMessage :return: AssistantPromptMessage
""" """

View File

@ -186,6 +186,9 @@ class ApiBasedToolSchemaParser:
return ToolParameter.ToolParameterType.BOOLEAN return ToolParameter.ToolParameterType.BOOLEAN
elif typ == "string": elif typ == "string":
return ToolParameter.ToolParameterType.STRING return ToolParameter.ToolParameterType.STRING
elif typ == "array":
items = parameter.get("items") or parameter.get("schema", {}).get("items")
return ToolParameter.ToolParameterType.FILES if items and items.get("format") == "binary" else None
else: else:
return None return None
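
The new branch above maps an OpenAPI array whose items use format: binary to the multi-file parameter type. A standalone sketch of the same mapping; plain strings stand in for ToolParameter.ToolParameterType members so the example runs on its own:

from typing import Optional


def map_openapi_type(parameter: dict) -> Optional[str]:
    typ = parameter.get("type") or parameter.get("schema", {}).get("type")
    if typ == "array":
        items = parameter.get("items") or parameter.get("schema", {}).get("items")
        return "files" if items and items.get("format") == "binary" else None
    if typ == "string":
        return "string"
    return None


assert map_openapi_type({"type": "array", "items": {"format": "binary"}}) == "files"
assert map_openapi_type({"schema": {"type": "string"}}) == "string"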
@ -197,6 +200,8 @@ class ApiBasedToolSchemaParser:
parse openapi yaml to tool bundle parse openapi yaml to tool bundle
:param yaml: the yaml string :param yaml: the yaml string
:param extra_info: the extra info
:param warning: the warning message
:return: the tool bundle :return: the tool bundle
""" """
warning = warning if warning is not None else {} warning = warning if warning is not None else {}
@ -278,6 +283,8 @@ class ApiBasedToolSchemaParser:
parse openapi plugin yaml to tool bundle parse openapi plugin yaml to tool bundle
:param json: the json string :param json: the json string
:param extra_info: the extra info
:param warning: the warning message
:return: the tool bundle :return: the tool bundle
""" """
warning = warning if warning is not None else {} warning = warning if warning is not None else {}
@ -312,6 +319,8 @@ class ApiBasedToolSchemaParser:
auto parse to tool bundle auto parse to tool bundle
:param content: the content :param content: the content
:param extra_info: the extra info
:param warning: the warning message
:return: tools bundle, schema_type :return: tools bundle, schema_type
""" """
warning = warning if warning is not None else {} warning = warning if warning is not None else {}

View File

@ -182,7 +182,6 @@ class WorkflowToolProviderController(ToolProviderController):
""" """
fetch tools from database fetch tools from database
:param user_id: the user id
:param tenant_id: the tenant id :param tenant_id: the tenant id
:return: the tools :return: the tools
""" """

View File

@ -127,9 +127,8 @@ class WorkflowTool(Tool):
def fork_tool_runtime(self, runtime: ToolRuntime) -> "WorkflowTool": def fork_tool_runtime(self, runtime: ToolRuntime) -> "WorkflowTool":
""" """
fork a new tool with meta data fork a new tool with metadata
:param meta: the meta data of a tool call processing, tenant_id is required
:return: the new tool :return: the new tool
""" """
return self.__class__( return self.__class__(
@ -212,7 +211,6 @@ class WorkflowTool(Tool):
""" """
extract files from the result extract files from the result
:param result: the result
:return: the result, files :return: the result, files
""" """
files: list[File] = [] files: list[File] = []

View File

@ -9,6 +9,7 @@ from typing import Any, cast
import docx import docx
import pandas as pd import pandas as pd
import pypandoc # type: ignore
import pypdfium2 # type: ignore import pypdfium2 # type: ignore
import yaml # type: ignore import yaml # type: ignore
from docx.document import Document from docx.document import Document
@ -369,7 +370,7 @@ def _extract_text_from_ppt(file_content: bytes) -> str:
from unstructured.partition.ppt import partition_ppt from unstructured.partition.ppt import partition_ppt
try: try:
if dify_config.UNSTRUCTURED_API_URL and dify_config.UNSTRUCTURED_API_KEY: if dify_config.UNSTRUCTURED_API_URL:
with tempfile.NamedTemporaryFile(suffix=".ppt", delete=False) as temp_file: with tempfile.NamedTemporaryFile(suffix=".ppt", delete=False) as temp_file:
temp_file.write(file_content) temp_file.write(file_content)
temp_file.flush() temp_file.flush()
@ -378,7 +379,7 @@ def _extract_text_from_ppt(file_content: bytes) -> str:
file=file, file=file,
metadata_filename=temp_file.name, metadata_filename=temp_file.name,
api_url=dify_config.UNSTRUCTURED_API_URL, api_url=dify_config.UNSTRUCTURED_API_URL,
api_key=dify_config.UNSTRUCTURED_API_KEY, api_key=dify_config.UNSTRUCTURED_API_KEY, # type: ignore
) )
os.unlink(temp_file.name) os.unlink(temp_file.name)
else: else:
@ -395,7 +396,7 @@ def _extract_text_from_pptx(file_content: bytes) -> str:
from unstructured.partition.pptx import partition_pptx from unstructured.partition.pptx import partition_pptx
try: try:
if dify_config.UNSTRUCTURED_API_URL and dify_config.UNSTRUCTURED_API_KEY: if dify_config.UNSTRUCTURED_API_URL:
with tempfile.NamedTemporaryFile(suffix=".pptx", delete=False) as temp_file: with tempfile.NamedTemporaryFile(suffix=".pptx", delete=False) as temp_file:
temp_file.write(file_content) temp_file.write(file_content)
temp_file.flush() temp_file.flush()
@ -404,7 +405,7 @@ def _extract_text_from_pptx(file_content: bytes) -> str:
file=file, file=file,
metadata_filename=temp_file.name, metadata_filename=temp_file.name,
api_url=dify_config.UNSTRUCTURED_API_URL, api_url=dify_config.UNSTRUCTURED_API_URL,
api_key=dify_config.UNSTRUCTURED_API_KEY, api_key=dify_config.UNSTRUCTURED_API_KEY, # type: ignore
) )
os.unlink(temp_file.name) os.unlink(temp_file.name)
else: else:
@ -416,9 +417,24 @@ def _extract_text_from_pptx(file_content: bytes) -> str:
def _extract_text_from_epub(file_content: bytes) -> str: def _extract_text_from_epub(file_content: bytes) -> str:
from unstructured.partition.api import partition_via_api
from unstructured.partition.epub import partition_epub from unstructured.partition.epub import partition_epub
try: try:
if dify_config.UNSTRUCTURED_API_URL:
with tempfile.NamedTemporaryFile(suffix=".epub", delete=False) as temp_file:
temp_file.write(file_content)
temp_file.flush()
with open(temp_file.name, "rb") as file:
elements = partition_via_api(
file=file,
metadata_filename=temp_file.name,
api_url=dify_config.UNSTRUCTURED_API_URL,
api_key=dify_config.UNSTRUCTURED_API_KEY, # type: ignore
)
os.unlink(temp_file.name)
else:
pypandoc.download_pandoc()
with io.BytesIO(file_content) as file: with io.BytesIO(file_content) as file:
elements = partition_epub(file=file) elements = partition_epub(file=file)
return "\n".join([str(element) for element in elements]) return "\n".join([str(element) for element in elements])

View File

@ -1,5 +1,6 @@
import json import json
import logging import logging
import re
import time import time
from collections import defaultdict from collections import defaultdict
from collections.abc import Mapping, Sequence from collections.abc import Mapping, Sequence
@ -331,8 +332,9 @@ class KnowledgeRetrievalNode(LLMNode):
automatic_metadata_filters = self._automatic_metadata_filter_func(dataset_ids, query, node_data) automatic_metadata_filters = self._automatic_metadata_filter_func(dataset_ids, query, node_data)
if automatic_metadata_filters: if automatic_metadata_filters:
conditions = [] conditions = []
for filter in automatic_metadata_filters: for sequence, filter in enumerate(automatic_metadata_filters):
self._process_metadata_filter_func( self._process_metadata_filter_func(
sequence,
filter.get("condition", ""), filter.get("condition", ""),
filter.get("metadata_name", ""), filter.get("metadata_name", ""),
filter.get("value"), filter.get("value"),
@ -353,17 +355,26 @@ class KnowledgeRetrievalNode(LLMNode):
if node_data.metadata_filtering_conditions: if node_data.metadata_filtering_conditions:
metadata_condition = MetadataCondition(**node_data.metadata_filtering_conditions.model_dump()) metadata_condition = MetadataCondition(**node_data.metadata_filtering_conditions.model_dump())
if node_data.metadata_filtering_conditions: if node_data.metadata_filtering_conditions:
for condition in node_data.metadata_filtering_conditions.conditions: # type: ignore for sequence, condition in enumerate(node_data.metadata_filtering_conditions.conditions): # type: ignore
metadata_name = condition.name metadata_name = condition.name
expected_value = condition.value expected_value = condition.value
if expected_value is not None or condition.comparison_operator in ("empty", "not empty"): if expected_value is not None or condition.comparison_operator in ("empty", "not empty"):
if isinstance(expected_value, str): if isinstance(expected_value, str):
expected_value = self.graph_runtime_state.variable_pool.convert_template( expected_value = self.graph_runtime_state.variable_pool.convert_template(
expected_value expected_value
).text ).value[0]
if expected_value.value_type == "number": # type: ignore
expected_value = expected_value.value # type: ignore
elif expected_value.value_type == "string": # type: ignore
expected_value = re.sub(r"[\r\n\t]+", " ", expected_value.text).strip() # type: ignore
else:
raise ValueError("Invalid expected metadata value type")
filters = self._process_metadata_filter_func( filters = self._process_metadata_filter_func(
condition.comparison_operator, metadata_name, expected_value, filters sequence,
condition.comparison_operator,
metadata_name,
expected_value,
filters,
) )
else: else:
raise ValueError("Invalid metadata filtering mode") raise ValueError("Invalid metadata filtering mode")
@ -442,25 +453,35 @@ class KnowledgeRetrievalNode(LLMNode):
return [] return []
return automatic_metadata_filters return automatic_metadata_filters
def _process_metadata_filter_func(self, condition: str, metadata_name: str, value: Optional[str], filters: list): def _process_metadata_filter_func(
self, sequence: int, condition: str, metadata_name: str, value: Optional[Any], filters: list
):
key = f"{metadata_name}_{sequence}"
key_value = f"{metadata_name}_{sequence}_value"
match condition: match condition:
case "contains": case "contains":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key LIKE :value")).params(key=metadata_name, value=f"%{value}%") (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
**{key: metadata_name, key_value: f"%{value}%"}
)
) )
case "not contains": case "not contains":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key NOT LIKE :value")).params( (text(f"documents.doc_metadata ->> :{key} NOT LIKE :{key_value}")).params(
key=metadata_name, value=f"%{value}%" **{key: metadata_name, key_value: f"%{value}%"}
) )
) )
case "start with": case "start with":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key LIKE :value")).params(key=metadata_name, value=f"{value}%") (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
**{key: metadata_name, key_value: f"{value}%"}
)
) )
case "end with": case "end with":
filters.append( filters.append(
(text("documents.doc_metadata ->> :key LIKE :value")).params(key=metadata_name, value=f"%{value}") (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
**{key: metadata_name, key_value: f"%{value}"}
)
) )
case "=" | "is": case "=" | "is":
if isinstance(value, str): if isinstance(value, str):

View File

@ -375,11 +375,25 @@ def _process_sub_conditions(
for condition in sub_conditions: for condition in sub_conditions:
key = FileAttribute(condition.key) key = FileAttribute(condition.key)
values = [file_manager.get_attr(file=file, attr=key) for file in files] values = [file_manager.get_attr(file=file, attr=key) for file in files]
expected_value = condition.value
if key == FileAttribute.EXTENSION:
if not isinstance(expected_value, str):
raise TypeError("Expected value must be a string when key is FileAttribute.EXTENSION")
if expected_value and not expected_value.startswith("."):
expected_value = "." + expected_value
normalized_values = []
for value in values:
if value and isinstance(value, str):
if not value.startswith("."):
value = "." + value
normalized_values.append(value)
values = normalized_values
sub_group_results = [ sub_group_results = [
_evaluate_condition( _evaluate_condition(
value=value, value=value,
operator=condition.comparison_operator, operator=condition.comparison_operator,
expected=condition.value, expected=expected_value,
) )
for value in values for value in values
] ]
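
The normalization above coerces both the expected extension and each file's extension to a leading-dot form so ".pdf" and "pdf" compare equal. The core of it as a standalone helper:

def normalize_extension(ext: str) -> str:
    if ext and not ext.startswith("."):
        return "." + ext
    return ext


assert normalize_extension("pdf") == ".pdf"
assert normalize_extension(".pdf") == ".pdf"
assert normalize_extension("") == ""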

View File

@ -95,7 +95,6 @@ class VariableTemplateParser:
Args: Args:
inputs: A dictionary containing the values for the template variables. inputs: A dictionary containing the values for the template variables.
remove_template_variables: A boolean indicating whether to remove the template variables from the output.
Returns: Returns:
The formatted string with template variables replaced by their values. The formatted string with template variables replaced by their values.

View File

@ -204,6 +204,8 @@ class WorkflowEntry:
NOTE: only parameter_extractor/question_classifier are supported NOTE: only parameter_extractor/question_classifier are supported
:param node_data: node data :param node_data: node data
:param node_id: node id
:param tenant_id: tenant id
:param user_id: user id :param user_id: user id
:param user_inputs: user inputs :param user_inputs: user inputs
:return: :return:

View File

@ -196,7 +196,7 @@ def _build_from_remote_url(
raise ValueError("Invalid file url") raise ValueError("Invalid file url")
mime_type, filename, file_size = _get_remote_file_info(url) mime_type, filename, file_size = _get_remote_file_info(url)
extension = mimetypes.guess_extension(mime_type) or "." + filename.split(".")[-1] if "." in filename else ".bin" extension = mimetypes.guess_extension(mime_type) or ("." + filename.split(".")[-1] if "." in filename else ".bin")
file_type = FileType(mapping.get("type", "custom")) file_type = FileType(mapping.get("type", "custom"))
file_type = _standardize_file_type(file_type, extension=extension, mime_type=mime_type) file_type = _standardize_file_type(file_type, extension=extension, mime_type=mime_type)
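
The added parentheses matter because a conditional expression binds more loosely than or: without them, a filename with no dot forced ".bin" even when the MIME type resolved cleanly. A small demonstration:

import mimetypes

mime_type, filename = "image/png", "avatar"  # dot-less filename

buggy = mimetypes.guess_extension(mime_type) or "." + filename.split(".")[-1] if "." in filename else ".bin"
fixed = mimetypes.guess_extension(mime_type) or ("." + filename.split(".")[-1] if "." in filename else ".bin")

assert buggy == ".bin"   # the whole `or` fell inside the conditional
assert fixed == ".png"   # fallback now applies only when guess_extension() fails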

View File

@ -720,6 +720,23 @@ class DocumentSegment(db.Model): # type: ignore[name-defined]
else: else:
return [] return []
def get_child_chunks(self):
process_rule = self.document.dataset_process_rule
if process_rule.mode == "hierarchical":
rules = Rule(**process_rule.rules_dict)
if rules.parent_mode:
child_chunks = (
db.session.query(ChildChunk)
.filter(ChildChunk.segment_id == self.id)
.order_by(ChildChunk.position.asc())
.all()
)
return child_chunks or []
else:
return []
else:
return []
@property @property
def sign_content(self): def sign_content(self):
return self.get_sign_content() return self.get_sign_content()

View File

@ -791,7 +791,7 @@ class Conversation(db.Model): # type: ignore[name-defined]
WorkflowRunStatus.SUCCEEDED: 0, WorkflowRunStatus.SUCCEEDED: 0,
WorkflowRunStatus.FAILED: 0, WorkflowRunStatus.FAILED: 0,
WorkflowRunStatus.STOPPED: 0, WorkflowRunStatus.STOPPED: 0,
WorkflowRunStatus.PARTIAL_SUCCESSED: 0, WorkflowRunStatus.PARTIAL_SUCCEEDED: 0,
} }
for message in messages: for message in messages:
@ -802,7 +802,7 @@ class Conversation(db.Model): # type: ignore[name-defined]
{ {
"success": status_counts[WorkflowRunStatus.SUCCEEDED], "success": status_counts[WorkflowRunStatus.SUCCEEDED],
"failed": status_counts[WorkflowRunStatus.FAILED], "failed": status_counts[WorkflowRunStatus.FAILED],
"partial_success": status_counts[WorkflowRunStatus.PARTIAL_SUCCESSED], "partial_success": status_counts[WorkflowRunStatus.PARTIAL_SUCCEEDED],
} }
if messages if messages
else None else None

View File

@ -109,7 +109,7 @@ class Workflow(Base):
tenant_id: Mapped[str] = mapped_column(StringUUID, nullable=False) tenant_id: Mapped[str] = mapped_column(StringUUID, nullable=False)
app_id: Mapped[str] = mapped_column(StringUUID, nullable=False) app_id: Mapped[str] = mapped_column(StringUUID, nullable=False)
type: Mapped[str] = mapped_column(db.String(255), nullable=False) type: Mapped[str] = mapped_column(db.String(255), nullable=False)
version: Mapped[str] version: Mapped[str] = mapped_column(db.String(255), nullable=False)
marked_name: Mapped[str] = mapped_column(default="", server_default="") marked_name: Mapped[str] = mapped_column(default="", server_default="")
marked_comment: Mapped[str] = mapped_column(default="", server_default="") marked_comment: Mapped[str] = mapped_column(default="", server_default="")
graph: Mapped[str] = mapped_column(sa.Text) graph: Mapped[str] = mapped_column(sa.Text)
@ -352,7 +352,7 @@ class WorkflowRunStatus(StrEnum):
SUCCEEDED = "succeeded" SUCCEEDED = "succeeded"
FAILED = "failed" FAILED = "failed"
STOPPED = "stopped" STOPPED = "stopped"
PARTIAL_SUCCESSED = "partial-succeeded" PARTIAL_SUCCEEDED = "partial-succeeded"
class WorkflowRun(Base): class WorkflowRun(Base):
@ -755,7 +755,8 @@ class WorkflowAppLog(Base):
__tablename__ = "workflow_app_logs" __tablename__ = "workflow_app_logs"
__table_args__ = ( __table_args__ = (
db.PrimaryKeyConstraint("id", name="workflow_app_log_pkey"), db.PrimaryKeyConstraint("id", name="workflow_app_log_pkey"),
db.Index("workflow_app_log_app_idx", "tenant_id", "app_id"), db.Index("workflow_app_log_app_idx", "tenant_id", "app_id", "created_at"),
db.Index("workflow_app_log_workflow_run_idx", "workflow_run_id"),
) )
id: Mapped[str] = mapped_column(StringUUID, server_default=db.text("uuid_generate_v4()")) id: Mapped[str] = mapped_column(StringUUID, server_default=db.text("uuid_generate_v4()"))

Some files were not shown because too many files have changed in this diff.