144 Commits

Author SHA1 Message Date
Yongteng Lei
a008b38cf5
Fix: local variable referenced before assignment (#6909)
### What problem does this PR solve?

Fix: local variable referenced before assignment. #6803 
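
For reference, a minimal sketch of this bug class (names are illustrative, not the actual RAGFlow code):

```python
def answer_buggy(resp):
    if resp:
        ans = resp.strip()
    return ans  # UnboundLocalError: 'ans' referenced before assignment when resp is falsy

def answer_fixed(resp):
    ans = ""  # bind the variable before any conditional assignment
    if resp:
        ans = resp.strip()
    return ans
```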

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-09 20:29:12 +08:00
Yongteng Lei
dc2c74b249
Feat: add primitive support for function calls (#6840)
### What problem does this PR solve?

This PR introduces **primitive support for function calls**,
enabling the system to handle basic function-call capabilities.
However, this feature is currently experimental and **not yet enabled
for general use**, as it is only supported by a subset of models,
namely Qwen and OpenAI models.
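
For orientation, an OpenAI-style function-call round trip looks roughly like this (illustrative sketch; the tool schema and model name are assumptions, not the PR's actual wiring):

```python
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; the PR targets Qwen and OpenAI models
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)
# instead of plain text, the model may reply with structured tool calls
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```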

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-08 16:09:03 +08:00
Zhichang Yu
e7a2a4b7ff
Log LLM response on exception (#6750)
### What problem does this PR solve?

Log LLM response on exception
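
A minimal sketch of the pattern (illustrative, not the exact call site):

```python
import logging

def parse_llm_reply(response):
    try:
        return response["message"]["content"]
    except Exception:
        # surface the raw payload so malformed responses can be diagnosed
        logging.exception("LLM response on exception: %s", response)
        raise
```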

### Type of change

- [x] Refactoring
2025-04-02 17:10:57 +08:00
Alex Chen
46b5e32cd7
Feat: support vision llm for gpustack (#6636)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6138

This PR adds vision LLM support for GPUStack and modifies the URL path
from `/v1-openai` to `/v1`.
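
Roughly, the change points the OpenAI-compatible client at GPUStack's `/v1` prefix, and vision models then accept the standard OpenAI image message shape (sketch with assumed host, key, and model names):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://my-gpustack-host/v1",  # was /v1-openai before this PR
    api_key="your-gpustack-api-key",
)
resp = client.chat.completions.create(
    model="qwen2-vl-2b-instruct",  # any vision model deployed in GPUStack
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```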

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 15:33:52 +08:00
Marcus Yuan
c61df5dd25
Dynamic Context Window Size for Ollama Chat (#6582)
# Dynamic Context Window Size for Ollama Chat

## Problem Statement
Previously, the Ollama chat implementation used a fixed context window
size of 32768 tokens. This caused two main issues:
1. Performance degradation due to unnecessarily large context windows
for small conversations
2. Potential business logic failures when using smaller fixed sizes
(e.g., 2048 tokens)

## Solution
Implemented a dynamic context window size calculation that:
1. Uses a base context size of 8192 tokens
2. Applies a 1.2x buffer ratio to the total token count
3. Adds multiples of 8192 tokens based on the buffered token count
4. Implements a smart context size update strategy

## Implementation Details

### Token Counting Logic
```python
def count_tokens(text):
    """Calculate token count for text"""
    # Simple calculation: 1 token per ASCII character
    # 2 tokens for non-ASCII characters (Chinese, Japanese, Korean, etc.)
    total = 0
    for char in text:
        if ord(char) < 128:  # ASCII characters
            total += 1
        else:  # Non-ASCII characters
            total += 2
    return total
```

### Dynamic Context Calculation
```python
def _calculate_dynamic_ctx(self, history):
    """Calculate dynamic context window size"""
    # Calculate total tokens for all messages
    total_tokens = 0
    for message in history:
        content = message.get("content", "")
        content_tokens = count_tokens(content)
        role_tokens = 4  # Role marker token overhead
        total_tokens += content_tokens + role_tokens

    # Apply 1.2x buffer ratio
    total_tokens_with_buffer = int(total_tokens * 1.2)
    
    # Calculate context size in multiples of 8192
    if total_tokens_with_buffer <= 8192:
        ctx_size = 8192
    else:
        ctx_multiplier = (total_tokens_with_buffer // 8192) + 1
        ctx_size = ctx_multiplier * 8192
    
    return ctx_size
```
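
For example, a conversation totaling 9,000 tokens is buffered to 10,800 (9,000 × 1.2); since 10,800 // 8192 + 1 = 2, the context window becomes 2 × 8192 = 16,384 tokens.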

### Integration in Chat Method
```python
def chat(self, system, history, gen_conf):
    if system:
        history.insert(0, {"role": "system", "content": system})
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]
    try:
        # Calculate new context size
        new_ctx_size = self._calculate_dynamic_ctx(history)
        
        # Prepare options with context size
        options = {
            "num_ctx": new_ctx_size
        }
        # Add other generation options
        if "temperature" in gen_conf:
            options["temperature"] = gen_conf["temperature"]
        if "max_tokens" in gen_conf:
            options["num_predict"] = gen_conf["max_tokens"]
        if "top_p" in gen_conf:
            options["top_p"] = gen_conf["top_p"]
        if "presence_penalty" in gen_conf:
            options["presence_penalty"] = gen_conf["presence_penalty"]
        if "frequency_penalty" in gen_conf:
            options["frequency_penalty"] = gen_conf["frequency_penalty"]
            
        # Make API call with dynamic context size
        response = self.client.chat(
            model=self.model_name,
            messages=history,
            options=options,
            keep_alive=60
        )
        return response["message"]["content"].strip(), response.get("eval_count", 0) + response.get("prompt_eval_count", 0)
    except Exception as e:
        return "**ERROR**: " + str(e), 0
```

## Benefits
1. **Improved Performance**: Uses appropriate context windows based on
conversation length
2. **Better Resource Utilization**: Context window size scales with
content
3. **Maintained Compatibility**: Works with existing business logic
4. **Predictable Scaling**: Context growth in 8192-token increments
5. **Smart Updates**: Context size updates are optimized to reduce
unnecessary model reloads

## Future Considerations
1. Fine-tune buffer ratio based on usage patterns
2. Add monitoring for context window utilization
3. Consider language-specific token counting optimizations
4. Implement adaptive threshold based on conversation patterns
5. Add metrics for context size update frequency

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-28 12:38:27 +08:00
Kevin Hu
d2043ff9f2
Fix: LmStudioChat issue. (#6591)
### What problem does this PR solve?

#6577

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-27 14:59:15 +08:00
Yongteng Lei
df3890827d
Refa: change LLM chat output from full to delta (incremental) (#6534)
### What problem does this PR solve?

Change LLM chat output from full to delta (incremental)

### Type of change

- [x] Refactoring
2025-03-26 19:33:14 +08:00
Kevin Hu
12ad746ee6
Fix: Bedrock model invocation error. (#6533)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-26 11:27:12 +08:00
Kevin Hu
095fc84cf2
Fix: claude max tokens. (#6484)
### What problem does this PR solve?

#6458

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-25 10:41:55 +08:00
Kevin Hu
85eb3775d6
Refa: update Anthropic models. (#6445)
### What problem does this PR solve?

#6421

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 12:34:57 +08:00
fansir
efc4796f01
Fix ratelimit errors during document parsing (#6413)
### What problem does this PR solve?

When using an online large-model API to extract knowledge graphs from
the knowledge base, frequent rate-limit errors were triggered, causing
document parsing to fail. This commit fixes the issue by optimizing the
API calls: exponential backoff and jitter are added to reduce the
frequency of rate-limit errors.
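
A minimal sketch of the retry pattern (exception type and tuning values are illustrative, not the PR's exact code):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the provider-specific rate-limit exception."""

def call_with_backoff(fn, max_retries=5, base=1.0, cap=60.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # exponential backoff with full jitter
            delay = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```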


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-22 23:07:03 +08:00
Kevin Hu
a2a4bfe3e3
Fix: change ollama default num_ctx. (#6395)
### What problem does this PR solve?

#6163

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-21 16:22:03 +08:00
Kevin Hu
e9a6675c40
Fix: enable ollama api-key. (#6205)
### What problem does this PR solve?

#6189

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 13:37:34 +08:00
Kevin Hu
7e4d693054
Fix: in case response.choices[0].message.content is None. (#6190)
### What problem does this PR solve?

#6164
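
The guard is essentially (illustrative helper, not the exact call site):

```python
def extract_content(response) -> str:
    # content can legitimately be None (e.g. tool-call-only replies),
    # so fall back to "" before calling str methods on it
    content = response.choices[0].message.content
    return (content or "").strip()
```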

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 10:00:27 +08:00
writinwaters
9c8060f619
0.17.1 release notes (#6021)
### What problem does this PR solve?



### Type of change

- [x] Documentation Update
2025-03-13 14:43:24 +08:00
Kevin Hu
3571270191
Refa: refine the context window size warning. (#5993)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-03-12 19:40:54 +08:00
kuro5989
6e13922bdc
Feat: Add qwq model support to Tongyi-Qianwen factory (#5981)
### What problem does this PR solve?

add qwq model support to Tongyi-Qianwen factory
https://github.com/infiniflow/ragflow/issues/5869

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


![image](https://github.com/user-attachments/assets/49f5c6a0-ecaf-41dd-a23a-2009f854d62c)


![image](https://github.com/user-attachments/assets/93ffa303-920e-4942-8188-bcd6b7209204)


![1741774779438](https://github.com/user-attachments/assets/25f2fd1d-8640-4df0-9a08-78ee9daaa8fe)


![image](https://github.com/user-attachments/assets/4763cf6c-1f76-43c4-80ee-74dfd666a184)

Co-authored-by: zhaozhicheng <zhicheng.zhao@fastonetech.com>
2025-03-12 18:54:15 +08:00
Kevin Hu
251ba7f058
Refa: remove max tokens since no one needs it. (#5690)
### What problem does this PR solve?

#5646 #5640

### Type of change

- [x] Refactoring
2025-03-06 11:29:40 +08:00
Kevin Hu
955801db2e
Resolve super class invocation error. (#5337)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-25 17:42:29 +08:00
Kevin Hu
daddfc9e1b
Remove duplicate gb2312, solve corrupt error. (#5326)
### What problem does this PR solve?

#5252 
#5325

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-25 12:22:37 +08:00
Kevin Hu
df3d0f61bd
Fix base url missing for deepseek from Tongyi. (#5294)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-24 15:43:32 +08:00
Kevin Hu
ec96426c00
Tongyi adapts deepseek. (#5285)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-02-24 14:04:25 +08:00
Omar Leonardo Sanchez Granados
4f2816c01c
Add support to boto3 default connection (#5246)
### What problem does this PR solve?
 
This pull request includes changes to the initialization logic of the
`ChatModel` and `EmbeddingModel` classes to enhance the handling of AWS
credentials.

Use cases:
- Use env variables for credentials instead of managing them on the DB 
- Easy connection when deploying on an AWS machine
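
A sketch of the fallback behavior (function and parameter names are illustrative):

```python
import boto3

def make_bedrock_client(access_key=None, secret_key=None, region=None):
    if access_key and secret_key:
        # explicit credentials managed in the DB, as before
        return boto3.client(
            "bedrock-runtime",
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
            region_name=region,
        )
    # default credential chain: env vars, shared config, or the machine's IAM role
    return boto3.client("bedrock-runtime", region_name=region)
```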

### Type of change

- [X] New Feature (non-breaking change which adds functionality)
2025-02-24 11:01:14 +08:00
yrk111222
7ce675030b
Support downloading models from ModelScope Community. (#5073)
This PR supports downloading models from ModelScope. The main
modifications are as follows:
- New Feature (non-breaking change which adds functionality)
- Documentation Update
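
For reference, a minimal ModelScope download looks roughly like this (model ID is illustrative):

```python
from modelscope.hub.snapshot_download import snapshot_download

# downloads the model into the local ModelScope cache and returns its path
model_dir = snapshot_download("BAAI/bge-m3")
```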

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-02-24 10:12:20 +08:00
Kevin Hu
1a755e75c5
Remove v1 (#5220)
### What problem does this PR solve?

#5201

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-21 15:15:38 +08:00
saikidev
d2929e432e
Feat: add LLM provider PPIO (#5013)
### What problem does this PR solve?

Add a LLM provider: PPIO

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2025-02-17 12:03:26 +08:00
Kevin Hu
b08bb56f6c
Display thinking for deepseek r1 (#4904)
### What problem does this PR solve?
#4903
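
DeepSeek-R1 emits its chain of thought inside `<think>...</think>` tags; a sketch of separating it from the final answer (illustrative helper, not the PR's exact code):

```python
import re

def split_thinking(text: str):
    m = re.search(r"<think>(.*?)</think>", text, flags=re.S)
    if not m:
        return "", text
    thinking = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return thinking, answer
```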
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-02-12 15:43:13 +08:00
Kevin Hu
2aa0cdde8f
Fix Gemini chat issue. (#4757)
### What problem does this PR solve?

#4753

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-07 12:00:19 +08:00
Kyle
036f37a627
fix: err object has no attribute 'iter_lines' (#4686)
### What problem does this PR solve?

ERROR: 'Stream' object has no attribute 'iter_lines', raised for
Claude/Anthropic chat streams
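
For reference, the Anthropic SDK's streaming object exposes an event iterator rather than an HTTP response, so the fix is to consume the stream's own interface (sketch with an assumed model name):

```python
import anthropic

client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:  # incremental text deltas, no iter_lines()
        print(text, end="", flush=True)
```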

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Kyle Olmstead <k.olmstead@offensive-security.com>
2025-02-01 22:39:30 +08:00
Kevin Hu
4776fa5e4e
Refactor for total_tokens. (#4652)
### What problem does this PR solve?

#4567
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 13:54:26 +08:00
Kevin Hu
dd0ebbea35
Light GraphRAG (#4585)
### What problem does this PR solve?

#4543

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-22 19:43:14 +08:00
Alex Chen
7944aacafa
Feat: add gpustack model provider (#4469)
### What problem does this PR solve?

Add GPUStack as a new model provider.
[GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU
cluster manager for running LLMs. Currently, locally deployed models in
GPUStack cannot integrate well with RAGFlow. GPUStack provides both
OpenAI compatible APIs (Models / Chat Completions / Embeddings /
Speech2Text / TTS) and other APIs like Rerank. We would like to use
GPUStack as a model provider in ragflow.

[GPUStack Docs](https://docs.gpustack.ai/latest/quickstart/)

Related issue: https://github.com/infiniflow/ragflow/issues/4064.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



### Testing Instructions
1. Install GPUStack and deploy the `llama-3.2-1b-instruct` LLM, `bge-m3`
text embedding model, `bge-reranker-v2-m3` rerank model,
`faster-whisper-medium` speech-to-text model, and `cosyvoice-300m-sft`
text-to-speech model in GPUStack.
2. Add the provider in RAGFlow settings.
3. Test in RAGFlow.
2025-01-15 14:15:58 +08:00
Yingfeng
50f209204e
Synchronize with enterprise version (#4325)
### Type of change

- [x] Refactoring
2025-01-02 13:44:44 +08:00
Zhichang Yu
0d68a6cd1b
Fix errors detected by Ruff (#3918)
### What problem does this PR solve?

Fix errors detected by Ruff

### Type of change

- [x] Refactoring
2024-12-08 14:21:12 +08:00
Kevin Hu
593ffc4067
Fix HuggingFace model error. (#3870)
### What problem does this PR solve?

#3865

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-05 13:28:42 +08:00
Jin Hai
6657ca7cde
Change default error message to English (#3838)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-12-04 09:34:49 +08:00
Zhichang Yu
d94386e00a
Pass top_p to ollama (#3744)
### What problem does this PR solve?

Pass top_p to ollama. Close #1769

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-29 14:52:27 +08:00
Kevin Hu
0891a393d7
Let ThreadPool exit gracefully. (#3653)
### What problem does this PR solve?

#3646
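
The gist, as a minimal sketch (illustrative, not the PR's exact code): shut the pool down cooperatively so worker threads finish in-flight tasks instead of being abandoned.

```python
from concurrent.futures import ThreadPoolExecutor

def work(i):
    return i * i

# the context manager calls shutdown(wait=True), letting in-flight
# tasks complete before the pool exits
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))
```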

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-26 16:31:07 +08:00
Kevin Hu
81c7b6afc5
Make spark model more robust to model name (#3514)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-20 20:53:44 +08:00
shijiefengjun
632b23486f
Fix the value issue of anthropic (#3351)
### What problem does this PR solve?

This pull request fixes the issue mentioned in
https://github.com/infiniflow/ragflow/issues/3263.

1. The response should be parsed as a dict; otherwise the following code
fails to extract values:
`ans = response["content"][0]["text"]`
2. The API model `claude-instant-1.2` has been retired (see
[model-deprecations](https://docs.anthropic.com/en/docs/resources/model-deprecations)),
which triggers errors in the code, so I deleted it from the
conf/llm_factories.json file and added the latest API model
`claude-3-5-sonnet-20241022`.



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: chenhaodong <chenhaodong@ctrlvideo.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-11-13 16:13:52 +08:00
Kevin Hu
34d1daac67
fix: Anthropic param error (#3327)
### What problem does this PR solve?

#3263

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-11 11:54:14 +08:00
Kevin Hu
7e0148c058
fix local variable ans (#3077)
### What problem does this PR solve?
#3064

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-10-29 10:42:45 +08:00
Kevin Hu
f86826b7a0
refactor error message of qwen (#3074)
### What problem does this PR solve?
#3055

### Type of change
- [x] Refactoring
2024-10-29 10:08:08 +08:00
Kevin Hu
9457d20ef1
make gemini robust (#3012)
### What problem does this PR solve?

#3003

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-10-25 10:50:44 +08:00
Yinquan WANG
445dce4363
[Bug]: unnecessary auto-increment calculations in the tokens statistics of the chat model (#2969)
### What problem does this PR solve?

the details are shown in
https://github.com/infiniflow/ragflow/issues/2968

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-22 16:26:04 +08:00
Yinquan WANG
5aa9d7787e
[Bug]: When use OpenAI chat model , raise ERROR: 'CompletionUsage' object has no attribute 'get' #2948 (#2949)
### What problem does this PR solve?

The details of this PR are shown at
https://github.com/infiniflow/ragflow/issues/2948
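
In short, `openai>=1.x` returns a `CompletionUsage` pydantic object rather than a dict, so attribute access is required (illustrative guard):

```python
def total_tokens_of(response) -> int:
    usage = response.usage  # CompletionUsage object, not a dict: no .get()
    return getattr(usage, "total_tokens", 0) if usage else 0
```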

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-22 11:40:05 +08:00
Kevin Hu
b2524eec49
fix sequence2txt error and usage total token issue (#2961)
### What problem does this PR solve?

#1363

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-10-22 11:38:37 +08:00
chongchuanbing
ac26d09a59
Feature/feat1017 (#2872)
### What problem does this PR solve?

1. fix: mind map display error in knowledge graph, caused by an
`@antv/g6` version change
2. feat: concurrent threads configuration support in graph extractor
3. fix: used-tokens update failed for tenant
4. feat: timeout configuration support for LLM
5. fix: regex error in graph extractor
6. feat: qwen rerank (`gte-rerank`) support
7. fix: timeout handling in the knowledge graph indexing process; chat
now streams output, and this is configurable
8. feat: `qwen-long` model configuration

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chongchuanbing <chongchuanbing@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-21 12:11:08 +08:00
JobSmithManipulation
3f065c75da
support chat model in huggingface (#2802)
### What problem does this PR solve?

#2794

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-10-11 14:45:48 +08:00
JobSmithManipulation
18f80743eb
support api-version and change default-model in adding azure-openai and openai (#2799)
### What problem does this PR solve?
#2701 #2712 #2749

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-11 11:26:42 +08:00