2738 Commits

Author SHA1 Message Date
balibabu
f49588756e
Feat: Load the dialog page, prohibit calling the dialog/get interface #6798 (#6799)
### What problem does this PR solve?

Feat: Load the dialog page, prohibit calling the dialog/get interface
#6798

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-03 18:04:40 +08:00
liu an
57e760883e
Fix: update chunk, empty question issue. (#6800)
### What problem does this PR solve?

fix issue #6539, refer to pr #6405

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-03 18:04:19 +08:00
liu an
b213e88cca
Test: Added test cases for List Chat Assistants HTTP API (#6792)
### What problem does this PR solve?

Cover the [list chat assistants](https://ragflow.io/docs/v0.17.2/http_api_reference#list-chat-assistants) endpoint.

### Type of change

- [x] add test cases
2025-04-03 17:22:23 +08:00
zunceng
e8f46c9207
Fix: missing redis pvc storageclass in helm (#6788)
Fix the Redis PVC in the Helm deployment.

### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-03 16:55:47 +08:00
so95
cded812b97
Feat: add OpenAI compatible API for agent (#6329)
### What problem does this PR solve?
Add an OpenAI-compatible API for the agent.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-03 16:51:37 +08:00
balibabu
2acb02366e
Feat: Clarify the use of OpenAI-API-compatible #6782 (#6783)
### What problem does this PR solve?

Feat: Clarify the use of OpenAI-API-compatible #6782

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-03 11:38:21 +08:00
Kevin Hu
9ecc78feeb
Refa: copywriting refinement. (#6779)
### What problem does this PR solve?

Close #6762

### Type of change

- [x] Refactoring
2025-04-03 11:38:02 +08:00
Zhichang Yu
fdc410e743
Fix set_graph on non-existing edge (#6777)
### What problem does this PR solve?

Fix set_graph on non-existing edge

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-03 11:09:04 +08:00
Kevin Hu
5b5558300a
Feat: add gemini-2.5-pro-exp-03-25 (#6774)
### What problem does this PR solve?

#6733

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-03 10:48:58 +08:00
liu an
b5918e7158
Docs: Fix for issue #6713 (#6775)
### What problem does this PR solve?

Update for issue #6713.

### Type of change

- [x] Documentation Update
2025-04-03 10:19:58 +08:00
liu an
58f8026632
Test: Update test cases for PR #6643 (#6766)
### What problem does this PR solve?

Update test cases for PR #6643 issue #6607

### Type of change

- [x] update test cases
2025-04-03 10:10:40 +08:00
liwenju0
a73fbc61ff
Fix: Handle the case of deleting empty blocks. Update the relevant message logic to return the correct deletion message. (#6643)
Add handling for empty arrays to ensure no errors occur during the
deletion operation. Update the test cases to verify the new logic.

### What problem does this PR solve?

Fix this bug: https://github.com/infiniflow/ragflow/issues/6607

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-02 19:20:17 +08:00
liu an
0d1c5fdd2f
Test: Added test cases for Create Chat Assistant HTTP API (#6763)
### What problem does this PR solve?

Cover the [create chat assistant](https://ragflow.io/docs/v0.17.2/http_api_reference#create-chat-assistant) endpoint.

### Type of change

- [x] add test cases
2025-04-02 18:49:59 +08:00
liu an
6c77ef5a5e
Docs(api): align default values in create chat assistant HTTP API docs with implementation (#6764)
### What problem does this PR solve?

Align default values in the create chat assistant HTTP API docs with the
implementation:
- llm.presence_penalty: 0.2 -> 0.4
- prompt.top_n: 8 -> 6


### Type of change

- [x] Documentation Update
2025-04-02 18:48:31 +08:00
Zhichang Yu
e7a2a4b7ff
Log llm response on exception (#6750)
### What problem does this PR solve?

Log llm response on exception

### Type of change

- [x] Refactoring
2025-04-02 17:10:57 +08:00
balibabu
724a36fcdb
Fix: Issue with Markdown Code Blocks Breaking Frontend Layout #5789 (#6758)
### What problem does this PR solve?

Fix: Issue with Markdown Code Blocks Breaking Frontend Layout #5789

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-02 16:28:55 +08:00
liwenju0
9ce6521582
Fix: Change the field name of the document ID from "documents" to "document_ids" to maintain consistency. (#6753)

### What problem does this PR solve?

Close #6752

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-02 15:52:52 +08:00
balibabu
160bf4ccb3
Fix: The file upload prompt indicates "No authorization." #6516 (#6756)
### What problem does this PR solve?

Fix: The file upload prompt indicates "No authorization." #6516

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-02 15:52:35 +08:00
balibabu
aa25d09b0c
Fix: Using the Enter key does not send a complete message #6754 (#6755)
### What problem does this PR solve?

Fix: Using the Enter key does not send a complete message #6754

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-02 15:02:16 +08:00
writinwaters
2471a6e115
Updated max_tokens descriptions (#6751)
### What problem does this PR solve?

#6721 

### Type of change


- [x] Documentation Update
2025-04-02 13:56:55 +08:00
balibabu
fc02929946
Feat: Support deleting knowledge graph #6747 (#6748)
### What problem does this PR solve?

Feat: Support deleting knowledge graph #6747

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-02 11:20:37 +08:00
liu an
3ae1e9e3c4
Test: Skip test case for PR 6443 (#6724)
### What problem does this PR solve?

Skip test case for PR #6443

### Type of change

- [x] update test cases
2025-04-02 10:41:01 +08:00
balibabu
117f18240d
Feat: Add a notification logic to the team member invite feature #6610 (#6729)
### What problem does this PR solve?
Feat: Add a notification logic to the team member invite feature #6610

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-02 09:15:13 +08:00
writinwaters
31296ad70f
Miscellaneous doc updates and refactored team management doc. (#6730)
### What problem does this PR solve?

#5576, #6672

### Type of change


- [x] Documentation and UI Update
2025-04-01 19:05:30 +08:00
balibabu
132eae9d5b
Feat: Interrupt streaming #6515 (#6723)
### What problem does this PR solve?

Feat: Interrupt streaming #6515
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-01 17:26:54 +08:00
kaiyuan Zhang
ead5f7aba9
Fix infinite recursion in RagTokenizer when processing repetitive characters (#6109)
### What problem does this PR solve?
Fix #6085.
RagTokenizer's `dfs_()` function falls into infinite recursion when
processing text with repetitive Chinese characters (e.g.,
"一一一一一十一十一十一..." or "一一一一一一十十十十十十十二十二十二..."), causing memory leaks.
### Type of change
Implemented three optimizations to the `dfs_()` function (a minimal sketch follows below):
1. Added memoization with a `_memo` dictionary to cache computed results
2. Added recursion depth limiting with a `_depth` parameter (max 10 levels)
3. Implemented special handling for repetitive character sequences
- [x] Bug Fix (non-breaking change which fixes an issue)
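
For illustration, here is a minimal, self-contained sketch of the three optimizations listed above (memoization, a depth cap, and collapsing repetitive runs). The `dfs_sketch`, `_memo`, and `_depth` names and the toy vocabulary lookup are simplified stand-ins, not RagTokenizer's actual `dfs_()` internals.

```python
# Illustrative sketch only: a dfs-style tokenizer search with memoization and
# a depth cap, mirroring the three optimizations listed above.
def dfs_sketch(chars, start, vocab, _memo=None, _depth=0, _max_depth=10):
    if _memo is None:
        _memo = {}
    if start >= len(chars):
        return []
    if start in _memo:                      # 1. memoization: reuse results for this offset
        return _memo[start]
    if _depth > _max_depth:                 # 2. depth limit: fall back to single characters
        return [chars[start]]
    # 3. repetitive-sequence handling: collapse long runs of the same character
    run_end = start
    while run_end < len(chars) and chars[run_end] == chars[start]:
        run_end += 1
    if run_end - start >= 4:
        result = [chars[start] * (run_end - start)] + dfs_sketch(chars, run_end, vocab, _memo, _depth + 1)
        _memo[start] = result
        return result
    best = [chars[start]] + dfs_sketch(chars, start + 1, vocab, _memo, _depth + 1)
    for end in range(start + 2, min(len(chars), start + 8) + 1):
        piece = "".join(chars[start:end])
        if piece in vocab:                  # prefer longer pieces found in the vocabulary
            candidate = [piece] + dfs_sketch(chars, end, vocab, _memo, _depth + 1)
            if len(candidate) < len(best):
                best = candidate
    _memo[start] = best
    return best


# Terminates instead of recursing forever on repetitive input.
print(dfs_sketch(list("一一一一一十一十一十一"), 0, {"十一"}))
```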

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-01 13:59:52 +08:00
liu an
58e6e7b668
Test: Refactor test fixtures and test cases (#6709)
### What problem does this PR solve?

 Refactor test fixtures and test cases

### Type of change

- [x] Refactoring test cases
2025-04-01 13:39:07 +08:00
Yue-Lyu123
20b8ccd1e9
Hotfix ece5903 (#6705)
I'm really sorry. I found that in `graphrag/general/extractor.py`, under
`def __call__`, the line `change.removed_nodes.extend(nodes[1:])` causes an
`AttributeError: 'set' object has no attribute 'extend'`. Could you please
merge the branch e666528 again? I made some modifications.
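
For context, a tiny self-contained illustration of the type mismatch described above: sets have no `.extend()`, and `.update()` is the closest set equivalent. This only illustrates the error, not necessarily the exact change merged in e666528.

```python
# Sets have no .extend(); .update() is the set equivalent of list.extend().
nodes = ["A", "B", "C"]

removed_nodes_list = []
removed_nodes_list.extend(nodes[1:])      # works: lists support extend()

removed_nodes_set = set()
# removed_nodes_set.extend(nodes[1:])     # AttributeError: 'set' object has no attribute 'extend'
removed_nodes_set.update(nodes[1:])       # works: adds every element of the iterable

print(removed_nodes_list, removed_nodes_set)
```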
2025-04-01 12:06:28 +08:00
balibabu
d0dca16fee
Feat: Allows users to search for models in the model selection drop-down box #3221 (#6708)
### What problem does this PR solve?

Feat: Allows users to search for models in the model selection drop-down
box #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-01 11:53:48 +08:00
Kevin Hu
fc21dd0a4a
Feat: add qwq-plus-latest (#6702)
### What problem does this PR solve?

#6697

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-01 11:06:03 +08:00
Kevin Hu
61c0dfab70
Fix: Email error. (#6701)
### What problem does this PR solve?

#6695

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-01 10:37:04 +08:00
Yue-Lyu123
67330833af
fix: correct [AttributeError: 'set' object has no attribute 'nodes' T… (#6699)
### Related Issue: 
https://github.com/infiniflow/ragflow/issues/6653 

### Environment:
Using nightly version [ece5903]

Elasticsearch database

Thanks for the review! My fault: I realize my initial testing didn't
pass.

In `graphrag/entity_resolution.py`,
`sub_connect_graph` is a set like `{'HELLO', 'Hi', 'How are you'}`.
Neither accessing `.nodes` nor calling `.nodes()` will work; **it still causes
`AttributeError: 'set' object has no attribute 'nodes'`**.

In `graphrag/general/extractor.py`,
the `list.extend()` method performs an in-place operation, directly
modifying the original list and returning `None` rather than the
modified list.
Neither
`sorted(set(node0_attrs[attr].extend(node1_attrs.get(attr, []))))` nor
`sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))` will work;
**it still causes `TypeError: 'NoneType' object is not iterable`**.
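
Because `list.extend()` returns `None`, the merge has to build a new list before `set()`/`sorted()` can consume it. A small sketch of one non-mutating way to combine the attribute lists (illustrative only; the exact code in this PR may differ):

```python
# list.extend() mutates in place and returns None, so it cannot feed
# sorted(set(...)) directly. Building a new list avoids the TypeError.
node0_attrs = {"source_id": ["doc-2", "doc-1"]}   # made-up attribute data
node1_attrs = {"source_id": ["doc-3", "doc-1"]}

for attr in ("source_id",):
    # Concatenation returns a new list, so set()/sorted() receive real data.
    node0_attrs[attr] = sorted(set(node0_attrs.get(attr, []) + node1_attrs.get(attr, [])))

print(node0_attrs)  # {'source_id': ['doc-1', 'doc-2', 'doc-3']}
```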
### Type of change

- [x] Bug Fix AttributeError: graphrag/entity_resolution.py
- [x] Bug Fix TypeError: graphrag/general/extractor.py
2025-04-01 09:38:21 +08:00
Yue-Lyu123
ece59034f7
fix: Resolve KnowledgeGraph entity resolution errors (#6653) (#6691)
### Related Issue: #6653
### Environment:

Using nightly version

Elasticsearch database

### Bug Description:
When clicking the "Entity Resolution" button in KnowledgeGraph, I
encountered the following errors:

graphrag/entity_resolution.py

```
list(sub_connect_graph.nodes) AttributeError
```

graphrag/general/extractor.py
```
node0_attrs[attr] = sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))
TypeError: 'NoneType' object is not iterable
```
```
for attr in ["keywords", "source_id"]:
    # KeyError: I think the "keywords" attribute is on edges, not nodes
```
graphrag/utils.py
```
settings.docStoreConn.delete()  # Sync function called as async
```
### Changes Made:

Fixed the AttributeError in `entity_resolution.py` by properly handling graph
nodes (a sketch of one possible handling follows below).

Fixed the TypeError and KeyError in `extractor.py` by separating the operations.

Corrected the async/sync mismatch in the document deletion call.
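
A minimal sketch of how the `sub_connect_graph` case could be handled when it may arrive as a plain set of node names instead of a graph object; the `connected_node_names` helper is hypothetical and shown only to illustrate the idea, not the exact fix in this PR.

```python
# Hypothetical helper: tolerate both a networkx (sub)graph and a plain set of
# node names, instead of unconditionally accessing .nodes on a set.
import networkx as nx


def connected_node_names(sub_connect_graph):
    if isinstance(sub_connect_graph, (set, frozenset, list, tuple)):
        return list(sub_connect_graph)       # already a collection of names
    return list(sub_connect_graph.nodes)     # graph-like object exposing .nodes


print(connected_node_names({'HELLO', 'Hi', 'How are you'}))
print(connected_node_names(nx.path_graph(["A", "B", "C"])))
```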
2025-03-31 22:31:35 +08:00
Kevin Hu
0a42e5777e
Refa: docker/.env comment refinement. (#6689)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-03-31 18:26:20 +08:00
RedBookOfMemory
e2b66628f4
Feat: extend S3 storage compatibility and add knowledge base ID prefix (#6355)
### What problem does this PR solve?

- Added support for S3-compatible protocols.
- Enabled the use of knowledge base ID as a file prefix when storing
files in S3.
- Updated docker/README.md to include detailed S3 and OSS configuration
instructions.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 16:09:43 +08:00
Alex Chen
46b5e32cd7
Feat: support vision llm for gpustack (#6636)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6138

This PR adds vision LLM support for GPUStack and modifies the URL path
from `/v1-openai` to `/v1`.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 15:33:52 +08:00
Kevin Hu
7d9dd1e5d3
Refa: remove default built-in rerank model. (#6682)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-03-31 15:33:19 +08:00
so95
1985ff7918
add type canvas (#6680)
### Type of change
- [x] Refactoring
2025-03-31 14:46:29 +08:00
Kevin Hu
60b9c027c8
Refa: add meta data to retrieval. (#6676)
### What problem does this PR solve?

#6619
### Type of change


- [x] Performance Improvement
2025-03-31 11:45:56 +08:00
writinwaters
2793c8e4fe
Added a guide on setting page rank. (#6645)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update

---------

Co-authored-by: balibabu <cike8899@users.noreply.github.com>
2025-03-31 11:44:18 +08:00
Yingfeng
805a8f1f47
Update broken discord (#6678)
### Type of change

- [x] Documentation Update
2025-03-31 11:29:34 +08:00
Song Fuchang
d4a3e9a7cc
Fix table migration on not-yet-existing indexed columns. (#6666)
### What problem does this PR solve?

Fix #6334

Hello, I encountered the same problem as in #6334. In
`api/db/db_models.py`, `init_database_tables` calls `obj.create_table()`
unconditionally, before `migrate_db()`. Specifically, the `permission`
field of the `user_canvas` table has `index=True`, which causes `peewee`
to issue SQL that tries to create the index when the field does not
exist (the `user_canvas` table already exists), so
`psycopg2.errors.UndefinedColumn: column "permission" does not exist`
occurred.

I've added a check in the code to only call `create_table()` when
the table does not exist, delegating the migration process to
`migrate_db()`.

Then another problem occurs: `migrate_db()` actually does nothing
because it fails on the first migration! The `playhouse` migrator blindly
issues DDLs without things like `IF NOT EXISTS`, so it fails... and even
though the exception is swallowed with `pass`, the transaction is still
rolled back. So I removed the transaction in `migrate_db()` to make it work.
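
A rough sketch of the two adjustments described above, assuming a peewee/playhouse setup loosely resembling `api/db/db_models.py`; the model, column, and database names here are illustrative placeholders, not the actual RAGFlow code.

```python
# Sketch: create tables only when they are missing, and run each migration
# independently (no enclosing transaction), so one failed DDL does not roll
# back or block the remaining migrations. Names below are placeholders.
from peewee import CharField, Model, PostgresqlDatabase
from playhouse.migrate import PostgresqlMigrator, migrate

db = PostgresqlDatabase("ragflow")          # illustrative connection


class UserCanvas(Model):                    # stand-in for the real user_canvas model
    permission = CharField(index=True, default="me")

    class Meta:
        database = db
        table_name = "user_canvas"


def init_database_tables():
    for model in (UserCanvas,):
        if not model.table_exists():        # only create when the table is absent...
            model.create_table()
    migrate_db()                            # ...and leave existing tables to migrate_db()


def migrate_db():
    migrator = PostgresqlMigrator(db)
    operations = [
        migrator.add_column("user_canvas", "permission",
                            CharField(index=True, default="me")),
    ]
    # No wrapping transaction: each DDL is attempted on its own, and a failure
    # (e.g. the column already exists) is skipped instead of rolling back
    # every other migration.
    for op in operations:
        try:
            migrate(op)
        except Exception:
            pass
```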

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-31 11:27:20 +08:00
Song Fuchang
ad4e59edb2
Don't split and strip input in retrieval component. (#6662)
### What problem does this PR solve?

Actually fix #6241 

Hello, I ran into the same problem as #6241. When I test my agent
flow in the web UI using the `Run` button with a file input, the retrieval
component always gives an empty output.

In the code I found that:

`web/src/pages/flow/debug-content/index.tsx`:

```tsx
const onOk = useCallback(async () => {
    const values = await form.validateFields();
    const nextValues = Object.entries(values).map(([key, value]) => {
      const item = parameters[Number(key)];
      let nextValue = value;
      if (Array.isArray(value)) {
        nextValue = ``;

        value.forEach((x) => {
          nextValue +=
            x?.originFileObj instanceof File
              ? `${x.name}\n${x.response?.data}\n----\n`    // Here, the file content always ends in '\n'
              : `${x.url}\n${x.result}\n----\n`;
        });
      }
      return { ...item, value: nextValue };
    });

    ok(nextValues);
  }, [form, ok, parameters]);
```

while in the `agent/component/retrieval.py`:

```python
def _run(self, history, **kwargs):
        query = self.get_input()
        query = str(query["content"][0]) if "content" in query else ""
        lines = query.split('\n')                     # inputs are split to ['xxx','yyy','----','']
        query = lines[-1] if lines else ""      # Here we always get '', thus no result
        kbs = KnowledgebaseService.get_by_ids(self._param.kb_ids)
        if not kbs:
            return Retrieval.be_output("")
```

so the code never gets a correct result.

I'm not sure why the input needs such a split here, so I just removed
the splitting, and it works well on my side.
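
A self-contained illustration of why the split always yields an empty query for file inputs: the web UI code above appends `\n----\n`, so the joined string ends with a newline and `split('\n')[-1]` is always `''` (the example string below is made up for demonstration).

```python
# Why the split discards the input: debug-content/index.tsx always appends
# '\n----\n', so the combined string ends with '\n' and its last line is ''.
query = "问题.txt\nWhat does the uploaded file say?\n----\n"  # made-up example input

lines = query.split('\n')   # ['问题.txt', 'What does the uploaded file say?', '----', '']
print(repr(lines[-1]))      # '' -> the retrieval component sees an empty query

# Without the split, the component receives the full content:
print(repr(query))
```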

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-31 11:26:49 +08:00
liu an
aca4cf4369
Test: Added test cases for Retrieval Chunks HTTP API (#6649)
### What problem does this PR solve?

Cover the [retrieval chunk](https://ragflow.io/docs/v0.17.2/http_api_reference#retrieve-chunks) endpoint.

### Type of change

- [x]  add test cases
2025-03-31 10:05:35 +08:00
Song Fuchang
9aa047257a
Fix agent completion requiring calling twice with parameters in begin component (#6659)
### What problem does this PR solve?

Fix #5418

Actually, the fix in #4329 also works for agent flows with parameters, so
this PR just relaxes the `else` branch of that check. With this PR, it works
fine on my side, but it may need more testing to make sure it does not break
something.

I guess the real problem may be hidden deep in the code that relates
to conversation and canvas execution. After a few hours of debugging, I
see that the only difference between having and not having parameters in the
`begin` component is the `history` field of the canvas data. When the `begin`
component contains some parameters, the debug log shows:

```
2025-03-29 19:50:38,521 DEBUG    356590 {
            "component_name": "Begin",
            "params": {"output_var_name": "output", "message_history_window_size": 22, "query": [{"type": "fileUrls", "key": "fileUrls", "name": "files", "optional": true, "value": "问题.txt\n今天天气怎么样"}], "inputs": [], "debug_inputs": [], "prologue": "你好! 我是你的助理,有什么可以帮到你的吗?", "output": null},
            "output": null,
            "inputs": []
        }, history: [["user", "请回答我上传文件中的问题。"]], kwargs: {"stream": false}
2025-03-29 19:50:38,523 DEBUG    356590 {
            "component_name": "Answer",
            "params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "post_answers": [], "output": null},
            "output": null,
            "inputs": []
        }, history: [["user", "请回答我上传文件中的问题。"]], kwargs: {"stream": false}
```

Then it does not go further along the flow.

When the `begin` component does not contain any parameter, the debug log
shows:

```
2025-03-29 19:41:13,518 DEBUG    353596 {
            "component_name": "Begin",
            "params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "prologue": "你好! 我是你的助理,有什么可以帮到你的吗?", "output": null},
            "output": null,
            "inputs": []
        }, history: [], kwargs: {"stream": false}
2025-03-29 19:41:13,520 DEBUG    353596 {
            "component_name": "Answer",
            "params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "post_answers": [], "output": null},
            "output": null,
            "inputs": []
        }, history: [], kwargs: {"stream": false}
2025-03-29 19:41:13,556 INFO     353596 127.0.0.1 - - [29/Mar/2025 19:41:13] "POST /api/v1/agents/fee6886a0c6f11f09b48eb8798e9aa9b/sessions?user_id=123 HTTP/1.1" 200 -
2025-03-29 19:41:21,115 DEBUG    353596 Canvas.prepare2run: Retrieval:LateGuestsNotice
2025-03-29 19:41:21,116 DEBUG    353596 {
            "component_name": "Retrieval",
            "params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "similarity_threshold": 0.2, "keywords_similarity_weight": 0.3, "top_n": 8, "top_k": 1024, "kb_ids": ["9aca3c700c5911f0811caf35658b9385"], "rerank_id": "", "empty_response": "", "tavily_api_key": "", "use_kg": false, "output": null},
            "output": null,
            "inputs": []
        }, history: [["user", "请回答我上传文件中的问题。"]], kwargs: {"stream": false}
```

It correctly goes along the flow and generates correct answer.

You can see the difference: when the `begin` component has any
parameter, the `history` field is filled from the beginning, while it is
just `[]` if the `begin` component has no parameter.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-31 09:57:56 +08:00
Zhichang Yu
65a8cd1772
Fix knowledge_graph_kwd on infinity. Close #6476 and #6624 (#6651)
### What problem does this PR solve?

Fix knowledge_graph_kwd on infinity. Close #6476 and #6624

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-28 22:05:40 +08:00
Kevin Hu
563a84beaf
Docs: fix retrieval docs. (#6633)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2025-03-28 16:03:37 +08:00
Zhichang Yu
d32a35d8fd
Fix entity_types. Close #6287 and #6608 (#6632)
### What problem does this PR solve?

Fix entity_types. Close #6287 and #6608

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-28 15:00:24 +08:00
Wanderson Pinto dos Santos
2632493c8b
Consolidate entrypoint to support broader deployment scenarios (#6566)
### What problem does this PR solve?

This PR gives better control over which services are loaded and how they
are distributed. With this approach, we can create containers that run only
the web server and others that run only the task executor. It also
introduces a unique ID per task executor host, which will be important when
scaling task executors horizontally, since unique task executor IDs will
be required.

This new `entrypoint.sh` maintains the default behavior of starting the
web server and task executor on the same host.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [X] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-28 12:39:34 +08:00
Marcus Yuan
c61df5dd25
Dynamic Context Window Size for Ollama Chat (#6582)
# Dynamic Context Window Size for Ollama Chat

## Problem Statement
Previously, the Ollama chat implementation used a fixed context window
size of 32768 tokens. This caused two main issues:
1. Performance degradation due to unnecessarily large context windows
for small conversations
2. Potential business logic failures when using smaller fixed sizes
(e.g., 2048 tokens)

## Solution
Implemented a dynamic context window size calculation that:
1. Uses a base context size of 8192 tokens
2. Applies a 1.2x buffer ratio to the total token count
3. Adds multiples of 8192 tokens based on the buffered token count
4. Implements a smart context size update strategy

## Implementation Details

### Token Counting Logic
```python
def count_tokens(text):
    """Calculate token count for text"""
    # Simple calculation: 1 token per ASCII character
    # 2 tokens for non-ASCII characters (Chinese, Japanese, Korean, etc.)
    total = 0
    for char in text:
        if ord(char) < 128:  # ASCII characters
            total += 1
        else:  # Non-ASCII characters
            total += 2
    return total
```

### Dynamic Context Calculation
```python
def _calculate_dynamic_ctx(self, history):
    """Calculate dynamic context window size"""
    # Calculate total tokens for all messages
    total_tokens = 0
    for message in history:
        content = message.get("content", "")
        content_tokens = count_tokens(content)
        role_tokens = 4  # Role marker token overhead
        total_tokens += content_tokens + role_tokens

    # Apply 1.2x buffer ratio
    total_tokens_with_buffer = int(total_tokens * 1.2)
    
    # Calculate context size in multiples of 8192
    if total_tokens_with_buffer <= 8192:
        ctx_size = 8192
    else:
        ctx_multiplier = (total_tokens_with_buffer // 8192) + 1
        ctx_size = ctx_multiplier * 8192
    
    return ctx_size
```
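
As a quick sanity check of the sizing rule above (illustrative numbers only): a conversation whose buffered token count lands between 8192 and 16384 gets a 16384-token window.

```python
# Worked example of the rounding rule: 9,000 raw tokens -> 10,800 with the
# 1.2x buffer -> the next multiple of 8192 above that is 16384.
total_tokens = 9000
total_tokens_with_buffer = int(total_tokens * 1.2)        # 10800
ctx_multiplier = (total_tokens_with_buffer // 8192) + 1   # 1 + 1 = 2
print(ctx_multiplier * 8192)                              # 16384
```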

### Integration in Chat Method
```python
def chat(self, system, history, gen_conf):
    if system:
        history.insert(0, {"role": "system", "content": system})
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]
    try:
        # Calculate new context size
        new_ctx_size = self._calculate_dynamic_ctx(history)
        
        # Prepare options with context size
        options = {
            "num_ctx": new_ctx_size
        }
        # Add other generation options
        if "temperature" in gen_conf:
            options["temperature"] = gen_conf["temperature"]
        if "max_tokens" in gen_conf:
            options["num_predict"] = gen_conf["max_tokens"]
        if "top_p" in gen_conf:
            options["top_p"] = gen_conf["top_p"]
        if "presence_penalty" in gen_conf:
            options["presence_penalty"] = gen_conf["presence_penalty"]
        if "frequency_penalty" in gen_conf:
            options["frequency_penalty"] = gen_conf["frequency_penalty"]
            
        # Make API call with dynamic context size
        response = self.client.chat(
            model=self.model_name,
            messages=history,
            options=options,
            keep_alive=60
        )
        return response["message"]["content"].strip(), response.get("eval_count", 0) + response.get("prompt_eval_count", 0)
    except Exception as e:
        return "**ERROR**: " + str(e), 0
```

## Benefits
1. **Improved Performance**: Uses appropriate context windows based on
conversation length
2. **Better Resource Utilization**: Context window size scales with
content
3. **Maintained Compatibility**: Works with existing business logic
4. **Predictable Scaling**: Context growth in 8192-token increments
5. **Smart Updates**: Context size updates are optimized to reduce
unnecessary model reloads

## Future Considerations
1. Fine-tune buffer ratio based on usage patterns
2. Add monitoring for context window utilization
3. Consider language-specific token counting optimizations
4. Implement adaptive threshold based on conversation patterns
5. Add metrics for context size update frequency

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-28 12:38:27 +08:00