Improved ollama doc (#3787)

### What problem does this PR solve?

Improved ollama doc. Close #3723

### Type of change

- [x] Documentation Update
2025-08-13 10:09:01 +08:00 · 2024-12-02 17:28:30 +08:00 · 2024-12-02 17:28:30 +08:00 · 9d093547e8
commit 9d093547e8
parent c5f13629af
1 changed file with 41 additions and 38 deletions
--- a/docs/guides/deploy_local_llm.mdx
+++ b/docs/guides/deploy_local_llm.mdx
@@ -17,7 +17,7 @@ RAGFlow seamlessly integrates with Ollama and Xinference, without the need for f
 This user guide does not intend to cover much of the installation or configuration details of Ollama or Xinference; its focus is on configurations inside RAGFlow. For the most current information, you may need to check out the official site of Ollama or Xinference.
 :::

-## Deploy a local model using Ollama
+## Deploy local models using Ollama

 [Ollama](https://github.com/ollama/ollama) enables you to run open-source large language models that you deployed locally. It bundles model weights, configurations, and data into a single package, defined by a Modelfile, and optimizes setup and configurations, including GPU usage.

@@ -27,35 +27,54 @@ This user guide does not intend to cover much of the installation or configurati
 - For a complete list of supported models and variants, see the [Ollama model library](https://ollama.com/library).
 :::

-To deploy a local model, e.g., **Llama3**, using Ollama: 
+### 1. Deploy Ollama using Docker

-### 1. Check firewall settings
-
-Ensure that your host machine's firewall allows inbound connections on port 11434. For example:
-   
 ```bash
-sudo ufw allow 11434/tcp
+sudo docker run --name ollama -p 11434:11434 ollama/ollama
+time=2024-12-02T02:20:21.360Z level=INFO source=routes.go:1248 msg="Listening on [::]:11434 (version 0.4.6)"
+time=2024-12-02T02:20:21.360Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12]"
 ```
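+
+The `docker run` command above stays in the foreground and prints the server log, so the `docker exec` commands that follow need a second terminal. If you would rather start the container detached (adding `-d`, an assumption this guide does not make), a sketch like the one below can poll the service until it answers. `wait_for_ollama` is a hypothetical helper, and its default URL and retry count are illustrative.
```bash
# Hypothetical helper: poll Ollama until GET / returns "Ollama is running".
# Usage (assumes a detached container):
#   sudo docker run -d --name ollama -p 11434:11434 ollama/ollama
#   wait_for_ollama http://localhost:11434 30 && echo ready
wait_for_ollama() {
  local url="${1:-http://localhost:11434}" tries="${2:-30}" i=0
  while :; do
    # -f: treat HTTP errors as failures, -s: no progress output; errors are discarded.
    if [ "$(curl -fs --max-time 2 "$url/" 2>/dev/null)" = "Ollama is running" ]; then
      return 0
    fi
    i=$((i + 1))
    # Stop after the last attempt; otherwise wait a second and retry.
    [ "$i" -ge "$tries" ] && break
    sleep 1
  done
  echo "Ollama did not answer at $url after $tries attempt(s)" >&2
  return 1
}
```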
+
+Ensure that Ollama is listening on all IP addresses:
+```bash
sudo ss -tunlp | grep 11434
+tcp   LISTEN 0      4096                  0.0.0.0:11434      0.0.0.0:*    users:(("docker-proxy",pid=794507,fd=4))
+tcp   LISTEN 0      4096                     [::]:11434         [::]:*    users:(("docker-proxy",pid=794513,fd=4))
+```
+
+Pull the models you need. We recommend starting with `llama3.2` (a 3B chat model) and `bge-m3` (a 567M embedding model):
+```bash
+sudo docker exec ollama ollama pull llama3.2
+pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB
+success
+```
+
+```bash
+sudo docker exec ollama ollama pull bge-m3                 
+pulling daec91ffb5dd... 100% ▕████████████████▏ 1.2 GB                                  
+success 
+```
+
 ### 2. Ensure Ollama is accessible

-Restart system and use curl or your web browser to check if the service URL of your Ollama service at `http://localhost:11434` is accessible.
-   
+If RAGFlow runs in Docker and Ollama runs on the same host machine, check that Ollama is accessible from inside the RAGFlow container:
 ```bash
+sudo docker exec -it ragflow-server bash
+root@8136b8c3e914:/ragflow# curl http://host.docker.internal:11434/
 Ollama is running
 ```

-### 3. Run your local model
-
+If RAGFlow runs from source code and Ollama runs on the same host machine, check that Ollama is accessible from the RAGFlow host machine:
 ```bash
-ollama run llama3
+curl http://localhost:11434/
+Ollama is running
 ```
-<details>
-  <summary>If your Ollama is installed through Docker, run the following instead:</summary>

-   ```bash
-   docker exec -it ollama ollama run llama3
-   ```
-</details>
+If RAGFlow and Ollama run on different machines, check that Ollama is accessible from the RAGFlow host machine:
```bash
curl http://${IP_OF_OLLAMA_MACHINE}:11434/
+Ollama is running
+```

 ### 4. Add Ollama

@@ -68,26 +87,10 @@ In RAGFlow, click on your logo on the top right of the page **>** **Model Provid

 In the popup window, complete basic settings for Ollama:

-1. Because **llama3** is a chat model, choose **chat** as the model type.
-2. Ensure that the model name you enter here *precisely* matches the name of the local model you are running with Ollama.
-3. Ensure that the base URL you enter is accessible to RAGFlow.
-4. OPTIONAL: Switch on the toggle under **Does it support Vision?** if your model includes an image-to-text model.
+1. Ensure that the model name and type match those of a model you pulled in step 1, for example, (`llama3.2`, `chat`) or (`bge-m3`, `embedding`).
+2. Ensure that the base URL matches the one you verified in step 2.
+3. OPTIONAL: Switch on the toggle under **Does it support Vision?** if your model includes an image-to-text model.

-:::caution NOTE
- If RAGFlow is in Docker and Ollama runs on the same host machine, use `http://host.docker.internal:11434` as base URL.
- If your Ollama and RAGFlow run on the same machine, use `http://localhost:11434` as base URL.
- If your Ollama runs on a different machine from RAGFlow, use `http://<IP_OF_OLLAMA_MACHINE>:11434` as base URL.
-:::
-
-:::danger WARNING
-If your Ollama runs on a different machine, you may also need to set the `OLLAMA_HOST` environment variable to `0.0.0.0` in **ollama.service** (Note that this is *NOT* the base URL):
-
-```bash
-Environment="OLLAMA_HOST=0.0.0.0"
-```
-
-See [this guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information.
-:::

 :::caution WARNING
 Improper base URL settings will trigger the following error:
@@ -100,7 +103,7 @@ Max retries exceeded with url: /api/chat (Caused by NewConnectionError('<urllib3

 Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model: 
   
-*You should now be able to find **llama3** from the dropdown list under **Chat model**.*
+*You should now be able to find **llama3.2** from the dropdown list under **Chat model**, and **bge-m3** from the dropdown list under **Embedding model**.*

 > If your local model is an embedding model, you should find your local model under **Embedding model**.
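+
+As a final check outside the RAGFlow UI, you can call Ollama's REST API directly. The sketch below sends one non-streaming request to `/api/chat` (the endpoint named in the error message above) and one to `/api/embed`. `ollama_smoke_test` and the `OLLAMA_URL` variable are hypothetical; set `OLLAMA_URL` to a URL reachable from wherever you run the script (for example `http://localhost:11434` on the Ollama host). The JSON is built by string interpolation, so it assumes model names without quotes or backslashes.
```bash
# Hypothetical smoke test: one chat turn and one embedding request against Ollama.
# Usage: OLLAMA_URL=http://localhost:11434 ollama_smoke_test llama3.2 bge-m3
ollama_smoke_test() {
  local url="${OLLAMA_URL:-http://localhost:11434}"
  local chat_model="${1:-llama3.2}" embed_model="${2:-bge-m3}"

  # Non-streaming chat request; a successful reply contains a "message" object.
  curl -fs --max-time 300 "$url/api/chat" \
    -d "{\"model\":\"$chat_model\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hello.\"}],\"stream\":false}" \
    | grep -q '"message"' || { echo "chat request to $chat_model failed" >&2; return 1; }

  # Embedding request; a successful reply contains an "embeddings" array.
  curl -fs --max-time 300 "$url/api/embed" \
    -d "{\"model\":\"$embed_model\",\"input\":\"hello\"}" \
    | grep -q '"embeddings"' || { echo "embedding request to $embed_model failed" >&2; return 1; }

  echo "OK: $chat_model (chat) and $embed_model (embedding) respond at $url"
}
```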