Refactor: minor change in server

This commit is contained in:
martin legrand 2025-03-07 13:56:14 +01:00
commit 7271466e30
4 changed files with 103 additions and 26 deletions

README.md

@@ -1,7 +1,7 @@
# 🚀 agenticSeek: Local AI Assistant Powered by DeepSeek Agents
# AgenticSeek: Fully local AI Assistant Powered by Deepseek R1 Agents.
**A fully local AI assistant** using Deepseek R1 agents.
**A fully local AI assistant** using AI agents. The goal of the project is to create a truly Jarvis-like assistant using reasoning models such as Deepseek R1.
> 🛠️ **Work in Progress** Looking for contributors! 🚀
---
@@ -11,13 +11,19 @@
- **Privacy-first**: Runs 100% locally, **no data leaves your machine**
- **Voice-enabled**: Speak and interact naturally
- **Coding abilities**: Code in Python, Bash, C, Golang, and soon more
- **Self-correcting**: Automatically fixes errors by itself
- **Trial-and-error**: Automatically fixes code or commands upon execution failure
- **Agent routing**: Select the best agent for the task
- **Multi-agent**: For complex tasks, divide and conquer with multiple agents
- **Web browsing (not implemented yet)**: Browse the web and search the internet
- **Tools**: Each agent has its own set of tools: basic search, flight API, file explorer, etc.
- **Web browsing (not implemented yet)**: Browse the web autonomously to conduct tasks.
---
## Installation
### 1. **Install Dependencies**
@@ -57,21 +63,25 @@ Run the assistant:
python3 main.py
```
### 4. **Alternative: Run the Assistant (Own Server)**
### 4. **Alternative: Run the LLM on your own server**
Get the ip address of the machine that will run the model
On your "server" that will run the AI model, get the ip address
```sh
ip a | grep "inet " | grep -v 127.0.0.1 | awk '{print $2}' | cut -d/ -f1
```
On the other machine that will run the model execute the script in stream_llm.py
Clone the repository, then run the script `stream_llm.py` located in `server/`:
```sh
python3 stream_llm.py
```
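This starts a small Flask API that generates responses with ollama, so it assumes ollama is installed and running on that machine (see `server/stream_llm.py` below).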
Now, on your personal computer:
Clone the repository.
Change the `config.ini` file to set `provider_name` to `server` and `provider_model` to `deepseek-r1:7b`.
Set `provider_server_address` to the IP address of the machine that runs the model, as in the sketch below.
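For example, assuming the server machine sits at 192.168.1.100 (a placeholder IP, substitute your own) and serves on port 5000 as `stream_llm.py` does, the relevant `config.ini` entries would look like this sketch:
```
is_local = False
provider_name = server
provider_model = deepseek-r1:7b
provider_server_address = 192.168.1.100:5000
```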
@@ -89,21 +99,40 @@ Run the assistant:
python3 main.py
```
## Provider
Currently the available providers are:
- ollama -> Uses ollama running on your computer. Ollama is a program for running large language models locally.
- server -> A custom script that lets you run the LLM on another machine. It currently uses ollama, but other options will follow soon.
- openai -> Use the ChatGPT API (not private).
- deepseek -> Use the Deepseek API (not private).
To select a provider, change the `config.ini`:
```
is_local = False
provider_name = openai
provider_model = gpt-4o
provider_server_address = 127.0.0.1:5000
```
- `is_local`: should be True for any locally running LLM, otherwise False.
- `provider_name`: select the provider to use by its name, see the provider list above.
- `provider_model`: set the model used by the agent.
- `provider_server_address`: can be set to anything if you are not using the server provider.
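As a minimal sketch of how these fields can be read with Python's standard `configparser` (which requires a section header such as `[MAIN]`, an assumption since the example above does not show one):
```python
import configparser

# Sketch only: assumes config.ini starts with a [MAIN] section header.
config = configparser.ConfigParser()
config.read('config.ini')

is_local = config.getboolean('MAIN', 'is_local')
provider_name = config.get('MAIN', 'provider_name')
provider_model = config.get('MAIN', 'provider_model')
provider_server_address = config.get('MAIN', 'provider_server_address')
```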
## Current capabilities
- All running locally
- Reasoning with deepseek R1
- Code execution capabilities (Python, Golang, C)
- Code execution capabilities (Python, Golang, C, etc.)
- Shell control capabilities in bash
- Will try to fix errors by itself
- Routing system, select the best agent for the task
- Fast text-to-speech using kokoro.
- Speech to text.
- Memory compression (reduces history as the interaction progresses using a summary model)
- Recovery: recover last session from memory
## UNDER DEVELOPMENT
- Web browsing
- Knowledge base RAG
- Graphical interface
- Speech-to-text using distil-whisper/distil-medium.en
- Recovery: recover and save sessions from the filesystem.

media/demo_img.png (new binary file, 237 KiB)

server/config.json (new file, 30 lines)

@@ -0,0 +1,30 @@
{
    "model_name": "deepseek-r1:14b",
    "known_models": [
        "qwq:32b",
        "deepseek-r1:1.5b",
        "deepseek-r1:7b",
        "deepseek-r1:14b",
        "deepseek-r1:32b",
        "deepseek-r1:70b",
        "deepseek-r1:671b",
        "deepseek-coder:1.3b",
        "deepseek-coder:6.7b",
        "deepseek-coder:33b",
        "llama2-uncensored:7b",
        "llama2-uncensored:70b",
        "llama3.1:8b",
        "llama3.1:70b",
        "llama3.3:70b",
        "llama3:8b",
        "llama3:70b",
        "phi4:14b",
        "mistral:7b",
        "mistral:70b",
        "mistral:33b",
        "qwen1:7b",
        "qwen1:14b",
        "qwen1:32b",
        "qwen1:70b"
    ]
}
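This file is read at startup by `stream_llm.py` (below), which validates the configured `model_name` against the `known_models` list.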

server/stream_llm.py

@@ -2,25 +2,43 @@ from flask import Flask, jsonify, request
import threading
import ollama
import logging
import json

log = logging.getLogger('werkzeug')
log.setLevel(logging.ERROR)

app = Flask(__name__)

model = 'deepseek-r1:14b'

# Shared state with thread-safe locks
class Config:
    def __init__(self):
        self.model = None
        self.known_models = []
        self.allowed_models = []
        self.model_name = None

    def load(self):
        with open('config.json', 'r') as f:
            data = json.load(f)
            self.known_models = data['known_models']
            self.model_name = data['model_name']

    def validate_model(self, model):
        if model not in self.known_models:
            raise ValueError(f"Model {model} is not known")

class GenerationState:
    def __init__(self):
        self.lock = threading.Lock()
        self.last_complete_sentence = ""
        self.current_buffer = ""
        self.is_generating = False
        self.model = None

state = GenerationState()
def generate_response(history, model):
def generate_response(history):
    global state
    try:
        with state.lock:
@@ -29,21 +47,18 @@ def generate_response(history, model):
            state.current_buffer = ""

        stream = ollama.chat(
            model=model,
            model=state.model,
            messages=history,
            stream=True,
        )

        for chunk in stream:
            content = chunk['message']['content']
            print(content, end='', flush=True)
            with state.lock:
                state.current_buffer += content
    except ollama.ResponseError as e:
        if e.status_code == 404:
            ollama.pull(model)
            ollama.pull(state.model)
        with state.lock:
            state.is_generating = False
        print(f"Error: {e}")
@@ -61,8 +76,7 @@ def start_generation():
        return jsonify({"error": "Generation already in progress"}), 400

    history = data.get('messages', [])
    # Start generation in background thread
    threading.Thread(target=generate_response, args=(history, model)).start()
    threading.Thread(target=generate_response, args=(history,)).start()
    return jsonify({"message": "Generation started"}), 202
@app.route('/get_updated_sentence')

@@ -75,4 +89,8 @@ def get_updated_sentence():
    })

if __name__ == '__main__':
    config = Config()
    config.load()
    config.validate_model(config.model_name)
    state.model = config.model_name
    app.run(host='0.0.0.0', port=5000, debug=False, threaded=True)
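For reference, a minimal client sketch for this server. The route that starts generation falls outside the hunks above, so the `/generate` path is an assumption; only `/get_updated_sentence` is confirmed:
```python
import time
import requests

SERVER = "http://192.168.1.100:5000"  # placeholder, use your server's IP

# Assumed start route; the actual @app.route decorator is not shown above.
resp = requests.post(f"{SERVER}/generate",
                     json={"messages": [{"role": "user", "content": "Hello!"}]})
print(resp.status_code)  # 202 once generation has started

# Poll the buffer the server fills while the model streams tokens.
for _ in range(30):
    print(requests.get(f"{SERVER}/get_updated_sentence").json())
    time.sleep(1)
```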