mirror of
https://github.com/tcsenpai/agenticSeek.git
synced 2025-06-06 19:15:28 +00:00
Refactor: minor change in server
This commit is contained in:
commit 7271466e30
README.md (61 lines changed)
@@ -1,7 +1,7 @@
-# 🚀 agenticSeek: Local AI Assistant Powered by DeepSeek Agents
+# AgenticSeek: Fully Local AI Assistant Powered by Deepseek R1 Agents

-**A fully local AI assistant** using Deepseek R1 agents.
+**A fully local AI assistant** using AI agents. The goal of the project is to create a truly Jarvis-like assistant using reasoning models such as Deepseek R1.

 > 🛠️ **Work in Progress** – Looking for contributors! 🚀

 ---
@@ -11,13 +11,19 @@
 - **Privacy-first**: Runs 100% locally – **no data leaves your machine**
 - **Voice-enabled**: Speak and interact naturally
 - **Coding abilities**: Code in Python, Bash, C, Golang, and soon more
-- **Self-correcting**: Automatically fixes errors by itself
+- **Trial-and-error**: Automatically fixes code or commands upon execution failure
 - **Agent routing**: Select the best agent for the task
 - **Multi-agent**: For complex tasks, divide and conquer with multiple agents
-- **Web browsing (not implemented yet)**: Browse the web and search the internet
+- **Tools**: Each agent has its own set of tools: basic search, flight API, file explorer, etc.
+- **Web browsing (not implemented yet)**: Browse the web autonomously to conduct tasks.

 ---

+---

 ## Installation

 ### 1️⃣ **Install Dependencies**
@@ -57,21 +63,25 @@ Run the assistant:
 python3 main.py
 ```

-### 4️⃣ **Alternative: Run the Assistant (Own Server)**
+### 4️⃣ **Alternative: Run the LLM on your own server**

-Get the ip address of the machine that will run the model
+On the "server" machine that will run the AI model, get its IP address:

 ```sh
 ip a | grep "inet " | grep -v 127.0.0.1 | awk '{print $2}' | cut -d/ -f1
 ```

-On the other machine that will run the model execute the script in stream_llm.py
+Clone the repository, then run the script `stream_llm.py` in `server/`:

 ```sh
 python3 stream_llm.py
 ```

+Now, on your personal computer:
+
+Clone the repository.
+
 Change the `config.ini` file to set the `provider_name` to `server` and `provider_model` to `deepseek-r1:7b`.
 Set the `provider_server_address` to the ip address of the machine that will run the model.

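A client-side `config.ini` matching the steps above would look roughly like the sketch below. The keys and the port come from the README and the server script (which listens on 5000); the IP address is only a placeholder for whatever the `ip a` command printed, and `is_local = True` is an assumption based on the rule, given further down, that it should be True for any locally running LLM.

```
is_local = True
provider_name = server
provider_model = deepseek-r1:7b
provider_server_address = 192.168.1.20:5000
```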
@@ -89,21 +99,40 @@ Run the assistant:
 python3 main.py
 ```

+## Provider
+
+Currently the available providers are:
+
+- ollama -> Use Ollama running on your computer. Ollama is a program for running large language models locally.
+- server -> A custom script that lets you run the LLM on another machine. Currently it uses Ollama, but we'll switch to other options soon.
+- openai -> Use the ChatGPT API (not private).
+- deepseek -> Deepseek API (not private).
+
+To select a provider, change the config.ini:
+
+```
+is_local = False
+provider_name = openai
+provider_model = gpt-4o
+provider_server_address = 127.0.0.1:5000
+```
+
+is_local: should be True for any locally running LLM, otherwise False.
+
+provider_name: Select the provider to use by its name, see the provider list above.
+
+provider_model: Set the model to use by the agent.
+
+provider_server_address: can be set to anything if you are not using the server provider.
+
 ## Current capabilities

 - All running locally
 - Reasoning with deepseek R1
-- Code execution capabilities (Python, Golang, C)
+- Code execution capabilities (Python, Golang, C, etc.)
 - Shell control capabilities in bash
 - Will try to fix errors by itself
 - Routing system, select the best agent for the task
 - Fast text-to-speech using kokoro.
+- Speech-to-text.
 - Memory compression (reduce history as interaction progresses using summary model)
-- Recovery: recover last session from memory
+- Recovery: recover and save sessions from the filesystem.
-
-## UNDER DEVELOPMENT
-
-- Web browsing
-- Knowledge base RAG
-- Graphical interface
-- Speech-to-text using distil-whisper/distil-medium.en
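For comparison, a purely local run with the ollama provider would presumably use a config.ini along these lines; the model tag is only an example, and per the note above the `provider_server_address` value is ignored for any provider other than `server`.

```
is_local = True
provider_name = ollama
provider_model = deepseek-r1:14b
provider_server_address = 127.0.0.1:5000
```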
BIN  media/demo_img.png  (new file, 237 KiB; binary file not shown)
server/config.json (new file, 30 lines)
@@ -0,0 +1,30 @@
+{
+    "model_name": "deepseek-r1:14b",
+    "known_models": [
+        "qwq:32b",
+        "deepseek-r1:1.5b",
+        "deepseek-r1:7b",
+        "deepseek-r1:14b",
+        "deepseek-r1:32b",
+        "deepseek-r1:70b",
+        "deepseek-r1:671b",
+        "deepseek-coder:1.3b",
+        "deepseek-coder:6.7b",
+        "deepseek-coder:33b",
+        "llama2-uncensored:7b",
+        "llama2-uncensored:70b",
+        "llama3.1:8b",
+        "llama3.1:70b",
+        "llama3.3:70b",
+        "llama3:8b",
+        "llama3:70b",
+        "i4:14b",
+        "mistral:7b",
+        "mistral:70b",
+        "mistral:33b",
+        "qwen1:7b",
+        "qwen1:14b",
+        "qwen1:32b",
+        "qwen1:70b"
+    ]
+}
server/stream_llm.py
@@ -2,25 +2,43 @@ from flask import Flask, jsonify, request
 import threading
 import ollama
 import logging
+import json

 log = logging.getLogger('werkzeug')
 log.setLevel(logging.ERROR)

 app = Flask(__name__)

-model = 'deepseek-r1:14b'

 # Shared state with thread-safe locks

+class Config:
+    def __init__(self):
+        self.model = None
+        self.known_models = []
+        self.allowed_models = []
+        self.model_name = None
+
+    def load(self):
+        with open('config.json', 'r') as f:
+            data = json.load(f)
+        self.known_models = data['known_models']
+        self.model_name = data['model_name']
+
+    def validate_model(self, model):
+        if model not in self.known_models:
+            raise ValueError(f"Model {model} is not known")
+
 class GenerationState:
     def __init__(self):
         self.lock = threading.Lock()
         self.last_complete_sentence = ""
         self.current_buffer = ""
         self.is_generating = False
+        self.model = None

 state = GenerationState()

-def generate_response(history, model):
+def generate_response(history):
     global state
     try:
         with state.lock:
@@ -29,21 +47,18 @@ def generate_response(history, model):
             state.current_buffer = ""

         stream = ollama.chat(
-            model=model,
+            model=state.model,
             messages=history,
             stream=True,
         )

         for chunk in stream:
             content = chunk['message']['content']
             print(content, end='', flush=True)

             with state.lock:
                 state.current_buffer += content

     except ollama.ResponseError as e:
         if e.status_code == 404:
-            ollama.pull(model)
+            ollama.pull(state.model)
         with state.lock:
             state.is_generating = False
         print(f"Error: {e}")
@@ -61,8 +76,7 @@ def start_generation():
         return jsonify({"error": "Generation already in progress"}), 400

     history = data.get('messages', [])
-    # Start generation in background thread
-    threading.Thread(target=generate_response, args=(history, model)).start()
+    threading.Thread(target=generate_response, args=(history, state.model)).start()
     return jsonify({"message": "Generation started"}), 202

 @app.route('/get_updated_sentence')
@@ -75,4 +89,8 @@ def get_updated_sentence():
     })

 if __name__ == '__main__':
+    config = Config()
+    config.load()
+    config.validate_model(config.model_name)
+    state.model = config.model_name
     app.run(host='0.0.0.0', port=5000, debug=False, threaded=True)
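For context on how a client drives this server, here is a minimal polling sketch in Python. The start route name and the keys in the `/get_updated_sentence` response are not visible in this diff, so `/generate`, `sentence`, and `is_generating` below are assumptions to be checked against the provider code in the main repository.

```python
# Minimal client sketch for the streaming server above.
# Assumptions (not shown in this diff): the start route is "/generate" and the
# /get_updated_sentence response carries "sentence" and "is_generating" keys.
import time
import requests

SERVER = "http://127.0.0.1:5000"  # placeholder; use the server machine's IP

history = [{"role": "user", "content": "Hello, what can you do?"}]

# Ask the server to start generating; it replies 202 and streams in a background thread.
resp = requests.post(f"{SERVER}/generate", json={"messages": history})
resp.raise_for_status()

# Poll for partial output until the server reports that generation has finished.
while True:
    data = requests.get(f"{SERVER}/get_updated_sentence").json()
    print(data.get("sentence", ""))           # assumed response key
    if not data.get("is_generating", False):  # assumed response key
        break
    time.sleep(1)
```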