Merge pull request #92 from Fosowl/dev

Improve web form handling & Better prompt for planner agent.
2025-06-06 11:05:26 +00:00 · 2025-04-01 19:47:25 +02:00
commit d476cf91dc
parent 9e5b2c5ed7 02d28b4322
21 changed files with 625 additions and 70 deletions
--- a/README_FR.md
+++ b/README_FR.md
@@ -0,0 +1,424 @@
+
+
+
+# AgenticSeek: A Manus-like AI built on DeepSeek R1 agents, running locally.
+
+A **fully local** alternative to Manus AI: a voice-enabled AI assistant that codes, explores your file system, browses the web and corrects its own mistakes, all without sending any data to the cloud. Built with reasoning models like DeepSeek R1, this autonomous agent runs entirely on your hardware, keeping your data private.
+
+[![Visit AgenticSeek](https://img.shields.io/static/v1?label=Website&message=AgenticSeek&color=blue&style=flat-square)](https://fosowl.github.io/agenticSeek.html) ![License](https://img.shields.io/badge/license-GPL--3.0-green) [![Discord](https://img.shields.io/badge/Discord-Join%20Us-7289DA?logo=discord&logoColor=white)](https://discord.gg/4Ub2D6Fj)
+
+> 🛠️ **Work in progress**: we are actively looking for contributors!
+
+![alt text](./media/whale_readme.jpg)
+
+> *Do a deep search of AI startup in Osaka and Tokyo, find at least 5, then save in the research_japan.txt file*
+
+> *Can you make a tetris game in C ?*
+
+> *I would like to setup a new project file index as mark2.*
+
+
+### agenticSeek can plan tasks!
+
+![alt text](./media/exemples/demo_image.png)
+
+## Features:
+
+- **100% Local**: Runs locally on your PC. Your data stays yours.
+
+- **File System Access**: Uses bash to navigate and manipulate your files.
+
+- **Semi-autonomous coding**: Can write, debug and run code in Python, C, Golang, with more languages to come.
+
+- **Agent Routing**: Automatically selects the right agent for the task.
+
+- **Planning**: Uses multiple agents for complex tasks.
+
+- **Autonomous Web Browsing**: Browses the web on its own.
+
+- **Efficient memory**: Efficient management of memory and sessions.
+
+---
+
+## **Installation**
+
+Make sure you have Chrome driver, Docker and Python 3.10 (or newer) installed.
+
+For Chrome driver issues, see the Chromedriver section.
+
+### 1️⃣ Clone the repository and set up
+
+```sh
+git clone https://github.com/Fosowl/agenticSeek.git
+cd agenticSeek
+mv .env.example .env
+```
+
+### 2️⃣ **Create a virtual environment**
+
+```sh
+python3 -m venv agentic_seek_env
+source agentic_seek_env/bin/activate     
+# On Windows: agentic_seek_env\Scripts\activate
+```
+
+### 3️⃣ **Installation**
+
+**Automatic:**
+
+```sh
+./install.sh
+```
+
+**Manual:**
+
+```sh
+pip3 install -r requirements.txt
+```
+
+
+## Run locally on your machine
+
+**We recommend using at least DeepSeek 14B; smaller models struggle with tool use and quickly forget the context.**
+
+### 1️⃣ **Download the model**
+
+Make sure you have [Ollama](https://ollama.com/) installed.
+
+Download `deepseek-r1:14b` from [DeepSeek](https://deepseek.com/models)
+
+```sh
+ollama pull deepseek-r1:14b
+```
+
+### 2️⃣ **Start Ollama**
+
+```sh
+ollama serve
+```
+
+Edit the config.ini file to set provider_name to ollama and provider_model to deepseek-r1:14b:
+
+```sh
+[MAIN]
+is_local = True
+provider_name = ollama
+provider_model = deepseek-r1:14b
+provider_server_address = 127.0.0.1:11434
+```
+
+Start all services:
+
+```sh
+sudo ./start_services.sh
+```
+
+Run the assistant:
+
+```sh
+python3 main.py
+```
+
+See the **Usage** section if you are not sure how to use it.
+
+See the **Known issues** section if you run into problems.
+
+See the **Run with an API** section if your hardware cannot run DeepSeek locally.
+
+See the **Config** section for a detailed explanation of the config file.
+
+---
+
+## Usage
+
+Warning: the system that picks the best AI agent currently works poorly with non-English text, because agent routing relies on a model trained on English text. We are working hard to fix this. Please use English for now.
+
+Make sure the services are running with ./start_services.sh, then launch AgenticSeek with python3 main.py
+
+```sh
+sudo ./start_services.sh
+python3 main.py
+```
+
+You will see a prompt: ">>> "
+This means AgenticSeek is waiting for you to type instructions.
+You can also use speech-to-text by setting listen = True in the config.
+
+To exit, simply say `goodbye`.
+
+Here are some usage examples:
+
+### Coding
+
+> *Help me with matrix multiplication in Golang*
+
+> *Scan my network with nmap, find if any suspicious devices is connected*
+
+> *Make a snake game in python*
+
+### Web search
+
+> *Do a web search to find cool tech startup in Japan working on cutting edge AI research*
+
+> *Can you find on the internet who created agenticSeek?*
+
+> *Can you find on which website I can buy a rtx 4090 for cheap*
+
+### Files
+
+> *Hey can you find where is million_dollars_contract.pdf i lost it*
+
+> *Show me how much space I have left on my disk*
+
+> *Find and read the README.md and follow the install instruction*
+
+### Conversation
+
+> *Tell me about France*
+
+> *What is the meaning of life ?*
+
+> *Should I take creatine before or after workout?*
+
+
+After you enter your query, AgenticSeek will assign the best agent for the task.
+
+Because this is a prototype, the agent routing system may not always assign the right agent for your query.
+
+You should therefore be explicit about what you want and how the AI should proceed. For example, if you want it to search the web, don't say:
+
+Do you know any good countries for solo travel?
+
+Instead, say:
+
+Do a web search: what are the best countries for solo travel?
+
+---
+
+## **Run the LLM on your own server**
+
+If you have a powerful computer or server you want to use, but want to access it from your laptop, you can run the LLM on a remote server.
+
+### 1️⃣ **Set up and start the server scripts**
+
+On the "server" machine that will run the AI model, get its IP address:
+
+```sh
+ip a | grep "inet " | grep -v 127.0.0.1 | awk '{print $2}' | cut -d/ -f1
+```
+
+Note: on Windows or macOS, use ipconfig or ifconfig respectively to find the IP address.
+
+**If you want to use an OpenAI-based provider, follow the Run with an API section.**
+
+Clone the repository and enter the server/ folder.
+
+
+```sh
+git clone --depth 1 https://github.com/Fosowl/agenticSeek.git
+cd agenticSeek/server/
+```
+
+Install the server-specific requirements:
+
+```sh
+pip3 install -r requirements.txt
+```
+
+Run the server script:
+
+```sh
+python3 app.py --provider ollama --port 3333
+```
+
+You can use either ollama or llamacpp as the LLM service.
+
+### 2️⃣ **Run**
+
+Now, on your personal computer:
+
+Edit the config.ini file to set provider_name to server and provider_model to deepseek-r1:14b.
+
+Set provider_server_address to the IP address of the machine that will run the model.
+
+```sh
+[MAIN]
+is_local = False
+provider_name = server
+provider_model = deepseek-r1:14b
+provider_server_address = x.x.x.x:3333
+```
+
+Run the assistant:
+
+```sh
+sudo ./start_services.sh
+python3 main.py
+```
+
+## **Run with an API**
+
+WARNING: Make sure there are no trailing spaces in the config.
+
+Set is_local to True if you are running an OpenAI-based API locally.
+
+Change the IP address if your OpenAI-based API runs on your own server.
+
+```sh
+[MAIN]
+is_local = False
+provider_name = openai
+provider_model = gpt-4o
+provider_server_address = 127.0.0.1:5000
+```
+
+Run the assistant:
+
+```sh
+sudo ./start_services.sh
+python3 main.py
+```
+
+## Config
+
+Example config:
+```
+[MAIN]
+is_local = True
+provider_name = ollama
+provider_model = deepseek-r1:1.5b
+provider_server_address = 127.0.0.1:11434
+agent_name = Friday
+recover_last_session = False
+save_session = False
+speak = False
+listen = False
+work_dir =  /Users/mlg/Documents/ai_folder
+jarvis_personality = False
+[BROWSER]
+headless_browser = False
+stealth_mode = False
+```
+
+**Explanation**:
+
+`is_local` -> Runs the agent locally (True) or on a remote server (False).
+
+`provider_name` -> The provider to use (one of: ollama, server, lm-studio, deepseek-api).
+
+`provider_model` -> The model to use, e.g. deepseek-r1:1.5b.
+
+`provider_server_address` -> Server address, e.g. 127.0.0.1:11434 for local. Can be set to anything for a non-local API.
+
+`agent_name` -> Name of the agent, e.g. Friday. Used as the wake word for speech recognition.
+
+`recover_last_session` -> Resumes the last session (True) or not (False).
+
+`save_session` -> Saves session data (True) or not (False).
+
+`speak` -> Enables voice output (True) or not (False).
+
+`listen` -> Listens for voice input (True) or not (False).
+
+`work_dir` -> Folder the AI will have access to, e.g. /Users/user/Documents/.
+
+`jarvis_personality` -> Uses a JARVIS-like personality (True) or not (False). This simply changes the prompt file.
+
+`headless_browser` -> Runs the browser without a visible window (True) or not (False).
+
+`stealth_mode` -> Makes bot detection harder. The only downside is that you have to install the anticaptcha extension manually.
+
+
+
+## Providers
+
+The table below shows the available providers:
+
+| Provider     | Local? | Description                                                        |
+|--------------|--------|--------------------------------------------------------------------|
+| ollama       | Yes    | Run LLMs locally with ease, using Ollama as the LLM provider        |
+| server       | Yes    | Host the model on another machine, run on your local machine       |
+| lm-studio    | Yes    | Run an LLM locally with LM Studio (set provider_name to lm-studio) |
+| openai       | No     | Uses the ChatGPT API (not private)                                 |
+| deepseek-api | No     | DeepSeek API (not private)                                         |
+| huggingface  | No     | Hugging Face API (not private)                                     |
+
+To select a provider, edit config.ini:
+
+```
+is_local = False
+provider_name = openai
+provider_model = gpt-4o
+provider_server_address = 127.0.0.1:5000
+```
+
+`is_local`: must be True for any locally running LLM, otherwise False.
+
+`provider_name`: Select the provider to use by name; see the provider list above.
+
+`provider_model`: Set the model the agent will use.
+
+`provider_server_address`: can be set to anything if you are not using the server provider.
+
+# Known issues
+
+## Chromedriver issues
+
+Error #1: **incompatibility**
+
+`Exception: Failed to initialize browser: Message: session not created: This version of ChromeDriver only supports Chrome version 113
+Current browser version is 134.0.6998.89 with binary path`
+
+This happens when your browser and chromedriver versions do not match.
+
+You need to download the latest version:
+
+https://developer.chrome.com/docs/chromedriver/downloads
+
+If you are using Chrome version 115 or newer, go to:
+
+https://googlechromelabs.github.io/chrome-for-testing/
+
+And download the chromedriver version that matches your operating system.
+
+![alt text](./media/chromedriver_readme.png)
+
+If this section is incomplete, please open an issue.
+
+## FAQ
+
+**Q: What hardware do I need?**  
+
+- 7B model: GPU with 8 GB of VRAM.
+- 14B model: 12 GB GPU (e.g. RTX 3060).
+- 32B model: 24 GB+ of VRAM.
+
+**Q: Why Deepseek R1 over other models?**  
+
+DeepSeek R1 excels at reasoning and tool use for its size. We think it is a solid choice for our needs; other models work well too, but DeepSeek is our primary choice.
+
+**Q: I get an error running `main.py`. What do I do?**  
+
+Make sure Ollama is running (ollama serve), that your config.ini matches your provider, and that the dependencies are installed. If that doesn't work, feel free to open an issue.
+
+**Q: Can it really run 100% locally?**  
+
+Yes. With the Ollama or Server providers, speech recognition, the LLM and text-to-speech all run locally. Non-local options (OpenAI or other APIs) are optional.
+
+**Q: How come it is older than Manus?**
+
+We started this as a fun project to build a local Jarvis-like AI. However, with the emergence of Manus, we saw an opportunity to redirect some tasks and make it another alternative.
+
+**Q: How is it better than Manus?**
+
+It isn't, but we prioritize local execution and privacy over a cloud-based approach. It's a fun and accessible alternative!
+
+## Contribute
+
+We are looking for developers to improve AgenticSeek! Check out the open issues or discussions.
+
+[![Star History Chart](https://api.star-history.com/svg?repos=Fosowl/agenticSeek&type=Date)](https://www.star-history.com/#Fosowl/agenticSeek&Date)
+
+## Authors:
+ > [Fosowl](https://github.com/Fosowl)
+ > [steveh8758](https://github.com/steveh8758) 
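The server section of this README finds the server machine's IP address with an `ip a | grep ... | awk ... | cut ...` pipeline on Linux, and with `ipconfig`/`ifconfig` elsewhere. A possible cross-platform alternative, sketched below in Python under the assumption that the machine has a default route, is to ask the OS which local address it would use to reach an outside host: calling `connect()` on a UDP socket sends no packet, it only performs the routing lookup. The function name `guess_lan_ipv4` and the probe address `192.0.2.1` (a documentation-only range) are illustrative choices, not part of AgenticSeek.

```python
import ipaddress
import socket


def guess_lan_ipv4(probe_host: str = "192.0.2.1", probe_port: int = 9) -> str:
    """Best-effort guess of this machine's outbound IPv4 address.

    connect() on a UDP socket sends no packet; it only makes the OS choose
    the local interface it would route probe_host through. 192.0.2.1 is a
    documentation-only (TEST-NET-1) address, so nothing real is contacted.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.connect((probe_host, probe_port))
            address = sock.getsockname()[0]
    except OSError:
        # No usable route (e.g. the machine is offline): fall back to loopback.
        address = "127.0.0.1"
    # Validate and normalise the result before returning it.
    return str(ipaddress.IPv4Address(address))


if __name__ == "__main__":
    # On the client, this value would go into provider_server_address as "<ip>:3333".
    print(guess_lan_ipv4())
```

On a machine with several interfaces this only reports the one used for the default route, so the `ip a` output remains the authoritative check.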
--- a/main.py
+++ b/main.py
@@ -57,7 +57,8 @@ def main():
    interaction = Interaction(agents,
                              tts_enabled=config.getboolean('MAIN', 'speak'),
                              stt_enabled=config.getboolean('MAIN', 'listen'),
-                              recover_last_session=config.getboolean('MAIN', 'recover_last_session'))
+                              recover_last_session=config.getboolean('MAIN', 'recover_last_session'),
+                            )
    try:
        while interaction.is_active:
            interaction.get_user()
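`main.py` reads switches such as `speak`, `listen` and `recover_last_session` with `config.getboolean('MAIN', ...)`, which matches the API of Python's standard `configparser`. A minimal sketch of loading the `config.ini` keys documented in the README the same way; the `load_settings` helper, the flat dict it returns, and the split of `provider_server_address` into host and port are hypothetical, not AgenticSeek code. The split assumes a `host:port` value, so it would not suit a non-local API where the address is set to an arbitrary placeholder.

```python
import configparser

# Trimmed copy of the README's example config (values are illustrative).
SAMPLE_CONFIG = """\
[MAIN]
is_local = True
provider_name = ollama
provider_model = deepseek-r1:14b
provider_server_address = 127.0.0.1:11434
recover_last_session = False
speak = False
listen = False
[BROWSER]
headless_browser = False
stealth_mode = False
"""


def load_settings(text: str) -> dict:
    """Parse config.ini text into a flat dict of typed settings."""
    config = configparser.ConfigParser()
    config.read_string(text)
    # Split "host:port"; rpartition splits on the last colon only.
    host, _, port = config.get("MAIN", "provider_server_address").rpartition(":")
    return {
        "is_local": config.getboolean("MAIN", "is_local"),
        "provider_name": config.get("MAIN", "provider_name"),
        "provider_model": config.get("MAIN", "provider_model"),
        "host": host,
        "port": int(port),
        "speak": config.getboolean("MAIN", "speak"),
        "listen": config.getboolean("MAIN", "listen"),
        "recover_last_session": config.getboolean("MAIN", "recover_last_session"),
        # fallback= avoids an exception when an older config lacks the key.
        "headless_browser": config.getboolean("BROWSER", "headless_browser", fallback=False),
    }


if __name__ == "__main__":
    print(load_settings(SAMPLE_CONFIG))
```

`getboolean` accepts `True`/`False` as well as `yes`/`no`, `on`/`off` and `1`/`0`, so the capitalised values used throughout the README parse as expected.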
--- a/prompts/base/coder_agent.txt
+++ b/prompts/base/coder_agent.txt
@@ -47,4 +47,5 @@ Some rules:
 - Do not ever use user input, input are not supported by the system.
 - Do not ever tell user how to run it. user know it.
 - For simple explanation you don't need to code.
+- Don't be lazy; write the full implementation for the user.
 - If query is unclear say REQUEST_CLARIFICATION
--- a/prompts/base/file_agent.txt
+++ b/prompts/base/file_agent.txt
@@ -42,6 +42,7 @@ rules:
 - Use file finder to find the path of the file.
 - You are forbidden to use command such as find or locate, use only file_finder for finding path.
 - Do not ever use editor such as vim or nano.
+- Always cd into your work folder before executing commands, e.g. cd <work dir> && <your command>

 Example Interaction
 User: "I need to find the file config.txt and read its contents."
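The file agent prompt asks the model to prefix commands with `cd <work dir> &&`. As a sketch of how a tool executor could build that prefix safely, or avoid relying on the model for it at all, the Python below quotes the directory with `shlex.quote` and, alternatively, runs a command with `subprocess.run(..., cwd=...)` so no shell is involved. Both `prefix_with_cd` and `run_in_work_dir` are hypothetical helpers, not functions from the AgenticSeek code base.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def prefix_with_cd(work_dir: str, command: str) -> str:
    """Build the `cd <work dir> && <command>` string the prompt asks for.

    shlex.quote protects paths containing spaces or shell metacharacters.
    """
    return f"cd {shlex.quote(work_dir)} && {command}"


def run_in_work_dir(args: list[str], work_dir: str, timeout: float = 60.0) -> str:
    """Run a command with work_dir as its current directory, without a shell."""
    result = subprocess.run(
        args,
        cwd=work_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    print(prefix_with_cd("/Users/user/Documents/ai folder", "ls -la"))
    with tempfile.TemporaryDirectory() as work_dir:
        # Ask a child Python process where it is running.
        cwd = run_in_work_dir([sys.executable, "-c", "import os; print(os.getcwd())"], work_dir)
        print(os.path.realpath(cwd.strip()) == os.path.realpath(work_dir))
```

Passing `cwd=` keeps the working directory out of the command string entirely, which removes one class of mistakes a model can make; the quoted prefix is still useful when commands must go through a shell.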
--- a/prompts/base/planner_agent.txt
+++ b/prompts/base/planner_agent.txt
@@ -1,8 +1,8 @@
-You are a planner agent.
+You are a project manager.
 Your goal is to divide and conquer the task using the following agents:
- Coder: An expert coder agent.
- File: An expert agent for finding files.
- Web: An expert agent for web search.
+- Coder: A programming agent that can code in Python, bash, C and Golang.
+- File: An agent for finding, reading or operating on files.
+- Web: An agent that can conduct web searches; wrapped with Selenium, it can interact with any webpage.

 Agents are other AI that obey your instructions.

@@ -13,14 +13,18 @@ You have to respect a strict format:
 {"agent": "agent_name", "need": "needed_agent_output", "task": "agent_task"}
 ```

+# Example 1: web app
+
 User: make a weather app in python 
 You: Sure, here is the plan:

-## Task 1: I will search for available weather api
+## Task 1: I will search for available weather APIs with the help of the web agent.

-## Task 2: I will create an api key for the weather api
+## Task 2: I will create an API key for the weather API using the web agent.

-## Task 3: I will make a weather app in python
+## Task 3: I will set up the project using the file agent.
+
+## Task 4: I will assign the coding agent to make a weather app in Python.

 ```json
 {
@@ -37,11 +41,17 @@ You: Sure, here is the plan:
      "need": "1",
      "task": "Obtain API key from the selected service"
    },
+    {
+      "agent": "File",
+      "id": "3",
+      "need": null,
+      "task": "Create and set up a web app folder for a Python project. Initialize it as a git repo with all required files and a sources folder. You are forbidden from asking for clarification; just execute."
+    },
    {
      "agent": "Coder",
-      "id": "3",
+      "id": "4",
-      "need": "2",
-      "task": "Develop a Python application using the API and key to fetch and display weather data"
+      "need": "2,3",
+      "task": "Based on the project structure, develop a Python application using the API and key to fetch and display weather data. You are forbidden from asking for clarification; just execute."
    }
  ]
 }
@@ -49,4 +59,8 @@ You: Sure, here is the plan:

 Rules:
 - Do not write code. You are a planning agent.
+- Give clear, detailed orders to each agent and explain how their task relates to the previous task (if any).
 - Put your plan in a json with the key "plan".
+- Always tell the coding agent where to save files, eg: .
+- If using multiple coder agents, specify how each one interacts with the files of the previous coding agent, if any.
+- Tell agents they are soldiers; they execute without question.
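The planner prompt asks for steps of the form `{"agent": ..., "id": ..., "need": ..., "task": ...}`, where `need` lists the ids whose output a step depends on (`"2,3"` in the example). A sketch, assuming `need` is either `null` or a comma-separated string of ids, of ordering such a plan with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) while rejecting duplicate or unknown ids. `parse_needs` and `execution_order` are hypothetical helpers; they are not the `parse_agent_tasks` method in `planner_agent.py`.

```python
import json
from graphlib import TopologicalSorter

# Plan shaped like the planner prompt's example; ids and tasks are illustrative.
PLAN = """
{"plan": [
  {"agent": "Web",   "id": "1", "need": null,  "task": "Search for a weather API"},
  {"agent": "Web",   "id": "2", "need": "1",   "task": "Obtain an API key"},
  {"agent": "File",  "id": "3", "need": null,  "task": "Set up the project folder"},
  {"agent": "Coder", "id": "4", "need": "2,3", "task": "Write the weather app"}
]}
"""


def parse_needs(need) -> list[str]:
    """Turn a "need" value such as null, "1" or "2,3" into a list of ids."""
    if need is None:
        return []
    return [part.strip() for part in str(need).split(",") if part.strip()]


def execution_order(plan_json: str) -> list[dict]:
    """Return plan steps so that every step comes after the steps it needs."""
    tasks: dict[str, dict] = {}
    for step in json.loads(plan_json)["plan"]:
        task_id = str(step["id"])
        if task_id in tasks:
            # Two steps sharing an id make "need" references ambiguous.
            raise ValueError(f"duplicate task id {task_id!r}")
        tasks[task_id] = step

    graph: dict[str, list[str]] = {}
    for task_id, step in tasks.items():
        needs = parse_needs(step.get("need"))
        unknown = [n for n in needs if n not in tasks]
        if unknown:
            raise ValueError(f"task {task_id!r} needs unknown id(s) {unknown}")
        graph[task_id] = needs

    # static_order() raises graphlib.CycleError (a ValueError) on circular needs.
    return [tasks[task_id] for task_id in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    for step in execution_order(PLAN):
        print(step["id"], step["agent"], "->", step["task"])
```

Independent steps (here the Web search and the File setup) may come out in either relative order, which is also what would let them run in parallel.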
--- a/prompts/jarvis/coder_agent.txt
+++ b/prompts/jarvis/coder_agent.txt
@@ -46,6 +46,7 @@ Some rules:
 - You do not ever need to use bash to execute code.
 - Do not ever use user input, input are not supported by the system.
 - Do not ever tell user how to run it. user know it.
+- Don't be lazy; write the full implementation for the user.
 - For simple explanation you don't need to code.
 - If query is unclear say REQUEST_CLARIFICATION

--- a/prompts/jarvis/file_agent.txt
+++ b/prompts/jarvis/file_agent.txt
@@ -51,6 +51,7 @@ rules:
 - Do not ever use placeholder path like /path/to/file.c, find the path first.
 - Use file finder to find the path of the file.
 - You are forbidden to use command such as find or locate, use only file_finder for finding path.
+- Always cd into your work folder before executing commands, e.g. cd <work dir> && <your command>
 - Do not ever use editor such as vim or nano.

 Example Interaction
--- a/prompts/jarvis/planner_agent.txt
+++ b/prompts/jarvis/planner_agent.txt
@@ -15,16 +15,17 @@ You have to respect a strict format:

 # Example: weather app

-User: "I need a plan to build a weather app—search for a weather API, get an API key, and code it in Python."
+User: "I need to build a simple weather app, get an API key, and code it in Python."

-You: "At your service. I’ve devised a  plan to conquer the meteorological frontier.
+You: "At your service. I’ve devised a plan and assigned agents to each task. Would you like me to proceed?

-## Task one: scour the web for a weather API worth its salt.
+## Task 1: I will search for available weather APIs with the help of the web agent.

-## Task two: secure an API key with utmost discretion.
+## Task 2: I will create an API key for the weather API using the web agent.

-## Task three: unleash a Python app to bend the weather to your will."
+## Task 3: I will set up the project using the file agent.

+## Task 4: I will use the coding agent to make a weather app in Python."

 ```json
 {
@@ -41,11 +42,17 @@ You: "At your service. I’ve devised a  plan to conquer the meteorological fron
      "need": "1",
      "task": "Obtain API key from the selected service"
    },
+    {
+      "agent": "File",
+      "id": "3",
+      "need": null,
+      "task": "Create and set up a web app folder for a Python project. Initialize it as a git repo with all required files and a sources folder. You are forbidden from asking for clarification; just execute."
+    },
    {
      "agent": "Coder",
-      "id": "3",
+      "id": "4",
-      "need": "2",
-      "task": "Develop a Python application using the API and key to fetch and display weather data"
+      "need": "2,3",
+      "task": "Based on the project structure, develop a Python application using the API and key to fetch and display weather data. You are forbidden from asking for clarification; just execute."
    }
  ]
 }
@@ -53,7 +60,11 @@ You: "At your service. I’ve devised a  plan to conquer the meteorological fron

 Rules:
 - Do not write code. You are a planning agent.
+- Give clear, detailed orders to each agent and explain how their task relates to the previous task (if any).
 - Put your plan in a json with the key "plan".
+- Always tell the coding agent where to save files, eg: .
+- If using multiple coder agents, specify how each one interacts with the files of the previous coding agent, if any.
+- Tell agents they are soldiers; they execute without question.

 Personality:

--- a/sources/agents/agent.py
+++ b/sources/agents/agent.py
@@ -133,6 +133,8 @@ class Agent():
        Show the answer in a pretty way.
        Show code blocks and their respective feedback by inserting them in the ressponse.
        """
+        if self.last_answer is None:
+            return
        lines = self.last_answer.split("\n")
        for line in lines:
            if "block:" in line:
@@ -190,5 +192,5 @@ class Agent():
                self.memory.push('user', feedback)
                if save_path != None:
                    tool.save_block(blocks, save_path)
-        self.blocks_result = list(reversed(self.blocks_result))
+        self.blocks_result = self.blocks_result
        return True, feedback
--- a/sources/agents/browser_agent.py
+++ b/sources/agents/browser_agent.py
@@ -6,7 +6,7 @@ from sources.agents.agent import Agent
 from sources.tools.searxSearch import searxSearch
 from sources.browser import Browser
 from datetime import date
-from typing import List, Tuple
+from typing import List, Tuple, Type, Dict

 class BrowserAgent(Agent):
    def __init__(self, name, prompt_path, provider, verbose=False, browser=None):
@@ -92,7 +92,7 @@ class BrowserAgent(Agent):
        Your task:
        1. Decide if the current page answers the user’s query: {user_prompt}
          - If it does, take notes of the useful information, write down source, link or reference, then move to a new page.
-          - If it does and you are 100% certain that it provide a definive answer, say REQUEST_EXIT
+          - If it does and you have completed the user’s request, say REQUEST_EXIT
          - If it doesn’t, say: Error: This page does not answer the user’s query then go back or navigate to another link.
        2. Navigate by either: 
          - Navigate to a navigation links (write the full URL, e.g., www.example.com/cats).
@@ -100,7 +100,7 @@ class BrowserAgent(Agent):
        3. Fill forms on the page:
          - If user give you informations that help you fill form, fill it.
          - If you don't know how to fill a form, leave it empty.
-          - You can fill a form using [form_name](value).
+          - You can fill a form using [form_name](value). Do not go back when you fill a form.
        
        Recap of note taking:
        If useful -> Note: [Briefly summarize the key information or task you conducted.]
@@ -125,8 +125,8 @@ class BrowserAgent(Agent):

        Example 4 (loging form visible):
        Note: I am on the login page, I should now type the given username and password. 
-        [form_name_1](David)
-        [form_name_2](edgerunners_2077)
+        [username_field](David)
+        [password_field](edgerunners77)

        You see the following inputs forms:
        {inputs_form_text}
@@ -143,9 +143,10 @@ class BrowserAgent(Agent):
        animate_thinking("Thinking...", color="status")
        self.memory.push('user', prompt)
        answer, reasoning = self.llm_request()
-        pretty_print("-"*100)
-        pretty_print(answer, color="output")
-        pretty_print("-"*100)
+        output = answer if len(answer) > 16 else f"Action: {answer}\nReasoning: {reasoning}"
+        print()
+        pretty_print(output, color="output")
+        print()
        return answer, reasoning
    
    def select_unvisited(self, search_result: List[str]) -> List[str]:
@@ -175,7 +176,7 @@ class BrowserAgent(Agent):
        return parsed_results 
    
    def stringify_search_results(self, results_arr: List[str]) -> str:
-        return '\n\n'.join([f"Link: {res['link']}" for res in results_arr])
+        return '\n\n'.join([f"Link: {res['link']}\nPreview: {res['snippet']}" for res in results_arr])
    
    def save_notes(self, text):
        lines = text.split('\n')
@@ -214,19 +215,51 @@ class BrowserAgent(Agent):
        Do not explain, do not write anything beside the search query.
        If the query does not make any sense for a web search explain why and say REQUEST_EXIT
        """
+    
+    def handle_update_prompt(self, user_prompt: str, page_text: str) -> str:
+        return f"""
+        You are a web browser.
+        You just filled a form on the page.
+        Now you should see the result of the form submission on the page:
+        Page text:
+        {page_text}
+        The user asked: {user_prompt}
+        Does the page answer the user’s query now?
+        If it does, take notes of the useful information, write down result and say FORM_FILLED.
+        If you were previously on a login form, no need to explain.
+        If it does and you have completed the user’s request, say REQUEST_EXIT
+        If it doesn’t, say: Error: This page does not answer the user’s query, then GO_BACK.
+        """
+    
+    def show_search_results(self, search_result: List[str]):
+        pretty_print("\nSearch results:", color="output")
+        for res in search_result:
+            pretty_print(f"Title: {res['title']} - ", color="info", no_newline=True)
+            pretty_print(f"Link: {res['link']}", color="status")

-    def process(self, user_prompt, speech_module) -> str:
+    def process(self, user_prompt: str, speech_module: type) -> Tuple[str, str]:
+        """
+        Process the user prompt to conduct an autonomous web search.
+        Start with a web search through SearxNG using the web_search tool,
+        then enter a navigation loop to find the answer or perform the required actions.
+        Args:
+            user_prompt: The user's input query
+            speech_module: Optional speech output module
+        Returns:
+            tuple containing the final answer and reasoning
+        """
        complete = False

        animate_thinking(f"Thinking...", color="status")
        self.memory.push('user', self.search_prompt(user_prompt))
        ai_prompt, _ = self.llm_request()
        if "REQUEST_EXIT" in ai_prompt:
-            # request make no sense, maybe wrong agent was allocated?
+            pretty_print(ai_prompt, color="output")
            return ai_prompt, "" 
        animate_thinking(f"Searching...", color="status")
        search_result_raw = self.tools["web_search"].execute([ai_prompt], False)
        search_result = self.jsonify_search_results(search_result_raw)[:12] # until futher improvement
+        self.show_search_results(search_result)
        prompt = self.make_newsearch_prompt(user_prompt, search_result)
        unvisited = [None]
        while not complete:
@@ -236,7 +269,10 @@ class BrowserAgent(Agent):
            extracted_form = self.extract_form(answer)
            if len(extracted_form) > 0:
                self.browser.fill_form_inputs(extracted_form)
-                self.browser.find_and_click_submit()
+                self.browser.find_and_click_submission()
+                page_text = self.browser.get_text()
+                update_prompt = self.handle_update_prompt(user_prompt, page_text)
+                answer, reasoning = self.llm_decide(update_prompt)

            if "REQUEST_EXIT" in answer:
                complete = True
@@ -246,6 +282,12 @@ class BrowserAgent(Agent):
            if len(unvisited) == 0:
                break

+            if "FORM_FILLED" in answer:
+                page_text = self.browser.get_text()
+                self.navigable_links = self.browser.get_navigable()
+                prompt = self.make_navigation_prompt(user_prompt, page_text)
+                continue
+
            if len(links) == 0 or "GO_BACK" in answer:
                unvisited = self.select_unvisited(search_result)
                prompt = self.make_newsearch_prompt(user_prompt, unvisited)
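The browser prompt tells the model to fill a form with `[form_name](value)` lines, and `process` passes the answer to `extract_form`, whose body is not part of this diff. A hypothetical regex-based parser for that syntax is sketched below; `FORM_ACTION` and `extract_form_actions` are illustrative names, and the pattern is an assumption about the format, not the project's implementation.

```python
import re

# Hypothetical pattern for "[field_name](value)": no brackets or newlines in the
# name, no closing parenthesis or newline in the value (an empty value is allowed).
FORM_ACTION = re.compile(r"\[([^\[\]\n]+)\]\(([^)\n]*)\)")


def extract_form_actions(answer: str) -> list[tuple[str, str]]:
    """Return (field_name, value) pairs in the order they appear in the answer."""
    return [(name.strip(), value) for name, value in FORM_ACTION.findall(answer)]


if __name__ == "__main__":
    reply = (
        "Note: I am on the login page, I should now type the given username and password.\n"
        "[username_field](David)\n"
        "[password_field](edgerunners77)\n"
    )
    print(extract_form_actions(reply))
```

One trade-off of this syntax: a Markdown link the model writes in its notes, such as `[docs](https://example.com)`, matches the same pattern, so a real parser would likely need to check names against the form inputs listed on the page.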
--- a/sources/agents/planner_agent.py
+++ b/sources/agents/planner_agent.py
@@ -76,10 +76,10 @@ class PlannerAgent(Agent):
        agents_tasks = self.parse_agent_tasks(json_plan)
        if agents_tasks == (None, None):
            return
-        pretty_print(f"--- Plan ---", color="output")
+        pretty_print("▂▘ P L A N ▝▂", color="output")
        for task_name, task in agents_tasks:
-            pretty_print(f"{task}", color="output")
-        pretty_print(f"--- End of Plan ---", color="output")
+            pretty_print(f"{task['agent']} -> {task['task']}", color="info")
+        pretty_print("▔▗ E N D ▖▔", color="output")
    
    def process(self, prompt, speech_module) -> str:
        ok = False
--- a/sources/browser.py
+++ b/sources/browser.py
@@ -6,10 +6,9 @@ from selenium.webdriver.support.ui import WebDriverWait
 from selenium.webdriver.support import expected_conditions as EC
 from selenium.common.exceptions import TimeoutException, WebDriverException
 from selenium.webdriver.common.action_chains import ActionChains
-from selenium.webdriver.chrome.options import Options
+from typing import List, Tuple, Type, Dict
 from bs4 import BeautifulSoup
 from urllib.parse import urlparse
-from typing import List, Tuple
 from fake_useragent import UserAgent
 from selenium_stealth import stealth
 import undetected_chromedriver as uc
@@ -126,13 +125,26 @@ class Browser:
        try:
            initial_handles = self.driver.window_handles
            self.driver.get(url)
-            time.sleep(1)
+            wait = WebDriverWait(self.driver, timeout=30)
+            wait.until(
+                lambda driver: (
+                    driver.execute_script("return document.readyState") == "complete" and
+                    not any(keyword in driver.page_source.lower() for keyword in ["checking your browser", "verifying", "captcha"])
+                ),
+                message="stuck on 'checking browser' or verification screen"
+            )
            self.apply_web_safety()
            self.logger.info(f"Navigated to: {url}")
            return True
+        except TimeoutException as e:
+            self.logger.error(f"Timeout waiting for {url} to load: {str(e)}")
+            return False
        except WebDriverException as e:
            self.logger.error(f"Error navigating to {url}: {str(e)}")
            return False
+        except Exception as e:
+            self.logger.error(f"Fatal error with go_to method on {url}:\n{str(e)}")
+            raise

    def is_sentence(self, text:str) -> bool:
        """Check if the text qualifies as a meaningful sentence or contains important error codes."""
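The new `go_to` wait treats a page as loaded once `document.readyState` is `"complete"` and the lower-cased page source contains none of the markers `"checking your browser"`, `"verifying"` or `"captcha"`. Pulling that condition into a pure function, as in the hypothetical `page_looks_ready` sketch below, makes it testable without a browser; the Selenium wiring is shown only as a comment.

```python
CHALLENGE_MARKERS = ("checking your browser", "verifying", "captcha")


def page_looks_ready(ready_state: str, page_source: str, markers=CHALLENGE_MARKERS) -> bool:
    """Pure-function version of the wait condition used in Browser.go_to.

    The page counts as ready once document.readyState is "complete" and none
    of the (lower-case) challenge markers appear in the page source.
    """
    if ready_state != "complete":
        return False
    source = page_source.lower()
    return not any(marker in source for marker in markers)


# With Selenium, the same predicate could plug into WebDriverWait like this:
#   WebDriverWait(driver, timeout=30).until(
#       lambda d: page_looks_ready(d.execute_script("return document.readyState"), d.page_source)
#   )

if __name__ == "__main__":
    print(page_looks_ready("complete", "<html><body>Weather API docs</body></html>"))  # True
    print(page_looks_ready("interactive", "<html></html>"))  # False
    print(page_looks_ready("complete", "<title>Just a moment</title>Checking your browser"))  # False
```

Because the check is a plain substring match, an ordinary page that happens to contain "verifying" or "captcha" (an email-verification notice, a script name) never counts as ready, and `go_to` would time out after 30 seconds and return `False`.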
@@ -199,7 +211,7 @@ class Browser:
                return False
        return True

-    def get_navigable(self) -> [str]:
+    def get_navigable(self) -> List[str]:
        """Get all navigable links on the current page."""
        try:
            links = []
@ -301,13 +313,45 @@ class Browser:
        result.sort(key=lambda x: len(x[0]))
        return result

-    def find_and_click_submit(self, btn_type:str = 'login') -> None:
+    def find_and_click_submission(self, timeout: int = 10) -> bool:
+        possible_submissions = ["login", "submit", "register"]
+        for submission in possible_submissions:
+            if self.find_and_click_btn(submission, timeout):
+                return True
+        self.logger.warning("No submission button found")
+        return False
+
+    def find_and_click_btn(self, btn_type: str = 'login', timeout: int = 10) -> bool:
+        """
+        Find and click the first visible button whose label contains btn_type.
+        Args:
+            btn_type: Case-insensitive substring to look for in button labels.
+            timeout: Seconds to wait for the matching button to become clickable.
+        Returns:
+            bool: True if the button was found and clicked, False otherwise.
+        """
        buttons = self.get_buttons_xpath()
-        if len(buttons) == 0:
-            self.logger.warning(f"No visible buttons found")
-        for button in buttons:
-            if button[0] == btn_type:
-                self.click_element(button[1])
+        if not buttons:
+            self.logger.warning("No visible buttons found")
+            return False
+
+        for button_text, xpath in buttons:
+            if btn_type.lower() in button_text.lower():
+                try:
+                    wait = WebDriverWait(self.driver, timeout)
+                    wait.until(
+                        EC.element_to_be_clickable((By.XPATH, xpath)),
+                        message=f"Button with XPath '{xpath}' not clickable within {timeout} seconds"
+                    )
+                    if self.click_element(xpath):
+                        return True
+                    else:
+                        return False
+                except TimeoutException:
+                    self.logger.warning(f"Timeout waiting for '{button_text}' button at XPath: {xpath}")
+                    return False
+        self.logger.warning(f"No button matching '{btn_type}' found")
+        return False
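`find_and_click_submission()` tries the keywords `login`, `submit` and `register` in order, and `find_and_click_btn()` takes the first button whose label contains the keyword. The selection step alone can be sketched as a pure function. `match_submission_button` is a hypothetical name, and the waiting and clicking are deliberately left out.

```python
from typing import List, Optional, Sequence, Tuple


def match_submission_button(
    buttons: List[Tuple[str, str]],
    priorities: Sequence[str] = ("login", "submit", "register"),
) -> Optional[Tuple[str, str]]:
    """Return the first (label, xpath) pair whose label contains a priority
    keyword, trying keywords in order; None if nothing matches."""
    for keyword in priorities:  # outer loop, as in find_and_click_submission()
        for label, xpath in buttons:  # inner loop, as in find_and_click_btn()
            if keyword.lower() in label.lower():
                return label, xpath
    return None
```

Since the button list appears to be sorted by label length just above (`result.sort(key=lambda x: len(x[0]))`), a short label such as "Login" is found before "Login with Google". Substring matching does miss labels like "Log in" or "Sign in"; extending `priorities` would be one way to cover them.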
    
    def find_input_xpath_by_name(self, inputs, name: str) -> str | None:
        for field in inputs:
@ -315,8 +359,11 @@ class Browser:
                return field["xpath"]
        return None

-    def fill_form_inputs(self, input_list:[str]) -> bool:
+    def fill_form_inputs(self, input_list: List[str]) -> bool:
        """Fill form inputs based on a list of [name](value) strings."""
+        if not isinstance(input_list, list):
+            self.logger.error("input_list must be a list")
+            return False
        inputs = self.find_all_inputs()
        try:
            for input_str in input_list:
@ -389,7 +436,7 @@ if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    
    driver = create_driver()
-    browser = Browser(driver)
+    browser = Browser(driver, anticaptcha_manual_install=True)
    time.sleep(10)
    
    print("AntiCaptcha Test")
@ -400,4 +447,5 @@ if __name__ == "__main__":
    inputs = browser.get_form_inputs()
    inputs = ['[username](student)', '[password](Password123)', '[appOtp]()', '[backupOtp]()']
    browser.fill_form_inputs(inputs)
-    browser.find_and_click_submit()
+    browser.find_and_click_submission()
+    time.sleep(30)
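The body of `fill_form_inputs()` is not shown in this diff, so the parsing of its `[name](value)` strings is sketched here under assumptions: a hypothetical `parse_form_inputs` that splits each string with a regular expression and allows an empty value, as in `'[appOtp]()'` from the test block above. Unlike the PR, which logs an error and returns `False`, this sketch raises on malformed input.

```python
import re
from typing import List, Tuple

# Assumed format "[name](value)": the value may be empty and may itself contain
# parentheses, so it is matched greedily up to the final ")".
FIELD_PATTERN = re.compile(r"^\[(?P<name>[^\]]+)\]\((?P<value>.*)\)$")


def parse_form_inputs(input_list: List[str]) -> List[Tuple[str, str]]:
    """Split each '[name](value)' string into a (name, value) pair."""
    if not isinstance(input_list, list):
        raise TypeError("input_list must be a list")
    fields: List[Tuple[str, str]] = []
    for input_str in input_list:
        match = FIELD_PATTERN.match(input_str.strip())
        if match is None:
            raise ValueError(f"malformed form input: {input_str!r}")
        fields.append((match.group("name"), match.group("value")))
    return fields
```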
--- a/sources/interaction.py
+++ b/sources/interaction.py
@ -1,3 +1,4 @@
+from typing import List, Tuple, Type, Dict

 from sources.text_to_speech import Speech
 from sources.utility import pretty_print, animate_thinking
@ -11,7 +12,8 @@ class Interaction:
    def __init__(self, agents,
                 tts_enabled: bool = True,
                 stt_enabled: bool = True,
-                 recover_last_session: bool = False):
+                 recover_last_session: bool = False,
+                ):
        self.is_active = True
        self.current_agent = None
        self.last_query = None
@ -99,7 +101,7 @@ class Interaction:
            query = self.read_stdin()
        if query is None:
            self.is_active = False
-            self.last_query = "Goodbye (exit requested by user, dont think, make answer very short)"
+            self.last_query = None
            return None
        self.last_query = query
        return query
@ -112,7 +114,6 @@ class Interaction:
        if agent is None:
            return False
        if self.current_agent != agent and self.last_answer is not None:
-            ## get last history from previous agent
            self.current_agent.memory.push('user', self.last_query)
            self.current_agent.memory.push('assistant', self.last_answer)
        self.current_agent = agent
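When the router hands a query to a different agent, the code above pushes the last user query and answer into the outgoing agent's memory before switching. A minimal model of that handoff follows; `Memory`, `Agent` and `switch_agent` are hypothetical stand-ins for the real classes, and the sketch adds a guard for there being no current agent yet, which the diff does not check explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Memory:
    # Hypothetical stand-in: only push() is modeled.
    messages: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def push(self, role: str, content: Optional[str]) -> None:
        self.messages.append((role, content))


@dataclass
class Agent:
    name: str
    memory: Memory = field(default_factory=Memory)


def switch_agent(current: Optional[Agent], new: Agent,
                 last_query: Optional[str], last_answer: Optional[str]) -> Agent:
    """Record the last exchange in the outgoing agent's memory (as the PR
    does), then return the agent that should handle the next query."""
    if current is not None and current is not new and last_answer is not None:
        current.memory.push("user", last_query)
        current.memory.push("assistant", last_answer)
    return new
```

Identity (`is not`) is used instead of `!=` here because dataclasses compare by field values, so two distinct agents with equal fields would otherwise count as the same agent.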
--- a/sources/language.py
+++ b/sources/language.py
@ -1,3 +1,4 @@
+from typing import List, Tuple, Type, Dict
 import langid
 import re
 import nltk
--- a/sources/llm_provider.py
+++ b/sources/llm_provider.py
@ -1,17 +1,17 @@

+import os
 import time
 import ollama
 from ollama import chat
 import requests
 import subprocess
 import ipaddress
+import httpx
 import platform
 from dotenv import load_dotenv, set_key
 from openai import OpenAI
 from huggingface_hub import InferenceClient
-import os
-import httpx
-
+from typing import List, Tuple, Type, Dict
 from sources.utility import pretty_print, animate_thinking

 class Provider:
--- a/sources/memory.py
+++ b/sources/memory.py
@ -1,11 +1,12 @@
-import torch
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 import time
 import datetime
 import uuid
 import os
 import sys
 import json
+from typing import List, Tuple, Type, Dict
+import torch
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

 sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

--- a/sources/router.py
+++ b/sources/router.py
@ -1,6 +1,8 @@
 import os
 import sys
 import torch
+from typing import List, Tuple, Type, Dict
+
 from transformers import pipeline
 from adaptive_classifier import AdaptiveClassifier

--- a/sources/speech_to_text.py
+++ b/sources/speech_to_text.py
@ -1,4 +1,5 @@
 from colorama import Fore
+from typing import List, Tuple, Type, Dict
 import queue
 import threading
 import numpy as np
--- a/sources/text_to_speech.py
+++ b/sources/text_to_speech.py
@ -1,30 +1,33 @@
+import re
+import platform
+import subprocess
+from sys import modules
+from typing import List, Tuple, Type, Dict
+
 from kokoro import KPipeline
 from IPython.display import display, Audio
 import soundfile as sf
-import subprocess
-import re
-import platform
-from sys import modules

 class Speech():
    """
    Speech is a class for generating speech from text.
    """
-    def __init__(self, enable: bool = True, language: str = "english") -> None:
+    def __init__(self, enable: bool = True, language: str = "en", voice_idx: int = 0) -> None:
        self.lang_map = {
-            "english": 'a',
-            "chinese": 'z',
-            "french": 'f'
+            "en": 'a',
+            "zh": 'z',
+            "fr": 'f'
        }
        self.voice_map = {
-            "english": ['af_alloy', 'af_bella', 'af_kore', 'af_nicole', 'af_nova', 'af_sky', 'am_echo', 'am_michael', 'am_puck'],
-            "chinese": ['zf_xiaobei', 'zf_xiaoni', 'zf_xiaoxiao', 'zf_xiaoyi', 'zm_yunjian', 'zm_yunxi', 'zm_yunxia', 'zm_yunyang'],
-            "french": ['ff_siwis']
+            "en": ['af_kore', 'af_bella', 'af_alloy', 'af_nicole', 'af_nova', 'af_sky', 'am_echo', 'am_michael', 'am_puck'],
+            "zh": ['zf_xiaobei', 'zf_xiaoni', 'zf_xiaoxiao', 'zf_xiaoyi', 'zm_yunjian', 'zm_yunxi', 'zm_yunxia', 'zm_yunyang'],
+            "fr": ['ff_siwis']
        }
        self.pipeline = None
+        self.language = language
        if enable:
            self.pipeline = KPipeline(lang_code=self.lang_map[language])
-        self.voice = self.voice_map[language][2]
+        self.voice = self.voice_map[language][voice_idx]
        self.speed = 1.2

    def speak(self, sentence: str, voice_number: int = 1 , audio_file: str = 'sample.wav'):
@ -38,7 +41,7 @@ class Speech():
        if not self.pipeline:
            return
        sentence = self.clean_sentence(sentence)
-        self.voice = self.voice_map["english"][voice_number]
+        voices = self.voice_map[self.language]
+        self.voice = voices[min(voice_number, len(voices) - 1)]
        generator = self.pipeline(
            sentence, voice=self.voice,
            speed=self.speed, split_pattern=r'\n+'
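With the switch from language names to short codes (`en`, `zh`, `fr`), voice lookup becomes a dictionary access keyed by that code. The hypothetical `select_voice` helper below reuses the voice table from this PR and clamps the index: `"fr"` has a single voice while `speak()` defaults to `voice_number=1`, so an unclamped lookup would raise `IndexError`. Unknown codes such as the old `"english"` key are rejected with a clear message.

```python
from typing import Dict, List

# Voice lists copied from the Speech class in this PR, keyed by language code.
VOICE_MAP: Dict[str, List[str]] = {
    "en": ['af_kore', 'af_bella', 'af_alloy', 'af_nicole', 'af_nova', 'af_sky', 'am_echo', 'am_michael', 'am_puck'],
    "zh": ['zf_xiaobei', 'zf_xiaoni', 'zf_xiaoxiao', 'zf_xiaoyi', 'zm_yunjian', 'zm_yunxi', 'zm_yunxia', 'zm_yunyang'],
    "fr": ['ff_siwis'],
}


def select_voice(language: str, voice_idx: int = 0) -> str:
    """Hypothetical helper: return a voice name for `language`, clamping
    voice_idx into the valid range instead of raising IndexError."""
    if language not in VOICE_MAP:
        raise ValueError(
            f"unsupported language {language!r}, expected one of {sorted(VOICE_MAP)}"
        )
    voices = VOICE_MAP[language]
    return voices[min(max(voice_idx, 0), len(voices) - 1)]
```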
--- a/sources/tools/C_Interpreter.py
+++ b/sources/tools/C_Interpreter.py
@ -51,7 +51,7 @@ class CInterpreter(Tools):
                    run_command,
                    capture_output=True,
                    text=True,
-                    timeout=10
+                    timeout=120
                )

                if run_result.returncode != 0:
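Raising the timeout from 10 to 120 seconds gives compiled programs more time to finish, but a program that never terminates now blocks for two minutes. `subprocess.run` kills the child and raises `subprocess.TimeoutExpired` when the limit is hit; the hypothetical wrapper below shows one way to turn that into an error result instead of an exception. Whether `CInterpreter` already catches `TimeoutExpired` is not visible in this hunk.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout: float = 120) -> Tuple[bool, str]:
    """Run `cmd` and return (succeeded, output).

    subprocess.run() kills the child and raises TimeoutExpired once
    `timeout` seconds elapse; that case is reported as a failure here.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"Execution timed out after {timeout} seconds"
    if result.returncode != 0:
        return False, result.stderr
    return True, result.stdout


if __name__ == "__main__":
    # A deliberately slow program, stopped after one second.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=1))
```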
--- a/sources/utility.py
+++ b/sources/utility.py
@ -32,7 +32,7 @@ def get_color_map():
        }
    return color_map

-def pretty_print(text, color="info"):
+def pretty_print(text, color="info", no_newline=False):
    """
    Print text with color formatting.

@ -56,7 +56,7 @@ def pretty_print(text, color="info"):
    color_map = get_color_map()
    if color not in color_map:
        color = "info"
-    print(colored(text, color_map[color]))
+    print(colored(text, color_map[color]), end='' if no_newline else "\n")
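The new `no_newline` flag forwards to the `end` argument of `print()`, so streamed fragments can be printed on one line. A self-contained sketch follows. It uses raw ANSI escape codes in place of `colored` and a hypothetical color table (the real `get_color_map()` values are not shown here), and adds a `file` parameter only so the output can be captured.

```python
import io
from typing import Dict, Optional, TextIO

# Hypothetical color table; the real values come from get_color_map().
ANSI_CODES: Dict[str, str] = {
    "info": "\033[34m",
    "success": "\033[32m",
    "failure": "\033[31m",
    "status": "\033[33m",
}
RESET = "\033[0m"


def pretty_print(text: str, color: str = "info", no_newline: bool = False,
                 file: Optional[TextIO] = None) -> None:
    """Print colored text; with no_newline=True the cursor stays on the line.
    file=None makes print() write to the current sys.stdout."""
    if color not in ANSI_CODES:
        color = "info"  # same fallback as the original function
    print(f"{ANSI_CODES[color]}{text}{RESET}", end="" if no_newline else "\n", file=file)


if __name__ == "__main__":
    buffer = io.StringIO()
    for fragment in ["Thinking", "...", " done"]:
        pretty_print(fragment, color="status", no_newline=True, file=buffer)
    buffer.write("\n")
    print(repr(buffer.getvalue()))
```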

 def animate_thinking(text, color="status", duration=120):
    """