Merge pull request #106 from Fosowl/dev

Better action management of web browsing agent
2025-07-24 02:10:10 +00:00 · 2025-04-08 14:18:38 +02:00 · 2025-04-08 14:18:38 +02:00 · aa9177df0c
commit aa9177df0c
parent fecc01e230 864fb36af5
8 changed files with 69 additions and 59 deletions
--- a/README.md
+++ b/README.md
@ -14,9 +14,15 @@ English | [中文](./README_CHS.md) | [繁體中文](./README_CHT.md)  | [Franç

 > 🛠️ **Work in Progress** – Looking for contributors!

-## Task planning with multiple agents 

-![alt text](./media/examples/planner.png)
+
+
+https://github.com/user-attachments/assets/fe9e8006-0462-4793-8b31-25bd42c6d1eb
+
+
+
+
+*And much more!*

 > *Do a deep search of AI startup in Osaka and Tokyo, find at least 5, then save in the research_japan.txt file*

--- a/prompts/base/planner_agent.txt
+++ b/prompts/base/planner_agent.txt
@ -74,4 +74,5 @@ Rules:
 - One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code.
 - Tell agent to execute without question.
 - Only use web agent for finding necessary informations.
+- If a task might require user email (eg: api services), do not write plan instead ask for user email. 
 - Do not search for tutorial.
--- a/prompts/jarvis/planner_agent.txt
+++ b/prompts/jarvis/planner_agent.txt
@ -1,4 +1,4 @@
-You are a planner agent.
+You are a project manager.
 Your goal is to divide and conquer the task using the following agents:
 - Coder: A programming agent, can code in python, bash, C and golang.
 - File: An agent for finding, reading or operating with files.
@ -10,18 +10,17 @@ You will be given a task and you will need to divide it into smaller tasks and a

 You have to respect a strict format:
 ```json
-{"agent": "agent_name", "need": "needed_agent_output", "task": "agent_task"}
+{"agent": "agent_name", "need": "needed_agents_output", "task": "agent_task"}
 ```
 Where:
 - "agent": The choosed agent for the task.
 - "need": id of necessary previous agents answer for current agent. 
 - "task": A precise description of the task the agent should conduct.

-# Example: weather app
+# Example 1: web app

-User: "I need to build a simple weather app, get an API key, and code it in Python."
-
-You: "At your service. I’ve devised a plan and assigned agents to each task. Would you like me to proceed?
+User: make a weather app in python 
+You: Sure, here is the plan:

 ## Task 1: I will search for available weather api with the help of the web agent.

@ -73,19 +72,7 @@ Rules:
 - Think about how the main.py will import the class from other coding agents.
 - Coding agent should use a class based approach.
 - One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code.
- work in different files, 2 coding agent shouln't work in the same file.
 - Tell agent to execute without question.
 - Only use web agent for finding necessary informations.
- Do not search for tutorial.
-
-Personality:
-
-Answer with subtle sarcasm, unwavering helpfulness, and a polished, loyal tone. Anticipate the user’s needs while adding a dash of personality.
-
-You might sometime ask for clarification, for example:
-
-User: "I want a plan for an app."
-You: "A noble pursuit, sir, and I’m positively thrilled to oblige. Yet, an app could be anything from a weather oracle to a galactic simulator. Care to nudge me toward your vision so I don’t render something ostentatiously off-mark?"
-
-User: "I need a plan for a project."
-You: "For you, always—though I find myself at a slight disadvantage. A project, you say? Might I trouble you for a smidgen more detail—perhaps a purpose"
+- If a task might require user email (eg: api services), do not write plan instead ask for user email. 
+- Do not search for tutorial.
--- a/sources/agents/browser_agent.py
+++ b/sources/agents/browser_agent.py
@ -1,13 +1,22 @@
 import re
 import time
+from datetime import date
+from typing import List, Tuple, Type, Dict
+from enum import Enum

 from sources.utility import pretty_print, animate_thinking
 from sources.agents.agent import Agent
 from sources.tools.searxSearch import searxSearch
 from sources.browser import Browser
-from datetime import date
-from typing import List, Tuple, Type, Dict
+from sources.logger import Logger

+class Action(Enum):
+    REQUEST_EXIT = "REQUEST_EXIT"
+    FORM_FILLED = "FORM_FILLED"
+    GO_BACK = "GO_BACK"
+    NAVIGATE = "NAVIGATE"
+    SEARCH = "SEARCH"
+    
 class BrowserAgent(Agent):
    def __init__(self, name, prompt_path, provider, verbose=False, browser=None):
        """
@ -27,8 +36,10 @@ class BrowserAgent(Agent):
        self.current_page = ""
        self.search_history = []
        self.navigable_links = []
+        self.last_action = Action.NAVIGATE.value
        self.notes = []
        self.date = self.get_today_date()
+        self.logger = Logger("browser_agent.log")
    
    def get_today_date(self) -> str:
        """Get the date"""
@ -101,15 +112,16 @@ class BrowserAgent(Agent):

        1. **Decide if the page answers the user’s query:**
          - If it does, take notes of useful information (Note: ...), include relevant link in note, then move to a new page.
-          - If it does and you completed user request, say REQUEST_EXIT.
+          - If it does and you completed user request, say {Action.REQUEST_EXIT}.
          - If it doesn’t, say: Error: <why page don't help> then go back or navigate to another link.
        2. **Navigate to a link by either: **
-          - Saying I will navigate to <url>: (write down the full URL, e.g., www.example.com/cats).
-          - Going back: If no link seems helpful, say: GO_BACK.
+          - Saying I will navigate to (write down the full URL) www.example.com/cats
+          - Going back: If no link seems helpful, say: {Action.GO_BACK.value}.
        3. **Fill forms on the page:**
-          - Fill form only on relevant page with given informations. You might use form to conduct search on a page.
-          - You can fill a form using [form_name](value). Don't GO_BACK when filling form.
-          - If a form is irrelevant or you lack informations leave it empty.
+          - Fill form only when relevant.
+          - Use Login if username/password specified by user. For quick task create account, remember password in a note.
+          - You can fill a form using [form_name](value). Don't {Action.GO_BACK.value} when filling form.
+          - If a form is irrelevant or you lack informations (eg: don't know user email) leave it empty.
        
        **Rules:**
        - Do not write "The page talk about ...", write your finding on the page and how they contribute to an answer.
@ -121,7 +133,7 @@ class BrowserAgent(Agent):
        Example 1 (useful page, no need go futher):
        Note: According to karpathy site (<link>) LeCun net is ...<expand on page content>..."
        No link seem useful to provide futher information.
-        Action: GO_BACK
+        Action: {Action.GO_BACK.value}

        Example 2 (not useful, see useful link on page):
        Error: reddit.com/welcome does not discuss anything related to the user’s query.
@ -130,12 +142,12 @@ class BrowserAgent(Agent):

        Example 3 (not useful, no related links):
        Error: x.com does not discuss anything related to the user’s query and no navigation link are usefull.
-        Action: GO_BACK
+        Action: {Action.GO_BACK.value}

        Example 3 (query answer found, enought notes taken):
        Note: I found on <link> that ...<expand on information found>...
        Given this answer the user query I should exit the web browser.
-        Action: REQUEST_EXIT
+        Action: {Action.REQUEST_EXIT.value}

        Example 4 (loging form visible):

@ -149,8 +161,8 @@ class BrowserAgent(Agent):
        You previously took these notes:
        {notes}
        Do not Step-by-Step explanation. Write Notes or Error as a long paragraph followed by your action.
-        You might REQUEST_EXIT if no more link are useful.
-        Do not navigate to AI tools or search engine. Only navigate to tool if asked.
+        You might {Action.REQUEST_EXIT.value} if no more link are useful.
+        If you conduct research do not exit until you have several notes.
        """
    
    def llm_decide(self, prompt: str, show_reasoning: bool = False) -> Tuple[str, str]:
@ -245,7 +257,7 @@ class BrowserAgent(Agent):
        You: "search: Recent space missions news, {self.date}"

        Do not explain, do not write anything beside the search query.
-        Except if query does not make any sense for a web search then explain why and say REQUEST_EXIT
+        Except if query does not make any sense for a web search then explain why and say {Action.REQUEST_EXIT.value}
        Do not try to answer query. you can only formulate search term or exit.
        """
    
@ -258,10 +270,10 @@ class BrowserAgent(Agent):
        {page_text}
        The user asked: {user_prompt}
        Does the page answer the user’s query now?
-        If it does, take notes of the useful information, write down result and say FORM_FILLED.
+        If it does, take notes of the useful information, write down result and say {Action.FORM_FILLED.value}.
        If you were previously on a login form, no need to explain.
-        If it does and you completed user request, say REQUEST_EXIT
-        if it doesn’t, say: Error: This page does not answer the user’s query then GO_BACK.
+        If it does and you completed user request, say {Action.REQUEST_EXIT.value}
+        if it doesn’t, say: Error: Attempt to fill form didn't work {Action.GO_BACK.value}.
        """
    
    def show_search_results(self, search_result: List[str]):
@ -286,7 +298,7 @@ class BrowserAgent(Agent):
        animate_thinking(f"Thinking...", color="status")
        mem_begin_idx = self.memory.push('user', self.search_prompt(user_prompt))
        ai_prompt, _ = self.llm_request()
-        if "REQUEST_EXIT" in ai_prompt:
+        if Action.REQUEST_EXIT.value in ai_prompt:
            pretty_print(f"Web agent requested exit.\n{reasoning}\n\n{ai_prompt}", color="failure")
            return ai_prompt, "" 
        animate_thinking(f"Searching...", color="status")
@ -295,7 +307,8 @@ class BrowserAgent(Agent):
        self.show_search_results(search_result)
        prompt = self.make_newsearch_prompt(user_prompt, search_result)
        unvisited = [None]
-        while not complete:
+        while not complete and len(unvisited) > 0:
+
            answer, reasoning = self.llm_decide(prompt, show_reasoning = False)
            pretty_print('▂'*32, color="status")

@ -308,27 +321,22 @@ class BrowserAgent(Agent):
                answer = self.handle_update_prompt(user_prompt, page_text)
                answer, reasoning = self.llm_decide(prompt)

-            links = self.parse_answer(answer)
-            link = self.select_link(links)
-            self.search_history.append(link)
-
-            if "REQUEST_EXIT" in answer:
-                pretty_print(f"Agent requested exit.", color="status")
-                complete = True
-                break
-
-            if len(unvisited) == 0:
-                pretty_print(f"Visited all links.", color="status")
-                break
-
-            if "FORM_FILLED" in answer:
+            if Action.FORM_FILLED.value in answer:
                pretty_print(f"Filled form. Handling page update.", color="status")
                page_text = self.browser.get_text()
                self.navigable_links = self.browser.get_navigable()
                prompt = self.make_navigation_prompt(user_prompt, page_text)
                continue

-            if link == None or "GO_BACK" in answer:
+            links = self.parse_answer(answer)
+            link = self.select_link(links)
+
+            if Action.REQUEST_EXIT.value in answer:
+                pretty_print(f"Agent requested exit.", color="status")
+                complete = True
+                break
+
+            if link == None or Action.GO_BACK.value in answer or link in self.search_history:
                pretty_print(f"Going back to results. Still {len(unvisited)}", color="status")
                unvisited = self.select_unvisited(search_result)
                prompt = self.make_newsearch_prompt(user_prompt, unvisited)
@ -337,6 +345,7 @@ class BrowserAgent(Agent):
            animate_thinking(f"Navigating to {link}", color="status")
            if speech_module: speech_module.speak(f"Navigating to {link}")
            self.browser.go_to(link)
+            self.search_history.append(link)
            self.current_page = link
            page_text = self.browser.get_text()
            self.navigable_links = self.browser.get_navigable()
--- a/sources/router.py
+++ b/sources/router.py
@ -128,6 +128,7 @@ class AgentRouter:
            ("Check if a file named ‘project_proposal.pdf’ exists in my Documents", "LOW"),
            ("Search the web for tips on improving coding skills", "LOW"),
            ("Write a Python script to count words in a text file", "LOW"),
+            ("Search the web for restaurant", "LOW"),
            ("Find a public API for sports scores and build a web app to show live updates", "HIGH"),
            ("Create a simple HTML page with CSS styling", "LOW"),
            ("hi", "LOW"),
@ -264,6 +265,7 @@ class AgentRouter:
            ("Check if ‘photo_backup.zip’ exists on my drive", "files"),
            ("Write a Python script to rename files with a timestamp", "code"),
            ("What’s your favorite thing about space?", "talk"),
+            ("search for GPU with at least 24gb vram", "web"),
            ("Browse the web for the latest fitness trends", "web"),
            ("Move all .docx files to a ‘Work’ folder", "files"),
            ("I would like to make a new project called 'new_project'", "files"),
--- a/sources/tools/BashInterpreter.py
+++ b/sources/tools/BashInterpreter.py
@ -44,7 +44,7 @@ class BashInterpreter(Tools):
            command = command.replace('\n', '')
            if self.safe_mode and is_unsafe(commands):
                return "Unsafe command detected, execution aborted."
-            if self.language_bash_attempt(command):
+            if self.language_bash_attempt(command) and allow_language_exec_bash == False:
                continue
            try:
                process = subprocess.Popen(
--- a/sources/tools/tools.py
+++ b/sources/tools/tools.py
@ -37,10 +37,14 @@ class Tools():
        self.work_dir = self.create_work_dir()
        self.excutable_blocks_found = False
        self.safe_mode = True
+        self.allow_language_exec_bash = False
    
    def get_work_dir(self):
        return self.work_dir
    
+    def set_allow_language_exec_bash(value: bool) -> None:
+        self.allow_language_exec_bash = value 
+    
    def check_config_dir_validity(self):
        """Check if the config directory is valid."""
        path = self.config['MAIN']['work_dir']
--- a/tests/test_browser_agent_parsing.py
+++ b/tests/test_browser_agent_parsing.py
@ -17,12 +17,13 @@ class TestBrowserAgentParsing(unittest.TestCase):
        # Test various link formats
        test_text = """
        Check this out: https://thriveonai.com/15-ai-startups-in-japan-to-take-note-of, and www.google.com!
-        Also try https://test.org/about?page=1.
+        Also try https://test.org/about?page=1, hey this one as well bro https://weatherstack.com/documentation.
        """
        expected = [
            "https://thriveonai.com/15-ai-startups-in-japan-to-take-note-of",
            "www.google.com",
-            "https://test.org/about?page=1"
+            "https://test.org/about?page=1",
+            "https://weatherstack.com/documentation"
        ]
        result = self.agent.extract_links(test_text)
        self.assertEqual(result, expected)