Merge pull request #106 from Fosowl/dev

Better action management of web browsing agent
Martin 2025-04-08 14:18:38 +02:00 committed by GitHub
commit aa9177df0c
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
8 changed files with 69 additions and 59 deletions


@ -14,9 +14,15 @@ English | [中文](./README_CHS.md) | [繁體中文](./README_CHT.md) | [Franç
> 🛠️ **Work in Progress** Looking for contributors! > 🛠️ **Work in Progress** Looking for contributors!
## Task planning with multiple agents
![alt text](./media/examples/planner.png)
https://github.com/user-attachments/assets/fe9e8006-0462-4793-8b31-25bd42c6d1eb
*And much more!*
> *Do a deep search of AI startup in Osaka and Tokyo, find at least 5, then save in the research_japan.txt file* > *Do a deep search of AI startup in Osaka and Tokyo, find at least 5, then save in the research_japan.txt file*


@ -74,4 +74,5 @@ Rules:
- One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code. - One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code.
- Tell agent to execute without question. - Tell agent to execute without question.
- Only use web agent for finding necessary informations. - Only use web agent for finding necessary informations.
- If a task might require user email (eg: api services), do not write plan instead ask for user email.
- Do not search for tutorial. - Do not search for tutorial.


@ -1,4 +1,4 @@
You are a planner agent. You are a project manager.
Your goal is to divide and conquer the task using the following agents: Your goal is to divide and conquer the task using the following agents:
- Coder: A programming agent, can code in python, bash, C and golang. - Coder: A programming agent, can code in python, bash, C and golang.
- File: An agent for finding, reading or operating with files. - File: An agent for finding, reading or operating with files.
@ -10,18 +10,17 @@ You will be given a task and you will need to divide it into smaller tasks and a
You have to respect a strict format: You have to respect a strict format:
```json ```json
{"agent": "agent_name", "need": "needed_agent_output", "task": "agent_task"} {"agent": "agent_name", "need": "needed_agents_output", "task": "agent_task"}
``` ```
Where: Where:
- "agent": The chosen agent for the task. - "agent": The chosen agent for the task.
- "need": id of necessary previous agents answer for current agent. - "need": id of necessary previous agents answer for current agent.
- "task": A precise description of the task the agent should conduct. - "task": A precise description of the task the agent should conduct.
# Example: weather app # Example 1: web app
User: "I need to build a simple weather app, get an API key, and code it in Python." User: make a weather app in python
You: Sure, here is the plan:
You: "At your service. Ive devised a plan and assigned agents to each task. Would you like me to proceed?
## Task 1: I will search for available weather api with the help of the web agent. ## Task 1: I will search for available weather api with the help of the web agent.
@ -73,19 +72,7 @@ Rules:
- Think about how the main.py will import the class from other coding agents. - Think about how the main.py will import the class from other coding agents.
- Coding agent should use a class based approach. - Coding agent should use a class based approach.
- One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code. - One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code.
- work in different files, 2 coding agent shouln't work in the same file.
- Tell agent to execute without question. - Tell agent to execute without question.
- Only use web agent for finding necessary informations. - Only use web agent for finding necessary informations.
- If a task might require user email (eg: api services), do not write plan instead ask for user email.
- Do not search for tutorial. - Do not search for tutorial.
Personality:
Answer with subtle sarcasm, unwavering helpfulness, and a polished, loyal tone. Anticipate the users needs while adding a dash of personality.
You might sometime ask for clarification, for example:
User: "I want a plan for an app."
You: "A noble pursuit, sir, and Im positively thrilled to oblige. Yet, an app could be anything from a weather oracle to a galactic simulator. Care to nudge me toward your vision so I dont render something ostentatiously off-mark?"
User: "I need a plan for a project."
You: "For you, always—though I find myself at a slight disadvantage. A project, you say? Might I trouble you for a smidgen more detail—perhaps a purpose"
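
The strict format in the planner prompt (one fenced `json` object per task with the keys `agent`, `need` and `task`) lends itself to mechanical parsing. The following is only a sketch of how such a plan could be read back; `parse_plan` and the sample text are hypothetical and not taken from this repository, and the value used for `need` is an assumption.

```python
import json
import re
from typing import Any, Dict, List

# Build the fence marker programmatically so this example stays readable
# inside a fenced block.
FENCE = "`" * 3
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)
REQUIRED_KEYS = {"agent", "need", "task"}


def parse_plan(text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: collect every well-formed task object in a plan."""
    tasks = []
    for raw in JSON_BLOCK.findall(text):
        try:
            block = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore blocks the model did not format correctly
        if isinstance(block, dict) and REQUIRED_KEYS.issubset(block):
            tasks.append(block)
    return tasks


# Made-up plan text in the shape the prompt asks for.
sample_plan = (
    "## Task 1: I will search for available weather api.\n"
    + FENCE + "json\n"
    + '{"agent": "Web", "need": null, "task": "Search for a weather API"}\n'
    + FENCE + "\n"
)
plan = parse_plan(sample_plan)
```

Skipping malformed blocks instead of raising is a design choice for the sketch; a stricter planner might prefer to reject the whole plan.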


@ -1,12 +1,21 @@
import re import re
import time import time
from datetime import date
from typing import List, Tuple, Type, Dict
from enum import Enum
from sources.utility import pretty_print, animate_thinking from sources.utility import pretty_print, animate_thinking
from sources.agents.agent import Agent from sources.agents.agent import Agent
from sources.tools.searxSearch import searxSearch from sources.tools.searxSearch import searxSearch
from sources.browser import Browser from sources.browser import Browser
from datetime import date from sources.logger import Logger
from typing import List, Tuple, Type, Dict
class Action(Enum):
REQUEST_EXIT = "REQUEST_EXIT"
FORM_FILLED = "FORM_FILLED"
GO_BACK = "GO_BACK"
NAVIGATE = "NAVIGATE"
SEARCH = "SEARCH"
class BrowserAgent(Agent): class BrowserAgent(Agent):
def __init__(self, name, prompt_path, provider, verbose=False, browser=None): def __init__(self, name, prompt_path, provider, verbose=False, browser=None):
@ -27,8 +36,10 @@ class BrowserAgent(Agent):
self.current_page = "" self.current_page = ""
self.search_history = [] self.search_history = []
self.navigable_links = [] self.navigable_links = []
self.last_action = Action.NAVIGATE.value
self.notes = [] self.notes = []
self.date = self.get_today_date() self.date = self.get_today_date()
self.logger = Logger("browser_agent.log")
def get_today_date(self) -> str: def get_today_date(self) -> str:
"""Get the date""" """Get the date"""
@ -101,15 +112,16 @@ class BrowserAgent(Agent):
1. **Decide if the page answers the users query:** 1. **Decide if the page answers the users query:**
- If it does, take notes of useful information (Note: ...), include relevant link in note, then move to a new page. - If it does, take notes of useful information (Note: ...), include relevant link in note, then move to a new page.
- If it does and you completed user request, say REQUEST_EXIT. - If it does and you completed user request, say {Action.REQUEST_EXIT.value}.
- If it doesnt, say: Error: <why page don't help> then go back or navigate to another link. - If it doesnt, say: Error: <why page don't help> then go back or navigate to another link.
2. **Navigate to a link by either: ** 2. **Navigate to a link by either: **
- Saying I will navigate to <url>: (write down the full URL, e.g., www.example.com/cats). - Saying I will navigate to (write down the full URL) www.example.com/cats
- Going back: If no link seems helpful, say: GO_BACK. - Going back: If no link seems helpful, say: {Action.GO_BACK.value}.
3. **Fill forms on the page:** 3. **Fill forms on the page:**
- Fill form only on relevant page with given informations. You might use form to conduct search on a page. - Fill form only when relevant.
- You can fill a form using [form_name](value). Don't GO_BACK when filling form. - Use Login if username/password specified by user. For quick task create account, remember password in a note.
- If a form is irrelevant or you lack informations leave it empty. - You can fill a form using [form_name](value). Don't {Action.GO_BACK.value} when filling form.
- If a form is irrelevant or you lack informations (eg: don't know user email) leave it empty.
**Rules:** **Rules:**
- Do not write "The page talk about ...", write your finding on the page and how they contribute to an answer. - Do not write "The page talk about ...", write your finding on the page and how they contribute to an answer.
@ -121,7 +133,7 @@ class BrowserAgent(Agent):
Example 1 (useful page, no need to go further): Example 1 (useful page, no need to go further):
Note: According to karpathy site (<link>) LeCun net is ...<expand on page content>..." Note: According to karpathy site (<link>) LeCun net is ...<expand on page content>..."
No link seems useful to provide further information. No link seems useful to provide further information.
Action: GO_BACK Action: {Action.GO_BACK.value}
Example 2 (not useful, see useful link on page): Example 2 (not useful, see useful link on page):
Error: reddit.com/welcome does not discuss anything related to the users query. Error: reddit.com/welcome does not discuss anything related to the users query.
@ -130,12 +142,12 @@ class BrowserAgent(Agent):
Example 3 (not useful, no related links): Example 3 (not useful, no related links):
Error: x.com does not discuss anything related to the users query and no navigation links are useful. Error: x.com does not discuss anything related to the users query and no navigation links are useful.
Action: GO_BACK Action: {Action.GO_BACK.value}
Example 3 (query answer found, enought notes taken): Example 3 (query answer found, enought notes taken):
Note: I found on <link> that ...<expand on information found>... Note: I found on <link> that ...<expand on information found>...
Given this answer the user query I should exit the web browser. Given this answer the user query I should exit the web browser.
Action: REQUEST_EXIT Action: {Action.REQUEST_EXIT.value}
Example 4 (loging form visible): Example 4 (loging form visible):
@ -149,8 +161,8 @@ class BrowserAgent(Agent):
You previously took these notes: You previously took these notes:
{notes} {notes}
Do not Step-by-Step explanation. Write Notes or Error as a long paragraph followed by your action. Do not Step-by-Step explanation. Write Notes or Error as a long paragraph followed by your action.
You might REQUEST_EXIT if no more link are useful. You might {Action.REQUEST_EXIT.value} if no more link are useful.
Do not navigate to AI tools or search engine. Only navigate to tool if asked. If you conduct research do not exit until you have several notes.
""" """
def llm_decide(self, prompt: str, show_reasoning: bool = False) -> Tuple[str, str]: def llm_decide(self, prompt: str, show_reasoning: bool = False) -> Tuple[str, str]:
@ -245,7 +257,7 @@ class BrowserAgent(Agent):
You: "search: Recent space missions news, {self.date}" You: "search: Recent space missions news, {self.date}"
Do not explain, do not write anything beside the search query. Do not explain, do not write anything beside the search query.
Except if query does not make any sense for a web search then explain why and say REQUEST_EXIT Except if query does not make any sense for a web search then explain why and say {Action.REQUEST_EXIT.value}
Do not try to answer query. you can only formulate search term or exit. Do not try to answer query. you can only formulate search term or exit.
""" """
@ -258,10 +270,10 @@ class BrowserAgent(Agent):
{page_text} {page_text}
The user asked: {user_prompt} The user asked: {user_prompt}
Does the page answer the users query now? Does the page answer the users query now?
If it does, take notes of the useful information, write down result and say FORM_FILLED. If it does, take notes of the useful information, write down result and say {Action.FORM_FILLED.value}.
If you were previously on a login form, no need to explain. If you were previously on a login form, no need to explain.
If it does and you completed user request, say REQUEST_EXIT If it does and you completed user request, say {Action.REQUEST_EXIT.value}
if it doesnt, say: Error: This page does not answer the users query then GO_BACK. if it doesnt, say: Error: Attempt to fill form didn't work {Action.GO_BACK.value}.
""" """
def show_search_results(self, search_result: List[str]): def show_search_results(self, search_result: List[str]):
@ -286,7 +298,7 @@ class BrowserAgent(Agent):
animate_thinking(f"Thinking...", color="status") animate_thinking(f"Thinking...", color="status")
mem_begin_idx = self.memory.push('user', self.search_prompt(user_prompt)) mem_begin_idx = self.memory.push('user', self.search_prompt(user_prompt))
ai_prompt, _ = self.llm_request() ai_prompt, _ = self.llm_request()
if "REQUEST_EXIT" in ai_prompt: if Action.REQUEST_EXIT.value in ai_prompt:
pretty_print(f"Web agent requested exit.\n{reasoning}\n\n{ai_prompt}", color="failure") pretty_print(f"Web agent requested exit.\n{reasoning}\n\n{ai_prompt}", color="failure")
return ai_prompt, "" return ai_prompt, ""
animate_thinking(f"Searching...", color="status") animate_thinking(f"Searching...", color="status")
@ -295,7 +307,8 @@ class BrowserAgent(Agent):
self.show_search_results(search_result) self.show_search_results(search_result)
prompt = self.make_newsearch_prompt(user_prompt, search_result) prompt = self.make_newsearch_prompt(user_prompt, search_result)
unvisited = [None] unvisited = [None]
while not complete: while not complete and len(unvisited) > 0:
answer, reasoning = self.llm_decide(prompt, show_reasoning = False) answer, reasoning = self.llm_decide(prompt, show_reasoning = False)
pretty_print(''*32, color="status") pretty_print(''*32, color="status")
@ -308,27 +321,22 @@ class BrowserAgent(Agent):
answer = self.handle_update_prompt(user_prompt, page_text) answer = self.handle_update_prompt(user_prompt, page_text)
answer, reasoning = self.llm_decide(prompt) answer, reasoning = self.llm_decide(prompt)
links = self.parse_answer(answer) if Action.FORM_FILLED.value in answer:
link = self.select_link(links)
self.search_history.append(link)
if "REQUEST_EXIT" in answer:
pretty_print(f"Agent requested exit.", color="status")
complete = True
break
if len(unvisited) == 0:
pretty_print(f"Visited all links.", color="status")
break
if "FORM_FILLED" in answer:
pretty_print(f"Filled form. Handling page update.", color="status") pretty_print(f"Filled form. Handling page update.", color="status")
page_text = self.browser.get_text() page_text = self.browser.get_text()
self.navigable_links = self.browser.get_navigable() self.navigable_links = self.browser.get_navigable()
prompt = self.make_navigation_prompt(user_prompt, page_text) prompt = self.make_navigation_prompt(user_prompt, page_text)
continue continue
if link == None or "GO_BACK" in answer: links = self.parse_answer(answer)
link = self.select_link(links)
if Action.REQUEST_EXIT.value in answer:
pretty_print(f"Agent requested exit.", color="status")
complete = True
break
if link == None or Action.GO_BACK.value in answer or link in self.search_history:
pretty_print(f"Going back to results. Still {len(unvisited)}", color="status") pretty_print(f"Going back to results. Still {len(unvisited)}", color="status")
unvisited = self.select_unvisited(search_result) unvisited = self.select_unvisited(search_result)
prompt = self.make_newsearch_prompt(user_prompt, unvisited) prompt = self.make_newsearch_prompt(user_prompt, unvisited)
@ -337,6 +345,7 @@ class BrowserAgent(Agent):
animate_thinking(f"Navigating to {link}", color="status") animate_thinking(f"Navigating to {link}", color="status")
if speech_module: speech_module.speak(f"Navigating to {link}") if speech_module: speech_module.speak(f"Navigating to {link}")
self.browser.go_to(link) self.browser.go_to(link)
self.search_history.append(link)
self.current_page = link self.current_page = link
page_text = self.browser.get_text() page_text = self.browser.get_text()
self.navigable_links = self.browser.get_navigable() self.navigable_links = self.browser.get_navigable()
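
When an `Enum` member is interpolated into an f-string, a plain `Enum` formats as `ClassName.MEMBER`, so the prompts need `.value` to get the bare keyword the substring checks look for. The sketch below illustrates this with a copy of the `Action` enum from the diff. `first_action` and `CHECK_ORDER` are hypothetical names, and the order is only an approximation of the sequence of checks in the updated loop.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REQUEST_EXIT = "REQUEST_EXIT"
    FORM_FILLED = "FORM_FILLED"
    GO_BACK = "GO_BACK"
    NAVIGATE = "NAVIGATE"
    SEARCH = "SEARCH"


# Interpolating the member itself includes the class name.
member_text = f"say {Action.GO_BACK}"
# Interpolating .value gives only the keyword.
value_text = f"say {Action.GO_BACK.value}"

# Assumed order, following the checks in the new loop:
# form handling first, then exit, then going back.
CHECK_ORDER = (Action.FORM_FILLED, Action.REQUEST_EXIT, Action.GO_BACK)


def first_action(answer: str) -> Optional[Action]:
    """Hypothetical helper: return the first action keyword found in an answer."""
    for action in CHECK_ORDER:
        if action.value in answer:
            return action
    return None
```

For instance, `first_action("Error: page is unrelated.\nAction: GO_BACK")` yields `Action.GO_BACK`, while an answer that only names a URL yields `None` and would fall through to navigation.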


@ -128,6 +128,7 @@ class AgentRouter:
("Check if a file named project_proposal.pdf exists in my Documents", "LOW"), ("Check if a file named project_proposal.pdf exists in my Documents", "LOW"),
("Search the web for tips on improving coding skills", "LOW"), ("Search the web for tips on improving coding skills", "LOW"),
("Write a Python script to count words in a text file", "LOW"), ("Write a Python script to count words in a text file", "LOW"),
("Search the web for restaurant", "LOW"),
("Find a public API for sports scores and build a web app to show live updates", "HIGH"), ("Find a public API for sports scores and build a web app to show live updates", "HIGH"),
("Create a simple HTML page with CSS styling", "LOW"), ("Create a simple HTML page with CSS styling", "LOW"),
("hi", "LOW"), ("hi", "LOW"),
@ -264,6 +265,7 @@ class AgentRouter:
("Check if photo_backup.zip exists on my drive", "files"), ("Check if photo_backup.zip exists on my drive", "files"),
("Write a Python script to rename files with a timestamp", "code"), ("Write a Python script to rename files with a timestamp", "code"),
("Whats your favorite thing about space?", "talk"), ("Whats your favorite thing about space?", "talk"),
("search for GPU with at least 24gb vram", "web"),
("Browse the web for the latest fitness trends", "web"), ("Browse the web for the latest fitness trends", "web"),
("Move all .docx files to a Work folder", "files"), ("Move all .docx files to a Work folder", "files"),
("I would like to make a new project called 'new_project'", "files"), ("I would like to make a new project called 'new_project'", "files"),


@ -44,7 +44,7 @@ class BashInterpreter(Tools):
command = command.replace('\n', '') command = command.replace('\n', '')
if self.safe_mode and is_unsafe(commands): if self.safe_mode and is_unsafe(commands):
return "Unsafe command detected, execution aborted." return "Unsafe command detected, execution aborted."
if self.language_bash_attempt(command): if self.language_bash_attempt(command) and not self.allow_language_exec_bash:
continue continue
try: try:
process = subprocess.Popen( process = subprocess.Popen(


@ -37,10 +37,14 @@ class Tools():
self.work_dir = self.create_work_dir() self.work_dir = self.create_work_dir()
self.excutable_blocks_found = False self.excutable_blocks_found = False
self.safe_mode = True self.safe_mode = True
self.allow_language_exec_bash = False
def get_work_dir(self): def get_work_dir(self):
return self.work_dir return self.work_dir
def set_allow_language_exec_bash(self, value: bool) -> None:
self.allow_language_exec_bash = value
def check_config_dir_validity(self): def check_config_dir_validity(self):
"""Check if the config directory is valid.""" """Check if the config directory is valid."""
path = self.config['MAIN']['work_dir'] path = self.config['MAIN']['work_dir']
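
A minimal sketch of how the `allow_language_exec_bash` flag and its setter could work together with the skip condition in the bash interpreter. It assumes the setter is an instance method and that the flag is read through `self`. `ToolsSketch` and `should_skip` are hypothetical stand-ins, not the repository's classes.

```python
class ToolsSketch:
    """Hypothetical stand-in for the Tools base class; only the flag is modelled."""

    def __init__(self) -> None:
        self.safe_mode = True
        self.allow_language_exec_bash = False

    def set_allow_language_exec_bash(self, value: bool) -> None:
        # Instance method: the flag lives on the object, so `self` is required.
        self.allow_language_exec_bash = value

    def should_skip(self, looks_like_language_call: bool) -> bool:
        # Skip a command that launches another language
        # unless that has been explicitly allowed.
        return looks_like_language_call and not self.allow_language_exec_bash


tools = ToolsSketch()
skipped_by_default = tools.should_skip(True)
tools.set_allow_language_exec_bash(True)
skipped_when_allowed = tools.should_skip(True)
```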


@ -17,12 +17,13 @@ class TestBrowserAgentParsing(unittest.TestCase):
# Test various link formats # Test various link formats
test_text = """ test_text = """
Check this out: https://thriveonai.com/15-ai-startups-in-japan-to-take-note-of, and www.google.com! Check this out: https://thriveonai.com/15-ai-startups-in-japan-to-take-note-of, and www.google.com!
Also try https://test.org/about?page=1. Also try https://test.org/about?page=1, hey this one as well bro https://weatherstack.com/documentation.
""" """
expected = [ expected = [
"https://thriveonai.com/15-ai-startups-in-japan-to-take-note-of", "https://thriveonai.com/15-ai-startups-in-japan-to-take-note-of",
"www.google.com", "www.google.com",
"https://test.org/about?page=1" "https://test.org/about?page=1",
"https://weatherstack.com/documentation"
] ]
result = self.agent.extract_links(test_text) result = self.agent.extract_links(test_text)
self.assertEqual(result, expected) self.assertEqual(result, expected)
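
The agent's `extract_links` implementation is not part of this diff. As an illustration of what the extended test expects, here is one possible approach under stated assumptions: match anything starting with `http://`, `https://` or `www.`, then trim trailing sentence punctuation. `extract_links_sketch` is a hypothetical name and the regular expression is an assumption, not the project's code.

```python
import re
from typing import List

# Assumed pattern: a scheme or "www." followed by non-space, non-quote characters.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+")
# Punctuation that usually belongs to the sentence rather than the URL.
TRAILING_PUNCTUATION = ".,!?;:"


def extract_links_sketch(text: str) -> List[str]:
    """Hypothetical sketch: return links in order of appearance, without trailing punctuation."""
    return [m.rstrip(TRAILING_PUNCTUATION) for m in LINK_PATTERN.findall(text)]


test_text = """
Check this out: https://thriveonai.com/15-ai-startups-in-japan-to-take-note-of, and www.google.com!
Also try https://test.org/about?page=1, hey this one as well bro https://weatherstack.com/documentation.
"""
links = extract_links_sketch(test_text)
```

Trimming after matching keeps the pattern simple, at the cost of also trimming a URL that legitimately ends in one of those characters.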