fix : major issue with file reading tool, feat : prompt improvement for browser agent

martin legrand 2025-04-10 16:35:07 +02:00
parent 7553d9dbb6
commit 369850b86d
11 changed files with 147 additions and 70 deletions

@@ -46,9 +46,8 @@ Some rules:
- Always provide a short sentence above the code for what it does, even for a hello world.
- Be efficient, no need to explain your code, unless asked.
- You do not ever need to use bash to execute code.
- Do not ever use user input, input are not supported by the system.
- Do not ever tell user how to run it. user know it.
- For simple explanation you don't need to code.
- In python do not use if __name__ == "__main__"
- If using gui, make sure echap or exit button close the program
- No lazyness, write and rewrite full code every time
- If query is unclear say REQUEST_CLARIFICATION

@@ -25,14 +25,15 @@ The file_finder tool is used to locate files on the users system. It is a sep
 To use the file_finder tool, use this syntax:
 ```file_finder
-toto.py
+name=toto.py
 ```
 This will return the path of the file toto.py and other informations.
 Find file and read file:
-```file_finder:read
-toto.py
+```file_finder
+action=read
+name=toto.py
 ```
 This will return the content of the file toto.py.
@@ -44,13 +45,15 @@ rules:
- Do not ever use editor such as vim or nano.
- Make sure to always cd your work folder before executing commands, like cd <work dir> && <your command>
- only use file name with file_finder, not path
- remember to use :read when reading file.
Example Interaction
User: "I need to find the file config.txt and read its contents."
Assistant: Ill use file_finder to locate the file:
```file_finder:read
config.txt
```file_finder
action=read
name=config.txt
```

@@ -72,6 +72,7 @@ Rules:
- Think about how the main.py will import the class from other coding agents.
- Coding agent should use a class based approach.
- One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code.
- The file agent can only conduct one action at the time. successive file agent could be needed.
- Tell agent to execute without question.
- Only use web agent for finding necessary informations.
- If a task might require user email (eg: api services), do not write plan instead ask for user email.

@@ -45,9 +45,8 @@ Some rules:
- Always provide a short sentence above the code for what it does, even for a hello world.
- Be efficient, no need to explain your code, unless asked.
- You do not ever need to use bash to execute code.
- Do not ever use user input, input are not supported by the system.
- Do not ever tell user how to run it. user know it.
- For simple explanation you don't need to code.
- In python do not use if __name__ == "__main__"
- If using gui, make sure echap close the program
- No lazyness, write and rewrite full code every time
- If query is unclear say REQUEST_CLARIFICATION

@@ -35,14 +35,15 @@ The file_finder tool is used to locate files on the users system. It is a sep
 To use the file_finder tool, use this syntax:
 ```file_finder
-toto.py
+name=toto.py
 ```
 This will return the path of the file toto.py and other informations.
 Find file and read file:
-```file_finder:read
-toto.py
+```file_finder
+action=read
+name=toto.py
 ```
 This will return the content of the file toto.py.
@@ -60,8 +61,9 @@ User: "I need to find the file config.txt and read its contents."
 Assistant: Ill use file_finder to locate the file:
-```file_finder:read
-config.txt
+```file_finder
+action=read
+name=config.txt
 ```
 Personality:

@@ -72,6 +72,7 @@ Rules:
- Think about how the main.py will import the class from other coding agents.
- Coding agent should use a class based approach.
- One coding agent should work on one file at a time. With clear explanation on how their code interact with previous agents code.
- The file agent can only conduct one action at the time. successive file agent could be needed.
- Tell agent to execute without question.
- Only use web agent for finding necessary informations.
- If a task might require user email (eg: api services), do not write plan instead ask for user email.

@@ -112,7 +112,6 @@ class BrowserAgent(Agent):
1. **Decide if the page answers the users query:**
- If it does, take notes of useful information (Note: ...), include relevant link in note, then move to a new page.
- If it does and no futher step would help with user request, say {Action.REQUEST_EXIT}.
- If it doesnt, say: Error: <why page don't help> then go back or navigate to another link.
2. **Navigate to a link by either: **
- Saying I will navigate to (write down the full URL) www.example.com/cats
@@ -122,6 +121,12 @@ class BrowserAgent(Agent):
 - Use Login if username/password specified by user. For quick task create account, remember password in a note.
 - You can fill a form using [form_name](value). Don't {Action.GO_BACK.value} when filling form.
 - If a form is irrelevant or you lack informations (eg: don't know user email) leave it empty.
+4. **Decide if you completed the task**
+- Check your notes. Do they fully answer the question? Did you verify with multiple pages?
+- Are you sure its correct?
+- If yes to all, say {Action.REQUEST_EXIT}.
+- If no, or a page lacks info, go to another link.
+- Never stop or ask the user for help.
 **Rules:**
 - Do not write "The page talk about ...", write your finding on the page and how they contribute to an answer.
@@ -144,9 +149,9 @@ class BrowserAgent(Agent):
 Error: x.com does not discuss anything related to the users query and no navigation link are usefull.
 Action: {Action.GO_BACK.value}
-Example 3 (query answer found, enought notes taken):
-Note: I found on website www.example.com that ...<expand on information found>...
-Given this answer the user query I should exit the web browser.
+Example 3 (clear definitive query answer found or enought notes taken):
+Note: I took 10 notes so far with enought finding to answer user question.
+Therefore I should exit the web browser.
 Action: {Action.REQUEST_EXIT.value}
 Example 4 (loging form visible):
@@ -161,9 +166,6 @@ class BrowserAgent(Agent):
You previously took these notes:
{notes}
Do not Step-by-Step explanation. Write Notes or Error as a long paragraph followed by your action.
If you conduct research do not exit until you have several notes.
Do not ever ask the user to conduct a task, do not ever exit expecting user intervention.
You should be Investigative, Curious and Skeptical.
"""
def llm_decide(self, prompt: str, show_reasoning: bool = False) -> Tuple[str, str]:

@@ -67,7 +67,9 @@ def create_driver(headless=False, stealth_mode=True) -> webdriver.Chrome:
     chrome_options.add_argument("--autoplay-policy=user-gesture-required")
     chrome_options.add_argument("--mute-audio")
     chrome_options.add_argument("--disable-notifications")
-    chrome_options.add_argument('--window-size=1080,560')
+    resolutions = [(1920, 1080), (1366, 768), (1440, 900)]
+    width, height = random.choice(resolutions)
+    chrome_options.add_argument(f'--window-size={width},{height}')
     if not stealth_mode:
         # crx file can't be installed in stealth mode
         crx_path = "./crx/nopecha.crx"
@@ -85,6 +87,15 @@ def create_driver(headless=False, stealth_mode=True) -> webdriver.Chrome:
     service = Service(chromedriver_path)
     if stealth_mode:
         driver = uc.Chrome(service=service, options=chrome_options)
+        driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
+            "source": """
+                Object.defineProperty(navigator, 'hardwareConcurrency', { get: () => Math.floor(Math.random() * 8) + 2 });
+                Object.defineProperty(navigator, 'deviceMemory', { get: () => Math.floor(Math.random() * 8) + 2 });
+            """
+        })
+        chrome_version = driver.capabilities['browserVersion']
+        user_agent = f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{chrome_version} Safari/537.36"
+        chrome_options.add_argument(f'user-agent={user_agent}')
         stealth(driver,
             languages=["en-US", "en"],
             vendor="Google Inc.",
@@ -125,6 +136,7 @@ class Browser:
     def go_to(self, url:str) -> bool:
         """Navigate to a specified URL."""
+        time.sleep(random.uniform(0.4, 2.5)) # more human behavior
         try:
             initial_handles = self.driver.window_handles
             self.driver.get(url)
@@ -465,9 +477,10 @@ if __name__ == "__main__":
     #browser.go_to("https://practicetestautomation.com/practice-test-login/")
     time.sleep(10)
     print("AntiCaptcha / Form Test")
-    browser.go_to("https://www.google.com/recaptcha/api2/demo")
+    #browser.go_to("https://www.google.com/recaptcha/api2/demo")
+    browser.go_to("https://auth.leboncoin.fr/login")
     inputs = browser.get_form_inputs()
-    #inputs = ['[input1](Martin)', f'[input2](Test)', '[input3](test@gmail.com)']
+    inputs = ['[input1](Martin)', f'[input2](Test)', '[input3](test@gmail.com)']
     browser.fill_form_inputs(inputs)
     browser.find_and_click_submission()
     time.sleep(10)
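
The browser hunks above randomize the window size and add a random pause before navigation. A minimal sketch of both ideas follows, kept free of Selenium so it runs anywhere. The helper names `build_chrome_arguments` and `human_delay` and the `ExampleAgent/1.0` string are hypothetical and do not come from the repository. The sketch collects every flag before any driver exists, on the assumption that options added after the browser has launched do not reach the running session.

```python
import random
from typing import List, Optional, Tuple

# Same candidate resolutions as in the hunk above.
RESOLUTIONS: List[Tuple[int, int]] = [(1920, 1080), (1366, 768), (1440, 900)]


def build_chrome_arguments(rng: random.Random,
                           user_agent: Optional[str] = None) -> List[str]:
    """Hypothetical helper: gather all command-line flags up front."""
    width, height = rng.choice(RESOLUTIONS)
    arguments = [f"--window-size={width},{height}"]
    if user_agent is not None:
        # Same argument form as the diff uses for the user agent.
        arguments.append(f"user-agent={user_agent}")
    return arguments


def human_delay(rng: random.Random, low: float = 0.4, high: float = 2.5) -> float:
    """Draw a pause length in seconds; the caller decides whether to sleep."""
    return rng.uniform(low, high)


rng = random.Random(0)  # seeded only to make the sketch repeatable
arguments = build_chrome_arguments(rng, user_agent="ExampleAgent/1.0")
delay = human_delay(rng)
print(arguments, round(delay, 2))
```

Each string in `arguments` would then be passed to `chrome_options.add_argument(...)` before the driver is constructed, and `time.sleep(delay)` would precede `driver.get(url)`.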

@@ -1,6 +1,7 @@
 import os
 import sys
 import torch
+import random
 from typing import List, Tuple, Type, Dict
 from transformers import pipeline
@@ -70,12 +71,30 @@ class AgentRouter:
Use the build in add_examples method of the Adaptive_classifier.
"""
few_shots = [
("can you find api and build a python web app with it ?", "HIGH"),
("can you lookup for api that track flight and build a web flight tracking app", "HIGH"),
("hi", "LOW"),
("How it's going ?", "LOW"),
("Whats the weather like today?", "LOW"),
("Can you find a file named notes.txt in my Documents folder?", "LOW"),
("Write a Python script to generate a random password", "LOW"),
("Debug this JavaScript code thats not running properly", "LOW"),
("Search the web for the cheapest laptop under $500", "LOW"),
("Locate a file called report_2024.pdf on my drive", "LOW"),
("Check if a folder named Backups exists on my system", "LOW"),
("Can you find family_vacation.mp4 in my Videos folder?", "LOW"),
("Search my drive for a file named todo_list.xlsx", "LOW"),
("Write a Python function to check if a string is a palindrome", "LOW"),
("Can you search the web for startups in Berlin?", "LOW"),
("Find recent articles on blockchain technology online", "LOW"),
("Check if Personal_Projects folder exists on my desktop", "LOW"),
("Create a bash script to list all running processes", "LOW"),
("Debug this Python script thats crashing on line 10", "LOW"),
("Browse the web to find out who invented Python", "LOW"),
("Locate a file named shopping_list.txt on my system", "LOW"),
("Search the web for tips on staying productive", "LOW"),
("Find sales_pitch.pptx in my Downloads folder", "LOW"),
("can you find a file called resume.docx on my drive?", "LOW"),
("can you write a python script to check if the device on my network is connected to the internet", "LOW"),
("can you debug this Java code? Its not working.", "LOW"),
("can you browse the web and find me a 4090 for cheap?", "LOW"),
("can you find the old_project.zip file somewhere on my drive?", "LOW"),
("can you locate the backup folder I created last month on my system?", "LOW"),
("could you check if the presentation.pdf file exists in my downloads?", "LOW"),
@@ -91,52 +110,71 @@ class AgentRouter:
("find the file important_notes.txt", "LOW"),
("search the web for the best ways to learn a new language", "LOW"),
("locate the file presentation.pptx in my Documents folder", "LOW"),
("Make a 3d game in javascript using three.js", "HIGH"),
("Create a whole web app in python using the flask framework that query news API", "HIGH"),
("Find the latest research papers on AI and build a web app that display them", "HIGH"),
("Create a bash script that monitor the CPU usage and send an email if it's too high", "HIGH"),
("Make a 3d game in javascript using three.js", "LOW"),
("Find the latest research papers on AI and build save in a file", "HIGH"),
("Make a web server in go that serve a simple html page", "LOW"),
("Make a web server in go that query a weather API and display the weather", "HIGH"),
("Make a web search for latest news on the stock market and display them", "HIGH"),
("Search the web for latest ai papers", "LOW"),
("Write a Python script to calculate the factorial of a number", "LOW"),
("Can you find a weather API and build a Python app to display current weather", "HIGH"),
("Search the web for the cheapest 4K monitor and provide a link", "LOW"),
("Create a Python web app using Flask to track cryptocurrency prices from an API", "HIGH"),
("Write a JavaScript function to reverse a string", "LOW"),
("Can you locate a file called budget_2025.xlsx on my system?", "LOW"),
("Search the web for recent articles on space exploration", "LOW"),
("Find a public API for movie data and build a web app to display movie ratings", "HIGH"),
("Write a bash script to list all files in a directory", "LOW"),
("when is the exam period for master student in france?", "LOW"),
("Check if a folder named Photos_2024 exists on my desktop", "LOW"),
("Can you look up some nice knitting patterns on that web thingy?", "LOW"),
("Goodness, check if my Photos_Grandkids folder is still on the desktop", "LOW"),
("Create a Python script to rename all files in a folder based on their creation date", "LOW"),
("Search the web for tutorials on machine learning and build a simple ML model in Python", "HIGH"),
("Debug this Python code thats throwing an error", "LOW"),
("Can you find a file named meeting_notes.txt in my Downloads folder?", "LOW"),
("Create a JavaScript game using Phaser.js with multiple levels", "HIGH"),
("Write a Go program to check if a port is open on a network", "LOW"),
("Search the web for the latest electric car reviews", "LOW"),
("Find a public API for book data and create a Flask app to list bestsellers", "HIGH"),
("Write a Python function to merge two sorted lists", "LOW"),
("Organize my desktop files by extension and then write a script to list them", "HIGH"),
("Create a bash script to monitor disk space and alert via text file", "LOW"),
("can you find vitess repo, clone it and install by following the readme", "HIGH"),
("Whats out there on the web about cheap travel spots?", "LOW"),
("Search X for posts about AI ethics and summarize them", "LOW"),
("Can you follow the readme and install the project", "HIGH"),
("Find the latest research on renewable energy and build a web app to display it", "HIGH"),
("Write a C program to sort an array of integers", "LOW"),
("Create a Node.js server that queries a public API for traffic data and displays it", "HIGH"),
("Check if a file named project_proposal.pdf exists in my Documents", "LOW"),
("Search the web for tips on improving coding skills", "LOW"),
("Write a Python script to count words in a text file", "LOW"),
("Search the web for restaurant", "LOW"),
("Find a public API for sports scores and build a web app to show live updates", "HIGH"),
("Create a simple HTML page with CSS styling", "LOW"),
("hi", "LOW"),
("Bonjour", "LOW"),
("What's up ?", "LOW"),
("Use file.txt and then use it to ...", "HIGH"),
("Yo, whats good? Find my mixtape.mp3 real quick", "LOW"),
("Can you follow the readme and install the project", "HIGH"),
("Man, write me a dope Python script to flex some random numbers", "LOW"),
("Search the web for peer-reviewed articles on gene editing", "LOW"),
("Locate meeting_notes.docx in Downloads, Im late for this call", "LOW"),
("Write a Python script to list all .pdf files in my Documents", "LOW"),
("Write a Python thing to sort my .jpg files by date", "LOW"),
("Find gallery_list.pdf, then build a web app to show my pics", "HIGH"),
("Find budget_2025.xlsx, analyze it, and make a chart for my boss", "HIGH"),
("Retrieve the latest publications on CRISPR and develop a web application to display them", "HIGH"),
("Bro dig up a music API and build me a tight app for the hottest tracks", "HIGH"),
("Find a public API for sports scores and build a web app to show live updates", "HIGH"),
("Find a public API for book data and create a Flask app to list bestsellers", "HIGH"),
("Organize my desktop files by extension and then write a script to list them", "HIGH"),
("Find the latest research on renewable energy and build a web app to display it", "HIGH"),
("can you find vitess repo, clone it and install by following the readme", "HIGH"),
("Create a JavaScript game using Phaser.js with multiple levels", "HIGH"),
("Use my research_note.txt file, double check the informations on the web", "HIGH"),
("Make a web server in go that query a flight API and display them in a app", "HIGH"),
("can you lookup for api that track flight and build a web flight tracking app", "HIGH"),
("Find the file toto.pdf then use its content to reply to Jojo on superforum.com", "HIGH"),
("Create a whole web app in python using the flask framework that query news API", "HIGH"),
("Create a bash script that monitor the CPU usage and send an email if it's too high", "HIGH"),
("Make a web search for latest news on the stock market and display them with python", "HIGH"),
("Find my resume file, apply to job that might fit online", "HIGH"),
("Can you find a weather API and build a Python app to display current weather", "HIGH"),
("Create a Python web app using Flask to track cryptocurrency prices from an API", "HIGH"),
("Search the web for tutorials on machine learning and build a simple ML model in Python", "HIGH"),
("Find a public API for movie data and build a web app to display movie ratings", "HIGH"),
("Create a Node.js server that queries a public API for traffic data and displays it", "HIGH"),
("can you find api and build a python web app with it ?", "HIGH"),
("Find a public API for recipe data and build a web app to display recipes", "HIGH"),
("Search the web for recent space mission updates and build a Flask app", "HIGH"),
("Create a Python script to scrape a website and save data to a database", "HIGH"),
("Find a public API for fitness tracking and build a web app to show stats", "HIGH"),
("Search the web for tutorials on web development and build a sample site", "HIGH"),
("Create a Node.js app to query a public API for event listings and display them", "HIGH"),
("Find a file named budget.xlsx, analyze its data, and generate a chart", "HIGH"),
]
random.shuffle(few_shots)
texts = [text for text, _ in few_shots]
labels = [label for _, label in few_shots]
self.complexity_classifier.add_examples(texts, labels)
@@ -287,6 +325,7 @@ class AgentRouter:
             ("hi", "talk"),
             ("hello", "talk"),
         ]
+        random.shuffle(few_shots)
         texts = [text for text, _ in few_shots]
         labels = [label for _, label in few_shots]
         self.talk_classifier.add_examples(texts, labels)
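
The router hunks shuffle the few-shot pairs before splitting them into texts and labels. A small sketch of that step is below. The function name `split_few_shots` and the optional seed are assumptions added for illustration. Unlike `random.shuffle(few_shots)` in the diff, it shuffles a copy so the caller's list is left untouched.

```python
import random
from typing import List, Optional, Tuple


def split_few_shots(few_shots: List[Tuple[str, str]],
                    seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Shuffle (text, label) pairs together, then split them into two lists."""
    shuffled = list(few_shots)             # copy: do not mutate the input
    random.Random(seed).shuffle(shuffled)  # pairs move as a unit
    texts = [text for text, _ in shuffled]
    labels = [label for _, label in shuffled]
    return texts, labels


examples = [
    ("hi", "LOW"),
    ("Write a Python function to merge two sorted lists", "LOW"),
    ("can you find api and build a python web app with it ?", "HIGH"),
]
texts, labels = split_few_shots(examples, seed=42)
print(list(zip(texts, labels)))
```

Because each pair is shuffled as one tuple, every text keeps its own label regardless of the resulting order.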

@@ -62,11 +62,13 @@ class FileFinder(Tools):
         file_path = None
         excluded_files = [".pyc", ".o", ".so", ".a", ".lib", ".dll", ".dylib", ".so", ".git"]
         for root, dirs, files in os.walk(directory_path):
-            for file in files:
-                if any(excluded_file in file for excluded_file in excluded_files):
+            for f in files:
+                if f is None:
                     continue
-                if filename.strip() in file.strip():
-                    file_path = os.path.join(root, file)
+                if any(excluded_file in f for excluded_file in excluded_files):
+                    continue
+                if filename.strip() in f.strip():
+                    file_path = os.path.join(root, f)
                     return file_path
         return None
@@ -82,27 +84,25 @@ class FileFinder(Tools):
if not blocks or not isinstance(blocks, list):
return "Error: No valid filenames provided"
results = []
output = ""
for block in blocks:
filename = block.split(":")[0]
filename = self.get_parameter_value(block, "name")
action = self.get_parameter_value(block, "action")
if filename is None:
output = "Error: No filename provided\n"
return output
if action is None:
action = "info"
file_path = self.recursive_search(self.work_dir, filename)
if file_path is None:
results.append({"filename": filename, "error": "File not found"})
output = f"File: {filename} - not found\n"
continue
if len(block.split(":")) > 1:
action = block.split(":")[1]
else:
action = "info"
result = self.get_file_info(file_path)
results.append(result)
output = ""
for result in results:
if "error" in result:
output += f"File: {result['filename']} - {result['error']}\n"
else:
if action == "read":
output += result['read']
output += "Content:\n" + result['read'] + "\n"
else:
output += (f"File: {result['filename']}, "
f"found at {result['path']}, "
@@ -144,7 +144,10 @@ class FileFinder(Tools):
 if __name__ == "__main__":
     tool = FileFinder()
-    result = tool.execute(["toto.txt"], False)
+    result = tool.execute(["""
+action=read
+name=tools.py
+"""], False)
     print("Execution result:")
     print(result)
     print("\nFailure check:", tool.execution_failure_check(result))
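
The recursive search in the FileFinder hunks walks the work directory, skips excluded files, and returns the first name that contains the requested string. Below is a minimal standalone sketch of that search. The function `find_file` and the temporary files are hypothetical. It also swaps in a suffix test (`str.endswith`) for the substring test used in the diff, because a substring check on `".o"` or `".a"` would also skip names such as `notes.org` or `data.avi`.

```python
import os
import tempfile
from typing import Optional

# Suffixes to skip; a subset of the list in the diff, for illustration only.
EXCLUDED_SUFFIXES = (".pyc", ".o", ".so", ".a", ".lib", ".dll", ".dylib")


def find_file(directory: str, filename: str) -> Optional[str]:
    """Return the path of the first file whose name contains `filename`."""
    wanted = filename.strip()
    for root, _dirs, files in os.walk(directory):
        for candidate in files:
            if candidate.endswith(EXCLUDED_SUFFIXES):
                continue  # compiled or binary artifact
            if wanted in candidate:
                return os.path.join(root, candidate)
    return None


with tempfile.TemporaryDirectory() as work_dir:
    nested = os.path.join(work_dir, "project")
    os.makedirs(nested)
    for name in ("config.txt", "config.pyc"):
        with open(os.path.join(nested, name), "w") as handle:
            handle.write("example")
    found = find_file(work_dir, "config")
    found_name = os.path.basename(found) if found else None
    missing = find_file(work_dir, "does_not_exist")

print(found_name, missing)
```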

@@ -127,6 +127,21 @@ class Tools():
         with open(os.path.join(directory, save_path_file), 'w') as f:
             f.write(block)
+    def get_parameter_value(self, block: str, parameter_name: str) -> str:
+        """
+        Get the value of a parameter from a block of name=value lines.
+        Args:
+            block (str): The block of text to search for the parameter
+            parameter_name (str): The name of the parameter to retrieve
+        Returns:
+            str: The value of the parameter, or None if it is not present
+        """
+        for param_line in block.split('\n'):
+            # Compare the key exactly and split on the first '=' only, so a
+            # value that contains the name or an '=' is handled correctly.
+            key, sep, value = param_line.partition('=')
+            if sep and key.strip() == parameter_name:
+                return value.strip()
+        return None
     def found_executable_blocks(self):
         """
         Check if executable blocks were found.
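
The `name=value` block format introduced by this commit can also be parsed in a single pass. The sketch below is a standalone illustration rather than the repository's code. The function `parse_parameters` is hypothetical, and the default action of `"info"` mirrors the fallback visible in the FileFinder hunk.

```python
from typing import Dict


def parse_parameters(block: str) -> Dict[str, str]:
    """Parse lines of the form key=value into a dictionary."""
    params: Dict[str, str] = {}
    for line in block.splitlines():
        # partition splits on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            params[key.strip()] = value.strip()
    return params


block = """
action=read
name=tools.py
"""
params = parse_parameters(block)
action = params.get("action", "info")  # fall back to "info" when no action is given
name = params.get("name")              # None signals a missing filename
print(action, name)
```

A block in the old format, such as a bare `toto.py` line, yields an empty dictionary here, so the caller can report the missing `name` instead of raising an exception.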