Merge pull request #92 from Fosowl/dev

Improve web form handling & Better prompt for planner agent.
Martin 2025-04-01 19:47:25 +02:00 committed by GitHub
commit d476cf91dc
21 changed files with 625 additions and 70 deletions

424
README_FR.md Normal file
View File

@ -0,0 +1,424 @@
# AgenticSeek: A Manus-like AI powered by DeepSeek R1 agents running locally.
A **fully local** alternative to Manus AI: a voice-enabled AI assistant that codes, explores your filesystem, browses the web and corrects its own mistakes, all without sending any data to the cloud. Built with reasoning models such as DeepSeek R1, this autonomous agent runs entirely on your hardware, keeping your data private.
[![Visit AgenticSeek](https://img.shields.io/static/v1?label=Website&message=AgenticSeek&color=blue&style=flat-square)](https://fosowl.github.io/agenticSeek.html) ![License](https://img.shields.io/badge/license-GPL--3.0-green) [![Discord](https://img.shields.io/badge/Discord-Join%20Us-7289DA?logo=discord&logoColor=white)](https://discord.gg/4Ub2D6Fj)
> 🛠️ **Work in progress**: we are actively looking for contributors!
![alt text](./media/whale_readme.jpg)
> *Do a deep search of AI startup in Osaka and Tokyo, find at least 5, then save in the research_japan.txt file*
> *Can you make a tetris game in C ?*
> *I would like to setup a new project file index as mark2.*
### agenticSeek can plan tasks!
![alt text](./media/exemples/demo_image.png)
## Features:
- **100% Local**: Runs locally on your PC. Your data stays yours.
- **File access**: Uses bash to navigate and manipulate your files.
- **Semi-autonomous coding**: Can write, debug and run code in Python, C and Golang, with more languages to come.
- **Agent routing**: Automatically selects the appropriate agent for the task.
- **Planning**: Uses several agents for complex tasks.
- **Autonomous web browsing**: Browses the web on its own.
- **Efficient memory**: Efficient memory and session management.
---
## **Installation**
Make sure you have the Chrome driver (chromedriver), Docker and Python 3.10 (or newer) installed.
For Chrome driver issues, see the **Chromedriver** section.
### 1️⃣ **Clone the repository and set it up**
```sh
git clone https://github.com/Fosowl/agenticSeek.git
cd agenticSeek
mv .env.example .env
```
### 2️⃣ **Create a virtual environment**
```sh
python3 -m venv agentic_seek_env
source agentic_seek_env/bin/activate
# On Windows: agentic_seek_env\Scripts\activate
```
### 3️⃣ **Install**
**Automatic:**
```sh
./install.sh
```
**Manual:**
```sh
pip3 install -r requirements.txt
```
## Run it on your machine
**We recommend using at least DeepSeek 14B; smaller models struggle with tool use and quickly forget the context.**
### 1️⃣ **Download the model**
Make sure you have [Ollama](https://ollama.com/) installed.
Download `deepseek-r1:14b` from [DeepSeek](https://deepseek.com/models):
```sh
ollama pull deepseek-r1:14b
```
### 2️⃣ **Start Ollama**
```sh
ollama serve
```
Edit the config.ini file to set provider_name to ollama and provider_model to deepseek-r1:14b:
```sh
[MAIN]
is_local = True
provider_name = ollama
provider_model = deepseek-r1:14b
provider_server_address = 127.0.0.1:11434
```
Start all the services:
```sh
sudo ./start_services.sh
```
Launch the assistant:
```sh
python3 main.py
```
See the **Usage** section if you are unsure how to use it.
See the **Known issues** section if you run into problems.
See the **Run with an API** section if your hardware cannot run DeepSeek locally.
See the **Config** section for a detailed explanation of the configuration file.
---
## Usage
Warning: currently, the system that picks the best AI agent works poorly with non-English text, because agent routing uses a model trained on English text. We are working hard to fix this. Please use English for now.
Make sure the services are running with ./start_services.sh, then launch AgenticSeek with python3 main.py:
```sh
sudo ./start_services.sh
python3 main.py
```
You will see a prompt: ">>> "
This means AgenticSeek is waiting for your instructions.
You can also use speech recognition by setting listen = True in the configuration.
To quit, simply say `goodbye`.
Here are some usage examples:
### Coding
> *Help me with matrix multiplication in Golang*
> *Scan my network with nmap, find if any suspicious devices is connected*
> *Make a snake game in python*
### Web search
> *Do a web search to find cool tech startup in Japan working on cutting edge AI research*
> *Can you find on the internet who created agenticSeek?*
> *Can you find on which website I can buy a rtx 4090 for cheap*
### Files
> *Hey can you find where is million_dollars_contract.pdf i lost it*
> *Show me how much space I have left on my disk*
> *Find and read the README.md and follow the install instruction*
### Conversation
> *Tell me about France*
> *What is the meaning of life ?*
> *Should I take creatine before or after workout?*
After you enter your query, AgenticSeek assigns the best agent for the task.
Because this is a prototype, the agent routing system may not always assign the right agent for your query.
You should therefore be explicit about what you want and how the AI should go about it. For example, if you want it to search the web, do not say:
Do you know any good countries for solo travel?
Say instead:
Do a web search and find out which countries are best for solo travel.
---
## **Run the LLM on your own server**
If you have a powerful computer or a server that you want to use from your laptop, you can run the LLM on a remote server.
### 1️⃣ **Set up and start the server scripts**
On the "server" that will run the AI model, get the IP address:
```sh
ip a | grep "inet " | grep -v 127.0.0.1 | awk '{print $2}' | cut -d/ -f1
```
Note: on Windows or macOS, use ipconfig or ifconfig respectively to find the IP address.
**If you want to use an OpenAI-based provider, follow the Run with an API section.**
Clone the repository and enter the server/ folder.
```sh
git clone --depth 1 https://github.com/Fosowl/agenticSeek.git
cd agenticSeek/server/
```
Install the server-specific dependencies:
```sh
pip3 install -r requirements.txt
```
Run the server script.
```sh
python3 app.py --provider ollama --port 3333
```
You can choose between ollama and llamacpp as the LLM service.
### 2️⃣ **Run it**
Now, on your personal computer:
Edit the config.ini file to set provider_name to server and provider_model to deepseek-r1:14b.
Set provider_server_address to the IP address of the machine that runs the model.
```sh
[MAIN]
is_local = False
provider_name = server
provider_model = deepseek-r1:14b
provider_server_address = x.x.x.x:3333
```
Run the assistant:
```sh
sudo ./start_services.sh
python3 main.py
```
## **Run with an API**
WARNING: Make sure there are no trailing spaces in the configuration.
Set is_local to True if you are using an OpenAI-based API locally.
Change the IP address if your OpenAI-based API runs on your own server.
```sh
[MAIN]
is_local = False
provider_name = openai
provider_model = gpt-4o
provider_server_address = 127.0.0.1:5000
```
Run the assistant:
```sh
sudo ./start_services.sh
python3 main.py
```
## Config
Example configuration:
```
[MAIN]
is_local = True
provider_name = ollama
provider_model = deepseek-r1:1.5b
provider_server_address = 127.0.0.1:11434
agent_name = Friday
recover_last_session = False
save_session = False
speak = False
listen = False
work_dir = /Users/mlg/Documents/ai_folder
jarvis_personality = False
[BROWSER]
headless_browser = False
stealth_mode = False
```
**Explanation**:
`is_local` -> Runs the agent locally (True) or on a remote server (False).
`provider_name` -> The provider to use (one of: ollama, server, lm-studio, deepseek-api).
`provider_model` -> The model to use, for example deepseek-r1:1.5b.
`provider_server_address` -> Server address, for example 127.0.0.1:11434 for a local setup. Can be set to anything for a non-local API.
`agent_name` -> Name of the agent, for example Friday. Used as the trigger word for speech recognition.
`recover_last_session` -> Resumes the last session (True) or not (False).
`save_session` -> Saves the session data (True) or not (False).
`speak` -> Enables voice output (True) or not (False).
`listen` -> Listens for voice input (True) or not (False).
`work_dir` -> Folder the AI will have access to, for example: /Users/user/Documents/.
`jarvis_personality` -> Uses a JARVIS-like personality (True) or not (False). This only changes the prompt file.
`headless_browser` -> Runs the browser without a visible window (True) or not (False).
`stealth_mode` -> Makes bot detection harder. The only downside is that you have to install the anticaptcha extension manually.
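The options above can be read with Python's standard `configparser`, which is also what the `config.getboolean('MAIN', ...)` calls in `main.py` suggest. The following is a minimal sketch, not the project's loader: the `load_settings` helper and the inline `SAMPLE` text are illustrative assumptions.

```python
import configparser

# Inline sample mirroring the documented keys (illustrative values only).
SAMPLE = """
[MAIN]
is_local = True
provider_name = ollama
provider_model = deepseek-r1:14b
provider_server_address = 127.0.0.1:11434
speak = False
listen = False

[BROWSER]
headless_browser = False
stealth_mode = False
"""


def load_settings(text: str) -> dict:
    """Parse config.ini-style text into typed Python values (hypothetical helper)."""
    config = configparser.ConfigParser()
    config.read_string(text)
    main = config["MAIN"]
    return {
        # getboolean accepts True/False, yes/no, on/off and 1/0.
        "is_local": main.getboolean("is_local"),
        "provider_name": main.get("provider_name"),
        "provider_model": main.get("provider_model"),
        "provider_server_address": main.get("provider_server_address"),
        "speak": main.getboolean("speak"),
        "listen": main.getboolean("listen"),
        "headless_browser": config.getboolean("BROWSER", "headless_browser"),
    }


settings = load_settings(SAMPLE)
print(settings["provider_name"], settings["provider_model"], settings["is_local"])
```

Boolean options come back as real `bool` values, while the model name and server address stay plain strings.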
## Providers
The table below lists the available providers:
| Provider | Local? | Description |
|-----------|--------|-----------------------------------------------------------|
| ollama | Yes | Run LLMs locally with ease using Ollama as the LLM provider |
| server | Yes | Host the model on another machine and run the agent on your local machine |
| lm-studio | Yes | Run an LLM locally with LM Studio (set provider_name to lm-studio) |
| openai | No | Uses the ChatGPT API (not private) |
| deepseek-api | No | DeepSeek API (not private) |
| huggingface| No | Hugging Face API (not private) |
To select a provider, edit config.ini:
```
is_local = False
provider_name = openai
provider_model = gpt-4o
provider_server_address = 127.0.0.1:5000
```
`is_local`: must be True for any LLM run locally, otherwise False.
`provider_name`: Selects the provider to use by name; see the provider list above.
`provider_model`: Sets the model used by the agent.
`provider_server_address`: can be set to anything if you are not using the server provider.
# Known issues
## Chromedriver issues
Error #1: **version mismatch**
`Exception: Failed to initialize browser: Message: session not created: This version of ChromeDriver only supports Chrome version 113
Current browser version is 134.0.6998.89 with binary path`
This happens when your browser version and your chromedriver version do not match.
Download the latest version from:
https://developer.chrome.com/docs/chromedriver/downloads
If you are using Chrome version 115 or newer, go to:
https://googlechromelabs.github.io/chrome-for-testing/
and download the chromedriver version that matches your operating system.
![alt text](./media/chromedriver_readme.png)
If this section is incomplete, please open an issue.
## FAQ
**Q: What hardware do I need?**
7B model: GPU with 8 GB of VRAM.
14B model: 12 GB GPU (for example, RTX 3060).
32B model: 24 GB+ of VRAM.
**Q: Why Deepseek R1 over other models?**
DeepSeek R1 excels at reasoning and tool use for its size. We think it is a solid fit for our needs; other models also work well, but DeepSeek is our primary choice.
**Q: I get an error running `main.py`. What do I do?**
Make sure Ollama is running (ollama serve), that your config.ini matches your provider, and that the dependencies are installed. If it still does not work, feel free to open an issue.
**Q: Can it really run 100% locally?**
Yes. With the Ollama or Server providers, speech recognition, the LLM and text-to-speech all run locally. Non-local options (OpenAI or other APIs) are optional.
**Q: How come it is older than manus ?**
We started it as a fun project to build a local, Jarvis-like AI. With the rise of Manus, we saw an opportunity to redirect some of the work and make it yet another alternative.
**Q: How is it better than manus ?**
It is not, but we prioritize local execution and privacy over a cloud-based approach. It is a fun and accessible alternative!
## Contribute
We are looking for developers to improve AgenticSeek! Check out the open issues or discussions.
[![Star History Chart](https://api.star-history.com/svg?repos=Fosowl/agenticSeek&type=Date)](https://www.star-history.com/#Fosowl/agenticSeek&Date)
## Authors:
> [Fosowl](https://github.com/Fosowl)
> [steveh8758](https://github.com/steveh8758)

View File

@ -57,7 +57,8 @@ def main():
interaction = Interaction(agents,
tts_enabled=config.getboolean('MAIN', 'speak'),
stt_enabled=config.getboolean('MAIN', 'listen'),
recover_last_session=config.getboolean('MAIN', 'recover_last_session'))
recover_last_session=config.getboolean('MAIN', 'recover_last_session'),
)
try:
while interaction.is_active:
interaction.get_user()

View File

@ -47,4 +47,5 @@ Some rules:
- Do not ever use user input, input are not supported by the system.
- Do not ever tell user how to run it. user know it.
- For simple explanation you don't need to code.
- Don't be lazy, write the full implementation for the user.
- If query is unclear say REQUEST_CLARIFICATION

View File

@ -42,6 +42,7 @@ rules:
- Use file finder to find the path of the file.
- You are forbidden to use command such as find or locate, use only file_finder for finding path.
- Do not ever use editor such as vim or nano.
- Make sure to always cd into your work folder before executing commands, like cd <work dir> && <your command>
Example Interaction
User: "I need to find the file config.txt and read its contents."

View File

@ -1,8 +1,8 @@
You are a planner agent.
You are a project manager.
Your goal is to divide and conquer the task using the following agents:
- Coder: An expert coder agent.
- File: An expert agent for finding files.
- Web: An expert agent for web search.
- Coder: A programming agent that can code in Python, Bash, C and Golang.
- File: An agent for finding, reading or operating on files.
- Web: An agent that can conduct web searches; it is wrapped with Selenium, so it can interact with web pages.
Agents are other AI that obey your instructions.
@ -13,14 +13,18 @@ You have to respect a strict format:
{"agent": "agent_name", "need": "needed_agent_output", "task": "agent_task"}
```
# Example 1: web app
User: make a weather app in python
You: Sure, here is the plan:
## Task 1: I will search for available weather api
## Task 1: I will search for available weather api with the help of the web agent.
## Task 2: I will create an api key for the weather api
## Task 2: I will create an api key for the weather api using the web agent
## Task 3: I will make a weather app in python
## Task 3: I will setup the project using the file agent
## Task 4: I assign the coding agent to make a weather app in python
```json
{
@ -37,11 +41,17 @@ You: Sure, here is the plan:
"need": "1",
"task": "Obtain API key from the selected service"
},
{
"agent": "File",
"id": "3",
"need": null,
"task": "Create and set up a web app folder for a Python project. Initialize it as a git repo with all required files and a sources folder. You are forbidden from asking for clarification, just execute."
},
{
"agent": "Coder",
"id": "4",
"need": "2",
"task": "Develop a Python application using the API and key to fetch and display weather data"
"need": "2,3",
"task": "Based on the project structure, develop a Python application using the API and key to fetch and display weather data. You are forbidden from asking for clarification, just execute."
}
]
}
@ -49,4 +59,8 @@ You: Sure, here is the plan:
Rules:
- Do not write code. You are a planning agent.
- Give clear, detailed orders to each agent and explain how their task relates to the previous task (if any).
- Put your plan in a json with the key "plan".
- Always tell the coding agent where to save file, eg: .
- If using multiple coder agents, specify how each one interacts with the files of the previous coding agent, if any.
- Tell agents they are soldiers: they execute without question.
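The plan format above links tasks through `id` and `need`, where `need` may be null, a single id, or a comma-separated list such as `"2,3"`. The sketch below shows one way a consumer could order such a plan so that every task runs after the tasks it needs. It is an illustration under assumptions: `parse_needs` and `order_tasks` are hypothetical helpers, and the first two entries of `PLAN_JSON` are made up because the diff elides them.

```python
import json
from typing import Dict, List

# Sample plan in the documented format; the first two entries are assumed.
PLAN_JSON = """
{
  "plan": [
    {"agent": "Web", "id": "1", "need": null, "task": "Search for a weather API"},
    {"agent": "Web", "id": "2", "need": "1", "task": "Obtain an API key"},
    {"agent": "File", "id": "3", "need": null, "task": "Set up the project folder"},
    {"agent": "Coder", "id": "4", "need": "2,3", "task": "Write the weather app"}
  ]
}
"""


def parse_needs(need) -> List[str]:
    """Turn a "need" value such as None, "1" or "2,3" into a list of task ids."""
    if not need:
        return []
    return [part.strip() for part in str(need).split(",") if part.strip()]


def order_tasks(plan_text: str) -> List[Dict]:
    """Return the tasks so that each one comes after the tasks it needs."""
    tasks = json.loads(plan_text)["plan"]
    ids = [task["id"] for task in tasks]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate task id in plan")
    done, ordered = set(), []
    pending = list(tasks)
    while pending:
        # A task is ready once all the ids it needs have been completed.
        ready = [t for t in pending if all(n in done for n in parse_needs(t["need"]))]
        if not ready:
            raise ValueError("plan has a missing or circular dependency")
        for task in ready:
            ordered.append(task)
            done.add(task["id"])
        pending = [t for t in pending if t["id"] not in done]
    return ordered


for task in order_tasks(PLAN_JSON):
    print(task["id"], task["agent"], "<-", parse_needs(task["need"]))
```

The duplicate-id check matters for this format: two tasks sharing an id make a `need` reference ambiguous.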

View File

@ -46,6 +46,7 @@ Some rules:
- You do not ever need to use bash to execute code.
- Do not ever use user input, input are not supported by the system.
- Do not ever tell user how to run it. user know it.
- Don't be lazy, write the full implementation for the user.
- For simple explanation you don't need to code.
- If query is unclear say REQUEST_CLARIFICATION

View File

@ -51,6 +51,7 @@ rules:
- Do not ever use placeholder path like /path/to/file.c, find the path first.
- Use file finder to find the path of the file.
- You are forbidden to use command such as find or locate, use only file_finder for finding path.
- Make sure to always cd into your work folder before executing commands, like cd <work dir> && <your command>
- Do not ever use editor such as vim or nano.
Example Interaction

View File

@ -15,16 +15,17 @@ You have to respect a strict format:
# Example: weather app
User: "I need a plan to build a weather app—search for a weather API, get an API key, and code it in Python."
User: "I need to build a simple weather app, get an API key, and code it in Python."
You: "At your service. I've devised a plan to conquer the meteorological frontier.
You: "At your service. I've devised a plan and assigned agents to each task. Would you like me to proceed?
## Task one: scour the web for a weather API worth its salt.
## Task 1: I will search for available weather api with the help of the web agent.
## Task two: secure an API key with utmost discretion.
## Task 2: I will create an api key for the weather api using the web agent.
## Task three: unleash a Python app to bend the weather to your will."
## Task 3: I will setup the project using the file agent.
## Task 4: I will use the coding agent to make a weather app in python.
```json
{
@ -41,11 +42,17 @@ You: "At your service. Ive devised a plan to conquer the meteorological fron
"need": "1",
"task": "Obtain API key from the selected service"
},
{
"agent": "File",
"id": "3",
"need": null,
"task": "Create and set up a web app folder for a Python project. Initialize it as a git repo with all required files and a sources folder. You are forbidden from asking for clarification, just execute."
},
{
"agent": "Coder",
"id": "4",
"need": "2",
"task": "Develop a Python application using the API and key to fetch and display weather data"
"need": "2,3",
"task": "Based on the project structure, develop a Python application using the API and key to fetch and display weather data. You are forbidden from asking for clarification, just execute."
}
]
}
@ -53,7 +60,11 @@ You: "At your service. Ive devised a plan to conquer the meteorological fron
Rules:
- Do not write code. You are a planning agent.
- Give clear, detailed orders to each agent and explain how their task relates to the previous task (if any).
- Put your plan in a json with the key "plan".
- Always tell the coding agent where to save file, eg: .
- If using multiple coder agents, specify how each one interacts with the files of the previous coding agent, if any.
- Tell agents they are soldiers: they execute without question.
Personality:

View File

@ -133,6 +133,8 @@ class Agent():
Show the answer in a pretty way.
Show code blocks and their respective feedback by inserting them in the response.
"""
if self.last_answer is None:
return
lines = self.last_answer.split("\n")
for line in lines:
if "block:" in line:
@ -190,5 +192,5 @@ class Agent():
self.memory.push('user', feedback)
if save_path != None:
tool.save_block(blocks, save_path)
self.blocks_result = list(reversed(self.blocks_result))
self.blocks_result = self.blocks_result
return True, feedback

View File

@ -6,7 +6,7 @@ from sources.agents.agent import Agent
from sources.tools.searxSearch import searxSearch
from sources.browser import Browser
from datetime import date
from typing import List, Tuple
from typing import List, Tuple, Type, Dict
class BrowserAgent(Agent):
def __init__(self, name, prompt_path, provider, verbose=False, browser=None):
@ -92,7 +92,7 @@ class BrowserAgent(Agent):
Your task:
1. Decide if the current page answers the user's query: {user_prompt}
- If it does, take notes of the useful information, write down source, link or reference, then move to a new page.
- If it does and you are 100% certain that it provides a definitive answer, say REQUEST_EXIT
- If it does and you completed user request, say REQUEST_EXIT
- If it doesn't, say: Error: This page does not answer the user's query then go back or navigate to another link.
2. Navigate by either:
- Navigate to a navigation links (write the full URL, e.g., www.example.com/cats).
@ -100,7 +100,7 @@ class BrowserAgent(Agent):
3. Fill forms on the page:
- If the user gives you information that helps you fill a form, fill it.
- If you don't know how to fill a form, leave it empty.
- You can fill a form using [form_name](value).
- You can fill a form using [form_name](value). Do not go back when you fill a form.
Recap of note taking:
If useful -> Note: [Briefly summarize the key information or task you conducted.]
@ -125,8 +125,8 @@ class BrowserAgent(Agent):
Example 4 (login form visible):
Note: I am on the login page, I should now type the given username and password.
[form_name_1](David)
[form_name_2](edgerunners_2077)
[username_field](David)
[password_field](edgerunners77)
You see the following inputs forms:
{inputs_form_text}
@ -143,9 +143,10 @@ class BrowserAgent(Agent):
animate_thinking("Thinking...", color="status")
self.memory.push('user', prompt)
answer, reasoning = self.llm_request()
pretty_print("-"*100)
pretty_print(answer, color="output")
pretty_print("-"*100)
output = answer if len(answer) > 16 else f"Action: {answer}\nReasoning: {reasoning}"
print()
pretty_print(output, color="output")
print()
return answer, reasoning
def select_unvisited(self, search_result: List[str]) -> List[str]:
@ -175,7 +176,7 @@ class BrowserAgent(Agent):
return parsed_results
def stringify_search_results(self, results_arr: List[str]) -> str:
return '\n\n'.join([f"Link: {res['link']}" for res in results_arr])
return '\n\n'.join([f"Link: {res['link']}\nPreview: {res['snippet']}" for res in results_arr])
def save_notes(self, text):
lines = text.split('\n')
@ -214,19 +215,51 @@ class BrowserAgent(Agent):
Do not explain, do not write anything beside the search query.
If the query does not make any sense for a web search explain why and say REQUEST_EXIT
"""
    def handle_update_prompt(self, user_prompt: str, page_text: str) -> str:
        return f"""
        You are a web browser.
        You just filled a form on the page.
        Now you should see the result of the form submission on the page:
        Page text:
        {page_text}
        The user asked: {user_prompt}
        Does the page answer the user's query now?
        If it does, take notes of the useful information, write down result and say FORM_FILLED.
        If you were previously on a login form, no need to explain.
        If it does and you completed user request, say REQUEST_EXIT
        If it doesn't, say: Error: This page does not answer the user's query then GO_BACK.
        """

    def show_search_results(self, search_result: List[str]):
        pretty_print("\nSearch results:", color="output")
        for res in search_result:
            pretty_print(f"Title: {res['title']} - ", color="info", no_newline=True)
            pretty_print(f"Link: {res['link']}", color="status")
def process(self, user_prompt, speech_module) -> str:
def process(self, user_prompt: str, speech_module: type) -> Tuple[str, str]:
"""
Process the user prompt to conduct an autonomous web search.
Start with a web search through SearxNG using the web_search tool.
Then enter a navigation loop to find the answer or carry out the required actions.
Args:
user_prompt: The user's input query
speech_module: Optional speech output module
Returns:
tuple containing the final answer and reasoning
"""
complete = False
animate_thinking(f"Thinking...", color="status")
self.memory.push('user', self.search_prompt(user_prompt))
ai_prompt, reasoning = self.llm_request()
if "REQUEST_EXIT" in ai_prompt:
# the request makes no sense for a web search, maybe the wrong agent was allocated?
pretty_print(f"{reasoning}\n{ai_prompt}", color="output")
return ai_prompt, ""
animate_thinking(f"Searching...", color="status")
search_result_raw = self.tools["web_search"].execute([ai_prompt], False)
search_result = self.jsonify_search_results(search_result_raw)[:12] # until further improvement
self.show_search_results(search_result)
prompt = self.make_newsearch_prompt(user_prompt, search_result)
unvisited = [None]
while not complete:
@ -236,7 +269,10 @@ class BrowserAgent(Agent):
extracted_form = self.extract_form(answer)
if len(extracted_form) > 0:
self.browser.fill_form_inputs(extracted_form)
self.browser.find_and_click_submit()
self.browser.find_and_click_submission()
page_text = self.browser.get_text()
prompt = self.handle_update_prompt(user_prompt, page_text)
answer, reasoning = self.llm_decide(prompt)
if "REQUEST_EXIT" in answer:
complete = True
@ -246,6 +282,12 @@ class BrowserAgent(Agent):
if len(unvisited) == 0:
break
if "FORM_FILLED" in answer:
page_text = self.browser.get_text()
self.navigable_links = self.browser.get_navigable()
prompt = self.make_navigation_prompt(user_prompt, page_text)
continue
if len(links) == 0 or "GO_BACK" in answer:
unvisited = self.select_unvisited(search_result)
prompt = self.make_newsearch_prompt(user_prompt, unvisited)
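The browser prompt asks the model to fill forms with the `[form_name](value)` syntax, and the agent then calls `extract_form` on the answer. The implementation of `extract_form` is not part of this diff, so the following is only a sketch of how such extraction could work with a regular expression; `FORM_PATTERN` and `extract_form_fields` are hypothetical names.

```python
import re
from typing import List, Tuple

# Matches [name](value); the value may be empty, as in "[appOtp]()".
FORM_PATTERN = re.compile(r"\[([^\]\n]+)\]\(([^)\n]*)\)")


def extract_form_fields(answer: str) -> List[Tuple[str, str]]:
    """Return (field_name, value) pairs found in an LLM answer (hypothetical helper)."""
    return [(name.strip(), value) for name, value in FORM_PATTERN.findall(answer)]


answer = (
    "Note: I am on the login page, I should now type the given username and password.\n"
    "[username_field](David)\n"
    "[password_field](edgerunners77)\n"
    "[appOtp]()"
)
for name, value in extract_form_fields(answer):
    print(f"{name} = {value!r}")
```

Two limits of this pattern are worth knowing: a Markdown link such as `[text](https://example.com)` also matches, and a value containing `)` is cut at the first closing parenthesis.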

View File

@ -76,10 +76,10 @@ class PlannerAgent(Agent):
agents_tasks = self.parse_agent_tasks(json_plan)
if agents_tasks == (None, None):
return
pretty_print(f"--- Plan ---", color="output")
pretty_print("▂▘ P L A N ▝▂", color="output")
for task_name, task in agents_tasks:
pretty_print(f"{task}", color="output")
pretty_print(f"--- End of Plan ---", color="output")
pretty_print(f"{task['agent']} -> {task['task']}", color="info")
pretty_print("▔▗ E N D ▖▔", color="output")
def process(self, prompt, speech_module) -> str:
ok = False

View File

@ -6,10 +6,9 @@ from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from typing import List, Tuple, Type, Dict
from bs4 import BeautifulSoup
from urllib.parse import urlparse
from typing import List, Tuple
from fake_useragent import UserAgent
from selenium_stealth import stealth
import undetected_chromedriver as uc
@ -126,13 +125,26 @@ class Browser:
         try:
             initial_handles = self.driver.window_handles
             self.driver.get(url)
-            time.sleep(1)
+            wait = WebDriverWait(self.driver, timeout=30)
+            wait.until(
+                lambda driver: (
+                    driver.execute_script("return document.readyState") == "complete" and
+                    not any(keyword in driver.page_source.lower() for keyword in ["checking your browser", "verifying", "captcha"])
+                ),
+                message="stuck on 'checking browser' or verification screen"
+            )
             self.apply_web_safety()
             self.logger.info(f"Navigated to: {url}")
             return True
+        except TimeoutException as e:
+            self.logger.error(f"Timeout waiting for {url} to load: {str(e)}")
+            return False
         except WebDriverException as e:
             self.logger.error(f"Error navigating to {url}: {str(e)}")
             return False
+        except Exception as e:
+            self.logger.error(f"Fatal error with go_to method on {url}:\n{str(e)}")
+            raise e
def is_sentence(self, text:str) -> bool:
"""Check if the text qualifies as a meaningful sentence or contains important error codes."""
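The new wait condition in `go_to` combines two checks: the document has finished loading, and the page source contains none of the "verification screen" keywords. Pulling that predicate into a plain function makes it testable without a browser. This is a sketch, not code from the commit; `page_is_ready` and `BLOCK_KEYWORDS` are hypothetical names, and the keyword list is copied from the diff.

```python
from typing import Iterable

# Keywords taken from the wait condition in the diff.
BLOCK_KEYWORDS = ("checking your browser", "verifying", "captcha")


def page_is_ready(ready_state: str, page_source: str,
                  keywords: Iterable[str] = BLOCK_KEYWORDS) -> bool:
    """True when the page has loaded and shows no verification-screen keyword."""
    if ready_state != "complete":
        return False
    lowered = page_source.lower()
    return not any(keyword in lowered for keyword in keywords)


print(page_is_ready("complete", "<html><body>Weather today: sunny</body></html>"))
print(page_is_ready("complete", "<html><body>Checking your browser...</body></html>"))
print(page_is_ready("loading", "<html></html>"))
```

With Selenium, the same function could be called inside the lambda passed to `WebDriverWait.until`. One consequence of the keyword check is that a page which legitimately mentions "captcha" or "verifying" never counts as ready, so the wait runs until it times out.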
@ -199,7 +211,7 @@ class Browser:
return False
return True
def get_navigable(self) -> [str]:
def get_navigable(self) -> List[str]:
"""Get all navigable links on the current page."""
try:
links = []
@ -301,13 +313,45 @@ class Browser:
result.sort(key=lambda x: len(x[0]))
return result
-    def find_and_click_submit(self, btn_type:str = 'login') -> None:
+    def find_and_click_submission(self, timeout: int = 10) -> bool:
+        possible_submissions = ["login", "submit", "register"]
+        for submission in possible_submissions:
+            if self.find_and_click_btn(submission, timeout):
+                return True
+        self.logger.warning("No submission button found")
+        return False
+
+    def find_and_click_btn(self, btn_type: str = 'login', timeout: int = 10) -> bool:
+        """
+        Find and click a submit button matching the specified type.
+        Args:
+            btn_type: The type of button to find.
+            timeout: time to wait for button to appear.
+        Returns:
+            bool: True if the button was found and clicked, False otherwise.
+        """
         buttons = self.get_buttons_xpath()
-        if len(buttons) == 0:
-            self.logger.warning(f"No visible buttons found")
-        for button in buttons:
-            if button[0] == btn_type:
-                self.click_element(button[1])
+        if not buttons:
+            self.logger.warning("No visible buttons found")
+            return False
+        for button_text, xpath in buttons:
+            if btn_type.lower() in button_text.lower():
+                try:
+                    wait = WebDriverWait(self.driver, timeout)
+                    element = wait.until(
+                        EC.element_to_be_clickable((By.XPATH, xpath)),
+                        message=f"Button with XPath '{xpath}' not clickable within {timeout} seconds"
+                    )
+                    if self.click_element(xpath):
+                        return True
+                    else:
+                        return False
+                except TimeoutException:
+                    self.logger.warning(f"Timeout waiting for '{button_text}' button at XPath: {xpath}")
+                    return False
+        self.logger.warning(f"No button matching '{btn_type}' found")
+        return False
def find_input_xpath_by_name(self, inputs, name: str) -> str | None:
for field in inputs:
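The key behavioural change in the button lookup is the switch from an exact match (`button[0] == btn_type`) to a case-insensitive substring match, tried for "login", "submit" and "register" in that order. The sketch below isolates that selection logic from Selenium so it can be checked on plain data; `pick_button` is a hypothetical helper, and the `(text, xpath)` pair shape is inferred from the loop in the diff.

```python
from typing import List, Optional, Tuple

# Priority order used by find_and_click_submission in the diff.
POSSIBLE_SUBMISSIONS = ["login", "submit", "register"]


def pick_button(buttons: List[Tuple[str, str]],
                wanted: List[str] = POSSIBLE_SUBMISSIONS) -> Optional[str]:
    """Return the XPath of the first button whose text contains a wanted label."""
    for label in wanted:
        for text, xpath in buttons:
            # Case-insensitive substring match, so "LOGIN" and "Log in / Login" both hit.
            if label.lower() in text.lower():
                return xpath
    return None


buttons = [
    ("Sign up for newsletter", "//button[1]"),
    ("Submit form", "//button[2]"),
    ("LOGIN", "//button[3]"),
]
print(pick_button(buttons))
```

Because labels are tried in priority order, a "LOGIN" button wins over a "Submit form" button even when it appears later on the page. Substring matching is more forgiving than equality, but it can also match unrelated buttons whose text happens to contain one of the labels.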
@ -315,8 +359,11 @@ class Browser:
return field["xpath"]
return None
def fill_form_inputs(self, input_list:[str]) -> bool:
def fill_form_inputs(self, input_list: List[str]) -> bool:
"""Fill form inputs based on a list of [name](value) strings."""
if not isinstance(input_list, list):
self.logger.error("input_list must be a list")
return False
inputs = self.find_all_inputs()
try:
for input_str in input_list:
@ -389,7 +436,7 @@ if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
driver = create_driver()
browser = Browser(driver)
browser = Browser(driver, anticaptcha_manual_install=True)
time.sleep(10)
print("AntiCaptcha Test")
@ -400,4 +447,5 @@ if __name__ == "__main__":
inputs = browser.get_form_inputs()
inputs = ['[username](student)', f'[password](Password123)', '[appOtp]()', '[backupOtp]()']
browser.fill_form_inputs(inputs)
browser.find_and_click_submit()
browser.find_and_click_submission()
time.sleep(30)

View File

@ -1,3 +1,4 @@
from typing import List, Tuple, Type, Dict
from sources.text_to_speech import Speech
from sources.utility import pretty_print, animate_thinking
@ -11,7 +12,8 @@ class Interaction:
def __init__(self, agents,
tts_enabled: bool = True,
stt_enabled: bool = True,
recover_last_session: bool = False):
recover_last_session: bool = False,
):
self.is_active = True
self.current_agent = None
self.last_query = None
@ -99,7 +101,7 @@ class Interaction:
query = self.read_stdin()
if query is None:
self.is_active = False
self.last_query = "Goodbye (exit requested by user, dont think, make answer very short)"
self.last_query = None
return None
self.last_query = query
return query
@ -112,7 +114,6 @@ class Interaction:
if agent is None:
return False
if self.current_agent != agent and self.last_answer is not None:
## get last history from previous agent
self.current_agent.memory.push('user', self.last_query)
self.current_agent.memory.push('assistant', self.last_answer)
self.current_agent = agent

View File

@ -1,3 +1,4 @@
from typing import List, Tuple, Type, Dict
import langid
import re
import nltk

View File

@ -1,17 +1,17 @@
import os
import time
import ollama
from ollama import chat
import requests
import subprocess
import ipaddress
import httpx
import platform
from dotenv import load_dotenv, set_key
from openai import OpenAI
from huggingface_hub import InferenceClient
import os
import httpx
from typing import List, Tuple, Type, Dict
from sources.utility import pretty_print, animate_thinking
class Provider:

View File

@ -1,11 +1,12 @@
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import time
import datetime
import uuid
import os
import sys
import json
from typing import List, Tuple, Type, Dict
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

View File

@ -1,6 +1,8 @@
import os
import sys
import torch
from typing import List, Tuple, Type, Dict
from transformers import pipeline
from adaptive_classifier import AdaptiveClassifier

View File

@ -1,4 +1,5 @@
from colorama import Fore
from typing import List, Tuple, Type, Dict
import queue
import threading
import numpy as np

View File

@ -1,30 +1,33 @@
import re
import platform
import subprocess
from sys import modules
from typing import List, Tuple, Type, Dict
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import subprocess
import re
import platform
from sys import modules
class Speech():
"""
Speech is a class for generating speech from text.
"""
def __init__(self, enable: bool = True, language: str = "english") -> None:
def __init__(self, enable: bool = True, language: str = "en", voice_idx: int = 0) -> None:
self.lang_map = {
"english": 'a',
"chinese": 'z',
"french": 'f'
"en": 'a',
"zh": 'z',
"fr": 'f'
}
self.voice_map = {
"english": ['af_alloy', 'af_bella', 'af_kore', 'af_nicole', 'af_nova', 'af_sky', 'am_echo', 'am_michael', 'am_puck'],
"chinese": ['zf_xiaobei', 'zf_xiaoni', 'zf_xiaoxiao', 'zf_xiaoyi', 'zm_yunjian', 'zm_yunxi', 'zm_yunxia', 'zm_yunyang'],
"french": ['ff_siwis']
"en": ['af_kore', 'af_bella', 'af_alloy', 'af_nicole', 'af_nova', 'af_sky', 'am_echo', 'am_michael', 'am_puck'],
"zh": ['zf_xiaobei', 'zf_xiaoni', 'zf_xiaoxiao', 'zf_xiaoyi', 'zm_yunjian', 'zm_yunxi', 'zm_yunxia', 'zm_yunyang'],
"fr": ['ff_siwis']
}
self.pipeline = None
self.language = language
if enable:
self.pipeline = KPipeline(lang_code=self.lang_map[language])
self.voice = self.voice_map[language][2]
self.voice = self.voice_map[language][voice_idx]
self.speed = 1.2
def speak(self, sentence: str, voice_number: int = 1, audio_file: str = 'sample.wav'):
@ -38,7 +41,7 @@ class Speech():
if not self.pipeline:
return
sentence = self.clean_sentence(sentence)
self.voice = self.voice_map["english"][voice_number]
self.voice = self.voice_map[self.language][voice_number]
generator = self.pipeline(
sentence, voice=self.voice,
speed=self.speed, split_pattern=r'\n+'
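With this change, `speak` indexes `self.voice_map[self.language]` instead of always using the English list. Since the French list holds a single voice and `voice_number` defaults to 1, that lookup can raise `IndexError` for `"fr"`. The sketch below shows one possible guard that clamps the index to the available voices; `pick_voice` is a hypothetical helper and not part of the commit.

```python
from typing import Dict, List

# Voice lists copied from the diff.
VOICE_MAP: Dict[str, List[str]] = {
    "en": ['af_kore', 'af_bella', 'af_alloy', 'af_nicole', 'af_nova', 'af_sky',
           'am_echo', 'am_michael', 'am_puck'],
    "zh": ['zf_xiaobei', 'zf_xiaoni', 'zf_xiaoxiao', 'zf_xiaoyi',
           'zm_yunjian', 'zm_yunxi', 'zm_yunxia', 'zm_yunyang'],
    "fr": ['ff_siwis'],
}


def pick_voice(language: str, voice_idx: int,
               voice_map: Dict[str, List[str]] = VOICE_MAP) -> str:
    """Return a voice for the language, clamping the index into the valid range."""
    voices = voice_map.get(language)
    if not voices:
        raise ValueError(f"unsupported language: {language!r}")
    clamped = min(max(voice_idx, 0), len(voices) - 1)
    return voices[clamped]


print(pick_voice("en", 1))
print(pick_voice("fr", 1))
```

Clamping silently falls back to the last available voice; raising an error on an out-of-range index would be the stricter alternative.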

View File

@ -51,7 +51,7 @@ class CInterpreter(Tools):
run_command,
capture_output=True,
text=True,
timeout=10
timeout=120
)
if run_result.returncode != 0:
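Raising the timeout from 10 to 120 seconds gives compiled programs more time to finish, but a run can still exceed it, in which case `subprocess.run` raises `subprocess.TimeoutExpired`. The sketch below shows that pattern in isolation. `run_with_timeout` is a hypothetical helper, and the Python interpreter stands in for the compiled C binary so the example is self-contained.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(command: List[str], timeout: float = 120) -> str:
    """Run a command and return its output, or a short message on failure or timeout."""
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return f"Execution timed out after {timeout} seconds"
    if result.returncode != 0:
        return f"Execution failed: {result.stderr.strip()}"
    return result.stdout


print(run_with_timeout([sys.executable, "-c", "print('ok')"]))
print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5))
```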

View File

@ -32,7 +32,7 @@ def get_color_map():
}
return color_map
def pretty_print(text, color="info"):
def pretty_print(text, color="info", no_newline=False):
"""
Print text with color formatting.
@ -56,7 +56,7 @@ def pretty_print(text, color="info"):
color_map = get_color_map()
if color not in color_map:
color = "info"
print(colored(text, color_map[color]))
print(colored(text, color_map[color]), end='' if no_newline else "\n")
def animate_thinking(text, color="status", duration=120):
"""
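The new `no_newline` flag lets two differently coloured fragments share one line, which `show_search_results` uses for the title and the link. The sketch below reproduces the idea with raw ANSI escape codes so that it runs without third-party packages. The `ANSI` colour table is an assumption, since the project's actual `color_map` and `colored` helper are not shown in this diff.

```python
import io
from contextlib import redirect_stdout

# Assumed colour codes; the real color_map is not part of the diff.
ANSI = {"info": "\033[36m", "status": "\033[34m", "output": "\033[32m"}
RESET = "\033[0m"


def pretty_print(text, color="info", no_newline=False):
    """Print coloured text, optionally without the trailing newline."""
    code = ANSI.get(color, ANSI["info"])
    print(f"{code}{text}{RESET}", end="" if no_newline else "\n")


# Capture the output to show that both fragments land on a single line.
buffer = io.StringIO()
with redirect_stdout(buffer):
    pretty_print("Title: Example - ", color="info", no_newline=True)
    pretty_print("Link: https://example.com", color="status")
print(repr(buffer.getvalue()))
```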