Merge pull request #19 from Fosowl/dev

Merge dev, browsing abilities with selenium & installation scripts
Martin 2025-03-13 17:22:48 +01:00 committed by GitHub
commit 81e9ab9eb0
GPG Key ID: B5690EEEBB952194
19 changed files with 447 additions and 21 deletions

1
.gitignore vendored

@ -3,6 +3,7 @@ config.ini
*.egg-info *.egg-info
experimental/ experimental/
conversations/ conversations/
agentic_env/*
.env .env
*/.env */.env


@ -31,7 +31,14 @@
- **Memory**: Remembers whats useful, your preferences and past sessions conversation. - **Memory**: Remembers whats useful, your preferences and past sessions conversation.
- **Web Browsing**: Autonomous web navigation is underway. (See it on browser branch) - **Web Browsing**: Autonomous web navigation is underway.
### Searching the web with agenticSeek :
![alt text](./media/exemples/search_politics.png)
*See media/exemples for other use case screenshots.*
--- ---
@ -48,15 +55,27 @@ cd agenticSeek
```sh ```sh
python3 -m venv agentic_seek_env python3 -m venv agentic_seek_env
source agentic_seek_env/bin/activate # On Windows: agentic_seek_env\Scripts\activate source agentic_seek_env/bin/activate
# On Windows: agentic_seek_env\Scripts\activate
``` ```
### 3⃣ **Install Dependencies** ### 3⃣ **Install package**
**Automatic Installation:**
```sh
./install.sh
```
**Manually:**
```sh ```sh
pip3 install -r requirements.txt pip3 install -r requirements.txt
# or
python3 setup.py install
``` ```
## Run locally on your machine ## Run locally on your machine
**We recommend using at least Deepseek 14B, smaller models struggle with tool use and forget quickly the context.** **We recommend using at least Deepseek 14B, smaller models struggle with tool use and forget quickly the context.**
@ -80,6 +99,8 @@ ollama serve
Change the config.ini file to set the provider_name to `ollama` and provider_model to `deepseek-r1:7b` Change the config.ini file to set the provider_name to `ollama` and provider_model to `deepseek-r1:7b`
NOTE: `deepseek-r1:7b` is an example; use a bigger model if your hardware allows it.
```sh ```sh
[MAIN] [MAIN]
is_local = True is_local = True

47
install.sh Executable file

@ -0,0 +1,47 @@
#!/bin/bash

SCRIPTS_DIR="scripts"

echo "Detecting operating system..."

OS_TYPE=$(uname -s)

case "$OS_TYPE" in
    "Linux"*)
        echo "Detected Linux OS"
        if [ -f "$SCRIPTS_DIR/linux_install.sh" ]; then
            echo "Running Linux installation script..."
            bash "$SCRIPTS_DIR/linux_install.sh"
        else
            echo "Error: $SCRIPTS_DIR/linux_install.sh not found!"
            exit 1
        fi
        ;;
    "Darwin"*)
        echo "Detected macOS"
        if [ -f "$SCRIPTS_DIR/macos_install.sh" ]; then
            echo "Running macOS installation script..."
            bash "$SCRIPTS_DIR/macos_install.sh"
        else
            echo "Error: $SCRIPTS_DIR/macos_install.sh not found!"
            exit 1
        fi
        ;;
    "MINGW"* | "MSYS"* | "CYGWIN"*)
        echo "Detected Windows (via Bash-like environment)"
        if [ -f "$SCRIPTS_DIR/windows_install.sh" ]; then
            echo "Running Windows installation script..."
            bash "$SCRIPTS_DIR/windows_install.sh"
        else
            echo "Error: $SCRIPTS_DIR/windows_install.sh not found!"
            exit 1
        fi
        ;;
    *)
        echo "Unsupported OS detected: $OS_TYPE"
        echo "This script supports Linux, macOS, and Windows (via Bash-compatible environments)."
        exit 1
        ;;
esac

echo "Installation process finished!"
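
For illustration, the OS dispatch in install.sh can be sketched in Python. This is a minimal sketch and not part of the commit: `pick_install_script` is a hypothetical helper, and it assumes the input is the string printed by `uname -s`, matched by prefix as in the `case` statement.

```python
from typing import Optional

SCRIPTS_DIR = "scripts"

# Prefixes of `uname -s` output mapped to the per-OS script,
# mirroring the branches of the case statement.
_OS_PREFIXES = {
    "Linux": "linux_install.sh",
    "Darwin": "macos_install.sh",
    "MINGW": "windows_install.sh",
    "MSYS": "windows_install.sh",
    "CYGWIN": "windows_install.sh",
}


def pick_install_script(os_type: str) -> Optional[str]:
    """Return the install script path for a `uname -s` string, or None if unsupported."""
    for prefix, script in _OS_PREFIXES.items():
        if os_type.startswith(prefix):
            return f"{SCRIPTS_DIR}/{script}"
    return None


if __name__ == "__main__":
    for name in ("Linux", "Darwin", "MINGW64_NT-10.0", "FreeBSD"):
        print(name, "->", pick_install_script(name))
```

Returning `None` for an unknown system corresponds to the `*)` branch, where the shell script prints an error and exits with status 1.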


@ -7,7 +7,7 @@ import configparser
from sources.llm_provider import Provider from sources.llm_provider import Provider
from sources.interaction import Interaction from sources.interaction import Interaction
from sources.agents import Agent, CoderAgent, CasualAgent, FileAgent, PlannerAgent from sources.agents import Agent, CoderAgent, CasualAgent, FileAgent, PlannerAgent, BrowserAgent
import warnings import warnings
warnings.filterwarnings("ignore") warnings.filterwarnings("ignore")
@ -44,6 +44,10 @@ def main():
PlannerAgent(model=config["MAIN"]["provider_model"], PlannerAgent(model=config["MAIN"]["provider_model"],
name="Planner", name="Planner",
prompt_path="prompts/planner_agent.txt", prompt_path="prompts/planner_agent.txt",
provider=provider),
BrowserAgent(model=config["MAIN"]["provider_model"],
name="Browser",
prompt_path="prompts/browser_agent.txt",
provider=provider) provider=provider)
] ]

Binary file not shown. Before: 286 KiB.

Binary file not shown. After: 797 KiB.

Binary file not shown. After: 898 KiB.

21
prompts/browser_agent.txt Normal file

@ -0,0 +1,21 @@
You are an internet AI that can browse the web for information.
You are embedded in a browser driven by Selenium.
If you need to conduct a web search, you can use the following tool:
- web_search: to search the web for information
This is how you can use the web_search tool:
```web_search
<query>
```
This will provide you with a list of links that you can navigate to.
You can navigate to a specific link by typing the link. For example, if you say:
"I want to navigate to https://www.google.com"
You will navigate to https://www.google.com
Any link that you type will be opened in a new tab.
If you want to exit the browser, you can say:
"REQUEST_EXIT"
Only exit the browser if you are done browsing.
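
The prompt asks the model to emit a fenced `web_search` block or the literal `REQUEST_EXIT`. A minimal sketch of how such an answer could be parsed follows. The helpers `extract_tool_blocks` and `wants_exit` are hypothetical names for illustration and are not taken from the repository.

```python
import re
from typing import List

# Build the fence marker programmatically so this example can itself be fenced.
FENCE = "`" * 3


def extract_tool_blocks(answer: str, tool_name: str) -> List[str]:
    """Return the stripped body of every fenced block tagged with tool_name."""
    pattern = re.escape(FENCE + tool_name) + r"\n(.*?)" + re.escape(FENCE)
    return [body.strip() for body in re.findall(pattern, answer, flags=re.DOTALL)]


def wants_exit(answer: str) -> bool:
    """True when the model asked to leave the browser."""
    return "REQUEST_EXIT" in answer


if __name__ == "__main__":
    answer = f"Let me search.\n{FENCE}web_search\nweather in tokyo\n{FENCE}\n"
    print(extract_tool_blocks(answer, "web_search"))
    print(wants_exit(answer))
```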


@ -13,10 +13,18 @@ flask==3.1.0
soundfile==0.13.1 soundfile==0.13.1
protobuf==3.20.3 protobuf==3.20.3
termcolor==2.5.0 termcolor==2.5.0
ipython==9.0.2 ipython==8.34.0
gliclass==0.1.8 gliclass==0.1.8
pyaudio==0.2.14 pyaudio==0.2.14
librosa==0.10.2.post1 librosa==0.10.2.post1
selenium==4.29.0
markdownify==1.1.0
httpx>=0.27,<0.29
anyio>=3.5.0,<5
distro>=1.7.0,<2
jiter>=0.4.0,<1
sniffio
tqdm>4
# if use chinese # if use chinese
ordered_set ordered_set
pypinyin pypinyin

17
scripts/linux_install.sh Normal file

@ -0,0 +1,17 @@
#!/bin/bash
echo "Starting installation for Linux..."
# Update package list
sudo apt-get update
# Install Python dependencies from requirements.txt
pip3 install -r requirements.txt
# Install Selenium for chromedriver
pip3 install selenium
# Install portaudio for pyAudio
sudo apt-get install -y portaudio19-dev python3-dev
echo "Installation complete for Linux!"

17
scripts/macos_install.sh Normal file

@ -0,0 +1,17 @@
#!/bin/bash
echo "Starting installation for macOS..."
# Install Python dependencies from requirements.txt
pip3 install -r requirements.txt
# Install chromedriver using Homebrew
brew install --cask chromedriver
# Install portaudio for pyAudio using Homebrew
brew install portaudio
# Install Selenium
pip3 install selenium
echo "Installation complete for macOS!"


@ -0,0 +1,16 @@
#!/bin/bash
echo "Starting installation for Windows..."
# Install Python dependencies from requirements.txt
pip3 install -r requirements.txt
# Install Selenium
pip3 install selenium
echo "Note: pyAudio installation may require additional steps on Windows."
echo "Please install portaudio manually (e.g., via vcpkg or prebuilt binaries) and then run: pip3 install pyaudio"
echo "Also, download and install chromedriver manually from: https://sites.google.com/chromium.org/driver/getting-started"
echo "Place chromedriver in a directory included in your PATH."
echo "Installation partially complete for Windows. Follow manual steps above."


@ -30,9 +30,16 @@ setup(
"protobuf==3.20.3", "protobuf==3.20.3",
"termcolor==2.5.0", "termcolor==2.5.0",
"gliclass==0.1.8", "gliclass==0.1.8",
"ipython==7.16.1", "ipython==8.34.0",
"pyaudio-0.2.14", "librosa==0.10.2.post1",
"librosa==0.10.2.post1" "selenium==4.29.0",
"markdownify==1.1.0",
"httpx>=0.27,<0.29",
"anyio>=3.5.0,<5",
"distro>=1.7.0,<2",
"jiter>=0.4.0,<1",
"sniffio",
"tqdm>4"
], ],
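
Each entry in an `install_requires` list needs its own comma, because Python joins adjacent string literals into a single string. The short sketch below shows the effect with two of the requirement strings from this hunk. The variable names are illustrative only.

```python
# Adjacent string literals with no comma between them are concatenated
# at compile time, so this list holds ONE malformed requirement.
without_commas = [
    "httpx>=0.27,<0.29"
    "anyio>=3.5.0,<5"
]

# With commas, each requirement is a separate list element.
with_commas = [
    "httpx>=0.27,<0.29",
    "anyio>=3.5.0,<5",
]

print(len(without_commas), without_commas)
print(len(with_commas), with_commas)
```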
extras_require={ extras_require={
"chinese": [ "chinese": [


@ -4,5 +4,6 @@ from .code_agent import CoderAgent
from .casual_agent import CasualAgent from .casual_agent import CasualAgent
from .file_agent import FileAgent from .file_agent import FileAgent
from .planner_agent import PlannerAgent from .planner_agent import PlannerAgent
from .browser_agent import BrowserAgent
__all__ = ["Agent", "CoderAgent", "CasualAgent", "FileAgent", "PlannerAgent"] __all__ = ["Agent", "CoderAgent", "CasualAgent", "FileAgent", "PlannerAgent", "BrowserAgent"]


@ -0,0 +1,102 @@
import re
import time

from sources.utility import pretty_print, animate_thinking
from sources.agents.agent import Agent
from sources.tools.webSearch import webSearch
from sources.browser import Browser

class BrowserAgent(Agent):
    def __init__(self, model, name, prompt_path, provider):
        """
        The browser agent searches the web and navigates pages with Selenium to answer the user's query.
        """
        super().__init__(model, name, prompt_path, provider)
        self.tools = {
            "web_search": webSearch(),
        }
        self.role = "deep research and web search"
        self.browser = Browser()
        self.browser.goTo("https://github.com/")
        self.search_history = []

    def make_init_prompt(self, user_prompt: str, search_result: str):
        return f"""
        Based on the search result:
        {search_result}
        Start browsing and find the information the user wants.
        User: {user_prompt}
        You must choose a link to navigate to. Say "I want to navigate to <link>".
        """

    def extract_links(self, search_result: str):
        return re.findall(r'https?://[^\s]+', search_result)

    def make_navigation_prompt(self, user_prompt: str, page_text: str, navigable_links: list):
        # navigable_links holds dicts from Browser.getNavigable(); compare by URL
        remaining_links = "\n".join([f"[{i}] {link['url']}" for i, link in enumerate(navigable_links) if link['url'] not in self.search_history])
        return f"""
        \nYou are browsing the web. Not the user, you are the browser.
        Page content:
        {page_text}
        Navigable links:
        {remaining_links}
        You must choose a link to navigate to or do a new search.
        Remember, you seek the information the user wants.
        The user query was: {user_prompt}
        If you want to do a new search, use the "web_search" tool.
        Example:
        ```web_search
        weather in tokyo
        ```
        If you have an answer and want to exit the browser, please say "REQUEST_EXIT".
        If you don't choose a link or do a new search I will cut my fucking arm off.
        """

    def clean_links(self, links: list):
        # Drop a trailing period picked up from sentence punctuation
        links_clean = []
        for link in links:
            if link[-1] == '.':
                links_clean.append(link[:-1])
            else:
                links_clean.append(link)
        return links_clean

    def process(self, prompt, speech_module) -> tuple:
        complete = False

        animate_thinking("Searching...", color="status")
        search_result = self.tools["web_search"].execute([prompt], False)
        user_prompt = self.make_init_prompt(prompt, search_result)
        prompt = user_prompt
        while not complete:
            animate_thinking("Thinking...", color="status")
            self.memory.push('user', user_prompt)
            answer, reasoning = self.llm_request(prompt)
            pretty_print("-"*100)
            pretty_print(answer, color="output")
            pretty_print("-"*100)
            if "REQUEST_EXIT" in answer:
                complete = True
                break
            links = self.extract_links(answer)
            links_clean = self.clean_links(links)
            if len(links_clean) == 0:
                prompt = "Please choose a link to navigate to or do a new search."
                pretty_print("No links found in the answer, asking again.", color="warning")
                continue
            # Use the cleaned link so a trailing period is not part of the URL
            animate_thinking(f"Navigating to {links_clean[0]}", color="status")
            speech_module.speak(f"Navigating to {links_clean[0]}")
            self.browser.goTo(links_clean[0])
            self.search_history.append(links_clean[0])
            # getText() returns None on error; fall back to an empty string
            page_text = (self.browser.getText() or "")[:2048]
            navigable_links = self.browser.getNavigable()[:15]
            prompt = self.make_navigation_prompt(user_prompt, page_text, navigable_links)

        self.browser.close()
        return answer, reasoning

if __name__ == "__main__":
    browser = Browser()
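
The link handling in `extract_links` and `clean_links` can be tried without Selenium or an LLM. The sketch below is a standalone approximation using the same regular expression. Stripping a wider set of trailing punctuation than the single period handled in the agent is an assumption made here for illustration, and it would also trim a closing parenthesis that legitimately belongs to a URL.

```python
import re
from typing import List

URL_PATTERN = re.compile(r"https?://[^\s]+")

# Assumption: characters that usually belong to the surrounding sentence,
# not to the URL itself. The agent in this commit only strips a final '.'.
TRAILING_PUNCTUATION = ".,;:!?)\"'"


def extract_links(text: str) -> List[str]:
    """Find every http(s) URL-like token in free text."""
    return URL_PATTERN.findall(text)


def clean_links(links: List[str]) -> List[str]:
    """Remove sentence punctuation stuck to the end of each link."""
    return [link.rstrip(TRAILING_PUNCTUATION) for link in links]


if __name__ == "__main__":
    answer = "I want to navigate to https://example.com/page."
    print(clean_links(extract_links(answer)))
```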

170
sources/browser.py Normal file

@ -0,0 +1,170 @@
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
import time
from bs4 import BeautifulSoup
import markdownify
import logging
import sys

class Browser:
    def __init__(self, headless=False, anticaptcha_install=False):
        """Initialize the browser with optional headless mode."""
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Referer': 'https://www.google.com/',
        }
        self.anticaptcha = "https://chrome.google.com/webstore/detail/nopecha-captcha-solver/dknlfmjaanfblgfdfebhijalfmhmjjjo/related"
        try:
            chrome_options = Options()
            if headless:
                chrome_options.add_argument("--headless")
            chrome_options.add_argument("--disable-gpu")
            chrome_options.add_argument("--no-sandbox")
            chrome_options.add_argument("--disable-dev-shm-usage")
            self.driver = webdriver.Chrome(options=chrome_options)
            self.wait = WebDriverWait(self.driver, 10)
            self.logger = logging.getLogger(__name__)
            self.logger.info("Browser initialized successfully")
        except Exception as e:
            raise Exception(f"Failed to initialize browser: {str(e)}") from e

    def goTo(self, url):
        """Navigate to a specified URL."""
        try:
            self.driver.get(url)
            time.sleep(2)  # Wait for page to load
            self.logger.info(f"Navigated to: {url}")
            return True
        except WebDriverException as e:
            self.logger.error(f"Error navigating to {url}: {str(e)}")
            return False

    def is_sentence(self, text):
        """Check if the text is a sentence."""
        if "404" in text:
            return True  # we want the ai to see the error
        return len(text.split(" ")) > 5 and '.' in text

    def getText(self):
        """Get page text and convert it to README (Markdown) format."""
        try:
            soup = BeautifulSoup(self.driver.page_source, 'html.parser')

            for element in soup(['script', 'style']):
                element.decompose()

            text = soup.get_text()

            lines = (line.strip() for line in text.splitlines())
            # Split on double spaces: a single-space split would yield single
            # words, which can never pass is_sentence()
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = "\n".join(chunk for chunk in chunks if chunk and self.is_sentence(chunk))

            markdown_text = markdownify.markdownify(text, heading_style="ATX")

            return markdown_text
        except Exception as e:
            self.logger.error(f"Error getting text: {str(e)}")
            return None

    def getNavigable(self):
        """Get all navigable links on the current page."""
        try:
            links = []
            elements = self.driver.find_elements(By.TAG_NAME, "a")

            for element in elements:
                href = element.get_attribute("href")
                if href and href.startswith(("http", "https")):
                    links.append({
                        "url": href,
                        "text": element.text.strip(),
                        "is_displayed": element.is_displayed()
                    })

            self.logger.info(f"Found {len(links)} navigable links")
            return links
        except Exception as e:
            self.logger.error(f"Error getting navigable links: {str(e)}")
            return []

    def clickElement(self, xpath):
        """Click an element specified by xpath."""
        try:
            element = self.wait.until(
                EC.element_to_be_clickable((By.XPATH, xpath))
            )
            element.click()
            time.sleep(2)  # Wait for action to complete
            return True
        except TimeoutException:
            self.logger.error(f"Element not found or not clickable: {xpath}")
            return False

    def getCurrentUrl(self):
        """Get the current URL of the page."""
        return self.driver.current_url

    def getPageTitle(self):
        """Get the title of the current page."""
        return self.driver.title

    def scrollToBottom(self):
        """Scroll to the bottom of the page."""
        try:
            self.driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);"
            )
            time.sleep(1)  # Wait for scroll to complete
            return True
        except Exception as e:
            self.logger.error(f"Error scrolling: {str(e)}")
            return False

    def takeScreenshot(self, filename):
        """Take a screenshot of the current page."""
        try:
            self.driver.save_screenshot(filename)
            self.logger.info(f"Screenshot saved as {filename}")
            return True
        except Exception as e:
            self.logger.error(f"Error taking screenshot: {str(e)}")
            return False

    def close(self):
        """Close the browser."""
        try:
            self.driver.quit()
            self.logger.info("Browser closed")
        except Exception as e:
            raise e

    def __del__(self):
        """Destructor to ensure browser is closed."""
        self.close()

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    browser = Browser(headless=False)

    try:
        browser.goTo("https://karpathy.github.io/")
        text = browser.getText()
        print("Page Text in Markdown:")
        print(text)
        links = browser.getNavigable()
        print("\nNavigable Links:")
        for link in links[:50]:
            print(f"Text: {link['text']}, URL: {link['url']}")

        browser.takeScreenshot("example.png")

    finally:
        browser.close()
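
The text filtering done by `getText` and `is_sentence` can be exercised on a plain string, with no browser involved. The following is a minimal sketch under one stated assumption: that phrases on a line are separated by double spaces, as in the common BeautifulSoup text-cleaning idiom. `filter_sentences` is a hypothetical standalone function, not a method from the commit.

```python
def is_sentence(text: str) -> bool:
    """Keep chunks that look like prose, plus anything mentioning a 404 error."""
    if "404" in text:
        return True
    return len(text.split(" ")) > 5 and "." in text


def filter_sentences(raw_text: str) -> str:
    """Strip lines, split them on double spaces, and keep sentence-like chunks."""
    lines = (line.strip() for line in raw_text.splitlines())
    # Assumption: menu items and similar fragments are separated by two spaces.
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return "\n".join(chunk for chunk in chunks if chunk and is_sentence(chunk))


if __name__ == "__main__":
    page = (
        "Home  About  Contact\n"
        "This page explains how the agent browses the web.\n"
        "Error 404\n"
    )
    print(filter_sentences(page))
```

Short navigation labels such as "Home" are dropped because they have fewer than six words, while the "Error 404" line is kept so the model can see that the page failed to load.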


@ -100,7 +100,7 @@ class Interaction:
return return
if self.current_agent != agent: if self.current_agent != agent:
self.current_agent = agent self.current_agent = agent
# get history from previous agent # get history from previous agent, good ?
self.current_agent.memory.push('user', self.last_query) self.current_agent.memory.push('user', self.last_query)
self.last_answer, _ = agent.process(self.last_query, self.speech) self.last_answer, _ = agent.process(self.last_query, self.speech)


@ -1,5 +1,4 @@
from colorama import Fore from colorama import Fore
import pyaudio
import queue import queue
import threading import threading
import numpy as np import numpy as np
@ -15,7 +14,8 @@ class AudioRecorder:
""" """
AudioRecorder is a class that records audio from the microphone and adds it to the audio queue. AudioRecorder is a class that records audio from the microphone and adds it to the audio queue.
""" """
def __init__(self, format: int = pyaudio.paInt16, channels: int = 1, rate: int = 4096, chunk: int = 8192, record_seconds: int = 5, verbose: bool = False): def __init__(self, format: int, channels: int = 1, rate: int = 4096, chunk: int = 8192, record_seconds: int = 5, verbose: bool = False):
import pyaudio
self.format = format self.format = format
self.channels = channels self.channels = channels
self.rate = rate self.rate = rate
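
This hunk moves `import pyaudio` into the constructor, which also removes the `pyaudio.paInt16` default for `format`. One possible way to defer the import while keeping a default is sketched below. `resolve_sample_format` is a hypothetical helper and is not part of the commit.

```python
from typing import Optional


def resolve_sample_format(fmt: Optional[int] = None) -> int:
    """Return fmt if given; otherwise import pyaudio lazily and use paInt16."""
    if fmt is not None:
        return fmt
    # Deferred import: pyaudio (and portaudio) are only required when the
    # caller does not supply a format explicitly.
    import pyaudio
    return pyaudio.paInt16


if __name__ == "__main__":
    # Passing an explicit value never touches pyaudio.
    print(resolve_sample_format(8))
```

With this shape, modules that import the recorder class do not fail at import time on machines where pyaudio is not installed.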


@ -11,14 +11,8 @@ For example:
```python ```python
print("Hello world") print("Hello world")
``` ```
This is then executed by the tool with its own class implementation of execute(). This is then executed by the tool with its own class implementation of execute().
A tool is not just for code tool but also API, internet, etc.. A tool is not just for code tool but also API, internet, etc..
For example a flight API tool could be used like so:
```flight_search
HU787
```
""" """
import sys import sys
@ -77,11 +71,11 @@ class Tools():
return dir_path return dir_path
@abstractmethod @abstractmethod
def execute(self, blocks:str, safety:bool) -> str: def execute(self, blocks:[str], safety:bool) -> str:
""" """
Abstract method that must be implemented by child classes to execute the tool's functionality. Abstract method that must be implemented by child classes to execute the tool's functionality.
Args: Args:
blocks (str): The code or query blocks to execute blocks (List[str]): The codes or queries blocks to execute
safety (bool): Whenever human intervention is required safety (bool): Whenever human intervention is required
Returns: Returns:
str: The output/result from executing the tool str: The output/result from executing the tool
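
Since `execute` now takes a list of blocks, a child class iterates over them and returns one combined string. The sketch below uses a stripped-down stand-in for the `Tools` base class so that it runs on its own. `EchoSearch` and its output format are hypothetical, and the `safety` flag is accepted but unused here.

```python
from abc import ABC, abstractmethod
from typing import List


class Tools(ABC):
    """Stand-in for the repository's base class, reduced to the abstract method."""

    @abstractmethod
    def execute(self, blocks: List[str], safety: bool) -> str:
        ...


class EchoSearch(Tools):
    """Hypothetical tool that reports each query block instead of searching."""

    def execute(self, blocks: List[str], safety: bool) -> str:
        results = [f"query: {block.strip()}" for block in blocks if block.strip()]
        return "\n".join(results)


if __name__ == "__main__":
    tool = EchoSearch()
    print(tool.execute(["weather in tokyo", "  selenium docs  "], False))
```

Annotating the parameter as `List[str]` matches the updated docstring and is what type checkers expect for a list of strings.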