Merge pull request #66 from Fosowl/dev

Fix : AudioRecorder issue, Improve readme, more flexible requirements + python_requires to 3.9
steveh8758 2025-03-22 15:47:18 +08:00 committed by GitHub
commit 489dac5488
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
7 changed files with 73 additions and 65 deletions

View File

@@ -38,8 +38,7 @@
 - **Memory**: Remembers whats useful, your preferences and past sessions conversation.
-- **Web Browsing**: Autonomous web navigation is underway.
+- **Web Browsing**: Autonomous web navigation.
 ### Searching the web with agenticSeek :
@@ -52,7 +51,7 @@
 ## **Installation**
-Make sure you have chrome driver and docker installed.
+Make sure you have chrome driver, docker and python3.10 (or newer) installed.
 For issues related to chrome driver, see the **Chromedriver** section.
@@ -125,7 +124,7 @@ provider_server_address = 127.0.0.1:11434
 start all services :
 ```sh
-./start_services.sh
+sudo ./start_services.sh
 ```
 Run the assistant:
@@ -150,7 +149,7 @@ Warning: currently the system that choose the best AI agent routing system will
 Make sure the services are up and running with `./start_services.sh` and run the agenticSeek with `python3 main.py`
 ```sh
-./start_services.sh
+sudo ./start_services.sh
 python3 main.py
 ```
@@ -247,7 +246,7 @@ provider_server_address = x.x.x.x:5000
 Run the assistant:
 ```sh
-./start_services.sh
+sudo ./start_services.sh
 python3 main.py
 ```
@@ -268,7 +267,7 @@ provider_server_address = 127.0.0.1:5000 # can be set to anything, not used
 Run the assistant:
 ```sh
-./start_services.sh
+sudo ./start_services.sh
 python3 main.py
 ```
@@ -278,22 +277,25 @@ python3 main.py
 ## Speech to Text
-The speech to text is disabled by default, you can enable it by setting listen to true in the config.ini:
+The speech-to-text functionality is disabled by default. To enable it, set the listen option to True in the config.ini file:
 ```
 listen = True
 ```
-The speech to text will await for a AI name as a trigger keyword before it start listening, you can change the AI name by changing the agent_name in the config.ini:
+When enabled, the speech-to-text feature listens for a trigger keyword, which is the agent's name, before it begins processing your input. You can customize the agent's name by updating the `agent_name` value in the *config.ini* file:
 ```
 agent_name = Friday
 ```
-It will work better if you use a common english name like John or Emma.
+For optimal recognition, we recommend using a common English name like "John" or "Emma" as the agent name
-After hearing it's name agenticSeek will listen until it hear one of the following keyword for confirmation:
+Once you see the transcript start to appear, say the agent's name aloud to wake it up (e.g., "Friday").
+Speak your query clearly.
+End your request with a confirmation phrase to signal the system to proceed. Examples of confirmation phrases include:
 ```
 "do it", "go ahead", "execute", "run", "start", "thanks", "would ya", "please", "okay?", "proceed", "continue", "go on", "do that", "go it", "do you understand?"
 ```
@@ -321,7 +323,7 @@ provider_server_address = 127.0.0.1:5000
 ```
 `is_local`: should be True for any locally running LLM, otherwise False.
-`provider_name`: Select the provider to use by its name, see the provider list above.
+`provider_name`: Select the provider to use by it's name, see the provider list above.
 `provider_model`: Set the model to use by the agent.
@@ -351,6 +353,7 @@ And download the chromedriver version matching your OS.
 ![alt text](./media/chromedriver_readme.png)
 ## FAQ
 **Q: What hardware do I need?**
 7B Model: GPU with 8GB VRAM.
@@ -365,10 +368,6 @@ Deepseek R1 excels at reasoning and tool use for its size. We think its a sol
 Ensure Ollama is running (`ollama serve`), your `config.ini` matches your provider, and dependencies are installed. If none work feel free to raise an issue.
-**Q: How to join the discord ?**
-Ask in the Community section for an invite.
 **Q: Can it really run 100% locally?**
 Yes with Ollama or Server providers, all speech to text, LLM and text to speech model run locally. Non-local options (OpenAI or others API) are optional.

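For readers following the speech-to-text changes above, here is a minimal sketch of how the documented `config.ini` options (`listen`, `agent_name`) and the confirmation phrases could be wired together. The `[MAIN]` section name and the `should_execute` helper are assumptions made for illustration; this is not code from the repository.

```python
# Illustrative sketch only -- not agenticSeek's implementation.
# Assumptions: config.ini uses a [MAIN] section; should_execute is a made-up helper.
import configparser

config = configparser.ConfigParser()
config.read("config.ini")

listen_enabled = config.getboolean("MAIN", "listen", fallback=False)
agent_name = config.get("MAIN", "agent_name", fallback="Friday")

CONFIRMATION_PHRASES = ("do it", "go ahead", "execute", "run", "start", "thanks",
                        "please", "proceed", "continue", "go on", "do that")

def should_execute(transcript: str) -> bool:
    """True once the agent's name was heard and the request ends with a confirmation phrase."""
    text = transcript.lower().strip()
    if agent_name.lower() not in text:
        return False
    return any(text.rstrip(" ?.!").endswith(phrase) for phrase in CONFIRMATION_PHRASES)

if listen_enabled:
    print(should_execute("Friday, search the web for a cheap flight, go ahead"))  # True
```

A short, common agent name keeps the wake-word check reliable, which is why the updated README suggests names like "John" or "Emma".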
View File

@@ -1,33 +1,35 @@
-requests==2.31.0
-openai==1.61.1
-colorama==0.4.6
-python-dotenv==1.0.0
-playsound==1.3.0
-soundfile==0.13.1
-transformers==4.48.3
-torch==2.5.1
-ollama==0.4.7
-scipy==1.15.1
-kokoro==0.7.12
-flask==3.1.0
-soundfile==0.13.1
-protobuf==3.20.3
-termcolor==2.5.0
-ipython==8.34.0
-gliclass==0.1.8
-pyaudio==0.2.14
-librosa==0.10.2.post1
-selenium==4.29.0
-markdownify==1.1.0
-text2emotion==0.0.5
-langid==1.1.6
-chromedriver-autoinstaller==0.6.4
+requests>=2.31.0
+colorama>=0.4.6
+python-dotenv>=1.0.0
+playsound>=1.3.0
+soundfile>=0.13.1
+transformers>=4.46.3
+torch>=2.4.1
+python-dotenv>=1.0.0
+ollama>=0.4.7
+scipy>=1.15.1
+kokoro>=0.7.12
+flask>=3.1.0
+soundfile>=0.13.1
+protobuf>=3.20.3
+termcolor>=2.5.0
+ipython>=8.34.0
+gliclass>=0.1.8
+pyaudio>=0.2.14
+librosa>=0.10.2.post1
+selenium>=4.29.0
+markdownify>=1.1.0
+text2emotion>=0.0.5
+langid>=1.1.6
+chromedriver-autoinstaller>=0.6.4
 httpx>=0.27,<0.29
 anyio>=3.5.0,<5
 distro>=1.7.0,<2
 jiter>=0.4.0,<1
 sniffio
 tqdm>4
+# for api provider
+openai
 # if use chinese
 ordered_set
 pypinyin

View File

@@ -5,6 +5,8 @@ echo "Starting installation for Linux..."
 # Update package list
 sudo apt-get update
+pip install --upgrade pip
 # Install Python dependencies from requirements.txt
 pip3 install -r requirements.txt

View File

@@ -1,3 +1,4 @@
+version: '3'
 services:
   redis:
     container_name: redis

View File

@@ -8,34 +8,35 @@ setup(
     version="0.1.0",
     author="Fosowl",
     author_email="mlg.fcu@gmail.com",
-    description="A Python project for agentic search and processing",
+    description="The open, local alternative to ManusAI",
     long_description=long_description,
     long_description_content_type="text/markdown",
     url="https://github.com/Fosowl/agenticSeek",
     packages=find_packages(),
     include_package_data=True,
     install_requires=[
-        "requests==2.31.0",
-        "openai==1.61.1",
-        "colorama==0.4.6",
-        "python-dotenv==1.0.0",
-        "playsound==1.3.0",
-        "soundfile==0.13.1",
-        "transformers==4.48.3",
-        "torch==2.5.1",
-        "ollama==0.4.7",
-        "scipy==1.15.1",
-        "kokoro==0.7.12",
-        "flask==3.1.0",
-        "protobuf==3.20.3",
-        "termcolor==2.5.0",
-        "gliclass==0.1.8",
-        "ipython==8.34.0",
-        "librosa==0.10.2.post1",
-        "selenium==4.29.0",
-        "markdownify==1.1.0",
-        "text2emotion==0.0.5",
-        "langid==1.1.6",
+        "requests>=2.31.0",
+        "openai",
+        "colorama>=0.4.6",
+        "python-dotenv>=1.0.0",
+        "playsound>=1.3.0",
+        "soundfile>=0.13.1",
+        "transformers>=4.46.3",
+        "torch>=2.4.1",
+        "ollama>=0.4.7",
+        "scipy>=1.15.1",
+        "kokoro>=0.7.12",
+        "flask>=3.1.0",
+        "protobuf>=3.20.3",
+        "termcolor>=2.5.0",
+        "gliclass>=0.1.8",
+        "ipython>=8.34.0",
+        "librosa>=0.10.2.post1",
+        "selenium>=4.29.0",
+        "markdownify>=1.1.0",
+        "text2emotion>=0.0.5",
+        "python-dotenv>=1.0.0",
+        "langid>=1.1.6",
         "httpx>=0.27,<0.29",
         "anyio>=3.5.0,<5",
         "distro>=1.7.0,<2",
@@ -61,5 +62,5 @@ setup(
         "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
         "Operating System :: OS Independent",
     ],
-    python_requires=">=3.6",
+    python_requires=">=3.9",
 )

View File

@@ -33,7 +33,7 @@ class Provider:
         if self.provider_name in self.unsafe_providers:
             pretty_print("Warning: you are using an API provider. You data will be sent to the cloud.", color="warning")
             self.api_key = self.get_api_key(self.provider_name)
-        elif self.server != "":
+        elif self.server != "ollama":
             pretty_print(f"Provider: {provider_name} initialized at {self.server}", color="success")
             self.check_address_format(self.server)
             if not self.is_ip_online(self.server.split(':')[0]):
@@ -54,6 +54,7 @@ class Provider:
         Validate if the address is valid IP.
         """
         try:
+            address = address.replace('http://', '')
             ip, port = address.rsplit(":", 1)
             if all(c.lower() in ".:abcdef0123456789" for c in ip):
                 ipaddress.ip_address(ip)
@@ -143,6 +144,7 @@ class Provider:
             if e.status_code == 404:
                 animate_thinking(f"Downloading {self.model}...")
                 ollama.pull(self.model)
+                self.ollama_fn(history, verbose)
             if "refused" in str(e).lower():
                 raise Exception("Ollama connection failed. is the server running ?") from e
             raise e

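The two provider hunks above make the server-address check tolerant of an `http://` prefix and retry the Ollama call after pulling a missing model. Below is a standalone sketch of that address-normalization pattern; `validate_address` is an illustrative name, and the final port check is an assumption beyond the lines shown in the diff.

```python
# Minimal sketch of the address check the patch applies -- not the project's code.
import ipaddress

def validate_address(address: str) -> bool:
    """Accept 'ip:port' with an optional 'http://' prefix, e.g. 'http://127.0.0.1:11434'."""
    address = address.replace('http://', '')            # mirrors the added line in check_address_format
    try:
        ip, port = address.rsplit(":", 1)
    except ValueError:
        return False                                     # no ':' separator at all
    if not all(c.lower() in ".:abcdef0123456789" for c in ip):
        return False                                     # hostnames fail this character filter
    try:
        ipaddress.ip_address(ip)                         # raises ValueError for malformed IPs
    except ValueError:
        return False
    return port.isdigit()                                # numeric port (assumption beyond the shown lines)

print(validate_address("http://127.0.0.1:11434"))        # True
print(validate_address("localhost:11434"))               # False: 'l', 'o', ... are rejected
```

Re-invoking the generation function right after `ollama.pull(self.model)` means a missing model no longer surfaces as a hard failure on first use.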
View File

@@ -6,6 +6,7 @@ import torch
 from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
 import time
 import librosa
+import pyaudio
 audio_queue = queue.Queue()
 done = False
@@ -14,7 +15,7 @@ class AudioRecorder:
     """
     AudioRecorder is a class that records audio from the microphone and adds it to the audio queue.
     """
-    def __init__(self, format: int = pyaudio.paInt16, channels: int = 1, rate: int = 4096, chunk: int = 8192, record_seconds: int = 5, verbose: bool = False):
         import pyaudio
         self.format = format
         self.channels = channels
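Because the new default argument `format: int = pyaudio.paInt16` is evaluated when the class is defined, the module-level `import pyaudio` added above is what makes that default resolvable, and callers no longer have to supply a sample format. A usage sketch follows; the import path is an assumption, and only the constructor signature shown in the diff is relied upon.

```python
# Usage sketch -- the module path is an assumption; nothing beyond the
# constructor signature shown in the diff above is assumed.
import pyaudio
from sources.speech_to_text import AudioRecorder  # assumed import path

recorder = AudioRecorder()                         # format now defaults to pyaudio.paInt16
verbose_recorder = AudioRecorder(format=pyaudio.paInt16,
                                 channels=1,
                                 rate=4096,
                                 record_seconds=10,
                                 verbose=True)
```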