mirror of https://github.com/maglore9900/max_headroom.git
synced 2025-06-02 17:40:30 +00:00

This commit is contained in:
parent cc38d7426d
commit a1113d701a
64 README.md
@ -11,32 +11,50 @@ written to work on Windows. Agent and logic will run on linux but some tools are

it currently will respond as an LLM like usual, but also has the following capabilities:

- custom prompt options
- control spotify
- open applications (windows only)
- change the focused window (windows only)
- set timer
- coming soon: journalling in markdown, with a save target for obsidian

this is a fun work in progress. if you want to use it and/or develop for it, be my guest. I would love to have more tools designed.

> Note:
>
> 1. this will work with openai or ollama models. you will need to set up the .env for that as well as for spotify
> 2. this is designed to pull a custom voice from the [alltalk project](https://github.com/erew123/alltalk_tts); that is how I am cloning max headroom's voice. You can alter or simply not use this; it will currently fall back to pyttsx3, aka a robot voice
> 3. speech-to-text can use google or faster-whisper. faster-whisper is currently the default and optimal method.
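
To make that concrete, here is a minimal sketch of the speech pipeline described above, assuming the `faster-whisper` and `pyttsx3` packages; the model size, audio file name, and device settings are illustrative, not the project's actual configuration.

```python
# Minimal sketch: transcribe audio with faster-whisper, speak a reply with pyttsx3.
# Assumes: pip install faster-whisper pyttsx3; "sample.wav" is any short recording.
from faster_whisper import WhisperModel
import pyttsx3

# Small English model on CPU; swap to device="cuda" if the CUDA toolkit is set up.
model = WhisperModel("base.en", device="cpu", compute_type="int8")

segments, info = model.transcribe("sample.wav")
text = " ".join(segment.text.strip() for segment in segments)
print("heard:", text)

# pyttsx3 is the robot-voice fallback mentioned above.
engine = pyttsx3.init()
engine.say(f"you said: {text}")
engine.runAndWait()
```
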
# INSTALLATION

so basically the steps are pretty simple:

1. download the code (clone it or download it and unzip it)
2. install python 3.10 on the system
3. create a virtual environment using `python -m venv .` in the folder/dir of the code
4. activate the environment with `Scripts\activate.bat` on windows or `source bin/activate` on linux
5. run pip install to install all the required modules: `pip install -r requirements_windows.txt`
6. then copy the example env file: `cp example_env.txt .env`
7. open that file and put in your info, like your openai key or ollama settings
8. if you are using an Nvidia GPU and don't already have the CUDA toolkit and cuDNN installed, see the note below
9. then run `python main.py` to start the whole thing up

> Note: If you are using faster-whisper and have an nvidia GPU you will need to download the CUDA toolkit and cuDNN to leverage your GPU.
>
> If this seems too complicated you can change Max to use google for speech-to-text instead in the .env
>
> 1. download the cuda toolkit: `https://developer.nvidia.com/cuda-downloads`
> 2. download cuDNN: `https://developer.nvidia.com/cudnn-downloads`
> 3. unzip cuDNN and copy all of the .dll files
> 4. paste the .dll files into the toolkit's `CUDA\bin` folder (for example: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin`)
> 5. now we need to add that folder to your PATH. to do this hit the windows key and type "enviro"
> 6. select "edit the system environment variables"
> 7. select the button on the bottom right, "Environment Variables"
> 8. in the lower window "System variables", find and select "Path"
> 9. select "Edit"
> 10. select "Browse"
> 11. browse to the same location as step 4, where you just put the .dll files
> 12. then select OK a bunch of times and close out the menus
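
Step 7 above just means editing a plain text file; as a rough illustration of how those values get used at startup, here is a minimal sketch assuming the `python-dotenv` package (how `main.py` actually reads the file may differ).

```python
# Minimal sketch: load settings from .env, assuming the python-dotenv package.
# The variable names match example_env.txt; how main.py actually reads them may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

llm_type = os.getenv("LLM_TYPE", "openai")
if llm_type == "openai":
    api_key = os.environ["OPENAI_API_KEY"]
    model = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
else:  # 'local' uses Ollama
    model = os.getenv("OLLAMA_MODEL", "gemma2")
print(f"using {llm_type} with model {model}")
```
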
# TOOLS
@ -48,25 +66,25 @@ you can find information on getting that information here: https://developer.spo

max can take the following commands: play, pause, stop, next, previous, favorite

> Example: `hey max play spotify`

> Note: you can say really any words that are similar, max will attempt to read your intent and use the right command

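For a sense of what the Spotify tool has to do under the hood, here is a minimal sketch using the `spotipy` library; the client id/secret placeholders, and spotipy itself, are assumptions for illustration rather than a description of Max's actual implementation.

```python
# Minimal sketch of playback control, assuming the spotipy library, a Spotify
# developer app, and an active playback device; credentials below are placeholders.
import spotipy
from spotipy.oauth2 import SpotifyOAuth

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id="YOUR_CLIENT_ID",          # placeholder
    client_secret="YOUR_CLIENT_SECRET",  # placeholder
    redirect_uri="http://localhost:8888/callback",
    scope="user-modify-playback-state user-read-playback-state",
))

# Each spoken intent maps onto roughly one Web API call:
commands = {
    "play": sp.start_playback,
    "pause": sp.pause_playback,
    "stop": sp.pause_playback,
    "next": sp.next_track,
    "previous": sp.previous_track,
}
commands["play"]()  # e.g. "hey max play spotify"
```
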
## Window Focus

this tool brings the focus of whatever app you name to the front; it will not open an app

> Example: `hey max show obsidian`

> Note: only works on windows

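A minimal sketch of how window focusing can be done on Windows, assuming the `pygetwindow` package; the actual tool may use a different mechanism.

```python
# Minimal sketch: bring a window to the front by (partial) title match.
# Assumes the pygetwindow package; Windows only, like the tool itself.
import pygetwindow as gw

def focus_window(name: str) -> bool:
    matches = gw.getWindowsWithTitle(name)  # substring match against window titles
    if not matches:
        return False
    win = matches[0]
    if win.isMinimized:
        win.restore()
    win.activate()  # bring it to the foreground
    return True

focus_window("Obsidian")  # e.g. "hey max show obsidian"
```
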
## Open App

this tool will open an application. when you run max it will create an index of the apps installed on your system

> Example: `hey max open obsidian`

> Note: only works on windows

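The app index described above could look roughly like the sketch below, which assumes apps are discovered via Start Menu shortcuts and launched with `os.startfile`; the real indexing logic in Max may differ.

```python
# Minimal sketch: index Start Menu shortcuts, then launch one by name.
# Assumes Windows; os.startfile opens a .lnk the same way double-clicking would.
import os
from pathlib import Path

START_MENUS = [
    Path(os.environ["PROGRAMDATA"]) / "Microsoft/Windows/Start Menu/Programs",
    Path(os.environ["APPDATA"]) / "Microsoft/Windows/Start Menu/Programs",
]

def build_index() -> dict[str, Path]:
    index = {}
    for root in START_MENUS:
        for link in root.rglob("*.lnk"):
            index[link.stem.lower()] = link
    return index

apps = build_index()
target = apps.get("obsidian")       # e.g. "hey max open obsidian"
if target:
    os.startfile(str(target))       # Windows-only call
```
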
## Timer
@ -74,7 +92,7 @@ this tool will set a timer with a popup. you tell max to set a time for X time,

the default timer will have a "clippy" popup, with potentially custom text

> Example: `hey max set timer 2 hours`

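A minimal sketch of the timer idea, assuming `threading.Timer` plus a plain Win32 message box for the popup; Max's actual "clippy" popup is its own thing.

```python
# Minimal sketch: schedule a popup after a delay, roughly what "set timer 2 hours" does.
# Assumes Windows (uses the Win32 MessageBoxW call); Max's actual popup differs.
import ctypes
import threading

def show_popup(text: str) -> None:
    # 0x40 = MB_ICONINFORMATION; safe to call from a worker thread
    ctypes.windll.user32.MessageBoxW(0, text, "Max timer", 0x40)

def set_timer(seconds: float, text: str = "time's up!") -> threading.Timer:
    timer = threading.Timer(seconds, show_popup, args=(text,))
    timer.start()
    return timer

set_timer(2 * 60 * 60, "2 hours are up")  # e.g. "hey max set timer 2 hours"
```
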
# Customization
14 cuda.md (new file)
@ -0,0 +1,14 @@

If this seems too complicated you can change Max to use google for speech-to-text instead in the .env

1. download the cuda toolkit: `https://developer.nvidia.com/cuda-downloads`
2. download cuDNN: `https://developer.nvidia.com/cudnn-downloads`
3. unzip cuDNN and copy all of the .dll files
4. paste the .dll files into the toolkit's `CUDA\bin` folder (for example: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin`)
5. now we need to add that folder to your PATH. to do this hit the windows key and type "enviro"
6. select "edit the system environment variables"
7. select the button on the bottom right, "Environment Variables"
8. in the lower window "System variables", find and select "Path"
9. select "Edit"
10. select "Browse"
11. browse to the same location as step 4, where you just put the .dll files
12. then select OK a bunch of times and close out the menus
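
To check whether the GPU setup worked, a quick sketch: try loading a faster-whisper model on CUDA and fall back to CPU if the libraries cannot be found (the model size and messages are illustrative).

```python
# Quick check that faster-whisper can actually see the GPU after the steps above.
# If the CUDA/cuDNN .dlls are not on PATH this typically raises at model load time.
from faster_whisper import WhisperModel

try:
    model = WhisperModel("base.en", device="cuda", compute_type="float16")
    print("CUDA setup looks good, running on GPU")
except Exception as err:  # missing cudnn/cublas DLLs usually surface here
    print(f"GPU load failed ({err}); falling back to CPU")
    model = WhisperModel("base.en", device="cpu", compute_type="int8")
```
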
BIN cudann/cudnn_ops_infer64_8.dll (new binary file, not shown)
@ -1,6 +1,6 @@

#LLM_TYPE will take openai, local. Local will use Ollama
LLM_TYPE = 'openai'
OPENAI_API_KEY = ''
OPENAI_MODEL = 'gpt-4o-mini'
#OLLAMA_MODEL will take any model you can load in ollama
OLLAMA_MODEL = 'gemma2'
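
As a rough illustration of what `LLM_TYPE` switches between, here is a sketch using the `openai` and `ollama` Python packages; Max's actual client code may look different.

```python
# Rough sketch of what LLM_TYPE = 'openai' vs 'local' could switch between.
# Assumes the openai and ollama Python packages; not necessarily Max's actual client code.
import os

def ask(prompt: str) -> str:
    if os.getenv("LLM_TYPE", "openai") == "openai":
        from openai import OpenAI
        client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        resp = client.chat.completions.create(
            model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    else:  # 'local' talks to a running Ollama server
        import ollama
        resp = ollama.chat(
            model=os.getenv("OLLAMA_MODEL", "gemma2"),
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["message"]["content"]

print(ask("hey max, what can you do?"))
```
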
BIN images/path.png (new binary file, 2.2 KiB, not shown)
BIN images/windows_env.png (new binary file, 29 KiB, not shown)