mirror of https://github.com/tcsenpai/multi1.git, synced 2025-06-06 19:15:23 +00:00

Commit: improve docs
parent 99e09821f6 · commit 395810d797

README.md (88 changed lines)
@@ -1,11 +1,12 @@
# g1: Early Prototype of Using Llama-3.1 70b on Groq to create o1-like reasoning chains

-This is an early prototype of using prompting strategies to improve the LLM's reasoning capabilities through o1-like reasoning chains. This allows the LLM to "think" and solve logical problems that otherwise stump leading models. Unlike o1, all the reasoning tokens are shown.
+This is an early prototype of using prompting strategies to improve the LLM's reasoning capabilities through o1-like reasoning chains. This allows the LLM to "think" and solve logical problems that otherwise stump leading models. Unlike o1, all the reasoning tokens are shown, and the app uses an open source model.

### Examples

> [!IMPORTANT]
-> g1 is not perfect, but it seems to perform significantly better than LLMs out-of-the-box. From initial testing, g1 accurately solves logic problems that usually stump LLMs 60-80% of the time. See examples below.
+> g1 is not perfect, but it can perform significantly better than LLMs out-of-the-box. From initial testing, g1 accurately solves simple logic problems that usually stump LLMs 60-80% of the time. See examples below.

##### How many Rs are in strawberry?
@@ -15,5 +16,88 @@ Result:

![Strawberry example](examples/strawberry.png)

---

Prompt: Which is larger, .9 or .11?

Result:

![Which is larger, .9 or .11 example](examples/math.png)

### Quickstart

~~~
python3 -m venv venv
~~~

~~~
source venv/bin/activate
~~~

~~~
pip3 install -r requirements.txt
~~~

~~~
export GROQ_API_KEY=gsk...
~~~

~~~
streamlit run app.py
~~~

### Prompting Strategy

The prompt is as follows:

```
You are an expert AI assistant that explains your reasoning step by step. For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer. Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys. USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.

Example of a valid JSON response:
json
{
    "title": "Identifying Key Information",
    "content": "To begin solving this problem, we need to carefully examine the given information and identify the crucial elements that will guide our solution process. This involves...",
    "next_action": "continue"
}
```

#### Breakdown

First, a persona is added:

```
You are an expert AI assistant that explains your reasoning step by step.
```

Then come instructions describing the expected step-by-step reasoning process, with a title for each reasoning step. This includes letting the LLM decide whether another reasoning step is needed or whether the final answer can be given.

```
For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer.
```

JSON formatting is introduced with an example provided later.

```
Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys.
```

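For illustration, a minimal check of one reply against this schema; the variable `raw_reply` is only a stand-in for an assistant message, not code from this repository:

```python
import json

# Stand-in for one assistant message returned by the model.
raw_reply = '{"title": "Identifying Key Information", "content": "...", "next_action": "continue"}'

step = json.loads(raw_reply)

# The prompt asks for exactly these keys and these next_action values.
assert {"title", "content", "next_action"} <= step.keys()
assert step["next_action"] in ("continue", "final_answer")
```
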
A set of tips and best practices is included, written in all-caps to improve prompt compliance by emphasizing the importance of the instructions.

1. Use as many reasoning steps as possible. At least 3. -> This ensures the LLM actually takes the time to think first, and usually results in about 5-10 steps.
2. Be aware of your limitations as an LLM and what you can and cannot do. -> This helps the LLM remember to use techniques which produce better results, like breaking "strawberry" down into individual letters before counting.
3. Include exploration of alternative answers. Consider you may be wrong, and if you are wrong in your reasoning, where it would be. -> A large part of the gains seem to come from the LLM re-evaluating its initial response to ensure it logically aligns with the problem.
4. When you say you are re-examining, actually re-examine, and use another approach to do so. Do not just say you are re-examining. -> This prevents the LLM from merely claiming to have re-examined the problem without actually trying a new approach.
5. Use at least 3 methods to derive the answer. -> This helps the LLM come to the right answer by trying multiple methods to derive it.
6. Use best practices. -> This works like the "Do better" prompts that improve LLM code output: telling the LLM to use best practices, or simply to do better, generally improves its output.

In the prompt itself, these tips appear as a single all-caps block; a short sketch of how the full prompt can drive a reasoning loop follows the excerpt below.

```
USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.
```
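Putting the prompt to work is a loop: send the system prompt plus the conversation so far, parse each JSON step, and stop once `next_action` is `final_answer`. The following is a minimal illustrative sketch, not the code in app.py; it reuses the client and model name visible in the app.py diff below, while the function name `reasoning_chain`, the step limit, and the token budget are assumptions:

```python
import json

import groq

# Assumes GROQ_API_KEY is set in the environment (see Quickstart above).
client = groq.Groq()

SYSTEM_PROMPT = "You are an expert AI assistant that explains your reasoning step by step. ..."  # abridged; full text above


def reasoning_chain(question: str, max_steps: int = 10) -> list[dict]:
    """Illustrative loop: keep requesting steps until the model says it is done."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    steps = []
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="llama-3.1-70b-versatile",
            messages=messages,
            max_tokens=300,
        )
        step = json.loads(response.choices[0].message.content)
        steps.append(step)
        # Feed the step back so the model can build on its own reasoning.
        messages.append({"role": "assistant", "content": json.dumps(step)})
        if step["next_action"] == "final_answer":
            break
    return steps


if __name__ == "__main__":
    for s in reasoning_chain("How many Rs are in strawberry?"):
        print(s["title"])
```
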

### Credits

This app was developed by [Benjamin Klieger](https://x.com/benjaminklieger).

app.py (17 changed lines)
@@ -4,11 +4,10 @@ import os
import json
import time

-# Set up Groq client
client = groq.Groq()

def make_api_call(messages, max_tokens, is_final_answer=False):
-    for attempt in range(3):  # Try up to 3 times
+    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model="llama-3.1-70b-versatile",
@@ -19,7 +18,7 @@ def make_api_call(messages, max_tokens, is_final_answer=False):
            )
            return json.loads(response.choices[0].message.content)
        except Exception as e:
-            if attempt == 2:  # If this was the last attempt
+            if attempt == 2:
                if is_final_answer:
                    return {"title": "Error", "content": f"Failed to generate final answer after 3 attempts. Error: {str(e)}"}
                else:
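The two hunks above show a retry wrapper around the Groq chat call. For reference, a self-contained sketch of that pattern follows; it is not the project's code, and the pause between attempts and the error payload for non-final calls are assumptions, since those lines are not visible in this diff:

```python
import json
import time

import groq

client = groq.Groq()

def call_with_retries(messages, max_tokens, is_final_answer=False, attempts=3):
    # Try the request up to `attempts` times before giving up.
    for attempt in range(attempts):
        try:
            response = client.chat.completions.create(
                model="llama-3.1-70b-versatile",
                messages=messages,
                max_tokens=max_tokens,
            )
            return json.loads(response.choices[0].message.content)
        except Exception as e:
            if attempt == attempts - 1:
                # Last attempt: report the failure as a step instead of raising.
                label = "final answer" if is_final_answer else "step"
                return {"title": "Error", "content": f"Failed to generate {label} after {attempts} attempts. Error: {str(e)}"}
            time.sleep(1)  # assumed short pause before retrying
```
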
@@ -79,14 +78,14 @@ Example of a valid JSON response:
    yield steps, total_thinking_time

def main():
-    st.set_page_config(page_title="Groq Dynamic Reasoning Chain", page_icon="🧠", layout="wide")
+    st.set_page_config(page_title="g1 prototype", page_icon="🧠", layout="wide")

    st.title("Early Prototype of g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains")

    st.markdown("""
    This is an early prototype of using prompting to create o1-like reasoning chains to improve output accuracy. It is not perfect, it seems to be accurate on about 60-80% of runs on logic problems leading LLMs typically get right 0-20% of the time. It is powered by Groq so that the reasoning step is fast!

-    Created by @benjaminklieger, open sourced here:
+    Open source [repository here](https://github.com/bklieger-groq)
    """)

    # Text input for user query
@@ -105,16 +104,14 @@ def main():
                for i, (title, content, thinking_time) in enumerate(steps):
                    if title.startswith("Final Answer"):
                        st.markdown(f"### {title}")
-                        st.markdown(content)
+                        st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
                    else:
                        with st.expander(title, expanded=True):
-                            st.markdown(content)
+                            st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)

-            # Only show total time when it's available (i.e., at the end)
+            # Only show total time when it's available at the end
            if total_thinking_time is not None:
                time_container.markdown(f"**Total thinking time: {total_thinking_time:.2f} seconds**")

-            time.sleep(0.1)  # Add a small delay to make the step-by-step effect visible

if __name__ == "__main__":
    main()
@@ -0,0 +1 @@
+GROQ_API_KEY=gsk...
examples/math.png (new binary file, 247 KiB; binary content not shown)