Mirror of https://github.com/tcsenpai/multi1.git (synced 2025-06-06 19:15:23 +00:00)

Merge pull request #1 from fengwang/main
Encourage self-hosted ollama to generate valid json response
Commit: 3e00dfc89a

README.md (13 lines changed)

```diff
@@ -21,7 +21,7 @@
 This is an early prototype of using prompting strategies to improve the LLM's reasoning capabilities through o1-like reasoning chains. This allows the LLM to "think" and solve logical problems that usually otherwise stump leading models. Unlike o1, all the reasoning tokens are shown, and the app uses an open source model.
 
 multi1 is experimental and being open sourced to help inspire the open source community to develop new strategies to produce o1-like reasoning. This experiment helps show the power of prompting reasoning in visualized steps, not a comparison to or full replication of o1, which uses different techniques. OpenAI's o1 is instead trained with large-scale reinforcement learning to reason using Chain of Thought, achieving state-of-the-art performance on complex PhD-level problems.
 
 multi1 demonstrates the potential of prompting alone to overcome straightforward LLM logic issues like the Strawberry problem, allowing existing open source models to benefit from dynamic reasoning chains and an improved interface for exploring them.
@@ -57,6 +57,11 @@ Result:
 
+Prompt: In the context of Lie Group and Lie Algebra, let $R \in E$ be an irreducible root system. Show that then $E$ is an irreducible representation of the Weyl group $W$.
 
 ### Quickstart
 
 To use the launcher, follow these instructions:
@@ -152,13 +157,13 @@ First, a persona is added:
 
 Then, instructions to describe the expected step-by-step reasoning process while titling each reasoning step. This includes the ability for the LLM to decide if another reasoning step is needed or if the final answer can be provided.
 
 > For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer.
 
 JSON formatting is introduced with an example provided later.
 
 > Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys.
 
@@ -167,7 +172,7 @@ In all-caps to improve prompt compliance by emphesizing the importance of the in
 1. Use as many reasoning steps as possible. At least 3. -> This ensures the LLM actually takes the time to think first, and results usually in about 5-10 steps.
 2. Be aware of your limitations as an llm and what you can and cannot do. -> This helps the LLM remember to use techniques which produce better results, like breaking "strawberry" down into individual letters before counting.
 3. Include exploration of alternative answers. Consider you may be wrong, and if you are wrong in your reasoning, where it would be. -> A large part of the gains seem to come from the LLM re-evaluating its initial response to ensure it logically aligns with the problem.
 4. When you say you are re-examining, actually re-examine, and use another approach to do so. Do not just say you are re-examining. -> This encourages the prevention of the LLM just saying it re-examined a problem without actually trying a new approach.
 5. Use at least 3 methods to derive the answer. -> This helps the LLM come to the right answer by trying multiple methods to derive it.
 6. Use best practices. -> This is as simple as the "Do better" prompts which improve LLM code output. By telling the LLM to use best practices, or do better, it generally performs better!
```
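Not part of this commit, but for illustration: a compliant reply to the "Respond in JSON format…" instruction quoted above is a single JSON object with `title`, `content`, and `next_action` keys, and the `next_action` value is what drives the step loop. The snippet below is a minimal sketch; `is_valid_step` is a hypothetical helper, not code from the repository.

```python
import json

# A reasoning step in the format the prompt asks for:
# 'title', 'content', and 'next_action' ('continue' or 'final_answer').
raw = '{"title": "Counting letters", "content": "Spell the word out letter by letter...", "next_action": "continue"}'

def is_valid_step(obj) -> bool:
    """Hypothetical check that a parsed response matches the expected schema."""
    return (
        isinstance(obj, dict)
        and {"title", "content", "next_action"} <= obj.keys()
        and obj["next_action"] in ("continue", "final_answer")
    )

step = json.loads(raw)
assert is_valid_step(step)
if step["next_action"] == "final_answer":
    print("Done:", step["content"])
else:
    print("Keep reasoning:", step["title"])
```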
The example environment configuration is updated so the self-hosted Ollama default points at a larger model:

```diff
@@ -1,7 +1,7 @@
 GROQ_API_KEY=gsk...
 
 OLLAMA_URL=http://localhost:11434
-OLLAMA_MODEL=llama2
+OLLAMA_MODEL=llama3.1:70b
 
 PERPLEXITY_API_KEY=your_perplexity_api_key
 PERPLEXITY_MODEL=llama-3.1-sonar-small-128k-online
```
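A side note rather than part of the diff: a 70B default only works if that model has already been pulled into the local Ollama instance. Below is a minimal sketch for sanity-checking the configuration, assuming Ollama's standard `GET /api/tags` endpoint (which lists locally available models); this check is not something ol1.py itself performs.

```python
# Sketch (not part of this commit): confirm the model named in .env is
# available on the local Ollama server before launching the app.
import os

import requests
from dotenv import load_dotenv

load_dotenv()
url = os.getenv("OLLAMA_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_MODEL", "llama3.1:70b")

tags = requests.get(f"{url}/api/tags", timeout=5).json()
available = {m["name"] for m in tags.get("models", [])}
if model not in available:
    print(f"Model '{model}' not found locally; pull it first (e.g. `ollama pull {model}`).")
```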
examples/lie.1.png (new binary file, 344 KiB; binary content not shown)

ol1.py (42 lines changed)

```diff
@@ -10,7 +10,7 @@ load_dotenv()
 
 # Get configuration from .env file
 OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434')
-OLLAMA_MODEL = os.getenv('OLLAMA_MODEL', 'llama2')
+OLLAMA_MODEL = os.getenv('OLLAMA_MODEL', 'llama3.1:70b')
 
 def make_api_call(messages, max_tokens, is_final_answer=False):
     for attempt in range(3):
@@ -21,6 +21,7 @@ def make_api_call(messages, max_tokens, is_final_answer=False):
                 "model": OLLAMA_MODEL,
                 "messages": messages,
                 "stream": False,
+                "format": "json", # important, or most of the time ollama does not generate valid json response
                 "options": {
                     "num_predict": max_tokens,
                     "temperature": 0.2
```
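The hunk above is the core of the commit: Ollama's chat API accepts a `format` field, and setting it to `"json"` constrains decoding so the reply is syntactically valid JSON. Below is a minimal, self-contained sketch of the request this payload corresponds to; it assumes the surrounding code posts to Ollama's `/api/chat` endpoint and reads the reply from `message.content`, neither of which is visible in this hunk.

```python
# Minimal sketch of the request the hunk above configures (assumptions:
# the surrounding code targets Ollama's /api/chat endpoint and parses the
# assistant message as JSON; those lines are not shown in the diff).
import json

import requests

OLLAMA_URL = "http://localhost:11434"
OLLAMA_MODEL = "llama3.1:70b"

payload = {
    "model": OLLAMA_MODEL,
    "messages": [{"role": "user", "content": "Reply as JSON with a 'greeting' key."}],
    "stream": False,
    "format": "json",  # constrain the model to emit syntactically valid JSON
    "options": {"num_predict": 200, "temperature": 0.2},
}

resp = requests.post(f"{OLLAMA_URL}/api/chat", json=payload, timeout=120)
content = resp.json()["message"]["content"]  # Ollama chat replies carry the text here
print(json.loads(content))                   # should now parse without errors
```

Note that `"format": "json"` guarantees syntax, not schema: the expected keys still have to be requested in the prompt, which is what the system-prompt changes in the next hunks do.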

```diff
@@ -38,7 +39,7 @@ def make_api_call(messages, max_tokens, is_final_answer=False):
             time.sleep(1)  # Wait for 1 second before retrying
 
 def generate_response(prompt):
-    messages = [
+    messages = [ # add two sentences to encourage json format response
         {"role": "system", "content": """You are an expert AI assistant that explains your reasoning step by step. For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer. Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys. USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.
 
 Example of a valid JSON response:
@@ -47,30 +48,31 @@ Example of a valid JSON response:
     "title": "Identifying Key Information",
     "content": "To begin solving this problem, we need to carefully examine the given information and identify the crucial elements that will guide our solution process. This involves...",
     "next_action": "continue"
-}```
+}```.
+You MUST response using the expected json schema, and your response must be valid json. This JSON response is essential for our job.
 """},
         {"role": "user", "content": prompt},
         {"role": "assistant", "content": "Thank you! I will now think step by step following my instructions, starting at the beginning after decomposing the problem."}
     ]
 
     steps = []
     step_count = 1
     total_thinking_time = 0
 
     while True:
         start_time = time.time()
         step_data = make_api_call(messages, 300)
         end_time = time.time()
         thinking_time = end_time - start_time
         total_thinking_time += thinking_time
 
         steps.append((f"Step {step_count}: {step_data['title']}", step_data['content'], thinking_time))
 
         messages.append({"role": "assistant", "content": json.dumps(step_data)})
 
         if step_data['next_action'] == 'final_answer':
             break
 
         step_count += 1
 
         # Yield after each step for Streamlit to update
```
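Even with `"format": "json"` and the strengthened prompt, a reply can be valid JSON yet miss one of the expected keys. The `for attempt in range(3)` loop visible in `make_api_call` already retries the request; a schema check along the following lines would cover the remaining case without crashing the Streamlit loop. This is a sketch, not code from the commit, and `parse_step` is a hypothetical helper.

```python
# Sketch (not part of this commit): defensive parse of a step response so a
# syntactically valid but schema-incomplete reply degrades gracefully.
import json

REQUIRED_KEYS = {"title", "content", "next_action"}

def parse_step(raw_content: str) -> dict:
    """Hypothetical helper: return a well-formed step dict no matter what."""
    try:
        data = json.loads(raw_content)
        if isinstance(data, dict) and REQUIRED_KEYS <= data.keys():
            return data
    except json.JSONDecodeError:
        pass
    # Fall back to a step that surfaces the raw text instead of raising.
    return {"title": "Unparseable step", "content": raw_content, "next_action": "final_answer"}
```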

```diff
@@ -78,25 +80,25 @@ Example of a valid JSON response:
 
     # Generate final answer
     messages.append({"role": "user", "content": "Please provide the final answer based on your reasoning above."})
 
     start_time = time.time()
     final_data = make_api_call(messages, 200, is_final_answer=True)
     end_time = time.time()
     thinking_time = end_time - start_time
     total_thinking_time += thinking_time
 
     steps.append(("Final Answer", final_data['content'], thinking_time))
 
     yield steps, total_thinking_time
 
 def main():
     st.set_page_config(page_title="ol1 prototype - Ollama version", page_icon="🧠", layout="wide")
 
     st.title("ol1: Using Ollama to create o1-like reasoning chains")
 
     st.markdown("""
     This is an early prototype of using prompting to create o1-like reasoning chains to improve output accuracy. It is not perfect and accuracy has yet to be formally evaluated. It is powered by Ollama so that the reasoning step is local!
 
     Forked from [bklieger-groq](https://github.com/bklieger-groq)
     Open source [repository here](https://github.com/tcsenpai/ol1-p1)
     """)
@@ -107,25 +109,27 @@ def main():
 
     # Text input for user query
     user_query = st.text_input("Enter your query:", placeholder="e.g., How many 'R's are in the word strawberry?")
 
     if user_query:
         st.write("Generating response...")
 
         # Create empty elements to hold the generated text and total time
         response_container = st.empty()
         time_container = st.empty()
 
         # Generate and display the response
         for steps, total_thinking_time in generate_response(user_query):
             with response_container.container():
                 for i, (title, content, thinking_time) in enumerate(steps):
                     if title.startswith("Final Answer"):
                         st.markdown(f"### {title}")
-                        st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
+                        # this will not work were there codes in the content
+                        #st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
+                        st.markdown(content, unsafe_allow_html=True)
                     else:
                         with st.expander(title, expanded=True):
                             st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
 
         # Only show total time when it's available at the end
         if total_thinking_time is not None:
             time_container.markdown(f"**Total thinking time: {total_thinking_time:.2f} seconds**")
```
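The last hunk stops rewriting newlines as `<br>` for the final answer because that substitution flattens Markdown code fences onto a single line, so they no longer render as code. A small sketch (not part of the commit) that shows the difference:

```python
# Sketch (not part of this commit): why the final hunk drops the <br> substitution.
import streamlit as st

content = "Here is the final function:\n```python\ndef add(a, b):\n    return a + b\n```"

st.markdown(content)                                                 # fence renders as a code block
st.markdown(content.replace("\n", "<br>"), unsafe_allow_html=True)   # fence collapses into one garbled line
```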