diff --git a/README.md b/README.md
index 603cc38..287f0fa 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@
This is an early prototype of using prompting strategies to improve the LLM's reasoning capabilities through o1-like reasoning chains. This allows the LLM to "think" and solve logical problems that usually otherwise stump leading models. Unlike o1, all the reasoning tokens are shown, and the app uses an open source model.
-multi1 is experimental and being open sourced to help inspire the open source community to develop new strategies to produce o1-like reasoning. This experiment helps show the power of prompting reasoning in visualized steps, not a comparison to or full replication of o1, which uses different techniques. OpenAI's o1 is instead trained with large-scale reinforcement learning to reason using Chain of Thought, achieving state-of-the-art performance on complex PhD-level problems.
+multi1 is experimental and being open sourced to help inspire the open source community to develop new strategies to produce o1-like reasoning. This experiment helps show the power of prompting reasoning in visualized steps, not a comparison to or full replication of o1, which uses different techniques. OpenAI's o1 is instead trained with large-scale reinforcement learning to reason using Chain of Thought, achieving state-of-the-art performance on complex PhD-level problems.
multi1 demonstrates the potential of prompting alone to overcome straightforward LLM logic issues like the Strawberry problem, allowing existing open source models to benefit from dynamic reasoning chains and an improved interface for exploring them.
@@ -57,6 +57,11 @@ Result:

+Prompt: In the context of Lie Group and Lie Algebra, let $R \in E$ be an irreducible root system. Show that then $E$ is an irreducible representation of the Weyl group $W$.
+
+
+
+
### Quickstart
To use the launcher, follow these instructions:
@@ -152,13 +157,13 @@ First, a persona is added:
Then, instructions to describe the expected step-by-step reasoning process while titling each reasoning step. This includes the ability for the LLM to decide if another reasoning step is needed or if the final answer can be provided.
-> For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer.
+> For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer.
JSON formatting is introduced with an example provided later.
-> Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys.
+> Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys.
@@ -167,7 +172,7 @@ In all-caps to improve prompt compliance by emphasizing the importance of the in
1. Use as many reasoning steps as possible. At least 3. -> This ensures the LLM actually takes the time to think first, and results usually in about 5-10 steps.
2. Be aware of your limitations as an llm and what you can and cannot do. -> This helps the LLM remember to use techniques which produce better results, like breaking "strawberry" down into individual letters before counting.
3. Include exploration of alternative answers. Consider you may be wrong, and if you are wrong in your reasoning, where it would be. -> A large part of the gains seem to come from the LLM re-evaluating its initial response to ensure it logically aligns with the problem.
-4. When you say you are re-examining, actually re-examine, and use another approach to do so. Do not just say you are re-examining. -> This encourages the prevention of the LLM just saying it re-examined a problem without actually trying a new approach.
+4. When you say you are re-examining, actually re-examine, and use another approach to do so. Do not just say you are re-examining. -> This encourages the prevention of the LLM just saying it re-examined a problem without actually trying a new approach.
5. Use at least 3 methods to derive the answer. -> This helps the LLM come to the right answer by trying multiple methods to derive it.
6. Use best practices. -> This is as simple as the "Do better" prompts which improve LLM code output. By telling the LLM to use best practices, or do better, it generally performs better!
diff --git a/example.env b/example.env
index f66b230..4645af3 100644
--- a/example.env
+++ b/example.env
@@ -1,7 +1,7 @@
GROQ_API_KEY=gsk...
OLLAMA_URL=http://localhost:11434
-OLLAMA_MODEL=llama2
+OLLAMA_MODEL=llama3.1:70b
PERPLEXITY_API_KEY=your_perplexity_api_key
PERPLEXITY_MODEL=llama-3.1-sonar-small-128k-online
\ No newline at end of file
diff --git a/examples/lie.1.png b/examples/lie.1.png
new file mode 100644
index 0000000..f990d3d
Binary files /dev/null and b/examples/lie.1.png differ
diff --git a/ol1.py b/ol1.py
index 870bd15..b0dbdc4 100644
--- a/ol1.py
+++ b/ol1.py
@@ -10,7 +10,7 @@ load_dotenv()
# Get configuration from .env file
OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434')
-OLLAMA_MODEL = os.getenv('OLLAMA_MODEL', 'llama2')
+OLLAMA_MODEL = os.getenv('OLLAMA_MODEL', 'llama3.1:70b')
def make_api_call(messages, max_tokens, is_final_answer=False):
for attempt in range(3):
@@ -21,6 +21,7 @@ def make_api_call(messages, max_tokens, is_final_answer=False):
"model": OLLAMA_MODEL,
"messages": messages,
"stream": False,
+ "format": "json", # Important: without this, Ollama often fails to return a valid JSON response
"options": {
"num_predict": max_tokens,
"temperature": 0.2
@@ -38,7 +39,7 @@ def make_api_call(messages, max_tokens, is_final_answer=False):
time.sleep(1) # Wait for 1 second before retrying
def generate_response(prompt):
- messages = [
+ messages = [ # Two sentences were added to the system prompt to encourage a JSON-formatted response
{"role": "system", "content": """You are an expert AI assistant that explains your reasoning step by step. For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer. Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys. USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.
Example of a valid JSON response:
@@ -47,30 +48,31 @@ Example of a valid JSON response:
"title": "Identifying Key Information",
"content": "To begin solving this problem, we need to carefully examine the given information and identify the crucial elements that will guide our solution process. This involves...",
"next_action": "continue"
-}```
+}```.
+You MUST respond using the expected JSON schema, and your response must be valid JSON. This JSON response is essential for our job.
"""},
{"role": "user", "content": prompt},
{"role": "assistant", "content": "Thank you! I will now think step by step following my instructions, starting at the beginning after decomposing the problem."}
]
-
+
steps = []
step_count = 1
total_thinking_time = 0
-
+
while True:
start_time = time.time()
step_data = make_api_call(messages, 300)
end_time = time.time()
thinking_time = end_time - start_time
total_thinking_time += thinking_time
-
+
steps.append((f"Step {step_count}: {step_data['title']}", step_data['content'], thinking_time))
-
+
messages.append({"role": "assistant", "content": json.dumps(step_data)})
-
+
if step_data['next_action'] == 'final_answer':
break
-
+
step_count += 1
# Yield after each step for Streamlit to update
@@ -78,25 +80,25 @@ Example of a valid JSON response:
# Generate final answer
messages.append({"role": "user", "content": "Please provide the final answer based on your reasoning above."})
-
+
start_time = time.time()
final_data = make_api_call(messages, 200, is_final_answer=True)
end_time = time.time()
thinking_time = end_time - start_time
total_thinking_time += thinking_time
-
+
steps.append(("Final Answer", final_data['content'], thinking_time))
yield steps, total_thinking_time
def main():
st.set_page_config(page_title="ol1 prototype - Ollama version", page_icon="🧠", layout="wide")
-
+
st.title("ol1: Using Ollama to create o1-like reasoning chains")
-
+
st.markdown("""
This is an early prototype of using prompting to create o1-like reasoning chains to improve output accuracy. It is not perfect and accuracy has yet to be formally evaluated. It is powered by Ollama so that the reasoning step is local!
-
+
Forked from [bklieger-groq](https://github.com/bklieger-groq)
Open source [repository here](https://github.com/tcsenpai/ol1-p1)
""")
@@ -107,25 +109,27 @@ def main():
# Text input for user query
user_query = st.text_input("Enter your query:", placeholder="e.g., How many 'R's are in the word strawberry?")
-
+
if user_query:
st.write("Generating response...")
-
+
# Create empty elements to hold the generated text and total time
response_container = st.empty()
time_container = st.empty()
-
+
# Generate and display the response
for steps, total_thinking_time in generate_response(user_query):
with response_container.container():
for i, (title, content, thinking_time) in enumerate(steps):
if title.startswith("Final Answer"):
st.markdown(f"### {title}")
- st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
+ # Replacing newlines with <br> breaks content that contains code blocks
+ #st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
+ st.markdown(content, unsafe_allow_html=True)
else:
with st.expander(title, expanded=True):
st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
-
+
# Only show total time when it's available at the end
if total_thinking_time is not None:
time_container.markdown(f"**Total thinking time: {total_thinking_time:.2f} seconds**")
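
Note on the JSON changes in `ol1.py`: the patch relies on `"format": "json"` plus extra prompt wording to get parseable steps, but the loop in `generate_response` still indexes `step_data['title']` and `step_data['next_action']` directly. A minimal sketch of how a reply could be validated before use is shown below. `build_payload` and `parse_step` are hypothetical helpers, not part of this patch, and the payload is assumed to be sent to Ollama's `/api/chat` endpoint, which the diff does not show.

```python
import json

REQUIRED_KEYS = ("title", "content", "next_action")
VALID_ACTIONS = ("continue", "final_answer")


def build_payload(model, messages, max_tokens):
    """Request body mirroring the fields visible in the diff (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "format": "json",  # ask Ollama to constrain the reply to JSON
        "options": {"num_predict": max_tokens, "temperature": 0.2},
    }


def _error_step(reason):
    """Build a step that ends the reasoning loop and reports what went wrong."""
    return {"title": "Error", "content": reason, "next_action": "final_answer"}


def parse_step(raw):
    """Parse one model reply into a step dict, never raising on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return _error_step(f"Invalid JSON: {exc}")
    if not isinstance(data, dict):
        return _error_step("Expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        return _error_step(f"Missing keys: {', '.join(missing)}")
    if data["next_action"] not in VALID_ACTIONS:
        # Unknown action: keep reasoning rather than stopping early (an assumption).
        data["next_action"] = "continue"
    return data
```

With a helper like this, the step loop could call `parse_step` on the raw message content and would stop cleanly on a malformed reply instead of raising `KeyError`.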