Mirror of https://github.com/tcsenpai/multi1.git (synced 2025-06-06 19:15:23 +00:00)

Merge pull request #1 from fengwang/main
Encourage self-hosted ollama to generate valid json response
Commit: 3e00dfc89a

README.md (13 lines changed)

```diff
@@ -21,7 +21,7 @@
 This is an early prototype of using prompting strategies to improve the LLM's reasoning capabilities through o1-like reasoning chains. This allows the LLM to "think" and solve logical problems that usually otherwise stump leading models. Unlike o1, all the reasoning tokens are shown, and the app uses an open source model.
 
 multi1 is experimental and being open sourced to help inspire the open source community to develop new strategies to produce o1-like reasoning. This experiment helps show the power of prompting reasoning in visualized steps, not a comparison to or full replication of o1, which uses different techniques. OpenAI's o1 is instead trained with large-scale reinforcement learning to reason using Chain of Thought, achieving state-of-the-art performance on complex PhD-level problems.
 
 multi1 demonstrates the potential of prompting alone to overcome straightforward LLM logic issues like the Strawberry problem, allowing existing open source models to benefit from dynamic reasoning chains and an improved interface for exploring them.
@@ -57,6 +57,11 @@ Result:
 
+Prompt: In the context of Lie Group and Lie Algebra, let $R \in E$ be an irreducible root system. Show that then $E$ is an irreducible representation of the Weyl group $W$.
 
 ### Quickstart
 
 To use the launcher, follow these instructions:
@@ -152,13 +157,13 @@ First, a persona is added:
 
 Then, instructions to describe the expected step-by-step reasoning process while titling each reasoning step. This includes the ability for the LLM to decide if another reasoning step is needed or if the final answer can be provided.
 
 > For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer.
 
 JSON formatting is introduced with an example provided later.
 
 > Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys.
 
@@ -167,7 +172,7 @@ In all-caps to improve prompt compliance by emphesizing the importance of the in
 1. Use as many reasoning steps as possible. At least 3. -> This ensures the LLM actually takes the time to think first, and results usually in about 5-10 steps.
 2. Be aware of your limitations as an llm and what you can and cannot do. -> This helps the LLM remember to use techniques which produce better results, like breaking "strawberry" down into individual letters before counting.
 3. Include exploration of alternative answers. Consider you may be wrong, and if you are wrong in your reasoning, where it would be. -> A large part of the gains seem to come from the LLM re-evaluating its initial response to ensure it logically aligns with the problem.
 4. When you say you are re-examining, actually re-examine, and use another approach to do so. Do not just say you are re-examining. -> This encourages the prevention of the LLM just saying it re-examined a problem without actually trying a new approach.
 5. Use at least 3 methods to derive the answer. -> This helps the LLM come to the right answer by trying multiple methods to derive it.
 6. Use best practices. -> This is as simple as the "Do better" prompts which improve LLM code output. By telling the LLM to use best practices, or do better, it generally performs better!
```
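Not part of this commit, but for illustration: a compliant reply to the "Respond in JSON format…" instruction quoted above is a single JSON object with `title`, `content`, and `next_action` keys, and the `next_action` value is what drives the step loop. The snippet below is a minimal sketch; `is_valid_step` is a hypothetical helper, not code from the repository.

```python
import json

# A reasoning step in the format the prompt asks for:
# 'title', 'content', and 'next_action' ('continue' or 'final_answer').
raw = '{"title": "Counting letters", "content": "Spell the word out letter by letter...", "next_action": "continue"}'

def is_valid_step(obj) -> bool:
    """Hypothetical check that a parsed response matches the expected schema."""
    return (
        isinstance(obj, dict)
        and {"title", "content", "next_action"} <= obj.keys()
        and obj["next_action"] in ("continue", "final_answer")
    )

step = json.loads(raw)
assert is_valid_step(step)
if step["next_action"] == "final_answer":
    print("Done:", step["content"])
else:
    print("Keep reasoning:", step["title"])
```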
The example environment configuration is updated so the self-hosted Ollama default points at a larger model:

```diff
@@ -1,7 +1,7 @@
 GROQ_API_KEY=gsk...
 
 OLLAMA_URL=http://localhost:11434
-OLLAMA_MODEL=llama2
+OLLAMA_MODEL=llama3.1:70b
 
 PERPLEXITY_API_KEY=your_perplexity_api_key
 PERPLEXITY_MODEL=llama-3.1-sonar-small-128k-online
```
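A side note rather than part of the diff: a 70B default only works if that model has already been pulled into the local Ollama instance. Below is a minimal sketch for sanity-checking the configuration, assuming Ollama's standard `GET /api/tags` endpoint (which lists locally available models); this check is not something ol1.py itself performs.

```python
# Sketch (not part of this commit): confirm the model named in .env is
# available on the local Ollama server before launching the app.
import os

import requests
from dotenv import load_dotenv

load_dotenv()
url = os.getenv("OLLAMA_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_MODEL", "llama3.1:70b")

tags = requests.get(f"{url}/api/tags", timeout=5).json()
available = {m["name"] for m in tags.get("models", [])}
if model not in available:
    print(f"Model '{model}' not found locally; pull it first (e.g. `ollama pull {model}`).")
```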
examples/lie.1.png (new binary file, 344 KiB; binary content not shown)

ol1.py (42 lines changed)

```diff
@@ -10,7 +10,7 @@ load_dotenv()
 
 # Get configuration from .env file
 OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434')
-OLLAMA_MODEL = os.getenv('OLLAMA_MODEL', 'llama2')
+OLLAMA_MODEL = os.getenv('OLLAMA_MODEL', 'llama3.1:70b')
 
 def make_api_call(messages, max_tokens, is_final_answer=False):
     for attempt in range(3):
@@ -21,6 +21,7 @@ def make_api_call(messages, max_tokens, is_final_answer=False):
                 "model": OLLAMA_MODEL,
                 "messages": messages,
                 "stream": False,
+                "format": "json", # important, or most of the time ollama does not generate valid json response
                 "options": {
                     "num_predict": max_tokens,
                     "temperature": 0.2
```
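The hunk above is the core of the commit: Ollama's chat API accepts a `format` field, and setting it to `"json"` constrains decoding so the reply is syntactically valid JSON. Below is a minimal, self-contained sketch of the request this payload corresponds to; it assumes the surrounding code posts to Ollama's `/api/chat` endpoint and reads the reply from `message.content`, neither of which is visible in this hunk.

```python
# Minimal sketch of the request the hunk above configures (assumptions:
# the surrounding code targets Ollama's /api/chat endpoint and parses the
# assistant message as JSON; those lines are not shown in the diff).
import json

import requests

OLLAMA_URL = "http://localhost:11434"
OLLAMA_MODEL = "llama3.1:70b"

payload = {
    "model": OLLAMA_MODEL,
    "messages": [{"role": "user", "content": "Reply as JSON with a 'greeting' key."}],
    "stream": False,
    "format": "json",  # constrain the model to emit syntactically valid JSON
    "options": {"num_predict": 200, "temperature": 0.2},
}

resp = requests.post(f"{OLLAMA_URL}/api/chat", json=payload, timeout=120)
content = resp.json()["message"]["content"]  # Ollama chat replies carry the text here
print(json.loads(content))                   # should now parse without errors
```

Note that `"format": "json"` guarantees syntax, not schema: the expected keys still have to be requested in the prompt, which is what the system-prompt changes in the next hunks do.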

```diff
@@ -38,7 +39,7 @@ def make_api_call(messages, max_tokens, is_final_answer=False):
             time.sleep(1)  # Wait for 1 second before retrying
 
 def generate_response(prompt):
-    messages = [
+    messages = [ # add two sentences to encourage json format response
         {"role": "system", "content": """You are an expert AI assistant that explains your reasoning step by step. For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer. Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys. USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.
 
 Example of a valid JSON response:
@@ -47,30 +48,31 @@ Example of a valid JSON response:
     "title": "Identifying Key Information",
     "content": "To begin solving this problem, we need to carefully examine the given information and identify the crucial elements that will guide our solution process. This involves...",
     "next_action": "continue"
-}```
+}```.
+You MUST response using the expected json schema, and your response must be valid json. This JSON response is essential for our job.
 """},
         {"role": "user", "content": prompt},
         {"role": "assistant", "content": "Thank you! I will now think step by step following my instructions, starting at the beginning after decomposing the problem."}
     ]
 
     steps = []
     step_count = 1
     total_thinking_time = 0
 
     while True:
         start_time = time.time()
         step_data = make_api_call(messages, 300)
         end_time = time.time()
         thinking_time = end_time - start_time
         total_thinking_time += thinking_time
 
         steps.append((f"Step {step_count}: {step_data['title']}", step_data['content'], thinking_time))
 
         messages.append({"role": "assistant", "content": json.dumps(step_data)})
 
         if step_data['next_action'] == 'final_answer':
             break
 
         step_count += 1
 
         # Yield after each step for Streamlit to update
```
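Even with `"format": "json"` and the strengthened prompt, a reply can be valid JSON yet miss one of the expected keys. The `for attempt in range(3)` loop visible in `make_api_call` already retries the request; a schema check along the following lines would cover the remaining case without crashing the Streamlit loop. This is a sketch, not code from the commit, and `parse_step` is a hypothetical helper.

```python
# Sketch (not part of this commit): defensive parse of a step response so a
# syntactically valid but schema-incomplete reply degrades gracefully.
import json

REQUIRED_KEYS = {"title", "content", "next_action"}

def parse_step(raw_content: str) -> dict:
    """Hypothetical helper: return a well-formed step dict no matter what."""
    try:
        data = json.loads(raw_content)
        if isinstance(data, dict) and REQUIRED_KEYS <= data.keys():
            return data
    except json.JSONDecodeError:
        pass
    # Fall back to a step that surfaces the raw text instead of raising.
    return {"title": "Unparseable step", "content": raw_content, "next_action": "final_answer"}
```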

```diff
@@ -78,25 +80,25 @@ Example of a valid JSON response:
 
     # Generate final answer
     messages.append({"role": "user", "content": "Please provide the final answer based on your reasoning above."})
 
     start_time = time.time()
     final_data = make_api_call(messages, 200, is_final_answer=True)
     end_time = time.time()
     thinking_time = end_time - start_time
     total_thinking_time += thinking_time
 
     steps.append(("Final Answer", final_data['content'], thinking_time))
 
     yield steps, total_thinking_time
 
 def main():
     st.set_page_config(page_title="ol1 prototype - Ollama version", page_icon="🧠", layout="wide")
 
     st.title("ol1: Using Ollama to create o1-like reasoning chains")
 
     st.markdown("""
     This is an early prototype of using prompting to create o1-like reasoning chains to improve output accuracy. It is not perfect and accuracy has yet to be formally evaluated. It is powered by Ollama so that the reasoning step is local!
 
     Forked from [bklieger-groq](https://github.com/bklieger-groq)
     Open source [repository here](https://github.com/tcsenpai/ol1-p1)
     """)
@@ -107,25 +109,27 @@ def main():
 
     # Text input for user query
     user_query = st.text_input("Enter your query:", placeholder="e.g., How many 'R's are in the word strawberry?")
 
     if user_query:
         st.write("Generating response...")
 
         # Create empty elements to hold the generated text and total time
         response_container = st.empty()
         time_container = st.empty()
 
         # Generate and display the response
         for steps, total_thinking_time in generate_response(user_query):
             with response_container.container():
                 for i, (title, content, thinking_time) in enumerate(steps):
                     if title.startswith("Final Answer"):
                         st.markdown(f"### {title}")
-                        st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
+                        # this will not work were there codes in the content
+                        #st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
+                        st.markdown(content, unsafe_allow_html=True)
                     else:
                         with st.expander(title, expanded=True):
                             st.markdown(content.replace('\n', '<br>'), unsafe_allow_html=True)
 
         # Only show total time when it's available at the end
         if total_thinking_time is not None:
             time_container.markdown(f"**Total thinking time: {total_thinking_time:.2f} seconds**")
```
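The last hunk stops rewriting newlines as `<br>` for the final answer because that substitution flattens Markdown code fences onto a single line, so they no longer render as code. A small sketch (not part of the commit) that shows the difference:

```python
# Sketch (not part of this commit): why the final hunk drops the <br> substitution.
import streamlit as st

content = "Here is the final function:\n```python\ndef add(a, b):\n    return a + b\n```"

st.markdown(content)                                                 # fence renders as a code block
st.markdown(content.replace("\n", "<br>"), unsafe_allow_html=True)   # fence collapses into one garbled line
```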