mirror of
https://github.com/tcsenpai/multi1.git
synced 2025-06-06 02:55:21 +00:00
fix wording ambiguities
This commit is contained in:
parent
5c2c466a15
commit
e80dca6cee
@ -11,11 +11,11 @@ g1 demonstrates the potential of prompting alone to overcome straightforward LLM
|
||||
|
||||
### How it works
|
||||
|
||||
g1 on top of Llama3.1-70b creates reasoning chains, in principle a dynamic Chain of Thought, that allows the LLM to "think" and solve some logical problems that usually otherwise stump leading models.
|
||||
g1 powered by Llama3.1-70b creates reasoning chains, in principle a dynamic Chain of Thought, that allows the LLM to "think" and solve some logical problems that usually otherwise stump leading models.
|
||||
|
||||
At each step, the LLM can choose to continue to another reasoning step, or provide a final answer. Each step is titled and visible to the user. The system prompt also includes tips for the LLM. There is a full explanation under Prompt Breakdown, but a few examples are asking the model to “include exploration of alternative answers” and “use at least 3 methods to derive the answer”.
|
||||
|
||||
The reasoning ability of the LLM is improved through combining Chain-of-Thought with the requirement to try multiple methods, explore alternative answers, question previous draft solutions, and consider the LLM’s limitations. This alone, without additonal training, is sufficient to achieve ~70% accuracy on the Strawberry problem (n=10, "How many Rs are in strawberry?"). Without prompting, Llama-3.1-70b had 0% accuracy and ChatGPT-4o had 30% accuracy.
|
||||
The reasoning ability of the LLM is therefore improved through combining Chain-of-Thought with the requirement to try multiple methods, explore alternative answers, question previous draft solutions, and consider the LLM’s limitations. This alone, without any training, is sufficient to achieve ~70% accuracy on the Strawberry problem (n=10, "How many Rs are in strawberry?"). Without prompting, Llama-3.1-70b had 0% accuracy and ChatGPT-4o had 30% accuracy.
|
||||
|
||||
|
||||
### Examples
|
||||
|
Loading…
x
Reference in New Issue
Block a user