
New Ideas in AI: Getting LLMs to Reason Through Chain-of-Thought Prompting


    The latest session of our New Ideas in AI Series brought together yet another exciting group of leading engineers, founders, and academics from across the industry to discuss applications of chain-of-thought prompting with Jason Wei, a researcher at OpenAI.

    Jason’s research focuses on properties of LLMs that emerge as a result of model scaling. His 2022 paper, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, introduces the idea of getting a model to “think step by step”: the few-shot examples in the prompt spell out intermediate reasoning steps, so the model decomposes a hard task into smaller, simpler ones. Chain-of-thought (CoT) prompting thus enables a new breed of AI applications that can solve more complex problems by applying the reasoning demonstrated in an example solution to a similar problem.

    Below is the sort of worked example that might be provided to the model for a given problem. The LLM can then use that information to reason through additional questions. This allows it to solve more complex problems, such as the arithmetic in the example, that open source LLMs tend to struggle with.
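
    As a rough illustration of how such a prompt can be assembled, here is a minimal Python sketch. The worked example is the tennis-ball problem from the chain-of-thought paper, which may or may not be the exact figure originally shown here. The names `Exemplar` and `build_cot_prompt` are hypothetical helpers invented for this sketch, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example: a question, its reasoning steps, and the final answer."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], new_question: str) -> str:
    """Format the worked examples, then the new question with its answer left open."""
    parts = []
    for ex in exemplars:
        # The reasoning comes before the answer, so the model sees the steps first.
        parts.append(f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}.")
    parts.append(f"Q: {new_question}\nA:")
    return "\n\n".join(parts)


# Worked example of the kind used in the chain-of-thought paper.
tennis = Exemplar(
    question=(
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?"
    ),
    reasoning=(
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11."
    ),
    answer="11",
)

prompt = build_cot_prompt(
    [tennis],
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?",
)
print(prompt)
```

    The resulting string would be sent to the model as-is. Because the worked example shows its reasoning before stating the answer, the model is nudged to write out its own steps for the new question before committing to a result.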

    The dinner series presented an opportunity for our group to press Jason further on some of his findings. Here are highlights of some of the more thought-provoking questions and discussion topics:

    In the examples, it seems like the user is already demonstrating the reasoning required to get to the end result in the prompt itself. So why is an LLM useful in the first place?

    A major benefit of CoT is that it lets the LLM apply a broader framework to the specifics of a given problem. It is thus extremely useful for automating problem-solving work across a large volume of sufficiently similar tasks.

    For extremely complex problems, the problem solver often won’t know the right reasoning framework themselves. In such cases, without a known-good example to offer to the model, can CoT still be useful?

    If the user has a set of potentially useful reasoning frameworks for such an extremely complex problem, CoT might be best engaged as a way to apply these frameworks at scale and efficiently generate a set of possible solutions to the problem. These potential solutions can then be tested for correctness. On the other hand, if the user lacks a set of relevant reasoning examples to apply, they will be unable to generate an adequate chain-of-thought prompt, a seeming limitation of the approach.
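
    The generate-then-test idea can be sketched as follows. This is an illustration under stated assumptions, not a method from Jason’s research: `solve_with_frameworks`, `extract_answer`, and `fake_model` are hypothetical names, the framework exemplars are placeholders, and `fake_model` stands in for a real LLM call by returning canned text.

```python
import re
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a completion ending in 'The answer is X.'"""
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", completion.strip())
    return match.group(1) if match else None


def solve_with_frameworks(
    problem: str,
    framework_prompts: dict[str, str],
    generate: Callable[[str], str],
    is_correct: Callable[[str], bool],
) -> dict[str, str]:
    """Try each reasoning framework and keep only the answers that pass the check."""
    verified = {}
    for name, exemplar_text in framework_prompts.items():
        prompt = f"{exemplar_text}\n\nQ: {problem}\nA:"
        answer = extract_answer(generate(prompt))
        if answer is not None and is_correct(answer):
            verified[name] = answer
    return verified


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned completions for this demo only."""
    if "work backwards" in prompt:
        return "Working backwards from the total, each share is 12. The answer is 12."
    return "Adding the parts directly gives 10. The answer is 10."


# Placeholder exemplars; a real prompt would contain full worked solutions.
frameworks = {
    "work_backwards": "Q: (worked example solved with the work backwards approach)\nA: ...",
    "direct": "Q: (worked example solved by direct addition)\nA: ...",
}

verified = solve_with_frameworks(
    "Three friends split 36 dollars equally. How much does each get?",
    frameworks,
    fake_model,
    is_correct=lambda a: a.isdigit() and int(a) * 3 == 36,
)
print(verified)
```

    The correctness check is the piece the user must supply. Here it is a simple arithmetic test, but in practice it could be a unit test, a simulation, or a human review step.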

    What is a simple, practical example of applying CoT?  

    Jason suggests the simple example of a travel app that uses GPT-4 to offer recommendations at the user level. The product might explicitly prompt the model to list out the known traits about the user before asking itself about best-fit recommendations given this knowledge of the user.
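
    A prompt along these lines might look like the sketch below. The function name, the wording of the instructions, and the sample user facts are all hypothetical, made up for illustration. The call to the model itself is left out.

```python
def build_travel_prompt(user_facts: list[str], request: str) -> str:
    """Ask the model to list relevant user traits first, then recommend."""
    facts = "\n".join(f"- {fact}" for fact in user_facts)
    return (
        "You are helping a travel app recommend trips.\n\n"
        f"What we know about the user:\n{facts}\n\n"
        f"User request: {request}\n\n"
        "Step 1: List the traits of this user that matter for the request.\n"
        "Step 2: Using only those traits, recommend three destinations, "
        "with one sentence of reasoning for each.\n"
    )


# Hypothetical user profile, for illustration only.
travel_prompt = build_travel_prompt(
    ["Travels with two young children", "Prefers short flights", "Enjoys hiking"],
    "Where should we go for a week in July?",
)
print(travel_prompt)
```

    Putting the trait-listing step before the recommendation step is the chain-of-thought part: the model writes out what it knows about the user, and that text then conditions the recommendations that follow.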

    How much might chain-of-thought prompting improve performance over standard prompting with GPT-4 compared with prior models?

    Though Jason was unable to address the specifics of GPT-4, he explained that his published research has shown two things:

    1. Chain of Thought prompting is an emergent ability of model scale, meaning it does not positively impact performance until used with a model of sufficient scale.
    2. The effectiveness of Chain of Thought prompting tends to increase as model size increases. So, the larger the model, the greater the impact of Chain of Thought prompting.

    Consider the chart below, reformatted from Jason’s original paper.

    When to build a model?

    Another key finding of Jason’s research has been that sufficient model size is critical for logical chains of thought, which led to discussion of when to, or not to, build a model. Jason’s take on the matter is that there are three reasons to build an LLM in-house:

    1. You have proprietary data: off-the-shelf models will, by definition, not include these data in their knowledge.
    2. You want your model to serve a very specific task: smaller LLMs can be more performant on a narrow set of tasks if trained for that specific domain.
    3. You’re a foundation model provider.

    Despite its relative simplicity in practice, CoT greatly extends the types of problems LLMs can solve. We’re grateful to Jason for joining us. Learn more about our New Ideas in AI Series here.
