In Tab. 18, we demonstrate that the language model (ChatGPT) not only identifies ungrounded information but also detects logical errors within the given solutions.

In Tab. 19, we illustrate a case where the language model fails to detect ungrounded premise numbers, mistakenly assuming that these numbers can be derived from grounded ones.

Lastly, in Tab. 20, we illustrate a case where the language model fails to recognize grounded numbers and treats them as ungrounded.
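To make the notion of a "grounded" number concrete, the following is a minimal sketch of a purely literal check: a number in a reasoning step counts as grounded only if it appears verbatim in a premise. This is not the verification method used in the paper, which prompts a language model. The function names, premises, and reasoning step are hypothetical and invented for illustration.

```python
from __future__ import annotations

import re

# Matches integers and simple decimals such as "120" or "4.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> set[str]:
    """Return the set of numeric literals that appear in the text."""
    return set(NUMBER.findall(text))


def ungrounded_numbers(premises: list[str], step: str) -> set[str]:
    """Return numbers used in `step` that appear literally in no premise.

    Assumption: this is a naive string-level check, not an LLM-based verifier.
    """
    grounded: set[str] = set()
    for premise in premises:
        grounded |= extract_numbers(premise)
    return extract_numbers(step) - grounded


# Hypothetical premises and reasoning step (not taken from the paper's tables).
premises = [
    "The tank holds 120 liters.",
    "It is drained at 4 liters per minute.",
]
step = "Draining 240 liters at 4 liters per minute takes 60 minutes."

print(sorted(ungrounded_numbers(premises, step)))
```

In this sketch both 240 and 60 are flagged, even though 60 is a derived result rather than a premise. A literal check cannot separate numbers that follow from grounded ones from numbers that are genuinely ungrounded, which is the judgment the language model is asked to make and where the cases in Tab. 19 and Tab. 20 go wrong.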

Two-shot prompt for direct reasoning chain verification without Natural Program format.

One-shot Natural Program prompt for reasoning chain generation on math word problems.

One-shot Natural Program prompt for reasoning chain generation on math word problems with multiple choice.

Two-shot Natural Program prompt for reasoning chain generation on the Date dataset.

One-shot Natural Program prompt for reasoning chain generation on the Last Letters dataset.

One-shot prompt for deductive verification of a single reasoning step, following our Natural Program format and step-by-step reasoning chain decomposition.

Our deductive verification approach successfully discovers ungrounded information and reasoning mistakes.

Our deductive verification process fails to detect ungrounded information in the reasoning step. The number 240 in the reasoning step is ungrounded, but the model states that it can be calculated from grounded numbers.

Our deductive verification process sometimes treats grounded information as if it were ungrounded. The number 120 is provided in the given information, but the model states that it is ungrounded.

:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
