Can Transformers Learn Logical Reasoning from Scratch? | HackerNoon

News Room | Published 3 November 2025

:::info
Authors:

(1) Emmanuel Abbe, Apple and EPFL;

(2) Samy Bengio, Apple;

(3) Aryo Lotfi, EPFL;

(4) Colin Sandon, EPFL;

(5) Omid Saremi, Apple.

:::

Table of Links

Abstract and 1. Introduction

1.1 Syllogisms composition

1.2 Hardness of long compositions

1.3 Hardness of global reasoning

1.4 Our contributions

  2. Results on the local reasoning barrier

    2.1 Defining locality and auto-regressive locality

    2.2 Transformers require low locality: formal results

    2.3 Agnostic scratchpads cannot break the locality

  3. Scratchpads to break the locality

    3.1 Educated scratchpad

    3.2 Inductive Scratchpads

  4. Conclusion, Acknowledgments, and References

A. Further related literature

B. Additional experiments

C. Experiment and implementation details

D. Proof of Theorem 1

E. Comment on Lemma 1

F. Discussion on circuit complexity connections

G. More experiments with ChatGPT

Abstract

Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity, but this does not address the learnability objective. This paper puts forward the notion of distribution locality to capture when weak learning is efficiently achievable by regular Transformers, where the locality measures the least number of tokens required in addition to the tokens histogram to correlate nontrivially with the target. As shown experimentally and theoretically under additional assumptions, distributions with high locality cannot be learned efficiently. In particular, syllogisms cannot be composed on long chains. Furthermore, we show that (i) an agnostic scratchpad cannot help to break the locality barrier, (ii) an educated scratchpad can help if it breaks the locality at each step, (iii) a notion of ‘inductive scratchpad’ can both break the locality and improve the out-of-distribution generalization, e.g., generalizing to almost double input size for some arithmetic tasks.

1 Introduction

Transformers [1] have proved to have strong learning capabilities, in particular in applications with large amounts of text, image, or audio data [2, 3]. Some reasoning capabilities are also notable in these settings; however, the picture deteriorates as the target complexity increases, such as in tasks involving more advanced forms of ‘reasoning’ [4, 5, 6, 7, 8, 9, 10]. While reasoning is present at all levels of learning, it is pushed to a higher level in tasks such as logic or mathematics, where ‘learning by seeing enough representative examples’ is precluded by the more combinatorial nature of the task. For such tasks, combining learned concepts in order to extrapolate seems necessary, as for the length generalization problem [11]. Current Transformer-based models exhibit difficulties learning such tasks at scale. Can we understand why and what is missing? We start with a specific motivational example before expanding the discussion to more general tasks.

1.1 Syllogisms composition

Reasoning relates to the process of inferring new knowledge by efficiently composing prior knowledge. A basic notion of reasoning is syllogism composition, e.g., inferring a ⇒ c from a ⇒ b and b ⇒ c. For instance, one may be given a set of implications:

1 ⇒ 2, 1 ⇒ 3, 4 ⇒ 1, 1 ⇒ 5,

and, without additional background information, one would like to know using logic whether 3 ⇒ 5 (case 1) or 4 ⇒ 2 (case 2) can be deduced.

The goal here is to identify whether a syllogism can be composed[1] from prior ones. Simplifying the input format, the above corresponds to identifying paths in token sequences describing the directed edges of an underlying graph, i.e., whether there is a directed path 3 → 5 (case 1) or 4 → 2 (case 2) using the directed edges {(1 → 2), (1 → 3), (4 → 1), (1 → 5)}.
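As a concrete illustration (ours, not code from the paper), the path-finding view of this example fits in a few lines of Python; the edge list and the two queries are exactly the ones above, and the helper name has_path is an assumption:

```python
from collections import defaultdict, deque

def has_path(edges, source, target):
    """Breadth-first search over the directed edges: returns True iff
    the implication source => target can be composed from the given ones."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

edges = [(1, 2), (1, 3), (4, 1), (1, 5)]
print(has_path(edges, 3, 5))  # case 1: False -- 3 => 5 does not follow
print(has_path(edges, 4, 2))  # case 2: True  -- 4 => 1 and 1 => 2 compose to 4 => 2
```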

This type of task is nontrivial for current LLMs, and we refer to Appendix G for some further experiments with GPT models.[2] Note that here we are not interested specifically in solving a graph-based task, but rather in understanding when Transformers can compose and more generally how far they can do so. We would like to identify particular measures on the data distribution (e.g., syllogisms topologies in the above example) that capture when Transformers can efficiently learn.

1.2 Hardness of long compositions

Consider the previous syllogism composition task where the implications correspond to a graph with 24 edges drawn randomly over 24 vertices. Picking vertices at distances 1 to 4 for the connected case, and picking disconnected vertices uniformly at random, lets a Transformer achieve a test accuracy of more than 80% after about 2K iterations. However, does this mean that the model has learned to compose syllogisms, or has it found shortcuts, e.g., based on node degrees, to guess the implications often enough? In Appendix B.1, we provide empirical evidence supporting the latter. Motivated by this issue, and to preclude spurious correlations, we consider the following distribution.

Definition 1. For n ≥ 1, consider the binary classification task with equiprobable classes defined by

  1. Class 1: a graph uniformly drawn on 2n vertices with two disjoint cycles of length n and a pair of vertices in disjoint cycles queried for path;

  2. Class 2: a graph uniformly drawn on 2n vertices with one cycle of length 2n and a pair of vertices at distance n queried for path.

The input of this task is the graph edge set with the queried vertices. The label is 0 if the two queried vertices are not connected (Class 1) and 1 if they are (Class 2). We refer to this task as the ‘cycle task’. See Figure 1a for an illustration.
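For intuition, here is a minimal sketch of a sampler for the cycle task (our own illustration, not the authors' implementation); the edge orientation, the uniform class balance, and the name cycle_task_sample are assumptions about details Definition 1 leaves open:

```python
import random

def cycle_task_sample(n):
    """Draw one sample of the 'cycle task': 2n vertices arranged either as
    two disjoint n-cycles (label 0) or one 2n-cycle (label 1), plus a query."""
    vertices = list(range(2 * n))
    random.shuffle(vertices)
    if random.random() < 0.5:
        # Class 1: two disjoint cycles of length n; query one vertex from each cycle.
        c1, c2 = vertices[:n], vertices[n:]
        edges = [(c1[i], c1[(i + 1) % n]) for i in range(n)]
        edges += [(c2[i], c2[(i + 1) % n]) for i in range(n)]
        query, label = (random.choice(c1), random.choice(c2)), 0
    else:
        # Class 2: a single cycle of length 2n; query two vertices at distance n.
        edges = [(vertices[i], vertices[(i + 1) % (2 * n)]) for i in range(2 * n)]
        start = random.randrange(2 * n)
        query, label = (vertices[start], vertices[(start + n) % (2 * n)]), 1
    random.shuffle(edges)  # present the edge set in random order, as in the token input
    return edges, query, label

edges, (u, v), label = cycle_task_sample(5)
print(label, (u, v), edges)
```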

Figure 1b shows that the learning complexity increases ‘exponentially’ as n grows, using GPT2-style Transformers with 10M, 25M, and 85M parameters; e.g., the 10M model fails to learn for n ≥ 7 within 100K iterations. Why is that? Can a larger scale further help here?

Can a large (poly-size) Transformer learn the cycle task when n gets large? If not, why so?

A challenge for the cycle task is that there is no clear ‘low-complexity pattern’ in the input representation that indicates whether there are 1 or 2 cycles. No simple statistic based on degrees, edge counts, or finite motif counts can tell whether the queried vertices are connected. One has to consider at least n edges in order to get any correlation with the presence of a path. In other words, the task requires ‘global reasoning’ involving a ‘large’ number of input tokens, and this seems hard for Transformers.
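To make the ‘no local signal’ point concrete, here is a small check (our illustration, not an experiment from the paper) showing that per-vertex degree statistics are identical for the two classes, so a classifier relying on them alone cannot beat chance:

```python
from collections import Counter

def degree_histogram(edges):
    """Multiset of (in-degree, out-degree) pairs over the vertices --
    the kind of cheap, 'local' statistic a model might latch onto."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    vertices = set(indeg) | set(outdeg)
    return Counter((indeg[x], outdeg[x]) for x in vertices)

# Class 1: two disjoint 4-cycles on vertices 0..7.
two_cycles = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4)]
# Class 2: one 8-cycle on the same vertex set.
one_cycle = [(i, (i + 1) % 8) for i in range(8)]

print(degree_histogram(two_cycles) == degree_histogram(one_cycle))  # True
```

Both classes yield the same multiset of (in-degree, out-degree) pairs, namely (1, 1) for every vertex, so telling them apart requires following a chain of roughly n edges rather than any local count.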

:::info
This paper is available on arXiv under the CC BY 4.0 license.

:::
