A New Neural Memory Trick Helps AI Handle Much Longer Sequences | HackerNoon

News Room · Published 1 April 2025

Authors:

(1) Hung Le, Applied AI Institute, Deakin University, Geelong, Australia;

(2) Dung Nguyen, Applied AI Institute, Deakin University, Geelong, Australia;

(3) Kien Do, Applied AI Institute, Deakin University, Geelong, Australia;

(4) Svetha Venkatesh, Applied AI Institute, Deakin University, Geelong, Australia;

(5) Truyen Tran, Applied AI Institute, Deakin University, Geelong, Australia.

Table of Links

Abstract & Introduction

Methods

Methods Part 2

Experimental Results

Experimental Results Part 2

Related Works, Discussion, & References

Appendix A, B, & C

Appendix D

Abstract

We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perform these operations through end-to-end training on sequence data, powering various sequential models. Our experiments demonstrate PANM’s exceptional length extrapolating capabilities and improved performance in tasks that require symbol processing, such as algorithmic reasoning and Dyck language recognition. PANM helps Transformer achieve up to 100% generalization accuracy in compositional learning tasks and significantly better results in mathematical reasoning, question answering and machine translation tasks.

1. Introduction

Systematic generalization underpins intelligence, and it relies on the ability to recognize abstract rules and extrapolate them to novel contexts that are distinct yet semantically similar to the seen data. Current neural networks and statistical machine learning fall short of handling novel data generated by symbolic rules, even though they have achieved state-of-the-art results in various domains. Some approaches show decent generalization on single-instance or set-structured inputs [Bahdanau et al., 2018, Gao et al., 2020, Webb et al., 2020]. Yet neural networks in general still fail at sequential symbol processing, even with slight novelty at inference time [Lake and Baroni, 2018, Delétang et al., 2022]. For instance, these models can easily learn to duplicate sequences of 10 items, but they will fail to copy sequences of 20 items if such lengths were not part of the training data. These models overfit the training data and perform poorly on out-of-distribution samples such as longer sequences or sequences with novel compositions. The issue also affects large models such as Large Language Models, which struggle with symbolic manipulation tasks [Qian et al., 2023]. This indicates that current methods lack a principled mechanism for systematic generalization.

From a neuroscience perspective, it has been suggested that the brain executes symbol processing through variable binding and neural pointers, wherein sensory data are conceptualized into symbols that can be assigned arbitrary values [Kriete et al., 2013]. Like the brain, computer programs excel at symbolic computation. Programmers use address pointers to dynamically access data or programs and retain flexible control over variables, so their programs work appropriately with unseen inputs.

Building on these insights, we propose a pointer-based mechanism to enhance generalization to unseen lengths in sequence prediction, a crucial problem that unifies all computable problems [Solomonoff, 2010]. Our mechanism is based on two principles: (I) explicitly modeling pointers as physical addresses, and (II) strictly isolating pointer manipulation from input data. As such, we need to design a memory that supports physical pointers, and a model that manipulates those pointers to execute abstract rules and access the memory. Our memory, dubbed Pointer-Augmented Neural Memory (PANM), is a slot-based RAM [Von Neumann, 1993] in which each memory slot consists of two components: data and address. Unlike earlier efforts that implicitly model pointers as attention softmax [Vinyals et al., 2015, Kurach et al., 2015, Le et al., 2018, Khan et al., 2021], our addresses are generated to explicitly simulate physical memory addresses, i.e., incremental binary numbers, which is critical for generalization to longer sequences.
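
To make the address scheme concrete, here is a minimal sketch of how such an address bank could be built; the function name build_address_bank and the addr_bits parameter are our own illustration, not the paper's API:

```python
import torch

def build_address_bank(seq_len, addr_bits=8):
    # Incremental binary addresses: slot t gets the binary code of t.
    # Addresses depend only on position, never on content, so the same
    # scheme extends to sequences longer than any seen in training.
    return torch.tensor(
        [[(t >> b) & 1 for b in range(addr_bits)] for t in range(seq_len)],
        dtype=torch.float32,
    )  # shape: (seq_len, addr_bits)
```

Pairing row t of this bank with the t-th encoded input token yields the (address, data) slots described above.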

To manipulate a pointer, we create an address bank that contains physical addresses corresponding to the input sequence, and use a neural network called Pointer Unit that is responsible for transforming pointers from an initial address in the address bank. Through attention to the address bank, a new pointer is generated as a mixture of the physical addresses, which can point to different memory slots to follow the logic of the task. We aim to let the Pointer Unit learn the symbolic rules of the task in an end-to-end manner. Finally, given a (manipulated) pointer, the model can access the data through 2 modes of pointer-based access: pointer dereference (Mode-1) and relational access (Mode-2). Our memory can be plugged into common encoder-decoder backbones such as LSTM or Transformer.
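
A rough sketch of one manipulation-plus-dereference step, under our own assumptions (pointer_access is a hypothetical helper, and any small network, e.g. a linear layer, stands in for the Pointer Unit):

```python
import torch
import torch.nn.functional as F

def pointer_access(pointer, address_bank, data_bank, pointer_unit):
    # pointer:      (addr_bits,) current pointer, a mixture of addresses
    # address_bank: (seq_len, addr_bits) physical addresses
    # data_bank:    (seq_len, d) memory contents
    # pointer_unit: module mapping a pointer to a query over addresses
    query = pointer_unit(pointer)                  # transform the pointer
    # Score attention against PHYSICAL ADDRESSES only: pointer
    # manipulation never touches data, per isolation principle (II).
    attn = F.softmax(address_bank @ query, dim=0)  # (seq_len,)
    new_pointer = attn @ address_bank              # new mixture of physical addresses
    value = attn @ data_bank                       # Mode-1: dereference to read data
    return new_pointer, value

# Illustrative usage (all sizes are arbitrary):
# addr = build_address_bank(12, 8)
# unit = torch.nn.Linear(8, 8)
# p1, v = pointer_access(addr[0], addr, torch.randn(12, 32), unit)
```

Because the attention scores are computed from addresses alone, a learned rule such as "advance to the next slot" is independent of content and sequence length; Mode-2 relational access, which the introduction only names, is omitted from this sketch.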

Our contribution is a novel memory architecture that incorporates explicit pointer and symbol processing and works seamlessly with sequential models to generalize better. We examine our model in symbol-processing domains such as algorithms and context-free grammars, where PANM works effectively with LSTM and StackRNN. We apply PANM to improve the generalization of Transformer models on compositional learning, using the SCAN and mathematics datasets. We also observe PANM's superior performance in more realistic question answering and machine translation tasks. Our focus is not on striving for state-of-the-art results, which would require specialized designs tailored to specific tasks; our objective is to highlight the generalization improvement achieved by integrating our memory module into fundamental sequential models with minimal architectural changes, and to showcase the importance of fundamental generalizing principles in addressing the limitations of current deep learning.
