Beyond Seen Worlds: EXPLORER’s Journey into Generalized Reasoning | HackerNoon

News Room · Published 1 April 2025

Authors:

(1) Kinjal Basu, IBM Research;

(2) Keerthiram Murugesan, IBM Research;

(3) Subhajit Chaudhury, IBM Research;

(4) Murray Campbell, IBM Research;

(5) Kartik Talamadupula, Symbl.ai;

(6) Tim Klinger, IBM Research.

Table of Links

Abstract and 1 Introduction

2 Background

3 Symbolic Policy Learner

3.1 Learning Symbolic Policy using ILP

3.2 Exception Learning

4 Rule Generalization

4.1 Dynamic Rule Generalization

5 Experiments and Results

5.1 Dataset

5.2 Experiments

5.3 Results

6 Related Work

7 Future Work and Conclusion, Limitations, Ethics Statement, and References

4 Rule Generalization

Importance of Rule Generalization: An ideal RL agent should perform well not only on entities it has seen but also on unseen entities, i.e., out-of-distribution (OOD) data; policy generalization is therefore a crucial capability. To verify this, we ran EXPLORER without generalization on the TW-Cooking domain, where it performs well; however, it struggles on TWC games, which are designed to test agents on OOD entities that were not seen during training yet are similar to the training data. Policies learned as logic rules over specific entities consequently do not transfer to unseen objects.

For example, the rule for apple (e.g., insert(X, fridge) <- apple(X).) does not apply to another fruit such as orange. To tackle this, we lift the learned policies using WordNet's (Miller, 1995) hypernym-hyponym relations to obtain generalized rules (illustrated in Figure 5). The motivation comes from the way humans perform tasks: if we know that a dirty shirt goes into the washing machine and we then see a pair of dirty pants, we would put the pants into the washing machine as well, since both are of type clothes and dirty.
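The lifting step can be sketched as a direct rewrite of the rule body. Below is a minimal illustration in Python; the hypernym table and function names are illustrative stand-ins (EXPLORER uses WordNet's actual hypernym-hyponym relations, not a hand-written map):

```python
# Toy hypernym table standing in for WordNet (illustrative, not exhaustive).
HYPERNYM = {"apple": "fruit", "orange": "fruit", "shirt": "clothing"}

def lift_rule(body_predicate, hypernym=HYPERNYM):
    """Replace an object-level predicate with its immediate hypernym."""
    return hypernym.get(body_predicate, body_predicate)

# insert(X, fridge) <- apple(X)  is lifted to  insert(X, fridge) <- fruit(X),
# so the policy now also covers orange(X).
print(lift_rule("apple"))   # fruit
print(lift_rule("orange"))  # fruit
```

With both apple and orange mapping to fruit, a single lifted rule covers every fruit entity, including ones never seen during training.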

Excessive Generalization is Bad: On the one hand, generalization yields policies that work on unseen entities; on the other, too much generalization leads to a drastic increase in false positives. To keep the balance, EXPLORER must know how much generalization is enough. For example, "an apple is a fruit", "fruits are part of a plant", and "plants are living things"; if we apply a rule that explains a property of an apple to all living things, the generalization has gone too far. To solve this, we propose the novel approach described in Section 4.1.

4.1 Dynamic Rule Generalization

In this paper, we introduce a novel algorithm that dynamically generates generalized rules by exploring the hypernym relations in WordNet (WN). The algorithm is based on information gain, calculated from the entropy of the positive and negative example sets collected by EXPLORER; the process is illustrated in Algorithm 1. The algorithm takes the collected set of examples and returns the set of generalized rules. First, similar to the ILP data-preparation procedure, goals are extracted from the examples. For each goal, the examples are split into two sets, E+ and E−. Next, hypernyms are extracted using the hypernym-hyponym relations of the WordNet ontology; the combined set of hypernyms from (E+, E−) provides the body predicates for the generalized rules. As in ILP (discussed above), the goal becomes the head of a generalized rule. Finally, the best generalized rules are generated by maximizing the information gain over the hypernyms. The information gain for a given clause is calculated using the below formula (Mitchell, 1997) —

Gain(R, h) = t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )

where h is the candidate hypernym predicate to add to rule R; p0 and n0 are the numbers of positive and negative examples covered by R; p1 and n1 are the numbers of positive and negative examples covered by R + h; and t is the number of positive examples covered by R that are also covered by R + h. Finally, the algorithm collects and returns the full set of generalized rules. It is important to mention that this algorithm only learns generalized rules, which

Table 1: TWC performance comparison results for within-distribution (IN) and out-of-distribution (OUT) games

are used in addition to the rules learned by ILP and exception learning (discussed in Section 3).
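The steps above — split the examples per goal into E+ and E−, collect candidate hypernyms, and keep the one with the best information gain — can be sketched as follows. This is a hedged, self-contained illustration: the hypernym table, example data, and function names are toy stand-ins for WordNet and for EXPLORER's collected examples, not the paper's actual implementation.

```python
from collections import defaultdict
from math import log2

# Toy hypernym table standing in for WordNet (illustrative only).
HYPERNYMS = {"apple": ["fruit", "living_thing"],
             "orange": ["fruit", "living_thing"],
             "shirt": ["clothing", "living_thing"]}

def foil_gain(p0, n0, p1, n1, t):
    # p0/n0: pos/neg covered by rule R; p1/n1: by R + h;
    # t: positives covered by both R and R + h (Mitchell, 1997).
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

def covers(h, entity):
    return h in HYPERNYMS.get(entity, [])

def generalize(examples):
    """examples: list of (goal, entity, is_positive); returns {goal: best hypernym}."""
    by_goal = defaultdict(lambda: ([], []))
    for goal, entity, pos in examples:
        by_goal[goal][0 if pos else 1].append(entity)
    rules = {}
    for goal, (e_pos, e_neg) in by_goal.items():
        p0, n0 = len(e_pos), len(e_neg)
        candidates = {h for e in e_pos + e_neg for h in HYPERNYMS.get(e, [])}
        best, best_gain = None, float("-inf")
        for h in candidates:
            p1 = sum(covers(h, e) for e in e_pos)
            n1 = sum(covers(h, e) for e in e_neg)
            if p1 == 0:          # hypernym covers no positives: useless
                continue
            g = foil_gain(p0, n0, p1, n1, p1)
            if g > best_gain:
                best, best_gain = h, g
        rules[goal] = best
    return rules

ex = [("insert_fridge", "apple", True), ("insert_fridge", "orange", True),
      ("insert_fridge", "shirt", False)]
print(generalize(ex))
```

Note how the over-general hypernym living_thing covers the negative example (shirt) and scores a lower gain than fruit, which is exactly the balance the information-gain criterion is meant to enforce.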

5 Experiments and Results

5.1 Dataset

In our work, we want to show that if an RL agent uses symbolic and neural reasoning in tandem, with the neural module mainly responsible for exploration and the symbolic component for exploitation, then its performance in text-based games increases drastically. We first verify our approach on the TW-Cooking domain (Adhikari et al., 2020a), using levels 1-4 from the GATA dataset[3] for testing. As the name suggests, this game suite is about collecting various cooking ingredients and preparing a meal by following an in-game recipe.

To showcase the importance of generalization, we tested our EXPLORER agent on TWC games with OOD data. Here, the goal is to tidy up the house by putting objects in their commonsense locations. Using the TWC framework (Murugesan et al., 2021a), we generated a set of games at 3 difficulty levels: (i) easy: 1 room with 1 to 3 objects; (ii) medium: 1 or 2 rooms with 4 or 5 objects; and (iii) hard: a mix of games with a high number of objects (6 or 7 objects in 1 or 2 rooms) or a high number of rooms (3 or 4 rooms containing 4 or 5 objects).
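The three difficulty levels can be summarized in a small configuration table. This is a hypothetical encoding for clarity only; the actual TWC framework exposes its own game-generation options:

```python
# Hypothetical summary of the TWC difficulty levels described above;
# each range is (min, max). "hard" mixes two game profiles.
DIFFICULTY = {
    "easy":   {"rooms": (1, 1), "objects": (1, 3)},
    "medium": {"rooms": (1, 2), "objects": (4, 5)},
    "hard":  [{"rooms": (1, 2), "objects": (6, 7)},   # many objects
              {"rooms": (3, 4), "objects": (4, 5)}],  # or many rooms
}
print(DIFFICULTY["medium"])
```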

We chose TW-Cooking and TWC games as our test-bed because these are benchmark datasets for evaluating neuro-symbolic agents in text-based games (Chaudhury et al., 2021, 2023; Wang et al., 2022; Kimura et al., 2021; Basu et al., 2022a). Also, these environments require the agents to exhibit skills such as exploration, planning, reasoning, and OOD generalization, which makes them ideal environments to evaluate EXPLORER.

5.2 Experiments

To show that EXPLORER works better than a neural-only agent, we selected two neural baseline models for each of our datasets (TWC and TW-Cooking) and compared them with EXPLORER. For both datasets, we used LSTM-A2C (Narasimhan et al., 2015) as the Text-Only agent, which uses the encoded history of observations to select the best action. For TW-Cooking, we compared EXPLORER with the SOTA model on that domain, the Graph Aided Transformer Agent (GATA) (Adhikari et al., 2020a). We also performed a comparative study of neuro-symbolic models on TWC (Section 5.3) against the SOTA neuro-symbolic model CBR (Atzeni et al., 2022), using the SOTA neural model BiKE (Murugesan et al., 2021b) as the neural module in both EXPLORER and CBR.

We tested four neuro-symbolic settings of EXPLORER: one without generalization (EXPLORER-w/o-GEN) and three that use EXPLORER with different generalization settings, detailed below:

Exhaustive Rule Generalization: This setting lifts the rules exhaustively with all hypernyms up to WordNet level 3 from an object; in other words, it selects those hypernyms whose path distance from the object is ≤ 3.
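The path-distance cutoff amounts to walking at most three steps up the hypernym chain. A minimal sketch, assuming a toy parent table in place of WordNet (with the real ontology one would follow each synset's hypernym links instead):

```python
# Toy hypernym chain standing in for WordNet (names are illustrative).
PARENT = {"apple": "edible_fruit", "edible_fruit": "fruit",
          "fruit": "plant_organ", "plant_organ": "plant_part",
          "plant_part": "thing"}

def hypernyms_within(obj, max_dist=3, parent=PARENT):
    """Collect hypernyms of obj whose path distance from obj is <= max_dist."""
    out, node, dist = [], obj, 0
    while node in parent and dist < max_dist:
        node = parent[node]
        dist += 1
        out.append(node)
    return out

print(hypernyms_within("apple"))  # ['edible_fruit', 'fruit', 'plant_organ']
```

The cutoff stops the walk before overly abstract ancestors (plant_part, thing) enter the rule body, which is the same over-generalization concern raised in Section 4.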

IG-based Generalization (hypernym level 2/3): Here, EXPLORER uses the rule generalization algorithm (Algorithm 1), taking WordNet hypernyms up to level 2 or 3 from an object.

In all settings on both datasets, agents are trained for 100 episodes with a maximum of 50 steps each. On the TW-Cooking domain, it is worth mentioning that while we performed the pre-training tasks (graph encoder, graph updater, action scorer, etc.) for GATA as in (Adhikari et al., 2020a), neither the Text-Only agent nor EXPLORER has any pre-training advantage to boost performance.
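The episode budget above implies a simple training loop shape. The sketch below is a hypothetical skeleton with toy environment and agent classes; the actual TextWorld interfaces and EXPLORER's agent API differ:

```python
# Hypothetical training-loop skeleton: 100 episodes, at most 50 steps each.
EPISODES, MAX_STEPS = 100, 50

class ToyEnv:
    """Stand-in environment: every episode ends after 5 steps."""
    def reset(self):
        self.t = 0
        return "obs"
    def step(self, action):
        self.t += 1
        return "obs", 0.0, self.t >= 5  # (observation, reward, done)

class ToyAgent:
    """Stand-in agent that counts the actions it takes."""
    def __init__(self):
        self.steps = 0
    def act(self, obs):
        self.steps += 1
        return "wait"
    def observe(self, reward):
        pass

def train(agent, env):
    for _ in range(EPISODES):
        obs, done, steps = env.reset(), False, 0
        while not done and steps < MAX_STEPS:
            action = agent.act(obs)                # exploration (neural) or
            obs, reward, done = env.step(action)   # exploitation (symbolic)
            agent.observe(reward)
            steps += 1

agent, env = ToyAgent(), ToyEnv()
train(agent, env)
print(agent.steps)  # 500 = 100 episodes x 5 steps in the toy env
```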

Table 2: TW-Cooking domain — Comparison Results (with Mean and SD)

[3] https://github.com/xingdi-eric-yuan/GATA-public
