AI That Learns and Unlearns: The Exceptionally Smart EXPLORER | HackerNoon

News Room | Published 1 April 2025, last updated 6:14 PM

Authors:

(1) Kinjal Basu, IBM Research;

(2) Keerthiram Murugesan, IBM Research;

(3) Subhajit Chaudhury, IBM Research;

(4) Murray Campbell, IBM Research;

(5) Kartik Talamadupula, Symbl.ai;

(6) Tim Klinger, IBM Research.

Table of Links

Abstract and 1 Introduction

2 Background

3 Symbolic Policy Learner

3.1 Learning Symbolic Policy using ILP

3.2 Exception Learning

4 Rule Generalization

4.1 Dynamic Rule Generalization

5 Experiments and Results

5.1 Dataset

5.2 Experiments

5.3 Results

6 Related Work

7 Future Work and Conclusion, Limitations, Ethics Statement, and References

3.1 Learning Symbolic Policy using ILP

Data Collection: To apply an ILP algorithm, EXPLORER first needs to collect State, Action, and Reward tuples while exploring the text-based environment. In a TBG, the two main components of the state are the state description and the agent's inventory information. The entities present in the environment are extracted by parsing the state description with the spaCy library and storing only the noun phrases (e.g., fridge, apple, banana) in predicate form. The inventory information is extracted in a similar way. At each step of the game, the environment generates a set of admissible actions, one of which is the best, along with action templates (e.g., "insert O into S", where O and S are entity types) that are predefined for the agent before the game starts. By matching these templates against the admissible actions, EXPLORER can easily extract the type of each entity present in the environment and then convert them to predicates. Figure 3 illustrates an instance of this predicate-generation process. Along with the state description, EXPLORER also stores the Action taken and the Reward received at each step.

Figure 3: Entity extraction using Action Template
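As a rough illustration of the template-matching step, the sketch below matches action templates such as "insert O into S" against admissible actions to recover typed entity predicates. The template list, the slot-to-type mapping, and the predicate syntax are our own assumptions for illustration, not EXPLORER's actual implementation.

```python
import re

# Hypothetical action templates; single capital letters mark typed slots.
TEMPLATES = ["insert O into S", "take O from S", "open S"]
SLOT_TYPES = {"O": "object", "S": "supporter"}  # assumed type names

def template_to_regex(template):
    # "insert O into S" -> "insert (?P<O>.+?) into (?P<S>.+?)"
    return re.sub(r"\b([A-Z])\b", r"(?P<\1>.+?)", template)

def extract_typed_predicates(admissible_actions):
    """Match each admissible action against the templates and emit
    one type predicate per filled slot, e.g. supporter(fridge)."""
    predicates = []
    for action in admissible_actions:
        for template in TEMPLATES:
            m = re.fullmatch(template_to_regex(template), action)
            if m:
                for slot, entity in m.groupdict().items():
                    entity = entity.replace(" ", "_")
                    predicates.append(f"{SLOT_TYPES[slot]}({entity}).")
                break
    return predicates
```

For example, `extract_typed_predicates(["insert red apple into fridge"])` yields `object(red_apple).` and `supporter(fridge).`.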

Data Preparation: To learn the rules, an ILP algorithm requires three things: the goal, the predicate list, and the examples. The goal is the concept that the ILP algorithm learns by exploring the examples, and the predicates provide the explanation of that concept. In the learned theory, formulated as logical rules, the goal is the head, and the predicate list gives the domain space for the body clauses. The examples are the sets of positive and negative scenarios that the agent collects while playing.
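Concretely, one such ILP task could be packaged as below. This is a minimal sketch assuming the one-argument setting where the action itself is the goal; all predicate and entity names are hypothetical placeholders.

```python
# A minimal sketch of the three inputs an ILP call needs.
ilp_task = {
    # Goal: the concept to learn (head of the rules to be induced).
    "goal": "open(X)",
    # Predicate list: the domain space for body clauses.
    "predicates": ["container(X)", "closed(X)", "in_inventory(X)"],
    # Examples: scenarios collected while playing, split by reward.
    "positive_examples": ["open(fridge)"],   # action was rewarded
    "negative_examples": ["open(apple)"],    # zero/negative reward
}
```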

Execution and Policy Learning: In our work, we have mainly focused on learning hypotheses for the rewarded actions; however, we also apply reward shaping to learn important preceding actions (e.g., open fridge might carry no reward itself, yet it is a prerequisite for taking an item from the fridge, which does carry a reward). In both the TWCooking domain and TextWorld Commonsense (TWC), the action predicates mostly have one or two arguments (e.g., open fridge, insert cheese in fridge). In the one-argument setting, the action becomes the ILP goal and the examples are collected based on the argument. In the two-argument setting, we fix the second argument with the action and collect examples based on the first argument. The goal will hence be in the form of . We split the examples (i.e., state, entity types, and inventory information in predicate form) based on the stored rewards (positive vs. zero/negative). We use entity identifiers to identify each entity separately; this is important when there are two or more instances of the same entity in the environment with different features (e.g., red apple and rotten apple). Additionally, EXPLORER creates the predicate list by extracting the predicate names from the examples. After obtaining the goal, the predicate list, and the examples, the agent runs the ILP algorithm to learn the hypothesis, followed by simple string post-processing to obtain a hypothesis in the below form:

Figure 4: ILP Rule Learning Example

Figure 4 elaborates the ILP data preparation procedure along with an example of a learned rule.
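The reward-based split described above can be sketched as follows; the transition format is our own simplification of EXPLORER's stored (State, Action, Reward) data.

```python
def split_examples(transitions):
    """Partition (state_predicates, action, reward) transitions into
    positive and zero/negative ILP examples (illustrative sketch)."""
    positive, negative = [], []
    for state_preds, action, reward in transitions:
        example = {"state": state_preds, "action": action}
        # Rewarded steps become positive examples; the rest negative.
        (positive if reward > 0 else negative).append(example)
    return positive, negative
```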

3.2 Exception Learning

As EXPLORER does online learning, the quality of the initial rules is quite low; it gradually improves with more training. The key improvement achieved by EXPLORER comes through exception learning, where an exception clause is added to a rule's body using Negation as Failure (NAF). This makes the rules more flexible and able to handle scenarios where information is missing. The agent learns these exceptions by trying the rules and failing to receive rewards. For example, in TWC, the agent may learn the rule that an apple goes in the fridge, but fail when it tries to apply that rule to a rotten apple. It then learns that the feature rotten is an exception to the previously learned rule. This can be represented as:
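The rule representation itself is not reproduced in this excerpt. Purely as an illustration of how a default with a NAF exception behaves, the sketch below evaluates such a rule over feature sets; all names and the interface are our own, not EXPLORER's.

```python
def rule_fires(entity_features, body, exception_features):
    """A default rule fires when its body holds and no exception
    feature is provable (negation as failure: unknown = false)."""
    return body <= entity_features and not (exception_features & entity_features)

# "an apple goes in the fridge" ... unless the apple is rotten.
assert rule_fires({"apple"}, body={"apple"}, exception_features={"rotten"})
assert not rule_fires({"apple", "rotten"}, body={"apple"}, exception_features={"rotten"})
```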

It is important to keep in mind that the number of examples covered by the exception is always fewer than the number of examples covered by the defaults. This constraint has been included in EXPLORER's exception learning module.

Figure 5: Example of Rule Generalization
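That coverage constraint amounts to a strict-inequality guard over example counts; the function name and interface below are assumptions for illustration.

```python
def exception_is_admissible(default_examples, exception_examples):
    """Keep an exception clause only if it covers strictly fewer
    examples than the default rule it refines (sketch)."""
    return len(exception_examples) < len(default_examples)
```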
