Copyright © All Rights Reserved. World of Software.
Markov Chains, Rewards & Rules | HackerNoon

News Room
Last updated: 2025/09/24 at 7:26 AM
News Room Published 24 September 2025

Table of Links

Abstract and 1. Introduction and Related Work

  2. Methodology

    2.1 LLM-Sim Task

    2.2 Data

    2.3 Evaluation

  3. Experiments

  4. Results

  5. Conclusion

  6. Limitations and Ethical Concerns, Acknowledgements, and References

A. Model details

B. Game transition examples

C. Game rules generation

D. Prompts

E. GPT-3.5 results

F. Histograms

2 Methodology

We examine the abilities of LLMs to serve as world simulators in text-based virtual environments, in which an agent receives observations and proposes actions in natural language in order to complete certain objectives. Each text environment can be formally represented as a goal-conditioned partially observable Markov decision process (POMDP) (Kaelbling et al., 1998) with the 7-tuple (S, A, T, O, R, C, D), where S denotes the state space, A denotes the action space, T : S × A → S denotes the transition function, O denotes the observation function, R : S × A → ℝ denotes the reward function, C denotes a natural language “context message” that describes the goal and action semantics, and D : S × A → {0, 1} denotes the binary completion indicator function.
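To make the tuple concrete, here is a minimal Python sketch of how its components might be held together in code. The `TextEnvironment` class, the dictionary-based state, the signature given to the observation function, and the toy sink game are all assumptions made for illustration; they are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Assumed representations: a state is a dict of objects, each a dict of properties;
# an action is a natural-language string.
State = Dict[str, Dict[str, Any]]
Action = str


@dataclass(frozen=True)
class TextEnvironment:
    """Holds T, O, R, C, D; the spaces S and A are implied by the type aliases."""
    transition: Callable[[State, Action], State]  # T : S x A -> S
    observe: Callable[[State], str]               # O (signature assumed here)
    reward: Callable[[State, Action], float]      # R : S x A -> real number
    context: str                                  # C, the context message
    done: Callable[[State, Action], int]          # D : S x A -> {0, 1}


def toy_transition(state: State, action: Action) -> State:
    # Copy each object's properties so the input state is left untouched.
    new_state = {name: dict(props) for name, props in state.items()}
    if action == "turn on sink":
        new_state["sink"]["isOn"] = True
    return new_state


def toy_observe(state: State) -> str:
    return "The sink is on." if state["sink"]["isOn"] else "The sink is off."


def toy_reward(state: State, action: Action) -> float:
    # Reward only the first time the sink is switched on.
    return 1.0 if action == "turn on sink" and not state["sink"]["isOn"] else 0.0


def toy_done(state: State, action: Action) -> int:
    return int(action == "turn on sink")


env = TextEnvironment(
    transition=toy_transition,
    observe=toy_observe,
    reward=toy_reward,
    context="Goal: turn on the sink.",
    done=toy_done,
)

s0: State = {"sink": {"isOn": False}}
s1 = env.transition(s0, "turn on sink")
```

Keeping each component as a plain function mirrors the formal definition: every element of the tuple can be swapped or tested on its own.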

2.1 LLM-Sim Task

In practice, the whole state transition simulator F should consider two types of state transitions: action-driven transitions and environment-driven transitions. For the example in Figure 1, the action-driven transition is that the sink is turned on (isOn=true) after taking the action turn on sink, and the environment-driven transition is that water fills up the cup in the sink when the sink is on. To better understand an LLM’s ability to model each of these transitions, we further decompose the simulator function F into three steps, referred to in Section 2.3 as F_act (action-driven transitions), F_env (environment-driven transitions), and F_R (game progress).
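The difference between the two kinds of transition can be illustrated with a small hand-written simulator for the sink example. This is only a sketch: the function names, the `contains` and `isFilledWithWater` properties, and the choice to apply the environment step after the action step are assumptions for illustration, and the code stands in for the ground-truth game logic rather than for the LLM.

```python
import copy
from typing import Any, Dict

State = Dict[str, Dict[str, Any]]


def apply_action(state: State, action: str) -> State:
    """Action-driven transition: only the direct effect of the agent's action."""
    nxt = copy.deepcopy(state)
    if action == "turn on sink":
        nxt["sink"]["isOn"] = True
    elif action == "turn off sink":
        nxt["sink"]["isOn"] = False
    return nxt


def apply_environment(state: State) -> State:
    """Environment-driven transition: dynamics that occur regardless of the action."""
    nxt = copy.deepcopy(state)
    if nxt["sink"]["isOn"]:
        # Water fills every container currently sitting in a running sink.
        for name in nxt["sink"]["contains"]:
            if "isFilledWithWater" in nxt[name]:
                nxt[name]["isFilledWithWater"] = True
    return nxt


def simulate(state: State, action: str) -> State:
    """Full transition, assumed here to be the action step followed by the environment step."""
    return apply_environment(apply_action(state, action))


before: State = {
    "sink": {"isOn": False, "contains": ["cup"]},
    "cup": {"isFilledWithWater": False},
}
after_action_only = apply_action(before, "turn on sink")
after_full = simulate(before, "turn on sink")
```

After the action step alone the sink is on but the cup is still empty; only the full transition fills the cup, which is exactly the distinction the decomposition is meant to isolate.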

2.2 Data

Additional Context: Each game also includes a context message, c, that provides additional information to the model. The context consists of four parts: action rules describing the effect of each action on the game state, object rules describing the meaning of each object property and whether it is affected by the game’s underlying dynamics, scoring rules describing how an agent earns reward and the conditions under which the game is won or lost, and one or two example transitions (see Appendix B for details) from the held-out game mentioned above. For each game we generate three versions of the context: one where the rules are written by a human expert (one of the game authors), one where they are produced by an LLM with access to the game code, and one where no rules are provided. See Appendix C for additional details.

Table 3: GPT-4 game progress prediction results
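As a rough illustration of the context message described in this section, the sketch below assembles the four parts into a single string and produces the three variants. The `Rules` class, `build_context`, the section labels, and the placeholder rule texts are hypothetical, and keeping the example transitions in the no-rules variant is an assumption rather than something the text states.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rules:
    action_rules: str   # effect of each action on the game state
    object_rules: str   # meaning of each object property
    scoring_rules: str  # how reward is earned and when the game ends


def build_context(rules: Optional[Rules], examples: List[str]) -> str:
    """Join the available parts into one context message (layout is assumed)."""
    sections: List[str] = []
    if rules is not None:
        sections.append("Action rules:\n" + rules.action_rules)
        sections.append("Object rules:\n" + rules.object_rules)
        sections.append("Scoring rules:\n" + rules.scoring_rules)
    for i, example in enumerate(examples, start=1):
        sections.append(f"Example transition {i}:\n{example}")
    return "\n\n".join(sections)


# Placeholder texts; real rules would come from a human expert or an LLM.
human_rules = Rules("(human-written action rules)",
                    "(human-written object rules)",
                    "(human-written scoring rules)")
llm_rules = Rules("(LLM-generated action rules)",
                  "(LLM-generated object rules)",
                  "(LLM-generated scoring rules)")
examples = ["(placeholder example transition)"]

contexts: Dict[str, str] = {
    "human": build_context(human_rules, examples),
    "llm": build_context(llm_rules, examples),
    "no_rules": build_context(None, examples),
}
```

Building all three variants from one function keeps the conditions comparable: the only thing that changes between them is the source (or absence) of the rules.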

2.3 Evaluation

Performance on LLM-Sim is determined by the model’s prediction accuracy w.r.t. the ground truth labels over a dataset of test samples. Depending on the experimental condition, the LLM must model object properties (when simulating F_act, F_env, or F) and/or game progress (when simulating F_R or F), defined as:

Object Properties: a list of all objects in the game, along with each object’s properties (e.g., temperature, size) and relationships to other objects (e.g., being within or on top of another object).

Game Progress: the status of the agent w.r.t. the overall goal, consisting of the current accumulated reward, whether the game has terminated, and whether the overall goal has been achieved.
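A minimal sketch of how such an accuracy could be computed is shown below. It assumes exact-match scoring, where a prediction counts as correct only if it equals the ground truth label in full; the text does not specify the matching criterion. The field names `score`, `gameOver`, and `gameWon` are likewise hypothetical.

```python
from typing import Any, Dict, List

Label = Dict[str, Any]


def accuracy(predictions: List[Label], gold: List[Label]) -> float:
    """Fraction of samples whose prediction exactly equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy over an empty dataset")
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)


# Game-progress labels: accumulated reward, termination flag, goal-achieved flag.
gold_progress: List[Label] = [
    {"score": 0, "gameOver": False, "gameWon": False},
    {"score": 1, "gameOver": False, "gameWon": False},
    {"score": 2, "gameOver": True, "gameWon": True},
]
predicted_progress: List[Label] = [
    {"score": 0, "gameOver": False, "gameWon": False},
    {"score": 1, "gameOver": False, "gameWon": False},
    {"score": 2, "gameOver": True, "gameWon": False},  # wrong gameWon flag
]

progress_accuracy = accuracy(predicted_progress, gold_progress)
```

The same function applies unchanged to object-property labels, since Python compares nested dictionaries and lists by value.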

:::info
Authors:

(1) Ruoyao Wang, University of Arizona;

(2) Graham Todd, New York University;

(3) Ziang Xiao, Johns Hopkins University;

(4) Xingdi Yuan, Microsoft Research Montréal;

(5) Marc-Alexandre Côté, Microsoft Research Montréal;

(6) Peter Clark, Allen Institute for AI;

(7) Peter Jansen, University of Arizona and Allen Institute for AI.

:::


:::info
This paper is available on arXiv under a CC BY 4.0 license.

:::
