World of Software
Copyright © All Rights Reserved. World of Software.

How Reinforcement Learning and Stable Diffusion Are Being Combined to Simulate Game Worlds | HackerNoon

News Room
Last updated: 2026/01/28 at 1:35 PM
News Room Published 28 January 2026

Table of Links

ABSTRACT

1 INTRODUCTION

2 INTERACTIVE WORLD SIMULATION

3 GAMENGEN

3.1 DATA COLLECTION VIA AGENT PLAY

3.2 TRAINING THE GENERATIVE DIFFUSION MODEL

4 EXPERIMENTAL SETUP

4.1 AGENT TRAINING

4.2 GENERATIVE MODEL TRAINING

5 RESULTS

5.1 SIMULATION QUALITY

5.2 ABLATIONS

6 RELATED WORK

7 DISCUSSION, ACKNOWLEDGEMENTS AND REFERENCES

4 EXPERIMENTAL SETUP

4.1 AGENT TRAINING

The agent model is trained using PPO (Schulman et al., 2017), with a simple CNN as the feature network, following Mnih et al. (2015). It is trained on CPU using the Stable Baselines 3 infrastructure (Raffin et al., 2021). The agent is provided with downscaled versions of the frame images and the in-game map, each at a resolution of 160×120. The agent also has access to the last 32 actions it performed. The feature network computes a representation of size 512 for each image. PPO’s actor and critic are 2-layer MLP heads on top of a concatenation of the outputs of the image feature network and the sequence of past actions. We train the agent to play the game using the ViZDoom environment (Wydmuch et al., 2019). We run 8 games in parallel, each with a replay buffer size of 512, a discount factor γ = 0.99, and an entropy coefficient of 0.1. In each iteration, the network is trained using a batch size of 64 for 10 epochs, with a learning rate of 1e-4. We perform a total of 10M environment steps.
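To make the scale of this setup concrete, the following minimal sketch collects the hyperparameters quoted above and works out the per-iteration arithmetic. The keyword names mirror the Stable Baselines 3 `PPO` constructor, but reading "replay buffer size" as the per-environment rollout length (`n_steps`) is an assumption, and the helper `ppo_budget` is a hypothetical name introduced only for illustration.

```python
# Hyperparameters quoted in Section 4.1. Keyword names follow the
# Stable Baselines 3 PPO constructor; mapping "replay buffer size" to
# n_steps (rollout length per environment) is an assumption.
N_ENVS = 8
TOTAL_ENV_STEPS = 10_000_000

ppo_kwargs = dict(
    n_steps=512,         # assumed meaning of "replay buffer size of 512"
    batch_size=64,
    n_epochs=10,
    gamma=0.99,          # discount factor
    ent_coef=0.1,        # entropy coefficient
    learning_rate=1e-4,
)


def ppo_budget(n_envs, total_steps, n_steps, batch_size, n_epochs, **_unused):
    """Hypothetical helper: derive per-iteration counts from the settings."""
    transitions = n_envs * n_steps                 # collected per iteration
    minibatches = transitions // batch_size        # per epoch
    updates = minibatches * n_epochs               # gradient steps per iteration
    iterations = -(-total_steps // transitions)    # ceiling division
    return {
        "transitions_per_iteration": transitions,
        "minibatches_per_epoch": minibatches,
        "updates_per_iteration": updates,
        "iterations": iterations,
    }


budget = ppo_budget(N_ENVS, TOTAL_ENV_STEPS, **ppo_kwargs)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under that reading, each iteration gathers 8 × 512 = 4,096 transitions, and the same `ppo_kwargs` dictionary could be passed on to a PPO constructor together with a policy and a vectorized environment.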

4.2 GENERATIVE MODEL TRAINING

We train all simulation models from a pretrained checkpoint of Stable Diffusion 1.4, unfreezing all U-Net parameters. We use a batch size of 128 and a constant learning rate of 2e-5, with the Adafactor optimizer without weight decay (Shazeer & Stern, 2018) and gradient clipping of 1.0. We change the diffusion loss parameterization to v-prediction (Salimans & Ho, 2022a). The context-frame conditioning is dropped with probability 0.1 to allow classifier-free guidance (CFG) during inference. We train using 128 TPU-v5e devices with data parallelization. Unless noted otherwise, all results in the paper are after 700,000 training steps. For noise augmentation (Section 3.2.1), we use a maximal noise level of 0.7, with 10 embedding buckets. We use a batch size of 2,048 for optimizing the latent decoder; other training parameters are identical to those of the denoiser. For training data, we use all trajectories played by the agent during RL training as well as evaluation data during training, unless mentioned otherwise. Overall, we generate 900M frames for training. All image frames (during training, inference, and conditioning) are at a resolution of 320×240 padded to 320×256. We use a context length of 64 (i.e., the model is provided its own last 64 predictions as well as the last 64 actions).
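Three of the settings above lend themselves to a short illustration: the 0.1 conditioning dropout, the noise-augmentation buckets, and the v-prediction target. The sketch below is not the authors' implementation. It assumes the noise level is sampled uniformly and discretized into equal-width buckets, uses the standard v-prediction target v = α_t·ε − σ_t·x₀ from Salimans & Ho, and all function names are hypothetical.

```python
import random

MAX_NOISE_LEVEL = 0.7     # maximal noise level from the text
NUM_NOISE_BUCKETS = 10    # number of embedding buckets from the text
CONTEXT_DROP_PROB = 0.1   # probability of dropping the context frames


def noise_bucket(level, max_level=MAX_NOISE_LEVEL, num_buckets=NUM_NOISE_BUCKETS):
    """Map a noise level in [0, max_level] to an embedding index.

    Equal-width buckets are an assumption made for this sketch.
    """
    index = int(level / max_level * num_buckets)
    return min(index, num_buckets - 1)  # the top edge falls in the last bucket


def sample_conditioning(rng):
    """Draw per-example conditioning choices for one training step."""
    level = rng.uniform(0.0, MAX_NOISE_LEVEL)  # uniform sampling is assumed
    return {
        "drop_context": rng.random() < CONTEXT_DROP_PROB,
        "noise_level": level,
        "noise_bucket": noise_bucket(level),
    }


def v_target(x0, eps, alpha_t, sigma_t):
    """v-prediction regression target: v = alpha_t * eps - sigma_t * x0."""
    return [alpha_t * e - sigma_t * x for x, e in zip(x0, eps)]


rng = random.Random(0)
samples = [sample_conditioning(rng) for _ in range(10_000)]
drop_rate = sum(s["drop_context"] for s in samples) / len(samples)
print(f"empirical context drop rate: {drop_rate:.3f}")
print("example v target:", v_target([1.0, 0.0], [0.0, 1.0], 0.6, 0.8))
```

Dropping the context for a fraction of examples lets one network learn both the conditional and unconditional predictions, which is what classifier-free guidance combines at inference time.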

:::info
Authors:

  1. Dani Valevski
  2. Yaniv Leviathan
  3. Moab Arar
  4. Shlomi Fruchter

:::

:::info
This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
