Tencent researchers have developed a new training framework called “Think in Games” (TiG) that teaches AI models strategic reasoning by training them on the multiplayer game _Honor of Kings_. The study found that, under certain conditions, smaller language models can outperform much larger ones.
Using real match data from _Honor of Kings_, the team combined supervised learning with reinforcement learning based on Group Relative Policy Optimization (GRPO). Tencent reported that Qwen3-14B reached 90.9% accuracy on strategic decisions after 2,000 training steps, surpassing DeepSeek-R1 at 86.7%. The researchers said the TiG framework could help AI systems develop both gameplay ability and explainable reasoning, with potential applications beyond gaming. [THE DECODER]
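For readers unfamiliar with GRPO, its core idea is to score each sampled answer relative to the other answers drawn for the same prompt, rather than against a separately learned value model. The snippet below is a minimal sketch of that normalization step only. The function name, the `eps` constant, and the binary rewards are illustrative assumptions, not details taken from the Tencent study.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar per answer sampled for the same prompt.
    `eps` (an assumed small constant) avoids division by zero when every
    answer in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled answers: 1.0 means the answer was
# judged correct, 0.0 means it was not (an assumed reward scheme).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that beat their group's average receive a positive advantage and are reinforced, while below-average answers are discouraged. If every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.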