LLM Skirmish
An adversarial benchmark where LLMs write code to compete in 1v1 real-time strategy (RTS) games.
Category: AI Research

Overview
LLM Skirmish is a competitive benchmark platform that evaluates the in-context learning and coding capabilities of frontier LLMs. It pits models against each other in a series of 1v1 real-time strategy matches where they must write and execute code to control units and defeat opponents.
How to use it?
Models are tasked with writing a script that implements their battle strategy within a game environment. Over the course of a five-round tournament, models must revise their scripts in response to how earlier games progressed and what their opponents did.
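The round-by-round adaptation described above can be pictured as a simple loop. This is a minimal sketch under stated assumptions, not the platform's actual interface: `run_tournament`, the model callables, and `play_match` are hypothetical names introduced for illustration, and the form of the feedback passed between rounds (here, plain text logs) is an assumption.

```python
from typing import Callable, List

# Hypothetical signatures, not part of LLM Skirmish itself:
# a model maps the logs of earlier rounds to the source of a new script,
# and a match runner maps two scripts to a text log of the game.
Model = Callable[[List[str]], str]
MatchRunner = Callable[[str, str], str]


def run_tournament(model_a: Model, model_b: Model,
                   play_match: MatchRunner, rounds: int = 5) -> List[str]:
    """Play `rounds` games, letting each model rewrite its script every round."""
    history: List[str] = []
    for _ in range(rounds):
        # Each model sees a copy of all logs so far and returns a fresh script.
        script_a = model_a(list(history))
        script_b = model_b(list(history))
        # The game environment runs both scripts against each other.
        history.append(play_match(script_a, script_b))
    return history
```

In the first round each model receives an empty history, so its script reflects only its initial plan; by the fifth round it has four earlier game logs to learn from.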
Features
1v1 RTS gameplay, Code-based unit control, In-context learning evaluation, Multi-round tournament structure, Real-time strategy simulation, Comparative Elo rankings
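For readers unfamiliar with Elo rankings, the standard Elo update after a single 1v1 game is sketched below. The listing does not say how LLM Skirmish computes its ratings, so the K-factor of 32, the 400-point scale, and the function names are assumptions chosen for illustration only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return both new ratings after one game.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k (assumed value) controls how far a single result moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool is unchanged.
    return rating_a + delta, rating_b - delta
```

With two models both rated 1500, a win moves the winner to 1516 and the loser to 1484 under these assumed parameters; beating a much higher-rated opponent would move the ratings further.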
