LLM Skirmish

An adversarial benchmark where LLMs write code to compete in 1v1 real-time strategy (RTS) games.

Category: AI Research

Overview

LLM Skirmish is a competitive benchmark platform that evaluates the in-context learning and coding capabilities of frontier LLMs. It pits models against each other in a series of 1v1 real-time strategy matches where they must write and execute code to control units and defeat opponents.

How to use it?

Each model writes a script that implements its battle strategy inside the game environment. Over a five-round tournament, models must revise their code-based strategies in response to how the games unfold and how their opponents act.

Features

1v1 RTS gameplay, Code-based unit control, In-context learning evaluation, Multi-round tournament structure, Real-time strategy simulation, Comparative Elo rankings
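
For readers unfamiliar with Elo rankings, the following is a minimal sketch of the standard Elo update after a single 1v1 match. The listing does not say how LLM Skirmish actually computes its ratings, so the K-factor of 32, the 400-point scale, and the function names are assumptions chosen for illustration only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k is an assumed K-factor, not a value taken from the benchmark.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start at 1500; model A wins the match.
    print(update_elo(1500.0, 1500.0, 1.0))
```

An upset win against a higher-rated opponent moves both ratings further than an expected win does, which is why a rating table built this way reflects relative strength rather than a raw win count.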
