Playing chess with GPT: a guardrailed example

Language models are fundamentally stochastic next-token-predictors: autoregressive models that generate text based on existing text one token at a time. Unsurprisingly, they are not great chess players.

And yet... they are not terrible chess players either. Despite sensationalist media reports to the contrary, an advanced model like GPT-5 can play chess reasonably well, at the level of an unsophisticated amateur: it knows the rules and can even offer a coherent narrative, but it will make foolish mistakes, even sacrificing its queen or walking into an unavoidable checkmate.

So why the terrible reputation? Why the sensationalist coverage? The reason is simple: the ability to play chess is not the same as the ability to reconstruct the state of a chessboard from a long list of prior moves. Yet when we naively "play chess" with a language model, offering moves and expecting moves in return, at every conversational turn we expect the model to reconstruct the entire state of the board from scratch, from an ever-longer conversation containing an ever-increasing number of moves and countermoves. It should come as no surprise that the model ultimately fails miserably.

Still, it is quite possible to get a language model to play well, but doing so requires external scaffolding: software that keeps track of the chessboard state, prompts the model for the next move, rejects invalid moves, and updates the board.

Building a scaffolded environment for GPT chess

How do we keep track of a chessboard? Fortunately, there is no need to reinvent the wheel here. There exists a notation, FEN (Forsyth-Edwards Notation), that is designed to capture not just the present state of the chessboard, but also any relevant information about the state of the game, including castling rights and en passant availability.
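As a concrete illustration, a FEN string consists of six space-separated fields. A minimal sketch, shown here on the well-known starting position and using only Python's standard library:

```python
# FEN for the standard starting position (a widely documented constant).
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def parse_fen(fen: str) -> dict:
    """Split a FEN string into its six named components."""
    placement, side, castling, en_passant, halfmove, fullmove = fen.split()
    return {
        "placement": placement,            # piece placement, rank 8 down to rank 1
        "side_to_move": side,              # "w" or "b"
        "castling": castling,              # e.g. "KQkq", or "-" if none available
        "en_passant": en_passant,          # target square such as "e3", or "-"
        "halfmove_clock": int(halfmove),   # half-moves since last capture/pawn move
        "fullmove_number": int(fullmove),  # increments after Black's move
    }
```

In practice a chess library would maintain and update this string; the point here is only that a single short string suffices to restore the full game state at every turn.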

The basic idea behind our GPT chess implementation is to keep track of the board (of the game, really) using a continuously updated FEN string. At every turn, the language model is provided with this string, a list of the most recent moves, and nothing else. The system prompt simply asks the model to reason and offer the best move it can.

The model is then requested to respond using a specific format, which may include verbal reasoning but must conclude with a valid move. The model's move is checked by a strict validity checker. If the move is invalid, the model is re-prompted, with any invalid moves tried so far listed as such. If the model fails to respond with a valid move after three such attempts, control is returned to the user.
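The re-prompting loop described above can be sketched as follows. This is only an illustration, not the actual implementation: `query_model` and `is_valid_move` are hypothetical stand-ins for the model API call and the strict validity checker (in practice, a chess-rules library), and the prompt wording is invented for the example.

```python
MAX_ATTEMPTS = 3  # give up after three invalid moves

def build_prompt(fen, recent_moves, rejected):
    """Assemble the per-turn prompt: board state, recent moves, rejections."""
    lines = [f"Position (FEN): {fen}",
             f"Recent moves: {' '.join(recent_moves) or '(game start)'}"]
    if rejected:
        lines.append("These moves were invalid, do not repeat them: "
                     + ", ".join(rejected))
    lines.append("Reason briefly, then give your move on the final line.")
    return "\n".join(lines)

def get_model_move(fen, recent_moves, query_model, is_valid_move):
    """Ask the model for a move; re-prompt on invalid ones, up to a limit.

    Returns the validated move, or None to hand control back to the user.
    """
    rejected = []
    for _ in range(MAX_ATTEMPTS):
        reply = query_model(build_prompt(fen, recent_moves, rejected))
        move = reply.strip().splitlines()[-1]  # reasoning first, move last
        if is_valid_move(fen, move):
            return move
        rejected.append(move)  # listed as invalid on the next attempt
    return None  # give up: return control to the user
```

Note that the loop never mutates the board itself: the scaffold only updates the FEN string after a move has passed validation, which is what keeps the conversation inside the guardrails.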

Putting aside the specifics of chess, what this implementation demonstrates is how proper prompting and properly scaffolded processing of the model's response can ensure that the conversation stays strictly within the intended guardrails. This is a prerequisite in any application where the model is used in the role of an "agent", entrusted with specific tasks.

Incidentally, this chess application also reveals something about the strengths and limitations of large language models in "reasoning" tasks. The typical chess game begins with "openings" that are well documented in the chess literature, literature with which the model is familiar through its training. Therefore, the model confidently responds with moves that are well known and widely studied by chess masters. Later in the game, however, in the middlegame, the language model falters. It still offers erudite analysis, but its moves are often mediocre or worse. This is true even for the latest, "frontier class" reasoning models. The reason for this has been best summarized by GPT itself: "Language models have rhetorical competence but no tactical competence." In other words, language models are great at discovering even distant associations between elements of their input or between their input and their training corpus. However, they lack the ability to explore and analyze a combinatorially expanding series of potential decisions, model the likely outcomes, prune the "decision tree" and find optimal solutions. They can talk about the world; they cannot envision, simulate, or model the world internally.
