Prompt engineering for the 2025 AI puzzle competition

Two days ago I introduced the AI puzzle competition, and yesterday we talked about the problems we will use to gauge the performance of the competing LLMs.

Before presenting the results, I want to talk about the prompt engineering part, since I started all these experiments to see whether it really helps or whether its perceived benefit is mostly confirmation bias.

To begin with, I thought of short phrases that could be prepended to a question as prompt engineering. I selected the following 6:

You are a famous mathematician, an expert in number theory.
Solving this problem is important.
You will get rewarded if you get the right answer.
You will get punished if you get a wrong answer.
Think step by step.
Double check your answers.

The first one aims to anchor the model into thinking it possesses a skill; the next one signals the urgency (the importance) of solving the puzzle. Then there are two prompts, one promising the model a reward and the other threatening a penalty. Finally, I ask the model to think while building the solution and to check it afterwards.

Then, to cover more advanced situations, I will combine these phrases. The emphasized words above (skill, urgency, reward, penalty, think, check) will be used as short labels for what each prompt contains, so that future scoring tables are easier to read.

The resulting prompts fall into the following scenarios, each shown as a label followed by the full prompt:

none

QUESTION

skill

You are a famous mathematician, an expert in number theory.
QUESTION

urgency

Solving this problem is important.
QUESTION

reward

You will get rewarded if you get the right answer.
QUESTION

penalty

You will get punished if you get a wrong answer.
QUESTION

think

Think step by step.
QUESTION

check

Double check your answers.
QUESTION

reward,penalty

You will get rewarded if you get the right answer.
You will get punished if you get a wrong answer.
QUESTION

urgency,reward,penalty

Solving this problem is important.
You will get rewarded if you get the right answer.
You will get punished if you get a wrong answer.
QUESTION

think,check

Think step by step.
Double check your answers.
QUESTION

skill,urgency

You are a famous mathematician, an expert in number theory.
Solving this problem is important.
QUESTION

skill,urgency,reward

You are a famous mathematician, an expert in number theory.
Solving this problem is important.
You will get rewarded if you get the right answer.
QUESTION

skill,urgency,reward,penalty

You are a famous mathematician, an expert in number theory.
Solving this problem is important.
You will get rewarded if you get the right answer.
You will get punished if you get a wrong answer.
QUESTION

skill,think

You are a famous mathematician, an expert in number theory.
Think step by step.
QUESTION

skill,check

You are a famous mathematician, an expert in number theory.
Double check your answers.
QUESTION

skill,think,check

You are a famous mathematician, an expert in number theory.
Think step by step.
Double check your answers.
QUESTION

all

You are a famous mathematician, an expert in number theory.
Solving this problem is important.
You will get rewarded if you get the right answer.
You will get punished if you get a wrong answer.
Think step by step.
Double check your answers.
QUESTION

In all of these, QUESTION is a placeholder for the text of the problem.
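
For readers who want to reproduce the setup, here is a minimal Python sketch of how these prompts could be assembled from the six phrases. The phrases and scenario labels are the ones listed above; the names PHRASES, SCENARIOS, labels_for and build_prompt are my own hypothetical choices for illustration, not necessarily how the competition harness is written.

```python
# Sketch only: the helper names are hypothetical; the phrases and the
# scenario labels are taken from the lists in this post.
PHRASES = {
    "skill": "You are a famous mathematician, an expert in number theory.",
    "urgency": "Solving this problem is important.",
    "reward": "You will get rewarded if you get the right answer.",
    "penalty": "You will get punished if you get a wrong answer.",
    "think": "Think step by step.",
    "check": "Double check your answers.",
}

# Scenario labels, in the order they are presented in the post.
SCENARIOS = [
    "none",
    "skill", "urgency", "reward", "penalty", "think", "check",
    "reward,penalty",
    "urgency,reward,penalty",
    "think,check",
    "skill,urgency",
    "skill,urgency,reward",
    "skill,urgency,reward,penalty",
    "skill,think",
    "skill,check",
    "skill,think,check",
    "all",
]


def labels_for(scenario):
    """Expand a scenario label into the list of phrase labels it contains."""
    if scenario == "none":
        return []
    if scenario == "all":
        # Dicts keep insertion order, so this matches the order in the post.
        return list(PHRASES)
    return scenario.split(",")


def build_prompt(scenario, question):
    """Put the scenario's phrases, one per line, before the problem text."""
    lines = [PHRASES[label] for label in labels_for(scenario)]
    lines.append(question)
    return "\n".join(lines)


if __name__ == "__main__":
    print(len(SCENARIOS), "scenarios")
    print(build_prompt("skill,think", "QUESTION"))
```

Keeping the labels as plain comma-separated strings means the same string can be used both to build the prompt and as a column header in a scoring table.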

We thus have 17 different scenarios. Next time, we will see how the OpenAI models solve the problems in each of these scenarios.
