Mihai's page

Prompt engineering for the 2025 AI puzzle competition

Two days ago I introduced the AI puzzle competition, and yesterday we talked about the problems we will use to gauge the performance of the competing LLMs.

Before presenting the results, I want to talk about the prompt engineering part, since I started all these experiments to see whether it really helps or whether the perceived benefit is mostly confirmation bias.

I started by thinking of short phrases that are commonly used as prompt engineering. I selected the following 6:

  1. You are a famous mathematician, an expert in number theory.

  2. Solving this problem is important.

  3. You will get rewarded if you get the right answer.

  4. You will get punished if you get a wrong answer.

  5. Think step by step.

  6. Double check your answers.

The first one aims to anchor the model into thinking it possesses a skill; the next one signals the importance (urgency) of solving the puzzle. Then there are two prompts, one promising the model a reward and the other threatening a penalty. Finally, I ask the model to think while building the solution and to check it afterwards.

Then, to build more complex scenarios, I will combine these phrases. The emphasized words above will be used as short hints on what each prompt contains, so that future scoring tables are easier to read.

The resulting prompts are grouped into the following scenarios:

In all of these, QUESTION is a placeholder for the text of the problem.
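
As a rough illustration of how such prompts could be assembled, here is a minimal Python sketch. It assumes that the selected phrases are simply placed before the problem text, one per line; that layout is an assumption, not necessarily the one used in the competition. The short keys (`expert`, `important`, and so on) and the `build_prompt` helper are hypothetical names introduced only for this example, and the sketch does not encode which 17 combinations were actually used.

```python
# The six phrases from the list above.
# The dictionary keys are hypothetical short labels, chosen for this sketch.
PHRASES = {
    "expert": "You are a famous mathematician, an expert in number theory.",
    "important": "Solving this problem is important.",
    "reward": "You will get rewarded if you get the right answer.",
    "punish": "You will get punished if you get a wrong answer.",
    "steps": "Think step by step.",
    "check": "Double check your answers.",
}


def build_prompt(hints, question):
    """Put the selected phrases before the problem text, one per line.

    Placing the phrases first and joining with newlines is an assumed
    layout, not a confirmed detail of the experiments.
    """
    parts = [PHRASES[hint] for hint in hints]
    parts.append(question)
    return "\n".join(parts)


# A baseline with no extra phrases, and one combined scenario.
baseline = build_prompt([], "QUESTION")
combined = build_prompt(["expert", "steps"], "QUESTION")
print(combined)
```

With an empty list of hints the prompt is just the problem text, which gives a natural baseline to compare the other scenarios against.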

We thus have 17 different scenarios. Next time, we will see how the OpenAI models solve the problems in each of these scenarios.

