Apparently, LLMs are really bad at playing chess
Not all LLMs are equal: GPT-3.5-turbo-instruct stands out as the most capable chess-playing model tested.
Fine-tuning is crucial: instruction tuning and targeted dataset exposure dramatically enhance...