Giving an RNN extra thinking time improves its planning in Sokoban and lets it solve harder puzzles; the agent itself paces around to buy more computation time before committing to a move. By training linear probes on the network's activations, we can both predict and modify the agent’s internal plans, and model surgery allows the network to generalize to larger levels. Interpretability techniques such as action channels and sparse autoencoders further reveal how the RNN organizes its decision-making.
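To make the linear-probe idea concrete, here is a minimal sketch on synthetic data. It is not the probe setup used for the Sokoban agent: the activations are random vectors with a planted "plan" direction, the probe is a plain logistic regression fit by gradient descent, and the names `train_linear_probe`, `probe_predict`, `steer`, and `plan_direction` are hypothetical. Shifting activations along the probe direction is one assumed way such a probe could be used to modify a plan, shown only for illustration.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe: P(label=1) = sigmoid(acts @ w + b)."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        err = probs - labels                   # gradient of the log loss w.r.t. the logits
        w -= lr * (acts.T @ err / n + l2 * w)  # averaged gradient plus L2 penalty
        b -= lr * err.mean()
    return w, b

def probe_predict(acts, w, b):
    """Hard 0/1 prediction from the probe's logit."""
    return (acts @ w + b > 0).astype(int)

def steer(acts, w, alpha):
    """Shift activations by alpha along the unit-norm probe direction."""
    return acts + alpha * w / np.linalg.norm(w)

# Synthetic stand-in for hidden states (assumption: not real agent activations).
rng = np.random.default_rng(0)
n, d = 400, 32
plan_direction = rng.normal(size=d)
plan_direction /= np.linalg.norm(plan_direction)
labels = rng.integers(0, 2, size=n)  # e.g. "the plan uses this square" vs. not
acts = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * labels - 1, plan_direction)

# Train on the first 300 samples, evaluate on the remaining 100.
w, b = train_linear_probe(acts[:300], labels[:300])
accuracy = (probe_predict(acts[300:], w, b) == labels[300:]).mean()

# Intervene: push held-out activations toward the "label 1" side of the probe.
steered = steer(acts[300:], w, alpha=10.0)
steered_positive_rate = probe_predict(steered, w, b).mean()

print(f"held-out probe accuracy: {accuracy:.2f}")
print(f"fraction read as label 1 after steering: {steered_positive_rate:.2f}")
```

On a real agent, `acts` would be hidden states recorded during play and `labels` a feature of the plan read off from the agent's later behavior. High held-out accuracy would suggest the feature is linearly decodable, and the real test of an intervention is whether the agent's actions change, not only the probe's readout.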