GPT-5.5 on Vending-Bench: Bad behavior is not necessary

Bookmark April 29, 2026

GPT-5.5 on Vending-Bench: Bad behavior is not necessary andonlabs.com/blog/openai-gpt-5-5-vending-bench

It's very interesting that deception has somehow become a persistent trait in Claude on these benchmarks, while GPT had to be repeatedly coerced to even consider it. There's something fundamental in how they were trained, I suppose.

It makes me wonder if the "Yes, I ran the tests." responses in Claude Code are, in fact, not hallucinations but more of an innate "That was the boring part and I trusted my code."

If so, Claude might be the more accurate representation of a junior engineer, though fundamentally less helpful as a result. An interesting divide, to say the least.

Adam Knight

Software & Stories