Also, they show a counter-intuitive scaling limit: their reasoning hard work boosts with problem complexity nearly a degree, then declines Inspite of acquiring an adequate token spending plan. By evaluating LRMs with their standard LLM counterparts beneath equal inference compute, we identify 3 performance regimes: (1) reduced-complexity tasks exactly https://www.youtube.com/watch?v=snr3is5MTiU