New Google AI Research Proposes the Deep-Thinking Ratio to Improve LLM Accuracy While Cutting Total Inference Costs by Half
For the last few years, the AI world has followed a simple rule: if you want a Large Language Model (LLM) to solve a harder problem, make its Chain-of-Thought (CoT) longer. But new research from the University of Virginia and Google shows that ‘thinking long’ is not the same as ‘thinking hard’. The research team reveals that simply adding more tokens to a response can actually make an AI less accurate. Instead of counting words, the researchers introduce a new measurement: the Deep-Thinking Ratio (DTR).

The Failure of ‘Token Maxing’

Engineers often use token count as a proxy for the effort an AI puts into a task. However, the researchers found that raw token count has an average correlation of r = -0.59 with accuracy. This negative correlation means that as the model generates more text, it is more likely to be wrong. This happens because of ‘overthinking,’ where the model gets stuck in loops, repeats redundant steps, or amplifies its own mistakes.
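To make the r = -0.59 figure concrete, here is a minimal sketch of how a length-accuracy correlation like this could be measured on an evaluation set. This is not the paper's code; the data points and variable names below are hypothetical, and the paper's reported value is an average across benchmarks rather than a single run.

```
# Minimal sketch: Pearson correlation between CoT response length
# and answer correctness on a hypothetical evaluation set.
# Requires Python 3.10+ for statistics.correlation.
from statistics import correlation

# Hypothetical results: token count of each CoT response, and whether
# the final answer was correct (1) or wrong (0).
token_counts = [120, 340, 560, 900, 1500, 2200, 3100, 4000]
correct      = [1,   1,   1,   0,   1,    0,    0,    0]

# A negative r means longer responses tended to be less accurate,
# mirroring the 'overthinking' pattern the researchers describe.
r = correlation(token_counts, correct)
print(f"Pearson r between length and accuracy: {r:.2f}")
```

On real benchmark data the correlation would be computed per task and averaged, but the mechanics are the same: length is treated as a predictor of correctness, and the sign of r tells you whether more tokens help or hurt.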
