
DeepMind's Mind Evolution: empowering large language models to solve real-world problems

In recent years, artificial intelligence (AI) has become a practical tool for driving innovation across industries. At the forefront of this advancement are large language models (LLMs), known for their ability to understand and produce human language. Although LLMs perform well on tasks such as conversational AI and content creation, they often struggle with complex real-world problems that require structured reasoning and planning.

For example, if you ask an LLM to plan a multi-city business trip that involves coordinating flight schedules, meeting times, budget constraints, and adequate rest, it can offer advice on each aspect in isolation. However, it often struggles to integrate these aspects and balance competing priorities effectively. This limitation becomes more apparent as LLMs are increasingly used to build AI agents that are expected to solve real-world problems autonomously.

Google DeepMind recently developed a solution to this problem. Inspired by natural selection, the approach, called Mind Evolution, improves problem-solving strategies through iterative adaptation. By guiding LLMs at inference time, it allows them to handle complex real-world tasks effectively and adapt to dynamic scenarios. In this article, we will explore how this approach works, its potential applications, and what it means for the future of AI problem-solving.

Why LLMs struggle with complex reasoning and planning

LLMs are trained to predict the next word in a sentence by analyzing patterns in large text datasets such as books, articles, and online content. This allows them to generate responses that seem logical and contextually appropriate. However, this training is based on pattern recognition rather than an understanding of meaning. As a result, LLMs can produce text that looks plausible but falls short on tasks that require deeper reasoning or structured planning.

The core limitation lies in how LLMs process information. They operate on probabilities and patterns rather than logic, which means they can handle isolated tasks (such as suggesting flight options or hotels) but struggle when those tasks must be integrated into a cohesive plan. This also makes it difficult for them to maintain context over time. Complex tasks often require tracking previous decisions and adapting as new information emerges, yet LLMs tend to lose focus over extended interactions, resulting in fragmented or inconsistent outputs.

How Mind Evolution works

DeepMind's Mind Evolution addresses these shortcomings by adopting the principles of natural evolution. Rather than generating a single response to a complex query, the method generates multiple candidate solutions, iteratively refines them, and selects the best results through a structured evaluation process. Consider a team brainstorming ideas for a project. Some ideas are strong, others are not. The team evaluates every idea, keeps the best ones, and discards the rest. It then builds on the strongest ideas, introduces variations, and repeats the process until a satisfactory solution emerges. Mind Evolution applies this principle to LLMs.

Here is a breakdown of how it works:

  1. Generation: The process begins with the LLM producing multiple candidate responses to a given problem. For example, in a travel planning task, the model might draft several itineraries based on budget, time, and user preferences.
  2. Evaluation: Each candidate is scored with a fitness function that measures how well it meets the task requirements. Low-quality responses are discarded, while the most promising candidates move on to the next stage.
  3. Refinement: A distinctive feature of Mind Evolution is a dialogue between two roles played by the LLM: an author and a critic. The author proposes solutions, while the critic identifies flaws and provides feedback. This structured dialogue mirrors how humans refine their thinking through criticism and revision. For example, if the author proposes a travel plan with restaurant visits that exceed the budget, the critic points this out, and the author revises the plan to address the concern. This process enables deeper analysis than standard prompting techniques.
  4. Iterative optimization: Refined solutions are evaluated again and recombined to produce even better candidates, and the cycle repeats.

By repeating this cycle, Mind Evolution iteratively improves the quality of solutions, allowing LLMs to address complex challenges more effectively.
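To make the loop concrete, here is a minimal Python sketch of a generate-evaluate-refine-recombine cycle of this kind. It assumes a placeholder `llm()` call, an illustrative keyword-based `fitness()` scorer, and prompts written for this example; none of these names or prompts come from DeepMind's implementation.

```python
import random

def llm(prompt: str) -> str:
    """Placeholder for a call to any chat-style language model."""
    raise NotImplementedError("wire this up to your LLM provider")

def fitness(plan: str, constraints: list[str]) -> float:
    """Toy fitness function: fraction of constraints the plan mentions.
    A real evaluator would parse the plan and check each constraint programmatically."""
    return sum(c.lower() in plan.lower() for c in constraints) / len(constraints)

def refine(plan: str, task: str) -> str:
    """Author-critic dialogue: the critic flags flaws, the author revises the plan."""
    critique = llm(f"Task: {task}\nPlan: {plan}\nAs a critic, list the flaws in this plan.")
    return llm(f"Task: {task}\nPlan: {plan}\nCritique: {critique}\n"
               f"As the author, revise the plan to fix these flaws.")

def evolve_solutions(task: str, constraints: list[str],
                     population_size: int = 8, generations: int = 4) -> str:
    # 1. Generation: sample an initial population of candidate solutions.
    population = [llm(f"Propose a plan for: {task}") for _ in range(population_size)]
    for _ in range(generations):
        # 2. Evaluation: rank candidates and keep the most promising half.
        ranked = sorted(population, key=lambda p: fitness(p, constraints), reverse=True)
        survivors = ranked[: population_size // 2]
        # 3. Refinement: improve survivors via the author-critic dialogue.
        children = [refine(p, task) for p in survivors]
        # 4. Recombination: merge pairs of strong plans into new candidates.
        recombined = [llm(f"Combine the best parts of these two plans:\n{a}\n---\n{b}")
                      for a, b in zip(survivors, random.sample(survivors, len(survivors)))]
        population = survivors + children + recombined
    return max(population, key=lambda p: fitness(p, constraints))
```

In this sketch the population size, number of generations, and selection rule are arbitrary choices; the point is only to show how generation, evaluation, author-critic refinement, and recombination fit into one repeating loop.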

Mind Evolution in action

DeepMind tested this approach on benchmarks such as TravelPlanner and Natural Plan. Using Mind Evolution, Google’s Gemini achieved a 95.2% success rate on TravelPlanner, a striking improvement over the 5.6% baseline. With the more advanced Gemini Pro, the success rate rose to nearly 99.9%. These results demonstrate the effectiveness of Mind Evolution in addressing practical challenges.

Interestingly, the advantage of this approach grows with task complexity. For example, while single-pass methods struggle with multi-day itineraries spanning several cities, Mind Evolution maintains a high success rate even as the number of constraints increases.

Challenges and future directions

Despite its success, Mind Evolution is not without limitations. The iterative evaluation and refinement process demands substantial computing resources; solving a single TravelPlanner task with Mind Evolution consumes about 3 million tokens and 167 API calls, far more than conventional single-pass prompting. Even so, the approach remains more efficient than brute-force alternatives such as exhaustive search.

Additionally, designing effective fitness functions for certain tasks can be challenging. Future research may focus on optimizing computational efficiency and extending the technique to a broader range of problems, such as creative writing or complex decision-making.

Another interesting area to explore is the integration of domain-specific evaluators. For example, in medical diagnosis, incorporating expert knowledge into the fitness function could further improve the accuracy and reliability of the model.
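To illustrate what a domain-specific evaluator might look like, here is a small sketch of a hypothetical fitness function for the travel-planning setting used earlier in this article. The plan structure, field names, and scoring weights are illustrative assumptions, not part of DeepMind's system.

```python
def itinerary_fitness(plan: dict, budget: float, required_cities: set[str]) -> float:
    """Illustrative domain-specific evaluator: scores a structured itinerary
    against hard constraints instead of relying on the LLM to judge itself."""
    legs = plan["legs"]  # assumed format: each leg is {"city": str, "cost": float}
    total_cost = sum(leg["cost"] for leg in legs)
    # Full marks within budget, with a linear penalty for overruns.
    budget_score = 1.0 if total_cost <= budget else max(0.0, 1 - (total_cost - budget) / budget)
    # Reward covering the cities the trip is required to visit.
    visited = {leg["city"] for leg in legs}
    coverage_score = len(visited & required_cities) / len(required_cities)
    return 0.5 * budget_score + 0.5 * coverage_score  # weights chosen arbitrarily for this sketch
```

Because the evaluator checks hard constraints programmatically, it can encode domain knowledge (budgets, required stops, or in other domains clinical guidelines) that the model alone might overlook.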

Applications beyond planning

Although Mind Evolution was primarily evaluated on planning tasks, it can be applied to various fields, including creative writing, scientific discovery, and even code generation. For example, the researchers introduced a benchmark called StegPoet, which challenges the model to encode hidden messages in poetry. Although this task remains difficult, Mind Evolution outperformed traditional methods, reaching success rates of up to 79.2%.

The ability to adapt and evolve solutions expressed in natural language opens new possibilities for tackling difficult problems, from improving workflows to designing innovative products. By harnessing the power of evolutionary algorithms, Mind Evolution provides a flexible and scalable framework for enhancing LLMs’ problem-solving capabilities.

Bottom line

DeepMind’s Mind Evolution introduces a practical and effective approach to overcoming key limitations of LLMs. By using iterative refinement inspired by natural selection, it enhances these models’ ability to handle complex, multi-step tasks that require structured reasoning and planning. The approach has shown strong results in challenging scenarios such as travel planning and shows promise across different fields, including creative writing, scientific research, and code generation. Although challenges remain, such as high computational cost and the need for well-designed fitness functions, the approach offers a scalable framework for improving AI capabilities. Mind Evolution lays the foundation for more powerful AI systems that can reason and plan their way through real-world challenges.
