When large language models (LLMs) emerged, enterprises quickly brought them into their workflows. They developed LLM applications using Retrieval-Augmented Generation (RAG), a technique that taps internal datasets so that models answer with relevant business context and fewer hallucinations. The approach worked like a charm, leading to the rise of functional chatbots and search products that helped users instantly find the information they needed, be it a specific clause in a policy or an answer to a question about an ongoing project.
However, even as RAG continues to thrive across multiple domains, enterprises have run into instances where it fails to deliver the expected results. This is where agentic RAG comes in, with a series of AI agents enhancing the RAG pipeline. It is still new and can run into occasional issues, but it promises to be a game-changer in how LLM-powered applications process and retrieve data to handle complex user queries.
“Agentic RAG… incorporates AI agents into the RAG pipeline to orchestrate its components and perform additional actions beyond simple information retrieval and generation to overcome the limitations of the non-agentic pipeline,” vector database company Weaviate’s technology partner manager Erika Cardenas and ML engineer Leonie Monigatti wrote in a joint blog post describing the potential of agentic RAG.
The problem of ‘vanilla’ RAG
While widely used across use cases, traditional RAG is often held back by the way it works.
At the core, a vanilla RAG pipeline consists of two main components—a retriever and a generator. The retriever component uses a vector database and embedding model to take the user query and run a similarity search over the indexed documents to retrieve the most similar documents to the query. Meanwhile, the generator grounds the connected LLM with the retrieved data to generate responses with relevant business context.
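The two-component design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, the `DOCS` list stands in for a vector database, and `build_prompt` only shows how retrieved text would be handed to an LLM; no model is actually called.

```python
import math
from collections import Counter

# Hypothetical indexed documents standing in for a vector database.
DOCS = [
    "Remote work policy: employees may work remotely three days per week.",
    "Expense policy: meals are reimbursed up to 50 dollars per day.",
    "Project Atlas status: the launch is scheduled for next quarter.",
]

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Retriever: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    """Generator input: ground the LLM by placing retrieved text in the prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

if __name__ == "__main__":
    question = "how many days can employees work remotely"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

Note that the retriever runs exactly once and against a single source, which is the limitation the following paragraphs describe.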
The architecture helps organizations deliver fairly accurate answers, but problems begin when an application needs to go beyond a single source of knowledge (one vector database). Traditional pipelines cannot ground LLMs with two or more sources, which restricts the capabilities of downstream products and keeps them limited to select applications.
Further, in certain complex cases, apps built with traditional RAG can suffer from reliability issues because there is no follow-up reasoning or validation of the retrieved data. Whatever the retriever component pulls in one shot ends up forming the basis of the answer given by the model.
Agentic RAG to the rescue
As enterprises continue to level up their RAG applications, these issues are becoming more prominent, forcing users to explore additional capabilities. One such capability is agentic AI, where LLM-driven AI agents with memory and reasoning capabilities plan a series of steps and take action across different external tools to handle a task. It is used particularly for use cases like customer service, but it can also orchestrate different components of the RAG pipeline, starting with the retriever component.
According to the Weaviate team, AI agents can access a wide range of tools – like web search, calculator or a software API (like Slack/Gmail/CRM) – to retrieve data, going beyond fetching information from just one knowledge source.
As a result, depending on the user query, the reasoning and memory-enabled AI agent can decide whether it should fetch information at all, which tool is most appropriate for fetching it, and whether the retrieved context is relevant (or whether it should re-retrieve) before pushing the fetched data to the generator component to produce an answer.
The approach expands the knowledge base powering downstream LLM applications, enabling them to produce more accurate, grounded and validated responses to complex user queries.
For instance, if a user has a vector database full of support tickets and the query is “What was the most commonly raised issue today?” the agentic experience would be able to run a web search to determine the day of the query and combine that with the vector database information to provide a complete answer.
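The support-ticket scenario can be sketched roughly as follows. This is a toy under stated assumptions, not how Weaviate or any framework implements it: the keyword checks stand in for decisions an LLM agent would make, `current_date_tool` stands in for the web search that resolves "today", `ticket_search_tool` stands in for a filtered vector search, and the `TICKETS` data is invented for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical ticket store standing in for a vector database of support tickets.
TICKETS = [
    {"day": "2024-11-05", "issue": "login failure"},
    {"day": "2024-11-05", "issue": "login failure"},
    {"day": "2024-11-05", "issue": "slow dashboard"},
    {"day": "2024-11-04", "issue": "billing error"},
]

def current_date_tool(today=None):
    """Stand-in for an external lookup of the current date (ISO format)."""
    return (today or date.today()).isoformat()

def ticket_search_tool(day):
    """Stand-in for a filtered search: issues raised on the given day."""
    return [t["issue"] for t in TICKETS if t["day"] == day]

def agentic_answer(query, today=None):
    """Return (answer, steps), where steps traces the tools the agent chose."""
    steps = []
    text = query.lower()
    # Decision 1: does this query need retrieval at all?
    if "issue" not in text and "ticket" not in text:
        return "No retrieval needed.", steps
    # Decision 2: which tools? A relative date must be resolved first.
    if "today" not in text:
        return "Please specify a day.", steps
    day = current_date_tool(today)
    steps.append("current_date_tool")
    context = ticket_search_tool(day)
    steps.append("ticket_search_tool")
    # Decision 3: validate the retrieved context before generating.
    if not context:
        return f"No tickets found for {day}.", steps
    top_issue, count = Counter(context).most_common(1)[0]
    return f"Most common issue on {day}: {top_issue} ({count} tickets)", steps

if __name__ == "__main__":
    print(agentic_answer("What was the most commonly raised issue today?",
                         today=date(2024, 11, 5)))
```

In a real pipeline the final string would come from the generator LLM; it is formatted directly here so the sketch stays self-contained.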
“By adding agents with access to tool use, the retrieval agent can route queries to specialized knowledge sources. Furthermore, the reasoning capabilities of the agent enable a layer of validation of the retrieved context before it is used for further processing. As a result, agentic RAG pipelines can lead to more robust and accurate responses,” the Weaviate team noted.
Easy implementation but challenges remain
Organizations have already started upgrading from vanilla RAG pipelines to agentic RAG, thanks to the wide availability of large language models with function calling capabilities. There’s also been the rise of agent frameworks like DSPy, LangChain, CrewAI, LlamaIndex and Letta that simplify building agentic RAG systems by plugging pre-built templates together.
There are two main ways to set up these pipelines. One is by incorporating a single agent system that works through multiple knowledge sources to retrieve and validate data. The other is a multi-agent system, where a series of specialized agents, run by a master agent, work across their respective sources to retrieve data. The master agent then works through the retrieved information to pass it ahead to the generator.
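The multi-agent layout can be sketched as below. The class names `SourceAgent` and `MasterAgent`, the keyword-overlap matching, and the sample records are all hypothetical and chosen only to show the shape of the design; a production system would put an LLM and a real retriever behind each agent.

```python
class SourceAgent:
    """Hypothetical specialized agent bound to one knowledge source."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def retrieve(self, query):
        # Toy relevance check: keep records sharing at least one word with the query.
        words = set(query.lower().split())
        return [r for r in self.records if words & set(r.lower().split())]


class MasterAgent:
    """Hypothetical coordinator that fans a query out and merges the results."""

    def __init__(self, agents):
        self.agents = agents

    def retrieve(self, query):
        merged = []
        for agent in self.agents:
            for hit in agent.retrieve(query):
                # Tag each hit with its source so the generator can cite it.
                merged.append((agent.name, hit))
        return merged


if __name__ == "__main__":
    master = MasterAgent([
        SourceAgent("policies", ["refund policy allows returns within 30 days"]),
        SourceAgent("crm", ["customer acme requested a refund"]),
    ])
    print(master.retrieve("refund status"))
```

A single-agent system would collapse this into one object that queries each source itself; the trade-off is simpler orchestration against less specialization per source.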
However, regardless of the approach used, it is pertinent to note that agentic RAG is still new and can run into occasional issues, including latency stemming from multi-step processing and unreliability.
“Depending on the reasoning capabilities of the underlying LLM, an agent may fail to complete a task sufficiently (or even at all). It is important to incorporate proper failure modes to help an AI agent get unstuck when they are unable to complete a task,” the Weaviate team pointed out.
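One simple way to picture a failure mode of this kind is a step budget with a fallback answer. The sketch below is an illustration only: `run_with_step_budget`, its parameters and the fallback message are hypothetical names, and the `attempt` callable stands in for one retrieve-and-validate step of an agent.

```python
def run_with_step_budget(attempt, max_steps=3,
                         fallback="I could not find a reliable answer."):
    """Call attempt(step) until it returns a non-None result or the budget runs out.

    Returns (result, steps_used). Exceptions raised by a step are treated
    as a failed step so that one broken tool call cannot stall the agent.
    """
    for step in range(1, max_steps + 1):
        try:
            result = attempt(step)
        except Exception:
            result = None
        if result is not None:
            return result, step
    return fallback, max_steps


if __name__ == "__main__":
    # Hypothetical agent step that only succeeds on its third try.
    def flaky(step):
        return "validated context" if step == 3 else None

    print(run_with_step_budget(flaky))
```

Capping the number of steps also bounds how many LLM requests a single query can trigger, which bears on both latency and cost.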
The company’s CEO, Bob van Luijt, told VentureBeat that the agentic RAG pipeline could also be expensive, as the more requests the LLM agent makes, the higher the computational costs. However, he noted that how the whole architecture is set up could make a difference in costs in the long run.
“Agentic architectures are critical for the next wave of AI applications that can ‘do’ tasks rather than just retrieve information. As teams push the first wave of RAG applications into production and get comfortable with LLMs, they should look for educational resources about new techniques like agentic RAG or Generative Feedback Loops, an agentic architecture for tasks like data cleansing and enrichment,” he added.