It seems the entire digital universe is caught up in the whirlwind of Large Language Models (LLMs) and the chatbots that use them. Large tech companies—including giants like Microsoft, Meta and Alphabet—are going all-in on generative Artificial Intelligence (AI), including LLMs. This is driving many companies to learn about and adopt generative AI and the chatbots that use it. The reasons are varied, but the goal is common: we all want to use these exciting technologies to better serve our businesses, our teams and our customers.
But many of us are uncertain about how to use chatbots and LLMs safely, effectively and ethically within our industry. There is also doubt about whether they have access to the right information for our field. These are compelling concerns, to be sure, but there is an equally compelling solution. Enter Retrieval-Augmented Generation (RAG). Sure to be a valuable tool in your company’s toolbox, it can really make your chatbot shine.
But before we get to WHY and HOW, let’s start with WHAT we’re talking about. There is a lot of buzzword-heavy and potentially confusing information out there, so this will help set the stage.
WHAT are we talking about?
In the most basic of terms: LLMs are a type of generative AI, and RAG helps make LLMs smarter by feeding them additional data that leads to more accurate, contextually appropriate answers. Let’s break it down from the beginning.
Gen AI: FYI
Generative AI is a branch of artificial intelligence that can create new text, images, audio or other content based on a given input or prompt. This is the technology people are using when they access LLMs like OpenAI’s ChatGPT, Meta’s Llama or Google’s Bard. Most of us interact with LLMs through chatbots and chat-based systems, sort of like text messaging. The chatbot takes the prompt, sends it to the LLM for processing, and then presents the best of the LLM’s response options back to the user.
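To make that hand-off a little more concrete, here is a minimal sketch in Python. The call_llm function is just a stand-in for whichever LLM API your chatbot actually uses; the prompt and canned response are purely illustrative.

```python
# A minimal sketch of the chatbot-to-LLM hand-off described above.
# call_llm() is a placeholder for whichever LLM API your chatbot uses
# (OpenAI, Azure OpenAI, a locally hosted Llama, etc.).

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned response here."""
    return f"(LLM response to: {prompt!r})"

def chatbot(user_prompt: str) -> str:
    # The chatbot receives the user's prompt and forwards it to the LLM.
    llm_response = call_llm(user_prompt)
    # A real chatbot might filter or format the response before returning it.
    return llm_response

print(chatbot("Summarize our safety policy in two sentences."))
```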
The 411 on LLM
LLMs are trained on massive amounts of data to produce responses for various tasks like question answering, summarization, translation or dialogue generation. They produce fluent, human-like text by predicting the most probable next word based on the data they are trained on. However, LLMs have some limitations. They are sometimes prone to factual errors, often referred to as “hallucinating,” and to generating generic, biased or irrelevant responses. LLMs also often lack access to up-to-date or domain-specific information, which could help avoid those hallucinations and, instead, provide more useful answers for your specific business needs. This is the problem that RAG is meant to solve by grounding the LLM in data that is related to, and vetted by, your business.
The Lowdown on Data
The massive amounts of data we generate and use within our companies are scattered across different systems: HR data in one place, engineering data in another, financial data somewhere else, and so on.
Some of this data is housed in databases. But a lot of the data we want our chatbots to access and use is “unstructured data”—information in documents, spreadsheets, PDF files, text files, even drawings and other images. Unstructured data needs extra processing to be put into a usable format.
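As a small illustration of that extra processing, the sketch below splits already-extracted text into overlapping chunks so it can later be indexed and searched. It assumes the text has been pulled out of the source document first; real pipelines would need PDF parsers, spreadsheet readers or OCR for that step, and the chunk sizes here are arbitrary.

```python
# A rough sketch of one preprocessing step for unstructured data:
# splitting already-extracted text into overlapping chunks that can be
# indexed. Pulling the text out of PDFs, spreadsheets or images in the
# first place would require additional tools (PDF parsers, OCR, etc.).

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into ~chunk_size-character pieces that overlap slightly
    so context isn't lost at the boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

document_text = "Illustrative document text about design standards. " * 40
print(f"{len(chunk_text(document_text))} chunks ready for indexing")
```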
Each set of data can be processed into a model, and then made available to a chatbot using a plug-in (that’s a software component that adds a customized feature to an existing computer program).
Once a plug-in is created, it’s exposed as an Application Programming Interface (API), which signals to your chatbot that the data can be accessed. If a plug-in contains multiple datasets, it manages the relationship between them, so the chatbot doesn’t have to figure that out.
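Here is a rough sketch of what a very small plug-in might look like: one dataset with a search function, exposed as an HTTP API the chatbot can call. It assumes the FastAPI library is available, and the endpoint, dataset and keyword matching are all placeholders.

```python
# A minimal sketch of a data plug-in exposed as an API (assumes FastAPI is
# installed: pip install fastapi uvicorn). The dataset and the keyword
# "search" are placeholders for a real index.

from fastapi import FastAPI

app = FastAPI(title="HR data plug-in (illustrative)")

# Stand-in for processed, indexed company data.
HR_DOCS = [
    {"id": 1, "text": "Employees accrue 15 days of PTO per year."},
    {"id": 2, "text": "Benefits enrollment opens each November."},
]

@app.get("/search")
def search(query: str, top_k: int = 3) -> list[dict]:
    """Return the documents that share the most words with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc["text"].lower().split())), doc) for doc in HR_DOCS]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]
```

The chatbot only needs to know the API contract (here, GET /search?query=...); how the plug-in stores or relates its underlying datasets stays hidden behind that interface.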
RAG to the Rescue
RAG is a framework applied within your generative AI system that improves LLM-generated responses by guiding the model with additional, relevant data sources.
RAG examines what the user asked and determines their intent. It uses the intent to retrieve pertinent information from a database or index via a plug-in. Next, it feeds that retrieved information as specific context—along with the user’s original prompt—to the LLM. Using this context, the LLM can generate more accurate, specific and informative responses.
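To tie those steps together, here is a compressed sketch of the retrieve-then-generate flow. The keyword-overlap retrieval and the call_llm placeholder stand in for a real search index (or vector store) and a real LLM API, and the sample documents are invented for illustration.

```python
# A compressed sketch of the RAG flow: retrieve relevant text, then pass it
# to the LLM as context alongside the user's prompt. The keyword-overlap
# retrieval, the toy documents and call_llm() are all placeholders for a
# real index/vector store and a real LLM API.

DOCUMENTS = [
    "Relay settings are reviewed on a fixed maintenance cycle.",
    "Travel expenses are reimbursed within 30 days of submission.",
    "Transmission line designs follow the applicable industry codes.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the documents sharing the most words with the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def call_llm(prompt: str) -> str:
    """Stand-in for the real LLM call."""
    return "(answer grounded in the supplied context)"

def rag_answer(user_prompt: str) -> str:
    context = "\n".join(retrieve(user_prompt))
    augmented_prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_prompt}"
    )
    return call_llm(augmented_prompt)

print(rag_answer("How often are relay settings reviewed?"))
```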
RAG can be applied to all tasks that your chatbot can perform. Assuming that the data being retrieved is factual and accurate, RAG should greatly reduce your chatbot’s hallucinations—which, in turn, should optimize the value of the information returned to the user.
Cloud Mine
LLMs and other AI technologies are typically accessed through a web browser. This means that unless your company wants to spend millions of dollars a year on its own infrastructure, your system is going to reside in the cloud.
Cloud storage is still an unsettling concept for many people for one main reason—cyberattacks are real, they are rampant, and they are crippling. Microsoft and other providers spend billions of dollars every year to secure their cloud platforms, so it’s hard to argue that they aren’t committed to data privacy and cybersecurity. For example, after careful research and consideration, we chose Microsoft’s Azure platform for our system. Microsoft has provided assurances that “our data is our data,” and that they are not capturing or otherwise accessing the data we use for our RAG framework.
WHY is generative AI good for your organization?
You might be asking yourself what benefit a chatbot powered with generative AI can offer your business. Let’s look at some of the benefits this technology can provide, and how RAG makes it even better.
- Producing up-to-date and up-to-snuff deliverables—reports, proposals, presentations, or other materials—is easier than ever before using generative AI. RAG helps to ensure that documents are accurate and relevant by retrieving information from your own databases or indexes, as well as from relevant external sources—like industry standards, regulations, best practices or approved news articles.
- Interactive, personalized services to clients and employees can be offered via AI. RAG enhances your chatbot’s ability to answer complex or open-ended questions and provide detailed explanations or recommendations. Using RAG and some clever programming, your chatbot can handle multiple topics or domains by retrieving information from various sources based on the user’s prompt and, specifically, the intent behind it (a simple routing sketch follows this list).
- Even generative AI systems like LLMs need a little inspiration—and RAG can provide it. It retrieves information from diverse and/or unconventional sources of knowledge to guide the LLM (based on the user’s prompt and intent), which might lead your chatbot to new ideas or perspectives that aid the problem-solving process.
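Here is the simple routing sketch mentioned above: the chatbot guesses the intent behind a prompt and sends it to the matching data source. The keyword-based intent check is a toy; a real system would classify intent with the LLM or a dedicated model.

```python
# A toy sketch of intent-based routing: pick the data source that matches
# the intent behind the user's prompt. A real system would classify intent
# with the LLM or a dedicated model, not a keyword check.

RETRIEVERS = {
    "hr": lambda q: [f"(HR plug-in results for: {q})"],
    "engineering": lambda q: [f"(engineering plug-in results for: {q})"],
}

def detect_intent(prompt: str) -> str:
    hr_words = ("vacation", "benefits", "pto", "payroll")
    return "hr" if any(word in prompt.lower() for word in hr_words) else "engineering"

def retrieve_for_prompt(prompt: str) -> list[str]:
    return RETRIEVERS[detect_intent(prompt)](prompt)

print(retrieve_for_prompt("How much PTO do new hires get?"))
```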
As we begin to design and test a RAG-powered chatbot at POWER, we are seeing some of these benefits already. We’ve had some excellent results in our proposal writing, including resumes and project descriptions. We’ve also used the system to provide Q&A-based chat interaction. In our beta testing (soon to go prime time), the RAG pattern has provided grounded, accurate answers to user questions. Finally, we’re exploring the use of generative AI in our design organizations and seeing some very promising opportunities there as well.
HOW can you create your own chatbot with RAG powers?
If you’re intrigued by the WHAT and the WHY, let’s get into the HOW: a high-level solution your company can use as a starting point for your own RAG-enhanced generative AI tool. What follows is a simplified explanation of the system we have designed.
This basic flow diagram illustrates a RAG-enhanced chatbot accessing various company data using plug-ins.
POWER employees access the chatbot via several applications—including our intranet and Microsoft Teams. Each access point (or “front-end client”) authenticates the user and sets some boundaries they must stay within when using that front-end client. For example, if we are working on engineering materials, we don’t need HR information, so access to that data is limited. Then the front-end client calls the chatbot, and the real work begins.
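As a sketch, the request a front-end client hands to the chatbot might look something like this. The field names are hypothetical; the idea is that the client identifies the user and spells out which data sources are in bounds.

```python
# A sketch of what a front-end client might send to the chatbot after
# authenticating the user. The field names are hypothetical; the point is
# that the client sets the boundaries (allowed data sources) up front.

from dataclasses import dataclass, field

@dataclass
class ChatRequest:
    user_id: str
    prompt: str
    client: str                          # e.g. "intranet" or "teams"
    allowed_sources: list[str] = field(default_factory=list)

request = ChatRequest(
    user_id="jdoe",
    prompt="What clearance does our standard require for this structure?",
    client="intranet",
    allowed_sources=["engineering", "standards"],  # HR data deliberately excluded
)
print(request)
```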
The chatbot looks at the user’s instructions and extracts the intent, along with any flags or other information from the front-end client. Then the chatbot scans its catalog of available plug-ins and calls up the ones it needs. It uses RAG to ground the request in the most accurate data, and then sends the whole package (along with the user intent) to the LLM. The LLM chews on it and sends an answer back to the chatbot. The chatbot does some final polishing and, at last, sends the response to the user.
If relevant, the chatbot can even cite the actual data source. The user can click on the citation to show the information used to create the answer—revealing the document, image, etc. This goes a long way toward building trust in your chatbot, which ultimately helps drive adoption in your organization.
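Here is a condensed sketch of those final steps: ground the request with retrieved documents, call the LLM, and hand back the answer together with citations the front end can render as clickable links. Every function and document below is a placeholder for the real component.

```python
# A condensed sketch of the final steps: ground the request with retrieved
# documents, call the LLM, and hand back the answer plus citations that the
# front end can render as clickable links. All functions and documents here
# are placeholders for the real components.

def retrieve(prompt: str) -> list[dict]:
    """Placeholder for the plug-in call chosen earlier in the flow."""
    return [{"text": "Structure clearances follow our internal design standard.",
             "source": "eng-index/design-standard.pdf"}]

def call_llm(prompt: str) -> str:
    return "(grounded answer from the LLM)"

def answer_with_citations(prompt: str) -> dict:
    docs = retrieve(prompt)
    context = "\n".join(doc["text"] for doc in docs)
    response = call_llm(f"Context:\n{context}\n\nQuestion: {prompt}")
    # Returning the sources alongside the answer lets the user click through
    # to the underlying document.
    return {"answer": response, "citations": [doc["source"] for doc in docs]}

print(answer_with_citations("What clearance does our standard require?"))
```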
The chatbot can store the last several questions a user has asked, as well as the results from the LLM. This storage gives our chatbot a memory of the conversation, so it can use earlier exchanges as context and give better answers as the dialogue continues. Only the user and the chatbot have access to that memory—it is never passed back to the LLM or folded into the model itself.
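A minimal sketch of that kind of memory might look like the following: keep the last few question-and-answer pairs per user and prepend them to the next prompt as context. The class and its limits are illustrative; the key point is that this memory lives with the chatbot, not the LLM.

```python
# A sketch of per-user conversation memory: keep the last few exchanges and
# prepend them to the next prompt as context. This memory lives with the
# chatbot; it is never fed back into the LLM.

from collections import deque

class ConversationMemory:
    def __init__(self, max_turns: int = 5):
        # Oldest turns fall off automatically once the limit is reached.
        self.turns = deque(maxlen=max_turns)

    def remember(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_context(self) -> str:
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)

memory = ConversationMemory()
memory.remember("Who maintains our substation standards?", "(answer from the chatbot)")
# The next prompt would be prefixed with memory.as_context() before retrieval.
print(memory.as_context())
```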
POWER’s AI components are independent of one another, so that we can easily separate the work between our development teams. It enables us to make changes quickly, and to do so without putting another part of the system at risk with a sneaky bug. This approach also allows for flexibility. For example, if we choose to use a different LLM in the future, we will be able to swap it out easily.
Ready, Set, RAG!
RAG can harness the power of LLMs and other generative AI systems and, in turn, be a force for good at your organization. Using this framework to leverage the information your business already has grounds your generative AI in data that is vetted by and has value to you. RAG will help you turn your generative AI system into a highly polished tool and greatly enhance your digital transformation journey.