Beyond Basic RAG

Building Stateful, Agentic Retrieval Systems with LangGraph

Apr 27, 2025

Retrieval-Augmented Generation (RAG) has evolved dramatically since its introduction in 2020. While simple implementations can deliver impressive results, today's most challenging use cases demand more sophisticated approaches. In this article, I'll show you how to leverage LangGraph—a powerful extension to the LangChain ecosystem—to build advanced RAG systems that are stateful, conversational, and even agentic.

Why Basic RAG Falls Short

Basic RAG follows a straightforward pipeline: retrieve relevant documents based on a query, inject them into a prompt, and generate a response. This approach works well for single-turn, factual queries but quickly breaks down in several common scenarios:

Multi-turn conversations where the context evolves across multiple interactions
Complex queries requiring multiple retrieval steps or synthesis from diverse sources
Ambiguous questions needing clarification or decomposition
Knowledge-intensive tasks requiring both structured and unstructured data
Dynamic information needs that change as the conversation progresses

You might wonder: "With 100K+ token context windows in modern LLMs, do we even need RAG anymore?" The answer is emphatically yes. Larger context windows don't solve the fundamental challenges of information accuracy, recency, and reliability. Furthermore, they bring new challenges:

Cost efficiency - Filling large contexts is expensive, especially at scale
Precision - Targeted, relevant information yields better responses than flooding with data
Proprietary knowledge - External knowledge bases remain essential for domain-specific applications
Dynamic content - Fresh information can't be included in frozen model weights

LangGraph: A Framework for Stateful, Dynamic RAG

LangGraph extends LangChain's capabilities by enabling the creation of stateful, graph-based applications. Unlike linear chains, LangGraph allows for:

Cyclical flows with conditional branching and recursion
Persistent state management across multiple turns
Dynamic decision-making based on intermediate results

At its core, LangGraph represents your application as a directed graph where:

Nodes are computational steps (functions or LangChain runnables)
Edges define transitions between steps
State is persistent and accessible throughout the graph

This structure enables implementation of advanced RAG patterns that were previously difficult to achieve. Let's explore three practical patterns.

Pattern 1: Conversational Memory for RAG

Standard RAG struggles with follow-up questions because it lacks memory across turns. Consider this exchange:

User: "What were the key advances in RAG research in 2023?"
AI: [Provides accurate response based on retrieved documents]
User: "How did they improve on the original 2020 paper?"

Without conversational memory, the system can't understand what "they" refers to in the follow-up. LangGraph elegantly solves this with stateful conversation management:

from typing import List, Union, Optional
from typing_extensions import TypedDict
from langgraph.graph import StateGraph
from langchain_core.messages import HumanMessage, AIMessage
from langchain_openai import ChatOpenAI

from langchain_community.vectorstores import FAISS  # or another vector store
from langchain_openai import OpenAIEmbeddings

# we'll set up a vector store:
embeddings = OpenAIEmbeddings()
vector_store = FAISS.from_texts(["your document content"], embeddings)


# Define state with message history
class ConversationalRAGState(TypedDict):
    messages: List[Union[HumanMessage, AIMessage]]
    context: Optional[Any]  # dict or str
    rewritten_query: Optional[str]

# Create graph
graph = StateGraph(ConversationalRAGState)

# Initialize the chat model for reuse across nodes
chat = ChatOpenAI(temperature=0)

# Define nodes
def rewrite_query(state):
    """Rewrite follow-up queries to be standalone based on conversation history"""
    messages = state["messages"]
    last_message = messages[-1].content
    
    if len(messages) > 1:
        rewritten_query = chat.invoke(
            messages + [HumanMessage(content=f"Rewrite the last query as a standalone query using context from our conversation: {last_message}")]
        ).content
    else:
        rewritten_query = last_message
        
    return {"rewritten_query": rewritten_query}

def retrieve_documents(state):
    """Retrieve relevant documents based on rewritten query"""
    query = state["rewritten_query"]
    docs = vector_store.similarity_search(query, k=3)
    context = "\n\n".join([doc.page_content for doc in docs])
    return {"context": context}

def generate_response(state):
    """Generate response based on retrieved context and conversation history"""
    messages = state["messages"]
    context = state["context"]
    
    response = chat.invoke(
        messages + [
            HumanMessage(content=f"Using the following context, answer the last query: {context}")
        ]
    )
    return {"messages": messages + [response]}

# Add nodes to graph
graph.add_node("rewrite_query", rewrite_query)
graph.add_node("retrieve", retrieve_documents)
graph.add_node("generate", generate_response)

# Define edges
graph.add_edge("rewrite_query", "retrieve")
graph.add_edge("retrieve", "generate")
graph.set_entry_point("rewrite_query")

# Compile graph
chain = graph.compile()


# Run the graph
input_data = {"messages": [HumanMessage(content="Hello, LangGraph!")]}
result = chain.invoke(input_data)
print(f"Result: {result}")

The key insight here is that the message history becomes part of the state and persists across turns. This allows for:

Contextual query rewriting to handle follow-up questions
More targeted retrieval based on the full conversation context
Coherent, contextual responses that acknowledge previous exchanges

Pattern 2: Hybrid Retrieval with Knowledge Graphs

Many real-world applications require information stored in both unstructured documents and structured knowledge graphs. Basic vector RAG can't effectively leverage this structured knowledge.

LangGraph enables hybrid retrieval that combines vector search with graph database queries:

def retrieve_hybrid(state):
    """Perform hybrid retrieval from both vector store and knowledge graph"""
    query = state["query"]
    
    # Vector retrieval
    vector_docs = vector_store.similarity_search(query, k=3)
    
    # Extract entities for graph retrieval
    entities = entity_extractor.extract_entities(query)
    
    # Graph retrieval
    graph_results = []
    for entity in entities:
        # Query knowledge graph for relationships
        neighbors = graph_db.get_entity_neighborhood(
            entity, max_distance=2, max_results=5
        )
        graph_results.extend(neighbors)
    
    # Combine results
    combined_context = {
        "vector_results": [doc.page_content for doc in vector_docs],
        "graph_results": graph_results
    }
    
    return {"context": combined_context}

Please note that this function expects a query key in the state instead of rewritten_query and returns a structured context object.

The power of this approach comes from combining semantic search with explicit relationships:

Vector search identifies topically relevant documents
Graph queries provide structured relationships between entities
Combined context offers the LLM both general content and specific facts

For example, when asked about "the impact of HyDE on modern RAG systems," the vector search might retrieve relevant papers describing HyDE, while the graph retrieval would identify specific relationships like "HyDE → influenced → CRAG" and "HyDE → cited by → RAG-Fusion."

Pattern 3: Towards Agentic RAG

Sometimes a single retrieval step isn't sufficient. The system might need to reason about the query, plan multiple retrieval steps, or refine its search based on initial findings.

LangGraph's conditional edges and cycle management make implementing agentic RAG behavior straightforward:

def plan_retrieval(state):
    """Plan the retrieval strategy based on the query"""
    query = state["query"]
    chat = ChatOpenAI(temperature=0)
    
    planning_prompt = f"""
    You are a retrieval planning agent. Analyze this query and create a plan for gathering information:
    Query: {query}
    
    Output a JSON with these fields:
    - breakdown: Split the query into subquestions if needed
    - retrieval_strategy: For each subquestion, specify what type of retrieval to use (vector/knowledge_graph/web_search)
    """
    
    plan = chat.invoke(planning_prompt).content
    return {"retrieval_plan": plan}

def execute_retrieval(state):
    """Execute the retrieval based on the plan"""
    plan = state["retrieval_plan"]
    results = []
    
    for step in plan["breakdown"]:
        if step["retrieval_strategy"] == "vector":
            docs = vector_store.similarity_search(step["subquestion"])
            results.append({"type": "vector", "docs": docs})
        elif step["retrieval_strategy"] == "knowledge_graph":
            entities = entity_extractor.extract_entities(step["subquestion"])
            graph_results = graph_db.query_entities(entities)
            results.append({"type": "graph", "results": graph_results})
        # Additional retrieval methods...
    
    return {"retrieved_results": results}

def evaluate_results(state):
    """Evaluate if the retrieved information is sufficient"""
    results = state["retrieved_results"]
    query = state["query"]
    
    evaluation_prompt = f"""
    Evaluate if the retrieved information is sufficient to answer the query:
    Query: {query}
    Retrieved Information: {results}
    
    Output JSON with:
    - sufficient: true/false
    - missing_information: list of missing information if insufficient
    """
    
    evaluation = chat.invoke(evaluation_prompt).content
    return {"evaluation": evaluation}

# Conditional edge based on evaluation
def should_retrieve_more(state):
    evaluation = state["evaluation"]
    if evaluation["sufficient"]:
        return "generate_response"
    else:
        return "refine_query"

def refine_query(state):
    """Refine the query based on missing information"""
    evaluation = state["evaluation"]
    original_query = state["query"]
    
    refined_query = chat.invoke(f"""
    Original query: {original_query}
    Missing information: {evaluation["missing_information"]}
    
    Generate a refined query to address the missing information.
    """).content
    
    return {"query": refined_query}

# Update the graph with conditional logic
graph.add_node("plan", plan_retrieval)
graph.add_node("retrieve", execute_retrieval)
graph.add_node("evaluate", evaluate_results)
graph.add_node("refine", refine_query)
graph.add_node("generate", generate_response)

graph.add_edge("plan", "retrieve")
graph.add_edge("retrieve", "evaluate")
graph.add_conditional_edges("evaluate", should_retrieve_more)
graph.add_edge("refine", "retrieve")

This adds a dynamic retrieval strategy with evaluation and refinement capabilities through a basic ReAct (Reasoning + Acting) loop where the model:

Plans a retrieval strategy by decomposing complex queries
Executes retrieval based on that plan
Evaluates if the retrieved information is sufficient
Refines the query if needed, creating an iterative loop

Research papers like Self-RAG (Asai et al., 2023) and CoRAG (Wang et al., 2025) demonstrate that these iterative approaches significantly improve performance on complex tasks requiring multi-step reasoning.

Evaluation and Monitoring

Advanced RAG systems are inherently more complex, making evaluation and debugging critical. LangSmith, LangChain's observability platform, integrates seamlessly with LangGraph to provide visibility into:

Node execution paths to identify bottlenecks
State changes throughout the graph
Retrieval quality metrics like precision and recall
End-to-end performance including response accuracy and relevance

For practical evaluation, consider implementing:

Retrieval-focused metrics like Precision@K and Recall@K for each retrieval step
Answer correctness using either ground truth or LLM-based evaluation
Faithfulness scoring to measure how well responses stick to retrieved information

Conclusion: The Future of RAG

As we've seen, LangGraph enables RAG implementations that are far more sophisticated than the basic retrieve-and-generate pipeline. These stateful, agentic systems better handle the complexity of real-world applications—from maintaining conversational context to executing multi-step reasoning and retrieval.

Despite increasing context window sizes, advanced RAG remains crucial for building reliable, efficient AI applications. The patterns we've explored—conversational memory, hybrid retrieval, and agentic behaviors—represent the current state of the art in production RAG systems.

We went quite fast in this walk-through. For a deeper exploration of these concepts and techniques, including deployment strategies and production-ready patterns, look for my upcoming book on building applications with LangChain, where we'll dive even further into these advanced implementation patterns.

What advanced RAG patterns are you implementing? Let me know in the comments!

This article was written by Ben Auffarth, Chief Data Scientist at Chelsea AI Ventures, an AI consultancy, and author of multiple bestselling books on AI implementation including "Generative AI with LangChain" which is covering these topics in much more detail.