
Data Science Portfolio

My data science projects.

September 20, 2025 · Retrieval augmented generation

Full project: RAG (Retrieval-Augmented Generation) III

Published by DOR

Combining Self-Query and MMR Retrievers in RAG Pipelines: A Practical Guide

In Retrieval-Augmented Generation (RAG) pipelines, the retriever plays a central role. Before the LLM can generate answers, it needs relevant information — and retrievers are the components in charge of finding it. Whether pulling from a vector database, a search index, or a hybrid of both, retrievers define what information the model can see.

In this post, we walk through a Python implementation using LangChain where two types of retrievers — a Self-Query Retriever and a Maximal Marginal Relevance (MMR) Retriever — are defined, and then combined into a MergerRetriever. We’ll break down what each retriever does, when to use them, and why combining them can boost the performance of your RAG system.

What Are Retrievers in RAG?

Before diving into code, let’s briefly recap the role of retrievers:

A retriever takes a user query and returns a set of relevant documents from a knowledge base. These documents are then used by a language model to generate a final response.

In LangChain (and many other RAG frameworks), retrievers are modular and interchangeable — which means you can experiment with different retrieval strategies depending on your data and use case.
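To illustrate that modularity, here is a toy sketch in plain Python (not LangChain, and the document strings are invented for the example): two retrievers that share the same interface, so either can be plugged into the same pipeline unchanged.

```python
# Two toy retrievers sharing one contract: query -> list of documents.
# Illustrative only; LangChain retrievers expose a similar interface.

DOCS = [
    "Grants for technology companies",
    "Agricultural subsidies in rural regions",
    "Tax incentives for startups",
]

def keyword_retriever(query: str, k: int = 2) -> list[str]:
    """Rank documents by how many query words they contain."""
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def length_retriever(query: str, k: int = 2) -> list[str]:
    """A deliberately naive alternative strategy: prefer shorter documents."""
    return sorted(DOCS, key=len)[:k]

def answer(query: str, retriever) -> str:
    """Any retriever with the same interface drops in without changes."""
    context = " | ".join(retriever(query))
    return f"Context used: {context}"
```

Because both functions accept a query and return documents, swapping retrieval strategies is a one-argument change, which is exactly the property the LangChain abstractions give you.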

Code Walkthrough

We’ll look at three core functions:

  • get_selfQueryRetriever()
  • get_retriever()
  • get_mergeRetriever()

Each sets up a retriever with different strategies.


1. Self-Query Retriever

def get_selfQueryRetriever():
    ...  # setup omitted in the original (imports, metadata_field_info, etc.)
    selfqueryRetriever = SelfQueryRetriever.from_llm(
        get_llm(),                # LLM that translates the query into filters
        get_db(),                 # vector store to search
        "Subvenciones y ayudas",  # document-contents description ("Grants and subsidies")
        metadata_field_info,      # schema of the available metadata fields
        search_kwargs={'k': 15}   # return up to 15 documents
    )
    return selfqueryRetriever

The Self-Query Retriever uses a language model to reformulate the user query by generating structured metadata filters. It’s ideal when your documents contain rich, labeled metadata.

Here, metadata_field_info describes the metadata schema:

  • Destinatarios, Organismo, Sector, Subsector, etc.
  • Each has a name, description, and type (mostly strings).
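As a sketch of that schema (field names come from the list above; the descriptions are illustrative assumptions, and LangChain actually models each entry as an `AttributeInfo(name=..., description=..., type=...)` object, shown here as plain dicts so the snippet is self-contained):

```python
# Illustrative sketch of the schema behind metadata_field_info.
# LangChain's AttributeInfo carries the same three fields; plain dicts
# are used here so the example runs without dependencies.
metadata_field_info = [
    {"name": "Destinatarios",    "description": "Who the grant is aimed at",   "type": "string"},
    {"name": "Organismo",        "description": "Issuing public body",          "type": "string"},
    {"name": "Sector",           "description": "Economic sector of the grant", "type": "string"},
    {"name": "Subsector",        "description": "More specific sub-sector",     "type": "string"},
    {"name": "AmbitoGeografico", "description": "Geographic scope of the aid",  "type": "string"},
]

# The self-query step uses this schema to emit structured filters such as:
example_filter = {"Sector": "Tecnología", "AmbitoGeografico": "Andalucía"}
```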

The SelfQueryRetriever leverages this schema to interpret queries like:

«Ayudas disponibles para empresas tecnológicas en Andalucía» ("Grants available for technology companies in Andalucía")

And transform them into structured filters like:

{
  "Sector": "Tecnología",
  "AmbitoGeografico": "Andalucía"
}

This structured query then searches the vector store using both the text and metadata filters — making results far more precise.

✅ When to Use:

  • Your documents are well-tagged with metadata.
  • Queries often refer to attributes like dates, sectors, regions, etc.
  • You want better control over what’s retrieved.

2. MMR Retriever

def get_retriever():
    retriever = get_db().as_retriever(
        search_type="mmr",                           # Maximal Marginal Relevance
        search_kwargs={"k": 20, "lambda_mult": 0.0}  # 20 docs, maximum diversity
    )
    return retriever

The second retriever uses Maximal Marginal Relevance (MMR) — a technique that balances relevance and diversity in retrieval.

  • search_type="mmr" tells the retriever to prioritize documents that are both relevant and non-redundant.
  • lambda_mult=0.0 maximizes diversity: 0 gives full weight to the diversity term, while values closer to 1 weight relevance more heavily (1 is pure relevance ranking).

Unlike SelfQueryRetriever, MMR doesn’t rely on metadata. It’s purely vector-based, returning chunks most similar to the query textually.
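To make the relevance/diversity trade-off concrete, here is a minimal from-scratch MMR sketch in NumPy (not LangChain's internal implementation; the greedy score `lambda * relevance - (1 - lambda) * redundancy` follows the convention described above, where lambda_mult=0 maximizes diversity):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: score = lambda * relevance - (1 - lambda) * redundancy.
    lambda_mult=1 -> pure relevance; lambda_mult=0 -> maximum diversity."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy: similarity to the closest already-selected doc.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate vectors close to the query, plus one orthogonal outlier.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.0])
```

With lambda_mult=1.0 the two near-duplicates fill both slots; with lambda_mult=0.0 the second slot goes to the dissimilar document instead, which is exactly the behavior `search_kwargs={"lambda_mult": 0.0}` requests above.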

✅ When to Use:

  • You want to avoid redundant or overly similar results.
  • Your metadata is sparse or inconsistent.
  • You want to ensure broader coverage of possible answers.

3. Combining Retrievers with MergerRetriever

def get_mergeRetriever():
    mergeRetriever = MergerRetriever(
        # Combine both strategies: metadata-aware and diversity-aware
        retrievers=[get_selfQueryRetriever(), get_retriever()]
    )
    return mergeRetriever

Here’s where it gets powerful.

The MergerRetriever aggregates results from both retrievers — the precision of the metadata-aware SelfQueryRetriever, and the breadth of the MMR-based vector retriever.

This hybrid strategy ensures you get:

  • Precise hits when metadata is sufficient.
  • Fallback coverage when queries don’t map neatly to structured fields.
  • More robust answers in edge cases where one strategy might fail.

Think of it as a safety net: if the LLM can’t get everything it needs from metadata-filtered search, it can still rely on semantic similarity from the MMR retriever.
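The merging step itself is simple to sketch. The toy below is plain Python, not LangChain's MergerRetriever (which, roughly, interleaves documents from its retrievers in turn); it round-robins across the two result lists and drops duplicates while preserving first-seen order. The document names are invented for the example.

```python
def merge_results(*result_lists: list[str]) -> list[str]:
    """Round-robin merge of several retrievers' results, dropping
    duplicates while preserving first-seen order (a rough analogue
    of what a merging retriever does)."""
    merged: list[str] = []
    seen: set[str] = set()
    # Walk position by position across all lists (round-robin).
    for position in range(max(len(r) for r in result_lists)):
        for results in result_lists:
            if position < len(results) and results[position] not in seen:
                seen.add(results[position])
                merged.append(results[position])
    return merged

self_query_hits = ["doc_A", "doc_B"]    # precise, metadata-filtered
mmr_hits = ["doc_C", "doc_A", "doc_D"]  # diverse, similarity-based
```

Interleaving rather than concatenating keeps top results from both strategies near the front of the context window, so neither retriever's best hits get pushed out.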

✅ When to Use:

  • You’re building a production RAG system.
  • Your queries and data vary in structure.
  • You want the best of both retrieval strategies.

Final Thoughts

Retrievers are the unsung heroes of RAG pipelines. By combining Self-Query and MMR retrievers into a single hybrid retriever, you significantly increase the chances that your LLM gets the right context to generate accurate and complete answers.

Key Takeaways:

  • Use Self-Query Retriever when metadata is structured and rich.
  • Use MMR Retriever for semantic, metadata-free search with diverse results.
  • Combine both with MergerRetriever for a more resilient system.

This setup is especially useful in domains like government grants, legal texts, or enterprise knowledge bases, where structured metadata and unstructured text coexist.

If you’re building a real-world RAG application, hybrid retrieval isn’t just an option — it’s a best practice.

See the full code here: https://github.com/dorapps/RAG_Project

Tags: AI search pipeline, document retrieval, enterprise AI search, hybrid retrieval strategy, Hybrid Retriever, LangChain, LangChain retriever tutorial, LLM retrievers, Maximal Marginal Relevance, metadata filtering, MMR Retriever, NLP retrieval, OpenAI RAG, RAG, Retrieval-Augmented Generation, Self-Query Retriever, semantic search, structured query retriever, vector search

Post navigation

Previous post

Full project: RAG (Retrieval-Augmented Generation) II

Next post

Full Project RAG (Retrieval-Augmented Generation) – IV
