RAGWire Providers and Components

Swap between Ollama, OpenAI, Gemini, Groq, and HuggingFace with a single config change. Use Qdrant Cloud with MMR retrieval.

Jun 18, 2026 · 11 min read

Topics You Will Master

Switching between Ollama, OpenAI, Gemini, Groq, and HuggingFace providers by changing one YAML file
Connecting RAGWire to Qdrant Cloud for production vector storage
Using MMR retrieval to reduce redundancy in search results
Building a filter-aware agent on any provider with Qdrant Cloud

RAGWire decouples provider choice from pipeline logic — switching from Ollama to OpenAI to Gemini to Groq requires only a config file change. This article walks through every supported provider configuration, then demonstrates a production setup with Qdrant Cloud and MMR retrieval.

Before starting, complete the RAGWire Architecture and Setup article to have a working local pipeline.

RAGWire with All Providers

Switch providers by changing the config file. All tools, agents, and retrieval logic stay identical — only the YAML changes.

| Config             | Embedding                           | LLM                       |
|--------------------|-------------------------------------|---------------------------|
| config_ollama.yaml | qwen3-embedding:0.6b (Ollama)       | qwen3.5:9b (Ollama)       |
| config_gemini.yaml | gemini-embedding-001 (Google)       | gemini-2.5-flash (Google) |
| config_openai.yaml | text-embedding-3-small (OpenAI)     | gpt-5.4-nano (OpenAI)     |
| config_groq.yaml   | all-MiniLM-L6-v2 (HuggingFace)      | qwen/qwen3-32b (Groq)     |
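
One way to make the switch explicit in code is to map a provider name to its config file. The sketch below is illustrative only: the `CONFIG_FILES` mapping, the `config_for` helper, and the `RAG_PROVIDER` environment variable are hypothetical names, not part of RAGWire. Only the file names come from the table above.

```python
import os

# File names come from the table above; the mapping itself is a hypothetical helper.
CONFIG_FILES = {
    "ollama": "config_ollama.yaml",
    "gemini": "config_gemini.yaml",
    "openai": "config_openai.yaml",
    "groq": "config_groq.yaml",
}


def config_for(provider: str) -> str:
    """Return the config file for a provider name, failing loudly on typos."""
    key = provider.strip().lower()
    if key not in CONFIG_FILES:
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of {sorted(CONFIG_FILES)}"
        )
    return CONFIG_FILES[key]


# RAG_PROVIDER is an assumed environment variable; default to the local Ollama setup.
config_path = config_for(os.environ.get("RAG_PROVIDER", "ollama"))
print(config_path)
```

The resulting path can then be passed to RAGWire(config_path) exactly as in the provider sections that follow.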

Setup

PYTHON
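from dotenv import load_dotenv
load_dotenv()  # load API keys from .env before the providers are configured

from ragwire import RAGWire, setup_logging
import ragwire

logger = setup_logging(log_level="INFO")
print(ragwire.__version__)  # confirm which RAGWire version is installed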
OUTPUT
1.2.7

Ollama (Local)

YAML
# config_ollama.yaml
embeddings:
  provider: "ollama"
  model: "qwen3-embedding:0.6b"
  base_url: "http://localhost:11434"

llm:
  provider: "ollama"
  model: "qwen3.5:9b"
  base_url: "http://localhost:11434"
  num_ctx: 16384

vectorstore:
  url: "http://localhost:6333"
  collection_name: "finance-rag-ollama"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false

metadata:
  config_file: "finance_metadata.yaml"
PYTHON
rag = RAGWire('config_ollama.yaml')
stats = rag.ingest_directory('../data/finance_data')

OpenAI

YAML
# config_openai.yaml
embeddings:
  provider: "openai"
  model: "text-embedding-3-small"
  api_key: "${OPENAI_API_KEY}"

llm:
  provider: "openai"
  model: "gpt-5.4-nano"
  api_key: "${OPENAI_API_KEY}"

vectorstore:
  url: "http://localhost:6333"
  collection_name: "finance-rag-openai"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: true

metadata:
  config_file: "finance_metadata.yaml"

logging:
  level: "INFO"
  console_output: true
  colored: false
  log_file: "./.log/ragwire.log"
PYTHON
rag_openai = RAGWire('config_openai.yaml')
stats = rag_openai.ingest_directory('../data/finance_data')
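
The config above reads its key from a `${OPENAI_API_KEY}` placeholder rather than hard-coding it. How RAGWire resolves these placeholders internally is not shown in this article. The sketch below is a generic illustration of the idea, with `expand_env` as a hypothetical helper that fails early when a variable is missing.

```python
import os
import re

# Matches ${NAME} placeholders such as ${OPENAI_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace ${NAME} with env[NAME], raising if a variable is missing."""
    env = os.environ if env is None else env

    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"Environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_sub, value)


# An explicit mapping is used here so the snippet runs without a real key.
print(expand_env("${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-demo"}))  # prints sk-demo
```

Failing on a missing variable is a deliberate choice in this sketch: an unset key surfaces at load time instead of as an authentication error on the first API call.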

Test retrieval:

PYTHON
rag_openai.retrieve("what is apple revenue in 2025?")

Note

Each provider creates its own Qdrant collection with a different name. Embeddings from different models are not interchangeable — you must re-ingest when switching embedding providers.
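
The most visible symptom of mixing embedding models is a dimension mismatch. The sketch below uses toy vectors in place of real embeddings. The sizes are the published defaults for all-MiniLM-L6-v2 (384) and text-embedding-3-small (1536), and `cosine_similarity` is a hypothetical helper written for this example, not a RAGWire function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; refuses vectors of different dimensionality."""
    if len(a) != len(b):
        raise ValueError(f"Dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for real embeddings.
query_vec = [0.1] * 384    # all-MiniLM-L6-v2 produces 384-dimensional vectors
stored_vec = [0.1] * 1536  # text-embedding-3-small defaults to 1536 dimensions

try:
    cosine_similarity(query_vec, stored_vec)
except ValueError as exc:
    print(exc)  # Dimension mismatch: 384 vs 1536
```

Even when two models happen to share a dimension, their vector spaces are unrelated, so the re-ingest rule still applies.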

Groq

Groq provides fast inference for supported models. This config pairs Groq's LLM with HuggingFace embeddings:

YAML
# config_groq.yaml
embeddings:
  provider: "huggingface"
  model_name: "sentence-transformers/all-MiniLM-L6-v2"

llm:
  provider: "groq"
  model: "qwen/qwen3-32b"
  api_key: "${GROQ_API_KEY}"

vectorstore:
  url: "http://localhost:6333"
  collection_name: "finance-rag-groq"
  use_sparse: true
  force_recreate: true

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: true

metadata:
  config_file: "finance_metadata.yaml"

logging:
  level: "INFO"
  console_output: true
  colored: false
  log_file: "./.log/ragwire.log"

Note

HuggingFace embeddings require an additional dependency: pip install langchain-huggingface

PYTHON
load_dotenv(override=True)

rag_groq = RAGWire('config_groq.yaml')
rag_groq.ingest_documents([r'..\data\finance_data\amazon 10-k 2024.pdf'])
OUTPUT
{'total': 1, 'processed': 1, 'skipped': 0, 'failed': 0, 'chunks_created': 39, 'errors': []}

On Linux/macOS, use forward slashes instead: ../data/finance_data/amazon 10-k 2024.pdf (Python accepts this form on Windows as well).
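
To avoid maintaining two spellings of the same path, it can be built with pathlib. This is a minimal sketch using only the standard library and the file name from the example above.

```python
from pathlib import Path

# Build the path from parts so the same code runs on Windows, Linux, and macOS.
pdf_path = Path("..") / "data" / "finance_data" / "amazon 10-k 2024.pdf"

print(pdf_path.name)        # amazon 10-k 2024.pdf
print(pdf_path.as_posix())  # ../data/finance_data/amazon 10-k 2024.pdf
```

Whether ingest_documents accepts Path objects is not stated here, so passing str(pdf_path) is the conservative choice.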

Gemini

YAML
# config_gemini.yaml
embeddings:
  provider: "google"
  model: "models/gemini-embedding-001"
  api_key: "${GOOGLE_API_KEY}"

llm:
  provider: "google"
  model: "gemini-2.5-flash"
  api_key: "${GOOGLE_API_KEY}"

vectorstore:
  url: "http://localhost:6333"
  collection_name: "finance-rag-google"
  use_sparse: true
  force_recreate: true

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: true

metadata:
  config_file: "finance_metadata.yaml"

logging:
  level: "INFO"
  console_output: true
  colored: false
  log_file: "./.log/ragwire.log"
PYTHON
load_dotenv(override=True)

rag_gemini = RAGWire('config_gemini.yaml')
rag_gemini.ingest_documents([r'..\data\finance_data\amazon 10-k 2024.pdf'])
OUTPUT
{'total': 1, 'processed': 1, 'skipped': 0, 'failed': 0, 'chunks_created': 39, 'errors': []}

Tip

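Use load_dotenv(override=True) when switching between providers in the same notebook session. By default, load_dotenv leaves variables that are already set in the environment untouched; override=True makes the values in .env replace them, so newly added or edited keys take effect.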
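
The effect of override can be illustrated without touching a real .env file. The function below is a simplified stand-in that mimics python-dotenv's documented behaviour, where existing variables are kept unless override=True. `apply_env` and `DEMO_API_KEY` are hypothetical names used only for this sketch.

```python
import os


def apply_env(values: dict[str, str], override: bool = False) -> None:
    """Copy values into os.environ, keeping existing variables unless override is set."""
    for name, value in values.items():
        if override or name not in os.environ:
            os.environ[name] = value


os.environ["DEMO_API_KEY"] = "old-key"       # value left over from earlier in the session
dotenv_values = {"DEMO_API_KEY": "new-key"}  # what an edited .env would now contain

apply_env(dotenv_values)                 # default: the stale value survives
print(os.environ["DEMO_API_KEY"])        # old-key

apply_env(dotenv_values, override=True)  # override: the .env value wins
print(os.environ["DEMO_API_KEY"])        # new-key
```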

Qdrant Cloud with MMR Retrieval

For production, replace the local Qdrant instance with Qdrant Cloud.

Two things change in the config:

  1. vectorstore.url points to your Qdrant Cloud cluster URL
  2. vectorstore.api_key authenticates with your Qdrant API key

Everything else — ingest, retrieve, agent — is identical.

YAML
# config_gemini_qdrant.yaml
embeddings:
  provider: "google"
  model: "models/gemini-embedding-001"
  api_key: "${GOOGLE_API_KEY}"

llm:
  provider: "google"
  model: "gemini-2.5-flash"
  api_key: "${GOOGLE_API_KEY}"

vectorstore:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection_name: "finance-rag-google-qdrant"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false

metadata:
  config_file: "finance_metadata.yaml"

logging:
  level: "INFO"
  console_output: true
  colored: false
  log_file: "./.log/ragwire.log"

Set QDRANT_URL and QDRANT_API_KEY in your .env file:

BASH
QDRANT_URL=https://your-cluster-id.cloud.qdrant.io:6333
QDRANT_API_KEY=your_qdrant_api_key
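
The retriever in the config above uses search_type: "hybrid"; an MMR setting itself is not shown. As background, maximal marginal relevance (MMR) picks results one at a time, trading relevance to the query against similarity to results already chosen. The sketch below is a library-independent illustration on toy vectors, not RAGWire's or Qdrant's implementation. The `mmr` function and its `lambda_mult` parameter are names chosen for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favour diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Penalise similarity to the closest already-selected result.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.1],   # 0: highly relevant
    [1.0, 0.12],  # 1: near-duplicate of 0
    [0.7, -0.7],  # 2: less relevant, but covers a different direction
]
print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr(query, docs, k=2, lambda_mult=0.5))  # [0, 2]
```

With pure relevance ranking the near-duplicate takes the second slot; with the diversity penalty it is displaced by the less similar document, which is the redundancy reduction MMR is used for.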

Ingest to Qdrant Cloud

PYTHON
load_dotenv(override=True)

rag_qdrant = RAGWire("config_gemini_qdrant.yaml")
rag_qdrant.ingest_directory("../data/finance_data")
OUTPUT
{'total': 6, 'processed': 6, 'skipped': 0, 'failed': 0, 'chunks_created': 260, 'errors': []}
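
The returned dictionary is worth checking before moving on, since a partial ingest leaves gaps in the collection. The helper below is a hypothetical sketch, not a RAGWire function. It relies only on the keys visible in the output above.

```python
def check_ingest(stats: dict) -> str:
    """Summarise an ingest result and raise if any document failed."""
    if stats["failed"] or stats["errors"]:
        raise RuntimeError(f"{stats['failed']} document(s) failed: {stats['errors']}")
    return (
        f"{stats['processed']}/{stats['total']} documents processed, "
        f"{stats['skipped']} skipped, {stats['chunks_created']} chunks created"
    )


# The dictionary printed above, copied in so the snippet runs on its own.
stats = {'total': 6, 'processed': 6, 'skipped': 0, 'failed': 0, 'chunks_created': 260, 'errors': []}
print(check_ingest(stats))  # 6/6 documents processed, 0 skipped, 260 chunks created
```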

Build a Filter-Aware Agent on Qdrant Cloud

The agent code is identical regardless of provider — the same two tools (get_filter_context and search_documents) work with any backend:

PYTHON
from langchain.agents import create_agent
from langchain.tools import tool
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.checkpoint.memory import InMemorySaver

@tool
def get_filter_context(query: str) -> str:
    """Get available metadata fields, stored values, and filter suggestions for a query.

    Call this before search_documents when the query involves a specific company,
    year, or document type. Skip for purely semantic queries.
    """
    return rag_qdrant.get_filter_context(query)

@tool
def search_documents(query: str, filters=None):
    """Search the document knowledge base for relevant information.

    Args:
        query: The search query
        filters: Optional metadata filters from get_filter_context.
    """
    results = rag_qdrant.retrieve(query=query, filters=filters)
    if not results:
        return "No relevant information found."
    return results

agent = create_agent(
    model=ChatGoogleGenerativeAI(model="gemini-2.5-flash"),
    tools=[get_filter_context, search_documents],
    system_prompt=(
        "You are a helpful financial document assistant. "
        "For complex questions, break them down into simple sub-questions. "
        "Always use search_documents to retrieve information — never answer from general knowledge. "
        "Use get_filter_context before search_documents when the query involves specific metadata. "
        "Always cite the source document in your answer."
    ),
    checkpointer=InMemorySaver(),
)
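
The system prompt asks the agent to cite its sources, which is easier when the tool output carries the source alongside each chunk. The sketch below assumes that retrieve returns LangChain-style documents with page_content and metadata attributes, and that the metadata holds a "source" key. Neither is confirmed in this article, so `FakeDocument` and `format_results` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in mirroring the page_content/metadata shape of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_results(docs) -> str:
    """Render retrieved chunks as numbered text blocks with their source."""
    if not docs:
        return "No relevant information found."
    blocks = []
    for i, doc in enumerate(docs, start=1):
        # The "source" metadata key is an assumption; adjust to the real schema.
        source = doc.metadata.get("source", "unknown source")
        blocks.append(f"[{i}] {source}\n{doc.page_content}")
    return "\n\n".join(blocks)


docs = [FakeDocument("Placeholder chunk text.", {"source": "amazon 10-k 2024.pdf"})]
print(format_results(docs))
```

Returning a single string from the tool also keeps its return type consistent between the empty and non-empty cases.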

Interactive Q&A

PYTHON
config = {"configurable": {"thread_id": "demo"}}

print("\nRAG Agent ready. Type 'quit' to exit.\n")

while True:
    question = input("You: ").strip()
    if question.lower() in ("quit", "exit", "q"):
        break
    if not question:
        continue
    response = agent.invoke(
        {"messages": [HumanMessage(question)]},
        config=config,
    )
    print(f"\nAgent: {response['messages'][-1].text}\n")

Important

The Qdrant Cloud collection built in this article is the same collection used by the FastAPI backend in later articles. You do not need to re-ingest — the backend connects to the same finance-rag-google-qdrant collection.
