Pinecone-only vector backend (DEPRECATED)

✗ Deprecated · LLM Ingestion Pipeline · 02 — Dataset → RAG System (sub-part: Retrieval Pipeline)
By AI-DE Engineering Team · Stakeholders: retrieval engineer, infra reviewer, finance

Context (when this was Accepted)

The v1 design called for Pinecone-only as the vector backend. Pinecone's serverless API was the fastest path to a working RAG demo: zero ops, hosted-everywhere, a Python client that just works. The original rag/pipeline.py looked like:

# v1 (deprecated)
import os

import pinecone

class RAGPipeline:
    def __init__(self):
        pinecone.init(api_key=os.environ["PINECONE_API_KEY"])
        self.index = pinecone.Index("ingestion-pipeline")

    def upsert(self, doc_id, embedding, metadata): ...
    def query(self, embedding, top_k=10): ...

The implicit assumption was: Pinecone is the production answer. Tutorial learners would sign up for a Pinecone account, get an API key, and run the RAG demo with managed everything. This worked fine for Module 02's first iteration.

What changed (and why we reversed)

Three things forced the reversal:

  1. Tutorial reproducibility broke. A learner without a Pinecone account couldn't run Module 02 end-to-end. Pinecone's free tier exists, but it requires email signup and API key generation and comes with usage limits: three points of friction that don't belong in a tutorial. The "first 15 minutes to a working demo" bar was missed.

  2. Module 09 (LLMOps) ran into vendor lock-in. The Airflow ingest_dag.py upserted to Pinecone on every weekly run; the eval DAG queried Pinecone on every daily run. Both went through the external API, so a Pinecone outage would halt the whole pipeline, and the on-call runbook had no recovery path.

  3. The cost model in Module 05 couldn't be defended. Pinecone's per-pod-hour pricing at 1M-vector scale is ~$80/mo for an s1.x1 pod; pgvector on db.t4g.medium is $98/mo all-in. At 1M vectors the cost is comparable, but pgvector wins decisively at <100K vectors (free, riding on existing Postgres) and at >10M vectors (vertical scaling of RDS is cheaper than horizontal Pinecone pods in this size range). The CFO defense for "always Pinecone" was weak in two of three regimes.
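As a quick check on the "comparable at 1M vectors" claim in item 3, the arithmetic can be sketched as follows. The two monthly figures are the ones quoted above; the helper name `relative_gap` is hypothetical and exists only for this illustration. No figures are assumed for the <100K or >10M regimes, since this ADR does not quote any.

```python
# Monthly figures quoted in this ADR for the ~1M-vector regime (USD).
PINECONE_S1_X1_MONTHLY = 80.0        # approximate ("~$80/mo"), per the ADR
PGVECTOR_T4G_MEDIUM_MONTHLY = 98.0   # "all-in", per the ADR


def relative_gap(a: float, b: float) -> tuple[float, float]:
    """Return (absolute gap, gap as a fraction of the cheaper option)."""
    cheaper, pricier = sorted((a, b))
    gap = pricier - cheaper
    return gap, gap / cheaper


gap, fraction = relative_gap(PINECONE_S1_X1_MONTHLY, PGVECTOR_T4G_MEDIUM_MONTHLY)
print(f"gap: ${gap:.0f}/mo ({fraction:.1%} of the cheaper option)")
```

At this scale the difference is small in absolute terms, which is why the 1M-vector regime alone did not decide the question.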

The fix landed in Module 02:

  • Introduce a RetrievalBackend Protocol with upsert() and query() methods.
  • Implement two backends: rag/pgvector_client.py (self-host) and rag/pinecone_client.py (managed).
  • Module 02's rag/pipeline.py accepts a backend instance via dependency injection (DI).
  • The tutorial path defaults to pgvector (zero account signup); Tier-2 documents the Pinecone swap (single-line change).

# Post-reversal
from typing import Protocol

class RetrievalBackend(Protocol):
    def upsert(self, doc_id: str, embedding: list[float], metadata: dict) -> None: ...
    def query(self, embedding: list[float], top_k: int = 10) -> list[dict]: ...

# rag/pgvector_client.py implements RetrievalBackend
# rag/pinecone_client.py implements RetrievalBackend
# rag/pipeline.py takes a RetrievalBackend in __init__

The evidence is in the repository: two parallel client files ship in the starter kit, both implementing the same Protocol contract. Future readers asking "why two clients?" find this ADR.
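To make the dependency-injection shape concrete, here is a minimal, self-contained sketch. It does not reproduce the real rag/pgvector_client.py or rag/pinecone_client.py; `InMemoryBackend` is a hypothetical stand-in, and the `index`/`retrieve` method names and the result keys (`doc_id`, `score`, `metadata`) are assumptions made for illustration.

```python
import math
from typing import Protocol


class RetrievalBackend(Protocol):
    def upsert(self, doc_id: str, embedding: list[float], metadata: dict) -> None: ...
    def query(self, embedding: list[float], top_k: int = 10) -> list[dict]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryBackend:
    """Hypothetical stand-in backend (not part of the starter kit)."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, doc_id: str, embedding: list[float], metadata: dict) -> None:
        self._rows[doc_id] = (list(embedding), dict(metadata))

    def query(self, embedding: list[float], top_k: int = 10) -> list[dict]:
        # Assumed result shape: one dict per hit, best score first.
        hits = [
            {"doc_id": doc_id, "score": _cosine(embedding, vec), "metadata": meta}
            for doc_id, (vec, meta) in self._rows.items()
        ]
        hits.sort(key=lambda h: h["score"], reverse=True)
        return hits[:top_k]


class RAGPipeline:
    """Depends only on the Protocol, never on a concrete vendor client."""

    def __init__(self, backend: RetrievalBackend) -> None:
        self.backend = backend

    def index(self, doc_id: str, embedding: list[float], metadata: dict) -> None:
        self.backend.upsert(doc_id, embedding, metadata)

    def retrieve(self, embedding: list[float], top_k: int = 10) -> list[dict]:
        return self.backend.query(embedding, top_k=top_k)


pipeline = RAGPipeline(backend=InMemoryBackend())
pipeline.index("a", [1.0, 0.0], {"title": "A"})
pipeline.index("b", [0.0, 1.0], {"title": "B"})
hits = pipeline.retrieve([0.9, 0.1], top_k=1)
print(hits[0]["doc_id"])
```

Because `Protocol` uses structural typing, `InMemoryBackend` never has to inherit from `RetrievalBackend`; any class with matching `upsert()` and `query()` methods can be passed to the pipeline, which is what keeps a backend swap confined to the construction site.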

Why we left this ADR Deprecated rather than deleting it

A future maintainer will look at rag/pgvector_client.py and rag/pinecone_client.py and wonder why both ship together. The interesting question — why didn't we ship just one? — is answered by this ADR.

The MADR convention treats Deprecated ADRs as part of the permanent record. We follow that convention.

What we got wrong (and what we'd do again)

Got wrong:

  • We treated the vector backend as immutable infrastructure when it isn't. Vector-DB choice is an active engineering tradeoff that shifts every 6-12 months as new options ship (pgvector matured; Qdrant got managed; Pinecone changed pricing).
  • We coupled the tutorial path to a vendor account. Anything that keeps "git clone + make up + first demo" from finishing within 15 minutes is a reproducibility regression.
  • We didn't separate the retrieval interface from the retrieval backend in v1. Adding the Protocol after the fact required rewriting rag/pipeline.py to accept its backend through dependency injection.

Got right:

  • The data shape (embedding, metadata, top_k, score) is identical across backends. Adding the second backend was a 200-line exercise, not a redesign.
  • Pinecone stayed in the kit as a documented alternative. We didn't delete it; we demoted it. Teams that want managed everything can flip one flag.
  • The eval harness in Module 04 is backend-agnostic — it tests the retrieval contract, not a specific backend.
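A backend-agnostic check of the retrieval contract might look like the sketch below. This is not the Module 04 harness: `check_retrieval_contract` and `ToyBackend` are hypothetical names, and the sketch assumes each hit is a dict with `doc_id` and `score` keys where a higher score means a closer match. A distance-based backend would need the ordering assertion flipped.

```python
from typing import Any


def check_retrieval_contract(backend: Any) -> None:
    """Raise AssertionError if the backend violates the assumed contract."""
    backend.upsert("doc-1", [1.0, 0.0, 0.0], {"source": "test"})
    backend.upsert("doc-2", [0.0, 1.0, 0.0], {"source": "test"})
    backend.upsert("doc-3", [0.0, 0.0, 1.0], {"source": "test"})

    hits = backend.query([1.0, 0.0, 0.0], top_k=2)

    assert isinstance(hits, list), "query() must return a list"
    assert 0 < len(hits) <= 2, "top_k must cap the number of hits"
    assert hits[0]["doc_id"] == "doc-1", "exact match should rank first"
    scores = [hit["score"] for hit in hits]
    assert scores == sorted(scores, reverse=True), "hits must be best-first"


class ToyBackend:
    """Hypothetical dot-product backend used only to exercise the check."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, doc_id: str, embedding: list[float], metadata: dict) -> None:
        self.rows[doc_id] = (embedding, metadata)

    def query(self, embedding: list[float], top_k: int = 10) -> list[dict]:
        hits = [
            {
                "doc_id": doc_id,
                "score": sum(x * y for x, y in zip(embedding, vec)),
                "metadata": meta,
            }
            for doc_id, (vec, meta) in self.rows.items()
        ]
        hits.sort(key=lambda h: h["score"], reverse=True)
        return hits[:top_k]


check_retrieval_contract(ToyBackend())
print("contract ok")
```

The same function could in principle be pointed at either real client, since it only touches `upsert()` and `query()`.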

When (if ever) to revisit

A future ADR could simplify back to a single backend if both of these are true:

  1. The team has settled on one backend for >12 months with no migrations attempted or planned.
  2. The cost-model CSV no longer documents a regime where the alternative wins.

Until then, the dual-backend Protocol stays.

References

  • rag/pgvector_client.py (the chosen Tier-1 default)
  • rag/pinecone_client.py (the alternative; same Protocol)
  • rag/pgvector_setup.sql (DDL for pgvector extension)
  • rag/pipeline.py (DI consumer of RetrievalBackend)
  • dags/ingest_dag.py (uses the configured backend; not vendor-locked)
  • ADR-001 (aiohttp crawler — produces input independent of vector backend)
  • ADR-002 (MinHash dedup — happens before embedding; orthogonal)
  • ADR-003 (tokenization — produces input to the embed step that writes to the backend)