Writing a Pipeline Script
The fastest path is to copy scripts/nomic-chroma.py and adapt it for your stack — it's a complete working example. If you're building your own, this page shows the correct ingestion loop and the operational details you'll need. See Manifest Reference for field details.
The complete ingestion loop
import json, os, sys
# Manifest path: first CLI argument, then BRAISED_MANIFEST, then the default.
path = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("BRAISED_MANIFEST", "dist/manifest.jsonl")
chunks = []
delete_ids = []
with open(path, encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("_deleted"):
            delete_ids.append(record["id"])
        else:
            chunks.append(record)
# Delete before upserting: a changed page emits a deletion record for the old
# chunk AND a new chunk record, so order matters.
# delete_from_store and embed_and_upsert are placeholders for your own store code.
for old_id in delete_ids:
    delete_from_store(old_id)
for chunk in chunks:
    embed_and_upsert(chunk)
On the first build there are no deletion records — the same loop handles it without special-casing.
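To make the ordering concrete, here is a minimal sketch that runs the same split-then-apply logic against an in-memory dict standing in for a vector store. The names `split_manifest` and `apply_manifest`, the dict store, and the sample records are hypothetical, and it assumes chunk records carry an `id` field just as deletion records do.

```python
import json

def split_manifest(lines):
    """Separate manifest lines into chunk records and IDs to delete."""
    chunks, delete_ids = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("_deleted"):
            delete_ids.append(record["id"])
        else:
            chunks.append(record)
    return chunks, delete_ids

def apply_manifest(store, lines):
    """Apply deletions first, then upserts, to a dict keyed by chunk ID."""
    chunks, delete_ids = split_manifest(lines)
    for old_id in delete_ids:
        store.pop(old_id, None)  # tolerate IDs that were never stored
    for chunk in chunks:
        # Assumption: chunk records have an "id" field, like deletion records.
        store[chunk["id"]] = chunk["content"]
    return store

# Hypothetical sample: one stale chunk is deleted and a replacement is added.
sample = [
    '{"id": "guide-1", "_deleted": true}',
    '',
    '{"id": "guide-2", "content": "Updated text"}',
]
store = apply_manifest({"guide-1": "Old text"}, sample)
```

With no deletion records in the input, `delete_ids` is simply empty and the first loop does nothing, which is why a first build needs no special case.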
Reference implementation
scripts/nomic-chroma.py is a complete working example against Ollama and Chroma, using only the Python standard library (no pip install). It covers reading the manifest, embedding, upserting, deletion handling, and a smoke-test query. Run it against a manifest file:
python3 scripts/nomic-chroma.py dist/manifest.jsonl
Embedding
Pass chunk["content"] directly to your embedding model — it's plain text with breadcrumb context pre-baked. The field works with any embedding endpoint:
# OpenAI (the client reads OPENAI_API_KEY from the environment)
from openai import OpenAI
openai_client = OpenAI()
embedding = openai_client.embeddings.create(
    model="text-embedding-3-small", input=chunk["content"]
).data[0].embedding
# Ollama
import requests
embedding = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": chunk["content"]},
).json()["embedding"]
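If you would rather stay on the standard library, the Ollama call can also be made with `urllib` instead of `requests`. The sketch below is one way to do that and is not taken from scripts/nomic-chroma.py; the helper names `build_embedding_request` and `embed` are hypothetical, and the URL and model simply mirror the Ollama snippet above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # same endpoint as above

def build_embedding_request(text, model="nomic-embed-text"):
    """Build a POST request carrying the chunk text as the prompt."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(text, model="nomic-embed-text", timeout=30):
    """Send the request and return the embedding vector from the response."""
    request = build_embedding_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["embedding"]

# Building the request needs no running server; embed() does.
request = build_embedding_request("Guide > Install\nRun the installer.")
```

Splitting request construction from sending keeps the payload easy to inspect or unit-test without a local Ollama instance.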
Storing metadata for retrieval results
Store url, title, heading, and breadcrumb alongside your vectors so query results are usable:
metadatas.append({
    "url": chunk["url"],
    "title": chunk["title"],
    "heading": chunk["heading"],
    "breadcrumb": " > ".join(chunk["breadcrumb"]),
})
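The same mapping can be wrapped in a small helper so every chunk gets identical metadata keys. This is a sketch under assumptions: `chunk_metadata` and `format_hit` are hypothetical names, the sample chunk is invented, and the fallbacks for a missing `heading` or `breadcrumb` are a defensive guess rather than documented behavior. It assumes `breadcrumb` is a list of strings, as the join above implies.

```python
def chunk_metadata(chunk):
    """Return flat, string-valued metadata for one chunk record."""
    return {
        "url": chunk["url"],
        "title": chunk["title"],
        # Assumption: heading or breadcrumb may be absent, so fall back to empty.
        "heading": chunk.get("heading") or "",
        "breadcrumb": " > ".join(chunk.get("breadcrumb") or []),
    }

def format_hit(metadata):
    """Render one retrieval hit as a single display line."""
    label = metadata["breadcrumb"] or metadata["title"]
    return f"{label} ({metadata['url']})"

# Hypothetical chunk record for illustration only.
chunk = {
    "url": "/guide/install/",
    "title": "Install",
    "heading": "Linux",
    "breadcrumb": ["Guide", "Install", "Linux"],
    "content": "Run the installer.",
}
metadata = chunk_metadata(chunk)
line = format_hit(metadata)
```

Joining the breadcrumb into one string keeps every metadata value a scalar, which is the safest shape for stores that restrict metadata types.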
Using a webhook
If you're running a hosted ingest endpoint, use pipeline: http: instead of a script. Braised POSTs the manifest as application/x-ndjson — your endpoint receives the same stream a script would read from disk. See Configuration in the overview.
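On the receiving side, an endpoint only needs to split the request body on newlines and parse each non-empty line as JSON. The following is a rough sketch using the standard library; `parse_ndjson` and `ManifestHandler` are hypothetical names, and the handler assumes the POST carries a `Content-Length` header and that a 204 response is acceptable, neither of which is stated here.

```python
import json
from http.server import BaseHTTPRequestHandler

def parse_ndjson(body):
    """Decode an NDJSON payload (bytes) into records, skipping blank lines."""
    records = []
    for line in body.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

class ManifestHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver; assumes the request includes Content-Length."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        records = parse_ndjson(self.rfile.read(length))
        delete_ids = [r["id"] for r in records if r.get("_deleted")]
        chunks = [r for r in records if not r.get("_deleted")]
        # Hand delete_ids, then chunks, to your store here (deletions first).
        self.send_response(204)  # assumption: any success status is fine
        self.end_headers()

# Parsing can be exercised without starting a server.
payload = b'{"id": "a", "_deleted": true}\n\n{"id": "b", "content": "Hello"}\n'
records = parse_ndjson(payload)
```

To try the handler locally you could serve it with `http.server.HTTPServer(("", 8000), ManifestHandler).serve_forever()`; the port is arbitrary.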
.braised/ and CI state
Braised stores source-file hashes in .braised/index-state.json to drive incremental manifests. Add .braised/ to .gitignore and cache it in CI to preserve incremental builds between runs. See Build Outputs for details.
Pages excluded after initial indexing
If a page gains llm_exclude: true after it was previously indexed, Braised emits deletion records for all of its old chunk IDs on the next build. The same deletion loop handles them automatically.