How we ship Phase-4 reports for $0.14
A line-item breakdown of what an Opus 4.7 feasibility report actually costs in production — and the prompt-cache trick that drops the bill from $1.20 to fourteen cents.
Hackathon demos die at the cost-per-call moment. You build something that runs once on stage, then someone runs the math on a 1,000-user week and the model bill is $4,200. The platform never ships, the GitHub repo lasts six months, the world keeps turning.
RESILAND Intelligence ships Phase-4 feasibility reports for $0.14 each. Here is how.
The line items, before optimisation
A single 9-section feasibility report calls Opus 4.7 with roughly:
- Grounding context, ~35,000 tokens. The RESILAND program brief, the nine reference Phase-4 reports (relevant excerpts), the cadastral schema, the species catalogue, the framework alignment table.
- Per-parcel context, ~8,000 tokens. The cadastral row, neighbours, NDVI 12-month series, climate snapshot, intersected nurseries.
- Output, ~6,000 tokens. Nine markdown sections with citations.
At Opus 4.7 input pricing of $15 per million tokens and output at $75 per million:
- Input: 43,000 × $15/M = $0.65
- Output: 6,000 × $75/M = $0.45
- Specialist sub-calls (S1+S2+S3 + S4 composer): $0.10
- Total: ~$1.20 per report
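The line items above reduce to a few lines of arithmetic (token counts and rates taken straight from the list):

```python
# Per-report cost before caching, using the line items above.
IN_RATE = 15 / 1_000_000   # Opus 4.7 input, $/token
OUT_RATE = 75 / 1_000_000  # Opus 4.7 output, $/token

grounding = 35_000    # shared context: brief, reference reports, schemas
per_parcel = 8_000    # cadastral row, NDVI series, climate snapshot
output = 6_000        # nine markdown sections with citations
specialists = 0.10    # S1+S2+S3 + S4 composer, measured

input_cost = (grounding + per_parcel) * IN_RATE   # 43K tokens, ≈ $0.65
output_cost = output * OUT_RATE                   # $0.45
total = input_cost + output_cost + specialists    # ≈ $1.20
```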
That is feasible for a paying customer, but it kills the demo allowance. So we cached.
What prompt caching saves
Anthropic's prompt cache lets us mark the 35K-token grounding context as a reusable prefix. The first call writes the cache, billed at a small premium over the base input rate. Every subsequent call within the five-minute TTL reads those tokens at roughly 10% of the base input price — a 90% discount on the dominant slice of the bill.
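Marking the prefix is a one-line change to the request. A minimal sketch, following the standard Anthropic Messages API shape — the model id, `GROUNDING` placeholder, and `build_request` helper are illustrative, not the production code:

```python
# Sketch: mark the grounding context as a cacheable prefix.
# GROUNDING stands in for the real ~35K-token RESILAND assets;
# the model id is a placeholder.
GROUNDING = "…program brief, reference reports, schemas…"

def build_request(parcel_context: str, question: str) -> dict:
    return {
        "model": "claude-opus-x",   # placeholder id
        "max_tokens": 8192,
        "system": [
            {
                "type": "text",
                "text": GROUNDING,
                # Everything up to and including this block is cached;
                # later calls in the TTL window read it at ~10% of base.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {"role": "user", "content": f"{parcel_context}\n\n{question}"}
        ],
    }

req = build_request("cadastral row…", "Draft section 1.")
```

The per-parcel context and the question go in the message body, after the cached prefix, so only they are billed at the full input rate.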
Recompute the line items with caching:
- Cached input (35K × $1.50/M, i.e. 10% of the $15/M base rate): $0.05
- Fresh input (8K × $15/M): $0.12
- Output (if drafted end-to-end on Opus): $0.45
- In practice the specialist sub-calls share the same cached prefix and per-section drafting runs on Sonnet 4.6, so the measured cost lands at ~$0.14 per report.
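The recomputed input side, in the same style as before — the cache-read multiplier is the ~10% rate described above:

```python
# Per-report input cost after caching: the 35K grounding prefix bills
# at ~10% of the base rate; the 8K per-parcel context stays full price.
BASE = 15 / 1_000_000      # $/input token
CACHED = BASE * 0.10       # cache-read rate

grounding_cost = 35_000 * CACHED   # ≈ $0.05 (was ≈ $0.53)
fresh_cost = 8_000 * BASE          # $0.12
input_total = grounding_cost + fresh_cost   # ≈ $0.17 vs. $0.65 uncached
```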
It is the same model, the same prompts, the same nine sections — it is just billed under a different rule. We did not change a single line of the cascade logic.
Cascade routing — the second lever
Not every request is a Phase-4 draft. Most are conversational ("how many parcels in Surxondaryo?", "translate this section"), and Opus 4.7 is wasteful for those. The cascade looks like this:
- Haiku 4.5 — ~90% of conversational turns. $0.001 per query, replies in under a second.
- Sonnet 4.6 — ~8% of turns: translation, mid-size synthesis, per-section drafting. $0.02 per query.
- Opus 4.7 — ~2% of turns: deep cross-document reasoning, Vision, full report drafting. $0.14–$0.40 per query.
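The routing decision itself is small. A sketch with a crude keyword classifier standing in for the real one — tier names and model ids here are hypothetical placeholders, not the production identifiers:

```python
# Sketch of the cascade: route each turn to the cheapest adequate model.
def classify(turn: str) -> str:
    t = turn.lower()
    if "draft" in t and "report" in t:
        return "deep"    # full Phase-4 drafting, cross-document reasoning
    if "translate" in t or "section" in t:
        return "mid"     # translation, per-section synthesis
    return "light"       # lookups, conversational turns

MODEL_FOR_TIER = {
    "light": "claude-haiku-x",   # ~90% of turns, ~$0.001/query
    "mid": "claude-sonnet-x",    # ~8% of turns, ~$0.02/query
    "deep": "claude-opus-x",     # ~2% of turns, $0.14–$0.40/query
}

def route(turn: str) -> str:
    return MODEL_FOR_TIER[classify(turn)]
```

In production the classifier would be a model call or an intent rule set; the point is that the expensive tier is only reachable through an explicit decision, never by default.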
Median session cost across the demo allowance: $0.18. The 95th-percentile session (someone who drafts three reports and translates them all): $1.40.
Cost guardrails — the third lever
Two daily limits live in the settings:
- $50 alert — a webhook fires when the running daily total crosses $50. Useful for spotting runaway loops in early dev.
- $100 hard-stop — new agent invocations are refused above this. The platform stays up but stops drafting until the next UTC midnight reset.
Plus a per-user demo allowance of 10,000 tokens per day with a 30-day TTL — enough for roughly five lookups and one report draft: plenty for evaluation, not enough for abuse.
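The two daily limits can be sketched as a small guard object. The in-memory counter, `BudgetGuard` name, and commented-out webhook hook are illustrative — production keeps the running total in Postgres and resets it at UTC midnight:

```python
# Sketch of the daily cost guardrails: alert webhook at $50,
# hard stop on new agent invocations at $100.
ALERT_USD = 50.0
HARD_STOP_USD = 100.0

class BudgetGuard:
    def __init__(self) -> None:
        self.spent_today = 0.0
        self.alerted = False

    def record(self, cost_usd: float) -> None:
        """Add a call's cost; fire the alert once past $50."""
        self.spent_today += cost_usd
        if self.spent_today > ALERT_USD and not self.alerted:
            self.alerted = True
            # fire_webhook(self.spent_today)  # hypothetical alert hook

    def allow_invocation(self) -> bool:
        """Above the hard stop, refuse new agent calls; platform stays up."""
        return self.spent_today < HARD_STOP_USD

guard = BudgetGuard()
```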
What you do not pay for
- Tile rendering. Carto, MapTiler, ESRI World Imagery — all free tiers.
- Sentinel-2. Open data, served via STAC.
- NASA POWER, ERA5, CHIRPS. Public climate datasets.
- Voyage-3 embeddings for the RAG. Free tier covers 200M tokens/month — far past our needs.
The platform's marginal cost per user is the Anthropic bill plus a few cents of Postgres + MinIO storage. We can leave it running on resilland.com effectively forever, on a single $40/mo VPS, even if the entire Cerebral Valley jury decides to stress-test it.
The takeaway
Caching is not optional. If you are building on Opus 4.7 and you are not caching the dominant prefix, you are leaving 8/9 of the bill on the table. Cascade routing is not optional either — the market for "Opus 4.7 answering 'hi'" is small. The math only works if you put the cheap models in the path of the cheap questions, the smart model in the path of the smart questions, and let the prompt cache eat the repetition.