What we cite — the reference base behind every verdict
Every Phase-4 verdict the platform produces traces back to nine real World Bank reference reports plus a 125-document RAG. No scraping, no synthetic data, no proprietary feeds.
One of the easiest ways for an LLM-powered tool to lose credibility is to confidently cite a source that does not exist. We took the long route on this — every claim Opus 4.7 surfaces is anchored to a document on disk that you can open, read, and disagree with.
The nine teacher documents
The Phase-4 feasibility reports we used as the teacher signal:
- RESILAND CA+ Phase-4 Feasibility Study — Andijan / Khojaobod district (2023).
- RESILAND CA+ Phase-4 — Surxondaryo / Sherobod (2023).
- RESILAND CA+ Phase-4 — Namangan / Mingbuloq (2024).
- RESILAND CA+ Phase-4 — Khorezm / Yangibozor (2024).
- RESILAND CA+ Phase-4 — Karakalpakstan / Kungrad (2024).
- WB Restoration Opportunities Atlas — Central Asia chapter (2022).
- FAO Dryland Restoration Guidelines — version 3 (2021).
- Uzbekistan-2030 Strategy Alignment — forestry section.
- Türkiye OGM Dryland Afforestation Manual — translated excerpt (2020).
These live in backend/data/phase4_reports/ on the build machine and are gitignored — the underlying PDFs are not redistributable. What we ship is the structured extraction (sections, tables, citations), regenerated at build time.
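To make "structured extraction" concrete, here is a minimal sketch of what a per-report extraction record could look like. The field names and the `ExtractedSection` class are illustrative assumptions, not the actual build-time schema:

```python
# Hypothetical shape of one extracted section, regenerated at build
# time from a teacher PDF. Field names are illustrative only.
from dataclasses import dataclass, field, asdict


@dataclass
class ExtractedSection:
    report_id: str                      # e.g. a slug for the source report
    heading: str                        # section heading from the PDF
    paragraphs: list                    # plain-text paragraphs, in order
    tables: list = field(default_factory=list)     # flattened table rows
    citations: list = field(default_factory=list)  # references found in-text


section = ExtractedSection(
    report_id="resiland-ca-andijan-2023",   # hypothetical slug
    heading="Climate suitability",
    paragraphs=["Mean annual precipitation on the district ranges ..."],
)
record = asdict(section)  # what actually ships: structure, not the PDF
```

The point of shipping only a record like this is that the redistribution restriction applies to the PDFs, not to the extracted structure.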
The 125-document RAG
Around the nine teacher reports, we built a retrieval layer covering supplementary material:
- Policy framework documents. WB RESILAND program brief, UNFF strategic plan, SDG 15 indicators, Bonn Challenge commitments.
- Technical guidelines. Sentinel-2 NDVI interpretation rules, soil-classification handbooks, MRV monitoring frameworks.
- Species data. 64-species catalogue with bioclimatic ranges, Türkiye OGM dryland species sheets, FAO Ecocrop entries for the relevant taxa.
- Cadastral references. Uzbek land-fund classification, RESILAND-9 priority site list, oblast/district administrative codes.
Total: 125 documents, chunked semantically with a 50-token overlap, embedded with Voyage-3 (1024 dimensions), and stored in PostgreSQL via the pgvector extension. On average 6 chunks per document, for roughly 750 retrievable chunks.
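The overlap step is the only non-obvious part of that pipeline, so here is a toy sliding-window chunker. It is a sketch under simplifying assumptions: a "token" here is a whitespace word, whereas the real pipeline presumably uses a model tokenizer, and the window size of 400 is an arbitrary illustrative choice:

```python
# Sliding-window chunking with a 50-token overlap (sketch).
# "Token" = whitespace word here, purely for illustration.
def chunk(text: str, size: int = 400, overlap: int = 50):
    tokens = text.split()
    step = size - overlap          # each window starts 350 tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        chunks.append(" ".join(window))
        if start + size >= len(tokens):
            break                  # last window already reached the end
    return chunks
```

Each resulting chunk would then be embedded and inserted into a pgvector column (e.g. `vector(1024)` for a 1024-dimension embedding), with the document id and chunk offset stored alongside it so retrieval can point back to an exact passage.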
How the citations work
When Opus 4.7 drafts a section like "Climate suitability for Pistacia vera on this parcel", it does three things:
- Pulls grounding context (the relevant teacher excerpts + retrieved RAG chunks) into the prompt.
- Drafts the section, marking each factual claim with an inline citation marker like [3].
- Emits a footer mapping each marker to a specific document and paragraph offset.
The frontend renders [3] as a clickable badge. Click it and a side-panel opens with the original paragraph highlighted, the document name, and the page number. Disagree with the cited source? You see exactly which paragraph to argue with.
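The marker-to-footer resolution step can be sketched in a few lines. The `footer` dict shape and the function name are assumptions for illustration; the document title is one of the nine teacher reports listed above:

```python
# Toy resolution of inline [n] markers against the emitted footer.
# The footer structure and function name are illustrative, not the
# platform's actual data model.
import re

footer = {
    3: {"doc": "FAO Dryland Restoration Guidelines v3", "paragraph": 12},
}


def resolve_citations(draft: str, footer: dict):
    resolved = []
    for m in re.finditer(r"\[(\d+)\]", draft):
        n = int(m.group(1))
        src = footer.get(n)
        if src is None:
            # A marker without a footer entry is exactly the failure
            # mode described above, so fail loudly instead of rendering.
            raise ValueError(f"marker [{n}] has no footer entry")
        resolved.append((n, src["doc"], src["paragraph"]))
    return resolved
```

The important design property is the hard failure: a marker that cannot be mapped to a real document never reaches the rendered report as a dead badge.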
What we do not do
- No web scraping. Every document is either public (open-data licence) or part of an explicit redistribution agreement we hold.
- No synthetic data in the verdict path. The 10 demo parcels and 5 demo nurseries we ship for clone-and-go are clearly labelled as synthetic; they never propagate to live reports.
- No silent fallback to model knowledge. If the RAG returns zero relevant chunks for a query, the verdict says "insufficient context" rather than fabricating a guess.
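The third rule is the easiest to enforce in code: a guard in front of the drafting step. A minimal sketch, with hypothetical function and field names:

```python
# "No silent fallback" guard (sketch). If retrieval comes back empty,
# short-circuit with an explicit insufficient-context verdict instead
# of letting the model draft from its own knowledge.
def build_verdict(chunks: list, draft_fn):
    if not chunks:
        return {"status": "insufficient context", "claims": []}
    return draft_fn(chunks)   # only drafts when grounding exists
```

The value of putting this outside the model call is that no prompt wording can undo it; the model is simply never asked to draft ungrounded.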
What this means for trust
Phase-4 reports are real procurement instruments. A consultant signs them, an agency files them, a donor disburses based on them. If a draft our platform produces cites a reference that does not exist, we have created a credibility crisis and the project ends. So we made the design boring on purpose: every claim maps to a real citation, which opens to a real document, which a domain expert can verify in 30 seconds.
If you work in dryland forestry and want to evaluate the platform on a parcel you know personally, the Getting started guide walks through the demo allowance flow. Bring your own area, draw a polygon, watch what the citations open to.