What We're Building
A trusted geographic intelligence platform that collects, validates, documents, and serves geographic data — built so trust is structural, not a matter of discipline.
Nothing is considered trusted merely because it loaded successfully.
The platform exists to answer geographic questions with data you can stand behind: every number carries its source, its date, its owner, and its rules of use. It serves chapters, disaster operations, field intelligence, county and census-tract analysis, dashboards, maps, AI assistants, and executive reports — all reading from one governed spine instead of a pile of disconnected spreadsheets.
Accuracy & Validity
Is the data correct? Bad rows are rejected at the database, not hoped against.
Currency & Freshness
How current is it? Every source has a refresh schedule and a last-checked date.
Latency & Speed
How fast can we use it? Speed matters — but trust matters more.
Who It Serves
Chapters · disaster operations · field intelligence · county intelligence · census-tract intelligence · dashboards · maps · AI assistants · executive reports.
The Big Insight
We are not trying to collect everything. We are building a system that knows what data exists, where it came from, how current it is, how trustworthy it is, what geography it applies to, who owns it, how it can be used, whether AI can touch it, how to update it, and how to explain it.
How We're Building It
A layered architecture where raw files flow toward trusted answers, and only an approved, AI-safe layer is ever exposed.
Where The Data Lives
Four Postgres schemas separate concerns by trust level. Local for development and archive; one cloud Postgres as canonical.
gov
- source_registry
- field_catalog
- ingestion_log
- validation_results
- dataset_inventory
geo
- county_boundary
- geography_assignment
- tract_county_crosswalk
- geography_spine (view)
- geography_at() (fn)
topic
- fema_mobile_homes_county
- census_demographics…
- nri_risk_county…
- svi_tract…
- (joined by FIPS/GEOID)
serving · AI-SAFE
- geography_current
- fema_mobile_homes_county
- mobile_homes_with_chapter
- approved views only
ai_reader role is granted serving only and revoked from the rest, so approved_for_ai is now an enforced gate, not a label. Proven: ai_reader reads serving.* and is denied every base table (make secure, 5/5 pass).geography_at(fips, date) returns the assignment valid on the event date. A 2023 disaster resolves to the chapter that owned the county then, not now — validation concern #15, fixed in the schema rather than left to memory.Local vs. Cloud
Development & Archive
Raw archives, DuckDB analytics, testing, experiments, backup copy. Originals are preserved exactly as received and kept out of git.
Supabase Postgres + PostGIS
App data, APIs, Vercel apps, shared query access, AI access. Matches the existing stack (Keystone, Final Mile, Ask Clara). Canonical store; DuckDB stays local for analytics. Awaiting your sign-off — see Decisions.
Progress
The scaffold is built and proven against a live database. Every architecture choice is logged and reversible.
Schema Applies Clean
All five SQL files run end-to-end on Postgres 16 + PostGIS 3.4.
Guards Reject Bad Data
Integer-coerced FIPS, a second "current" assignment, and orphan topic rows are all refused by the database.
History Resolves Right
A 2023 event joins to the old chapter; current joins to the new one.
The AI Gate Holds
ai_reader reads the serving layer and is denied every base table. 5/5 pass via make secure.
+ D-001 · Split geography spine into boundary + assignment ACCEPTED
The manual specifies one spine table holding polygon, assignment, dates, and a flag. We split it into geo.county_boundary (stable polygon) and geo.geography_assignment (temporal, SCD-2), and rebuilt the manual's spine as a view. Why: county shape is stable while chapter assignment changes — one table duplicates the full polygon on every realignment. Reversible on request.
+ D-002 · No stored current_flag ACCEPTED
is_current is derived (effective_end IS NULL) in the view; a unique partial index enforces one open assignment per county. Why: a stored flag can silently contradict the dates.
+ D-003 · Tract crosswalk + geography_at() ACCEPTED
Added geo.tract_county_crosswalk and geo.geography_at(fips, date). Why: the spine is county-grain; tract topic tables had no bridge. geography_at() returns the assignment valid at an event date — fixing the "current geography applied to historical events" concern by construction.
+ D-004 · FIPS and GEOID stored as text ACCEPTED
county_fips char(5), tract_geoid char(11), with regex + prefix CHECK constraints. Why: integer storage drops the leading zero (01001 → 1001). Verified: a malformed FIPS insert is rejected.
+ D-005 · One canonical cloud Postgres ACCEPTED
Decision: Supabase canonical (PostGIS, matches the existing stack), DuckDB local for analytics, GitHub for schema/metadata. Neon and Railway dropped. The schema runs unchanged on any of them.
+ D-006 · ai_reader role isolation ACCEPTED · PROVEN
Built and proven. The ai_reader role (NOLOGIN group) is granted USAGE + SELECT on serving only and revoked from gov/geo/topic; serving views expose data via owner's rights. tests/ai_reader_isolation.sql confirms on Postgres 16 / PostGIS 3.4 that ai_reader reads serving.* and is denied all four base tables — 5/5 pass. On Supabase: GRANT ai_reader TO anon, authenticated; (never service_role).
+ D-007 · Canonical geometry source ACCEPTED
Decision: PostGIS canonical, GeoParquet archive-only. No dual-write validation test needed, since Parquet is not a competing source of truth.
+ D-008 · Owner's manual is the in-repo canonical spec ACCEPTED
The specification now lives in the repo at docs/OWNERS_MANUAL.md with a provenance header, sourced from the "Data Project" Google Doc. The build narrative lives in docs/origins/. Why: a governed platform keeps its own spec under version control.
+ D-009 · Universal geography vs. optional Red Cross overlay ACCEPTED
The universal layer (county_boundary, tracts, topic.*, geometry) carries no chapter/region/division and serves any project. The Red Cross hierarchy lives only in geo.geography_assignment (temporal), joined on demand. Why: chapter assignment is RC-specific, has quirks (Fairfield County → ARC of Metro NY North), and changes over time — non-RC projects shouldn't see or pay for it. Result: two serving views — serving.county_reference (chapter-free) and serving.county_intelligence (with chapter).
+ D-010 · Hierarchy vintages dated to the RC fiscal-year boundary (July 1) ACCEPTED
Red Cross geography changes take effect on the fiscal-year boundary, July 1 (FY27 = 2026-07-01). The ingestion dates each hierarchy vintage to that boundary, not the upstream file's build date. Why: the CA realignment + SW Florida rename are effective when the fiscal year turns, not when the CSV was generated.
+ D-011 · Never trust a source's embedded chapter; re-enrich by FIPS ACCEPTED
Red Cross ArcGIS / operational sources are often behind and carry stale chapter/region/division values. The platform never stores a source's chapter — it derives chapter from the authoritative spine by county_fips (geography_at() for historical correctness). Topic tables structurally forbid a chapter column (D-001). Why: the source of truth for hierarchy is the governed spine, not whatever a layer embedded when it was last updated.
+ D-012 · Connecticut representation ACCEPTED · BUILT
CT abolished county government; the Census adopted 9 planning regions as county-equivalents (FIPS 09110–09190). Decision (revised by research): represent CT as the 9 planning regions, not the legacy 8 counties — 2022+ CT tract GEOIDs already embed the region code, so adding 9 rows to the spine rolls up all 884 CT tracts with no crosswalk. Geometry from Census 2024. Chapter mapping is approximate (regions don't align to RC chapter lines): Greater Bridgeport + Western Connecticut → ARC of Metro NY North (preserving the Fairfield→NY rule), other 7 → ARC of Connecticut. Legacy 8 kept only for pre-2022 data (OpenFEMA declarations). Built & verified: 884 tracts roll up, rejects 1,868 → 984.
+ D-013 · First-load effective-dating uses a baseline sentinel ACCEPTED · BUILT
A naive first load stamped every county effective_start = 2026-07-01, which made geography_at() return nothing for pre-FY27 dates — silently breaking the historical join the split spine exists to protect. Fix: unchanged counties get 1900-01-01 (baseline_sentinel, honest "unknown start"); the ~64 FY27-changed counties + 9 CT regions get 2026-07-01 (fy_boundary). New effective_start_basis column + a btree_gist EXCLUDE constraint banning overlapping spans. Verified: 3,076 baseline + 73 fy_boundary; a 2020 lookup resolves correctly.
+ D-014 · Source staleness is quarantined + reported, not trusted ACCEPTED · BUILT
RC ArcGIS/operational sources lag and ship stale embedded chapter/region/division. A source's claimed hierarchy may land ONLY in gov.source_assignment_observation (audit-only) — never a topic/serving table; chapter is always re-derived from the spine by county_fips. gov.source_vs_canonical_mismatch compares each claim to the assignment valid at the source's own vintage (stale vs. genuine conflict). Validation check #11 fails the load if any topic.* table grows a chapter/region/division column.
The Path
The first build sequence from the owner's manual, with the deviations baked in. Green is done; red is where we are now.
Create repo structure & ordered schemas
Four schemas, five SQL files, validation suite, intake form, Makefile.
Prove it on a live database
Applied clean on Postgres 16 + PostGIS 3.4; guards reject bad data; history resolves correctly.
Resolve decisions & wire the ai_reader gate
D-005 Supabase · D-007 PostGIS · D-006 built & proven — the serving wall is now enforced, not advisory.
Ingest the master geography from red-cross-data
3,140 county boundaries + a dated hierarchy vintage (FY27, eff. 2026-07-01) loaded from the canonical upstream repo. Leading-zero FIPS + chapter codes preserved.
Load the tract → county crosswalk
83,528 tracts loaded; 1,868 correctly rejected (CT planning regions, AK boroughs, PR) and logged — not force-loaded.
Load reference topics + serving views
ALICE · SVI · NRI · FEMA declarations + mobile homes, each per-source with provenance. serving.county_intelligence + chapter-free serving.county_reference. 10/10 validation, AI gate 5/5.
Deploy the governed schema to Supabase
Schema + AI gate applied to the "Geographic Intelligence Platform" project (additive; existing tables untouched). Gate verified holding.
Connecticut + effective-dating + staleness governance
9 CT planning regions added (884 tracts roll up); first-load baseline-sentinel fix; source-staleness quarantine + checks. Supabase schema synced to match.
Load the data into Supabase
Full platform loaded into the "Geographic Intelligence Platform" project — 3,149 counties, 84,412 tracts, 5 topics. Re-verified on Supabase: validation assertions all clean, AI gate holds.
Hand the repo to Codex for adversarial review
Against a committed schema + a real load (now satisfied) — only then add the next source.
Decisions Needed
All three gatekeeper calls are resolved (2026-06-25). The next move is the first real data load.
| ID | Decision | Resolution | Status |
|---|---|---|---|
| D-005 | Which cloud Postgres is canonical? | Supabase canonical + DuckDB local; Neon/Railway dropped. | RESOLVED |
| D-006 | Enforce the serving wall with ai_reader? | Built + proven — grants serving only, revoked from the rest. | RESOLVED |
| D-007 | Which store is canonical for geometry? | PostGIS canonical, GeoParquet archive-only. | RESOLVED |
Next up: add Connecticut to the spine and load the data into Supabase (needs the project connection string). See Working Notes below.
White Paper
The full thinking behind the platform — the trust model, the architecture, and the governance contract — in one printable document.
A Governed Geographic Intelligence Platform
Why trust must be structural, how the four-layer architecture enforces it, and the path from raw files to AI-safe answers.