Moratorium Nation

Methodology

How this dataset was built, in plain English.

What we set out to do

Every U.S. local government has the legal authority to pause certain kinds of new development for a defined period. When a city council, county commission, or township board uses that authority, it typically does so through a public ordinance or resolution that is published online (sometimes), posted in a meeting agenda (often), and recorded in board minutes (almost always, eventually).

Our goal: identify every such moratorium adopted in the U.S. that targets data centers, battery storage, solar, wind, or cryptocurrency mining — and capture enough structured information about each one to support cross-jurisdictional comparison.

How we did it

Four phases; the fourth (geocoding) was added in v2026.04.2.

Phase 1: Document collection

We deployed AI-assisted research agents (built on the OpenAI Codex CLI with web-search enabled) across all 50 states. Each agent operated within a single state's scope and was given a research brief for that state.

Each agent searched the web for relevant records and was instructed to download original documents — PDFs of ordinances, HTML of agenda pages, Word documents — and save them locally with provenance metadata (source URL, download timestamp, retrieval method).
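A minimal sketch of that download-and-sidecar step in Python. The helper name and exact sidecar keys are illustrative; only the three provenance fields (source URL, download timestamp, retrieval method) come from the description above.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

import requests


def save_with_provenance(url: str, out_dir: Path, retrieval_method: str = "http_get") -> Path:
    """Download one document and write a .meta sidecar JSON next to it.

    Illustrative sketch; the real pipeline's sidecar schema may differ.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()

    # Stable filename derived from the URL so re-runs overwrite rather than
    # duplicate; keep the original extension (.pdf, .html, .docx, ...).
    name = hashlib.sha256(url.encode()).hexdigest()[:16] + (Path(url).suffix or ".html")
    doc_path = out_dir / name
    doc_path.write_bytes(resp.content)

    # Sidecar records the three provenance fields described above.
    meta = {
        "source_url": url,
        "download_timestamp": datetime.now(timezone.utc).isoformat(),
        "retrieval_method": retrieval_method,
    }
    doc_path.with_suffix(doc_path.suffix + ".meta").write_text(json.dumps(meta, indent=2))
    return doc_path
```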

We supplemented this with a SerpAPI sweep for "<state>" "data center" moratorium and similar queries, which surfaced documents the per-state agents had missed.
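One such query, sketched with the google-search-results package listed in the tooling table below (the helper function is ours, not part of the pipeline):

```python
from serpapi import GoogleSearch  # provided by the google-search-results package


def serpapi_sweep(state: str, api_key: str) -> list[str]:
    """Run one moratorium query for a state and return the organic result URLs."""
    query = f'"{state}" "data center" moratorium'
    search = GoogleSearch({"q": query, "api_key": api_key, "num": 100})
    results = search.get_dict()
    return [r["link"] for r in results.get("organic_results", [])]
```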

Output of Phase 1: approximately 4,400 unique source documents, totaling ~12 GB, archived in their native formats. Each has a .meta sidecar JSON file recording its provenance.

Phase 2: Classification

Not every document we collected is a moratorium document. Many are project announcements, EIA reports, or news articles unrelated to any specific ordinance. We classified each document with a small language model (gpt-5.4-mini at the OpenAI flex tier) using structured prompts that returned JSON-valid classifications.

Output of Phase 2: 709 documents classified as moratorium-related across the corpus. Separately, 1,123 of the ~4,400 documents are primary legal sources of one kind or another.
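As an illustration of what "structured prompts that returned JSON-valid classifications" can look like with the tooling listed below (pydantic-ai with the OpenAI provider), here is a hedged sketch. The category labels and field names are invented for the example; recent pydantic-ai releases use output_type and result.output (older ones: result_type and result.data), and flex-tier routing is omitted.

```python
from enum import Enum

from pydantic import BaseModel, Field
from pydantic_ai import Agent


class DocCategory(str, Enum):
    # Illustrative labels only; the real taxonomy is not reproduced here.
    MORATORIUM_ORDINANCE = "moratorium_ordinance"
    AGENDA_OR_MINUTES = "agenda_or_minutes"
    NEWS = "news"
    UNRELATED = "unrelated"


class DocClassification(BaseModel):
    category: DocCategory
    is_primary_legal_source: bool
    rationale: str = Field(description="One-sentence justification.")


# "openai:" selects the OpenAI provider; the model id is the one this
# methodology reports for classification.
classifier = Agent(
    "openai:gpt-5.4-mini",
    output_type=DocClassification,
    system_prompt="Classify this document for a moratorium inventory.",
)

result = classifier.run_sync("<document text or excerpt>")
print(result.output.category, result.output.is_primary_legal_source)
```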

Phase 3: Structured extraction

For each moratorium-related document, we used a larger language model (gpt-5.5 at the OpenAI flex tier) with a detailed extraction schema to produce a structured record.

The extraction schema captures 60+ fields per document, organized into five tiers that mirror the 44-clause taxonomy used in the working paper.
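A fragment of what such a schema can look like as a Pydantic model, pluggable into the same pydantic-ai pattern sketched above. Most field names here are illustrative; legal_basis, trigger, and confidence are mentioned elsewhere in this document.

```python
from pydantic import BaseModel, Field


class MoratoriumExtraction(BaseModel):
    """Illustrative fragment of the 60+ field extraction schema."""

    jurisdiction: str
    state: str
    technology: str                   # e.g. "data center", "solar", "wind"
    adopted_date: str | None = None   # ISO date if stated in the document
    duration_months: int | None = None
    legal_basis: str | None = None    # statutory authority cited
    trigger: str | None = None        # stated reason for the pause
    confidence: float = Field(ge=0.0, le=1.0)  # model's own confidence score
```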

Each extraction received a confidence score from the language model. We retained extractions with confidence ≥ 0.4 for downstream analysis. The cohort is n = 348, with mean confidence 0.72 and range 0.40 to 0.95.

Output of Phase 3: the JSONL file at data/structured_extractions.jsonl.
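The retention filter and the cohort statistics above can be recomputed directly from that file. A minimal sketch, assuming the per-record score is stored under a confidence key:

```python
import json

with open("data/structured_extractions.jsonl") as f:
    records = [json.loads(line) for line in f]

# "confidence" is assumed to be the per-extraction score described above.
scores = [r["confidence"] for r in records if r.get("confidence", 0.0) >= 0.4]
print(f"n = {len(scores)}, mean = {sum(scores) / len(scores):.2f}, "
      f"range = {min(scores):.2f} to {max(scores):.2f}")
```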

Manual review and cleaning

We manually reviewed and cleaned every extraction record.

The final cleaned inventory has 222 entries across 30 states (data/moratorium_inventory.csv).

Phase 4: Geocoding (added in v2026.04.2)

Each row in the cleaned inventory was assigned WGS84 latitude and longitude coordinates representing the jurisdiction's centroid. We used a two-tiered approach (a sketch follows the list):

  1. Primary geocoder: OSM Nominatim. Free, open-source, with reasonable U.S. administrative boundary coverage. Rate-limited to 1 request/second per the public API usage policy.
  2. Fallback: U.S. Census Geocoder. Used when Nominatim returns no result. The Census Geocoder is authoritative for U.S. jurisdictions but works best for street addresses; for "Jurisdiction, State" queries we found Nominatim more reliable.
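The actual implementation is scripts/geocode_inventory.py; a minimal sketch of the two-tier logic, using geopy's Nominatim wrapper with its rate limiter and the Census Geocoder's oneline-address endpoint as the fallback, looks roughly like this (the function name is ours):

```python
import requests
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim

nominatim = Nominatim(user_agent="moratorium-nation-geocoder")
# Respect the public Nominatim usage policy: at most 1 request/second.
geocode = RateLimiter(nominatim.geocode, min_delay_seconds=1)

CENSUS_URL = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"


def geocode_jurisdiction(name: str, state: str) -> tuple[float, float] | None:
    """Return (lat, lon) for 'Name, State', trying Nominatim then Census."""
    loc = geocode(f"{name}, {state}, USA")
    if loc is not None:
        return (loc.latitude, loc.longitude)

    # Tier 2: U.S. Census Geocoder, oneline-address endpoint.
    resp = requests.get(CENSUS_URL, params={
        "address": f"{name}, {state}",
        "benchmark": "Public_AR_Current",
        "format": "json",
    }, timeout=30)
    matches = resp.json().get("result", {}).get("addressMatches", [])
    if matches:
        coords = matches[0]["coordinates"]
        return (coords["y"], coords["x"])  # y = latitude, x = longitude
    return None
```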

Of 222 rows, 220 (99.1%) were successfully geocoded. The two blanks are aggregate meta-rows ("Other Reported Local Moratoria, Michigan" and "Proposed or Rejected Local Pauses, Maryland") that aren't real geographic points.

After geocoding, a triple-check audit ran 89 verifications across three independent methods:

  1. Random sampling against geographic knowledge (24 rows): manually verify each coordinate matches a well-known location.
  2. Wikipedia GeoSearch reverse-lookup (50 rows): query Wikipedia for pages within 10 km of our coordinates; verify the jurisdiction name appears among them (sketched after this list).
  3. Targeted high-risk subset (15 rows): the 4 manual within-state-ambiguity fixes plus other generic township names where ambiguity is most likely.
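A sketch of the method-2 check, using the MediaWiki GeoSearch API (10 km is also that API's maximum search radius). The matching heuristic is deliberately loose and is our illustration, not the audit script itself:

```python
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"


def geosearch_check(name: str, lat: float, lon: float) -> bool:
    """True if any Wikipedia page within 10 km mentions the jurisdiction name."""
    resp = requests.get(WIKI_API, params={
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": 10000,   # meters; 10 km is the API maximum
        "gslimit": 50,
        "format": "json",
    }, timeout=30)
    titles = [p["title"] for p in resp.json()["query"]["geosearch"]]
    # Loose match: the jurisdiction's base name appearing in any nearby title.
    base = name.split(",")[0].strip().lower()
    return any(base in t.lower() for t in titles)
```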

Across all 89 verifications, there were zero confirmed wrong geocodes after the four manual Ohio corrections in v2026.04.2. Those corrections were the audit's catch: generic township names that had initially resolved to the wrong place within the state.

Each correction used article-context disambiguation (legal_basis, trigger, and news-source mentions). Treat the lat/lon column as ≥99% accurate. The script is scripts/geocode_inventory.py; re-run after adding new rows to fill in their coordinates.

Why the inventory (n = 222) is smaller than the extraction cohort (n = 348)

The numbers can be confusing. Here's the difference: the 348 counts document-level extraction records that cleared the 0.40 confidence threshold (and several documents can describe the same moratorium), while the 222 counts the distinct moratoria that survived manual review and cleaning of those records.

The two numbers measure different things and don't need to match. The 222 is the headline count of moratoria; the 348 is the size of the line-coded sample used for clause-prevalence percentages.

What we don't claim

Reproducibility

Every step of the pipeline can be re-run. The scripts are in scripts/ with a README explaining each one. To regenerate every table and figure from the source data:

python -m scripts.generate_tables   # rebuilds all tables/*.tex
python -m scripts.moratorium_maps all   # rebuilds all figures/

The original document corpus (~12 GB) is not in this repository (it is hosted separately on Zenodo as the supplementary data deposit), but the cleaned inventory and structured extractions are sufficient to reproduce all published statistics.

Tooling and models

Step | Tool | Model
---- | ---- | -----
Document discovery | OpenAI Codex CLI with web search | gpt-5.5, medium reasoning effort
State-month chronology | OpenAI Codex CLI with web search | gpt-5.5, medium reasoning effort
SerpAPI ordinance search | google-search-results Python package | n/a
Document download | Playwright + stealth wrappers | n/a
OCR (image-based PDFs) | EasyOCR + Tesseract | n/a
PDF classification | pydantic-ai with OpenAI provider | gpt-5.4-mini, flex tier
Structured extraction | pydantic-ai with OpenAI provider | gpt-5.5, flex tier
Real-browser verification (JS-rendered portals) | Playwright + system Chrome (Xvfb) | n/a
Aggregation, table generation, mapping | Python (pandas, geopandas, matplotlib, seaborn) | n/a

Updates

Each refresh of the dataset is a tagged GitHub release (v2026.04, v2026.10, ...) with a corresponding Zenodo DOI (planned). Refresh cadence is roughly quarterly while the moratorium wave is active.