Data

The canonical data files for this project.

Files

File	What it is	Size	Format
`moratorium_inventory.csv`	The 222-row main inventory. Start here.	~190 KB	CSV
`state_legislation.csv`	413 state bills tracked in 2025–2026	~150 KB	CSV
`summary_stats.json`	Top-level aggregates (counts by state, by status, etc.)	~12 KB	JSON
`structured_extractions.jsonl`	864 lines: 526 successful clause extractions + 338 LLM-call errors retained for transparency	~2 MB	JSONL
`clause_extraction_analysis.json`	Pre-computed clause prevalence statistics for the n=348 cohort (confidence ≥ 0.4)	~7 KB	JSON
`moratorium-extraction.json`	JSON Schema describing the 60-field extraction format	~25 KB	JSON Schema

Column definitions

See docs/codebook.md for full definitions of every column in every file.

Three columns most users want

The inventory CSV has 19 columns. The three most useful for typical analysis are:

enacted_status — closed-vocab status: active, extended, replaced, expired, rescinded, or pending. Filter to active + extended for "moratoria currently in force"; exclude pending for "ever-enacted moratoria".
moratorium_id — stable identifier (<state>-<jurisdiction>-<year>, with phase suffixes for repeat instruments). Use this as a primary key when joining with future releases of this dataset.
latitude / longitude — WGS84 coordinates of the jurisdiction centroid. Drop the CSV into Mapbox, kepler.gl, Tableau, or any GIS tool to get an instant point map. 220 of 222 rows are geocoded (99.1% coverage); the 2 blanks are aggregate meta-rows. Coordinates were triple-checked with zero confirmed errors after manual fixes for 4 within-state ambiguities — see docs/known-gaps.md for the audit summary.

Quick examples

Count moratoria currently in force, by state (Excel): Open moratorium_inventory.csv, filter enacted_status to active and extended, then PivotTable on state.

Same in Python:

import pandas as pd
df = pd.read_csv("moratorium_inventory.csv")
in_force = df[df["enacted_status"].isin(["active", "extended"])]
print(in_force["state"].value_counts())

Filter to data-center-only moratoria (Python):

df_dc = df[df["trigger"].str.contains("data center", case=False, na=False)]

Just the headline numbers (any tool): Read summary_stats.json. It has total counts, top-states list, and per-state aggregate stats including the enacted_status_breakdown field — useful for embedding live values in dashboards or articles.

A note on `structured_extractions.jsonl`

This file has 864 lines. 526 are successful structured extractions; 338 are LLM-call errors retained for transparency (so you can audit what didn't extract cleanly). The n=348 cohort referenced in the working paper is the subset with extraction.confidence >= 0.4.

import json
records = []
with open("structured_extractions.jsonl") as f:
    for line in f:
        r = json.loads(line)
        if r.get("error") is None and r["extraction"]["confidence"] >= 0.4:
            records.append(r)
assert len(records) == 348

A note on dates

Most date_enacted values are ISO format (YYYY-MM-DD). Some are YYYY-MM (day uncertain), YYYY (only year known), or include qualifying notes (reported, [VERIFY], proposed). When parsing programmatically, expect to handle both clean and messy values. The examples/ folder has working code.

Updates

This data is current through April 2026. Each refresh is tagged as a release on GitHub. The current version is shown in ../CITATION.cff.