Method

How the estimates are built and how they should be interpreted.

This page covers the shared methodology — outcome series, synthetic control design, donor pools, placebo-weighted ensemble, treatment windows, and inference. Individual place reports contain place-specific diagnostics and robustness checks.

Two outcome series

Every place in the Explorer is tracked on two distinct economic measures. Real Gross Value Added (GVA) records the value of goods and services produced within a geography — a workplace-based production concept expressed in real chained-volume terms at 2019 prices (ONS, annual, available through 2023). Gross Disposable Household Income (GDHI, B.6g) records what households can actually spend or save after taxes, social contributions, and transfers — a residence-based welfare concept reported in nominal terms (ONS, annual, through 2022).

The two series answer different questions and need not move together in response to a trade shock. GVA records where output is lost; GDHI records where household purchasing power is affected. A place that loses productive capacity but retains workers who commute elsewhere may see GVA fall faster than GDHI. A region like London, with large net commuting inflows, may see a production loss propagate more diffusely across the commuting catchment in income terms. Fiscal redistribution (tax credits, unemployment insurance, benefits) also cushions GDHI relative to GVA. Both series are therefore tracked, and where they diverge, that divergence is interpreted as a signal about the internal economic flows connecting places.

Synthetic control estimation

For each place, outcome, and donor-pool specification, the estimator solves a simplex-constrained quadratic programme: find non-negative weights that sum to one and minimise the pre-treatment discrepancy between the treated place and its weighted donor composite. The simplex constraint keeps the counterfactual interpretable as a convex combination of observed economies and prevents extrapolation. GVA and GDHI are estimated independently — a deliberate choice that preserves the production–income wedge as an object of study rather than suppressing it through a shared weight vector.
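The optimisation described above can be illustrated with a small sketch. The page does not say which solver is used, so this example assumes projected gradient descent onto the simplex, with toy numbers rather than real GVA or GDHI data; the function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # the last k that satisfies the condition sets the threshold
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(treated, donors, steps=2000):
    """Minimise sum_t (sum_j w_j * donors[j][t] - treated[t])**2 over the simplex.

    treated: pre-treatment values of the treated place (length T).
    donors:  one list of length T per donor unit.
    Solver choice (projected gradient) is an assumption of this sketch.
    """
    n_donors, n_periods = len(donors), len(treated)
    # Safe step size: 2 * squared Frobenius norm bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(x * x for d in donors for x in d))
    w = [1.0 / n_donors] * n_donors
    for _ in range(steps):
        resid = [sum(w[j] * donors[j][t] for j in range(n_donors)) - treated[t]
                 for t in range(n_periods)]
        grad = [2.0 * sum(resid[t] * donors[j][t] for t in range(n_periods))
                for j in range(n_donors)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_donors)])
    return w


# Toy data: the treated series is exactly 0.7 * donor 0 + 0.3 * donor 1.
toy_donors = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
toy_treated = [0.7, 0.3, 0.0, 1.0]
toy_weights = synthetic_control_weights(toy_treated, toy_donors)
print([round(x, 3) for x in toy_weights])
```

Because the weights are non-negative and sum to one, the fitted composite always lies inside the range spanned by the donors, which is the no-extrapolation property the text describes.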

All series are normalised to an index of 100 in 2000 before estimation. The post-treatment gap — actual minus synthetic — is the primary quantity reported. A persistent negative gap is interpreted as evidence of weaker performance relative to the counterfactual. Comparing the two gaps (GVA versus GDHI) allows examination of whether the production cost of Brexit and the household income cost align, and where the redistribution system has cushioned or amplified the shock.
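As a minimal sketch of the two transformations in this paragraph, the following rebases a series to 100 in 2000 and computes the actual-minus-synthetic gap. The function names and the numbers are illustrative assumptions, not taken from the Explorer's code or data.

```python
def to_index(series, base_year=2000):
    """Rebase a {year: value} series so that the base year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


def gap(actual, synthetic):
    """Actual minus synthetic, for years present in both series."""
    return {year: actual[year] - synthetic[year]
            for year in sorted(actual) if year in synthetic}


# Illustrative numbers only.
raw = {2000: 50.0, 2001: 55.0, 2002: 60.0}
indexed = to_index(raw)
synthetic_path = {2000: 100.0, 2001: 112.0, 2002: 123.0}
gaps = gap(indexed, synthetic_path)
print(gaps)
```

A negative entry in `gaps` means the place sits below its synthetic counterpart in that year, in index points.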

Donor pool design: 31 combinations

Constructing a counterfactual from a single donor pool risks contamination by idiosyncratic features of that pool. To guard against this, the analysis enumerates the full set of 2^5 − 1 = 31 non-empty subsets of five base donor sources:

Where a country contributes NUTS2 sub-national regions, its national aggregate is excluded to prevent double-counting. The maximum combined donor universe contains 253 units for GVA and 322 units for GDHI. A separate synthetic control is estimated for every triple of (place, outcome, donor-pool combination), producing up to 31 candidate counterfactuals per place and per series.
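The count of 31 follows from enumerating every non-empty subset of five sources. A short sketch, using placeholder labels because the names of the five sources are not given here:

```python
from itertools import combinations

# Placeholder labels: hypothetical stand-ins for the five base donor sources.
SOURCES = ["source_A", "source_B", "source_C", "source_D", "source_E"]


def donor_pool_combinations(sources):
    """Return every non-empty subset of the sources, smallest subsets first."""
    return [combo
            for size in range(1, len(sources) + 1)
            for combo in combinations(sources, size)]


pools = donor_pool_combinations(SOURCES)
print(len(pools))  # 31
```

Each tuple in `pools` would define one donor-pool specification, so a place has up to 31 candidate counterfactuals per series.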

Placebo-weighted ensemble

Selecting the single best-fitting synthetic control by minimising in-sample fit risks overfitting and an overly optimistic view of post-treatment predictive accuracy. Instead, the primary counterfactual is a placebo-weighted ensemble that scores each candidate specification on out-of-sample predictive performance using four pseudo-treatment placebo windows (2010, 2011, 2012, 2013). For each false treatment date, the synthetic control is re-estimated on data prior to that date and then evaluated against the validation window (2012–2015). The root-mean-square prediction error over that held-out window becomes each candidate's score.
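The scoring step can be sketched as a root-mean-square prediction error over the held-out years. The function name and the numbers are illustrative assumptions. How the four placebo dates are combined into one score is not specified here, so the sketch scores a single window only.

```python
from math import sqrt


def rmspe(actual, predicted, years):
    """Root-mean-square prediction error over the given held-out years."""
    squared_errors = [(actual[y] - predicted[y]) ** 2 for y in years]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative index values for one candidate specification.
holdout_years = [2012, 2013, 2014, 2015]
observed = {2012: 101.0, 2013: 103.0, 2014: 104.0, 2015: 106.0}
placebo_prediction = {2012: 100.0, 2013: 102.0, 2014: 105.0, 2015: 107.0}
score = rmspe(observed, placebo_prediction, holdout_years)
print(score)  # 1.0
```

A lower score means the specification predicted the held-out years more closely.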

Candidates are ranked by this score (rank 1 = best) and assigned softmax weights with concentration parameter κ = 0.25. This gives meaningful weight to the top several specifications while discounting poorly fitting ones. The ensemble counterfactual is then the weighted average of the 31 real-window synthetic controls, with weights fixed from the placebo step — so no post-treatment data enters the construction of the counterfactual. The spread of gap estimates across the 31 specifications is also reported as a robustness envelope in each place report.
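The exact weighting formula is not written out on this page. The sketch below assumes weights proportional to exp(−κ × rank), which is one common reading of a rank-based softmax, and should be treated as an illustration rather than the Explorer's implementation. Function names are hypothetical.

```python
from math import exp


def rank_softmax_weights(scores, kappa=0.25):
    """Weights proportional to exp(-kappa * rank), rank 1 = lowest score.

    The functional form is an assumption of this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    raw = [exp(-kappa * r) for r in ranks]
    total = sum(raw)
    return [x / total for x in raw]


def ensemble_path(paths, weights):
    """Weighted average of equal-length synthetic paths, one per specification."""
    return [sum(w * p[t] for w, p in zip(weights, paths))
            for t in range(len(paths[0]))]


# Illustrative placebo scores for three specifications (lower is better).
placebo_scores = [2.4, 0.9, 1.5]
weights = rank_softmax_weights(placebo_scores)
combined = ensemble_path([[100.0, 104.0], [100.0, 102.0], [100.0, 103.0]], weights)
print([round(w, 3) for w in weights])
```

Under this assumed form, each step down the ranking multiplies the weight by exp(−0.25), about 0.78, so the leading specifications dominate without any single one taking all the weight.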

The placebo-weighting approach penalises donor pools whose economies diverge from the UK's pre-referendum trajectory for reasons unrelated to Brexit. Specifications dominated by fast-industrialising economies tend to predict the placebo validation years poorly and therefore receive lower weight.

Two treatment windows

The Brexit output gap is estimated under two distinct treatment codings:

  1. Post-2016: Treatment begins in 2016, the year of the referendum. Pre-treatment calibration period: 2000–2015 (16 years). Post-treatment period: 2016–2023. This window captures the full economic cost of Brexit including anticipatory channels (sterling depreciation, trade policy uncertainty, and supply-chain reorganisation) that are well documented to have begun before legal exit in January 2020.
  2. Post-2020: Treatment begins in 2020, the year of legal exit from the EU and the onset of COVID-19. Pre-treatment calibration period: 2000–2019 (20 years). Post-treatment period: 2020–2023. This window isolates the period of direct institutional change (the end of free movement of goods, services, capital, and people) from the earlier uncertainty channel. Because legal exit and the onset of COVID-19 coincide in 2020, the excess gap above what donor-country trajectories imply is attributable to Brexit plus any idiosyncratic UK pandemic response, not the pandemic per se.

Comparing results across the two windows provides a diagnostic: a large post-2016 gap with a comparatively smaller post-2020 gap supports an important role for the uncertainty channel. Similar magnitudes point to direct trade and regulatory disruption as the dominant mechanism.

Inference via restricted placebo units

Standard significance testing for the synthetic control is complicated at the aggregate level because there is only one treated unit. The conventional approach constructs a reference distribution by pseudo-treating each donor unit in turn. For each place and its selected donor pool, the full estimation is repeated once per donor unit, treating that donor as though it were the treated unit. The resulting distribution of placebo output gaps characterises the null. The share of donor units whose average post-treatment placebo gap is as extreme as or more extreme than the actual gap provides a non-parametric p-value. The robustness tab in each place report shows this distribution alongside the treated gap.

An important caveat applies. The reference distribution is restricted to the same donor pool used to construct the actual counterfactual: economies pre-selected for their pre-treatment similarity to the treated unit. Because these donors were chosen for their stability and closeness to the treated trajectory, they tend to show small, well-behaved post-treatment deviations when pseudo-treated in turn. The resulting reference distribution is therefore narrower than it would be if less stable, noisier donor economies were included. Those excluded economies would introduce genuine post-treatment variance into the reference pool, in the form of larger spurious gaps arising from their own idiosyncratic shocks. Against this more variable backdrop, the treated unit's systematic and persistent gap would appear as a cleaner signal relative to noise, improving the statistical contrast and reducing p-values. By restricting to stable, similar donors and excluding this contrast-improving variance, the reference distribution is artificially tight, and the p-values produced are conservative: they overstate the probability that an equally large gap would be observed by chance. The additional constraint that the minimum achievable p-value is bounded below by 1/(n+1), about 0.09 with an active pool of ten donors and about 0.03 with thirty, reinforces this conservative character. The correct inference is that a treated gap sitting clearly outside the placebo cloud constitutes meaningful evidence of an unusual post-treatment trajectory; the precise p-value should not be read as a sharp threshold but as an upper bound on the true probability under the null.
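The placebo p-value and its floor can be sketched as follows. Two details are assumptions of this sketch rather than statements from the page: extremeness is measured by absolute value (a two-sided comparison), and the treated unit is counted in the reference set, which is what yields the 1/(n+1) floor. The function name and numbers are illustrative.

```python
def placebo_p_value(treated_gap, placebo_gaps):
    """Share of units whose average post-treatment gap is at least as extreme.

    Assumptions: two-sided comparison via absolute values, and the treated
    unit counts as one of the n + 1 units in the reference set.
    """
    n = len(placebo_gaps)
    as_extreme = sum(1 for g in placebo_gaps if abs(g) >= abs(treated_gap))
    return (1 + as_extreme) / (n + 1)


# Illustrative average post-treatment gaps, in index points.
donor_placebo_gaps = [0.5, -1.0, 2.0, -0.3]
p_clear = placebo_p_value(-5.0, donor_placebo_gaps)    # treated gap outside the cloud
p_inside = placebo_p_value(-0.8, donor_placebo_gaps)   # treated gap inside the cloud
print(p_clear, p_inside)
```

With n placebo donors, the smallest value this function can return is 1/(n+1), however large the treated gap is.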

Interpretation guardrails

The robustness tab in each place report shows the full placebo distribution, the donor-pool sensitivity envelope (all 31 specifications), and the pre-treatment fit diagnostics. These are the primary checks for assessing whether a given place's estimate is credible.