SUMMARY - Pilot Projects and Promising Models: What’s Actually Working?
A city launches a pilot program diverting mental health calls from police to civilian crisis teams, and after eighteen months the data shows fewer use-of-force incidents, faster response times, and higher satisfaction from families in crisis - yet when funding decisions come, the program's budget is cut while police overtime increases. A neighbourhood implements a violence interruption program using credible messengers from the community to mediate disputes before they become shootings, and gun violence drops thirty percent - but when the messengers seek expansion to neighbouring areas, they are told the evidence is not yet sufficient. A school district pilots restorative justice as an alternative to suspension, sees disciplinary referrals and recidivism drop dramatically, yet when teachers in other schools ask to implement the approach, they learn the pilot will not be expanded. A jurisdiction tests a program connecting people released from jail to housing and employment support, and reincarceration rates fall by half - but the program remains small while prison budgets grow. Promising models for community safety exist. They have been piloted, evaluated, and proven. The question is not whether alternatives work but why proven alternatives remain pilots while failed conventional approaches remain defaults.
The Case for Scaling What Works
Advocates for scaling successful pilots argue that evidence should drive policy, that proven programs deserve expansion, and that continuing to fund ineffective approaches while leaving effective ones small is irrational.
We know what works. Decades of pilots, demonstrations, and evaluations have identified programs that reduce crime, prevent violence, improve outcomes, and cost less than conventional approaches. The knowledge exists. What is lacking is the will to act on it.
Pilots that work should become policy. The purpose of piloting is to test whether approaches work before implementing them widely. When pilots succeed, implementation should follow. Endless piloting of proven programs while conventional approaches continue regardless of evidence represents policy failure.
Cost-effectiveness demands scaling. Programs that produce better outcomes at lower cost should expand while less effective, more expensive approaches should contract. Basic resource stewardship requires allocating resources to what works.
From this perspective, moving from pilot to policy requires: commitment to evidence-based decision-making; political will to shift resources from conventional to alternative approaches; overcoming institutional resistance to change; and accountability for implementing what works rather than what is familiar.
The Case for Caution in Scaling
Others argue that pilot success does not guarantee scaled success, that context matters in ways pilots may not capture, and that rushing to scale risks wasting resources on approaches that do not transfer.
Pilots succeed in part because they are pilots. Small programs with dedicated staff, intense attention, and motivated participants may not work when expanded. The enthusiasm and resources that make pilots succeed may not be replicable at scale. Scaling failures of once-promising pilots litter the field.
Context shapes effectiveness. A program that works in one city may fail in another. Community characteristics, political environment, existing services, and local factors shape what approaches succeed. Scaling assumes transferability that may not exist.
Evidence standards should be high before major investment. A single pilot or even multiple pilots may not provide sufficient evidence for large-scale resource reallocation. Rigorous evaluation over time, across contexts, with attention to unintended consequences should precede scaling.
From this perspective, moving to scale requires: multiple successful implementations across diverse contexts; rigorous evaluation with appropriate controls; attention to implementation fidelity and local adaptation; and patience with evidence development before major resource shifts.
The Political Economy of Pilots
Why pilots remain pilots often has more to do with politics than evidence.
From one view, institutional interests resist change. Police unions, prison guard unions, and organizations benefiting from current approaches resist alternatives regardless of evidence. Pilots are permitted because they are small; scaling threatens interests that pilots do not. Political economy, not evidence, explains why proven programs stay small.
From another view, democratic societies appropriately deliberate before major policy changes. Skepticism about new approaches, desire for more evidence, and concern about unintended consequences represent reasonable caution. What looks like obstruction may be appropriate democratic deliberation.
Whether failure to scale reflects institutional capture or democratic deliberation shapes how to pursue change.
The Implementation Challenge
Scaling successful pilots requires solving implementation challenges that pilots may have avoided.
From one perspective, implementation can be figured out. If we know what works, we can develop systems to implement it widely. Training, standards, monitoring, and continuous improvement can maintain quality at scale. Implementation is a solvable problem.
From another perspective, implementation is the hard part. Programs work because of specific people, relationships, and contexts that cannot be replicated. Scaling often produces pale imitations of successful pilots. The magic is in the implementation details that scaling loses.
Whether quality can be maintained at scale shapes what scaling can accomplish.
The Measurement Problem
What counts as a pilot "working" is not straightforward.
From one view, rigorous outcome measurement should determine success. Randomized controlled trials, quasi-experimental designs, and validated outcome measures provide the evidence base that policy decisions require. Programs should be scaled based on strong evidence.
From another view, narrow outcome measurement may miss what matters. Programs may succeed in ways not captured by standard metrics while failing in ways that are. Community-defined success, qualitative evidence, and participant voice should inform evaluation alongside quantitative measures.
What evidence counts shapes which pilots are deemed successful.
The Question
If we know what works, why do we keep doing what does not work? If pilots succeed and are not scaled, what was the point of piloting? When violence interruption reduces shootings but remains small while police budgets grow, what are we actually trying to accomplish? When restorative justice improves outcomes but remains experimental while punishment remains default, what values are we expressing? What would it take to actually implement what we have already proven? And when evidence exists and is ignored, what does that reveal about whether evidence actually drives policy?