What Gilded Age America and 1960s police can teach us about state capacity
The economic literature on state capacity in America, briefly explained
What is the American government — federal, state, and local — capable of doing?
To judge from recent news and research, nothing much. State governments can’t build high-speed rail: just look at California’s project, which, due to a mix of political chicanery and permitting problems, has yet to open nearly two decades after its announcement. The federal government can’t build ships for the Navy: China’s shipbuilding capacity is 232 times that of the US, and the ships the US does construct are routinely over budget and years late. The feds and states can’t get people benefits on time; in the wake of the pandemic, the unemployment insurance system went from paying out over 97 percent of claims within three weeks to barely half. States and localities have gotten worse at building highways, with per-mile costs of highway construction more than tripling between the 1960s and 1980s.
These problems, and many more besides, are what writers and analysts mean when they talk about the US problem with “state capacity,” or the ability of the government to actually execute on its policy goals. The US used to have a lot of state capacity. The US used to be able to get a nuclear power plant online within four years of construction starting; to build a 17-station line on the DC Metro within eight years of groundbreaking; to erect the tallest building in the world in barely over a year. The question is how to get back to that.
There isn’t a lot of high-quality evidence on specific policies that enhance or erode state capacity, especially in high-income countries like the US. (If you’re an economist or political scientist interested in producing more high-quality evidence, please email me.) But we do have a few studies I’ve found particularly helpful in thinking about these questions.
Most of the relevant work here examines workforce reforms: attempts to create civil service programs or otherwise premise hiring, firing, and promotions on merit rather than political savvy. This isn’t terribly surprising. Civil service reforms offer a way for researchers to pinpoint a major change to the way government works, and see its effects. But I’d love to see more work on state capacity enhancements that took different forms as well.
Everyone’s favorite case study: the Pendleton Act

Here in the US, by far the biggest change in the way government employees are managed, hired, and fired occurred in 1883, with the passage of the Pendleton Civil Service Reform Act. Netflix has popularized the history here a bit with the show Death by Lightning, but to recap: prior to the Pendleton Act, most federal positions were allocated per the spoils system, largely to loyalists of the party in power, rather than on the basis of experience or competence. Attitudes about this system were a major cause of intra-party friction among Republicans.
Then a disgruntled office-seeker who failed to get a spoils position murdered President James Garfield. The new president, Chester A. Arthur, was historically a “Stalwart” (that is, a Republican supportive of the spoils system) but had a change of heart in the wake of the assassination and signed the Democrat-authored bill creating a merit-based civil service.
There are two big questions to ask about this change, from a state capacity perspective:
1. Did it change much? Did the Pendleton Act have teeth and actually transform the composition of the federal bureaucracy?
2. Did it improve the government’s efficiency or performance?
The answer to 1 seems to be an emphatic “yes.” In a 2024 paper, UC Davis’ Diana Moreira and Santiago Pérez looked at the effect of the Act on customs collectors, who in the tariff-heavy Gilded Age were among the most important (and most easily corruptible) bureaucrats in the federal government. The Act, they found, professionalized the workforce considerably.
New customs hires who had to undergo exams as part of the new civil service system “were 8 percentage points less likely to report working in an unskilled occupation prior to joining the Customs Service and 7 percentage points more likely to report working in a professional one. Moreover, targeted employees were also 4 percentage points more likely to report being literate in the census.” After ten years, nearly half of customs collectors had been hired through such an exam system. The implication is that the Act made the federal workforce, at least in this area, markedly more qualified.
In a separate paper also released in 2024, UC Berkeley’s Abhay Aneja and Guo Xu examined the Pendleton Act’s effect on another common federal job: postal carriers. They note that eight years after the Pendleton Act took effect, applying at first only to large post offices with 50 or more employees, “almost 20 percent of the average reformed post office was still comprised of patronage appointees.” Aneja and Xu emphasize the continuity, but the same data indicates that over 80 percent of employees had been hired by merit-based processes. The Act seemed to actually change who was working in post offices.
Where the research diverges is on question 2: did this improve government efficiency and performance? Aneja and Xu found it did: “On average, reformed cities experience a reduction in delivery errors by 22 percent.” The Postal Service got better at its core function (getting mail to the right person) and, if you buy Aneja and Xu’s differences-in-differences design [1], the effect was caused by the Pendleton Act. The Act also, they found, increased productivity: the volume of mail delivered per carrier grew by 8 to 14 percent in cities adopting reformed hiring.
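For readers who want the intuition behind a differences-in-differences comparison, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the numbers are invented (they are not Aneja and Xu’s data), the function name did_estimate is hypothetical, and the paper’s actual specification is a regression that is surely richer than this two-group, two-period version. The idea is simply to compare the change in reformed cities against the change in unreformed cities over the same period, so that trends common to both groups cancel out.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences.

    Returns (change in treated group) minus (change in control group).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up delivery errors per 1,000 letters; illustrative values only.
reformed_before = [10.0, 12.0, 11.0]
reformed_after = [8.0, 9.0, 7.0]
unreformed_before = [10.0, 11.0, 12.0]
unreformed_after = [10.0, 10.0, 10.0]

effect = did_estimate(
    reformed_before, reformed_after, unreformed_before, unreformed_after
)
print(effect)  # negative means errors fell more in reformed cities
```

In this toy example errors fall in both groups, but they fall further in the reformed cities; the estimate is only that extra drop. The design is credible only if the two groups would have followed parallel trends absent the reform, which is the assumption a skeptical reader has to buy.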
But Moreira and Pérez didn’t see these kinds of improvements in customs collection. They found no evidence that the reform reduced the expenses of customs offices, or their number of employees. They also found no positive effect on revenue collected by the customs offices, and no improvement in “revenue per employee” (not surprising, given that the number of employees and the total revenue didn’t change).
Both papers touch on the seeming discrepancy between their findings. Moreira and Pérez emphasize that in the Customs Service, low-paid employees were exempted from the reforms, providing a way for the spoils system to persist within customs offices. That might explain the lack of improvements they found. Aneja and Xu argue that the value of the act for postal workers came from insulating them from political influence, and the risk of losing their job with a change in administration. The reforms to customs, with the carve-out for lower-paid employees, provided less insulation, and produced less benefit, in this reading.
Customs collection and postal delivery are, of course, only two elements of government performance, albeit elements that lend themselves easily to measurement. The positive reading is that these papers show that civil service reforms meaningfully increase state capacity, but that they only do so when they’re reasonably comprehensive (as they were for post offices). A more nuanced reading is that these reforms had heterogeneous effects: they might improve government performance at some things and not matter for others. I lean toward the latter view, but am open to persuasion.

Zooming out from the 1880s
The Pendleton Act is a nice case study: it affected the whole federal government, and it was far enough in the past that we can get reasonably long-run data on its importance and effects. But most of the history of states and state capacity took place outside of the 1880s, and it’s possible that other periods and contexts also hold valuable information. (I’m still keeping the focus on the US specifically for now, though international evidence on civil service reforms could also be relevant to the US case.)
We could look at lower-level bureaucracies in the US. A 2014 paper from the University of Houston’s Gergely Ujhelyi, who is responsible for an astonishing share of the economics literature on civil service issues, examines the introduction of civil service reforms at the state level. Ujhelyi’s results are fairly ambiguous, normatively speaking. States that introduce merit systems for hiring spend less at the state level, and instead transfer more funds to the city level. This looks a lot like evasion of the reform rather than real compliance: politicians move money to the part of government where they can still control hiring. Beyond just shifting spending, the reforms lead to less spending on big infrastructure projects like roads. “The policy implication is that reforming multiple levels of government simultaneously could be more desirable than a gradual approach that focuses on specific levels,” he concludes.
In a 2019 working paper, the Hertie School’s Arianna Ornaghi looked at civil service reforms in the US specifically affecting police departments between 1960 and 1980. Adoption of merit systems led, she finds, to an overall crime rate reduction of 45 percent (a pretty staggering effect size) [2]. Both the scale of the effect and the specifics surprised me: there’s no effect on violent crime rates, and the entire effect is driven by reductions in property crime. Despite this, there was no effect on clearance rates for property crime (which are abysmal: police almost never figure out who stole from you) but a meaningful improvement in violent crime clearances.
Ornaghi interprets the findings as driven by the reforms’ ability to limit politicians’ pressure on police. The effects were bigger in cities where police chiefs were also protected, which suggests that the potential for political interference warped incentives and worsened police performance. This isn’t necessarily a story about outright corruption, where politicians are protecting specific criminals, so much as an indication that police chiefs who want to please political bosses act in ways that hamper police effectiveness. Maybe a good analogy is research finding that independent central banks outperform politically controlled ones.
Most of the high-quality studies I found concerned these kinds of merit reforms. But of course that’s not the only way to examine changes in state capacity. One promising genre of study examines how outcomes vary based on how effective individual bureaucrats are. Francesco Decarolis, Leonardo Giuffrida, Elisabetta Iossa, Vincenzo Mollisi, and Giancarlo Spagnolo have a nice 2020 paper using unexpected deaths in the US federal government to measure the importance of individual bureaucratic competence independent of other factors.
They use a rich dataset, the Federal Employee Viewpoint Survey, which asks a huge sample of federal workers a slew of questions, including about the overall competence of their immediate working division. Competence measured this way drops unexpectedly when younger managers die suddenly, and because those deaths are more or less random, they let the researchers see what a sudden shock to competence does to the office’s performance. (These deaths might seem too rare for an analysis like this to work, but the team found 440 such deaths in the years they analyzed. The federal government is very large!)
Decarolis et al. conclude that a one standard deviation increase in bureaucratic competence causes a 23 percent reduction in the number of days of delays in executing a procurement contract, a 29 percent reduction in cost overruns, and a 52 percent reduction in the number of renegotiated contracts. If every federal office handling procurement were as competent as, say, NASA’s John Glenn Research Center near Cleveland (which the authors highlight as especially competent), then the government would save $2.6 billion and 841,000 days of delay each year.
Cost overruns are a nice dependent variable; they provide a convenient way to quantify just how much better state capacity saves taxpayers. Yale’s Zachary Liscow, Berkeley’s Cailin Slattery, and Columbia’s William Nober looked at the effect of government engineer quality on costs in state-level infrastructure projects. They find that these engineers pay for themselves many times over. Replacing an engineer at the 25th percentile of their quality measure with one at the 75th percentile leads to 14 percent lower costs per mile, they find: a savings representing more than three times the average salary for engineers.
We need better data, and better metrics
Put together, this literature makes a pretty compelling case that state capacity, or at least higher levels of competence in the public workforce, matters quite a bit for the quality of public services in a number of domains. In some of the literature, the mechanism for those competence improvements is clear (like passing civil service reform when the background context is a spoils system). In others it’s less obvious. We don’t have any simple way to make all highway engineers good.
Beyond that limitation, I’m struck by the sheer number of different outcome metrics being considered: cost per mile of highway construction; cost overruns for federal procurement; mail delivery accuracy; violent and property crime; revenue collection at ports. These are all context-specific and bespoke.
That’s fine so far as it goes, but limits our ability to evaluate broader reforms. There’s an active debate in the public management world over whether moving to “at-will” employment at the state level has had good outcomes (see Judge Glock and Renu Mukherjee for the “pro” case and Don Moynihan for the “con”). A big part of the debate is that we don’t have great outcome metrics. There are self-reports from employees and HR directors, which tell us something but are at best heuristics for the stuff we actually care about: lower crime, more services getting to people, lower-cost contracts.
It’s hard to improve at something unless you measure it, and right now we’re not really measuring state capacity. Something I’d be excited to see is more efforts to develop broad metrics for state capacity that take into account performance across a number of these areas. Imagine an index that, like Transparency International’s Corruption Perceptions Index, is a weighted average of several different factors: how good police are at solving crimes, how fast government benefits go out the door, how quickly major infrastructure projects are built, how much roads and subways cost per mile, etc. That would take a lot of doing, but could be an important cornerstone for future state capacity pushes.
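To make the weighted-average idea concrete, here is a rough sketch in Python of how such an index could be assembled. It is not a proposal for the real thing: the component names, raw values, worst and best bounds, and weights are all hypothetical placeholders, and choosing them well is exactly the hard part. Each component is first rescaled to a 0-to-1 score where 1 is best, so that measures in different units (days, dollars, percentages) can be averaged together.

```python
def min_max_normalize(value, worst, best):
    """Rescale a raw value to 0..1, where `worst` maps to 0 and `best` to 1.

    Works whether higher raw values are better (best > worst)
    or lower raw values are better (best < worst).
    """
    return (value - worst) / (best - worst)


def composite_index(scores, weights):
    """Weighted average of already-normalized component scores."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# Hypothetical components: (raw value, worst bound, best bound).
components = {
    "crime_clearance_rate": (0.4, 0.0, 1.0),     # share of crimes solved
    "benefit_delay_days": (30.0, 60.0, 0.0),     # fewer days is better
    "cost_per_mile_musd": (150.0, 200.0, 100.0), # lower cost is better
}
# Hypothetical weights; a real index would have to justify these.
weights = {
    "crime_clearance_rate": 1,
    "benefit_delay_days": 1,
    "cost_per_mile_musd": 2,
}

scores = {
    name: min_max_normalize(value, worst, best)
    for name, (value, worst, best) in components.items()
}
index = composite_index(scores, weights)
print(round(index, 3))
```

The weights carry a lot of the judgment here. Doubling the weight on construction costs, as this toy example does, is a statement that cost matters twice as much as either of the other components, and any published index would need to defend a choice like that.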
[1] A statistical technique for establishing cause and effect in cases like this, where we don’t have a clean experiment.
[2] Frankly I’m a little skeptical the true effect is this big, but Ornaghi’s methodology looks strong.


