
Imagine reviewing six million documents by hand. For help, these lawyers turned to AI
Published on June 11, 2025
Most legal systems form gradually, layer by layer, like sediment. New laws tend to build on old ones, rarely replacing them outright, which means outdated ideas often stick around long after they've lost relevance. Statutes get patched and stacked, but they’re rarely erased. A single bill might cite thirty others—and entire agencies still rely on language drafted decades ago, in eras with very different values. Over time, the legal system grows harder to navigate, riddled with inconsistencies and language that no longer fits the world it was originally meant to regulate.
Take this data point: The U.S. Code, the official collection of federal laws of the United States, runs over 24 million words long. Local and state governments pile on thousands of additional statutes, amendments, and rules each year, adding to the complexity. Buried in this layered legal morass are language and clauses that are outdated, contradictory, and in some cases, discriminatory and illegal. Sorting through it all has become a growing priority, driven by new state mandates and a broader recognition that leaving broken laws on the books can sow confusion and stall justice.
In response, governments are now testing a new strategy to use artificial intelligence as a kind of second reader. Large language models trained on the law are being deployed to spot loopholes and outdated provisions, as well as to search mountains of documents for rules that no longer serve any enforceable legal purpose. That brings us to an AI experiment in Northern California, where a team recently used AI to scan more than 86,000 historical property records—just a fraction of the county’s six million archived deeds. Reviewing the full set by hand would have taken human auditors an estimated 160 years.
Helping humans do better work
In 2021, California passed Assembly Bill 1466, a law that tasked county governments with removing racially restrictive covenants from property records. These covenants, clauses embedded in mid-century deeds that barred people of color from buying or occupying certain homes, haven’t been legally enforceable since the 1948 Supreme Court case Shelley v. Kraemer. But because they’re embedded alongside valid legal language and scattered across millions of scanned pages, they’ve quietly survived.
Reviewing six million documents by hand would have taken an estimated 160 years and cost more than 22 million dollars.
Santa Clara County had about six million of these older records on file. They’d already been scanned—but as images, not searchable text. The documents used inconsistent formats, sometimes included typos or handwriting, and spanned decades. That meant there was no easy way to run a keyword search or use off-the-shelf tools. Sorting through them would’ve required someone to open and read each file, one by one. Reviewing them by hand would have taken an estimated 160 years and cost more than 22 million dollars.
So the county turned to Stanford’s Regulation, Evaluation, and Governance Lab (or, RegLab), a research group that partners with public agencies to modernize outdated systems. “The scale of the manual review task was as apparent and daunting to them as it was to us,” says data scientist Faiz Surani. To move faster, the team built a custom tool using a fine-tuned version of Mistral 7B, an open-source language model developed by the Paris-based startup Mistral AI, founded in 2023 by former DeepMind and Meta researchers.
The team fine-tuned the model to teach it what to look for. They started with 1,500 confirmed examples of racial covenants, then added thousands of “clean” records that didn’t include problematic language. The idea was to build something that could eventually be used across the country, not just in California. “It was important to us that this system not just be something we can use in Santa Clara and not useful to anyone else,” Surani says. So they fed in documents from across the country to show the model how covenants could be worded in different places and eras. The initial results were mixed, partly because of the quality of the scans.
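In broad strokes, assembling that kind of fine-tuning set means pairing confirmed covenant passages with clean deed text under explicit labels. A minimal sketch of the idea follows; the field names, labels, and JSONL format here are illustrative assumptions, not details from the RegLab project:

```python
import json

def build_training_records(covenant_texts, clean_texts):
    """Combine confirmed covenant passages with 'clean' deed text
    into labeled records suitable for fine-tuning a classifier."""
    records = []
    for text in covenant_texts:
        records.append({"text": text, "label": "covenant"})
    for text in clean_texts:
        records.append({"text": text, "label": "clean"})
    return records

def to_jsonl(records):
    """Serialize one record per line, a common fine-tuning format."""
    return "\n".join(json.dumps(r) for r in records)

# Toy example: two confirmed covenants, three clean passages
covenants = [
    "no person not of the Caucasian race shall occupy said premises",
    "said lot shall never be sold or leased to any person of African descent",
]
clean = [
    "grantor hereby conveys to grantee the following described parcel",
    "subject to an easement for public utilities",
    "lot 4, block 2, as recorded in the county map book",
]
dataset = build_training_records(covenants, clean)
```

Mixing in negatives from many regions and eras, as the team did, is what lets the model generalize beyond one county's phrasing.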
As Surani tells it, some of the difficulty was trying to get the AI to read the documents in the first place. Most deeds were scanned from microfiche or old paper copies. Some were blurry or crooked. Others were typed on 1940s typewriters with fading ink. Off-the-shelf OCR (optical character recognition) tools struggled to interpret these messy scans, often producing garbled or incomplete text. So the team built a more robust system that could clean up and digitize the text accurately, improving the AI tool’s ability to read what was actually on the page.
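To give a flavor of what post-OCR cleanup involves, here is a deliberately simplified sketch: normalize whitespace and repair a few known character-level misreads. The substitution table is a made-up example, not the team's actual correction logic:

```python
import re

# Common OCR confusions in typewritten scans (illustrative pairs only)
SUBSTITUTIONS = {
    "tbe": "the",
    "sball": "shall",
    "prernises": "premises",
}

def clean_ocr_text(raw):
    """Collapse whitespace and repair a few known OCR misreads."""
    text = re.sub(r"\s+", " ", raw).strip()
    words = [SUBSTITUTIONS.get(w.lower(), w) for w in text.split(" ")]
    return " ".join(words)
```

A production pipeline would go much further, but even this kind of normalization makes keyword matching and model inputs far more reliable.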
In the Santa Clara RegLab project—fully detailed in this paper—the model processed 5.2 million pages in under six days using four university GPUs. If they’d run it on a commercial cloud service, the compute bill would have been under $300, Surani says. On a test set, it scored near-perfect results, with 100 percent precision and 99.4 percent recall—and that was important. Under AB 1466, a human attorney still has to verify each redaction, meaning false alarms cost both time and money.
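Those two metrics measure different failure modes: precision asks what fraction of flagged documents are real covenants (false alarms waste attorney time), while recall asks what fraction of real covenants got flagged (misses leave covenants on the books). A quick sketch, using counts invented to match the reported rates rather than the project's actual figures:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of flagged documents that are real covenants.
    Recall: share of real covenants that were flagged."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts: 994 covenants flagged correctly,
# 0 false alarms, 6 covenants missed
p, r = precision_recall(994, 0, 6)
```

With zero false positives, every document sent to an attorney is worth reviewing, which is exactly what AB 1466's mandatory human verification makes valuable.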
Having a lighter workload helped reviewers focus and improve the quality of their work, since the AI tool had already done the heavy lifting.
The model flagged 7,500 likely matches. A human still had to confirm each one, but now reviewers were working from a shorter list instead of drowning in boxes of deeds. Surani says this lighter workload helped reviewers focus and improve the quality of their work, since the AI tool had already done the heavy lifting of narrowing the pool—an outcome knowledge workers have reported with other AI tools as well. “When you have to read 10,000 documents a day, your eyes glaze over after a while,” says Surani. “Just taking that menial-ness out of it actually makes […] the people involved in the process able to do better quality work.”
Each time a document was confirmed, Santa Clara County produced a formal statement of redaction and attached it to the original deed. The team also geocoded the flagged documents to map out where racial covenants had been used and who wrote them. This gave the county a clearer view of which neighborhoods were shaped by exclusion.
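The mapping step amounts to tallying confirmed covenants by the area each geocoded deed falls in. A minimal sketch of that aggregation, with hypothetical record fields and neighborhood names standing in for the county's real data:

```python
from collections import Counter

def covenants_by_area(flagged_records):
    """Tally confirmed covenant deeds by the area each geocoded
    address falls in, to show which neighborhoods were affected."""
    return Counter(rec["area"] for rec in flagged_records)

# Illustrative records; "area" would come from a geocoding service
records = [
    {"deed_id": 1, "area": "Willow Glen"},
    {"deed_id": 2, "area": "Willow Glen"},
    {"deed_id": 3, "area": "Palo Alto"},
]
counts = covenants_by_area(records)
```

Even a simple tally like this, drawn on a map, turns a pile of redactions into a picture of where exclusion was concentrated.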
The pilot program more than proved its point. RegLab has since partnered with counties including San Francisco, Yolo, and Washoe in Nevada to expand the project. But the racial covenant cleanup was just one use case. Surani says the same infrastructure is now being used in San Francisco to dig through the city’s municipal code, flagging outdated or contradictory laws. Some are absurd, like one rule that gave public health officers the right to detain adulterers for venereal disease inspections. Others are quite serious and long out of step with current policy.
“Ultimately, it’s just a start,” says Surani. “Law accumulates.” RegLab’s bigger ambition is to give lawmakers and city attorneys a clearer view of the systems they’re working with. That could mean surfacing every permit fee a city charges, or identifying every outdated reporting rule buried in state code. “Our big question is, how do we use AI and how do we use language models to immediately provide that global picture?”
Surani says that the global picture still requires a person to interpret it. “People don’t necessarily want AI systems making decisions. People want humans in the loop. So how do we make those humans’ jobs easier, so they can focus on core judgment tasks instead of these kind of ridiculous, menial search and location tasks?”
Fixing the law before it breaks
While Stanford’s model was built to clean up the past, some governments are trying to bring AI into the lawmaking process itself in order to reduce confusion, avoid contradictions, and make new laws more understandable from the start.
“AI can serve as a human assistant with lawmaking,” says West, “but it should not replace humans.”
Darrell West, a senior fellow in governance studies at the Brookings Institution, a nonpartisan think tank focused on public policy, has tracked how public agencies adopt emerging technology—including the latest AI tools. “AI can serve as a human assistant with lawmaking,” he says, “but it should not replace humans. People still have much more advanced capabilities and more nuance in the way we think about things.”
In the UK, that approach is being tested with a tool called Lex. Built by a small team inside the Cabinet Office known as i.AI, it’s currently being used by civil servants in the Ministry of Justice and Government Legal Department. Lex allows staff to ask plain-language questions and returns direct answers pulled from official legal databases. It’s a way to navigate a heavily cross-referenced statute book without losing days to manual digging. Lex doesn’t write legislation; instead it helps a team see how a new bill interacts with existing law, flags possible contradictions, and avoids duplication before anything is tabled.
Chile is testing something even more ambitious. Its system, CAMINAR, uses AI to summarize bills, flag overlap with existing proposals, and help analyze quorum rules and legal admissibility, with a goal of clearing space for better decisions.
Still, while AI is proving a tremendous asset for lawmakers, West says that continued human oversight will be critical. “Lawmakers still have to be very careful about it,” he says. “With some things, if you have 99 percent accuracy, that may be perfectly fine. But with legal documents, you really need complete accuracy. And that requires a human.”