This article was written for Data First, Xcential’s monthly newsletter covering the modernization of lawmaking around the world.
What is Version Control?
Every writer has files somewhere named something like “GrantProposalDraft1-finaledited-reallyfinal,” which implies a whole sequence of versions that may or may not be saved elsewhere. Version control systems automate this process when working with computer code and keep a history of all work.
Version control is invaluable for any complex software project involving team collaboration. Consider the development of a large program like macOS. Hundreds of Apple employees in different departments, different buildings, and even different time zones simultaneously make updates, rewrite segments of code, and fix bugs. Without version control there is no way to tell who changed what and when. Changes slip through the cracks and inconsistencies compound in the code.
To a great extent, the private sector has solved this problem. If you work anywhere near software development you have likely heard of git, a free and open-source distributed version control system, and GitHub, the hosting service that lets you manage repositories.
Git was developed in 2005, and other automated version control systems pre-date it by a number of decades. But governments and legal systems have maintained their own ‘version control’ with labor-intensive processes to amend and update law.
GitHub for Government?
When it comes to version control for legislative documents, the main questions to address are relatively straightforward. What was the law? What is the law? But it might surprise you to know that for many laws, particularly those that were recently changed, there is no current official version and no way to see a precise history of amendments over time.
This is a big problem. In the Canadian Parliament, a lack of version control has led to embarrassing headlines such as “Senate debates wrong version of government bill for the second time in less than three months.”
Even apart from such snafus, all legislative bodies struggle with the need to quickly understand how a proposed bill would impact existing laws, or how an amendment would change a proposed bill. Even for experienced legislators, their staffs, and policy lawyers, comparing versions of bills and laws is an arduous, manual, expensive, time-consuming process.
So, why can’t the U.S. Congress, and other legislatures, just use git? Unfortunately, several features of the legislative process make version control designed for computer code a poor fit for legal documents.
- Amendments, not versions. Amendments to laws are not made as versions. Instead, you often get a single sentence with textual language like “strike,” “insert,” “remove,” or “repeal” that must be interpreted. Amendments typically explain how they would change existing laws using prose, not redlines.
- Acts, not repositories. Each law Congress passes is a new Act. New Acts are changed at many hierarchical levels by subsequent Acts. Congress does try to codify all Acts into the United States Code by passing “codification bills,” but this project is decades behind, which means many Acts have not been incorporated into the Code yet, and there is no comprehensive repository of U.S. federal law. To a programmer, it is as if each new commit could change any number of repositories in arbitrary ways.
- Standard Diff doesn’t work. A standard diff shows the changes between two versions of the same file. Matching “the same” section of an amended law, by contrast, requires semantic judgement, which makes it difficult to group changes.
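The last point can be illustrated with a small sketch. It assumes a made-up statute in which an amendment inserts a new subsection and redesignates the ones that follow; the section text and the helper names (`strip_designation`, `changed_lines`) are hypothetical and are not taken from any real law or from Xcential’s software. A plain line-based diff reports every redesignated line as changed, while a comparison that first sets the designations aside reports the single substantive insertion.

```python
import difflib

# Hypothetical statute text before and after an amendment that inserts
# a new subsection (b) and redesignates the subsections that follow.
before = [
    "(a) Definitions.",
    "(b) Eligibility.",
    "(c) Penalties.",
]
after = [
    "(a) Definitions.",
    "(b) Reporting.",
    "(c) Eligibility.",
    "(d) Penalties.",
]

def strip_designation(line):
    """Drop the leading "(x) " designation so lines are matched by content."""
    return line.split(") ", 1)[1]

def changed_lines(old, new):
    """Return (removed, added) lines as seen by a standard sequence diff."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return removed, added

# Line-based diff: the redesignated subsections look deleted and re-added.
naive_removed, naive_added = changed_lines(before, after)

# Designation-aware diff: only the new subsection shows up.
aware_removed, aware_added = changed_lines(
    [strip_designation(line) for line in before],
    [strip_designation(line) for line in after],
)

print(len(naive_removed), len(naive_added))  # 2 3
print(aware_removed, aware_added)            # [] ['Reporting.']
```

Stripping designations is only one crude stand-in for the semantic judgement described above; real legislative text needs far more than this to decide which provisions are “the same.”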
The difficulty of applying version control systems to law is compounded when you consider the different types of changes and materials attached to legislative documents, including amendments, hearings, floor speeches, testimony, votes, conference committees, effective dates, regulatory implications, and countless other details that need to be tracked.
Bridging the divide between modern software practice (version control, git, and coding) and the esoteric traditions and processes of law in the United States requires experts with broad and deep competence in both fields. Those experts, who could be called “lawyers with GitHub accounts” for short, are part of a small but growing community.
Modernization in the House
In June 2020, the Clerk of the U.S. House of Representatives issued an initial report on a rule change enacted at the start of the 115th Congress, commonly called the Comparative Print Project. Despite the unfortunate use of “print” in its working title, the report states that “the project will result in a robust, scalable, and secure web application.”
The scope of the Comparative Print Project calls for two distinct types of comparison at various points in the legislative process.
Clause 12(a) calls for a document that illustrates changes and differences made by a legislative proposal to current law. How does H.R. 123 change the Social Security Act (non-codified law) and 38 USC 321 (positive or codified law)?
Clause 12(b) calls for a document-to-document comparison between different versions of bill language. How does the Rules Committee Print differ from the bill reported by the committee?
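A document-to-document comparison of the kind Clause 12(b) describes can be approximated, very roughly, with a word-level redline. The sketch below uses Python’s standard `difflib`; the two bill sentences and the `[-…-]` / `{+…+}` markers are invented for illustration and say nothing about how the House application renders its comparisons.

```python
import difflib

def redline(old_text, new_text):
    """Return a word-level redline: [-struck-] and {+inserted+} spans."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("replace", "delete"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("replace", "insert"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)

# Hypothetical text of the same provision in two versions of a bill.
reported = "The Secretary shall submit a report not later than 90 days after enactment."
rules_print = "The Secretary shall submit a report not later than 180 days after enactment."

result = redline(reported, rules_print)
print(result)
# The Secretary shall submit a report not later than [-90-] {+180+} days after enactment.
```

This works because both inputs are versions of the same text. The Clause 12(a) comparison is harder, since the bill first has to be interpreted and applied to current law before there are two versions to compare.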
Legal and Tech Experts with Proven Solutions
In August 2018, a contract was awarded to Xcential Legislative Technologies to build document comparison software for the U.S. House that can track changes in law and will ultimately be able to show what the law was at any point in time.
Xcential got its start by building the system that the California Legislature now uses to write, update, and amend laws. Today, Xcential’s largest project is in the U.S. Congress, working on the House Modernization Project, which touches many different aspects of the legislative workflow. Xcential designed an open standard XML for legislation called United States Legislative Markup (USLM) and converted the entire U.S. Code into USLM, paving the way for version control for law.
Solving Version Control for Law
Xcential addressed the challenges of version control for the law with three central solutions: machine-readable amendments, machine-readable legal citations, and the creation of a legally-relevant diff.
The House Clerk’s report explains that Xcential’s team “compiled a current law dataset stored in a custom repository solution and developed natural language processors [NLP] to do the work of recognizing, interpreting, retrieving, and executing the amendatory language contained in the legislative proposal.”
In order to do this, Sela Mador-Haim, an NLP expert at Xcential, took hundreds of thousands of amendatory phrases, deciphered the grammar and semantics of those phrases, and put them into a machine-readable format. That effort enables the translation of legal documents produced by the U.S. Congress into a format that is machine-readable.
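To make “machine-readable amendatory phrases” concrete, here is a deliberately tiny sketch. It is not Xcential’s NLP: it is a single regular expression that recognizes one phrasing (“is amended by striking … and inserting …”), and the `Amendment` record, the `PATTERN`, and the sample sentence are all assumptions made for illustration. Real amendatory language varies far more than one pattern can capture, which is why the work described above involved hundreds of thousands of phrases.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Amendment:
    action: str           # "replace" or "delete"
    target: str           # citation text naming the provision to change
    old: str              # text to strike
    new: Optional[str]    # text to insert, if any

# Hypothetical pattern covering a single common phrasing only.
PATTERN = re.compile(
    r'^(?P<target>.+?) is amended by striking "(?P<old>[^"]+)"'
    r'(?: and inserting "(?P<new>[^"]+)")?\.?$'
)

def parse_amendment(sentence: str) -> Optional[Amendment]:
    """Turn one amendatory sentence into a structured record, or None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    action = "replace" if match.group("new") else "delete"
    return Amendment(action, match.group("target"), match.group("old"), match.group("new"))

parsed = parse_amendment(
    'Section 2(a) of the Example Act is amended by striking "90 days" and inserting "180 days".'
)
print(parsed)
```

For the sample sentence, the record carries the action `replace`, the target `Section 2(a) of the Example Act`, and the old and new text. A sentence the pattern does not recognize returns `None` rather than a guess.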
The same machine-readable translation process was then completed for legal citations. Machine-processable citations effectively provide a query language: Xcential can go into the database and resolve each reference in a legislative document to a precise address in the law. Combining machine-readable amendatory phrases and legal citations gives us an address where the change is to be made and a language that describes the change, resulting in machine-executable instructions.
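The idea of “an address plus a description of the change” can be sketched as follows. Everything here is hypothetical: the nested-dictionary `LAW` repository, the citation format handled by `CITATION`, and the instruction layout are stand-ins chosen for brevity, not the custom repository or query language the report describes.

```python
import copy
import re

# Hypothetical miniature "current law" repository: act -> section -> subsection.
LAW = {
    "Example Act": {
        "2": {
            "a": "The Secretary shall submit a report not later than 90 days after enactment.",
            "b": "The report shall be made public.",
        }
    }
}

# Assumed citation shape: "Section 2(a) of the Example Act".
CITATION = re.compile(r"^Section (?P<section>\d+)\((?P<sub>[a-z])\) of the (?P<act>.+)$")

def resolve(citation):
    """Turn a citation string into an address: (act, section, subsection)."""
    match = CITATION.match(citation)
    if match is None:
        raise ValueError(f"unrecognized citation: {citation!r}")
    return match.group("act"), match.group("section"), match.group("sub")

def execute(law, instruction):
    """Apply one strike-and-insert instruction and return an amended copy."""
    act, section, sub = resolve(instruction["target"])
    text = law[act][section][sub]
    # Assumption: refuse to act unless the struck text appears exactly once.
    if text.count(instruction["strike"]) != 1:
        raise ValueError("text to strike must appear exactly once")
    amended = copy.deepcopy(law)
    amended[act][section][sub] = text.replace(instruction["strike"], instruction["insert"])
    return amended

instruction = {
    "target": "Section 2(a) of the Example Act",
    "strike": "90 days",
    "insert": "180 days",
}
amended = execute(LAW, instruction)
print(amended["Example Act"]["2"]["a"])
```

Returning an amended copy instead of editing `LAW` in place keeps both the “before” and “after” states available, which is what a later comparison step would need.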
The goal is to create a legally-relevant diff. The challenge inherent in doing so is not simply identifying revised portions of legislative text, but understanding what is legally relevant to the drafter.
In particular, the goal of version control for the law, set forth in the Comparative Print Project by the House Clerk and Legislative Counsel, is to illustrate changes between the following:
- Two versions of a bill, resolution, or amendment (document to document comparisons).
- Current law and current law as proposed to be changed by amendments contained in a bill, resolution, or amendment to current law (codified and non-codified law).
- A bill or resolution and the bill or resolution as proposed to be modified by amendments (amendment impact).
According to the House Clerk’s report, Xcential’s NLP tool is currently “performing very well and with a high degree of accuracy.” The report offered the following figure indicating the solution’s success.
Conclusion
Version control for law is neither as simple as coders imagine, nor as complex as lawyers would make it. While the focus of the project described above was on federal law in the United States, it contains lessons that can be applied everywhere from your local city council up to national jurisdictions around the world.