Article

The (Supposed) Limitations of XML

September 5, 2016

It’s been a while since I updated my blog – a whole year in fact. The reason is that I’ve been hard at work finishing our web-based XML editor, LegisPro, supporting our projects with the U.S. House, while simultaneously developing an Akoma Ntoso-based implementation for the U.K. and Scottish Parliaments. The challenge has been all-consuming.

Next week I will be giving a couple of talks with Matt Lynch of the Scottish Parliament at the LEX Summer School 2016 in Ravenna, Italy and then, the following week, by myself at NALIT 2016 in Indianapolis. My company, Xcential, also intend to show glimpses at our booth the Data Transparency Conference in Washington D.C. on the 28^th September. We’ve got a busy month ahead of us.

Recently the tired old question of whether a legislative drafting system is best built on a word processor or using true XML technology was raised yet again. (No, the Open Document Format (ODF) and Office Open XML don’t make word processors into XML editors.) To me, and everyone I interact with, the answer is quite clear and was settled a decade ago – XML is the way to go. The reason is simple. XML provides a long-lasting data format that can be used to build a comprehensive solution that enables all of the required automation features for legislative drafting. On the other hand, shoe-horning a legislative drafting application into a word processor, never designed for this type of an application, results in too many compromises.

In reliving the debates from 10 years ago, I stumbled across a competitor’s white paper on the subject. While they still promote the white paper, its content is quite dated. It makes the case for the word processor approach rather than XML. I read, with some amusement (or was it irritation), all the perceived shortfalls of XML.

I thought it would be fun to take a look at each of these supposed problems with XML, and provide a counterpoint to each of them. To be fair, this paper was written several years ago and technology doesn’t stand still. So, here goes:

Point 1: Legislative content and presentation cannot be separated

There is a thread of truth to this. Because so much of the amending process is based around the page and line number paradigm for referring to locations, it is essential that there be a robust and precise means for referring to any part of the document right down to a specific word. However, this is the entire requirement – there is no further need for the presentation to be tied directly to the content. Legislatures including California and the U.S. Congress, have used markup technologies for many decades now, long before the advent of XML (or even SGML). If the requirements mandated that content and presentation not be separated, none of these solutions would have been viable. So let’s consider the specific issue – how to tie page and line numbers with the content. Superficially, a word processor does this with an intrinsic page and line numbering capability. However, you quickly discover a problem – legislation requires page and line numbers fixed to locations in the last official publication. The dynamic recalculating nature of the intrinsic page and line number capability of a word processor renders the capability useless. Instead, the classic way to address this requirement is to produce a separate rendition of the document using a hidden tabular format, one row per each published line and a column for line numbers and a column for content. However, this creates a huge problem – you now have two copies of the legislation, one organized by the document structure and one organized by physical layout. Getting between these two representations precisely becomes a troubling problem to deal with. For XML, this was also a challenge until we came up with a very clean and workable solution almost a decade ago. Now, when we publish the document PDF, we back-annotate unobtrusive markers into the XML. These markers are used to arrange the editor presentation as well as to drive the amendment engine. This works out very nicely. We have implemented this technique several times now with great success.

Point 2: Temporal relationships must be preserved

This one made me laugh. For years, we’ve pointed to issues like this as reasons to go to XML rather than to avoid it. The argument made in the white paper is that XML provides no facilities to model the temporal relationships that are necessary when making citations or establishing other relationships that exist in legislation. While this is true, it’s also quite misleading. To expect XML to intrinsically provide this facility is to completely fail to understand the role of XML. In fact, word processors have no intrinsic capability to solve this problem either – it’s something that has to be built.

We’ve been addressing this problem since our beginning in 2001 using web-based references or URIs and a clever middle-tier technology we call a resolver that interprets the temporal or versioning aspects of the citation or reference. This problem was solved long ago.

Point 3: Permanence is required

The argument here is confusing. It is totally true that there is an unbendable requirement that legislation be preserved in a form known to last forever (or at least for many centuries). It’s also totally true that there is no digital technology with the proven permanence of paper or vellum. However, this is an argument about the medium used to preserve the content. It’s not an argument about the type of format that should be used to create the content – unless there is an argument that we should give up on digital technologies and return to paper, scissors, and glue. A physical document can be produced for archival purposes regardless of the technology used to draft it.

Personally, it you were to ask me, the logical document to archive would be a vellum printout of the XML. That way, you would have a much easier task of restoring the document at some future date should some catastrophe result in the loss of the digital record. However, I don’t think that’s a likely decision that anyone will make anytime soon. John?😉

Point 4: Work-in-progress is structurally broken by XML’s rules

This has long been the primary argument against XML editors. Their rigidity in enforcing the rules often gets in the way, especially in the early part of the drafting cycle when the ideas are still fluid. Most XML editors just aren’t designed for this type of work.

This limitation has been one that I’ve focused considerable effort at overcoming for years. Our most recent efforts have made tremendous strides. We’ve tackled this problem in two ways:

First, using the modern DOM Range constructs that are inherent in modern browsers, we’ve been able to loosen up the selection model considerably so that it closely matches selection in a word processor. Using sophisticated programming of the DOM and this range mechanism, we able to match much of the loose editing offered by a word processor.

Second, we go beyond the word processor by allowing the structure to be removed entirely and allowing the words to be rearranged entirely unencumbered by any structure at all (think text editor). Once the words are rearranged and the user is ready to move on, we automatically recreate the structure for the user. It’s a great way for drafters to get their ideas down without worrying about detail. Turns out, it’s also a great way to import foreign content – a nice bonus.

Point 5: XML document validation is insufficient

This is another one that made me laugh. As before, it’s totally true and entirely misleading. XML validation is not intended to be the be-all and end-all of verifying a document. It would be quite remarkable if an XML schema could do that. Curiously, a word processor offers nothing whatsoever in this regard – it must all be custom created.

When it comes to the subject of verifying a document, we use two terms – validation and verification. Validation is the process of ensuring that the document’s XML content adheres to the content model prescribed by the schema. We call this the “outer envelope” of checks. The “inner envelope” is to verify that the document adheres to the jurisdiction’s internal business rules. While off-the-shelf technologies exist to perform the outer XML validation, this inner verification step requires custom software. We’ve built a configuration mechanism that allows us to configure a “model” that our existing software can use for verification rather than building this from scratch each time.

Point 6: There is no common language around which to develop a standard

This one is perhaps the most annoying. To be fair, the white paper is not current and could do with an update – although doing that might undermine its use as a marketing tool. Again, as before, there is some truth to the assertion, but to argue that the differences between jurisdictions disqualifies XML is to see the glass as half empty rather than half full. I’ve been in this field for 15 years in November and have worked on legislative systems on four continents. What’s surprised me is how similar they are rather than how different. Fundamentally, the process of making law is the same almost everywhere. It’s in the details that the differences lie. Yes, in California a resolution is a type of document while in the UK, a resolution is a type of section in a specific type of statutory instrument, but that’s a detail that doesn’t get in the way at all.

There is one point in this area that the white paper makes that is particularly annoying. XML is characterized as a generic model for representing data. That’s only half the story. Everybody in this field knows that there are two very different models that XML serves well – representing data and representing documents. XML, as a derivative from SGML, has stronger origins in the representation of documents than of data. So why are XML’s strengths in representing a document so casually ignored? Seems a little self-serving to me. JSON, on the other hand is an excellent generic model for representing moderate amounts of data, but a terrible model for representing documents.

The entire argument here can now be refuted as we now have Akoma Ntoso. It’s an XML schema that was initially designed by the University of Bologna at the request of the United Nations. Today, it’s on the verge of becoming an OASIS standard. Akoma Ntoso understands that there is no on-size-fits-all solution to legislative XML. It does this by providing a basic set of constructs that are generally found everywhere, a mechanism to create custom constructs, and an overarching design for how to model the hierarchy of legislation.

The implementation I will be showing in Ravenna, Indianapolis, and Washington D.C. in the coming weeks will demonstrate just how a general-purpose XML model such as Akoma Ntoso for legislation can be applied to the specific needs of a pair of jurisdictions – and a pretty challenging pair of jurisdictions at that.

For me, XML is the easy winner. With XML you design the document to exactly fit the needs of a jurisdiction and then shape the tools to work with this model. With a word processor, you shoe-horn the needs of the jurisdiction into the limited flexibility offered by a word processor’s intrinsic model and then spend all your time trying to handle the mismatches between what the word processor was designed to do and that the customer wants. Either way it’s challenging, but with a word processor, much more so.