After my last blog post I received a lot of feedback. Thanks to everyone who contacted me with questions and comments. Given all the interest, I think I will devote a few more blog posts to the subject of legal references. It is quite possibly the most important subject that needs to be tackled anyway. (And yes, Harlan, I will try and blog more often.)
Many of the questions I received asked how I envision the resolver working. I thought I would dive into this aspect some more by defining the role of the resolver:
The role of a reference resolver is to receive a reference to a document or a fragment thereof and to do whatever it takes to resolve it, returning the requested data to the requestor.
That definition describes the role of a resolver in pretty broad terms. Let’s break the role down into some discrete functions:
- Simple Redirection – Perhaps the most basic service to provide will be that of a reference redirector. This service will convert a standardized virtual reference into a non-standard URL that is understood by a proprietary repository available elsewhere on the web that can supply the data for the request. The redirection service allows a legacy repository to provide access to documents following its own proprietary referencing mechanism without having to adopt the standard referencing nomenclature. In this case, the reference redirector will serve as a front to the legacy repository, mapping the standard references into non-standard ones.
- Reference Canonicalization – There are often a number of different ways in which a reference to a legal document can be composed. This is partly because the manner in which legal documents are typically structured sometimes encourages both a flat and a hierarchical view of the same data. For instance, one tends to think of sections in a flat model because sections are usually sequentially numbered. Often, however, those sections are arranged in a hierarchical structure, which allows an alternate hierarchical model to also be valid. Another reason for alternate references is the simple fact that there are all sorts of different ways of abbreviating the same thing – and it is impossible to get everyone around the world to standardize on abbreviations. So “section1”, “sec1”, “s1”, and the even more exotic “§1” need to be treated synonymously. Also, let’s not forget about time. The requestor might be interested in the law as it existed on a particular date. The resulting reference then starts to be more of a document query than a document identifier. For instance, imagine a version of a section that became operational January 1, 2013. A request for the section that was in operation on February 1, 2013 will return that January 1 version if that version was still in operation on February 1, even though the operational date of the version is not February 1. (Akoma Ntoso calls the query case a virtual expression and differentiates it from the case where the date is part of the identifier.)
The canonicalization service will take any reference, perhaps vague or malformed, and will return one or more standardized references that precisely represent the documents that could be identified by the original reference – possibly along with a measure of confidence. I would imagine that official data services, providing authoritative legal documents, will most likely provide the canonicalization service.
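To make the canonicalization idea a little more concrete, here is a rough Python sketch of two pieces of it: folding the abbreviation variants into one form, and picking the version of a section that was in operation on a requested date. Everything in it is an assumption made for illustration: the `sec_N` output form, the list of recognized abbreviations, and the `Version` record are hypothetical and are not taken from any particular standard or service.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Hypothetical list of synonyms for "section". A real service would need a
# much larger, jurisdiction-specific table.
_SECTION_PATTERN = re.compile(r"^(?:section|sec|s|§)\.?\s*(\d+[a-z]?)$", re.IGNORECASE)


def canonicalize_section(token: str) -> Optional[str]:
    """Map variants such as "section1", "sec1", "s1" or "§1" to one form.

    Returns None when the token is not recognized as a section reference.
    The "sec_N" output form is an assumption made for this sketch.
    """
    match = _SECTION_PATTERN.match(token.strip())
    if match is None:
        return None
    return f"sec_{match.group(1).lower()}"


@dataclass(frozen=True)
class Version:
    """One version of a section (hypothetical record for this sketch)."""
    operational_from: date
    identifier: str


def version_in_operation(versions: Iterable[Version], on: date) -> Optional[Version]:
    """Return the latest version whose operational date is on or before `on`."""
    candidates = [v for v in versions if v.operational_from <= on]
    return max(candidates, key=lambda v: v.operational_from, default=None)
```

With versions that became operational on January 1, 2012 and January 1, 2013, a request dated February 1, 2013 selects the January 1, 2013 version, which is the query behavior described above. The sketch ignores repeals and the measure of confidence, both of which a real service would have to handle.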
- Repository Service – A legal library might provide both access to a document repository and an accompanying resolution service through which to access the repository. When this is the case, the resolver acts as an HTTP interface to the library, converting a virtual URL to an address of sorts in the document repository. This could simply involve converting the URL to a file path or it could involve something more exotic, requiring document extraction from a database or something similar.
There are two separate use cases I can think of for the repository. The basic case is the repository as a read-only library. In this case, references are simply resolved, returning documents or fragments as requested. The second case is somewhat more complex and will exist within organizations tasked with developing legal resources – such as the organizations that draft legislation within the government. In this case, a more sophisticated read/write mechanism will require the resolver to work with technologies such as WebDAV that front for the database. This is a more advanced version of the solution we developed for use internally by the State of California.
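For the simple read-only case, converting the URL to a file path might look something like the sketch below. The repository root, the `.xml` suffix, and the reference layout are all assumptions of mine for the sake of the example; the one real point it makes is that the resolver should refuse references that try to climb out of the repository.

```python
from pathlib import PurePosixPath
from typing import Optional


def reference_to_path(root: PurePosixPath, reference: str,
                      suffix: str = ".xml") -> Optional[PurePosixPath]:
    """Map a virtual reference such as "/us-ca/statute/sec_1" to a file path.

    Returns None for empty references and for references containing "." or
    ".." segments, so a request cannot escape the repository root.
    The layout and the suffix are hypothetical.
    """
    parts = [part for part in reference.split("/") if part]
    if not parts or any(part in (".", "..") for part in parts):
        return None
    # Append the suffix to the last segment and join everything under the root.
    return root.joinpath(*parts[:-1], parts[-1] + suffix)
```

A database-backed repository would replace the path construction with a lookup, but the shape of the function (reference in, location or nothing out) would stay much the same.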
- Resolver Routing – The most complex aspect, and perhaps the most difficult to achieve, will be resolver routing. There is never going to exist a single resolver that can resolve every single legal reference in the world. There are simply too many jurisdictions to cover – in every country, state/province, county/parish, city/town, and every other body that produces legal documents. What if, instead, there was a way for resolvers to work together to return the document requested? While a resolver might handle some subset of all the references it receives on its own, for the cases it doesn’t know about, it might have some means to negotiate or pass on the request to other resolvers it knows about in order to return the requested data.
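As a thought experiment, simple redirection and resolver routing can be sketched together in a few lines of Python. Here a resolver is just a function that takes a reference and returns a URL, or `None` when it does not know the reference. The reference prefixes, the `example.org` URLs, and the function names are all hypothetical, and a real protocol would need discovery, network calls, and protection against resolvers passing a request around in circles, none of which is shown.

```python
from typing import Callable, Optional, Sequence
from urllib.parse import quote

# A resolver takes a standardized reference and returns a URL, or None when
# the reference is outside what it knows about.
Resolver = Callable[[str], Optional[str]]


def make_redirector(prefix: str, url_template: str) -> Resolver:
    """Front a legacy repository: map standard references to its own URLs.

    `url_template` must contain a "{path}" placeholder (an assumption of
    this sketch, not part of any standard).
    """
    def resolve(reference: str) -> Optional[str]:
        if not reference.startswith(prefix):
            return None
        remainder = reference[len(prefix):]
        return url_template.format(path=quote(remainder))
    return resolve


def make_router(local: Resolver, peers: Sequence[Resolver]) -> Resolver:
    """Try the local resolver first, then pass the request on to known peers."""
    def resolve(reference: str) -> Optional[str]:
        for resolver in (local, *peers):
            result = resolver(reference)
            if result is not None:
                return result
        return None
    return resolve


# Usage: one local redirector plus one peer it knows about.
california = make_redirector("/us-ca/", "https://legacy.example.org/lookup?path={path}")
new_york = make_redirector("/us-ny/", "https://ny.example.org/doc/{path}")
router = make_router(california, [new_york])
```

In this arrangement the requestor only ever talks to `router`, and whether the answer came from the local resolver or from a peer is invisible to it.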
Not all resolvers will necessarily provide all the functions listed. How resolvers are discovered, how they reveal the functions they support, and how resolvers are tied together are all topics which will take efforts far larger than my simple blog to work out. But just imagine how many problems could be resolved if we could implement a resolving protocol that would allow legal references around the world to be resolved in a uniform way.
In my next blog post, I’m going to return to the reference itself and take a look at the various referencing mechanisms and services I have discovered in recent weeks. Some of the services implement some of the functions I have described above. I also want to discuss the difference between an absolute reference (including the domain name) and a relative reference (omitting the domain name) and why it is important that references stored in the document be relative.