We had the privilege of interviewing Hugh Halpern, who is the twenty-eighth person since 1861 to lead the U.S. Government Publishing Office, formerly known as the Government Printing Office. Director Halpern has brought a Data First approach to modernizing the GPO’s management and publication of both legislative and regulatory information. The interview was conducted by Hudson Hollister, founder and CEO of HData, and former president of the Data Coalition.
Today we’re honored to have Hugh Halpern, director of the Government Publishing Office, chief executive publisher of the United States government, for an interview. Hugh, thank you so much for joining us.
Thanks for having me.
We’d love to start out by asking you about your experiences on Capitol Hill, as a user of legislative information and legislative materials published by the GPO.
I spent about 30-odd years on Capitol Hill starting, frankly, as my local Congressman’s driver, and working my way up to the point where I ran the House floor for Speaker Paul Ryan.
And in the middle, I was both a user and producer of legislative information, doing a lot of legislative research for my work for about half a dozen different committees, but also producing legislation and committee reports.
Throughout that process, I developed an interest, and, as circumstances often dictated, I needed to understand the legislative document production process … I wanted to gain more control over the documents produced by the committees I worked for, and through that learned to use the same software that the GPO is using to produce a lot of documents today.
That software is very long in the tooth, and is due for a significant overhaul. And now as Director of the GPO I’m really happy to be here to help shepherd that process through.
What were some of the first inklings that you had that better technology in the drafting and the management of legislative information might be a good thing, either for the internal users in Congress, or ultimately for constituents and the public?
Let’s talk about how GPO and, eventually, Congress moved into the digital age.
If you go way back, bills were produced on special typewriters and then copies sent over to GPO, and then they would set that type by hand or using a linotype.
When we hit the digital age in the early 1980s, the GPO developed its own digital composition system, based on a typesetting language which was also proprietary to GPO. It was all designed to put an image on substrate and put words on paper … the typesetting language described how things looked.
That’s not too different than all of the early word processors, [including] the current word processors we all use: something’s bold, or something’s italic, or it’s small caps.
Fast forward: the GPO used this system, and it was adopted by both the House and Senate legislative counsels to improve their drafting of legislation for Members and staff. But it was very opaque, and it’s very hard to use.
When I learned how to use this system, it was a pain … It’s basically putting code in text files and hoping that you don’t have errors to get the output you’re looking for. And that output was really limited so bills all looked the same way, and you [could] get the PDF that makes it look that same way.
Committee reports, too, were designed to look a certain way. God forbid you should try and do something different, like put in a chart or a graph or an image. That was really complex stuff to do with that system. It still is, we still use it today!
In the late 1990s and early 2000s, Congress shifted its editor for legislation and a lot of other documents to use an XML base. That was the first time we really started describing legislative elements as what they were, rather than how they looked.
That [first XML-based] system has morphed into our current standard: USLM, which is the United States Legislative Markup. USLM is really going to be the foundation of both legislative and eventually, probably some regulatory documents in the future. We are excited for that!
But the current problem is: while Congress’ XML-based editor changed the process for producing output [in Congress], that process that occurs when Congress hits Ctrl-P didn’t change. [The GPO’s typesetting language is] still [embedded in] our composition software.
So … everybody realized that this typesetting system, developed in the early 1980s, didn’t play well with modern operating systems. It was essentially held together with baling wire and bandages to make sure that it could work in Windows 7 or Windows 10 now.
So there was always a project at the GPO to try and replace that system. And it didn’t matter when I asked what its status was: it was always two years away.
Well, the good news is we’ve actually made some real progress. This replacement system is called XPub, and it is our next-generation composition engine. It really holds a lot of promise for where we can go in the future and will revolutionize our workflow here at GPO and our customers’ workflows as well.
I would love to dig more into XPub, Hugh. I would like to put that in the context of the work that you’ve done, when you came to lead the Government Publishing Office and thought about the different challenges and management challenges that you were going to undertake in that executive role. What portion of your brain space did this technological infrastructure change occupy?
For me, it was really important and one of the critical changes we’ve got to make as an organization.
GPO has a lot of talent. We’ve got everybody working for GPO from artisanal bookbinders on one end to software developers on the other. It’s an incredibly diverse, incredibly talented group of people.
But sometimes what I’ve found is there is a little bit of a lack of imagination, a bit of “we’ve always done it this way.” We always expect our customers to expect the same thing, so we just have to keep doing the same thing.
But now, we’re at an inflection point where several different technologies are coming together. I’ve talked about XPub, our new composition engine and the software behind that–really, really important.
But the other technological piece that’s important is in a project that was started before I got here. The GPO was shifting away from old-style web presses with an offset printing process to new, digital inkjet presses.
The thing about offset printing is it requires a large investment in what we call the pre-press process. You’ve got to make metal plates that the ink sticks to, and that’s what gets transferred to the paper. Because that process is so labor intensive and expensive, you try and minimize your costs as much as possible. So for instance, on a committee report, that format was designed to fit the maximum number of pages we could across one of those metal plates.
When you shift to a digital inkjet press, you’re not going to necessarily get the same super-duper high quality that you get out of an offset press. But what you do get is a lot more flexibility.
So if you think of those digital presses like really large office copiers, the cost per copy is roughly the same. In an offset world, you’re going to pay tens of thousands of dollars for that first copy, and then pennies for each additional copy. Whereas [for a digital press] that cost is spread a lot more across every copy of a document.
Doing short runs is actually a lot easier if you only need 200 or 500 copies of a particular document. That’s a lot easier to do in a digital environment.
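As a rough illustration of the cost trade-off described above, the sketch below compares a press with a large fixed setup cost and a tiny per-copy cost (offset) against a press with no setup cost and a flat per-copy cost (digital). Every dollar figure here is a hypothetical placeholder chosen only to make the arithmetic visible; none of them are GPO numbers, and the function names are invented for this example.

```python
import math

# Hypothetical figures for illustration only; these are NOT actual GPO costs.
OFFSET_SETUP = 20_000.00   # assumed pre-press cost (plates, make-ready)
OFFSET_PER_COPY = 0.05     # assumed marginal cost per copy on an offset press
DIGITAL_PER_COPY = 2.00    # assumed flat cost per copy on a digital press


def offset_cost(copies: int) -> float:
    """Total cost of an offset run: fixed setup plus a small per-copy cost."""
    return OFFSET_SETUP + OFFSET_PER_COPY * copies


def digital_cost(copies: int) -> float:
    """Total cost of a digital run: the same per-copy cost for every copy."""
    return DIGITAL_PER_COPY * copies


def break_even_copies() -> int:
    """Smallest run length at which offset is no more expensive than digital."""
    return math.ceil(OFFSET_SETUP / (DIGITAL_PER_COPY - OFFSET_PER_COPY))


if __name__ == "__main__":
    for copies in (200, 500, 50_000):
        print(copies, round(offset_cost(copies), 2), round(digital_cost(copies), 2))
    print("break-even run length:", break_even_copies())
```

Under these assumed numbers, a run of 200 or 500 copies is far cheaper on the digital press, and offset only wins once the run is long enough to absorb the setup cost.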
So you combine XPub, which has a lot more flexibility, with these new digital presses, that are also capable of color and other things that our old software wouldn’t support. You can go to our customer–one of our most important customers is Congress–and say: hey, we have a bunch of new capabilities, maybe now is the time to take a look at your documents and think about how you want them to look in the 21st century.
So you’re no longer confined to weird, small paper sizes. If you want something on letter size paper, you can do that. Or if you want something that incorporates color elements, or you want images, or charts or whatever, that’s a lot easier to produce.
And, [as I said], the one thing that I really pushed for once I got here, was trying to ease the authoring part of this equation to make content creation a little bit easier. Everything has been designed to use specialized editors up to this point. And my point to our team was: we need to meet our customers where they are–we need to get into the business of producing Word templates, because that’s the primary system in use on the Hill, or, frankly, any other editor that folks want to use.
If we build those templates for our customers and say, use these templates with these styles, it’s really easy for us to ingest that into XPub and deliver great results. That’s going to be a huge leap forward.
Our goal is to be able to say to a committee, here’s your Word template, use this to produce your committee report, and put in those images, put in those charts, whatever tables, whatever. And then you can upload that to a GPO website and we will spit back out to you a PDF that’s ready for the presses, plus good XML structured data that’s also set for the web. And I think that really holds a lot of promise for our customers and for the public in general.
In a minute, I want to turn to the future, and I want to invite Mark Stodder, the president of Xcential, to ask one future question. But before that, I want to ask one more thing about how the U.S. Legislative Markup, USLM, forms the digital structure foundation for all of these changes.
Our email newsletter is called Data First, because we’re supporting the notion that legislative information ought to be created as data and then maintained as data and published out as data. How does USLM underlie all those plans? Then we’ll turn to the future.
The schema that we’re using–the multiple schemas we’re using–for USLM really is that foundation for government information going into the future.
When we draft legislation, that’s a structured document. You’ve got titles, you’ve got sections, you’ve got subsections, you’ve got paragraphs and on through, it’s really easy to understand how that fits into a more structured data format.
But the thing that folks kind of missed was that legislative language gets reused in a number of different contexts. When a committee marks up a bill, and reports an amendment, that amendment text is actually in their committee report. So we need a structure that supports both that committee report and the legislative language itself.
And there are other things that get carried in committee reports–for instance, committee votes. We need a structure that supports those votes so that somebody on the outside who’s really enterprising could build a database of all the votes in the Ways and Means Committee, or the Energy and Commerce Committee.
That can all be tied to those legislative items, committee reports, and things like that. USLM really is the platform on which all of those next-gen applications are going to get built.
As somebody who used legislative information all the time, that would make my job a lot easier, and let me spend more time thinking about policy solutions than trying to pound the square peg into the round hole to get a document I can work on.
All of those things are going to be super important. And in addition to what we do for Congress, we publish the Federal Register every day. And XPub is going to be key to that production process as well. Expanding USLM, and all of these schemas, to support all of these different publications that are interconnected, is really going to be key to the future.
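To make the idea of describing legislative elements by what they are a little more concrete, here is a minimal sketch of a nested bill structure in which every element can be addressed by a path. It is a simplified model written for illustration, not the actual USLM schema: the `Element` class, the `walk` and `find` helpers, the path format, and the sample text are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Element:
    """One structural unit of a bill (hypothetical model, not USLM itself)."""
    kind: str                      # e.g. "section", "subsection", "paragraph"
    num: str                       # the designation, e.g. "2", "a", "1"
    text: str = ""
    children: list[Element] = field(default_factory=list)


def walk(element: Element, prefix: str = "") -> Iterator[tuple[str, Element]]:
    """Yield (path, element) pairs in document order."""
    path = f"{prefix}/{element.kind}-{element.num}"
    yield path, element
    for child in element.children:
        yield from walk(child, path)


def find(root: Element, path: str) -> Optional[Element]:
    """Return the element at the given path, or None if there is none."""
    for candidate_path, element in walk(root):
        if candidate_path == path:
            return element
    return None


# Hypothetical sample content, invented for this example.
bill = Element("section", "2", children=[
    Element("subsection", "a", text="Hypothetical subsection text.", children=[
        Element("paragraph", "1", text="Hypothetical paragraph text."),
    ]),
])

if __name__ == "__main__":
    for path, _ in walk(bill):
        print(path)
```

Because each element carries its kind and designation rather than its typography, the same tree could feed a print layout, a web page, or a link that points at one specific paragraph.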
Speaking of the future, I’ve got a fast future question–and I believe Mark’s got one too. My question is a science fiction question, almost.
Do you foresee a future in which the digital structure will not just convey the visual appearance and organization of legislative information, but perhaps also the meaning of such digital tags, perhaps for due date, internal citations at first, and then the actual mandates in the future?
I think we can get there.
Part of it is getting the users of that data to really understand the potential that structured data for legislation can deliver. One of the things that we’re going to be looking at in the future is: how can we use that information going forward? And one of the things that I think you see, if you go to Congress.gov, or any of the other sites that use our data, they’re already doing certain things with those structures.
For instance, you can share a link to a particular section, or paragraph, or element in that XML display. I think that’s touching on what these services can deliver, as we go forward. Frankly, those of us in government are not necessarily going to be the ones to come up with the next whiz bang idea.
But by putting this data out there, it really enables the public and the community and folks who have an interest in this kind of data. It really gives them the opportunity to experiment.
Wonderful. Let’s toss it over to Mark Stodder, president of Xcential, for the future-oriented conclusion.
OK, thank you.
As someone who worked in legislation for 30 years, you certainly detected how often the printing process determined the legislative process – how something appeared on paper would drive how a bill is structured, or how a printing deadline would drive how certain processes and systems would go.
As we move to a data-rich future and one that relies on digital processes, and doesn’t have to wait, always, for an offset printing press to start cranking up – how will that change, over the longer term, legislative processes?
Well, that’s really a great question. And I think it’s going to be one of those sort of fundamental questions, certainly one that at the federal legislative level we’re going to have to grapple with.
When I ran the House floor for (former Speaker of the U.S. House) Paul Ryan, one of the overriding fundamental principles that the House and the Senate used, and still use today, is that paper is what governs. Ultimately, it’s that official copy, that paper copy, that is critical. It is the record of record, so to speak. It is the original.
And I always used to say: my most powerful tool in my toolbox was that red pencil. If we caught a mistake at the end, I needed that ability to go in and make that correction, knowing that that manuscript was going to come over here to GPO and our proofreaders and keyboard operators were going to see that correction, and make it, and it was going to get incorporated into the process.
So Data First drafting is going to be a really fundamental departure for Congress, if we ever get to that point. They’re really going to have to think through that process going forward.
But that said, we’re already starting to see some really significant changes in the direction of more electronic workflows. For instance, with the pandemic, the House of Representatives put in place the eHopper, so that members could introduce bills electronically.
Now, we’re still getting that manuscript and following our existing process where we match that manuscript to an electronic file, if we have one. And we take a lot of time and put a lot of effort into making sure that manuscript matches the electronic file if we’ve got one.
But I’ve always described these new innovations a little bit like Pandora’s box, and sometimes they yield real benefits, but it’s always this concept that once you’ve gone down that road, at least once, you can never really reel it all the way back.
Sometimes that’s for good. Sometimes that’s not.
But, you know, we’ve put in place a lot of these more electronic workflows to get this through Covid. And the idea that we’re ever fully reeling those back to the way it was two and a half, three years ago, doesn’t make a whole lot of sense.
So the House and the Senate are really going to have to figure out what makes sense for them institutionally. And we at GPO are going to be here to support whatever those efforts are. And we’ll make changes in our own process to ensure that we’re supporting what those legislative bodies need.
Hugh, thank you so much.
No, thank you guys.