HData CEO Hudson Hollister recently interviewed Adam Goldberg and Justin Marsico on their work furthering data standardization across the federal government, progress toward innovation and transformation at the U.S. Treasury Department, and modernization in the warrant generation process. They discuss the future state they’d create (if only they had a magic wand) — some sort of automatic interface between the legislative and the executive — and the real-life, incremental steps necessary to actually get there.
Hudson Hollister: We’re grateful to be joined today by Justin Marsico, the Chief Data Officer and Deputy Assistant Commissioner in the Bureau of the Fiscal Service at the United States Treasury, and Adam Goldberg, the Acting Assistant Commissioner for Financial Innovation and Transformation at the Bureau of the Fiscal Service. Adam and Justin, thanks for joining us.
Our interview series is based on the concept of “Data First.” We believe that laws and regulations work best when we draft and manage them as standardized data. The story of Data First doesn’t stop with laws and regulations, of course. Government information, once it is expressed as searchable data, becomes both more transparent and more efficient, and both of you have extensive backgrounds pursuing the idea of Data First in the federal government’s financial information.
Justin, could you give us a brief summary of the work that you’ve done in data standardization over the past 5 to 10 years?
Justin Marsico: Thanks for having us, Hudson.
One of the things that we are trying to do at Fiscal Service is change the mindset from traditional financial reporting to data production. Here is what I mean by that: look at the way financial reporting works among entities, and this is not exclusive to the federal government but also applies to state and local governments and to companies that publish financial statements. The traditional approach is that you write a report that somebody prints out. They have it on their desk, they flip through it, and they start reading to understand the basics of a company.
There’s nothing bad about that approach; we like all the information that’s in there. But the way that people want to use information right now is as data. So instead of conceiving of our financial documents as structured to fit on an 8 ½ x 11 sheet of paper, let’s think about them the way the Census Bureau thinks about producing data that will be used by researchers, financial analysts, market participants, and so on.
From our perspective, it’s really about a mindset shift toward producing data instead of just producing reports. Once we accomplish that shift, some things start to look a little different. Instead of only producing snapshots in time, we can bring data together and present a historical view that shows the public how things have changed over time. We can also bring over metadata and data dictionary standards, concepts from the world of data that don’t exist in the same rich way in the financial reporting world. Those are some of the ways we think about accomplishing this mindset shift.
We made huge strides forward in this area while implementing the DATA Act, and part of that activity was working with agencies across the federal government to identify which data elements we had in common and needed to standardize. Previously, agencies were producing all this information and it was all unique to the agency putting it out; we worked with them to define standards so that, when we brought the data together for all of the agencies, it made sense to look at in aggregate and analyze as a whole.
Hudson Hollister: Justin, thanks so much; maybe we can touch on some of those concepts as we continue our conversation. I’ll go to Adam next. Adam, you’ve led innovation and transformation at the Treasury Department for many years. Not all of that work has to do with adopting data standards, but a portion of it has, so I’d love to get a bit of that history.
Adam Goldberg: Thanks, Hudson. The things we’ve been doing have evolved over time. When we originally started the organization, we were looking for alternative paths for agencies to adopt accounting or financial solutions that weren’t as involved as an ERP, or enterprise resource planning, solution.
So take a concept like electronic invoicing, such as Treasury’s Invoice Processing Platform, which you could probably put in place in about six months and start getting a return on investment within a year, much quicker than you might with any ERP. One of the things we realized as we were doing things like that is that data is an essential part of being able to adopt those common solutions, particularly when you’re sharing them in a cloud, because everybody should be working off the same instance of the application and the data should be structured in the same way.
One of the things we’ve done in the last couple of years is build on the work that Justin has done with USASpending and create additional data elements that agencies should standardize across the government enterprise.
Hudson Hollister: In order to get a fuller picture, what’s one of the additional data elements that your work has touched on, Adam?
Adam Goldberg: It would include something like an order number or an address.
Hudson Hollister: Those things seem like they should already be standardized.
Adam Goldberg: They are, in concept, but when an agency takes something and implements it on its own, all bets are off. Agencies often reinterpret a concept when they implement it, so even in the work Justin does, translations sometimes need to happen along the way. Business rules may differ from one agency to another for no other reason than that it is how they have historically done it.
But the other part of what we’re doing, that I think is very important, is that as we start looking at new and emerging technologies like artificial intelligence, blockchain, and advanced data analytics, data becomes the central piece.
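The “translations” Adam describes are often implemented as a crosswalk from each agency’s local field names to a shared standard element. Here is a minimal Python sketch, assuming two hypothetical agencies whose field names (`PO_NUM`, `OrderNo`, and so on) are invented for illustration; it is not a Fiscal Service schema.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalks: each agency's local field name -> standard element.
# Agency keys and field names are invented for illustration only.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "agency_a": {"PO_NUM": "order_number", "VendAddr": "address"},
    "agency_b": {"OrderNo": "order_number", "remit_to": "address"},
}


def to_standard(agency: str, record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Translate one agency record into standard element names.

    Returns the translated record plus any local fields that have no
    mapping yet, so a person can decide how they should be handled.
    """
    mapping = CROSSWALKS[agency]
    standard: Dict[str, str] = {}
    unmapped: List[str] = []
    for local_name, value in record.items():
        if local_name in mapping:
            standard[mapping[local_name]] = value.strip()
        else:
            unmapped.append(local_name)
    return standard, unmapped
```

Once both agencies’ records come out under the same element names, they can be combined and analyzed together; the `unmapped` list is where agency-specific reinterpretations surface for review.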
Hudson Hollister: Of course, some of those technologies can be applied when you don’t have perfect standardization of the data elements.
Adam Goldberg: They can, although it helps when things are more standardized. Take a chatbot, for example: if you don’t have a good knowledge base of digital or electronic information, a chatbot is not going to be as effective a tool. So that’s the other part of what we’re trying to do: incorporate these emerging technologies into agencies’ business practices and, as we do that, learn where data needs to improve along the way.
Hudson Hollister: Thank you both for a great summary of the work that you have done, especially when it comes to transforming financial information using what we call the “Data First” approach, but what you might also call data standardization.
Now we come to today’s topic, which is the chaotic constitutional collision of the legislative and the executive: the intersection of appropriations bills and spending information. Both of you have worked on standardizing spending information, and now we come to the spending information that is generated by instructions in Congress’ appropriations bills.
Let’s start by talking about today’s process. Adam, what are Treasury warrants? Why do we need them, and how do they work?
Adam Goldberg: Sure. A Treasury warrant, very simply stated, is the authority for agencies to obligate and disburse funds. Before an agency can obligate itself to another entity, Treasury needs to give it this document, which says, “when you go ahead and do this, there will be funds available for the Treasury to disburse.” It’s an essential part of the appropriations process and needs to happen fairly soon after the appropriations bill is enacted.
Hudson Hollister: Do Treasury warrants determine when and how money is actually debited or credited to particular accounts?
Adam Goldberg: The warrant is really just the beginning of that financial transaction, so it doesn’t determine how the funds will be disbursed over time. That is more a matter of how OMB manages apportionment, meaning how that money gets doled out to agencies. What we’re really saying is that, for this account, when you’re ready to spend the money or send it to someone, it will be there. After that point there’s not a lot of activity with a warrant unless something changes, such as a rescission or some change in how Congress says the money should be spent.
Hudson Hollister: Justin, could you give us a little bit more color on how warrants are generated in response to instructions from Congress?
Justin Marsico: You described it as a constitutional collision between Articles I and II. I think of it more as a high-speed handoff. Imagine two vehicles barreling down the freeway at 100 miles an hour, trying to hand a baton from one to the other without colliding.
Hudson Hollister: Maybe a monster truck rally relay race with a baton.
Justin Marsico: Exactly. I was imagining something from “Mad Max: Fury Road” happening.
But what happens is that once Treasury receives an appropriations law that has been signed by the President, or that has been enacted through a different process, we start reading through it to dissect and analyze it. Today it’s a very manual process: we have a team of subject matter experts who are very skilled at picking apart appropriations laws and basically turning them into data.
Essentially, they go through and identify the agencies, the sub-agencies, and then the accounts, which are the places funds actually go so that agencies can start spending money. How the law says dollars should go into those accounts is more complicated than that, because in addition to the amounts appropriated for the current period, there may be complicating factors. For example: here are some funds you can use for this period, here are other funds you can use over a two-year period, and here are some funds you can use over a five-year period, or for as long as it takes you to use them.
All of those different groups of funds, or appropriations, need to go into different accounts on the Treasury side, so our team has to find where each piece of data belongs. Effectively, it’s a manual process of turning unstructured data into structured data, and that kicks off the process of agencies getting the authority to actually start spending.
Hudson Hollister: Justin, how is this manual process affected by irregularities in appropriations? For example, we’ve all watched budget battles that happen on Capitol Hill—that’s the way the process should work. There are political disagreements over spending and they are resolved.
Sometimes they are resolved through compromise, and sometimes those compromises on Capitol Hill involve continuing resolutions. In some fiscal years there might be a full appropriation for part of the executive branch but a continuing resolution for the rest. How do those sorts of compromises or irregularities affect the process of drafting the warrants?
Justin Marsico: I think that the more irregularities there are in the process, the more change or the more iterations there are, the more work it is for the team that has to do the work of picking apart the laws.
It’s not that we’re complaining that there’s too much work, but it’s important to note that in some cases agencies need to be able to spend their funds immediately, or they may have needed to spend funds on very important activities before the appropriation was passed. So the team that does this work is always trying its hardest to get through the appropriation and turn it into data as fast as possible. The more complexities there are, the more discussions need to happen with counsel and with agencies to make sure that what we are doing is right by the law when we put the amounts into structured data.
Hudson Hollister: I want to ask Adam one more question about the warrants and then I want to begin talking about the project that you both are pursuing to evaluate ways to modernize the warrant generation process.
Adam, in the current state, is there one warrant document for each account that gets generated whenever the authority for that account changes or is it more complicated than that? And can you help us understand a bit more of how warrants pertain to the spending process?
Adam Goldberg: It’s a little bit more complicated because, even within an account, you might have appropriation language that says, “well, for this you get this amount and this is the period, and for this you get this amount and this is the period,” so you’re going to get some variations, which is really where the complexity comes in.
You’re looking for three things: purpose, amount, and period of availability. The language might reflect a parent-child relationship, such as an account and sub-account.
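To make the three-part extraction concrete, here is a minimal Python sketch that handles only the simplest, single-amount paragraph. The sentence pattern and the `WarrantLine` record are assumptions for illustration, not the Fiscal Service’s rules; real appropriations language varies far more, which is why anything the pattern does not match is returned as `None` for a human to review.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class WarrantLine:
    purpose: str
    amount: int                  # whole dollars
    availability: Optional[str]  # None when no availability clause is found


# Hypothetical pattern for a simple paragraph of the form:
# "For <purpose>, $<amount>, to remain available <period>."
SIMPLE = re.compile(
    r"For (?P<purpose>.+?), \$(?P<amount>\d{1,3}(?:,\d{3})*)"
    r"(?:, to remain available (?P<period>[^.]+))?\."
)


def extract(paragraph: str) -> Optional[WarrantLine]:
    """Pull purpose, amount, and period of availability from simple language."""
    m = SIMPLE.search(paragraph)
    if m is None:
        return None  # too complex for this rule; route to an accountant
    period = m.group("period")
    return WarrantLine(
        purpose=m.group("purpose"),
        amount=int(m.group("amount").replace(",", "")),
        availability=period.strip() if period else None,
    )
```

A paragraph with several amounts or a scattered availability clause falls through to `None`, which mirrors the split between language a rule can handle and language that still needs human judgment.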
Hudson Hollister: One more follow up question. Justin, you refer to warrants as structured data. I wasn’t thinking of them as structured data, because they are documents… In what sense are warrants structured data?
Justin Marsico: They’re structured data in the sense that they get entered into a central accounting system we maintain at Treasury called CARS, the Central Accounting and Reporting System. That system basically holds balances for agencies. Say you are the Department of Agriculture’s Forest Service and you have an account for salaries and expenses (this is a fictional account, or it could be real, I don’t know). You might have had $100 million appropriated into that account in a previous period, and you didn’t spend all of it, so you carried some of it over. If your new appropriation gives you an extra $50 million, that $50 million needs to be placed into that account. Then agencies know the amount that’s available for spending. It is tracked in that system, as well as in agency financial systems, to make sure the money is being spent at the right pace and that it isn’t violating any laws or going below zero at some point along the way.
Hudson Hollister: So that’s the current state: Congress passes appropriations laws full of prose describing, in detail, the authorities that agencies and sub-agencies ought to have.
Once the laws are signed by the President, or otherwise enacted, they make their way to the Treasury Department, and the Fiscal Service must create warrants and enter the spending authority information into the right systems.
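The balance arithmetic in that example (money appropriated in a prior period, some of it carried over, plus $50 million new) can be sketched as a tiny ledger. This is a hypothetical, heavily simplified stand-in written in Python, not the CARS data model; it only shows a warrant adding authority and a guard against going below zero.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Account:
    """Hypothetical, simplified account balance (not the CARS data model).

    Amounts are whole dollars; history records (memo, signed amount).
    """
    name: str
    balance: int = 0
    history: List[Tuple[str, int]] = field(default_factory=list)

    def post_warrant(self, amount: int, memo: str) -> None:
        # A warrant adds new budget authority to the account.
        self.balance += amount
        self.history.append((memo, amount))

    def spend(self, amount: int, memo: str) -> None:
        # Refuse any transaction that would push the balance below zero.
        if amount > self.balance:
            raise ValueError(f"{memo}: would overdraw {self.name}")
        self.balance -= amount
        self.history.append((memo, -amount))
```

For instance, posting $100 million, spending an invented $70 million, and then posting the new $50 million leaves $80 million available; a further $90 million spend raises `ValueError` instead of taking the balance negative.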
Let’s talk a bit about the glorious future state, laying aside all the incremental steps. If you had a magic wand, what does this look like?
Adam Goldberg: The future state would be that appropriations bills and activities are published in XML, such as USLM, and available immediately upon enactment. Then we could run a program that automatically pulls out those three pieces, purpose, account, and amount, and loads them automatically into CARS.
Hudson Hollister: Imagine you can replace CARS. Imagine some sort of automatic interface between the legislative and the executive. Justin, can you get even more sci-fi for us?
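If appropriations text did arrive as structured XML, pulling out the pieces becomes a parsing job rather than a language-understanding job. Here is a minimal Python sketch using the standard library’s `xml.etree.ElementTree`; the `<account>` and `<provision>` elements and their attributes are invented for illustration and are not the real USLM schema, which defines its own elements and namespaces.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Invented, simplified markup; NOT actual USLM. One account holds two
# provisions with different amounts and periods of availability.
SAMPLE = """
<appropriations>
  <account id="example-001" name="Forest Service, Salaries and Expenses">
    <provision purpose="salaries and expenses" amount="40000000"
               availability="one year"/>
    <provision purpose="equipment" amount="10000000"
               availability="until expended"/>
  </account>
</appropriations>
"""


def extract_provisions(xml_text: str) -> List[Dict[str, object]]:
    """Flatten every provision into a record ready to load into an accounting system."""
    root = ET.fromstring(xml_text.strip())
    rows: List[Dict[str, object]] = []
    for account in root.iter("account"):
        for provision in account.iter("provision"):
            rows.append({
                "account": account.get("name"),
                "purpose": provision.get("purpose"),
                "amount": int(provision.get("amount")),
                "availability": provision.get("availability"),
            })
    return rows
```

Because the amounts and periods are already tagged, the sub-account variations that make manual warrant preparation hard become ordinary rows; the difficulty shifts to agreeing on the markup itself.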
Justin Marsico: Think about the entire lifecycle of spending. If you take a step back, we collect taxes from the American citizenry, and then we have a pot of revenue over there. Then there are the spending activities, which have their own lifecycle. There’s the President’s budget, which includes ideas for what spending should look like in upcoming years; then there’s the Congressional Budget Office, which analyzes bills before Congress and makes projections about future spending; then there are the appropriations themselves, as we mentioned.
Those get transformed into structured data that we hold at Treasury. Then they go to agency financial systems all over the government, where the spending takes place. Some of that comes back to Treasury and is reported on USASpending. At the conclusion of the cycle there are agency financial statements, which are another form of data. And at some point in that pipeline there’s also a disbursement, which goes to state governments and sometimes to local governments as well.
So you can imagine there’s data that’s going all over the place, and while we’ve done a ton of work to standardize one piece of it in the middle with the DATA Act, and all of the federal agencies getting on the same page, there still isn’t standardization from the beginning to the end of that lifecycle.
And particularly when the federal government gives large chunks of money to states, for Medicaid or TANF spending, for example, and the states are responsible for disbursing those funds, the way you’ll see that data really depends on the program or on the state that is providing the information to the public.
In other words, we don’t have a way of showing that information today in our central place, USASpending. We are able to show that an award is made to a state, but what happens to it after that, we don’t know. As a secondary issue, state and local government spending is also interesting and in some cases very impactful to local communities, and we don’t have a standard for sharing that kind of information. So if you’re curious about the impact of your tax dollars, whether they go to the federal government, your state, or your municipality, we don’t have a way of bringing that all together into a coherent picture.
As for your question about the step between the two branches of government: at some future point, it would be fantastic if we all adopted the same standards and could take data from Congress and have it flow into our systems without having to run natural language processing and artificial intelligence to put it into the right format. I think it would be fantastic if we could get together, talk it through, and adopt some common standards to reduce the friction in that process a little bit.
Hudson Hollister: We won’t get to that glorious future state all at once. In a few minutes I want to talk about the project that your offices have initiated in order to evaluate ways to move incrementally toward it. But before we get off the exciting future vision and talk about the incremental reality, I’d like to ask Adam: what would be the impact for the Fiscal Service if this automatic flow could exist? How would it make things better for the way that the Fiscal Service and the Treasury Department operate?
Adam Goldberg: So, from a big-picture perspective, I think there are a lot of benefits, because it would eliminate a lot of the data we now produce by hand, and things like the reports that agencies produce.
We’re working with HHS on a project right now. They have to take information that we give them on paper and then update each of the three instances of their accounting system, so we’re trying to get them a data feed so they can do all of this automatically. So the future here is getting other people excited about the opportunity to move things off the sheets of paper we’re passing among ourselves today and, to Justin’s point, having them automatically updated in different applications.
I think there are a lot of benefits, but also a lot of changes that will need to take place for that to happen. My long-term objective has always been to create that seamless electronic transaction, from the point that Congress appropriates dollars to the point that an account is closed out at the end of whatever period it was available for, without the need for humans to enter and re-enter data along the way.
Hudson Hollister: I’ve got one background question. Adam, is this a project of the Office of Financial Innovation and Transformation [OFIT], and when was it initiated?
Adam Goldberg: It’s actually a joint project among many offices within the [Bureau of the Fiscal Service]. It brings together OFIT’s interest in leveraging emerging technology and process improvement, the Chief Data Officer’s focus on data and how we could use advanced analytics to support the business, and our accounting shop, which is responsible for actually performing the process [of preparing warrants]. The idea originated from the concept I’ve been talking about: an end-to-end electronic transaction.
And the first thing out of the gate [warrant creation] is a manual activity, so that’s where we thought we would start. We also know that it’s an activity that requires judgment. It requires cognitive thinking by a group of accountants. And so that takes us out of … the robotic process automation that agencies are focused on today [and] brings us into the artificial intelligence cognitive world … We know a human can do it and give us an answer, so now I can say, “Computer, you try to figure it out and see if you can come up with the same answer, and if you can’t, let’s try to train you on how to think about this, so that the answers match when we’re done.”
Justin Marsico: My understanding is that there is an effort that exists in the legislative branch to turn the appropriations laws and other laws into data, into USLM, but because that’s a conversion effort, and not where the legislative branch starts, it’s not useful to us because we have to get the warrants out the door as soon as humanly possible so agencies are able to start spending money. If the legislative branch took the approach of Data First, and PDF second, then we would be able to automatically use the data and it would reduce the friction a lot.
I think that is something we struggle with in the executive branch as well: we have processes that make us produce documents, and then we go through efforts to translate them into data. We need to flip our thinking to producing data. That allows processes to flow a lot faster, and then, to the extent that documents continue to be necessary, we can translate the data into documents.
Adam Goldberg: Hudson, I think when we started this project, and Justin and I were talking, we [assumed] the [appropriations] language was in XML … [but] the first meeting out of the gate [indicated] we can’t use [XML] because that [data] isn’t available to us [for several weeks after an appropriations law is finalized]. So this actually gives us a concrete example … of the importance of having that language [in XML] available sooner.
Hudson Hollister: Do you call this project “the warrant project” or is there a name?
Adam Goldberg: Justin, do you have any good ideas right now?
Justin Marsico: We just refer to it as the “AI warrant project,” I think.
Hudson Hollister: But it isn’t AI, gentlemen.
Justin Marsico: There are two steps to the process. One is that we take apart the PDF and we turn it into structured data—that’s not technically AI. But what is AI, is we have to use natural language processing to dissect the sentence structure.
Hudson Hollister: So the structured data doesn’t get you all the answers.
Justin Marsico: Exactly. What we’re working on right now is trying to find out … sometimes the sentence will have the period over here and sometimes it’ll be over here, it’s not always uniform, so we need to use that technology to try to say “okay, here’s this dollar amount, and now we think that the way that the sentence is structured, maybe the period is over here.”
Adam Goldberg: Hudson, I would say there are three categories we’re dealing with: low, medium, and high levels of complexity. Right now, we’re working on the low and medium language complexity. We’ll look at a couple of things, including speed and accuracy. Then, based on the accuracy, we can go back and ask what we need to teach the model to get a greater level of accuracy. So there are lots of different dimensions that we’re trying to step through slowly in order to get a better outcome.
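The evaluation described here, comparing the model’s answers with the accountants’ answers and looking at where they differ, can be sketched in Python as per-tier accuracy plus a list of mismatches to review. The `Case` structure and the tier labels are assumptions for illustration, and any sample data used with it is made up, not project results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Case(NamedTuple):
    tier: str     # "low", "medium", or "high" language complexity
    human: tuple  # (purpose, amount, availability) from an accountant
    model: tuple  # the same fields as extracted by the model


def score(cases: Iterable[Case]) -> Tuple[Dict[str, float], List[Case]]:
    """Return per-tier accuracy and the mismatched cases to review."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    misses: List[Case] = []
    for case in cases:
        totals[case.tier] += 1
        if case.human == case.model:
            hits[case.tier] += 1
        else:
            misses.append(case)  # candidates for retraining or rule fixes
    return {tier: hits[tier] / totals[tier] for tier in totals}, misses
```

Grouping by tier keeps a strong score on simple language from masking weaker performance on harder paragraphs, and the mismatches point to what the model needs to be taught next.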
Justin Marsico: Can I just quickly share an example?
Hudson Hollister: Please do, yes.
Justin Marsico: Here’s an example of something that’s a little complicated. Sometimes there will be a paragraph like the example I gave before, where it says, “USDA, Forest Service, salaries and expenses, this amount.” Simple. But other times you’ll see dollar amounts floating all over the place, availability periods in different places, and the purpose kind of strewn about. And this example is more complex.
Adam Goldberg: Hudson, I want to pull on this thread. You said, “right now what you’re doing really isn’t AI.” Truth be told, we’re just getting into the AI portion now.
Everything we’ve done to date has been about, “how do I get the data into a good format so I can apply AI to it?” And so there’s another sheet. When we pull information out of the PDF… remember the formatting troubles when you cut and paste from a PDF into a Word document?
Hudson Hollister: Every day of my life.
Adam Goldberg: So what we’re also dealing with is that the text, actually the account name, becomes one long stream of data. You and I, as humans, can look at that string and read what it says in a way the machine doesn’t know how to. So we also need to train the machine to put in things like spaces, so it can match against the names that we use. So again, to your point, a lot of what we’re doing now is getting the data ready so that the AI algorithms can actually work.
And that’s not uncommon. In a lot of the projects both Justin and I have worked on, there’s a lot we have to do to get the technology to do what it was built to do. Take just our organization and multiply that across 24 large departments: there’s a lot of information out there that probably needs some refinement for these tools to be used to the full extent we want.
Hudson Hollister: I want to talk briefly about the possible outcomes of your project. It seems as though what you are building can be useful in the short term: if you are able to develop an extraction process, and then AI that can automatically generate simple warrants, and bring that to production, you will save a lot of people time. It also seems to me that, in the medium term, your project could demonstrate the benefits of a Data First approach for appropriations bills, and that Congress could rise to this and begin delivering more structure than appropriations bills currently feature.
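Restoring lost spaces is a classic word-segmentation problem. Adam describes training the machine to do it; as a simpler, swapped-in technique, this Python sketch uses dynamic programming over a vocabulary of known words, which could be drawn from the account names already on file. The vocabulary is hypothetical, and when several splits are possible the sketch prefers the one with the fewest words.

```python
from functools import lru_cache
from typing import List, Optional, Set, Tuple


def segment(run: str, vocabulary: Set[str]) -> Optional[List[str]]:
    """Split text whose spaces were lost into known (lowercase) words.

    Among all valid splits, returns the one with the fewest words,
    or None if the run cannot be built entirely from the vocabulary.
    """
    text = run.lower()
    longest = max((len(w) for w in vocabulary), default=0)

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(text):
            return ()  # nothing left to split
        options = []
        for end in range(start + 1, min(len(text), start + longest) + 1):
            word = text[start:end]
            if word in vocabulary:
                rest = best(end)
                if rest is not None:
                    options.append((word,) + rest)
        return min(options, key=len) if options else None

    result = best(0)
    return list(result) if result is not None else None
```

For example, with a vocabulary of `forest`, `service`, `salaries`, `and`, and `expenses`, the run `ForestServiceSalariesandExpenses` splits into those five words, which can then be matched against the official account name.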
I might be putting words in your mouth. Why don’t you tell me if I’m right?
Justin Marsico: I think you’re right. We’re trying to fix a process right now to make ourselves more efficient, and we’re thinking about how to use the very smart people we have for this work, instead of having them do the extremely tedious work of going through thousands of pages of the bill, copying and pasting things. How do we get them to focus on the really hard issues, like correcting the errors that machines make? That’s what we can do right now.
But you’re right about the long term. I don’t think it’s necessarily that Congress needs to rise to the occasion; I think we need to start talking to each other about how this information is used.
Adam mentioned that we’re using real-life use cases right now, and I think there’s value in just having that conversation and saying: this is how we use the appropriations laws, and this is what the process looks like to unlock agency funds. Is there a way we could do this better? Is there a way we could do this that ensures greater transparency in the end?
I think that if we start having those conversations, we’re going to realize that what Congress wants is the same thing that we want: to get the money to the American people as fast as humanly possible, and to do it with maximum transparency.
Adam Goldberg: Yes. The other part of this, for both Justin and me, is that we’re looking for other opportunities. For one, the goal isn’t necessarily to replace the accountants and have machines do the work; if we could just ease what they’re doing by giving them a head start, that would also be success.
If we look at other areas where AI is being used, like program integrity and improper payments, the computer isn’t precisely finding every true case of fraud, but it is helping humans analyze large quantities of data and easing the burden of what happens next.
And again, we’re doing this in a small enough project that we can get to a concrete set of findings quickly, rather than picking the most complicated use case in the world, taking a year, and forgetting what the original purpose was. What we want to do is take the findings from a smaller, clear case and ask how we can expand this to other parts of the organization.
Hudson Hollister: Is there anything that you want to make sure that those who are interested in legislative XML understand about the potential of a Data First approach for appropriations?
Justin Marsico: I think one thing that is potentially very exciting is the idea of linking appropriations to the information that’s on USASpending today. Imagine that lifecycle. From a member’s or an appropriator’s perspective, for the documents that you create, vote on, and pass into law, it would be really interesting to see exactly how spending occurs in pursuit of those activities. I think the idea of actually pulling apart legislation and then showing what happened to the spending, including where the funding went to the states, is really interesting and will be as compelling to Congress as it is to us. So we’re excited about that too.
Adam Goldberg: I would echo that. If there are other ways this information is being used and the Fiscal Service could help pursue those objectives, that would be really great. A lot of opportunities open up when you tie all this information to the various transactions happening over time. I’m working on a blockchain project right now. If we had information tying the appropriations to the actual transactions or awards we’re making, and that information stayed with them all through the spending cycle, it would be very informative for research and decision making associated with all the funds that go out the door.
At the end of the day, I don’t think we know where all this money is going, and I think this is the beginning of allowing us to attach that information and have it stay with the transaction throughout the lifecycle.
Hudson Hollister: I think that is possible, and I do appreciate the work you are both doing. You’re working to bring better efficiency to the largest and most complex financial operation in human history, which is the management of the U.S. federal government.
Thank you for the work you’re doing, thank you for spending time with us, and I look forward to seeing your progress.