(This was originally posted on the Powertek Corporation blog at http://www.powertekcorporation.com/index.php/blog/162-everything-in-government-is-based-on-data)
Data.
Whether it is prefaced with “Big” or sometimes with “Open”, it is the lifeblood of government IT.
We talk about using data to do Business Analytics. We write about XML and XBRL standards. There is a famous saying that “Everybody complains about the weather, but nobody does anything about it.” In this month’s blog we will focus on how companies doing business with the Federal Government might do something about data standards.
In previous posts, we have talked about innovation in government and the rapid growth of mobile technology, and explored some of the areas of focus that small businesses might pursue within those business spaces. As continuing topics of interest, we will return to those areas in the near future, but in this post, we look at what provides the structural plumbing for all of the innovative solutions we desire to implement in Government. That structural plumbing is data.
Let us look at why data is an important component of every government IT solution, why it is becoming a more visible topic these days, and what initiatives touch on it. Finally, I will make a few comments about how to improve the data situation within government.
When I was the CIO at the US Department of Transportation (DOT), I reached out to Vivek Kundra, then the CTO for the District of Columbia government, who was doing some interesting things with data inside the DC government. His goal was to make data available and let interested parties write their own programs to use it, expanding the range of useful applications at little incremental cost to the DC Government.
We copied some of his work at DOT in a small way, creating a data page for a few of the agencies. Kundra, of course, went on to become the first Federal CIO under President Obama and started a process of increasing the transparency of government data across the Federal Government.
One of the results of that effort is data.gov, which launched in May 2009 with 47 datasets. When I last visited the site, there were over 100,000 datasets and hundreds of published applications that make use of data.gov information. From a dataset of “$815,000 for the link road between Pahrump and Amargosa Valleys” to “Zurich Sunspot Numbers”, so called because the index was introduced by the Swiss astronomer Rudolf Wolf at the Zurich Observatory, there is data from A to Z and beyond.
It makes sense to focus on data when talking about the Federal Government; information is one of its major products. One of the few government functions the Constitution spells out explicitly is the requirement to conduct a Census every ten years. Even the agencies that provide government oversight generate a lot of information. If you go to the website of the Government Accountability Office (GAO), which Congress uses to investigate the Executive Branch, and search for “data”, you get back a listing of over 50,000 reports.
The increased use of technology only increases the creation of data. Almost every Government employee not only has a desktop or laptop computer but also carries with them a smart phone or tablet that increasingly is used to collect information.
For example, Pam Hird, a talented project manager at the US Department of Agriculture (USDA), runs a project called Computer Assisted Personal Interview (CAPI). USDA now uses Apple iPads to collect agricultural survey data from farmers for the National Agricultural Statistics Service. The National Oceanic and Atmospheric Administration (NOAA) uses an app developed at the University of Oklahoma to crowdsource weather observations, both to collect data where there are no weather sensors and to identify types of weather that sensors track poorly, such as telling rain apart from sleet.
With this enormous explosion in data collection, the Government faces a number of difficulties in managing the resulting datasets. One of the most fundamental is the lack of standardization between agencies, or even between programs within a single agency.
There have been a number of attempts to standardize data formats in the Government. One of my favorites is the National Information Exchange Model (NIEM), https://www.niem.gov. NIEM grew out of the work of state and local governments to exchange criminal justice information. The US Department of Homeland Security and the US Department of Justice then put this model in place as NIEM in 2005, and in 2010 the US Department of Health and Human Services joined the effort. While NIEM sometimes feels like a salmon swimming hard upstream, it continues to grow in importance.
A comparable effort is under way for the financial data the US Government uses, based on a standard called XBRL (eXtensible Business Reporting Language). The Securities and Exchange Commission (SEC) and other Federal financial agencies are beginning to require financial reports to be submitted using this standard. There is debate in the US Congress as to whether all organizations, or only larger ones, should be required to do so, but the trend toward greater usage and compatibility is clear.
And yet while some things change, much remains the same. Over twenty years ago I wrote a column for a corporate newsletter about the then IT reformer Paul Strassmann. I suggested he spend time looking at the multiple inconsistent data formats within the Department of Defense (DoD), noting the many different ways the color blue, as in blue eyes, was coded. While DoD has come a long way since then, the number of different, often inconsistent, data registries and schemas across government remains large, as a quick glance at http://xml.fido.gov/registries.asp shows.
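To make the “many ways to code blue” problem concrete, here is a minimal sketch of what a shared standard buys you: each system keeps its own codes, but all of them map into one agreed vocabulary. The system names and codes below are hypothetical illustrations, not taken from any real DoD, NIEM, or agency code list.

```python
# Hypothetical illustration: three source systems that each encode eye color
# differently. None of these codes come from a real DoD or NIEM code list.
SOURCE_CODE_TABLES = {
    "system_a": {"BLU": "blue", "BRO": "brown", "GRN": "green"},
    "system_b": {"B": "blue", "BR": "brown", "G": "green"},
    "system_c": {3: "blue", 1: "brown", 2: "green"},
}

# The single shared vocabulary that every system agrees to map into.
STANDARD_EYE_COLORS = {"blue", "brown", "green"}


def to_standard(system: str, code) -> str:
    """Translate a system-specific eye-color code into the shared vocabulary.

    An unknown system or code raises KeyError, so gaps in the mapping
    surface as errors instead of being silently passed through.
    """
    value = SOURCE_CODE_TABLES[system][code]
    if value not in STANDARD_EYE_COLORS:
        raise ValueError(f"{value!r} is not in the standard vocabulary")
    return value


# Three differently coded records that all mean "blue eyes".
records = [("system_a", "BLU"), ("system_b", "B"), ("system_c", 3)]
print({to_standard(s, c) for s, c in records})  # prints {'blue'}
```

Once every program translates into the same vocabulary, comparing or combining records across systems no longer requires knowing each system’s private coding scheme.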
Why is that?
The reality in the Federal Government is that supporting an agency’s mission as required by Congress remains the focus of management staff. Activities that optimize performance are generally pushed to a lower priority unless the Office of Management and Budget (OMB) makes them a visible priority and keeps them there over time. In a time of social media, blogs and Twitter, standardizing data is just not as exciting a topic, even though a case can be made that in the long term it is the more important area of focus. Data standardization also has the advantage of being ‘sticky’ once established. Unlike policies, which can often be swept aside by a single signature, data standards tend to stay around forever.
Regardless of the success or failure of such data efforts, managing the collection and reporting of data will remain an important need, and one that calls for industry support. In a future blog entry, we will talk about the analysis of such data under the general title of business analytics.