Open data and AI readiness

In March 2024, the Office of Management and Budget issued a new set of requirements governing federal agencies’ use of artificial intelligence.

Widely viewed as the most consequential policy directive on federal AI use to date, OMB memo M-24-10 sets out a host of new requirements for how agencies use AI tools, manage potential risks, and track and report their AI use.

But the most important policy directive shaping how federal agencies use AI may turn out to be another OMB memo, issued more than a decade ago. OMB memo M-13-13, which established an open data policy and required agencies to begin managing data as a strategic asset, will ultimately be the federal policy that determines how successfully agencies adopt AI to improve government services.

Laying the foundation for AI

In the early 2010s, governments around the world embraced open data as a way to innovate and collaborate with external partners. Governments began adopting open data policies and publishing datasets for the public to see and, more importantly, to use in a variety of ways. The movement culminated in a 2013 Executive Order from the Obama Administration establishing a federal open data policy, and in subsequent OMB guidance that set requirements for agencies to share their data publicly.

But what does releasing open data have to do with how agencies use AI tools? Publishing data publicly requires agencies to take a series of steps that improve the quality and completeness of that data. And agencies that have adopted practices supporting higher-quality data will be able to employ new AI tools more effectively to improve government services.

Altitude training for data sharing

Governments that have robust programs for data sharing have built up the infrastructure to ensure better data quality. Open data programs typically support processes and mechanisms for identifying useful data, reviewing it for accuracy, completeness, and security, and documenting it comprehensively. This same infrastructure can be used to identify and prepare data that is suitable for training AI models.
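To make this concrete, below is a minimal sketch of the kind of dataset documentation an open data program produces. The record loosely follows the spirit of the Project Open Data metadata schema (the data.json standard that accompanied the federal open data policy), but it is abridged and illustrative: the dataset name, publisher, and contact details are invented, and the completeness check is a stand-in for a real catalog validation step.

```python
# An abridged, illustrative dataset record in the spirit of the
# Project Open Data metadata schema; the real schema has more
# required fields, and all names and values here are invented.

REQUIRED_FIELDS = ["title", "description", "keyword", "modified",
                   "publisher", "contactPoint", "accessLevel"]

dataset = {
    "title": "Hospital Quality Measures",                # hypothetical dataset
    "description": "Quality and location data for registered hospitals.",
    "keyword": ["health", "hospitals", "quality"],
    "modified": "2024-01-15",
    "publisher": {"name": "Example Agency"},             # hypothetical publisher
    "contactPoint": {"hasEmail": "mailto:opendata@example.gov"},
    "accessLevel": "public",
}

def missing_fields(record: dict) -> list[str]:
    """Return the required metadata fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

print(missing_fields(dataset))  # [] means the record is complete
```

The same habit of validating every record before publication is what later makes a catalog a reliable source when the task shifts from publishing data to assembling AI training data.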

When agencies release government data openly, the people who use that data create a set of healthy incentives. Data users expect data to be well documented and regularly updated; when it is not, governments typically hear about it in a very visible way. Data that is regularly updated, well documented, and otherwise easy to use has more value for every potential user, including governments themselves. In this way, sharing open data is like altitude training: governments that do it well develop robust mechanisms for ensuring high-quality data that can be used in a variety of new ways.

The infrastructure and processes needed to ensure high-quality data releases grew out of the 2013 Executive Order and the subsequent OMB guidance on open data. These policy directives laid the foundation for agencies to become stewards of data as a reusable resource, one with inherent value beyond any immediate need or application. Agencies that have laid this foundation well stand to gain the most from new AI tools.

Using AI successfully requires high-quality data

To use AI tools successfully, particularly the new breed of generative AI tools that agencies are adopting at a rapid pace, agencies need access to high-quality data. If data is scattered across disparate systems, incomplete, rarely updated, or poorly documented, it has limited value to agencies seeking to leverage these new tools.
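Some of these quality dimensions are easy to check mechanically. As a small illustration, the sketch below flags stale records; it assumes each dataset record carries an ISO-format “modified” date, as in the metadata example above, and the 90-day threshold is an invented placeholder.

```python
# A minimal freshness check for one of the quality dimensions named
# above. Assumes an ISO-format "modified" date on each record; the
# threshold is a made-up placeholder, not a policy requirement.
from datetime import date

MAX_AGE_DAYS = 90  # hypothetical freshness threshold

def is_stale(record: dict, today: date | None = None) -> bool:
    """Flag records whose last update is older than the threshold."""
    today = today or date.today()
    modified = date.fromisoformat(record["modified"])
    return (today - modified).days > MAX_AGE_DAYS

print(is_stale({"modified": "2024-01-15"}))  # True once 90 days have passed
```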

To be truly useful to agencies, these AI tools need to be trained on domain-specific data about the programs and services those agencies administer. The better the quality of this information, and the more of it there is, the more effectively these tools can be trained. Techniques that refine generative AI output to improve its accuracy and relevance, like Retrieval-Augmented Generation (RAG), depend just as heavily on frictionless access to high-quality data. Without that data, generative AI tools can’t be effectively trained, and their results can’t be enhanced to ensure they are meaningful to people using government digital services.
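The sketch below shows the RAG pattern in deliberately simplified form: retrieve the most relevant document for a question, then ground the model’s prompt in it. The token-overlap scorer stands in for the vector search a production system would use, the documents and question are invented, and the final language model call is left as a placeholder.

```python
# A deliberately simplified sketch of the RAG pattern: find the agency
# document most relevant to a question, then ground the model's prompt
# in it. Token overlap stands in for a real vector search.
import re

documents = [
    "Form XYZ-1 must be renewed every two years by mail or online.",
    "Field offices are open Monday through Friday, 9am until 4pm.",
]

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most tokens with the question."""
    q = tokens(question)
    return max(docs, key=lambda d: len(q & tokens(d)))

question = "How often do I need to renew form XYZ-1?"
context = retrieve(question, documents)

# In a real system, this grounded prompt would be sent to a language
# model; anchoring it in retrieved agency data is what keeps the
# generated answer accurate and relevant.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

If the underlying catalog is incomplete or stale, no amount of prompt engineering fixes the retrieved context, which is why the data-quality work described above has to come first.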

Open data and data suitable for training AI models are now the focus of an effort spearheaded by the Department of Commerce, which is looking at ways to make federal open data more AI-ready. The effort underscores the strong connection between an agency’s open data maturity and its ability to marshal the high-quality data needed to use new AI tools effectively.

Agencies that want to use generative AI tools successfully can build on the lessons learned from more than a decade of sharing open data. The organizational muscles developed through the rigorous training in data sharing required by the 2013 open data policy have prepared agencies for this moment.

At Ad Hoc, we can help you think about your data in new ways and organize and prepare it for the new breed of AI and large language model tools. We have a proven track record of helping agencies use data to deliver results that meet their needs and produce exceptional products for the public. If you’re interested in learning more about how we can partner with you to leverage data and new AI tools to better serve your customers, reach us at hello@adhoc.team.