Using open-source LLMs to optimize government data

Companies have been developing and using artificial intelligence (AI) for decades.

But we’ve seen exponential growth since OpenAI released ChatGPT, a chatbot built on its large language model (LLM), in 2022. Since then, other companies have raced to build their own models to stay competitive. As these tools continue to improve, the possibilities abound for using them to automate time-intensive tasks, analyze unrefined data, and bring increased productivity and better decision-making to business operations.

While private companies have taken this powerful technology and run with it, the government must also adopt it to keep pace and continue making progress on developing products and services that everyone can access and use with ease. Open-source versions of these tools can help agencies optimize their processes and surpass current levels of data analysis, all in a secure environment that won’t risk exposing sensitive information. Doing so can improve efficiency and help create experiences for the public that are comparable to what they get from consumer products.

Using AI to improve data and decision-making

Government agencies have access to massive databases full of unstructured information – including health claims, program data, reporting, and other types of qualitative data. Teams spend considerable time sifting through data to inform their decision-making while many valuable insights remain locked behind unusable or overwhelming data sets.

Using AI tools can be a safe, practical way to solve real problems. These tools can ease time and staffing constraints by reducing the load on agency teams; AI tools can identify and organize patterns within data at speeds far beyond human capabilities. Teams can then use this cleaner, better-structured data to make better, more informed decisions when developing their products and services.

A study in creating a more efficient, resilient product

Natural language processing, a branch of AI, recently helped an Ad Hoc team identify patterns and problems within vast amounts of unstructured, duplicate-ridden data for one of our federal customers.

The challenge

A federal program used a database to collect goals from grant recipients and used those goals to improve overall program progress and capacity over time. But because there was no text standardization, we discovered that approximately 30% of the goals in the database were duplicates. These duplicates were difficult to fix because of variations in capitalization, punctuation, verb tense, and spelling. Simple merge methods, like exact string matching, weren’t effective, and manual corrections were labor-intensive. This hindered accurate tracking and reporting of program progress over time.
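
To see why, consider a toy example. The goal text below is invented for illustration, not drawn from the program’s data, but it shows how exact string matching, and even basic normalization, treats near-duplicates as distinct entries:

```python
import string

# Hypothetical goal text illustrating the kinds of variation described above.
goals = [
    "Increase outreach to rural communities",
    "increase outreach to rural communities.",   # capitalization + punctuation
    "Increased outreach to rural comunities",    # verb tense + spelling error
]

# Exact string matching treats every variant as a unique goal.
print(len(set(goals)))  # -> 3 "distinct" goals

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and trim whitespace."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()

# Basic normalization catches some variants but can't bridge
# tense changes or misspellings.
print(len({normalize(g) for g in goals}))  # -> still 2 "distinct" goals
```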

The solution

An Ad Hoc data scientist experimented with Google’s BERT, an open-source LLM, to effectively identify duplicate goals. Trained on a massive dataset of 3.3 billion words, BERT was able to automatically match duplicates even with subtle textual variations.
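
As a rough sketch of how this kind of duplicate detection can work, the example below uses the sentence-transformers library as a wrapper around a BERT-family embedding model. The model name, goal text, and similarity threshold are illustrative assumptions rather than the exact configuration used on the project:

```python
from sentence_transformers import SentenceTransformer, util

goals = [
    "Increase outreach to rural communities",
    "increase outreach to rural communities.",
    "Hire two additional case managers",
]

# Encode each goal as a dense vector that captures its meaning,
# not just its surface text. The model name here is a hypothetical
# choice; any BERT-family sentence-embedding model could be swapped in.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(goals, convert_to_tensor=True)

# Pairwise cosine similarity: near-duplicates score close to 1.0
# despite differences in capitalization, punctuation, or spelling.
similarities = util.cos_sim(embeddings, embeddings)

THRESHOLD = 0.9  # hypothetical cutoff, tuned against hand-labeled pairs
for i in range(len(goals)):
    for j in range(i + 1, len(goals)):
        if similarities[i][j] >= THRESHOLD:
            print(f"Likely duplicates: {goals[i]!r} <-> {goals[j]!r}")
```

Because the comparison happens on embeddings rather than raw strings, the lowercased, re-punctuated variant of a goal lands next to the original, which is exactly where exact string matching falls short.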

The results

The data scientist verified that using the model for this task was useful and low-risk; in testing, the model identified duplicates with 84% accuracy. Incorporating AI could eliminate existing duplicates, help prevent future ones, and improve the agency’s understanding of program performance over time. These kinds of tools help agencies unlock the value of raw data, find insights, improve their data-informed decision-making, and increase efficiency by saving significant amounts of time.
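
One way to verify a model like this is to compare its predictions against a small, hand-labeled sample of goal pairs. The sketch below is purely illustrative: the labels and predictions are invented, and the project’s 84% figure came from its own review process rather than this code.

```python
from sklearn.metrics import accuracy_score

# 1 = duplicate pair, 0 = distinct pair (illustrative labels only).
human_labels      = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
model_predictions = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]

# Fraction of pairs where the model agrees with the human reviewers.
accuracy = accuracy_score(human_labels, model_predictions)
print(f"Accuracy on the labeled sample: {accuracy:.0%}")  # 80% on this toy sample
```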

Raising the bar of government services with AI

Consumer companies are all in on AI. Like mobile applications and personalized experiences before it, AI is going to change the public’s expectations of how they interact with services and technology. Even if much of AI’s future is unclear, we can be confident of this: without meaningful ways to adopt these technologies, government services risk falling even further behind consumer services and leaving opportunities for greater productivity in working with data on the table.

Ad Hoc is already helping government agencies explore how they can use AI in targeted ways to improve their services while mitigating the potential downsides of AI. Through this careful exploration in partnership with our customers, we can help to continue closing the gap between the public’s expectations and what the government delivers.

If you’re ready to see how using AI can improve your data analysis, drive more powerful decision-making, and significantly increase your agency’s efficiency, let’s talk.