Modernizing how the government buys

AI4GOVT

Jul 31

For AI4GOVT, a startup bidding on a Pilot IRS Solution Challenge, I designed a working prototype to help federal procurement officers estimate fair labor rates from disparate government data sources. Within 120 days, the prototype passed IRS tabletop testing and unlocked Phase 2 of the contract.

About one in every ten federal dollars goes to industry partners through contracts, and the professionals managing those contracts often work without modern tools for their most consequential task: estimating what a fair price actually is.

In 2020, the IRS launched its Pilot IRS program to fund prototypes that could change that. Solution Challenge Two—a three-phase competition with a ceiling of $7 million—invited startups to build a tool that would synthesize labor and wage data from across the federal government and help acquisition officers make faster, more defensible decisions about contract pricing.

The Phase 1 prototype: comparing labor estimates across federal data sources

AI4GOVT, a small AI startup, bid on Solution Challenge Two. It was their first attempt at a federal contract. They had a thesis and a small team, but no concept and no prototype.

I joined as the solo designer for Phase 1—120 days, $50K, with the goal of producing a working tabletop demonstration that would convince IRS evaluators the concept was worth funding further.

Defining what to build

The brief was open, which meant the work started with strategy, not screens. I worked with AI4GOVT's cofounders to interview six federal procurement professionals—contracting officers, contract specialists, project managers, and a contracting officer's representative. Speed, defensibility, and audit-readiness came up in every conversation. Procurement officers don't just need a number; they need to be able to show how they got it.

Notes from my interview with an IRS employee, showing their process

The tool we designed pulled from multiple government databases—BLS Occupational Employment Statistics, GSA's CALC tool, Wage Determinations Online—and let users compose an estimate by selecting databases, specifying location and term, and projecting inflation. Just as importantly, it exposed the individual records behind each number, so users could justify their estimates if challenged.

Outcome

IRS evaluators conducted tabletop testing with a project manager and an IT specialist, scoring the prototype 3.7 out of 5 across six criteria—enough to meet IRS standards and advance AI4GOVT into Phase 2 of the contract. The win launched AI4GOVT's federal contracting practice, which continued for the next two and a half years.

First page of an IRS-administered test gauging the prototype's effectiveness

Second page of the IRS-administered test, rating the prototype's effectiveness as "Meets Standards"

Why it matters

Civic and public-sector work runs on tight constraints, non-negotiable user contexts, and high accountability for public money. Designing inside those constraints, quickly and defensibly and with respect for the professionals doing the actual work, is a discipline of its own. It's the kind of design I want more of.

Internal Revenue Service

Ben Spear benjaminspear.com

120 days · 1 prototype · $150k won
