For non-technical decision makers — founders evaluating vendor proposals, executives with AI projects that aren't delivering, PE/VC partners needing independent AI due diligence mid-deal.

Ground Truth Assessment

Before you commit budget to an AI initiative, find out whether it will actually work. An independent technical diagnostic that ends with a clear decision: proceed, revise, or stop.

The Problem

Only 5–12% of organizations achieve significant financial impact from AI.[1] The difference isn't the technology — it's whether anyone independently assessed the problem before committing resources. Organizations that define clear success metrics before starting generate 2.1x more ROI.[2] The ones that skip structured assessment abandon 42% of their initiatives — up from 17% the year before.[3]

You don't need another consulting engagement that takes a quarter and tells you what you already suspected. You need someone who can look at what you have, define what success actually looks like, and tell you the truth. See the full evidence →

How It Works

Stage 1: Discovery

We start with a conversation — not a document submission form. Most buyers don't have a clean artifact to analyze. They have a situation: half-formed requirements, a vendor's pitch deck, a CTO who talks past them. I ask the questions you don't know to ask, and convert your situation into something that can be rigorously assessed.

This is 40 years of architecture and consulting experience. No AI tool replaces the ability to sit across from someone and realize they're describing a compliance problem as a feature request.

Stage 2: Assessment

Your structured requirements run through an analysis and verification pipeline that produces five deliverables:

Claim Register
Every testable assertion in your documents — extracted, catalogued, and verified against evidence. Not opinions. Claims, tested.
Landscape Scan
Rapid survey of the relevant technical territory — surfacing what your documents don't know they're missing.
Gap Map
Systematic identification of technical, legal, operational, data readiness, and scope gaps — positioned and structured, not just listed. Includes an honest assessment of whether the data foundation can support what's being proposed.
Campaign Plan
Sequenced, gated action plan with measurable success criteria and explicit kill points. Not aspirational recommendations — operational plans with milestones, defined outcomes, and the discipline to stop if the evidence says stop.
Velocity Check
Data-driven timeline calibrated to how software actually gets built now — including AI-paired development. Are the speed claims realistic?

What a Ground Truth Assessment Finds

In the last few weeks alone, I've stopped a government system from being built on an impossible architecture, caught a pricing vulnerability before launch, and prevented a company from spending money building something they already had. Three recent examples:

Enterprise platform: three failed launches, 41 phantom endpoints
Context
Express.js + React platform rebuild by external team. Three failed go-live attempts.
Findings
Price manipulation vulnerability, 41 AI-generated phantom API endpoints, 22-point frontend/backend quality gap, zero test files.
Implication
Launches failed because the system had no way to verify itself — not bad luck, but a structural absence of quality gates.
Recommendation
Fix pricing vulnerability (1-2 days), delete phantom code, install CI/CD and verification gate.

A specialty consumer products company hired an external development team to rebuild their platform — Express.js backend and React frontend. After three failed go-live attempts, each aborted when last-minute testing uncovered release-blocking issues, the company requested an independent assessment.

What appeared to be true

A modern platform nearing completion. The backend had professional architecture — layered services, proper payment handling, strong authentication. The frontend used React 18, a component library, and a service layer. Three launch dates had slipped, but each round of bug fixes was assumed to be bringing the system closer.

What was actually wrong

  • Price manipulation vulnerability (Critical). Tiered per-unit pricing logic existed only in frontend JavaScript. The backend performed no price validation. A customer could submit any price through browser developer tools, and the server would accept it. Evidence: code review; the backend calculates quantity × submitted price with no tier lookup or validation.
  • 41 phantom API endpoints (Critical). The frontend contained 41 service methods calling backend endpoints that did not exist. All were generated by AI-assisted code generation tools. The code compiled and the TypeScript types were elaborate, but none of it connected to anything. Meanwhile, the backend had 8+ working endpoints the frontend never called. Evidence: systematic cross-reference of every frontend service call against backend route definitions.
  • 22-point quality gap (Significant). The backend scored 63% (Adequate); the frontend scored 41% (Concerning). Same vendor, same project, same timeline. The backend had TypeScript strict mode and CI/CD; the frontend had strict mode disabled, no CI/CD, and no test framework installed. Evidence: standardized rubric applied independently to both repositories.
  • Three failed launches traced to a systemic cause (Significant). No independent verification gate between development and production. Testing happened as a pre-launch sprint, not a continuous process. Zero test files found in either repository. Evidence: timeline analysis; find returned 0 results for test files in both repos.
  • 2,241-line God file, still growing (Moderate). A single TypeScript service file with a phone number duplicate check that fetched the entire customer table into memory and performed a linear scan. Evidence: line count analysis; code review of the full-table findMany() + .some() pattern.
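
The pricing fix is conceptually simple: the server looks up the tier price itself and never reads a price from the request. A minimal sketch of the idea (tier names and values are illustrative, not the client's actual pricing):

```javascript
// Hypothetical tier table -- in the real system this lives server-side,
// typically in the database.
const PRICE_TIERS = [
  { minQty: 100, unitPrice: 4.0 },
  { minQty: 50, unitPrice: 4.5 },
  { minQty: 1, unitPrice: 5.0 },
];

// Server-side total: the unit price is derived from the quantity.
// Whatever price the browser submitted never enters the calculation.
function orderTotal(quantity) {
  const tier = PRICE_TIERS.find((t) => quantity >= t.minQty);
  if (!tier) throw new Error(`invalid quantity: ${quantity}`);
  return quantity * tier.unitPrice;
}

// The vulnerable pattern found in review, for contrast:
function vulnerableTotal(quantity, submittedPrice) {
  return quantity * submittedPrice; // attacker controls submittedPrice
}
```

With the lookup in place, a tampered request body can change the quantity but not the unit price, which is the property the backend was missing.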

What it meant

Half the frontend was AI-generated scaffolding that didn't connect to the real backend. The pricing system could be manipulated by anyone with browser developer tools. There was no safety net — no tests, no staging gate, no CI/CD. The three failed launches weren't bad luck. They were the predictable result of a system that had no way to verify itself.
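
The duplicate-check finding is worth one concrete illustration. The God file loaded the entire customer table and scanned it in JavaScript; the shape of the fix is to let an index answer the question. This sketch uses an in-memory Set purely for illustration; in the real system the fix is a filtered query against an indexed column (a findFirst with a WHERE on phone, in Prisma terms), not an in-memory structure:

```javascript
// In-memory stand-in for the customer table.
const customers = [
  { id: 1, phone: '555-0101' },
  { id: 2, phone: '555-0102' },
];

// The pattern found in review: fetch every row, scan linearly.
function phoneExistsLinear(rows, phone) {
  return rows.some((c) => c.phone === phone);
}

// The shape of the fix: build an index once, answer point lookups in O(1).
function buildPhoneIndex(rows) {
  return new Set(rows.map((c) => c.phone));
}
function phoneExistsIndexed(index, phone) {
  return index.has(phone);
}
```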

What was recommended

  • Fix the pricing vulnerability before launch. Migrate tier logic to the backend. Estimated 1-2 days.
  • Delete the phantom code. Remove all 41 dead service methods. Wire the frontend to the real backend endpoints.
  • Install a minimum safety net. CI/CD for the frontend, manual test protocol for critical paths, code review for high-fragility files.
  • Establish a verification gate. Defined go-live conditions, continuous testing — not launch-day QA sprints.

The assessment found a security vulnerability that could let customers set their own prices, 41 API endpoints that existed only in AI-generated scaffolding, and the systemic reason the launches kept failing. These findings surface when someone independent reads every line of code against every route definition and asks: does this actually work?
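
The phantom-endpoint check is mechanical enough to sketch: collect every route the backend defines, collect every path the frontend calls, and diff the two sets. A toy version (the regexes and sample snippets are illustrative; the real assessment parsed the actual repositories):

```javascript
// Toy inputs standing in for the real source trees.
const backendSource = `
app.get('/api/orders', listOrders);
app.post('/api/orders', createOrder);
`;
const frontendSource = `
fetch('/api/orders');
fetch('/api/inventory/levels');
fetch('/api/users/preferences');
`;

// Routes the backend actually defines.
function backendRoutes(src) {
  const matches = src.matchAll(/app\.(?:get|post|put|delete)\('([^']+)'/g);
  return new Set([...matches].map((m) => m[1]));
}

// Paths the frontend tries to call.
function frontendCalls(src) {
  return new Set([...src.matchAll(/fetch\('([^']+)'/g)].map((m) => m[1]));
}

// Phantom endpoints: called by the frontend, defined nowhere in the backend.
function phantoms(backendSrc, frontendSrc) {
  const routes = backendRoutes(backendSrc);
  return [...frontendCalls(frontendSrc)].filter((p) => !routes.has(p));
}
```

The same diff run in the other direction surfaces the working backend endpoints the frontend never calls.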

Developer handoff: architectural impossibility in a 482-line spec
Context
Civic organization document portal. 482-line developer handoff spec estimating 12 days.
Findings
Architectural impossibility (Python in a V8 isolate), two unmentioned compliance gaps (WCAG, public records), 5x document count discrepancy.
Implication
Project as scoped was half of what was actually needed. Limitations would have surfaced mid-build or in a legal filing.
Recommendation
Proceed with re-scope: dual-stack architecture, compliance from day one, re-estimate with real document count.

A civic organization needed a public document portal. A developer delivered a 482-line handoff spec estimating 12 days to build. The Ground Truth Assessment reviewed the spec, the source data, and the target architecture.

What appeared to be true

A straightforward migration: ~400 PDFs from WordPress to a Cloudflare Workers application with full-text search. Twelve days. Reasonable scope, modern stack, clear deliverable.

What was actually wrong

  • Architectural impossibility (Critical). The spec called for pdfplumber and Tesseract OCR running in Cloudflare Workers. Workers execute in a V8 isolate; they cannot run Python or native binaries. The upload pipeline as designed cannot function. Evidence: Cloudflare Workers runtime documentation; spec Section 4 (text extraction architecture).
  • Two compliance gaps, unmentioned (Critical). No reference to WCAG 2.1 AA accessibility (required under the DOJ's 2024 rule for civic web content) or California Public Records Act readiness. A civic portal without these is a legal liability. Evidence: DOJ Title II web accessibility rule (April 2024); California Government Code §6250 et seq.; the spec contains zero accessibility references.
  • 5x document count discrepancy (Significant). The spec stated ~400 documents. An actual scrape of the source WordPress site found 2,069 media files plus inline HTML minutes, across 4 distinct parsing patterns, not 1. Evidence: automated crawl of the source site vs spec Section 2 scope statement.
  • Admin security insufficient for civic data (Moderate). Authentication was a single password stored in an environment variable. No session management, no MFA, no brute-force protection, no audit trail for a system managing public records. Evidence: spec Section 7 (admin interface), compared against NIST SP 800-63B.

What it meant

The project as scoped was roughly half of what was actually needed. Without the assessment, the team would have discovered the Workers limitation mid-build, the compliance gaps at launch (or in a legal filing), and the document count surprise when the migration took far longer than planned.

What was recommended

  • Proceed, but re-scope. The project is viable with a dual-stack architecture (Python locally for migration, unpdf + external OCR in Workers for uploads).
  • Add compliance from day one. WCAG and records retention woven into every component, not bolted on later.
  • Re-estimate with the real document count. Build the scraper for 4 parsing patterns and 2,069 documents, not 1 pattern and 400.
  • Harden admin before launch. Session-based auth, audit logging, soft deletes. Cloudflare Access covers most of this at zero cost.

The full assessment — claim register, gap analysis, gated build plan, and velocity calibration — produced findings of the kind that typically emerge mid-build, at launch, or in a legal filing. The point of Ground Truth is to surface them before you commit.

Vibe-coded tool: AI-built, 5 field failures found over coffee
Context
AI-built pricing calculator and proposal generator. 600 lines, single HTML file, non-technical owner.
Findings
Frequency calculation bug producing incorrect totals, iOS popup blocking on primary field device, no business terms on proposals, no commercial mode, no proposal history.
Implication
Tool was fundamentally sound — 8 of 28 claims passed. But 5 failures would have surfaced in the field on the first sales call.
Recommendation
Fix frequency bug, replace popups with in-page rendering, add terms. Five of seven fixes are directly AI-assistable.

A service business owner in a competitive West Coast market used AI coding assistants to build a browser-based pricing calculator and client proposal generator. Roughly 600 lines of code, single HTML file, built without a developer. The owner reported it was "not quite working completely yet."

What appeared to be true

A functional sales tool: professional layout, correct pricing math, clean three-tier comparison. The kind of thing that looks ready to hand to a salesperson. Built entirely through AI-assisted prompting by a non-technical owner.

What was actually wrong

  • Custom service frequency bug (Critical). Custom-added services hardcode every-visit frequency regardless of service type. A seasonal service on a monthly plan shows 12 visits per year instead of 1-2, so proposals with custom lines produce incorrect totals. Evidence: code inspection; the custom service handler bypasses the frequency-mapping logic used by presets.
  • Mobile popup blocking (Critical). Proposal generation uses window.open(). iOS Safari blocks popups by default, and the field team presents from tablets on-site, so the core workflow fails on the primary device. Evidence: Apple WebKit documentation; iOS Safari default popup-blocking behavior.
  • No business terms on proposals (Significant). Generated proposals contain pricing but no cancellation policy, payment terms, or liability language. A proposal presented to a client is a de facto offer without terms. Evidence: generated proposal output review, compared against industry proposal templates.
  • No commercial mode (Significant). The business serves both residential and commercial clients, but the tool only has residential presets. Evidence: owner interview; tool interface limited to residential property types.
  • No proposal history (Moderate). Each proposal generates and prints with no record saved. Salespeople can't recall a previous quote or track what was offered to a returning client. Evidence: tool architecture review; no persistence layer or export mechanism.
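
The frequency bug has the classic shape of AI-generated code: the preset path goes through a mapping, the custom path hardcodes a value. A sketch of the bug and the fix (frequency names and counts are illustrative, not the tool's actual schedule):

```javascript
// Annual occurrences by service frequency -- the mapping presets already use.
const ANNUAL_OCCURRENCES = { 'every-visit': 12, quarterly: 4, seasonal: 1 };

// The buggy pattern: custom services skip the mapping and inherit the
// plan's every-visit count regardless of their own frequency.
function annualCountBuggy(service, planVisitsPerYear) {
  return service.isCustom
    ? planVisitsPerYear
    : ANNUAL_OCCURRENCES[service.frequency];
}

// The fix: custom or preset, every line item goes through the same mapping.
function annualCount(service) {
  return ANNUAL_OCCURRENCES[service.frequency];
}
```

On a 12-visit monthly plan, the buggy path bills a seasonal custom service 12 times a year; the fixed path bills it once.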

What it meant

The tool was fundamentally sound. Pricing logic: correct. Design: professional. Eight of 28 assessed claims passed outright. But five failures would have surfaced in the field — a salesperson tapping "Generate Proposal" on a tablet and watching nothing happen, or handing over a proposal with incorrect line items and no terms.

What was recommended

  • Fix the frequency bug. Route custom services through the same mapping logic as presets. Highest impact, lowest effort.
  • Replace popup-based proposals. In-page rendering with a print stylesheet eliminates the iOS blocker.
  • Add business terms. Cancellation, payment, and liability language in the proposal footer.
  • Build a commercial mode. Commercial property types and square-footage-based pricing.
  • Add local storage for proposal history. Save quotes to browser storage with client name and date.
  • Have the AI fix what the AI built. Five of seven recommendations are directly AI-assistable. The same workflow that built the tool can fix it — now that someone has identified what's actually wrong.
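
Replacing the popup can be as small as building the proposal markup as a string and rendering it in the page. A sketch (markup, element id, and terms text are placeholders, not the tool's actual output):

```javascript
// Pure function: assemble proposal markup, including the terms footer
// the original tool omitted.
function buildProposalHtml(client, lines, terms) {
  const rows = lines
    .map((l) => `<tr><td>${l.name}</td><td>$${l.total.toFixed(2)}</td></tr>`)
    .join('');
  return `<h1>Proposal for ${client}</h1>
<table>${rows}</table>
<footer>${terms}</footer>`;
}

// In the browser, inject into an in-page container instead of window.open(),
// then print; a print stylesheet hides everything but the proposal:
//   document.getElementById('proposal').innerHTML = html;
//   window.print();
```

Because nothing opens a new window, iOS Safari's popup blocker never enters the picture.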

The assessment verified 28 claims against industry pricing data, field management software documentation, local market research, and behavioral economics research. Five of the failures would have surfaced on the first sales call. The point of Ground Truth is to surface them before you hand a proposal to a customer.

What Happens Next

  1. You send me your situation. Email, one paragraph is fine. What you're looking at, what's worrying you, what decision you need to make.
  2. I tell you whether a GTA is the right tool. Within 24 hours. Sometimes it is. Sometimes you need something else. I'll tell you either way.
  3. We have a conversation. That's Stage 1 — where I ask the questions you don't know to ask. Then you assemble the materials and access I need.
  4. You get the deliverables. Claim Register, Landscape Scan, Gap Map, Campaign Plan, Velocity Check. Typically 2-4 weeks from first conversation.
  5. You get a clear decision. Proceed, revise, or stop — with the evidence behind each recommendation. Not a pile of findings. A decision you can act on.

The fee is a fraction of the cost of discovering these problems mid-build or after launch. You're paying for what gets caught before you commit, not for hours worked.

Most engagements start with a Ground Truth Assessment. Some end there — you get the deliverables and your team takes it from here. Others lead to ongoing advisory work or hands-on implementation. The assessment tells us both what the right next step is.

Frequently Asked Questions

What is a Ground Truth Assessment?

An independent technical diagnostic that tells you what's actually true about your technology — not what someone is selling you. It ends with a clear decision: proceed, revise, or stop. You get five structured deliverables — Claim Register, Landscape Scan, Gap Map, Campaign Plan, and Velocity Check — and the evidence to act on them.

Who needs a Ground Truth Assessment?

Non-technical decision makers — founders evaluating vendor proposals, executives with AI projects that aren't delivering, PE/VC partners needing technical assessment mid-deal. Anyone who needs to independently verify AI claims before committing resources.

How does a Ground Truth Assessment work?

Two stages. Stage 1 (Discovery): a conversation to convert your situation into something that can be rigorously assessed — 40 years of architecture and consulting experience, not a document submission form. Stage 2 (Assessment): structured requirements run through an analysis and verification pipeline producing five deliverables, including measurable success criteria, data readiness evaluation, and explicit go/no-go decision points.

How long does a Ground Truth Assessment take?

Typically 2-4 weeks from first conversation to delivered findings. Most of that time is yours — scheduling, assembling materials, coordinating access. The analysis itself is focused and structured: one senior practitioner with an AI-augmented diagnostic pipeline, not a team billing hours over a quarter.

Get Started

If you're about to commit real money to an AI initiative and want an independent check first — tell me your situation. One paragraph is enough. I'll respond within 24 hours and tell you whether a Ground Truth Assessment is the right tool.

info@chapsoft.com