METHODOLOGY · 5 STAGES · ONE REPORT OUT

How research actually happens.

A five-stage pipeline across seven frontier models, a 180-source universe screened down to ~40 cited per report, and fourteen gates that block a memo from shipping. The reasoning trace and citations ship with the verdict.

How Verdikt is designed to work · Public beta opens June 2026

5 · Pipeline stages

7 · Frontier models orchestrated

180+ · Source universe

14 · Quality gates before ship

<1 hr · End-to-end runtime

1 · Report out

Get a verdict · See a sample

PIPELINE AT A GLANCE

Five stages. The first one is with you. The next four run automatically.

01 · Intake & framing · ≈9 min · you

02 · Market sizing · ≈4 min · pipeline

03 · Competitive map · ≈5 min · pipeline

04 · 10× claim test · ≈2 min · pipeline

05 · Synthesis & verdict · ≈1 min · pipeline

01 · ≈9 min · you

Intake & framing

Your one-sentence pitch becomes a structured brief of twelve questions, each mapped to a downstream decision and a named kill criterion.

INPUTS

· One-sentence pitch
· ICP guess
· Pricing hypothesis
· 10× claim
· Named falsifier

OUTPUTS

· Structured brief
· Hypothesis tree
· Coverage map

WHAT RUNS

01 · Brief is parsed into a hypothesis tree (one root claim, three to five child claims, leaf evidence requirements).
02 · Each child claim is tagged with a kill criterion and a required evidence tier.
03 · Coverage map identifies which sources, datasets, and competitor surfaces each leaf needs.
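
The three steps above can be pictured as a small data structure. This is a minimal sketch, not Verdikt's implementation: the class names, field names, and the `validate` and `coverage_map` helpers are all hypothetical, chosen only to mirror the wording of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    evidence_requirement: str                         # what evidence this leaf needs
    sources: list[str] = field(default_factory=list)  # coverage-map entries (hypothetical)


@dataclass
class ChildClaim:
    text: str
    kill_criterion: str
    required_tier: int                                # 1, 2, or 3
    leaves: list[Leaf] = field(default_factory=list)


@dataclass
class HypothesisTree:
    root_claim: str
    children: list[ChildClaim]

    def validate(self) -> list[str]:
        """Return structural problems; an empty list means the tree is well-formed."""
        problems = []
        if not 3 <= len(self.children) <= 5:
            problems.append("expected three to five child claims")
        for child in self.children:
            if not child.kill_criterion.strip():
                problems.append(f"no kill criterion: {child.text}")
            if child.required_tier not in (1, 2, 3):
                problems.append(f"unknown evidence tier: {child.text}")
            if not child.leaves:
                problems.append(f"no leaf evidence requirements: {child.text}")
        return problems

    def coverage_map(self) -> dict[str, list[str]]:
        """Map each leaf's evidence requirement to the sources it needs."""
        return {
            leaf.evidence_requirement: leaf.sources
            for child in self.children
            for leaf in child.leaves
        }
```

A tree that returns an empty list from `validate()` has one root claim, three to five children, and a kill criterion, tier, and at least one leaf on every child.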

MODELS

Claude Sonnet 4.6 (brief parser) · GPT-5 (hypothesis tree validator)

02 · ≈4 min · pipeline

Market sizing

Bottom-up TAM built from unit economics, SAM from ICP density, SOM from year-one capture; top-down Gartner numbers do not qualify.

INPUTS

· ICP definition
· Pricing hypothesis
· Geography
· Comparable wedges

OUTPUTS

· TAM / SAM / SOM table
· Sizing memo
· Growth rate range
· WTP signals

WHAT RUNS

01 · Bottom-up TAM from primary databases (SEC EDGAR, FRED, BLS, Eurostat, ONS, Census).
02 · SAM derived from ICP density × ACV band, cross-checked against three comparable wedges.
03 · SOM modeled across three GTM scenarios (founder-led, PLG, channel) with named penetration ceilings.
04 · Growth rate triangulated from at least two independent sources; conflicts flagged, not averaged.
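
The arithmetic behind steps 02 to 04 can be sketched in a few lines. Everything here is an assumption made for illustration: the function names, treating ICP density as an account count, expressing a penetration ceiling as a fraction of SAM, and the `conflict_gap` parameter (the page does not state a threshold for growth-rate conflicts).

```python
def sam(icp_accounts: int, acv_low: float, acv_high: float) -> tuple[float, float]:
    """SAM band: ICP density (as an account count) times the ACV band."""
    return icp_accounts * acv_low, icp_accounts * acv_high


def som_by_scenario(sam_value: float, ceilings: dict[str, float]) -> dict[str, float]:
    """Year-one SOM per GTM scenario, as SAM times a named penetration ceiling."""
    return {name: sam_value * ceiling for name, ceiling in ceilings.items()}


def growth_range(estimates: dict[str, float], conflict_gap: float):
    """Return (low, high, conflict) from per-source growth estimates.

    The estimates are never averaged: the range is reported as-is and
    flagged when the relative gap between the extremes exceeds conflict_gap.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two independent sources")
    low, high = min(estimates.values()), max(estimates.values())
    if high == low:
        return low, high, False
    gap = (high - low) / abs(low) if low else float("inf")
    return low, high, gap > conflict_gap
```

Returning the low and high values with a flag, rather than a single midpoint, keeps the disagreement visible to whoever reads the sizing memo.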

MODELS

Claude Sonnet 4.6 (sizing reasoning) · Perplexity Sonar Pro (source corroboration) · GPT-5 (numerical sanity check)

03 · ≈5 min · pipeline

Competitive map

Direct competitors, indirect substitutes, and the do-nothing baseline scored on the same six axes, with the reach-for substitute named.

INPUTS

· Category definition
· ICP workflows
· Wedge claim
· Substitute hypotheses

OUTPUTS

· Competitor scorecard
· Substitute map
· Moat thesis
· Threat ranking

WHAT RUNS

01 · Direct competitor sweep across Crunchbase, PitchBook, G2, Capterra, Product Hunt, and recent funding announcements.
02 · Indirect substitute mapping: open source, in-house build, spreadsheet, services firm, do-nothing.
03 · Each player scored on feature parity, distribution reach, capital position, hiring velocity, integration surface, and switching cost.
04 · Moat thesis tested against the strongest competitor's most recent 90 days of shipping cadence.
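
Scoring every player on the same six axes might look like the sketch below. The axis names come from step 03; the integer scores and the unweighted sum used for the threat ranking are assumptions, since the page does not say how scores are scaled or combined.

```python
AXES = (
    "feature_parity",
    "distribution_reach",
    "capital_position",
    "hiring_velocity",
    "integration_surface",
    "switching_cost",
)


def threat_ranking(scorecard: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Rank players by total score; every player must be scored on all six axes."""
    totals = {}
    for player, scores in scorecard.items():
        missing = [axis for axis in AXES if axis not in scores]
        if missing:
            raise ValueError(f"{player} is missing axes: {missing}")
        totals[player] = sum(scores[axis] for axis in AXES)
    # Highest total first: the most threatening player leads the list.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Rejecting a partially scored player is the point of the sketch: a substitute such as the spreadsheet or the do-nothing baseline only appears in the ranking if it was scored on the same axes as the direct competitors.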

MODELS

Claude Opus 4.7 (competitive synthesis) · GPT-5 (capital and hiring signal extraction) · Gemini 2.5 Pro (long-context competitor doc ingestion)

04 · ≈2 min · pipeline

10× claim test

Your 10× claim is decomposed into measurable sub-claims and run against the falsifier you named; untestable claims do not survive.

INPUTS

· 10× claim
· Falsifier criterion
· Benchmark targets
· Comparable products

OUTPUTS

· 10× audit
· Benchmark grid
· Falsifier check
· Adversarial transcript

WHAT RUNS

01 · Claim is decomposed into measurable sub-claims (latency, cost, accuracy, coverage, time-to-value).
02 · Each sub-claim is benchmarked against the named competitor and the do-nothing baseline.
03 · Falsifier criterion is run as an inverse test: if the criterion holds today, the claim is downgraded.
04 · Adversarial pass: a second model tries to break the claim from the strongest counter-argument.
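
Steps 01 to 03 reduce to a benchmark grid plus one inverse check. The sketch below is illustrative only: the `SubClaim` fields, the status labels, and the rule that a sub-claim must clear the target against both the competitor and the baseline are assumptions, not documented behavior. The adversarial pass in step 04 is a model call and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SubClaim:
    metric: str            # e.g. "latency", "cost", "accuracy"
    ours: float
    competitor: float      # the named competitor
    baseline: float        # the do-nothing baseline
    lower_is_better: bool


def improvement(ours: float, other: float, lower_is_better: bool) -> float:
    """Improvement multiple of `ours` over `other` (10.0 means ten times better)."""
    return other / ours if lower_is_better else ours / other


def audit(sub_claims: list[SubClaim], falsifier_holds_today: bool, target: float = 10.0):
    """Return a per-metric benchmark grid and an overall status for the claim."""
    grid = {}
    for claim in sub_claims:
        vs_competitor = improvement(claim.ours, claim.competitor, claim.lower_is_better)
        vs_baseline = improvement(claim.ours, claim.baseline, claim.lower_is_better)
        grid[claim.metric] = {
            "vs_competitor": vs_competitor,
            "vs_baseline": vs_baseline,
            "meets_target": min(vs_competitor, vs_baseline) >= target,
        }
    if not sub_claims:
        status = "untestable"          # nothing measurable to benchmark
    elif falsifier_holds_today:
        status = "downgraded"          # inverse test: the falsifier already holds
    elif all(row["meets_target"] for row in grid.values()):
        status = "supported"
    else:
        status = "partially supported"
    return grid, status
```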

MODELS

Claude Opus 4.7 (claim decomposition) · GPT-5 (adversarial counter-argument) · o4-mini (numerical benchmark verification)

05 · ≈1 min · pipeline

Synthesis & verdict

Multi-section research report drafted to a fixed template: verdict on the cover, kill criterion named, sections for market, competition, the 10× test, pricing, and (when the call is BUILD) a 12-week build plan and GTM motion analysis. Tier-graded citations inline. Two audience frames in one memo. Timestamped run record at ship.

INPUTS

· All prior stage outputs
· Reasoning traces
· Citation pack
· Quality gate results

OUTPUTS

· Multi-section report
· Citation pack (~40 cited)
· Founder / investor dual frame
· Reasoning trace
· Re-run hooks
· Run record & timestamp

WHAT RUNS

01 · Memo drafted to a fixed template; deviations from template require explicit override and are flagged in metadata.
02 · Recommendation is one of: BUILD, PIVOT, KILL (founder frame) or PURSUE, RECONSIDER, REJECT (investor frame). Both frames ship in the same memo. No fourth option.
03 · Kill criterion restated verbatim with the evidence threshold that would trigger it.
04 · Fourteen quality gates run on the draft; any failure blocks ship and returns to the relevant stage.
05 · A run record is emitted alongside the memo: generation time, pipeline version, model lineup, gate results, and re-run hooks for the three weakest claims.
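
The discrete calls and the run record map naturally onto enums and a small record type. This is a sketch under assumptions: the type and field names are hypothetical, and the page does not say how the two frames relate to each other, so no mapping between them is modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class FounderCall(Enum):
    BUILD = "BUILD"
    PIVOT = "PIVOT"
    KILL = "KILL"


class InvestorCall(Enum):
    PURSUE = "PURSUE"
    RECONSIDER = "RECONSIDER"
    REJECT = "REJECT"


@dataclass
class RunRecord:
    generated_at: datetime
    pipeline_version: str
    model_lineup: list[str]
    gate_results: dict[str, bool]   # gate name -> passed
    rerun_hooks: list[str]          # the three weakest claims

    def can_ship(self) -> bool:
        """Any failed gate blocks ship."""
        return all(self.gate_results.values())


def parse_call(raw: str, frame: type[Enum]) -> Enum:
    """Accept only the discrete calls of the given frame; anything else raises ValueError."""
    return frame(raw.strip().upper())
```

Because `parse_call` goes through the enum constructor, a hedged answer such as "it depends" fails loudly instead of slipping into the memo.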

MODELS

Claude Opus 4.7 (memo composition) · GPT-5 (gate verification) · Claude Haiku 4.5 (citation linting)

INSIDE THE REPORT

What ships is a memo, not a paragraph.

The deliverable is a multi-section research report, with the verdict on the cover and every claim cited inline. Same evidence, two lenses: the founder frame reads first, and the investor frame ships beside it for cofounder alignment, advisor reviews, and partner meetings.

FOUNDER FRAME · BUILD · PIVOT · KILL

Should you ship it? Quit your job? Pivot the wedge? The report reads end-to-end as a builder's memo.

INVESTOR FRAME · PURSUE · RECONSIDER · REJECT

Each section ships with a reframe panel: same evidence, the partner-meeting lens. Standard pre-seed comp set, named kill criteria, IC-packet ready.

01 · The call: Verdict, confidence, biggest risk. The cover answers the question.

02 · What we tested: Hypothesis tree. Every claim has a tier, a source, and a kill criterion.

03 · The market: Bottom-up TAM, SAM, SOM. A conflict surface when sources disagree by more than 20%.

04 · The 10× test: Benchmark grid plus the adversarial transcript that tried to break the claim.

05 · Why it wins: Moat synthesis. The substitute that is actually dangerous, named.

06 · Competition: Direct, indirect, and do-nothing competitors scored on six axes.

07 · Kill criteria: Three named ways this fails. The signal to watch. The trigger threshold.

08 · What to charge: Pricing band against four comparables, with willingness-to-pay quotes.

09 · Build plan: Twelve-week execution path with weekly milestones. Conditional on BUILD.

10 · Go-to-market: Three distribution motions modeled. The one we recommend for year one.

11 · Sources: ~40 cited, tier-graded. Every URL fetched and grep-verified before ship.

A1 · Run record: Generation time, pipeline version, model lineup, gate results, re-run hooks.

See the sample report · A real verdict on a solo-built AI dev tool. 42 cited sources. Build call, three named risks.

MULTI-MODEL ORCHESTRATION

Seven models. Each one used where it is strongest.

One model is one opinion. The pipeline routes each task to the strongest model, then to a different lineage for adversarial review. Different training data, different failure modes.

| MODEL | ROLE | USED IN | WHY THIS ONE |
| --- | --- | --- | --- |
| Claude Opus 4.7 | Long-horizon synthesis | Stages 03, 04, 05 | Highest-quality multi-step reasoning. Used where the cost of a wrong call is highest. |
| Claude Sonnet 4.6 | Structured extraction | Stages 01, 02 | Fast, reliable, follows schemas. Used for parsing briefs and sizing math. |
| Claude Haiku 4.5 | Citation linting & formatting | Stage 05 | Cheap and exact. Used where we need throughput, not creativity. |
| GPT-5 | Adversarial second opinion | Stages 01, 02, 03, 04, 05 | Different lineage means different failure modes. Used to break what Claude built. |
| o4-mini | Numerical verification | Stage 04 | Reasoning model. Used to verify benchmark math holds under scrutiny. |
| Gemini 2.5 Pro | Long-context ingestion | Stage 03 | 1M+ token context. Used to read entire competitor documentation, S-1s, and roadmap dumps in one pass. |
| Perplexity Sonar Pro | Live source corroboration | Stages 02, 03 | Real-time web grounding. Used to cross-check numbers against the freshest available data. |
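
The "different lineage for adversarial review" rule can be sketched as a lookup. The `LINEAGE` table below treats lineage as the vendor behind each model, which is an assumption about what the page means by the word, and `pick_reviewer` is a hypothetical helper, not part of any real routing API.

```python
# Assumption: "lineage" is approximated by the vendor that trains the model.
LINEAGE = {
    "Claude Opus 4.7": "Anthropic",
    "Claude Sonnet 4.6": "Anthropic",
    "Claude Haiku 4.5": "Anthropic",
    "GPT-5": "OpenAI",
    "o4-mini": "OpenAI",
    "Gemini 2.5 Pro": "Google",
    "Perplexity Sonar Pro": "Perplexity",
}


def pick_reviewer(primary: str, candidates: list[str]) -> str:
    """Return the first candidate whose lineage differs from the primary model's."""
    for model in candidates:
        if LINEAGE[model] != LINEAGE[primary]:
            return model
    raise LookupError(f"no reviewer from a different lineage than {primary}")
```

With a candidate list of Claude Sonnet 4.6 followed by GPT-5, a draft produced by Claude Opus 4.7 is routed to GPT-5, since the first candidate shares its lineage.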

SOURCE GRADING

Every citation is tier-marked.

A claim from the SEC and a claim from Reddit aren't the same claim. Numbers your deal hinges on come from Tier 1. Tier 2 is context. Tier 3 is reading the room.

We pull from a universe of 180+ named sources, grouped into the tiers below. A typical report cites about 40 of them, chosen for relevance to your idea and weighted by tier. Every cited URL is fetched and the claim grep-verified before ship.

TIER 1 · PRIMARY

SEC, USPTO, FRED, BLS, Eurostat, Gartner, IDC, Forrester, peer-reviewed. Where the deal numbers come from.

TIER 2 · TRADE PRESS

The Information, Bloomberg, FT, Reuters, Stratechery, MIT Tech Review, HBR. Context and corroboration.

TIER 3 · SENTIMENT

G2, Hacker News, Reddit, Glassdoor. To read the room. Never as fact.
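
Tier marking amounts to a lookup plus one rule for numbers. The sketch below uses only the sources named in the three tiers above ("peer-reviewed" is a category rather than a named source, so it is left out). The function names are hypothetical, and ignoring unlisted sources is an assumption.

```python
TIER_BY_SOURCE = {
    # Tier 1: primary
    "SEC": 1, "USPTO": 1, "FRED": 1, "BLS": 1, "Eurostat": 1,
    "Gartner": 1, "IDC": 1, "Forrester": 1,
    # Tier 2: trade press
    "The Information": 2, "Bloomberg": 2, "FT": 2, "Reuters": 2,
    "Stratechery": 2, "MIT Tech Review": 2, "HBR": 2,
    # Tier 3: sentiment
    "G2": 3, "Hacker News": 3, "Reddit": 3, "Glassdoor": 3,
}


def tier_mark(sources: list[str]) -> dict[str, int]:
    """Attach a tier to each cited source; unlisted sources are left out (assumption)."""
    return {s: TIER_BY_SOURCE[s] for s in sources if s in TIER_BY_SOURCE}


def numeric_claim_is_grounded(sources: list[str]) -> bool:
    """A number the deal hinges on needs at least one Tier 1 citation."""
    return 1 in tier_mark(sources).values()
```

A claim cited only to Reddit and Bloomberg would be marked Tier 3 and Tier 2 and would not count as grounded; adding an SEC citation changes that.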

QUALITY GATES

Fourteen checks. Any failure blocks ship.

Fourteen automated gates. A failure returns the draft to the relevant stage with the flagged claim. The memo you receive is the one that passed all fourteen.

Source tier coverage: Every numeric claim must cite at least one Tier 1 source.

Citation freshness: Market size and competitor capital data must be ≤ 18 months old or explicitly flagged as stale.

Conflict surface: When two Tier 1 sources disagree by > 20%, both are shown and the gap is noted. We do not silently average. Gaps > 40% downgrade the claim to NEEDS MORE EVIDENCE and block ship.

Falsifier presence: The named kill criterion must appear verbatim in the final memo.

Adversarial review: A second model must attempt to break the 10× claim. The transcript is attached.

Substitute named: The do-nothing or strongest substitute must be explicitly identified, not assumed.

Unit economics sanity: ACV × ICP density must reconcile with the SOM within an order of magnitude.

No hallucinated citations: Every cited URL is designed to be fetched at ship time and the cited claim grep-verified against the page text. Any 404 or text-mismatch blocks the gate.

Reasoning trace completeness: Every recommendation links to the chain of intermediate outputs that produced it.

Template conformance: Memo follows the fixed template. Deviations require an override flag.

Recommendation discreteness: Output is one of BUILD · PIVOT · KILL (founder frame) or PURSUE · RECONSIDER · REJECT (investor frame). No prose hedge.

Re-run hooks attached: The three weakest claims ship with a one-click re-run for when new evidence appears.

Run record & timestamp: Every memo ships with a run record: generation time, pipeline version, model lineup, gate results, and re-run hooks. The memo is a decision-support artifact, not a legal attestation.

Human-readable diff: If the memo is re-run later, the diff against the prior verdict is rendered, not just the new file.
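
A few of the gates above are simple enough to sketch as pure functions. This is an illustration, not the shipped checks: the function names are hypothetical, the page fetch itself is left out (the sketch takes already-fetched page text), and the whitespace and case normalization in `claim_on_page` is an assumption about what "grep-verified" tolerates.

```python
from datetime import date


def citation_is_fresh(published: date, ship_date: date, max_months: int = 18) -> bool:
    """Citation freshness: the source is at most max_months old at ship time."""
    months = (ship_date.year - published.year) * 12 + (ship_date.month - published.month)
    return months < max_months or (months == max_months and ship_date.day <= published.day)


def falsifier_present(kill_criterion: str, memo_text: str) -> bool:
    """Falsifier presence: the kill criterion appears verbatim in the memo."""
    return kill_criterion in memo_text


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case before matching.
    return " ".join(text.split()).casefold()


def claim_on_page(claim: str, page_text: str) -> bool:
    """No hallucinated citations: the cited claim is found in the fetched page text."""
    return _normalize(claim) in _normalize(page_text)


def run_gates(results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Any failure blocks ship; the failed gate names are returned for rework."""
    failed = [name for name, passed in results.items() if not passed]
    return not failed, failed
```

Returning the names of the failed gates alongside the pass/fail flag matches the idea that a failure sends the draft back to the relevant stage with the flagged claim.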

See a memo that passed all fourteen

THE FALSIFIER LOOP

Every verdict names what would change the call.

We name the specific evidence that would flip the call. An illustrative example: the leading incumbent ships parity in 18 months, the cycle stretches past 14, or the ICP shrinks below 4,000. If it shows up, the verdict changes. That's the difference between a useful memo and a confident paragraph.

NAMED IN INTAKE

You stake it. The pipeline can't test what you haven't.

TESTED IN STAGE 04

Run as an inverse claim. If it holds today, the verdict downgrades.

RESTATED IN STAGE 05

Reproduced in the memo with the threshold that would trigger a re-run.
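
A named falsifier is, in effect, a signal, a direction, and a threshold. The sketch below is a hypothetical representation built around the illustrative "ICP shrinks below 4,000" example; the class, its fields, and the restated wording are assumptions.

```python
import operator
from dataclasses import dataclass

_COMPARE = {"<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Falsifier:
    signal: str        # what to watch, e.g. "icp_size"
    comparison: str    # "<" or ">"
    threshold: float

    def triggered(self, observed: float) -> bool:
        """True when the observed value crosses the threshold, which would prompt a re-run."""
        return _COMPARE[self.comparison](observed, self.threshold)

    def restate(self) -> str:
        """One-line restatement for the memo, including the trigger threshold."""
        return f"Verdict changes if {self.signal} {self.comparison} {self.threshold:g}"


# Illustrative only, mirroring the example figure quoted in this section.
icp_falsifier = Falsifier(signal="icp_size", comparison="<", threshold=4000)
```

The same object serves all three moments in the loop: it is created at intake, evaluated with `triggered` in stage 04, and rendered with `restate` in stage 05.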

Get a verdict with a named falsifier

DELIBERATE OMISSIONS

What Verdikt refuses to do.

A methodology is defined by what it rejects. Six things the pipeline won't do.

Top-down market sizing

No Gartner number times an arbitrary penetration rate. Top-down-only TAM is flagged and the claim is downgraded.

Single-source confidence

No recommendation rests on one source. Without two independent Tier 1 inputs, the claim drops to NEEDS MORE EVIDENCE.

Averaging conflicting data

When two credible sources disagree, both are shown with the gap explained. Quiet averages make bad decisions.

Vibes-based moats

No 'strong network effects' without a metric trending the right way. Otherwise the moat is marked speculative.

Hedged recommendations

No 'it depends.' Three discrete calls per frame, each with a named kill criterion. We will be wrong sometimes, never vague.

Training on your idea

Your brief, trace, and memo never train any model. Zero retention on every LLM call.

Run an idea through the pipeline · Target under 1 hour. $49.99. Refund or re-run if it's not defensible.

Verdikt reports are advisory research, not legal, financial, investment, or business advice.