How it works


What it is

Underground Commerce is a decision-support tool. The product gives you a structured read of where narrative and money agree (or don't) on a topic, so you can form a sharper view faster than you would from unaided media consumption. It is explicitly not a forecaster, not a trade signal, and not a crystal ball — when the editorial score and the markets score diverge on a category, the gap is information, not a prediction about which side will turn out to be right.

The system ingests reporting from curated sources, scores eight tech-business categories, and writes a paragraph explaining each score. The dashboard is the live signal. Reports — when they ship — are the interpretation. We're building this in public, so this page describes how the system works today, including what's incomplete.

Two flagship domains

We're applying the same machinery — retrieval, relevance routing, structured synthesis, divergence — to two domains where decisions get made under high informational uncertainty and there's no liquid prediction market to defer to.

  • Business strategy. Live today. Eight tech-business categories: capital markets, labor signal, regulatory pressure, and AI deployment risk, plus four more pending source coverage. The audience is operators forming a view on near-term tech-business risk.
  • Career transitions. Planned, in development. A DOORS-inspired tool for job seekers — given your current role and skills, what adjacent careers are worth considering by earnings, similarity, and demand? Inputs are O*NET occupations, BLS wage data, and CareerOneStop posting signals. Same shape as the business surface: structured synthesis, two reads, explained gap. Preview at /transitions.
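
The transitions tool isn't shipped, but the shape of its ranking is already describable. A minimal sketch, assuming illustrative field names and weights (nothing here is the production schema):

```python
from dataclasses import dataclass

@dataclass
class TransitionCandidate:
    occupation: str       # O*NET occupation title
    median_wage: float    # BLS annual median wage, USD
    skill_overlap: float  # 0..1 similarity to the seeker's current skills
    postings_90d: int     # CareerOneStop posting count, trailing 90 days

def rank(candidates, current_wage, w_earn=0.4, w_sim=0.4, w_demand=0.2):
    """Order adjacent careers by earnings uplift, similarity, and demand.
    The weights are placeholders, not the product's actual tuning."""
    def score(c):
        uplift = (c.median_wage - current_wage) / max(current_wage, 1.0)
        demand = min(c.postings_90d / 1000.0, 1.0)  # crude saturation cap
        return w_earn * uplift + w_sim * c.skill_overlap + w_demand * demand
    return sorted(candidates, key=score, reverse=True)
```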

The technical lineage runs through Radinsky & Horvitz 2013 ("Mining the Web to Predict Future Events") and the LLM-era forecasting stack (AutoCast++, PROPHET, Halawi et al.) — but the research question is adapted: not whether text mining can predict events, but whether the same machinery can produce a useful decision-support read.

Two tracks: editorial and markets

Every category gets two readings, side by side:

  • Editorial — a 0–100 score produced by a language model reading recent news for the category. Updated every six hours, with a written rationale grounded in the article pool. This is the headline number on every category card.
  • Markets — a 1–10 score derived from prediction-market data. The system pulls every binary market matching the category's filter, takes the liquidity-weighted average of the implied probabilities, then maps that to a 1–10 bucket via a per-category threshold table. Updated every 30 minutes. Visible as a small chip below each category card, and on its own page at /markets.

The two tracks are allowed to disagree, and sometimes they do. Editorial reads narrative; markets read price. When they diverge, that's information — the gap is where uncertainty lives, and the /divergence page surfaces it with a one-paragraph theory of why. The point isn't to pick a winner; it's to give a reader a structured place to start their own read.
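
For a sense of what the gap means numerically, the two scales first have to be put on common footing. A linear normalization is the simplest plausible choice; the exact /divergence math isn't documented here, so treat this as a sketch:

```python
def divergence(editorial_0_100, markets_1_10):
    """Normalize both tracks to 0..1 and return the signed gap.
    Positive: editorial reads hotter than markets."""
    e = editorial_0_100 / 100.0
    m = (markets_1_10 - 1) / 9.0  # map the 1-10 bucket onto 0..1
    return e - m

# e.g. editorial 72 vs markets bucket 4 -> a gap of about +0.39
print(round(divergence(72, 4), 2))
```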

Both tracks are graded on the same internal standard: 14 days after a forecast snapshot, an operator adjudicates whether the predicted event class actually occurred. Per-track calibration shows up on /track-record as a Brier score per category × track. We publish it because operators (us, and serious readers) want to know the synthesis isn't drifting — but Brier is a calibration check, not the headline value. The headline value is decision uplift.
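
Concretely, the Brier score is the mean squared gap between a forecast probability and the adjudicated 0/1 outcome. A sketch, assuming scores have already been normalized to probabilities for grading:

```python
def brier(forecasts, outcomes):
    """Mean squared error between forecast probabilities (0..1) and
    adjudicated outcomes (0 or 1). Lower is better; always guessing
    0.5 earns 0.25."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# One category x track cell: three adjudicated 14-day snapshots
print(brier([0.72, 0.31, 0.55], [1, 0, 1]))  # ~0.126
```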

What it scores

Today, four categories are active:

  • Tech Capital Markets — funding, M&A, IPO and SPAC activity, valuation shifts, public-market reactions to tech earnings.
  • Tech Labor Signal — layoffs, hiring freezes, executive departures, return-to-office mandates, broader employment momentum.
  • Regulatory & Antitrust Pressure — antitrust enforcement, AI regulation, privacy rulings, FTC/DOJ/EU actions, congressional hearings.
  • AI Model & Deployment Risk — frontier-model incidents, jailbreaks, governance disputes at AI labs, safety controversies.

Four more are seeded but inactive — Active Exploitation Risk, Cloud & Infrastructure Stability, Software Supply Chain Risk, and Open Source Governance Stress. They'll go live as the specialized data sources they depend on come online.

How an editorial score is produced

Every six hours, a scheduler runs a cycle:

  1. Ingest. Pull recent reporting from curated sources — currently a mix of general-tech RSS feeds, SEC EDGAR 8-K filings, and Hacker News front-page submissions.
  2. Score. Hand the most recent ~100 articles to Claude with a system prompt describing each category, and ask for a 0–100 score, a confidence value, and a one-paragraph rationale grounded in the article pool.
  3. Review. Scores with confidence below 0.75 route to a human review queue before publishing. Higher-confidence scores publish directly.
  4. Publish. The dashboard refreshes in place. Score deltas tell you when something's actually shifting versus when chatter is noise.

A typical run takes 20–25 seconds and costs under ten cents.
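
Put as code, the cycle is small. A sketch of its control flow; the callables are injected because the real module boundaries aren't public:

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.75  # below this, a human reviews before anything publishes

@dataclass
class ScoreResult:
    score: int         # 0-100
    confidence: float  # the model's self-reported confidence
    rationale: str     # one paragraph, grounded in the article pool

def run_cycle(categories, fetch_articles, score_with_model, publish, queue_review):
    """One six-hourly editorial cycle. The four callables are injected so the
    sketch stays self-contained; they stand in for internals not shown here."""
    articles = fetch_articles(limit=100)           # RSS + EDGAR 8-Ks + HN front page
    for category in categories:
        result: ScoreResult = score_with_model(category, articles)
        if result.confidence < REVIEW_THRESHOLD:
            queue_review(category, result)         # low confidence: human gate
        else:
            publish(category, result)              # dashboard refreshes in place
```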

How a markets score is produced

Twice an hour, a separate scheduler runs the markets pipeline:

  1. Ingest. Pull current prices from prediction markets — Polymarket today, Kalshi added when its credentials are configured. Each market has a yes/no question and a price interpretable as the implied probability.
  2. Match. For each category, an operator-defined filter selects markets whose titles match a keyword set. Filters support OR-substring matches, word-boundary matching, and exclusion lists, so "fed" doesn't pull "Federica" and "layoff" doesn't pull "playoff."
  3. Combine. The matched set is reduced to a single probability via a configurable strategy — usually the liquidity-weighted average of all matched markets' yes-prices.
  4. Bucket. The probability is mapped to a 1–10 integer score via a per-category threshold table. The mapping is deterministic and public: no model interpretation, just price → bucket.

Categories without an operator-defined filter are skipped silently — markets scoring is opt-in per category. No score is better than a misleading one.
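
Here is the match-combine-bucket path as a self-contained sketch. The keyword filter, liquidity weighting, and threshold table behave as described above; the specific thresholds and example markets are illustrative:

```python
import re
from bisect import bisect_right

def matches(title, keywords, exclusions):
    """Word-boundary keyword match with an exclusion list, so 'fed'
    doesn't pull 'Federica' and 'layoff' doesn't pull 'playoff'."""
    t = title.lower()
    if any(re.search(rf"\b{re.escape(x)}\b", t) for x in exclusions):
        return False
    return any(re.search(rf"\b{re.escape(k)}\b", t) for k in keywords)

def markets_score(markets, keywords, exclusions, thresholds):
    """markets: list of (title, yes_price, liquidity) tuples.
    thresholds: 9 ascending probability cut points for buckets 1..10."""
    pool = [(p, liq) for title, p, liq in markets
            if matches(title, keywords, exclusions)]
    if not pool:
        return None  # no filter match: the category is skipped, not guessed
    total = sum(liq for _, liq in pool)
    prob = sum(p * liq for p, liq in pool) / total  # liquidity-weighted average
    return 1 + bisect_right(thresholds, prob)       # deterministic price -> bucket

# Evenly spaced thresholds for illustration; each category has its own table.
even = [i / 10 for i in range(1, 10)]
print(markets_score([("Will the Fed cut rates?", 0.62, 50_000),
                     ("Playoff odds this season", 0.90, 10_000)],
                    keywords=["fed"], exclusions=["playoff"],
                    thresholds=even))  # -> 7
```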

What we mean by "score"

A score is a near-term likelihood read for the next 30 days, grounded in the article pool the system saw. It is not a probability in the strict statistical sense — there's no historical base rate yet to anchor against, and the model's confidence is its own self-report.

What you can read off a score

  • Direction. Is this category warming or cooling versus last cycle?
  • Magnitude. Roughly: 0–33 background, 34–66 elevated, 67–100 high.
  • Reasoning. The rationale tells you what specific events the model weighed.

What you can't read off a score

  • A specific event happening on a specific date.
  • An investment or trading signal. (See: not financial advice.)
  • A probability calibrated against historical outcomes. (Not yet.)

Why this is decision support, not prediction

The original 2013 paper this lineage descends from asked: can we predict future events from news? That's a hard question to win. Liquid prediction markets and well-resourced quant funds compete for the same signal, and the scoreboard (Brier score against resolved outcomes) demands a data volume we don't have. Pitching this product as a forecaster would be claiming foresight we can't back.

The question we're actually trying to answer is different: can structured synthesis of media signals materially improve human decisions in domains of high informational uncertainty, relative to unaided media consumption? That's evaluable on decision quality (calibration, regret, user-reported confidence), winnable in domains where there is no liquid prediction market — career transitions especially — and aligned with how the product already behaves. Every score has a written rationale. Every divergence has an explained gap. Nothing here is prescriptive.

What we adopt from the prediction-systems literature is the machinery — retrieval, relevance filtering, decomposition, ensembling, calibration. What we don't adopt is the goal of optimizing forecast accuracy as the externally pitched value. Internal calibration honesty stays. Crystal-ball claims do not.

What's incomplete

Today's pipeline reads articles directly. The model's rationales reference specific companies, dollar figures, and events, but those references are reconstructed from article summaries — not linked to the underlying reporting in a way you can audit click-by-click. Occasionally that produces a confident-sounding rationale that conflates two real events into one, or attributes a quote to the wrong person. We're aware of this failure mode and treat it as a real limitation, not a footnote.

The next version of the pipeline replaces this with an event-extraction layer — every article gets parsed into structured event records, similar events get clustered into storylines, and scores get grounded in retrieved historical patterns of what tends to follow what. When that lands, every score will link to the specific evidence chain that produced it, and we'll publish a methodology page with backtest results against named events.
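
As a shape, a structured event record might look something like this; the fields are our guess at the obvious ones, not the schema that will actually ship:

```python
from dataclasses import dataclass

@dataclass
class EventRecord:
    event_type: str          # e.g. "layoff", "acquisition", "enforcement_action"
    actors: list[str]        # companies, agencies, named people
    occurred_on: str         # ISO date of the event, not the publish date
    amount_usd: float | None # dollar figure, if one is reported
    source_urls: list[str]   # the underlying reporting, auditable link by link
    storyline_id: str | None = None  # set once similar events are clustered
```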

That work is in progress. We'll ship it when it's actually better, not when it's announced.

Sources, currently

Editorial track

  • General tech news — curated RSS feeds via Google News.
  • SEC EDGAR — Form 8-K current reports, the SEC's "material event" filing.
  • Hacker News — front-page submissions via the Algolia API.

Markets track

  • Polymarket — every active binary market, refreshed twice an hour.
  • Kalshi — same cadence; reads only when API credentials are configured.

Each source is independently configurable. Adding new sources is a matter of writing a small ingester and pointing it at the right endpoint.
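
For a sense of scale, here's what "a small ingester" means in practice, using the Hacker News source as the example. The Algolia endpoint is public; the normalized record shape we return is illustrative:

```python
import requests

HN_FRONT_PAGE = "https://hn.algolia.com/api/v1/search?tags=front_page"

def ingest_hacker_news():
    """Fetch current HN front-page submissions and normalize them into a
    minimal article record: title, url, source."""
    resp = requests.get(HN_FRONT_PAGE, timeout=10)
    resp.raise_for_status()
    return [{"title": hit["title"],
             "url": hit.get("url")
                    or f"https://news.ycombinator.com/item?id={hit['objectID']}",
             "source": "hackernews"}
            for hit in resp.json()["hits"]]
```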

Sources, planned

The four inactive categories each wait for a specific source class:

  • Active Exploitation Risk — CVE/NVD feeds, security mailing lists.
  • Cloud & Infrastructure Stability — vendor status pages, hyperscaler incident feeds.
  • Software Supply Chain Risk — package registries, signing-key compromises, security advisories.
  • Open Source Governance Stress — GitHub Archive, foundation announcements.

These activate when their sources do. Not on a calendar.

Cadence and freshness

Editorial scores update every six hours. The cadence is deliberate: minute-by-minute updates would be noise. Six-hourly score deltas are meaningful.

Markets scores update twice an hour, on a separate schedule, because that's roughly how often the underlying prediction-market prices move materially. A markets score that doesn't move for an hour is itself information.

The CDN caches public pages for five minutes. If you reload and don't see a new score, give it a moment.
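
Under the hood that's a single response header. Assuming a standard shared-cache setup (the actual CDN configuration isn't published), it looks something like this:

```python
# A five-minute shared-cache TTL on public pages, expressed as a header.
# s-maxage governs the CDN; 300 seconds = the five minutes noted above.
PUBLIC_PAGE_HEADERS = {"Cache-Control": "public, s-maxage=300"}
```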

Honest limitations

We've been writing this in plain language because we'd rather you understand what the system does than be impressed by what it claims. A few things worth saying outright:

  • The model can be wrong. It compresses summaries into rationales and occasionally invents specifics. Treat the rationale as a starting point for your own reading, not as a primary source.
  • The categories are a v1 cut. They reflect what we think matters in tech-business risk today. They'll evolve.
  • Six hours is not real-time. If you need real-time, you need a different product.
  • The dashboard is free, and will stay free. Pro tier exists for the long-form analyst reports that interpret what the dashboard is showing.
  • Career transitions is not shipped yet. The backend exists; the user-facing tool is in development. /transitions is a preview, not a product.
