The platform layer for AI that comprehends, remembers, and scales.
Two platforms, one company. Hypercore is the comprehension engine for context your frontier model won't touch: domain-grounded retrieval, agentic research, mechanical confidence, structural provenance. Modulum is the universal drop-in inference platform: effectively infinite context, persistent expertise, no weights modified, 3.04× decode speedup. Composable. Independent. Powerful together.
01 · The two platforms
AI breaks in two places. Hypernym builds the platform layer for both. The platforms ship as separate products, ship together as a stack, or compose into custom builds for partners.
Hypercore
"Models can't reliably reason over your domain."
Domain-grounded retrieval, agentic research, structured memory, runtime compression, provenance. Point it at any corpus, define a domain config (YAML), and your team or your agents get structured retrieval, entity resolution, agentic research, and provenance-preserving search. Infrastructure. Not a chatbot. Not a frontend.
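To make the config idea concrete, here is a minimal sketch of what a Hypercore domain config might contain, parsed with PyYAML. Every field name below is an illustrative assumption, not Hypercore's published schema.

```python
import yaml  # pip install pyyaml

# Hypothetical domain config -- field names are illustrative,
# not Hypercore's actual schema.
DOMAIN_CONFIG = """
domain: clinical_trials
corpus:
  sources:
    - path: s3://corpus/pubmed/
      parser: pmid
    - path: s3://corpus/clinicaltrials/
      parser: ctgov
entities:
  - name: trial
    resolve_on: [nct_id, title]
  - name: compound
    resolve_on: [inchi_key, name]
retrieval:
  provenance: required   # every result carries its source chain
  confidence_floor: 0.4  # drop claims scoring below this
"""

config = yaml.safe_load(DOMAIN_CONFIG)
print(config["domain"], "->", [e["name"] for e in config["entities"]])
```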
Modulum
"Models can't hold context, remember, or run cheap."
The universal drop-in inference platform. Drops into any transformer. No weights modified. No training. No data center. Effectively infinite context. Persistent expertise that survives restarts. Provisional patent filed. 7 modular inference-time components. 17 claims.
02 · Hypercore · what it is, how it ships
"The comprehension engine for context your frontier model won't touch." LLM self-assessed confidence misses by ~59%. Models refuse domain questions experts need to ask, or answer with uncalibrated confidence. Hypercore is the layer underneath.
Four pillars · what generic retrieval can't do
The agent writes its own SQL
No rigid query classifier. Generalizes across verticals without rebuild.
Mechanical confidence
source_type × grounding × corroboration. Every claim scored 0.0–1.0, math visible. A scoring sketch follows this list.
Structural provenance
Every finding links to source database, query, and agent turn. Citations validated against actual results.
Grounded start
Copy-on-write prompts + deterministic pre-agent workflows. The agent opens its turn with facts, not a blank slate.
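The confidence pillar above is stated as a product of three factors. Here is a minimal sketch of how such a score could be computed mechanically; the source-type weights and the corroboration curve are assumptions for illustration, not Hypercore's actual math.

```python
# Hypothetical mechanical-confidence scoring: source_type x grounding x corroboration.
# Weights and the corroboration curve are illustrative assumptions.

SOURCE_TYPE_WEIGHT = {
    "primary_database": 1.0,   # e.g. a row returned by the agent's own SQL
    "curated_corpus": 0.9,
    "model_inference": 0.5,    # claim produced by the model, not retrieved
}

def confidence(source_type: str, grounding: float, corroborations: int) -> float:
    """Score a claim in [0.0, 1.0] from inspectable inputs.

    grounding: fraction of the claim traceable to retrieved text (0-1).
    corroborations: number of independent sources agreeing with the claim.
    """
    w = SOURCE_TYPE_WEIGHT.get(source_type, 0.3)
    # Diminishing returns: 1 source -> 0.5, 2 -> 0.67, 3 -> 0.75, ...
    corroboration = corroborations / (corroborations + 1)
    score = w * grounding * corroboration
    return max(0.0, min(1.0, score))

# A database-grounded claim corroborated by two independent sources:
print(confidence("primary_database", grounding=0.92, corroborations=2))  # ~0.61
```

Because every input is inspectable, the score can be shown next to the claim rather than asserted by the model.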
Four products · bundled or standalone
Same question. Different system. Different outcome.
03 · Modulum · what it is, why it works
"The universal drop-in inference platform." Drops into any transformer. No weights modified. No training. No data center. Works on what you already run. The provisional patent covers 7 modular inference-time components and 17 claims.
What Modulum delivers, measured.
Why it works · the 75% finding
Most of attention is noise
Three-quarters of KV cache entries contribute nothing to attention output. Confirmed across Llama 3.1 8B (24 of 32 heads) and MiniMax M2.5 228B (36 of 48 heads). Both exactly 75.0%. A measurement sketch follows this list.
4 companies. 1 algebra.
Models from Meta, OpenAI-adjacent, Alibaba, MiniMax independently converge on the same structural pattern. Universal across architectures.
Scale inversion: 8B beats 228B
Llama 3.1 8B annealed PPL 3.86 vs MiniMax M2.5 228B cold PPL 5.71 on the same domain text — neither in any training set. 32.4% gap.
No domain hallucination
Vocabulary output restriction eliminates out-of-domain hallucinations entirely. Catastrophic forgetting solved at inference time.
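To make the 75% finding concrete, here is a minimal NumPy sketch of the measurement idea: score each cached KV entry of one attention head by its attention mass, prune the bottom 75%, and check how far the output moves. The synthetic data, threshold, and metric are assumptions for illustration; Modulum's actual components are not public.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 64, 512  # head dimension, cached context length

# Synthetic single-head cache. A small "signal" subset of keys is
# aligned with the query, mimicking the concentrated attention the
# finding describes; the rest of the cache is noise.
q = rng.standard_normal(d)
K = rng.standard_normal((n_ctx, d))
V = rng.standard_normal((n_ctx, d))
signal = rng.choice(n_ctx, size=n_ctx // 8, replace=False)
K[signal] += 2.0 * q / np.linalg.norm(q)

def attend(q, K, V):
    logits = K @ q / np.sqrt(d)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V, w

full_out, w = attend(q, K, V)

# Score each cached entry by its attention weight; keep only the top 25%.
keep = np.argsort(w)[-n_ctx // 4:]
pruned_out, _ = attend(q, K[keep], V[keep])

print(f"top 25% of entries carry {w[keep].sum():.0%} of attention mass")
err = np.linalg.norm(full_out - pruned_out) / np.linalg.norm(full_out)
print(f"relative output error after pruning 75% of the cache: {err:.3f}")
```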
Three deployment paths
04 · When Hypercore + Modulum compose
The platforms compose into the memory layer the industry has been missing. Independent products; the components combine for custom builds: vertical agents, world models, persistent-memory APIs.
End-to-end persistent memory. Hypercore brings structured memory with provenance, source-chain citations, confidence math per claim, and audit-ready retrieval traces. Modulum brings inference-time memory persistence, effectively infinite context in fixed memory, and domain expertise that persists across session boundaries with no retraining.
05 · Five zero-to-one product opportunities
From the Pivot-mode panel (Grok + Gemini; scope: pure Hypernym product surface). Each opportunity is a NEW product Hypernym can ship by composing existing components: not a feature, not an integration with anything outside the company. Where the panels' names converged or diverged, both are noted.
Persistent Domain Schema · Locus · Fact Inference Stack
A standardized, portable specification for the output of the Hypercore Engine — a "compiled domain" graph of entities, verified facts, confidence scores, and source provenance. Modulum's inference engine is re-architected to natively load and query this schema at near-zero latency, treating it as a first-class component of the attention mechanism. Same primitive, two names: Gemini calls it Persistent Domain Schema (PDS); Grok calls it the Persistent Fact Inference Stack.
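A minimal sketch of what a compiled-domain graph could look like as a data structure. Class and field names here are illustrative assumptions, not the actual PDS or Fact Inference Stack spec.

```python
from dataclasses import dataclass, field

# Hypothetical shape of a "compiled domain" -- names are illustrative,
# not the actual Persistent Domain Schema spec.

@dataclass
class Source:
    database: str      # where the fact came from
    query: str         # the exact query that produced it
    agent_turn: int    # which agent turn issued it

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    confidence: float                  # mechanical score in [0.0, 1.0]
    provenance: list[Source] = field(default_factory=list)

@dataclass
class CompiledDomain:
    name: str
    entities: dict[str, dict]          # resolved entity id -> attributes
    facts: list[Fact] = field(default_factory=list)

    def lookup(self, subject: str, min_confidence: float = 0.0) -> list[Fact]:
        """What an inference engine would query at decode time."""
        return [f for f in self.facts
                if f.subject == subject and f.confidence >= min_confidence]
```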
Modulum Locus · the persistent expertise cache
An API-addressable, persistent expertise cache. Point it at a corpus and it creates a "locus" of facts; subsequent inference calls to any model are routed through the locus, which injects domain context and prunes irrelevant attention heads in real time. Turns any stateless foundation model into a stateful domain expert without fine-tuning. Solves catastrophic forgetting and domain drift at the infrastructure level, as a service.
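A toy in-process sketch of the locus idea: a persistent fact cache that wraps a stateless model call and injects domain context. The real product is described as an API service; every name and the injection logic here are illustrative.

```python
# Minimal in-process sketch of the locus idea. All names are illustrative.

class Locus:
    def __init__(self, corpus_facts: dict[str, str]):
        self.facts = corpus_facts          # persists across calls and restarts

    def infer(self, model_fn, prompt: str) -> str:
        # Inject the domain context the stateless model lacks.
        context = "\n".join(f"{k}: {v}" for k, v in self.facts.items()
                            if k.lower() in prompt.lower())
        return model_fn(f"{context}\n\n{prompt}")

def toy_model(prompt: str) -> str:        # stand-in for any transformer
    return prompt.splitlines()[0] if prompt.strip() else ""

locus = Locus({"P-4417": "exclusions: flood, wear-and-tear"})
print(locus.infer(toy_model, "Summarize exclusions in policy P-4417."))
```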
Hypernym Eigen · agent acceleration proxy
A containerized, intercepting proxy for agentic development. It sits between developer agent code (Devin, Codex, Claude Code, custom scripts) and the model API, applying context compression and inference acceleration to dramatically lower token costs and latency. Decouples agent logic from inference economics — developers run more complex multi-turn agents on smaller, cheaper models that perform like larger ones.
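A toy sketch of the intercepting-proxy idea: dedupe and clip the agent's conversation history before forwarding it to the model API. The compression shown is a placeholder; Eigen's actual techniques are not public.

```python
# Toy intercepting proxy: sits between agent code and a model API,
# compressing context before forwarding. The compression here (drop
# exact-duplicate turns, clip old turns) is a placeholder.

def compress(messages: list[dict], keep_last: int = 6) -> list[dict]:
    seen, deduped = set(), []
    for m in messages:
        key = (m["role"], m["content"])
        if key not in seen:               # agents often re-send identical turns
            seen.add(key)
            deduped.append(m)
    head = [m for m in deduped if m["role"] == "system"]
    tail = [m for m in deduped if m["role"] != "system"][-keep_last:]
    return head + tail

def proxy_call(model_fn, messages: list[dict]) -> str:
    slim = compress(messages)
    return model_fn(slim)                 # forward to the real API

# Usage: wrap whatever client the agent already uses.
fake_api = lambda msgs: f"({len(msgs)} msgs forwarded)"
history = [{"role": "system", "content": "be terse"}] + \
          [{"role": "user", "content": f"step {i % 3}"} for i in range(12)]
print(proxy_call(fake_api, history))      # far fewer than 13 messages
```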
Hypernym Chain / ProvenanceShield · audit-grade middleware
Compliance and audit middleware that generates an immutable, verifiable "receipt" for every AI-generated conclusion. Logs the exact sources consulted, the agent's reasoning path, the mechanical confidence score per claim — chain of custody from query to answer. Modulum's vocabulary restriction provides inference-time hallucination elimination; Hypercore's confidence math provides the provenance audit trail. Sells certainty and defensibility, not just model capability.
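A minimal sketch of what such a receipt could contain, hash-chained so that altering history is detectable. The field set is an assumption, not ProvenanceShield's actual format.

```python
import hashlib, json, time

# Hypothetical audit receipt: field names are illustrative. Each
# receipt commits to the previous one's hash, so altering any past
# conclusion breaks the chain.

def make_receipt(prev_hash: str, query: str, sources: list[str],
                 reasoning_path: list[str], confidence: float) -> dict:
    body = {
        "prev": prev_hash,
        "ts": time.time(),
        "query": query,
        "sources": sources,            # exact sources consulted
        "reasoning_path": reasoning_path,
        "confidence": confidence,      # mechanical score per claim
    }
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {"body": body, "hash": digest}

r1 = make_receipt("GENESIS", "exclusions in P-4417?",
                  ["policies_db:rows 12-14"], ["sql", "synthesize"], 0.81)
r2 = make_receipt(r1["hash"], "follow-up: flood coverage?",
                  ["policies_db:row 13"], ["sql"], 0.77)
print(r2["body"]["prev"] == r1["hash"])  # True: chain of custody intact
```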
Modulum Anneal · scale-inversion SDK for model providers
A software toolkit for model providers to create domain-specific variants without retraining. Provider integrates the SDK into their inference stack, allowing them to "anneal" a small base model (e.g., 8B) with a customer's domain corpus via Modulum's cache, enabling it to outperform a much larger general model. Inverts the scaling laws for domain-specific tasks — new business model for foundation model providers: cheaper, specialized, high-margin inference instead of competing on parameter count.
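A toy sketch of the anneal flow: compile a corpus into a reusable domain cache, then attach it at inference time so the same small model answers grounded in the domain. The cache contents and function names are illustrative; the real SDK surface is not public.

```python
# Toy anneal flow -- all names and the cache design are illustrative.

def anneal(corpus: list[str]) -> dict[str, int]:
    """'Compile' a corpus into a reusable domain cache (here: term counts)."""
    cache: dict[str, int] = {}
    for doc in corpus:
        for tok in doc.lower().split():
            cache[tok] = cache.get(tok, 0) + 1
    return cache

def generate(prompt: str, cache: dict[str, int] | None = None) -> str:
    """Stand-in for a small base model; the cache biases it toward the domain."""
    if cache:
        hits = [t for t in prompt.lower().split() if t in cache]
        return f"answer grounded in {len(hits)} domain terms"
    return "generic answer"

cache = anneal(["perovskite lattice stability", "perovskite band gap tuning"])
print(generate("explain perovskite stability", cache))   # annealed small model
print(generate("explain perovskite stability"))          # same model, cold
```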
Modulum Substrate · "Attention as a Database Query"
A custom inference card that hardwires Modulum's 7 components: dedicated attention analyzers for 75% noise detection, real-time sub-1 ms KV pruning, and silicon wired directly to on-chip memory holding the Persistent Domain Schema. When the transformer needs to attend, instead of computing full dot-product attention over a dense KV cache, it issues a hardware-level query to the PDS block, returning only the ~25% signal. Fuses inference and RAG into a single clock-cycle-level operation.
Hypernym Sentient / EternalAgent · the outcome subscription
A fully managed, long-running, vertical-specific AI agent that lives in the customer's VPC. Sold as a solution, not a tool. Customers subscribe to "Sentient Clinical Researcher" or "Sentient Underwriting Analyst" — not an API. Hypernym manages the entire stack: Hypercore continuously ingests new domain data, updating a live PDS; the Modulum-powered agent uses persistent memory to perform its function over weeks and months, becoming progressively more expert without retraining.
Modulum Rosetta · the model MRI
An analysis service that ingests a customer's proprietary foundation model and a target corpus, then produces a "Structural Efficiency Map" detailing exactly which attention heads and layers are redundant or contribute only noise for that specific domain. Provides a surgical pruning/distillation strategy. Sells insight, not implementation. Productizes Hypernym's core "75% of attention is noise" discovery as a unique diagnostic capability — a "model MRI."
EchoCore · streaming corpus → drift-free memory
Streams live corpus updates (real-time sensor data, market feeds, news) into Hypercore workflows for incremental entity resolution; anneals Modulum inference to clean and persist evolving expertise without full re-inference; outputs delta-updated fact graphs with confidence deltas tracked per stream event. Turns static transformers into adaptive systems for time-series domains where updates previously required full recompute and lost historical context.
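A toy sketch of the delta-update idea: each stream event adjusts one fact's confidence and records the delta, so current state and history are both kept. The structure is illustrative, not EchoCore's actual format.

```python
from collections import defaultdict

# Toy delta-updated fact graph -- each stream event adjusts one fact
# and records the confidence delta, so history is never lost.

graph: dict[str, float] = {}                 # fact id -> current confidence
deltas: dict[str, list[float]] = defaultdict(list)

def apply_event(fact_id: str, new_confidence: float) -> None:
    old = graph.get(fact_id, 0.0)
    graph[fact_id] = new_confidence
    deltas[fact_id].append(new_confidence - old)   # tracked per stream event

apply_event("acme:credit_risk=elevated", 0.55)     # first market-feed signal
apply_event("acme:credit_risk=elevated", 0.80)     # corroborating news event
print(graph["acme:credit_risk=elevated"])          # 0.8 (current state)
print(deltas["acme:credit_risk=elevated"])         # [0.55, 0.25] (history)
```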
06 · Vertical wedge plays
One platform, every subsidiary. Each domain plugs into the same Hypernym foundation. Different YAML, different corpus, different model — same platform, same governance. The verticals where the panel converged.
Osmium Pro / Sentient Clinical Researcher
Hypercore Engine on the Osmium deployment (PMID, ClinicalTrials, ClinVar parsers) + Omnifact for trial fact extraction + Modulum annealing for hypothesis generation.
CaseVault · TrustFoundry expansion
Hypercore Engine for docket ingestion + ProvenanceShield for scored precedent matching + HyperRemember for case history persistence.
Hypernym Signal · RiskAnnealer
Hypercore Engine + Modulum Locus on financial news, SEC filings, alternative data. Source-chain reasoning over policy, claim, and submission data.
Hypernym Gridmind · GridRecall
Hypercore for SCADA + maintenance logs + weather. Modulum at the edge for predictive maintenance agents. Magic in field-engineer toolchains.
Hypernym Axiom
Hypercore on materials science papers + patents + lab test data. Modulum Anneal creates small fast models reasoning about chemical compositions and physical properties.
ModelMemory · Modulum Rosetta engagement
Modulum components + Hypercore Engine for corpus grounding + Rosetta diagnostic. Per-model deployment. 7 modular components, weight-frozen integration.
07 · How customers engage
Three depths of integration. Pick a depth; Hypernym meets you there. Most partners start at the API and graduate up. Three steps, low commitment to start. Typical sample turnaround: 1 to 2 weeks.