Hypernym · Product Map · 2026 · 05 · 07 · Hypercore + Modulum

The platform layer for AI that comprehends, remembers, and scales.

Two platforms, one company. Hypercore is the comprehension engine for context your frontier model won't touch — domain-grounded retrieval, agentic research, mechanical confidence, structural provenance. Modulum is the universal drop-in inference platform — effectively infinite context, persistent expertise, no weights modified, 3.04× decode speedup. Composable. Independent. Powerful together.

01 · The two platforms

AI breaks in two places. Hypernym builds the platform layer for both. The platforms ship as separate products, ship together as a stack, or compose into custom builds for partners.

Platform 01 · Comprehension Layer

Hypercore

"Models can't reliably reason over your domain."

Domain-grounded retrieval, agentic research, structured memory, runtime compression, provenance. Point it at any corpus, define a domain config (YAML), and your team — or your agents — get structured retrieval, entity resolution, agentic research, and provenance-preserving search. Infrastructure. Not a chatbot. Not a frontend.
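The domain config mentioned above might be validated along the following lines. This is a minimal sketch: the field names (`domain`, `corpus`, `entities`, `confidence`) are illustrative assumptions, not Hypercore's actual schema.

```python
# Hypothetical sketch of a Hypercore-style domain config and a minimal
# validator. Every key name below is an assumption for illustration.

REQUIRED_KEYS = {"domain", "corpus", "entities", "confidence"}

def validate_domain_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - cfg.keys())]
    if not cfg.get("entities"):
        problems.append("entities: at least one entity type is required")
    return problems

# A toy biomedical config in the assumed shape.
biomed_cfg = {
    "domain": "biomedical",
    "corpus": {"sources": ["pubmed", "clinicaltrials"]},
    "entities": ["gene", "variant", "trial"],
    "confidence": {"min_report": 0.5},
}
```

In practice this shape would live in the YAML file the text describes; the point is only that one declarative config per domain drives retrieval, entity resolution, and provenance.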

4 products bundled or standalone
6 architecture layers
21 biomedical databases on Osmium
VPC by default · per-domain isolation
Platform 02 · Inference + Memory Layer

Modulum

"Models can't hold context, remember, or run cheap."

The universal drop-in inference platform. Drops into any transformer. No weights modified. No training. No data center. Effectively infinite context. Persistent expertise that survives restarts. Provisional patent filed. 7 modular inference-time components. 17 claims.

3.04× decode speedup
−47% domain perplexity
75% of attention is noise
8B beats 228B by 32.4% on domain

02 · Hypercore — what it is, how it ships

"The comprehension engine for context your frontier model won't touch." LLM self-assessed confidence misses by ~59%. Models refuse domain questions experts need to ask, or answer with uncalibrated confidence. Hypercore is the layer underneath.

Four pillars · what generic retrieval can't do

01

The agent writes its own SQL

No rigid query classifier. Generalizes across verticals without rebuild.

02

Mechanical confidence

source_type × grounding × corroboration. Every claim 0.0–1.0, math visible.

03

Structural provenance

Every finding links to source database, query, and agent turn. Citations validated against actual results.

04

Grounded start

Copy-on-write prompts + deterministic pre-agent workflows. Agent opens turn with facts, not a blank slate.
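The mechanical-confidence pillar above can be sketched as a pure function of its three named factors. A minimal sketch, assuming illustrative source-type weights (the actual weight table and clamping rule are not in the source):

```python
# Hedged sketch of mechanical confidence: source_type × grounding × corroboration.
# The source-type weights below are assumptions for illustration only.
SOURCE_TYPE_WEIGHT = {
    "primary_database": 1.0,  # e.g. a curated record queried directly
    "derived_table": 0.9,
    "free_text": 0.7,
}

def claim_confidence(source_type: str, grounding: float, corroboration: float) -> float:
    """Multiply the three factors and clamp to [0.0, 1.0] so the math stays visible."""
    score = SOURCE_TYPE_WEIGHT[source_type] * grounding * corroboration
    return max(0.0, min(1.0, score))
```

The design point is that the score is a visible product of auditable inputs, not a model's self-report.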

Four products · bundled or standalone

01 · Full deployment
Hypercore Engine
Complete platform configured to corpus. Structured retrieval, agentic research, confidence math, provenance.
VPC · 4–6 weeks to prod · per-pilot then per-seat or per-query
02 · Runtime compression
Magic
API + plugins for Claude Code · Codex · Devin. 30–60% raw speedup on agentic coding. SWE-bench Verified. 87% context compression.
Plugin + API · same-day install
03 · Context compression
Omnifact API
60 stochastic trials, frequency-ranked semantic facts. Compresses long context into ranked, citable facts. Standalone-sellable.
Drop-in REST · authenticated · pay-per-call · instant
04 · Semantic memory
HyperRemember API
Embeddings + fact-based reranking. Long-running memory that doesn't drift. Pairs with any agent build.
API call · instant · pairs with Modulum for full persistent memory
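Omnifact's "60 stochastic trials, frequency-ranked semantic facts" can be approximated in a few lines: run N extraction trials, count how often each normalized fact appears, and rank by survival rate. A toy sketch (the real trial interface and normalization are assumptions):

```python
from collections import Counter

def rank_facts(trials: list[list[str]]) -> list[tuple[str, float]]:
    """Rank facts by the fraction of stochastic trials in which they appear.

    `trials` holds one list of normalized fact strings per trial (e.g. 60 runs).
    A fact that survives most trials is likely stable; rare facts are noise.
    """
    n = len(trials)
    counts = Counter(fact for trial in trials for fact in set(trial))
    return sorted(
        ((fact, c / n) for fact, c in counts.items()),
        key=lambda pair: -pair[1],
    )

# Toy data: three trials over the same passage.
trials = [
    ["BRCA1 is on chromosome 17", "the trial enrolled 120 patients"],
    ["BRCA1 is on chromosome 17"],
    ["BRCA1 is on chromosome 17", "spurious fact"],
]
ranked = rank_facts(trials)
```

Facts with a frequency near 1.0 become the ranked, citable output; low-frequency stragglers are discarded as extraction noise.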
Proof · Osmium / biomedical

Same question. Different system. Different outcome.

34/35 claims grounded in source
0.85 avg confidence · 0.51 min · 0.98 max
6 PMID citations validated
23 databases · 312K entities · 842K xrefs

03 · Modulum — what it is, why it works

"The universal drop-in inference platform." Drops into any transformer. No weights modified. No training. No data center. Works on what you already run. The provisional patent covers 7 modular inference-time components and 17 claims.

Numbers, not vibes · 38 measurements · 3 corpora · 7 context lengths

What Modulum delivers, measured.

3.04× decode speedup · scales with context
−47% domain perplexity · lower is better
−14.18% below F16 · cleaner than full precision
17 claims · 7 components · provisional filed
"38 improvements · 0 regressions · 0 speed cost. The optimization isn't model-specific — it's how transformers work."
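For reference, the perplexity metric behind these numbers: exp of the mean per-token negative log-likelihood, so lower means the model assigns more probability mass to the domain text. A minimal sketch:

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token); lower is better."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

On this metric a −47% move is large: halving perplexity roughly doubles the model's geometric-mean per-token probability on the domain corpus.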

Why it works · the 75% finding

a

Most of attention is noise

Three-quarters of KV cache entries contribute nothing to attention output. Confirmed across Llama 3.1 8B (24 of 32 heads) and MiniMax M2.5 228B (36 of 48 heads). Both exactly 75.0%.

b

4 companies. 1 algebra.

Models from Meta, OpenAI-adjacent, Alibaba, MiniMax independently converge on the same structural pattern. Universal across architectures.

c

Scale inversion: 8B beats 228B

Llama 3.1 8B annealed PPL 3.86 vs MiniMax M2.5 228B cold PPL 5.71 on the same domain text — neither in any training set. 32.4% gap.

d

No domain hallucination

Vocabulary output restriction eliminates out-of-domain hallucinations entirely. Catastrophic forgetting solved at inference time.
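The 75% finding above suggests a simple inference-time move: score each cached KV entry by the attention mass it actually receives and keep only the top quartile. A toy NumPy sketch of that idea — not Modulum's actual pruning rule:

```python
import numpy as np

def prune_kv(keys, values, attn_weights, keep_frac=0.25):
    """Keep only the KV cache entries that receive the most attention mass.

    attn_weights: (num_queries, num_entries) attention over the cache.
    Entries whose summed mass falls in the bottom (1 - keep_frac) contribute
    almost nothing to the attention output and are dropped.
    """
    mass = attn_weights.sum(axis=0)        # total attention mass per cache entry
    k = max(1, int(len(mass) * keep_frac))
    keep = np.argsort(mass)[-k:]           # indices of the top-k entries
    keep.sort()                            # preserve positional order
    return keys[keep], values[keep], keep
```

If three-quarters of entries are noise, this keeps attention output nearly unchanged while shrinking the cache to a quarter of its size — which is the shape of the claimed decode speedup.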

Three deployment paths

Available now
Inference products
Modulum-powered inference services. Bundled with Hypercore or standalone. Drops into existing transformer stacks.
Drop-in inference · 1–2 weeks · enterprise · devs
In development
Proprietary chip
Hardware co-design. Modulum's architecture in silicon. Next-generation efficiency for AI infrastructure.
Chip co-design · strategic · multi-year roadmap
Roadmap
Model partnerships
Available to model providers through strategic partnerships. Faster, cheaper, more capable models for the industry.
Partnership · hyperscalers · model providers

04 · When Hypercore + Modulum compose

The platforms compose into the memory layer the industry has been missing. Independent products. Components compose for custom builds — vertical agents, world models, persistent-memory APIs.

End-to-end persistent memory. Hypercore brings structured memory with provenance, source-chain citations, confidence math per claim, audit-ready retrieval traces. Modulum brings inference-time memory persistence, effectively infinite context in fixed memory, domain expertise across session boundaries — no retraining.

Custom build · world models
Domain reasoning systems
Grounded recall (Hypercore) + inference-time efficiency (Modulum). Built for narrow worlds. Real-time reasoning over your corpus.
Custom build · vertical agents
Long-running domain agents
Hypercore for structured grounding. Modulum for memory persistence. Run for hours, days, weeks without drift.
Custom build · persistent memory APIs
Developer drop-in
Memory store for any app — B2C, B2B, embedded. Cleaner than RAG. Cheaper than fine-tuning. Auditable.
Custom build · proprietary models
Bring your own model
Bring your model + corpus. Hypernym brings the platform layer that makes it deployable.

05 · Five zero-to-one product opportunities

From the Pivot-mode panel (Grok + Gemini, scope: pure Hypernym product surface). Each opportunity is a NEW product Hypernym can ship by composing existing components — not a feature, not an integration with anything outside the company. Where the two panels' names converged or diverged is noted.

01 · The single highest-ROI compose

Persistent Domain Schema · Locus · Fact Inference Stack

Built from Omnifact · HyperRemember · Hypercore Engine · Modulum components
both Grok + Gemini converged

A standardized, portable specification for the output of the Hypercore Engine — a "compiled domain" graph of entities, verified facts, confidence scores, and source provenance. Modulum's inference engine is re-architected to natively load and query this schema at near-zero latency, treating it as a first-class component of the attention mechanism. Same primitive, two names: Gemini calls it Persistent Domain Schema (PDS); Grok calls it the Persistent Fact Inference Stack.

Why it's the load-bearing move
Locks Hypercore + Modulum into one substrate. Defines the platform as the memory OS for transformers.
Buyer
Enterprise developers · AI labs · hyperscalers building long-running agents. Wedge: paid pilot creating a Locus from customer's private data.
Follow-on products it unlocks
Domain-as-a-Service (pre-built PDSs: FDA Trials, US Case Law) · Locus-in-a-Box (self-hosted PDS server) · Anneal Marketplace (models pre-loaded with certified PDSs) · Living Documents (self-updating knowledge bases — the foundation for World Models)
Falsification (4 weeks)
Customer model + Locus achieves >50% reduction in domain hallucinations and >2× query-response improvement on a mutually agreed benchmark.
02

Modulum Locus · the persistent expertise cache

Built from Modulum 7 components · HyperRemember · Omnifact
both Grok + Gemini, named differently

An API-addressable, persistent expertise cache. Point it at a corpus, it creates a "locus" of facts; subsequent inference calls to any model are routed through the locus, which injects domain context and prunes irrelevant attention heads in real-time. Turns any stateless foundation model into a stateful domain expert without fine-tuning. Solves catastrophic forgetting and domain drift at the infrastructure level — as a service.

What it does (concrete)
Loads facts once · recycles 75% noisy KV cache entries · reranks embeddings on session restart · vocabulary output restriction prevents out-of-domain hallucination.
Buyer · wedge
Enterprise dev / AI lab. Wedge: paid pilot benchmarking customer's existing model with/without the Locus on a domain-specific Q&A task.
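The "vocabulary output restriction" mechanism credited above with preventing out-of-domain hallucination amounts to masking logits outside an allowed token set before sampling, so off-domain tokens can never be emitted. A minimal sketch, assuming a plain logits list and greedy decoding:

```python
import math

def restrict_vocab(logits: list[float], allowed: set[int]) -> list[float]:
    """Set logits for tokens outside the domain vocabulary to -inf so they
    can never be sampled, regardless of what the base model prefers."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def greedy(logits: list[float]) -> int:
    """Pick the highest-logit token (toy stand-in for a real sampler)."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

Because the mask is applied at inference time, the base model's weights stay frozen — the domain constraint lives entirely in the decoding loop.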
03

Hypernym Eigen · agent acceleration proxy

Built from Magic · Modulum decode speedup
Gemini · variant from Grok ("AgentAnnealer")

A containerized, intercepting proxy for agentic development. It sits between developer agent code (Devin, Codex, Claude Code, custom scripts) and the model API, applying context compression and inference acceleration to dramatically lower token costs and latency. Decouples agent logic from inference economics — developers run more complex multi-turn agents on smaller, cheaper models that perform like larger ones.

Wedge surface
Existing Magic plugin → free developer-tier Eigen container (Docker) with usage-based pricing on the acceleration.
Falsification (1 week)
Run customer agent through Eigen — demonstrate >60% reduction in API cost and >40% end-to-end task completion speedup.
04

Hypernym Chain / ProvenanceShield · audit-grade middleware

Built from Hypercore (provenance + confidence) · HyperRemember · Modulum vocab restriction
both Grok + Gemini, distinct framings

Compliance and audit middleware that generates an immutable, verifiable "receipt" for every AI-generated conclusion. Logs the exact sources consulted, the agent's reasoning path, the mechanical confidence score per claim — chain of custody from query to answer. Modulum's vocabulary restriction provides inference-time hallucination elimination; Hypercore's confidence math provides the provenance audit trail. Sells certainty and defensibility, not just model capability.

Buyer
Chief Risk / Compliance Officers in regulated enterprises (finance, insurance, legal, healthcare). TrustFoundry pilot is the wedge surface for legal opinion analysis.
Falsification (4 weeks)
Generated chain report submitted as evidence in a mock internal audit — accepted by customer's compliance team as meeting evidentiary support standards.
05

Modulum Anneal · scale-inversion SDK for model providers

Built from Modulum 7 components packaged as SDK
Gemini · adjacent to Grok's "AgentAnnealer"

A software toolkit for model providers to create domain-specific variants without retraining. Provider integrates the SDK into their inference stack, allowing them to "anneal" a small base model (e.g., 8B) with a customer's domain corpus via Modulum's cache, enabling it to outperform a much larger general model. Inverts the scaling laws for domain-specific tasks — new business model for foundation model providers: cheaper, specialized, high-margin inference instead of competing on parameter count.

Buyer · wedge
Mistral, Cohere, large enterprises with internal model teams. Wedge: direct engagement with model provider's research team to replicate the "8B beats 228B" scale inversion on their models with a neutral third-party corpus.
Falsification (6 weeks)
Demonstrate partner's 7B/8B model with Anneal beats their >70B model on domain-specific perplexity benchmark by ≥25%.
06 · Modulum chip implication

Modulum Substrate · "Attention as a Database Query"

Built from Modulum 7 components in silicon · on-chip Persistent Domain Schema
both Grok + Gemini · same insight

A custom inference card that hardwires Modulum's 7 components: 75% noise detection via dedicated attention analyzers, real-time sub-1ms KV pruning. The chip would have dedicated silicon for the components, directly wired to on-chip memory holding the Persistent Domain Schema. When the transformer needs to attend, instead of full dot-product attention over a dense KV cache, it issues a hardware-level query to the PDS block — returning only the ~25% signal. Fuses inference and RAG into a single clock-cycle-level operation.

What software-only cannot achieve
Real-time, grounded world models for robotics, autonomous vehicles, AR overlays — millisecond-level latency for complex, fact-based reasoning. The context-switching cost software pays on CPU/GPU shuttling between inference and RAG lookups disappears.
Buyer
Consumer device makers · phone OEMs · robotics platforms · AR. Strategic chip co-design partners (multi-year roadmap, in development today).
07 · Joint flagship · Hypercore + Modulum

Hypernym Sentient / EternalAgent · the outcome subscription

Built from Hypercore Engine · HyperRemember · Modulum components · VPC
both Grok + Gemini · convergent

A fully managed, long-running, vertical-specific AI agent that lives in the customer's VPC. Sold as a solution, not a tool. Customers subscribe to "Sentient Clinical Researcher" or "Sentient Underwriting Analyst" — not an API. Hypernym manages the entire stack: Hypercore continuously ingests new domain data, updating a live PDS; the Modulum-powered agent uses persistent memory to perform its function over weeks and months, becoming progressively more expert without retraining.

Who buys
C-suite / Head of Business Unit. Buying an outcome ("accelerate drug discovery by 30%"), not an engineering project. Significant annual subscription — value of a tireless, domain-expert digital employee.
The moat
The state. After one year, the Sentient agent's PDS — refined by continuous data ingestion and interaction — represents a unique, auditable, irreplaceable corporate asset. Switching cost = abandoning a year's worth of accumulated, structured corporate memory.
08 · NEW outlier · Pivot mode preserved

Modulum Rosetta · the model MRI

Built from Modulum diagnostic IP · "4 companies · 1 algebra"
Gemini outlier

An analysis service that ingests a customer's proprietary foundation model and a target corpus, then produces a "Structural Efficiency Map" detailing exactly which attention heads and layers are redundant or contribute only noise for that specific domain. Provides a surgical pruning/distillation strategy. Sells insight, not implementation. Productizes Hypernym's core "75% of attention is noise" discovery as a unique diagnostic capability — a "model MRI."

Buyer
AI research labs at hyperscalers · sovereign wealth funds · large enterprises training their own foundation models (Bloomberg, Apple, etc.).
Falsification (2 weeks)
Customer uses report's recommendations to create a pruned version of their model that is ≥30% smaller and faster while retaining ≥99% of performance on key benchmarks.
09 · NEW outlier · Pivot mode preserved

EchoCore · streaming corpus → drift-free memory

Built from Hypercore (intake + stream layers) · Modulum (persistent expertise + below-F16 cleaning)
Grok outlier

Streams live corpus updates (real-time sensor data, market feeds, news) into Hypercore workflows for incremental entity resolution; anneals Modulum inference to clean and persist evolving expertise without full re-inference; outputs delta-updated fact graphs with confidence deltas tracked per stream event. Turns static transformers into adaptive systems for time-series domains where updates previously required full recompute and lost historical context.

Buyer · wedge
Energy (grid operators monitoring dynamic loads) · Finance (real-time risk). Stream API pilot on synthetic grid data (1K events/hour), delivering updated graphs in <5min latency.
Falsification (4 weeks)
Simulate 1-week stream on energy corpus — verify <5% drift in recall accuracy vs baseline 25% loss.
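EchoCore's delta-updated fact graph can be sketched as a store that, per stream event, blends new evidence into a fact's confidence and records the delta instead of recomputing from scratch. The structure and blend rule below are illustrative assumptions, not EchoCore's actual update math:

```python
class FactGraph:
    """Toy delta-updated fact store: per stream event, blend new evidence into
    a fact's confidence and log the delta rather than doing a full recompute."""

    def __init__(self, blend: float = 0.3):
        self.blend = blend                       # weight given to new evidence
        self.confidence: dict[str, float] = {}   # fact -> current confidence
        self.deltas: list[tuple[str, float]] = []  # (fact, delta) per event

    def ingest(self, fact: str, evidence_conf: float) -> float:
        """Apply one stream event and return the fact's updated confidence."""
        old = self.confidence.get(fact, 0.0)
        new = old + self.blend * (evidence_conf - old)  # incremental update
        self.confidence[fact] = new
        self.deltas.append((fact, new - old))
        return new
```

Each event touches only the facts it mentions, which is what makes the claimed low-latency delta output possible for time-series domains.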

06 · Vertical wedge plays

One platform, every subsidiary. Each domain plugs into the same Hypernym foundation. Different YAML, different corpus, different model — same platform, same governance. The verticals where the panel converged.

Healthcare · biomedical R&D

Osmium Pro / Sentient Clinical Researcher

Hypercore Engine on the Osmium deployment (PMID, ClinicalTrials, ClinVar parsers) + Omnifact for trial fact extraction + Modulum annealing for hypothesis generation.

Osmium flagship (23 DBs · 312K entities) → Pro deployment with persistent trial memory → 10+ pharma pilots
Legal · opinion + contract analysis

CaseVault · TrustFoundry expansion

Hypercore Engine for docket ingestion + ProvenanceShield for scored precedent matching + HyperRemember for case history persistence.

TrustFoundry pilot (35 files · 18.6K facts) → VPC deployment → 50-lawyer firm vertical with contract review modules
Finance · underwriting + quant

Hypernym Signal · RiskAnnealer

Hypercore Engine + Modulum Locus on financial news, SEC filings, alternative data. Source-chain reasoning over policy, claim, and submission data.

10-year proprietary research → quant fund pilot (3 alpha factors, 5 risk events) → claims adjudication suite
Energy · grid + reservoir

Hypernym Gridmind · GridRecall

Hypercore for SCADA + maintenance logs + weather. Modulum at the edge for predictive maintenance agents. Magic in field-engineer toolchains.

Single substation pilot → entire grid digital twin → load balancing + outage response
Materials · advanced manufacturing

Hypernym Axiom

Hypercore on materials science papers + patents + lab test data. Modulum Anneal creates small fast models reasoning about chemical compositions and physical properties.

Single polymer or alloy class → cross-domain reasoning (battery + aerospace) → novel formulations
AI labs · training + inference

ModelMemory · Modulum Rosetta engagement

Modulum components + Hypercore Engine for corpus grounding + Rosetta diagnostic. Per-model deployment. 7 modular components, weight-frozen integration.

Benchmark pilot → custom co-build for lab inference → joint model releases · 3–5 lab partnerships

07 · How customers engage

Three depths of integration. Pick a depth — Hypernym meets you there. Most partners start at API and graduate up. Three steps, low commitment to start. Typical sample turnaround: 1 to 2 weeks.