New evidence layer for RAG

Near full-RAG quality, far fewer RAG tokens.

Trajbl v0.1 EXP is a new evidence-packing method that sits between retrieval and the LLM. Full-context RAG is the quality ceiling, but production systems pay for every retrieved token. Trajbl turns retrieved context into a short, traceable evidence packet. The current Phase31 benchmark shows 85% fewer tokens, 97% of full-RAG quality retained, and stronger compact answers than LLMLingua-style compression at similar compact budgets.

85% fewer tokens than full-context RAG in the current Phase31 benchmark
97% of full-RAG quality retained in the current Phase31 benchmark
+20% higher judged quality than question-aware LLMLingua at a similar budget
Figure: Compact-answer benchmark (RAG quality, lower token cost). Tokens sent to the model: Full RAG 7,023; LLMLingua 1,216; Trajbl EXP 1,088.
Not just compression: Trajbl v0.1 EXP keeps the proof readable, traceable, and compact before the LLM starts spending tokens.
What is Trajbl v0.1 EXP
Figure: Trajbl RAG pipeline.

RAG works. It just sends too much.

Modern AI assistants often send large retrieved context blocks to the model just to answer one question. Trajbl v0.1 EXP turns that pile into a compact evidence packet, so the model reads less, costs less, and still sees the proof needed for a strong answer.

The problem

Context is the cost.

Every retrieved chunk the model reads becomes recurring token cost, latency, and infrastructure load. Generic compression can reduce the bill, but it can also damage meaning and weaken the final answer.

Token bills scale with context — every extra retrieved passage is paid for on every query.
More context is not free — noise can bury the few sentences that actually answer the question.
The Trajbl v0.1 EXP answer

Keep the evidence. Drop the noise.

Trajbl v0.1 EXP uses a proprietary evidence-selection layer to forward a compact, traceable packet to the LLM. The model gets a short brief instead of a haystack, at a fraction of the token cost.

Verbatim, auditable evidence — every forwarded sentence is traceable to its source. Built for biomed, legal & compliance.
Drop-in & model-agnostic — sits between your existing retrieval and any LLM. No retraining, no GPU.
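
To make the placement concrete, here is a minimal sketch of where an evidence-packing step sits between retrieval and the LLM. Trajbl's selection layer is proprietary, so the scoring below is a stand-in (plain word overlap with the question), and every name in it (`Chunk`, `EvidenceItem`, `pack_evidence`, `build_prompt`) is hypothetical rather than Trajbl's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage plus the id of the document it came from."""
    source_id: str
    text: str


@dataclass(frozen=True)
class EvidenceItem:
    """One verbatim sentence and a pointer back to its source."""
    source_id: str
    sentence: str


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text: str) -> list[str]:
    # Naive splitter: good enough for a sketch, not for production text.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pack_evidence(question: str, chunks: list[Chunk], max_sentences: int = 3) -> list[EvidenceItem]:
    """Stand-in selector: rank sentences by word overlap with the question.

    This is NOT Trajbl's method; it only illustrates the interface.
    Sentences are copied verbatim and keep their source id.
    """
    q = _words(question)
    scored = []
    for order, chunk in enumerate(chunks):
        for pos, sentence in enumerate(split_sentences(chunk.text)):
            overlap = len(q & _words(sentence))
            if overlap > 0:
                # Higher overlap first, then original order, so output is deterministic.
                scored.append((-overlap, order, pos, EvidenceItem(chunk.source_id, sentence)))
    scored.sort(key=lambda t: t[:3])
    return [item for *_, item in scored[:max_sentences]]


def build_prompt(question: str, packet: list[EvidenceItem]) -> str:
    """Format the packet so each sentence carries its source tag."""
    lines = [f"[{item.source_id}] {item.sentence}" for item in packet]
    return "Evidence:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"
```

In this sketch the retrieval step and the LLM call stay untouched: `pack_evidence` takes whatever the retriever returned, and `build_prompt` produces the shorter text that would be sent to the model.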
Benchmark results

Near-ceiling quality. Much lower context cost.

Trajbl is not trying to win an unlimited-context contest against full-context RAG. It makes the same RAG workflow more economical: far fewer context tokens, close to full-RAG answer quality, and stronger compact answers than tested compression baselines.

85% fewer tokens vs full RAG (Phase31): 1,088 tokens for Trajbl v0.1 EXP vs 7,023 for full-context RAG.
97% of full-RAG quality retained (Phase31): Trajbl v0.1 EXP stays close to the full-context quality ceiling without paying full-context cost.
+20% stronger compact answers: higher judged quality than question-aware LLMLingua at a similar token budget.

Cost vs quality, at a glance

Full RAG is the quality ceiling, not the economic target. Trajbl v0.1 EXP lands near that ceiling while using a compact packet, then scores above the tested compression alternatives in that same low-token zone.

Method | Quality kept | Tokens spent
Full RAG (reference ceiling) | 100% | 100%
Trajbl EXP (best tradeoff, 85% fewer tokens) | 97% | 15%
LLMLingua Q-aware (Microsoft baseline) | 81% | 17%
Normalized from Phase31 OpenAI-judged benchmark: full RAG top-12 = 0.7225 quality and 7,022.8 mean answer tokens.

Compression baseline comparison

Trajbl v0.1 EXP is not a replacement for RAG. It is a measured compact-context layer: in tested evidence-QA setups, it produces stronger compact answers than LLMLingua-family compression baselines.

LLMLingua question-aware: +20%. Current Phase31, same retrieval pool, similar token budget.
LLMLingua blind: +31%. Compression without question-aware selection.
LongLLMLingua-7B: +103%. English multi-hop benchmark, objective F1.
These are relative quality lifts over each baseline. Full-context RAG remains the quality reference, not the compression target.
Method (Phase31 current benchmark) | Raw quality | Quality retained | Token load
Full-context RAG top-12 (quality ceiling, expensive context) | 0.7225 | 100% | 7,023 / 100%
Trajbl v0.1 EXP pool12 (best tradeoff) | 0.7041 | 97% | 1,088 / 15%
LLMLingua question-aware (Microsoft Research family baseline) | 0.5872 | 81% | 1,216 / 17%
LLMLingua blind (compression without question awareness) | 0.5390 | 75% | 1,185 / 17%
Naive truncation (simple cut baseline) | 0.6789 | 94% | 992 / 14%
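
The derived columns follow from the raw figures by simple division. The sketch below recomputes them from the numbers reported in the table above; the names `ARMS`, `retained`, `token_share`, and `relative_lift` are ours, chosen for illustration.

```python
# Raw Phase31 figures as reported in the table: (judged quality, mean tokens).
ARMS = {
    "Full-context RAG top-12": (0.7225, 7023),
    "Trajbl v0.1 EXP pool12": (0.7041, 1088),
    "LLMLingua question-aware": (0.5872, 1216),
    "LLMLingua blind": (0.5390, 1185),
    "Naive truncation": (0.6789, 992),
}
REFERENCE = "Full-context RAG top-12"


def retained(arm: str) -> float:
    """Quality as a fraction of the full-RAG reference."""
    return ARMS[arm][0] / ARMS[REFERENCE][0]


def token_share(arm: str) -> float:
    """Mean tokens as a fraction of the full-RAG reference."""
    return ARMS[arm][1] / ARMS[REFERENCE][1]


def relative_lift(arm: str, baseline: str) -> float:
    """Relative quality gain of `arm` over `baseline` (0.20 means +20%)."""
    return ARMS[arm][0] / ARMS[baseline][0] - 1.0


for name in ARMS:
    print(f"{name:26s} retained={retained(name):.0%}  tokens={token_share(name):.0%}")

trajbl = "Trajbl v0.1 EXP pool12"
print(f"lift vs question-aware: {relative_lift(trajbl, 'LLMLingua question-aware'):+.0%}")
print(f"lift vs blind:          {relative_lift(trajbl, 'LLMLingua blind'):+.0%}")
```

Note that "85% fewer tokens" is one minus the token share (1,088 / 7,023 is about 15%), and the lift figures are ratios between two compact arms, not differences in retained percentage points.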

The headline is not "Trajbl v0.1 EXP beats full RAG." Full RAG is the reference ceiling, not the economic target. The headline is that Trajbl v0.1 EXP keeps near-ceiling quality while cutting token load by 85% and producing stronger compact answers than LLMLingua-style compressors.

Technical benchmark summary

Phase31 measures how much RAG quality Trajbl preserves at a fraction of the token load.

The current benchmark uses the same 109-task FairV4 scope and the same top-12 retrieval pool for each arm. Full-context RAG is treated as the quality ceiling; Trajbl v0.1 EXP is evaluated as the economical evidence-packing layer that runs between retrieval and generation.

Scope: 109 evidence-QA tasks, same retrieval source for all compared arms.
Primary metrics: Blind judged answer quality, mean answer tokens, and forbidden rate.
Main result: 0.7041 quality vs 0.7225 full RAG, with 1,088 vs 7,023 tokens.
Fairness control: Question-aware LLMLingua is included at a similar compact token budget.

Benchmark controls
Reference ceiling: Full-context RAG top-12 remains the absolute quality reference.
Best tradeoff: Trajbl v0.1 EXP pool12, with 97% of full-RAG quality and 85% fewer tokens in Phase31.
Compression baseline: LLMLingua question-aware, with 0.5872 quality at 1,216 mean answer tokens.
Claim boundary: this is an experimental benchmark result, not a claim that Trajbl universally beats full-context RAG. The defensible claim is near-ceiling quality at much lower context cost, with stronger compact answers than tested LLMLingua-family baselines.

Quality retained, same retrieval pool

Normalized to full-context RAG as the quality ceiling. Higher retained quality is better.
Full RAG (7,023 tokens): 100%
Trajbl EXP (1,088 tokens): 97%
LLMLingua Q-aware (1,216 tokens): 81%
LLMLingua blind (1,185 tokens): 75%
In Phase31, Trajbl v0.1 EXP keeps 97% of full-RAG quality while Microsoft LLMLingua baselines score lower at similar compact scale.

Tokens sent to the model

Lower token load means lower recurring inference cost.
Full RAG: 7,023
LLMLingua Q-aware: 1,216
Trajbl EXP: 1,088
LLMLingua blind: 1,185
Trajbl v0.1 EXP uses 85% fewer tokens than full-context RAG while scoring higher than both LLMLingua baselines at a similar token budget.
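
To translate token counts into spend, multiply by an input-token price and a query volume. Both values below are placeholders (assumptions, not quotes from any provider), and the calculation covers context input tokens only: it ignores output tokens and any cost of running the packing step itself.

```python
def monthly_context_cost(tokens_per_query: float, queries_per_month: int,
                         usd_per_million_tokens: float) -> float:
    """Input-token spend per month for a fixed mean context size."""
    return tokens_per_query * queries_per_month * usd_per_million_tokens / 1_000_000


# Assumed values for illustration only.
PRICE_USD_PER_M_TOKENS = 1.00
QUERIES_PER_MONTH = 1_000_000

# Mean token counts from the Phase31 benchmark.
full_rag = monthly_context_cost(7023, QUERIES_PER_MONTH, PRICE_USD_PER_M_TOKENS)
trajbl = monthly_context_cost(1088, QUERIES_PER_MONTH, PRICE_USD_PER_M_TOKENS)

saving = full_rag - trajbl
print(f"full RAG: ${full_rag:,.0f}  Trajbl EXP: ${trajbl:,.0f}  "
      f"saving: ${saving:,.0f} ({saving / full_rag:.0%})")
```

Because price and volume multiply both arms equally, the percentage saving depends only on the token ratio; the absolute dollar figure scales with whatever price and volume apply to a given deployment.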

Early cross-document signal: cheaper multi-source evidence

Cross-document QA is where token pruning can break structure. Trajbl v0.1 EXP keeps balanced evidence from multiple sources.
Token saving: 82% (1,308 vs 7,456 tokens)
Full-RAG quality retained: 97% of the full-context score
Lift vs LLMLingua Q-aware: +95% relative judged quality
This is an experimental cross-document result, but it is the right kind of signal: large token savings plus stronger evidence preservation.

Validated through multiple lenses

The same efficiency pattern shows up across blind judging, objective F1, and cross-document probes.

1 — Current OpenAI blind judge

Phase31 shows Trajbl v0.1 EXP at 97% of full-RAG quality, with 85% fewer tokens and a +20% lift over question-aware LLMLingua.

2 — Objective English F1

On English multi-hop QA, Trajbl v0.1 EXP scores 0.4001 F1 vs 0.2329 for LLMLingua-2 and 0.1971 for LongLLMLingua-7B.
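
The page does not spell out how F1 is computed. A common choice for QA benchmarks is SQuAD-style token-overlap F1, sketched below as an assumption about the metric rather than a description of the benchmark's actual scorer. The normalization (lowercasing, dropping punctuation and English articles) follows that convention.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against one reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under this metric a benchmark score is typically the mean of `token_f1` over all questions, which is why it can be reported without a judge model.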

3 — Cross-document potential

Early cross-document testing shows near full-RAG quality with 82% fewer tokens and a large lead over question-aware LLMLingua.

Why it matters

Lower token cost, stronger compact answers, production-friendly RAG economics.

The core advantage is simple: Trajbl v0.1 EXP cuts expensive context before the LLM reads it, while preserving a traceable evidence packet for the final answer.

85% lower token load

Phase31 shows 1,088 Trajbl v0.1 EXP tokens versus 7,023 for full-context RAG, while retaining 97% of full-RAG judged quality in that benchmark.

Better than compression baselines

In the tested setups, Trajbl v0.1 EXP scores +20% over question-aware LLMLingua and +31% over blind LLMLingua on Phase31 judged quality, and +103% over LongLLMLingua-7B on the English multi-hop F1 benchmark.

Deterministic & model-free

No training, no GPU, runs on CPU. The same input always yields the same output — exactly reproducible, and cheap to operate at scale.

Auditable by design

Every sentence forwarded is verbatim and traceable to its source — a natural fit for regulated domains like healthcare, finance, and law.
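
A verbatim guarantee is mechanically checkable. The sketch below assumes a hypothetical packet format, a list of `(source_id, sentence)` pairs, plus a mapping from source ids to the original text; neither is Trajbl's documented format.

```python
def audit_packet(packet: list[tuple[str, str]], sources: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every sentence
    is an exact substring of the source it claims to come from."""
    problems = []
    for source_id, sentence in packet:
        if source_id not in sources:
            problems.append(f"unknown source: {source_id}")
        elif sentence not in sources[source_id]:
            problems.append(f"not verbatim in {source_id}: {sentence!r}")
    return problems


# Example data, invented for illustration.
sources = {"policy-7": "Refunds are issued within 14 days. Gift cards are not refundable."}
good = [("policy-7", "Gift cards are not refundable.")]
bad = [("policy-7", "Gift cards are refundable."), ("policy-9", "Any text.")]

print(audit_packet(good, sources))
print(audit_packet(bad, sources))
```

An exact-substring check is deliberately strict: a dropped "not" or a paraphrased number fails the audit, which is the property regulated teams usually need.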

Promising on multi-source evidence

Cross-document testing shows 82% fewer tokens, 97% of full-RAG quality retained, and a +95% lift over question-aware LLMLingua.

Model- & vendor-agnostic

Works in front of any LLM and any retrieval stack. No lock-in, no fine-tuning — it slots into systems teams already run.

The opportunity

RAG token spend is becoming infrastructure spend. Trajbl v0.1 EXP cuts it while keeping answer quality close to full-context RAG.

As AI assistants move into production, context tokens become a recurring infrastructure cost on every query. Trajbl v0.1 EXP cuts that load before generation starts: fewer tokens sent to the model, less wasted context, and compact answers that benchmark stronger than generic compression.

Recurring token savings

The benefit lands on every query, so value scales directly with customer AI usage.

Horizontal across industries

Search, support, biomed, legal, finance — anywhere evidence-grounded answers are needed.

Baseline edge

Current benchmarks show Trajbl v0.1 EXP scoring above Microsoft LLMLingua-family baselines at compact token budgets.

In plain language

How this all works — explained for anyone.

No jargon. Here's how AI assistants answer questions today, the world-standard way, and where Trajbl v0.1 EXP quietly changes the economics.

1. You ask a question

A large language model (LLM) — the technology behind ChatGPT-style assistants — is great at writing answers, but on its own it doesn't actually know your company's documents. So it can't reliably answer questions about them.

2. The system fetches documents (this is "RAG")

The standard fix is called RAG (Retrieval-Augmented Generation). Before answering, the system searches your files and grabs the chunks that look related to the question, then hands all of that text to the model as background reading. This is now the default way serious AI products answer from private data.

3. The catch: you pay for every word

The model is billed by the token (roughly, a piece of a word) for everything it reads. RAG tends to grab a lot of text to be safe — most of which is irrelevant. So you pay for piles of noise, answers get slower, and the few sentences that truly matter get buried.

4. The usual shortcut: squeeze the text

The standard way to cut that cost is "context compression" — tools that shrink the text by deleting words they predict are unimportant. It saves money, but it chops sentences into fragments and often loses the meaning: a "not" goes missing, a number drifts from its label, and the answer quietly degrades.

5. Where Trajbl v0.1 EXP fits in

Trajbl v0.1 EXP slots in right after the documents are fetched and before the model reads them. Instead of squeezing text blindly, it builds a short, traceable evidence packet for the question. The model gets a clean brief; you pay for a fraction of the words; and the answer benchmarks stronger than the squeeze-the-text shortcut.

The bottom line: RAG made AI useful on your own data, but every extra context token becomes cost, latency, and infrastructure load. Trajbl v0.1 EXP is the evidence layer that cuts that spend: in the current Phase31 benchmark it uses 85% fewer tokens, retains 97% of full-RAG quality, and produces stronger compact answers than the Microsoft LLMLingua-family baselines tested so far.

For investors & partners

Let's talk.

Trajbl v0.1 EXP is a working evidence layer for teams that need to cut RAG token cost without losing traceability. If you're an investor, design partner, or a team running AI on your own documents, we can walk through the benchmark detail and pilot path.

Deep-dive available under NDA — token savings, LLMLingua comparisons, and methodology on request.
Pilot-ready for evidence-grounded, cost-sensitive AI workloads.