New evidence layer for RAG

Near full-RAG quality, far fewer RAG tokens.

Trajbl v0.1 EXP is a new evidence-packing method that sits between retrieval and the LLM. Full-context RAG is the quality ceiling, but production systems pay for every retrieved token. Trajbl turns retrieved context into a short, traceable evidence packet. The current Phase31 benchmark shows 85% fewer tokens, 97% of full-RAG quality retained, and stronger compact answers than LLMLingua-style compression at similar compact budgets.


Public evidence package now available

Review the scoped benchmark summaries, claim ledger, limitations, and controlled demo guide for Trajbl v0.1. Source code remains proprietary; this package contains public evidence only.


85% fewer tokens than full-context RAG in the current Phase31 benchmark

97% of full-RAG quality retained in the current Phase31 benchmark

+20% higher judged quality than question-aware LLMLingua at similar budget

Compact-answer benchmark: RAG quality, lower token cost

Tokens sent to the model: Full RAG 7,023; LLMLingua 1,216; Trajbl EXP 1,088.

Try the live demo

Run the real Trajbl method on your own text and watch the token count drop live, then verify the quality in your own LLM. Access works in three steps: request access, get approved, test. It is a focused demo; for a full evaluation on your data, contact us.

Not just compression: Trajbl v0.1 EXP keeps the proof readable, traceable, and compact. It matches a trained SOTA pruner on evidence coverage, runs with no training, no model, and no GPU, and never returns an empty context.

What is Trajbl v0.1 EXP

Trajbl evidence-packing overview graphic

RAG works. It just sends too much.

Modern AI assistants often send large retrieved context blocks to the model just to answer one question. Trajbl v0.1 EXP turns that pile into a compact evidence packet, so the model reads less, costs less, and still sees the proof needed for a strong answer.

The problem

Context is the cost.

Every retrieved chunk the model reads becomes recurring token cost, latency, and infrastructure load. Generic compression can reduce the bill, but it can also damage meaning and weaken the final answer.

Token bills scale with context — every extra retrieved passage is paid for on every query.

More context is not free — noise can bury the few sentences that actually answer the question.

The Trajbl v0.1 EXP answer

Keep the evidence. Drop the noise.

Trajbl v0.1 EXP uses a proprietary evidence-selection layer to forward a compact, traceable packet to the LLM. The model gets a short brief instead of a haystack, at a fraction of the token cost.

Verbatim, auditable evidence — every forwarded sentence is traceable to its source. Built for biomed, legal & compliance.

Drop-in & model-agnostic — sits between your existing retrieval and any LLM. No retraining, no GPU.
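To make the "layer between retrieval and the LLM" idea concrete, here is a minimal sketch of a whole-sentence evidence packer. It is not Trajbl's selection method, which is proprietary: the scoring rule (plain word overlap with the question), the `Evidence` record, the `pack_evidence` function, and the word budget are all hypothetical stand-ins chosen for illustration. It only shows the shape of such a layer: verbatim sentences, each tagged with its source, chosen deterministically, with a fallback so the packet is never empty.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source_id: str  # retrieved chunk the sentence came from
    index: int      # sentence position inside that chunk
    text: str       # verbatim sentence, never rewritten


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pack_evidence(question: str, chunks: dict[str, str], budget_words: int) -> list[Evidence]:
    # ASSUMPTION: word overlap is a placeholder score, not Trajbl's selector.
    q = words(question)
    candidates = []
    for source_id, chunk in chunks.items():
        for i, sentence in enumerate(split_sentences(chunk)):
            overlap = len(q & words(sentence))
            candidates.append((overlap, Evidence(source_id, i, sentence)))
    # Deterministic order: highest overlap, then source id, then position.
    candidates.sort(key=lambda c: (-c[0], c[1].source_id, c[1].index))

    packet, used = [], 0
    for overlap, evidence in candidates:
        size = len(evidence.text.split())
        if overlap == 0 or used + size > budget_words:
            continue
        packet.append(evidence)
        used += size
    # Fallback: never hand the LLM an empty context.
    if not packet and candidates:
        packet.append(candidates[0][1])
    return packet


chunks = {
    "doc-a": "Aspirin inhibits platelet aggregation. The clinic opens at nine.",
    "doc-b": "Weather was mild. Aspirin is not recommended for children with viral infections.",
}
packet = pack_evidence("Is aspirin recommended for children?", chunks, budget_words=12)
for evidence in packet:
    print(f"[{evidence.source_id}#{evidence.index}] {evidence.text}")
```

Because every item carries `source_id` and `index`, each forwarded sentence can be traced back to the chunk it came from, and the "not" in the selected sentence survives because sentences are kept whole.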

Benchmark results

Near-ceiling quality. Much lower context cost.

Trajbl is not trying to win an unlimited-context contest against full-context RAG. It makes the same RAG workflow more economical: far fewer context tokens, close to full-RAG answer quality, and stronger compact answers than tested compression baselines.

85% fewer tokens vs full RAG (Phase31)

Phase31: 1,088 tokens for Trajbl v0.1 EXP vs 7,023 for full-context RAG.

97% of full-RAG quality retained (Phase31)

In the current Phase31 benchmark, Trajbl v0.1 EXP stays close to the full-context quality ceiling without paying full-context cost.

+20%: stronger compact answers

Higher judged quality than question-aware LLMLingua at a similar token budget.

Cost vs quality, at a glance

Full RAG is the quality ceiling, not the economic target. Trajbl v0.1 EXP lands near that ceiling while using a compact packet, then scores above the tested compression alternatives in that same low-token zone.

Method	Quality kept	Tokens spent
Full RAG (reference ceiling)	100%	100%
Trajbl EXP (best tradeoff, 85% fewer tokens)	97%	15%
LLMLingua Q-aware (Microsoft baseline)	81%	17%

Normalized from Phase31 OpenAI-judged benchmark: full RAG top-12 = 0.7225 quality and 7,022.8 mean answer tokens.

Compression baseline comparison

Trajbl v0.1 EXP is not a replacement for RAG. It is a measured compact-context layer: in tested evidence-QA setups, it produces stronger compact answers than LLMLingua-family compression baselines.

LLMLingua question-aware: +20%. Current Phase31, same retrieval pool, similar token budget.

LLMLingua blind: +31%. Compression without question-aware selection.

LongLLMLingua-7B: +103%. English multi-hop benchmark, objective F1.

These are relative quality lifts over each baseline. Full-context RAG remains the quality reference, not the compression target.
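For readers who want to check the arithmetic, a relative lift is the score difference divided by the baseline score. The short sketch below recomputes the three headline lifts from the raw scores reported elsewhere on this page (Phase31 judged quality and English multi-hop F1); the function name `relative_lift` is just an illustrative choice.

```python
def relative_lift(score: float, baseline: float) -> float:
    """Relative improvement of score over baseline, in percent."""
    return 100 * (score - baseline) / baseline


# Phase31 judged quality: Trajbl v0.1 EXP vs the two LLMLingua arms.
lift_q_aware = relative_lift(0.7041, 0.5872)
lift_blind = relative_lift(0.7041, 0.5390)
# English multi-hop F1: Trajbl v0.1 EXP vs LongLLMLingua-7B.
lift_long = relative_lift(0.4001, 0.1971)

for name, lift in [("question-aware", lift_q_aware), ("blind", lift_blind), ("LongLLMLingua-7B", lift_long)]:
    print(f"{name}: {lift:+.0f}%")
```

Rounded to whole percentages, these reproduce the +20%, +31%, and +103% figures.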

Method (Phase31 current benchmark)	Raw quality	Quality retained	Token load (mean tokens / % of full RAG)
Full-context RAG top-12 (quality ceiling, expensive context)	0.7225	100%	7,023 / 100%
Trajbl v0.1 EXP pool12 (best tradeoff)	0.7041	97%	1,088 / 15%
LLMLingua question-aware (Microsoft Research family baseline)	0.5872	81%	1,216 / 17%
LLMLingua blind (compression without question awareness)	0.5390	75%	1,185 / 17%
Naive truncation (simple cut baseline)	0.6789	94%	992 / 14%
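The normalized columns in the table above follow directly from the raw scores and mean token counts. As a small reproducibility sketch, the snippet below divides each arm by the full-context RAG reference; the dictionary layout and the `normalize` helper are illustrative, only the numbers come from the table.

```python
# Raw Phase31 figures from the table: (judged quality, mean tokens).
phase31 = {
    "Full-context RAG top-12": (0.7225, 7023),
    "Trajbl v0.1 EXP pool12": (0.7041, 1088),
    "LLMLingua question-aware": (0.5872, 1216),
    "LLMLingua blind": (0.5390, 1185),
    "Naive truncation": (0.6789, 992),
}
ref_quality, ref_tokens = phase31["Full-context RAG top-12"]


def normalize(quality: float, tokens: int) -> tuple[int, int, int]:
    """Return (quality retained %, token load %, tokens saved %) vs full RAG."""
    retained = round(100 * quality / ref_quality)
    load = round(100 * tokens / ref_tokens)
    return retained, load, 100 - load


summary = {name: normalize(q, t) for name, (q, t) in phase31.items()}
for name, (retained, load, saved) in summary.items():
    print(f"{name:26s} quality {retained:3d}%  tokens {load:3d}%  saved {saved:3d}%")
```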

The headline is not "Trajbl v0.1 EXP beats full RAG." Full RAG is the reference ceiling, not the economic target. The headline is that Trajbl v0.1 EXP keeps near-ceiling quality while cutting token load by 85% and producing stronger compact answers than LLMLingua-style compressors.

Matched against a trained SOTA pruner

Naver Provence is the strongest open evidence pruner — a trained ~300M-parameter model. With zero training, zero model and zero GPU, Trajbl v0.1 EXP holds parity on evidence coverage and wins outright on some question types.

Metric (Trajbl vs trained Provence)	Trajbl v0.1 EXP	Provence
Evidence recall (MultiSpanQA coverage)	0.93	0.90
Coverage at matched budget (Croatian evidence frontier)	0.92 / 338w	0.91 / 368w
Yes/no questions (experimental Plan-A bridge, +0.19 to +0.30)	0.71	0.52
Cost to run (operational footprint)	no training / model / GPU	trained 300M model

On overall answer quality the trained model is marginally ahead, but Trajbl v0.1 EXP comes close to that level with none of its training, model, or GPU cost, never prunes to an empty context, and runs on any language out of the box.

Technical benchmark summary

Phase31 measures how much RAG quality Trajbl preserves at a fraction of the token load.

The current benchmark uses the same 109-task FairV4 scope and the same top-12 retrieval pool for each arm. Full-context RAG is treated as the quality ceiling; Trajbl v0.1 EXP is evaluated as the economical evidence-packing layer that runs between retrieval and generation.

Scope: 109 evidence-QA tasks, same retrieval source for all compared arms.

Primary metrics: blind judged answer quality, mean answer tokens, and forbidden rate.

Main result: 0.7041 quality vs 0.7225 full RAG, with 1,088 vs 7,023 tokens.

Fairness control: question-aware LLMLingua is included at a similar compact token budget.

Benchmark controls

Reference ceiling: full-context RAG top-12 remains the absolute quality reference.

Best tradeoff: Trajbl v0.1 EXP pool12, with 97% of full-RAG quality and 85% fewer tokens in Phase31.

Compression baseline: LLMLingua question-aware, with 0.5872 quality at 1,216 mean answer tokens.

Claim boundary: this is an experimental benchmark result, not a claim that Trajbl universally beats full-context RAG. The defensible claim is near-ceiling quality at much lower context cost, with stronger compact answers than tested LLMLingua-family baselines.

Quality retained, same retrieval pool

Normalized to full-context RAG as the quality ceiling. Higher retained quality is better.

Full RAG (7,023 tokens): 100%

Trajbl EXP (1,088 tokens): 97%

LLMLingua Q-aware (1,216 tokens): 81%

LLMLingua blind (1,185 tokens): 75%

In Phase31, Trajbl v0.1 EXP keeps 97% of full-RAG quality while Microsoft LLMLingua baselines score lower at similar compact scale.

Tokens sent to the model

Lower token load means lower recurring inference cost.

Full RAG: 7,023

LLMLingua Q-aware: 1,216

Trajbl EXP: 1,088

LLMLingua blind: 1,185

Trajbl v0.1 EXP uses 85% fewer tokens than full-context RAG while scoring higher than LLMLingua at a similar token budget.
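To see how token load turns into recurring spend, the back-of-the-envelope sketch below multiplies the Phase31 mean token counts by a per-token price and a query volume. Both the price and the volume are hypothetical placeholders, not figures from the benchmark; substitute your own provider's rate and traffic.

```python
# ASSUMPTIONS: hypothetical price and volume, for illustration only.
PRICE_PER_MILLION_INPUT_TOKENS = 2.50  # USD, placeholder
QUERIES_PER_MONTH = 1_000_000          # placeholder


def monthly_context_cost(tokens_per_query: float) -> float:
    """Monthly spend on context tokens alone, in USD."""
    total_tokens = tokens_per_query * QUERIES_PER_MONTH
    return total_tokens * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


full_rag_cost = monthly_context_cost(7023)  # Phase31 full-context RAG
packed_cost = monthly_context_cost(1088)    # Phase31 Trajbl v0.1 EXP
saving = full_rag_cost - packed_cost

print(f"full RAG: ${full_rag_cost:,.2f}  packed: ${packed_cost:,.2f}  saved: ${saving:,.2f}")
```

The percentage saved does not depend on the assumed price or volume; only the absolute dollar amounts do.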

Early cross-document signal: cheaper multi-source evidence

Cross-document QA is where token pruning can break structure. Trajbl v0.1 EXP keeps balanced evidence from multiple sources.

Token saving: 82% (1,308 vs 7,456 tokens)

Full-RAG quality retained: 97% of the full-context score

Lift vs LLMLingua Q-aware: +95% relative judged quality

This is an experimental cross-document result, but it is the right kind of signal: large token savings plus stronger evidence preservation.

Validated through multiple lenses

The same efficiency pattern shows up across blind judging, objective F1, and cross-document probes.

1 — Current OpenAI blind judge

Phase31 shows Trajbl v0.1 EXP at 97% of full-RAG quality, with 85% fewer tokens and a +20% lift over question-aware LLMLingua.

2 — Objective English F1

On English multi-hop QA, Trajbl v0.1 EXP scores 0.4001 F1 vs 0.2329 for LLMLingua-2 and 0.1971 for LongLLMLingua-7B.

3 — Cross-document potential

Early cross-document testing shows near full-RAG quality with 82% fewer tokens and a large lead over question-aware LLMLingua.
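The page does not say exactly how the objective F1 in the second lens is computed. A common choice for QA benchmarks is token-overlap F1 between the predicted and reference answers, so the sketch below assumes that definition; the `token_f1` helper and its simple lowercase word tokenizer are illustrative and may differ from the scorer actually used.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", reference.lower())
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


example = token_f1("the cat sat", "the cat")
print(round(example, 3))
```

A benchmark-level score such as 0.4001 would then be the mean of this per-question value over all tasks, under the same assumption.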

Where Trajbl fits

The RAG context-reduction landscape, in plain terms.

Trajbl v0.1 EXP is not another retriever, LLM, or RAG framework. It is the evidence-packing layer after retrieval and before generation: closer to sentence-level pruning than to generic prompt compression.

Fit	Option	Category	Why teams use it	Tradeoff for Trajbl's job
1	Trajbl v0.1 EXP (evidence-packing layer; exact target)	Deterministic sentence selection: whole, traceable evidence sentences selected after retrieval.	Designed for lower RAG token cost, readable evidence packets, reproducible behavior, and zero model/GPU overhead.	Available through partnership discussions today; technical deep-dive and deployment details available directly with the Trajbl team.
2	Provence / XProvence (trained evidence pruner)	Sentence-level context pruning: a public trained-model direction closest to Trajbl's evidence-pruning role.	Strong external analogue for teams looking at sentence-level RAG pruning and reranking.	Requires a trained model and has less transparent decision logic than deterministic evidence rules.
3	Cohere, Jina, Voyage and other rerankers (commercial or local reranking)	Chunk or document reranking: reorder retrieved candidates and pass only the best ones forward.	Mature, easy to deploy, and often a strong production baseline for improving retrieval quality.	Usually ranks chunks, not sentence-level evidence packets; auditability depends on the surrounding system.
4	LLMLingua / LongLLMLingua (prompt compression baselines)	Token or span compression: reduce prompt length by pruning parts of the input context.	Recognized open-source baseline family for prompt compression and long-context reduction.	Can fragment evidence; in Trajbl's tested evidence-QA setup, Trajbl produces stronger compact answers.
5	LangChain Contextual Compression (framework pattern)	RAG orchestration: a way to combine retrievers, compressors, filters, rerankers, and LLMs.	Fast path for prototyping and integrating existing compression or reranking components.	Not one algorithm; quality depends on which compressor or reranker is plugged in.
6	LlamaIndex Node Postprocessors (framework pattern)	RAG postprocessing toolkit: filter, reorder, or transform retrieved nodes before answer generation.	Practical ecosystem for teams already building RAG pipelines in LlamaIndex.	Useful infrastructure, but not equivalent to Trajbl unless a Trajbl-like selector is implemented or plugged in.
7	Full-context RAG (quality reference)	Baseline architecture: send most or all retrieved context to the LLM.	Often the quality ceiling when cost, latency, and context-window pressure are acceptable.	Expensive at scale; every retrieved token becomes recurring inference cost.

How to read this table: LangChain and LlamaIndex are frameworks where a Trajbl-like layer could live. Trajbl competes more directly with context pruners, rerankers, and prompt-compression methods. The ranking is for the evidence-packing use case, not a universal leaderboard for all RAG systems.

Why it matters

Lower token cost, stronger compact answers, production-friendly RAG economics.

The core advantage is simple: Trajbl v0.1 EXP cuts expensive context before the LLM reads it, while preserving a traceable evidence packet for the final answer.

85% lower token load

Phase31 shows 1,088 Trajbl v0.1 EXP tokens versus 7,023 for full-context RAG, while retaining 97% of full-RAG judged quality in that benchmark.

Better than compression baselines

In the tested setup, Trajbl v0.1 EXP scores +20% over question-aware LLMLingua, +31% over blind LLMLingua, and +103% over LongLLMLingua on the English F1 benchmark.

Deterministic & model-free

No training, no GPU, runs on CPU. The same input always yields the same output — exactly reproducible, and cheap to operate at scale.

Auditable by design

Every sentence forwarded is verbatim and traceable to its source — a natural fit for regulated domains like healthcare, finance, and law.

Promising on multi-source evidence

Cross-document testing shows 82% fewer tokens, 97% of full-RAG quality retained, and a +95% lift over question-aware LLMLingua.

Model- & vendor-agnostic

Works in front of any LLM and any retrieval stack. No lock-in, no fine-tuning — it slots into systems teams already run.

Language-independent, zero-training

Tested on English (+0.167 F1 vs LLMLingua-2) and Croatian (+0.19) with no language-specific tuning. Works on a new language immediately: no data, no retraining.

Never returns empty

Even trained pruners can cut away all context on hard inputs. Trajbl always returns a traceable evidence packet — a safety property for production and regulated use.

On par with a trained SOTA pruner

Matches Naver Provence on evidence coverage (recall 0.93 vs 0.90) — and wins yes/no questions — while using zero training, model or GPU.

Where it stands & where it's going

Built for auditability today, with a clear path to higher accuracy.

By design

What makes Trajbl different.

Whole-sentence, not token spans: Trajbl forwards verbatim, auditable sentences, the right output for legal, biomed, and compliance work where you must explain why a piece of evidence was kept.

Abstract single-slot questions are where trained semantics still lead — and that is exactly what the next phase targets.

Roadmap

Core today, Pro next.

Core — today (locked). Zero-model, auditable, any language, robust. The product that ships now.

Booster — R&D. A small trained disambiguator aimed only where Core loses. Ships only if it beats Core on unseen data.

Hybrid / Pro — goal. A gold-free router picks Core or Booster per question: free auditable Core, plus higher-accuracy Pro.

The opportunity

RAG token spend is becoming infrastructure spend. Trajbl v0.1 EXP cuts it while keeping answer quality close to full RAG.

As AI assistants move into production, context tokens become a recurring infrastructure cost on every query. Trajbl v0.1 EXP cuts that load before generation starts: fewer tokens sent to the model, less wasted context, and compact answers that benchmark stronger than generic compression.

Recurring token savings

The benefit lands on every query, so value scales directly with customer AI usage.

Horizontal across industries

Search, support, biomed, legal, finance — anywhere evidence-grounded answers are needed.

Baseline edge

Current benchmarks show Trajbl v0.1 EXP scoring above Microsoft LLMLingua-family baselines at compact token budgets.

In plain language

How this all works — explained for anyone.

No jargon. Here's how AI assistants answer questions today, the world-standard way, and where Trajbl v0.1 EXP quietly changes the economics.

You ask a question

A large language model (LLM) — the technology behind ChatGPT-style assistants — is great at writing answers, but on its own it doesn't actually know your company's documents. So it can't reliably answer questions about them.

The system fetches documents (this is "RAG")

The world-standard fix is called RAG — Retrieval-Augmented Generation. Before answering, the system searches your files and grabs the chunks that look related to the question, then hands all of that text to the model as background reading. This is now the default way serious AI products answer from private data.

The catch: you pay for every word

The model is billed by the token (roughly, a piece of a word) for everything it reads. RAG tends to grab a lot of text to be safe — most of which is irrelevant. So you pay for piles of noise, answers get slower, and the few sentences that truly matter get buried.

The usual shortcut: squeeze the text

The standard way to cut that cost is "context compression" — tools that shrink the text by deleting words they predict are unimportant. It saves money, but it chops sentences into fragments and often loses the meaning: a "not" goes missing, a number drifts from its label, and the answer quietly degrades.

Where Trajbl v0.1 EXP fits in

Trajbl v0.1 EXP slots in right after the documents are fetched and before the model reads them. Instead of squeezing text blindly, it builds a short, traceable evidence packet for the question. The model gets a clean brief; you pay for a fraction of the words; and the answer benchmarks stronger than the squeeze-the-text shortcut.

The bottom line: RAG made AI useful on your own data, but every extra context token becomes cost, latency, and infrastructure load. Trajbl v0.1 EXP is the evidence layer that cuts that spend: in the current Phase31 benchmark it uses 85% fewer tokens, retains 97% of full-RAG quality, and gives stronger compact answers than the Microsoft LLMLingua-family baselines tested so far.

For investors & partners

Let's talk.

Trajbl v0.1 EXP is a working evidence layer for teams that need to cut RAG token cost without losing traceability. If you're an investor, design partner, or a team running AI on your own documents, we can walk through the benchmark detail and pilot path.

Deep-dive available under NDA — token savings, LLMLingua comparisons, and methodology on request.

Pilot-ready for evidence-grounded, cost-sensitive AI workloads.

Get in touch
