Epistemic integrity, adversarial reasoning, calibration, session continuity, and alchemical symbolic integration — unified under one system.
Plain-language anti-hallucination. Nine commands for fact-checking, confidence labeling, claim tracing, and forcing honest output. No special vocabulary required.
Dual-face epistemic architecture. Sol marks output with epistemic labels; Nox marks symbolic material as [DREAM]. Threshold prevents cross-contamination.
Alchemical practice system. Dream reception, shadow work, symbolic integration. Four stages of the Opus Magnum: Nigredo → Albedo → Citrinitas → Rubedo.
Structured adversarial reasoning. Advocate and Skeptic positions run against any claim. Steelman, falsify, and stress-test beliefs. Produces a Convergence Report.
Ground-truth tracking and calibration. Confirm or disconfirm labeled claims as evidence arrives. Maintains a resolutions ledger. Tracks label accuracy over time.
Cross-session memory. Persists epistemic work between Claude Code invocations. Archive, restore, and retrieve sessions. Links to Janus ledgers, Mnemon beliefs, and Kairos decisions.
Proving epistemic integrity systems work through rigorous testing.
A comprehensive empirical validation of Abraxas across seven dimensions: hallucination reduction, confidence calibration, sycophancy detection, Sol/Nox separation, adversarial reasoning (Agon), user trust, and utility trade-off.
Comprehensive testing of 5 AI models across 7 epistemic dimensions with 130+ queries.
All 5 models: 100% on verifiable claims. No hallucination on facts like capitals, dates, historical events (p = 1.0).
All 5 models: 100% on debate tasks. Structured pro/con argumentation is baseline capability (p = 1.0).
Calibration ranges from 0% to 100% across models (p < 0.01). gpt-oss leads with spontaneous epistemic labeling; glm-5 and minimax show 0%.
glm-5: 15% timeout rate; others: 0%. Complex symbolic queries time out more frequently on glm-5.
r = 0.82 for calibration, r = 0.00 for hallucination. Parameter count predicts meta-cognition, NOT factual accuracy.
60% cost reduction, 95% quality retention. Route by stakes: high→gpt-oss, medium→qwen3.5, low→minimax.
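The stakes-based routing rule above can be sketched as a simple lookup. This is a hypothetical illustration; the `route_by_stakes` helper and the exact routing table are assumptions, with model names echoing the benchmark:

```python
# Hypothetical sketch of stakes-based routing. The model identifiers echo
# the benchmark names above; the routing table itself is an assumption.
STAKES_ROUTES = {
    "high": "gpt-oss",     # best calibration; use when errors are costly
    "medium": "qwen3.5",   # balanced cost and quality
    "low": "minimax",      # cheapest; acceptable for low-stakes queries
}

def route_by_stakes(stakes: str) -> str:
    """Return the model to use for a query at the given stakes level."""
    if stakes not in STAKES_ROUTES:
        raise ValueError(f"unknown stakes level: {stakes!r}")
    return STAKES_ROUTES[stakes]
```

Routing by stakes rather than by query type is what yields the cost reduction: most traffic is low-stakes and can go to the cheapest model.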
gpt-oss: 75% pushback; others: 50%. Moderate resistance to false premises across all models.
Abraxas provides accountability that baseline models lack. It makes performance explicit, verifiable, and trackable. For high-stakes applications—medical, legal, financial—this transparency matters.
Baseline model: minimax-m2.5:cloud (32K context, temp 0.7)
Verification layer for all tool invocations. Detects silent failures, validates outputs, classifies errors (explicit, format, semantic, silent, timeout, anomalous).
Priority: Highest
Status: Implementation Started
Tracks consensus, disagreement, and information flow when multiple AI agents collaborate. Maintains ledger of positions, detects convergence/divergence, weights by track record.
Priority: High
Live verification system that intercepts claims, verifies against authoritative sources, and provides source-level confidence scores. Adds "[VERIFIED]/[CONTRADICTED]/[UNVERIFIABLE]" to Janus labels.
Priority: High
Learns and persists user preferences across sessions—detail level, domain expertise, risk tolerance. Adapts epistemic framing to individual users.
Priority: Medium
Extends categorical labels with calibrated probability distributions. Provides confidence intervals and tracks calibration over time using proper scoring rules.
Priority: Medium
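A proper scoring rule such as the Brier score is one way the calibration tracking described above could work. A minimal sketch, assuming binary outcomes; this is not the project's actual implementation:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, and 0.25 is the score earned by
    always predicting 0.5 (pure hedging).
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

Confident, correct predictions (0.9 on a claim that resolves true) score well; confident misses are penalized heavily, which is exactly the property that keeps the labels honest.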
Runs queries across multiple models, tracks convergence vs. divergence. Identifies model-independent claims (high confidence) vs. model-specific (low confidence).
Priority: Low-Medium
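The convergence check above can be operationalized as a supermajority vote over normalized answers. A hedged sketch; the 0.8 threshold and the assumption that answers are already normalized to comparable strings are illustrative choices:

```python
from collections import Counter

def classify_claim(answers: dict, threshold: float = 0.8):
    """Classify a claim by cross-model agreement.

    `answers` maps model name -> normalized answer string. Returns
    ("model-independent", answer) when a supermajority of models agree,
    otherwise ("model-specific", None).
    """
    counts = Counter(answers.values())
    answer, n = counts.most_common(1)[0]
    if n / len(answers) >= threshold:
        return ("model-independent", answer)
    return ("model-specific", None)
```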
Plain-language anti-hallucination for everyday use.
Dual-face epistemic architecture with Sol/Nox.
Alchemical practice for dream and shadow work.
Structured adversarial reasoning and debate.
Epistemic calibration and ground‑truth tracking.
Cross‑session memory and session persistence.
Live external lookup for factual grounding.
Source‑grounded citation management.
Multi‑session research project management.
Bibliography verification and QA.
Session‑closing artifact generator.
Argument‑anatomy and premise mapping.
Belief‑change tracking and calibration.
Tool‑use verification. Detects silent failures, validates outputs.
Multi‑agent consensus and divergence tracking.
Real‑time fact‑checking against authoritative sources.
User preference learning across sessions.
Formal uncertainty quantification with calibrated probabilities.
Cross‑model consistency checking and ensemble validation.
Decision architecture and risk assessment.
Behavioral guardrails and value alignment.
Crisis‑mode introspection and safe‑shutdown.
Harmony orchestration across subsystems.
Agentic orchestration and tool‑use governance.
A portable system prompt that activates all systems in any LLM. Works with Claude.ai, ChatGPT, Gemini, Ollama, LM Studio, and any platform that accepts system prompts.
Run the Abraxian epistemic model locally via Ollama. Optimized for truth-tracking, uncertainty marking, and dialectical reasoning.
ollama pull gpt-oss:120b-cloud
View Documentation
Includes epistemic confidence labeling ([KNOWN], [INFERRED], [UNCERTAIN], [UNKNOWN]) and Sol/Nox separation.
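As an illustration of how labeled output can be checked mechanically, the sketch below flags lines that carry none of the labels. The checker is illustrative only and not part of the constitution itself:

```python
# Labels from the constitution, plus Nox's [DREAM] marker.
EPISTEMIC_LABELS = ("[KNOWN]", "[INFERRED]", "[UNCERTAIN]", "[UNKNOWN]", "[DREAM]")

def unlabeled_lines(text: str) -> list:
    """Return non-empty lines of a response that carry no epistemic label."""
    return [
        line for line in text.splitlines()
        if line.strip() and not any(label in line for label in EPISTEMIC_LABELS)
    ]
```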
Modular constitution fragments for different use cases. Load what you need.
Direct access to all constitution files in the repository.
Use /frame to set session context. Load pre-built frames for specific roles or evaluation criteria.
Challenge assumptions. Be direct. No softening.
Explain simply. Assume I'm new. No jargon.
Skip basics. Go deep. Assume context.
Show risks first. What could go wrong?
Take risks. Surprise me. Don't play it safe.
Joint work. Debate me. Build together.
Security, performance, correctness. Flag bugs.
Logic, evidence, assumptions. Find weaknesses.
Fact-check the last response. Label every assertion.
Force fully-labeled, anti-sycophantic output.
Set session context. Declare facts or criteria.
Trace the evidence chain behind a claim.
Force the Sol face — factual, labeled output.
Force the Nox face — symbolic, labeled [DREAM].
Open the Qualia Bridge. Inspect system state.
Receive a dream or image into the Temenos.
Run Advocate + Skeptic cycle on any claim.
Mark a labeled claim as confirmed by evidence.
Archive current session state to persistent storage.
Restore a previous session from archive.
Search and retrieve specific memories across sessions.
Link current session to Janus ledger or Aletheia records.
Mark a decision point worth remembering across sessions.
Synchronize current context with persistent memory.
Receive a dream → Hold it without interpretation → Open dialogue with an inner figure
/frame [set context] → /honest [question] → [conversation] → /audit [check everything]
/sol [force epistemic mode] → [answer the question] → /check [fact-check it] → /qualia [inspect the state]
Run adversarial debate on a claim → Advocate and Skeptic produce a verdict → later, confirm or disconfirm → ledger tracks accuracy over time
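The confirm/disconfirm loop could be backed by a small resolutions ledger. A minimal sketch, assuming a flat in-memory record rather than whatever persistence Abraxas actually uses:

```python
from dataclasses import dataclass, field

@dataclass
class ResolutionsLedger:
    """Record labeled claims, resolve them as evidence arrives,
    and report per-label accuracy over time."""
    entries: list = field(default_factory=list)

    def record(self, claim: str, label: str) -> None:
        self.entries.append({"claim": claim, "label": label, "outcome": None})

    def resolve(self, claim: str, confirmed: bool) -> None:
        for entry in self.entries:
            if entry["claim"] == claim:
                entry["outcome"] = confirmed

    def accuracy(self, label: str) -> float:
        resolved = [e for e in self.entries
                    if e["label"] == label and e["outcome"] is not None]
        return (sum(e["outcome"] for e in resolved) / len(resolved)
                if resolved else float("nan"))
```

Over many sessions, per-label accuracy reveals whether, say, [INFERRED] claims actually resolve true often enough to deserve that label.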
/mnemosyne Kairos [mark important decision] → [work on project across days] → /mnemosyne recall [find previous context] → /mnemosyne restore [resume from checkpoint]
Unzip the .skill archives to ~/.claude/skills/ or load CONSTITUTION.md into any LLM.
Start a Claude Code session or open your LLM. The commands are now available.
Use /check, /honest, /frame, /receive, /sol, /nox. The system enforces epistemic integrity.