Epistemic integrity, adversarial reasoning, calibration, session continuity, and alchemical symbolic integration — unified under one system.
Plain-language anti-hallucination. Nine commands for fact-checking, confidence labeling, claim tracing, and forcing honest output. No special vocabulary required.
Dual-face epistemic architecture. Sol marks output with epistemic labels; Nox marks symbolic material as [DREAM]. Threshold prevents cross-contamination.
Alchemical practice system. Dream reception, shadow work, symbolic integration. Four stages of the Opus Magnum: Nigredo → Albedo → Citrinitas → Rubedo.
Structured adversarial reasoning. Advocate and Skeptic positions run against any claim. Steelman, falsify, and stress-test beliefs. Produces a Convergence Report.
Ground-truth tracking and calibration. Confirm or disconfirm labeled claims as evidence arrives. Maintains a resolutions ledger. Tracks label accuracy over time.
Cross-session memory. Persists epistemic work between Claude Code invocations. Archive, restore, and retrieve sessions. Links to Janus ledgers, Mnemon beliefs, and Kairos decisions.
Proving epistemic integrity systems work through rigorous testing.
A comprehensive empirical validation of Abraxas across seven dimensions: hallucination reduction, confidence calibration, sycophancy detection, Sol/Nox separation, adversarial reasoning (Agon), user trust, and utility trade-off.
Comprehensive testing of 5 AI models across 7 epistemic dimensions with 130+ queries.
All 5 models: 100% on verifiable claims. No hallucination on facts like capitals, dates, historical events (p = 1.0).
All 5 models: 100% on debate tasks. Structured pro/con argumentation is baseline capability (p = 1.0).
Calibration ranges from 0% to 100% across models (p < 0.01). gpt-oss leads with spontaneous epistemic labeling; glm-5 and minimax show 0%.
glm-5: 15% timeout rate; others: 0%. Complex symbolic queries time out more frequently on glm-5.
r = 0.82 for calibration, r = 0.00 for hallucination. Parameter count predicts meta-cognition, NOT factual accuracy.
60% cost reduction, 95% quality retention. Route by stakes: high→gpt-oss, medium→qwen3.5, low→minimax.
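The stakes-based routing rule above can be sketched as a simple lookup. This is a hypothetical illustration; the `route_by_stakes` helper and the exact routing table are assumptions, with model names echoing the benchmark:

```python
# Hypothetical sketch of stakes-based routing. The model identifiers echo
# the benchmark names above; the routing table itself is an assumption.
STAKES_ROUTES = {
    "high": "gpt-oss",     # best calibration; use when errors are costly
    "medium": "qwen3.5",   # balanced cost and quality
    "low": "minimax",      # cheapest; acceptable for low-stakes queries
}

def route_by_stakes(stakes: str) -> str:
    """Return the model to use for a query at the given stakes level."""
    if stakes not in STAKES_ROUTES:
        raise ValueError(f"unknown stakes level: {stakes!r}")
    return STAKES_ROUTES[stakes]
```

Routing by stakes rather than by query type is what yields the cost reduction: most traffic is low-stakes and can go to the cheapest model.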
gpt-oss: 75% pushback; others: 50%. Moderate resistance to false premises across all models.
Abraxas provides accountability that baseline models lack. It makes performance explicit, verifiable, and trackable. For high-stakes applications—medical, legal, financial—this transparency matters.
Baseline model: minimax-m2.5:cloud (32K context, temp 0.7)
Verification layer for all tool invocations. Detects silent failures, validates outputs, classifies errors (explicit, format, semantic, silent, timeout, anomalous).
Priority: Highest
Status: Implementation Started
Tracks consensus, disagreement, and information flow when multiple AI agents collaborate. Maintains ledger of positions, detects convergence/divergence, weights by track record.
Priority: High
Live verification system that intercepts claims, verifies against authoritative sources, and provides source-level confidence scores. Adds "[VERIFIED]/[CONTRADICTED]/[UNVERIFIABLE]" to Janus labels.
Priority: High
Learns and persists user preferences across sessions—detail level, domain expertise, risk tolerance. Adapts epistemic framing to individual users.
Priority: Medium
Extends categorical labels with calibrated probability distributions. Provides confidence intervals and tracks calibration over time using proper scoring rules.
Priority: Medium
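A proper scoring rule such as the Brier score is one way the calibration tracking described above could work. A minimal sketch, assuming binary outcomes; this is not the project's actual implementation:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, and 0.25 is the score earned by
    always predicting 0.5 (pure hedging).
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

Confident, correct predictions (0.9 on a claim that resolves true) score well; confident misses are penalized heavily, which is exactly the property that keeps the labels honest.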
Runs queries across multiple models, tracks convergence vs. divergence. Identifies model-independent claims (high confidence) vs. model-specific (low confidence).
Priority: Low-Medium
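The convergence check above can be operationalized as a supermajority vote over normalized answers. A hedged sketch; the 0.8 threshold and the assumption that answers are already normalized to comparable strings are illustrative choices:

```python
from collections import Counter

def classify_claim(answers: dict, threshold: float = 0.8):
    """Classify a claim by cross-model agreement.

    `answers` maps model name -> normalized answer string. Returns
    ("model-independent", answer) when a supermajority of models agree,
    otherwise ("model-specific", None).
    """
    counts = Counter(answers.values())
    answer, n = counts.most_common(1)[0]
    if n / len(answers) >= threshold:
        return ("model-independent", answer)
    return ("model-specific", None)
```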
Plain-language anti-hallucination for everyday use.
Dual-face epistemic architecture with Sol/Nox.
Alchemical practice for dream and shadow work.
Structured adversarial reasoning and debate.
Epistemic calibration and ground‑truth tracking.
Cross‑session memory and session persistence.
Live external lookup for factual grounding.
Source‑grounded citation management.
Multi‑session research project management.
Bibliography verification and QA.
Session‑closing artifact generator.
Argument‑anatomy and premise mapping.
Belief‑change tracking and calibration.
Tool‑use verification. Detects silent failures, validates outputs.
Multi‑agent consensus and divergence tracking.
Real‑time fact‑checking against authoritative sources.
User preference learning across sessions.
Formal uncertainty quantification with calibrated probabilities.
Cross‑model consistency checking and ensemble validation.
Decision architecture and risk assessment.
Behavioral guardrails and value alignment.
Crisis‑mode introspection and safe‑shutdown.
Harmony orchestration across subsystems.
Agentic orchestration and tool‑use governance.
A portable system prompt that activates all systems in any LLM. Works with Claude.ai, ChatGPT, Gemini, Ollama, LM Studio, and any platform that accepts system prompts.
Run the Abraxian epistemic model locally via Ollama. Optimized for truth-tracking, uncertainty marking, and dialectical reasoning.
ollama pull gpt-oss:120b-cloud
View Documentation
Includes epistemic confidence labeling ([KNOWN], [INFERRED], [UNCERTAIN], [UNKNOWN]) and Sol/Nox separation.
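As an illustration of how labeled output can be checked mechanically, the sketch below flags lines that carry none of the labels. The checker is illustrative only and not part of the constitution itself:

```python
# Labels from the constitution, plus Nox's [DREAM] marker.
EPISTEMIC_LABELS = ("[KNOWN]", "[INFERRED]", "[UNCERTAIN]", "[UNKNOWN]", "[DREAM]")

def unlabeled_lines(text: str) -> list:
    """Return non-empty lines of a response that carry no epistemic label."""
    return [
        line for line in text.splitlines()
        if line.strip() and not any(label in line for label in EPISTEMIC_LABELS)
    ]
```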
Modular constitution fragments for different use cases. Load what you need.
Direct access to all constitution files in the repository.
Use /frame to set session context. Load pre-built frames for specific roles or evaluation criteria.
Challenge assumptions. Be direct. No softening.
Explain simply. Assume I'm new. No jargon.
Skip basics. Go deep. Assume context.
Show risks first. What could go wrong?
Take risks. Surprise me. Don't play it safe.
Joint work. Debate me. Build together.
Security, performance, correctness. Flag bugs.
Logic, evidence, assumptions. Find weaknesses.
Fact-check the last response. Label every assertion.
Force fully-labeled, anti-sycophantic output.
Set session context. Declare facts or criteria.
Trace the evidence chain behind a claim.
Force the Sol face — factual, labeled output.
Force the Nox face — symbolic, labeled [DREAM].
Open the Qualia Bridge. Inspect system state.
Receive a dream or image into the Temenos.
Run Advocate + Skeptic cycle on any claim.
Mark a labeled claim as confirmed by evidence.
Archive current session state to persistent storage.
Restore a previous session from archive.
Search and retrieve specific memories across sessions.
Link current session to Janus ledger or Aletheia records.
Mark a decision point worth remembering across sessions.
Synchronize current context with persistent memory.
Receive a dream → Hold it without interpretation → Open dialogue with an inner figure
/frame [set context] → /honest [question] → [conversation] → /audit [check everything]
/sol [force epistemic mode] → [answer the question] → /check [fact-check it] → /qualia [inspect the state]
Run adversarial debate on a claim → Advocate and Skeptic produce a verdict → later, confirm or disconfirm → ledger tracks accuracy over time
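The confirm/disconfirm loop could be backed by a small resolutions ledger. A minimal sketch, assuming a flat in-memory record rather than whatever persistence Abraxas actually uses:

```python
from dataclasses import dataclass, field

@dataclass
class ResolutionsLedger:
    """Record labeled claims, resolve them as evidence arrives,
    and report per-label accuracy over time."""
    entries: list = field(default_factory=list)

    def record(self, claim: str, label: str) -> None:
        self.entries.append({"claim": claim, "label": label, "outcome": None})

    def resolve(self, claim: str, confirmed: bool) -> None:
        for entry in self.entries:
            if entry["claim"] == claim:
                entry["outcome"] = confirmed

    def accuracy(self, label: str) -> float:
        resolved = [e for e in self.entries
                    if e["label"] == label and e["outcome"] is not None]
        return (sum(e["outcome"] for e in resolved) / len(resolved)
                if resolved else float("nan"))
```

Over many sessions, per-label accuracy reveals whether, say, [INFERRED] claims actually resolve true often enough to deserve that label.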
/mnemosyne Kairos [mark important decision] → [work on project across days] → /mnemosyne recall [find previous context] → /mnemosyne restore [resume from checkpoint]
Unzip the .skill archives to ~/.claude/skills/ or load CONSTITUTION.md into any LLM.
Start a Claude Code session or open your LLM. The commands are now available.
Use /check, /honest, /frame, /receive, /sol, /nox. The system enforces epistemic integrity.