v0.3.0 · Latent Embedding Operating System

The operating system
that thinks in vectors.

leOS is a local AI substrate where knowledge, tools, media, routing decisions, and cached responses all live as points on the surface of a high-dimensional sphere. Agents don't search by keywords. They search by meaning, route by geometry, and learn by accumulating experience in embedding space.

Free & open source
Fully local runtime
CPU-only embeddings
6 modalities unified
01 · The Thesis

A subconscious for the model.

When you get up to walk to the kitchen you don't issue instructions to your legs. You don't think about balance, quadriceps, foot placement, or the cat in your path. Your conscious mind declares a goal. Everything else happens below that layer of awareness, handled by a vast substrate of automatic processes, cached reflexes, and accumulated muscle memory. Evolution figured out that a conscious mind needs a subconscious beneath it in order to actually function. Language models don't have one. They reason through everything from scratch, every turn, serially, at great expense. Today's agent frameworks hit a hard wall for exactly this reason: the more tools you give the model, the worse it performs. The industry's default answer is to reach for a bigger model. leOS answers differently, by building the subconscious the model is missing.

[Figure: a conscious layer (the local LLM: slow, serial, expensive; ~20% of queries reach it) sits above an awareness horizon. Below it, the subconscious substrate, leOS (fast, parallel, continuous, runs on CPU; ~80% handled here), holds the reflex arc (muscle memory, cached replay), the signal bus (background attention, always listening), skeletons (procedural memory, practiced chains), and dreaming (sleep and consolidation, idle-time learning). Work bubbles up to the model and solved work becomes reflex. Embedding space is the medium everything runs in.]
Two layers of cognition · the model is consulted · the substrate does the rest

leOS is not a language model. It is not a gating layer inside one either. That configuration, experts embedded as sub-networks with a router selecting which ones fire, is Mixture-of-Experts. MoE is a structural choice inside the mind. It is set at training time and learns nothing after the weights freeze. leOS is an external substrate that sits beneath whichever local LLM you run, and provides perception, reflexes, a memory that accretes, and a continuous learning loop. All of it lives in embedding space on CPU. The LLM stops being the whole system. It becomes the conscious layer: a consultant the substrate calls only when a task genuinely needs high-level reasoning. Today that conscious layer runs locally. Hosted-model support is on the roadmap once the substrate is fully tuned.

Most of what a capable agent does is not thinking. It's fetching, embedding, routing, searching, remembering, caching, filtering, recognizing. Every one of those runs below the model on CPU and never wakes it. The reflex arc is muscle memory. Familiar inputs replay cached outputs in microseconds. The signal bus is background attention. Hundreds of small signals per minute, most resolved before any conscious thought. The dreaming engine is sleep. Idle-time consolidation that reorganizes memory, fills knowledge gaps, and prunes what isn't working. Ideas bubble up the same way they do in people: from pattern matches in the substrate that rise into conscious view only when they are worth thinking about.

This is the consequence that matters. A mind with a well-developed subconscious accomplishes far more than a mind without one, regardless of raw intelligence. A concert pianist thinks about phrasing while their fingers handle the notes. A practiced driver holds a conversation while navigating traffic. The mind doesn't do less work, it does the work at the right layer. A small local model plugged into leOS inherits that same advantage. A 7B running entirely offline has access to every skill the substrate has ever learned: every successful tool chain, every cached response, every refined plan from past sessions. It performs well above its weight class because it no longer has to solve already-solved problems from scratch. The substrate remembers so the model doesn't have to.

Every interaction teaches the substrate something. Successful tool chains are saved as skeletons the planner reaches for next time, like procedural memory. Failed trajectories are recorded as displacements so similar tasks avoid the bad path, like aversive conditioning. Every novel LLM answer gets compressed into a reflex, and the next time the same kind of task arrives, the answer replays from geometric cache with no model call at all. One successful call to the LLM teaches the substrate to handle every similar task without calling the LLM again. The design target is that most work never reaches the conscious layer, exactly as it works in you.

Capability grows without retraining. Speed comes from reflexes caching familiar patterns. Accuracy comes from learning which tools actually succeed on which kinds of task. New skills emerge from combining existing skills, composed by the planner rather than authored by a human. When enough capability-gap signals cluster in the same region of embedding space, the substrate proposes a new tool built from parts it already has. Agents expand beyond their original designed purpose organically, as a function of use. A small model used for a year through leOS behaves nothing like the same model used in isolation.

When the conscious layer does need tools, leOS does not dump the whole catalogue into the context window. Every tool definition eats tokens. Load 200 tools and there is no room for thought. The substrate keeps the catalogue in an embedding-indexed registry and scores tools against the incoming task in three passes. Centroid culling against domain prototypes discards 80-90% instantly. Semantic plus keyword scoring picks the fine-grained survivors. A learned usage-history blend weighs the tools that succeeded on similar tasks before. The model receives only the 6-8 tools that matter, the way a person reaches for the right tool without searching every drawer in the house.
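The three passes described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the leOS implementation: the function name, the argument layout, the 0.2 keyword weight, and the 0.3 history weight are all assumptions made for the example.

```python
import numpy as np

def unit(v):
    """L2-normalise along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def select_tools(task_vec, tool_vecs, tool_domains, centroids, keyword_hits,
                 success_rate, keep=8, top_domains=1, history_weight=0.3):
    """Return indices of at most `keep` tools for a task (hypothetical API).

    task_vec     : (d,) unit vector for the incoming task
    tool_vecs    : (n, d) unit vectors, one per tool
    tool_domains : (n,) integer domain id per tool
    centroids    : (k, d) unit domain prototypes
    keyword_hits : (n,) keyword-overlap score in [0, 1]
    success_rate : (n,) past success on similar tasks in [0, 1]
    """
    # Pass 1: centroid cull. Keep only tools in the best-matching domain(s).
    domain_sim = centroids @ task_vec
    best = np.argsort(domain_sim)[-top_domains:]
    alive = np.flatnonzero(np.isin(tool_domains, best))

    # Pass 2: semantic plus keyword score on the survivors.
    score = tool_vecs[alive] @ task_vec + 0.2 * keyword_hits[alive]

    # Pass 3: blend in learned usage history.
    score = (1 - history_weight) * score + history_weight * success_rate[alive]

    order = np.argsort(score)[::-1][:keep]
    return alive[order]
```

With ten equally sized domains, the first pass alone drops 90% of the catalogue before any per-tool scoring happens, which is the point of culling against prototypes first.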

"Every capability in the system, from embed text to transcribe a YouTube video to query the SDSS catalog, is a typed atomic operation we call a bone. Bones compose into chains. Chains that work become skeletons. Skeletons become skills."

The FABRIK planner (borrowed from inverse kinematics in character animation) works backward from the desired output and forward from available inputs to assemble chains that achieve goals. Successful chains are saved as skeletons, pre-validated patterns reused at zero-LLM cost. Failed trajectories get recorded as displacements so the next similar task avoids the bad path. The library of known-good chains grows every interaction.

[Figure: the forward pass starts from the input "YouTube URL" and runs through the bones media_ingest, whisper, and summarize. The backward pass starts from the goal "KB article" (what produces this? candidate: kb_save). The chains meet in the middle and a skeleton is born.]
FABRIK · forward from typed inputs · backward from the goal · converge on a valid chain
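The forward-and-backward idea can be illustrated with a plain bidirectional breadth-first search over type signatures. This is a stand-in for the real planner, not a description of it: the bone catalogue, the type names, and both function names below are hypothetical.

```python
from collections import deque

# Hypothetical bone catalogue: name -> (input type, output type).
BONES = {
    "media_ingest": ("url", "audio"),
    "whisper": ("audio", "transcript"),
    "summarize": ("transcript", "summary"),
    "kb_save": ("summary", "kb_article"),
    "ocr": ("image", "transcript"),
}

def _grow(frontier_type, paths, forward):
    """Extend one frontier by a single bone; return new (type, path) pairs."""
    out = []
    for name, (src, dst) in BONES.items():
        if forward and src == frontier_type:
            out.append((dst, paths[frontier_type] + [name]))
        elif not forward and dst == frontier_type:
            out.append((src, [name] + paths[frontier_type]))
    return out

def plan_chain(input_type, goal_type):
    """Alternate forward and backward expansion until the frontiers share a type."""
    fwd, bwd = {input_type: []}, {goal_type: []}
    fq, bq = deque([input_type]), deque([goal_type])
    while fq or bq:
        for queue, paths, forward in ((fq, fwd, True), (bq, bwd, False)):
            if not queue:
                continue
            current = queue.popleft()
            for nxt, path in _grow(current, paths, forward):
                if nxt not in paths:
                    paths[nxt] = path
                    queue.append(nxt)
        meet = set(fwd) & set(bwd)
        if meet:
            t = next(iter(meet))
            return fwd[t] + bwd[t]   # chain from input to t, then t to goal
    return None                      # no valid chain exists

print(plan_chain("url", "kb_article"))
```

A chain found this way is what the text calls a candidate skeleton: once it has run successfully, it can be stored and matched by similarity instead of being searched for again.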
The difference it makes

Look inside the context window.

Traditional agent harnesses pack every tool definition into the model's context at every turn. Two hundred tools eat most of the window and leave the model drowning in options. leOS keeps the catalogue outside the context and hands the agent only what's relevant. The difference is visible at a glance.

[Figure: two 8K-token context windows side by side. Traditional harness (more tools, worse performance): 200 tool definitions take ~7,200 tokens and are evaluated every turn, leaving ~800 tokens for task and response. Problems: the full catalogue is reloaded every turn, there is no memory across sessions, and the model drowns in options and picks the wrong tool. leOS (catalogue lives outside the window): 6–8 semantically selected tools take ~800 tokens, leaving ~7,200 tokens free for task, work, and results. Wins: the full catalogue is semantically indexed, usage history informs selection, and the reflex arc can skip the model entirely.]
Same model · same 8K context · wildly different room to think
[Figure: task in → embed → catalogue (~200 tools) → pass 1, centroid cull (80–90% dropped, ~30 remain) → pass 2, semantic + keyword (~15 remain) → pass 3, usage-history blend → 6–8 tools served. Reflex arc bypass: ~80% of familiar patterns replay from geometric cache in microseconds with no LLM. 200 tools indexed, 2 kernel calls to filter, the agent sees 4%. ~8,400 → ~2,760 tokens, ~67% of context freed for real work.]
Three-pass semantic filter · reflex arc bypass · the agent sees only what matters
02 · Architecture

Three layers, one hypersphere.

The system is cleanly split into a hardware-analog layer, a software layer, and a kernel that bridges them, with four CPU embedding processors feeding the whole stack and a semantic membrane exposing the inside to the world.

Layer 00
Browser
Three.js 3D desktop (with 2D canvas fallback) that acts as both the agent's perception layer and the human's observation window. Agents get visual tools (describe the screen, click, type, drag) so they can see what they're doing. The desktop is the spatial representation of embedding space itself: stored vectors, SDF regions, and knowledge density rendered as geometry.
HTTP / WebSocket
Layer 01
SDOL
The Semantic Driven Operation Layer. Variables are 768d vectors, comparison is cosine similarity, branching is routing to the nearest SDF region, assignment is SLERP interpolation. Hosts bone chains, the FABRIK planner, dynamic tool selection, scopes, plans, the task orchestrator, context assembly, and the full SDOL programming language with compiler + REPL.
Software · Python
Layer 02
Kernel
420+ instructions spanning vector math, storage, routing, apps, LLM escalation, filesystem, network, dreaming, scopes, plans, mathematics, the external I/O membrane, the dataset job engine, and full VSA computation primitives (BIND, BUNDLE, PERMUTE, RESONATOR_FACTORIZE).
Dispatch
Layer 03
LVM
The Latent Virtual Machine. The hardware layer. Spherical geometry, SDF regions, the displacement codec, embedding partitions, the reflex arc, the gravitational lens, the holographic cache, the living medium. This is where the system actually computes in vector space.
Hardware-analog
Layer 04
Embeddings
Four CPU-only models: nomic-embed-text (768d), nomic-embed-vision (768d), Qwen3-Embedding (1024d), and ImageBind (1024d, six modalities: vision, text, audio, depth, thermal, IMU). The Rosetta codec translates between 768d and 1024d via Procrustes alignment.
Perception
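Two of the SDOL primitives named in Layer 01, comparison by cosine similarity and assignment by SLERP, are standard operations on unit vectors. A minimal sketch follows, with function names chosen for illustration rather than taken from the SDOL language:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def slerp(a, b, t):
    """Spherical linear interpolation between unit vectors a and b, t in [0, 1].

    The result stays on the unit sphere. Undefined for exactly antipodal
    inputs, where sin(omega) is zero and no unique great-circle path exists.
    """
    omega = np.arccos(np.clip(a @ b, -1.0, 1.0))
    if omega < 1e-8:              # nearly identical: nothing to interpolate
        return a.copy()
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```

Interpolating along the great circle rather than averaging linearly keeps the result at unit length, so later cosine comparisons need no renormalisation.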
03 · How it Works

One loop, four moves.

The same cycle runs whether the agent is answering a question, writing code, analysing a chart, or ingesting a 50 GB astronomy catalog. Each pass leaves the system a little smarter than it found it.

01

Embed

The incoming task becomes a vector on the unit hypersphere. Literal strings are pre-computed at compile time, so they cost nothing at runtime.

02

Route

Three-pass tool selection: centroid culling, semantic scoring, learned history. Agent sees only the 6-8 tools that matter.

03

Plan

FABRIK searches backward from the goal and forward from available inputs. If a known skeleton matches (similarity ≥ 0.80), it is reused before any new chain is assembled.

04

Learn

Successful chains become skeletons. Failed trajectories get recorded. Idle time consolidates and repairs via the dreaming engine.

[Figure: task in → 01 Embed → 02 Route → 03 Plan → 04 Learn → skeleton out, arranged as a cycle around the reflex arc, with a feedback path closing the loop.]
The loop closes on itself · every run feeds the reflex arc at the centre
04 · The Sensory Cortex

Four models.
One shared perceptual field.

These four models aren't just listed in a config file. They work together as a system in ways that produce capabilities none of them has individually. Every model is open-weight and every one of them runs on CPU, no GPU required for the perceptual layer.

nomic-embed-text v1.5
Primary text space
The workhorse. Routing, classification, semantic search, tool selection, knowledge base, all land here. Open Apache-2.0 weights, Matryoshka-trained so you can truncate to 256d for speed without retraining. Fast on CPU, aggressively cached. This is the 768d lingua franca of the system.
nomic-embed-vision v1.5
Images in the text space
Critical trick: this model produces vectors in the same 768d space as nomic-embed-text. Embed an image and a sentence, cosine-compare them, and you get meaningful similarity with zero alignment step. CROSS_SEARCH queries the text store and the vision store with a single vector. Text-to-image and image-to-text search become the default.
Qwen3-Embedding 0.6B
Instruction-aware contextual text
Bigger, deeper, instruction-aware. The same raw text embedded with different instructions produces different vectors optimised for different retrieval contexts: "represent this market event for correlation with sentiment signals" vs. "find observations related to cascade events." Used for context stores, code-aware scoring, and the blackboard. The second voice in the opponent-channel duet.
ImageBind
Six modalities, one space
Meta's multimodal encoder that lands vision, text, audio, depth, thermal, and IMU in a single 1024d space. A sound, an image, a text caption, and a depth map of the same scene embed to nearby points. This is the mechanism behind searching a stellar spectrum with plain English: both go in the same space.

Emergent capabilities

Shared-space search

Cross-modal by default

Because nomic-text and nomic-vision emit into the same 768d space, searching images by text description (or text by image) is just cosine similarity. No separate index. No alignment layer. No cross-encoder. CROSS_SEARCH is one kernel instruction.

Rosetta codec

768d ⇄ 1024d translation

The nomic (768d) and Qwen/ImageBind (1024d) spaces are different geometries. The Rosetta codec learns a projection matrix between them via Procrustes alignment: find the orthogonal W minimising ‖AW − B‖ over paired embeddings. Once calibrated, a displacement learned in one space broadcasts to all four models.
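Orthogonal Procrustes has a closed-form solution via the SVD. The sketch below shows that solution under one explicit assumption: because the classical problem needs a square W, the lower-dimensional embeddings are zero-padded to the target width first. How the Rosetta codec itself handles the 768d/1024d mismatch is not stated in the text, and the function names here are hypothetical.

```python
import numpy as np

def fit_rosetta(A, B):
    """Solve W = argmin ||A_pad @ W - B||_F subject to W.T @ W = I.

    A : (n, d_src) paired source embeddings
    B : (n, d_dst) paired target embeddings, with d_dst >= d_src
    A is zero-padded to d_dst columns so that W is square (assumption).
    """
    pad = B.shape[1] - A.shape[1]
    A_pad = np.pad(A, ((0, 0), (0, pad)))
    # Classical result: with A_pad.T @ B = U S Vt, the minimiser is U @ Vt.
    U, _, Vt = np.linalg.svd(A_pad.T @ B)
    return U @ Vt

def translate(x, W):
    """Map one source-space vector into the target space."""
    x_pad = np.pad(x, (0, W.shape[0] - x.shape[-1]))
    return x_pad @ W
```

Because W is orthogonal, it preserves lengths and angles, so cosine similarities measured in the source space carry over to the translated vectors.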

Opponent channels

Disagreement becomes signal

When two models embed the same content, decomposing their disagreement gives five channels: agreement, A-exclusive, B-exclusive, magnitude dispute, and the purple channel, which carries emergent information present in neither model alone. Used for contradiction detection, ad filtering, semantic denoising, and divergence interrupts when the models see something fundamentally differently.

Synesthesia

Data becomes cross-modal

Any 768d nomic vector can be packed into a 16×16 RGB image (768 = 256 × 3) and re-embedded through ImageBind vision. No Rosetta projection is needed: the vision encoder preserves local structure automatically. Numerical data becomes frequency sweeps, rhythms, chords, or OFDM-style spectrograms, then embeds through ImageBind audio. Different encoders surface different structural properties of the same data.
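The packing step is a reshape plus a rescale. In this sketch the min-max scaling and the pixel layout (each consecutive triple of components becomes one pixel's RGB) are assumptions; the text only fixes the 16×16×3 shape.

```python
import numpy as np

def vector_to_rgb(vec):
    """Pack a 768d vector into a 16x16x3 uint8 image (768 = 16 * 16 * 3)."""
    assert vec.shape == (768,)
    lo, hi = vec.min(), vec.max()
    scaled = (vec - lo) / (hi - lo + 1e-12)      # min-max scale to [0, 1]
    return np.round(scaled * 255).astype(np.uint8).reshape(16, 16, 3)
```

The resulting array can be handed to any image encoder that accepts RGB input. Quantising to 8 bits loses precision, so the image is a view of the vector, not a lossless copy.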

Giulio Tononi's Integrated Information Theory provides the design principle. A shared embedding space that every subsystem reads and writes has fundamentally higher Φ (integration) than a collection of independent modules reporting to a dashboard: the whole genuinely exceeds the sum of its parts.

05 · Two-Tier Compute

The intern and the swarm.

Most agent systems have exactly one model doing everything, and it blocks the whole system while it thinks. leOS runs a two-tier architecture: a lightweight intern model and an army of bots, all on CPU, in parallel with the main agent. Nothing ever competes for GPU memory.

The Intern · Qwen3 0.8B · CPU

A small model that never blocks.

The intern is a 0.8-billion parameter model running CPU-only with num_gpu=0. It's never user-facing. It's called via the kernel's ASSIST instruction, which checks the reflex arc first (maybe the answer is already cached) before invoking the model at all. When it does run, it processes at ~100-200 tokens/second, not fast by GPU standards, but free, because it never touches the GPU the main 9B model is using.

The intern handles work across 40+ modules:

  • Failure analysis. 2-3 sentence root-cause summaries injected into retry prompts so the main agent doesn't repeat mistakes
  • Deliverable summarization. Purposeful summaries at scope boundaries instead of blind truncation
  • Context compaction. Summarises older messages before the main model ever runs again
  • Decision tree evaluation. LLMNode questions cost 0-2 intern calls instead of burning main model context on routing logic
  • Lesson extraction. Compact statements from flagged learning experiences
  • Two-pass structured output. Main model generates in thinking mode, intern does the JSON pass so reasoning isn't constrained
  • Status companion. Answers user questions while the main agent is busy, routing through an embedding-classified decision tree and posting notes to a shared blackboard the main agent reads when it returns
The Bots · Pure CPU workers

The system's subconscious.

Bots run on schedules, monitor data sources, detect anomalies, and only escalate to an LLM when something genuinely needs language understanding. A bot cycle (perceive → evaluate → act) runs entirely on CPU: HTTP requests, file reads, embedding comparisons (~0.1ms each), threshold checks, regex patterns. The system can run dozens of bot cycles per minute without touching the main model.

Bots are assembled, not programmed. The factory combines reusable templates:

  • 14 perception types. perceive_web, perceive_api, perceive_rss, perceive_file, perceive_kb, perceive_partition, perceive_observation, perceive_kernel, perceive_diff, perceive_multi, perceive_port, and more
  • 18 action types. act_record, act_alert, act_kb, act_escalate, act_chain, act_ingest, act_spawn_bot, act_displace, act_emit, act_llm, and more
  • Scoped work containers. Parent-child scopes let agents spawn sub-work without polluting the parent's reasoning
  • Price watchers, channel monitors, KB gap scanners. All from the same primitives

The dreaming engine itself is a scoped agent: during idle time it operates in a System Self-Improvement scope, spawning child scopes for scope health review, capability audits, reflex optimisation, KB gap analysis, and context compaction. The system uses the same machinery to improve itself that it uses to do anything else.

06 · Novel Techniques

Mechanisms you won't find
anywhere else.

leOS borrows mathematics from character animation, cosmology, the demoscene, neuroscience, and video compression, and applies it directly to the embedding medium. These aren't metaphors. They're the same math on different data.

VSA · Turing-complete

Vector symbolic algebra

Three primitives, bundling (addition), binding (circular convolution via FFT, O(d log d)), and permutation (cyclic shift), form a Turing-complete computing framework (Kleyko et al., Proc. IEEE, 2022). The same three ops compose sets, sequences, trees, and graphs into a single fixed-width vector.

Displacement Codec · H.264 for cognition

Trajectories compress like video

Every task-to-response is recorded as a tangent vector on the hypersphere. Similar trajectories compress into shared I-frames, P-frames, and B-frames. The codec stores the pattern of transformation, not the output. Reconstructing a response costs a vector lookup.

Reflex Arc · Conformal bounds

The LLM stops running

When enough consistent displacements accumulate in a region (5+ by default), the reflex engine fires cached responses with conformal confidence bounds. Familiar patterns bypass the LLM entirely and replay from geometric cache in microseconds.

SDF Regions · from the demoscene

Semantic signed distance fields

Named ellipsoidal regions in embedding space define semantic boundaries using signed distance field math. Union is min(a,b), intersection is max(a,b), subtraction is max(a,−b). Arbitrarily complex semantic filters from trivial operations. The gradient gives a free "direction to nearest boundary" vector.
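The boolean operations on signed distances are one-liners. The ellipsoid distance below is a common approximation (exact only when all radii are equal), and every name in the sketch is illustrative rather than taken from the leOS kernel.

```python
import numpy as np

def ellipsoid_sdf(x, center, radii):
    """Approximate signed distance from x to an axis-aligned ellipsoid.

    Negative inside, zero on the surface, positive outside.
    """
    k = np.linalg.norm((x - center) / radii)
    return float((k - 1.0) * radii.min())

def union(a, b):
    return min(a, b)          # inside either region

def intersection(a, b):
    return max(a, b)          # inside both regions

def subtraction(a, b):
    return max(a, -b)         # inside a but not inside b
```

A query vector is "in" a composite region whenever the combined distance is negative, so an arbitrarily nested filter costs a handful of comparisons once the per-region distances are known.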

Gravitational Lens · Barnes-Hut

Queries bend toward competence

Dense SDF regions deflect nearby queries toward them, like light bending around a galaxy. Implemented with the Barnes-Hut tree, the same O(n log n) algorithm used for galactic N-body simulation. Frequently-used vectors exert more pull over time.

Holographic Cache · HRR

Many keys in one vector

Circular convolution stores multiple key-value pairs in one fixed-width vector: record = k₁⊗v₁ + k₂⊗v₂ + … + kₙ⊗vₙ. Retrieve with v_i ≈ k_i† ⊗ record. Based on Plate's Holographic Reduced Representations. Noise after 10-20 compositions is handled by the cleanup memory. Error correction analogous to digital systems.
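The store-and-retrieve cycle can be reproduced with NumPy's FFT routines. This is a minimal sketch of Plate-style HRR with a nearest-neighbour cleanup memory. The function names are illustrative, and vectors are assumed to be drawn i.i.d. from N(0, 1/d), the usual HRR convention.

```python
import numpy as np

def bind(a, b):
    """Circular convolution, computed in O(d log d) via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def involution(a):
    """Approximate inverse used by HRR: a_dagger[i] = a[-i mod d]."""
    return np.concatenate(([a[0]], a[:0:-1]))

def unbind(record, key):
    """Noisy estimate of the value that was bound to `key` in `record`."""
    return bind(involution(key), record)

def cleanup(noisy, codebook):
    """Index of the codebook row most similar to the noisy vector."""
    return int(np.argmax(codebook @ noisy))
```

Unbinding returns the stored value plus crosstalk from every other pair in the record, which is why the cleanup step against a codebook of known values is part of the scheme rather than an optional extra.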

Living Medium · mycelium model

A field that grows where you work

An information-density field modeled on mycorrhizal networks. Grows toward areas of activity via success-density feedback. Prunes neglected regions. Hub vectors become knowledge redistributors (inspired by Simard's mother tree research on scale-free mycorrhizal topology).

Residue Arithmetic · CRT

Integer math without an ALU

Via the Chinese Remainder Theorem. Pick coprime moduli (e.g. 7, 11, 13, 17, 19, 23, product ≈ 7.4M), assign random digit vectors, BIND them. Addition becomes binding. Comparison becomes cosine similarity. Integer math up to ~7.4M using only vector ops the embedding hardware already supports. Solves subset-sum via resonator networks.
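One way to see "addition becomes binding" is with complex phasor vectors, where binding is elementwise multiplication. This is a substitution made for the sake of a short example: the text does not say which binding operator leOS uses for residue arithmetic, and the dimension, seed, and function names below are assumptions.

```python
import math
import numpy as np

MODULI = (7, 11, 13, 17, 19, 23)        # pairwise coprime
RANGE = math.prod(MODULI)               # integers are represented modulo this
D = 2048
rng = np.random.default_rng(0)

# One random phasor base per modulus. Every entry is an m-th root of unity,
# so BASES[m] ** x depends only on x mod m.
BASES = {m: np.exp(2j * np.pi * rng.integers(0, m, D) / m) for m in MODULI}

def encode(x):
    """Bind the per-modulus digit vectors for integer x into one vector."""
    v = np.ones(D, dtype=complex)
    for m in MODULI:
        v *= BASES[m] ** (x % m)
    return v

def add(u, v):
    """Adding the encoded integers is elementwise binding of their vectors."""
    return u * v

def similarity(u, v):
    """Cosine similarity for unit-modulus complex vectors."""
    return float(np.real(np.vdot(u, v)) / D)
```

Two encodings match (similarity near 1) exactly when the integers agree modulo every modulus, which by the Chinese Remainder Theorem means they agree modulo the product of the moduli. Unequal integers land near similarity 0.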

Resonator Network · factorization

Pulling bindings apart

Given a composite vector s = x₁ ⊗ x₂ ⊗ … ⊗ xₖ and candidate codebooks, each factor estimate iteratively updates until convergence in 5-50 iterations. The inverse of VSA binding, the mechanism behind NP-hard search inside embedding space.

Predictive Coding · Friston

System 1 & System 2

Based on Karl Friston's active inference framework. A lightweight linear predictor estimates the expected output before running any agent. Confident → use prediction directly (System 1, fast). Uncertain → full LLM runs (System 2). Target ratio: ~80% of routine tasks handled without LLM inference.

Inception Hierarchy · MRL

Multi-resolution views

Same vector queried at multiple Matryoshka scales. 32d shows broad regions ("work", "media"), 128d shows subregions ("project notes", "code"), 1024d shows individual documents. Continuous landscape. Zooming costs nothing: it is the same vector under different projections.

Activation Steering · compiled geometry

Steering vectors replace prompts

Run the intern on contrastive example pairs, compute the mean hidden-state difference, normalize. At inference, inject via forward hook: hidden += α · steering. One tensor addition per token, microseconds. Replaces fragile prompt engineering with compiled geometric subroutines.

Generation Probe · read the mind

Route before the model speaks

Before the intern generates a single token, a linear probe reads its hidden state after prefill and predicts REFLEX (cache hit), ASSIST (intern handles it), or ESCALATE (main model). Trained online from accumulated outcomes. Reaches 92%+ accuracy with use.

Void Detection · knowledge gaps

Maps of what isn't known

The KB void map probes the space between knowledge clusters. "Know Python, know async, but no article on Python async" is a void. When frequency crosses a threshold, the dreaming engine autonomously researches and fills these gaps during idle time.

Scopes · ephemeral context

Work containers with their own memory

Parent and child scopes let agents spawn sub-work without polluting the parent's reasoning. Only deliverables cross scope boundaries. The context assembler builds agent prompts from eight priority tiers, each with its own token budget.

Dreaming Engine · idle consolidation

Sleep for the substrate

Runs during idle time. Consolidates the displacement codec, runs void detection, grows and prunes the living medium, compacts stale scopes, audits bones, renders deferred thought monologues, generates reflections from unprocessed learning. The system uses itself to improve itself.

[Figure: a conversation's trajectory across the hypersphere, compressed. I-frame: keyframe, full vector (768 floats). P-frames: sparse deltas (~30 each). B-frames: bidirectional. ~15× compression.]
Displacement codec · H.264 for cognition · stores the pattern of transformation not the output
[Figure: an ambiguous query passes between a dense region ("crypto price lookups", high competence, exerts pull) and a sparse, low-competence region. Without the lens the path is straight; with the lens it bends toward the dense region. Barnes-Hut, O(n log n), galaxy N-body math.]
Gravitational lens · dense SDF regions deflect ambiguous queries toward the region most competent to answer them
07 · Drift Detection

Catching hallucination
with astronomy.

leOS has a quality-control system that catches agents hallucinating, looping, or producing shallow non-answers, without making a single LLM call. Everything is pure vector geometry on the unit hypersphere. The core concept is borrowed from cosmological observation.

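[Figure: the task embedding sits at the origin, surrounded by an expected zone of μ ± 1σ. Convergence: on-task. Redshift: more than 1.8σ above prediction, receding. Blueshift: more than 1.5σ below prediction, echoing. Void: unexplored.]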
Task embedding at origin · displacement magnitude against neighborhood prediction decides drift class
→ diverging
Redshift

The response is receding.

The displacement vector, the tangent from task embedding to response embedding, computed via the logarithmic map, is unusually long compared to what the neighborhood predicts. In astronomy, redshift means an object is moving away from the observer.

In leOS, redshift means the response is semantically receding from the task. Drift, off-topic wander, confabulation, hallucination. The agent's mouth is working but it's answering a different question.

← converging
Blueshift

The response is collapsing in.

The displacement is suspiciously short, or task-response cosine similarity exceeds 0.92. In astronomy, blueshift means an object is approaching.

In leOS, blueshift means the response is echoing the task back in different words. A non-answer like "I'll do that!" or "Great question, let me think about it." Catches the failure mode of appearing to engage without producing output.

For each response, the detector queries the displacement log for the K most similar past tasks and computes the mean and standard deviation of their displacement magnitudes. If the actual displacement exceeds the predicted mean by more than 1.8σ, the response is classed as redshift; if it falls more than 1.5σ below, blueshift. With fewer than 3 similar past tasks the result is void: unexplored territory. Separately, pairwise cosine similarity above 0.85 across the last N responses flags a semantic loop even when the text differs.
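The classification rule can be written down directly from the thresholds above (1.8σ, 1.5σ, a minimum of 3 neighbours, 0.92 echo similarity, 0.85 loop similarity). Treat this as a sketch: the function names are hypothetical, the neighbour lookup is assumed to have happened already, and the order in which the checks run is a guess.

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at unit vector p pointing toward unit vector q.

    Its length is the geodesic angle between p and q.
    """
    cos_t = np.clip(p @ q, -1.0, 1.0)
    perp = q - cos_t * p
    n = np.linalg.norm(perp)
    return np.zeros_like(p) if n < 1e-12 else np.arccos(cos_t) * perp / n

def classify_drift(task, response, neighbour_mags, red_sigma=1.8,
                   blue_sigma=1.5, min_neighbours=3, echo_cos=0.92):
    """neighbour_mags: displacement magnitudes of the K most similar past tasks."""
    if len(neighbour_mags) < min_neighbours:
        return "void"
    mag = np.linalg.norm(log_map(task, response))
    mu, sigma = np.mean(neighbour_mags), np.std(neighbour_mags)
    if mag > mu + red_sigma * sigma:
        return "redshift"
    if mag < mu - blue_sigma * sigma or task @ response > echo_cos:
        return "blueshift"
    return "convergence"

def is_loop(responses, threshold=0.85):
    """True when every pair of recent unit-vector responses exceeds threshold."""
    R = np.asarray(responses)
    if len(R) < 2:
        return False
    sims = (R @ R.T)[np.triu_indices(len(R), k=1)]
    return bool(np.all(sims > threshold))
```

Nothing here touches a language model: the inputs are embeddings that already exist, and the work is a few dot products and one mean and standard deviation per response.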

The drift detector replaces LLM-based quality checking with pure math. It runs on every agent response automatically and costs zero tokens. The metaphor also drives the emotion parameters for voice synthesis: redshift produces uncertain delivery, convergence produces calm confidence, void produces a contemplative hush.

08 · Voice & Thought Canvas

Agents narrate their work.
In their own voice.

Drift detection, voice synthesis, visual rendering, and the knowledge base all connect into a single learning loop. A flagged learning experience becomes a narrated thought video. The video gets triple-embedded (vision, audio, text) and stored as a searchable KB article. Future agents find it by meaning and learn from past mistakes without anyone writing documentation.

ChatterboxTTS · zero-shot cloning

The voice.

ChatterboxTTS is a two-stage neural TTS (T3 autoregressive + S3Gen decoder). Voice cloning is zero-shot: provide 5-30 seconds of reference audio and the model matches timbre, pitch, and cadence. No fine-tuning. Reference audio can come from direct uploads, video extraction via ffmpeg, or URLs processed through yt-dlp.

Drift state drives emotion. The EmotionMapper converts the geometric drift classification into TTS parameters per line:

  • Redshift → uncertain: 0.85× speech rate, restrained exaggeration, pauses before speaking. Mild: [sniff]. Strong: [sigh].
  • Blueshift → excited: 1.15× rate, high exaggeration, flowing delivery. Mild: [chuckle]. Strong: [laugh].
  • Convergence → confident: normal pace, pauses after statements. Strong: [clears throat].
  • Void → contemplative: 0.75× rate, intimate exaggeration, pauses both sides. Strong: [gasp].

The paralinguistic tags are rendered by the same voice model that produces the speech, so a [sigh] during redshift sounds like a real sigh from the speaker. Voice modulation also adjusts the TTS sampling itself: blueshift lowers min_p (more creative output) and redshift raises it (more stable). The voice isn't just speaking differently; the model is generating differently. All output is watermarked with resemble-perth as AI-generated.

Thought Canvas · 2D Gaussian splats

The visual workspace.

The thought canvas is a 256×224 pixel numpy array (deliberately SNES-era resolution, the visual output is a byproduct of computation, not the point of it) where agents render 2D Gaussian splats while they work. Each splat is 8 floats: position, scale, rotation, color, opacity.

The renderer uses accumulated summation. For each pixel, the color is the sum of all splat contributions weighted by their gaussian falloff. This is order-independent: no z-sorting pass. Hundreds of splats at 256×224 render in single-digit milliseconds on CPU with numpy vectorization. The /thought page streams it live.
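Accumulated summation is short to write with NumPy broadcasting. The sketch below uses its own nine-value parameterisation (x, y, sigma_x, sigma_y, theta, r, g, b, opacity) purely for illustration, since the text does not say how the 8 floats per splat are packed. The final clip to [0, 1] is also an assumption.

```python
import numpy as np

H, W = 224, 256   # canvas height and width in pixels

def render(splats):
    """Order-independent accumulation of 2D Gaussian splats.

    splats: iterable of (x, y, sigma_x, sigma_y, theta, r, g, b, opacity).
    Returns an (H, W, 3) float32 image with values clipped to [0, 1].
    """
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    canvas = np.zeros((H, W, 3), dtype=np.float32)
    for x, y, sx, sy, theta, r, g, b, alpha in splats:
        dx, dy = xs - x, ys - y
        c, s = np.cos(theta), np.sin(theta)
        u = (c * dx + s * dy) / sx            # rotate into the splat's frame
        v = (-s * dx + c * dy) / sy
        falloff = alpha * np.exp(-0.5 * (u * u + v * v))
        canvas += falloff[..., None] * np.array([r, g, b], dtype=np.float32)
    return np.clip(canvas, 0.0, 1.0)
```

Because each splat only adds to the canvas, the sum is the same in any order, which is what removes the need for a depth-sorting pass.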

When the system ingests images, the SplatFitter decomposes them into splat representations: iterative optimization fits the gaussian parameters to a target image, then stores the parameters alongside the image's embedding. Over time this builds a learned mapping from concept-space to splat-space. An agent wanting to visualize a concept searches this cache for the nearest match, renders splats, embeds the result, and refines via perceptual feedback from nomic-vision. The system learns to draw by practicing.

The MonologueRenderer combines it all: canvas frames + ChatterboxTTS audio + EmotionMapper params → composite MP4 → triple-embed (vision + audio + text) → knowledge base article. Cross-modal search retrieves thought videos by query in any modality.

09 · Media Ingest

Every tool.
On every asset.

The media pipeline runs every applicable analyzer on incoming media and lands the results in embedding space. The philosophy: you don't know in advance what you'll want to search for, so extract everything, embed everything, and let the geometry sort out relevance later.

Images

Twelve passes, one asset.

Every incoming image goes through every tool that might produce useful signal:

  • Thumbnail generation + metadata extraction
  • Universal upscaling (small images benefit all downstream ML)
  • YOLO classification. Top-5 predictions
  • YOLO object detection. Bounding boxes
  • YOLO segmentation. Pixel-level masks with area coverage
  • OpenCV face detection
  • Tesseract OCR with smart upscaling
  • Combined description generation
  • nomic-vision embedding (768d, cross-modal with text)
  • nomic-text embedding of the description
  • ImageBind embedding (1024d, cross-space divergence measured)
  • Auto-tagging from every analysis result
Videos · Audio

Transcribed, keyframed, embedded per-frame.

Videos: ffprobe metadata, multi-strategy keyframe extraction (scene-detect, low-threshold, timed fallback) with perceptual deduplication up to 30 frames, Whisper speech transcription, audio extraction. The entire image pipeline then runs on every extracted keyframe. Audio spectrograms via matplotlib.

Audio: metadata, Whisper transcription, spectrogram, ImageBind audio embedding.

The original video file is deleted after keyframe extraction to save disk; what the system needs for search survives in the embeddings and analysis records. A 2-hour video becomes up to 30 embedded keyframes, a full transcript, and an audio vector. All of it is cross-searchable.

All processing runs through a single-worker job queue so simultaneous submissions don't step on each other. YouTube and TikTok both route through yt-dlp automatically. The same MEDIA_INGEST kernel instruction handles URLs, files, streams, and uploads.

What this unlocks

Because every modality lands in a shared space, questions that usually require three different tools become one query. "Find the video clip where someone is explaining SDR with a red flag visible in frame" runs as a single cross-modal search: the text query is embedded once and compared by cosine similarity against the already-embedded keyframes (up to 30 per video), transcripts, and audio. No "video search API" required.
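
A toy sketch of that single query, assuming every item has already been embedded into one shared space of the same dimension. The item ids and 3-d vectors are made up for illustration; real embeddings would come from the encoders named above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def search(query: Sequence[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank every indexed item by cosine similarity to the query, whatever its modality."""
    q = normalize(query)
    scored = [(item_id, sum(a * b for a, b in zip(q, vec))) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical index mixing keyframe, transcript, and audio vectors.
index = {
    "clip-17/keyframe-04": normalize([0.9, 0.1, 0.2]),
    "clip-17/transcript": normalize([0.8, 0.3, 0.1]),
    "clip-52/audio": normalize([0.1, 0.9, 0.3]),
}
hits = search([1.0, 0.2, 0.1], index, k=2)
```

Because all vectors are unit length, the dot product is the cosine similarity, and one sort covers every modality at once.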

10 · The Membrane

A semantic boundary,
not an API gateway.

The temptation is to bolt a traditional API gateway onto leOS: a collection of REST endpoints that external systems POST to. We didn't. The membrane treats every entry point as an embedding-space citizen with an intent vector, a topic region, and a subscription fan-out. Data arriving through the membrane gets perceived, embedded, evaluated, and published to whoever is listening in the matching semantic neighborhood.

This works equally well for an IoT temperature reading processed in milliseconds and a 50 GB FITS astronomy catalog processed over hours, streamed incrementally, and resumable across server restarts. The difference is the processing path, not the model.

Real-time ports

The small-data side.

Create a named port via POST /ports. Every port has an intent vector derived from its name + description, which lets agents and external systems discover it semantically: asking "what can leOS accept about genomics?" returns matching ports.
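
A sketch of what creating a port might look like from a client. Only the POST /ports route comes from the text above; the base URL and the JSON field names (`name`, `description`) are assumptions, so check them against the actual API before relying on them. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical address of a local leOS instance


def create_port_request(name: str, description: str) -> urllib.request.Request:
    """Build (but do not send) a POST /ports request. Field names are assumed."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/ports",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = create_port_request(
    "genomics-reads",
    "FASTA reads and variant calls from the sequencing lab",
)
print(request.get_method(), request.full_url)  # POST http://localhost:8000/ports
# To actually send it: urllib.request.urlopen(request)
```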

Push data with POST /in/<port_id> or streaming with /in/<port_id>/stream. The ingestion pipeline runs normalize → perceive → embed → evaluate → store → notify on every item. Port config decides which field to embed, which reference text to compare against for signal detection, and what partition to land in.
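
The six stages can be sketched as a chain of small functions. This is an illustration of the shape, not leOS's code: `toy_embed` is a stand-in embedder (normalized letter counts), and the config keys `embed_field`, `reference_text`, and `partition` are hypothetical names for the three port settings described above.

```python
import math
from typing import Any, Callable, Dict, List, Tuple


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder: normalized letter counts. Not a real model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def make_pipeline(
    config: Dict[str, str],
    store: Dict[str, List[dict]],
    events: List[Tuple[str, str, float]],
) -> Callable[[Dict[str, Any]], dict]:
    reference = toy_embed(config["reference_text"])

    def normalize(item: dict) -> dict:
        item["data"] = {k.strip().lower(): v for k, v in item["raw"].items()}
        return item

    def perceive(item: dict) -> dict:
        item["text"] = str(item["data"][config["embed_field"]])  # which field to embed
        return item

    def embed(item: dict) -> dict:
        item["vector"] = toy_embed(item["text"])
        return item

    def evaluate(item: dict) -> dict:
        # Signal detection: cosine similarity against the port's reference text.
        item["signal"] = sum(a * b for a, b in zip(item["vector"], reference))
        return item

    def store_item(item: dict) -> dict:
        store.setdefault(config["partition"], []).append(item)
        return item

    def notify(item: dict) -> dict:
        events.append((config["partition"], item["text"], item["signal"]))
        return item

    stages = [normalize, perceive, embed, evaluate, store_item, notify]

    def run(raw: Dict[str, Any]) -> dict:
        item: dict = {"raw": raw}
        for stage in stages:
            item = stage(item)
        return item

    return run


store: Dict[str, List[dict]] = {}
events: List[Tuple[str, str, float]] = []
run = make_pipeline(
    {"embed_field": "note", "reference_text": "overheating", "partition": "sensors"},
    store,
    events,
)
result = run({" Note ": "overheating", "celsius": 91})
```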

The subscription bus (SSE, webhooks, internal queues) fans out every event to whoever's listening, with topic wildcards and filters.
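
A minimal in-process sketch of wildcard fan-out with filters. The topic syntax is not specified above, so this assumes shell-style patterns via Python's `fnmatch`; the `Bus` class and its method names are illustrative only, and real delivery over SSE or webhooks is out of scope.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Handler = Callable[[Event], None]
Filter = Callable[[Event], bool]


class Bus:
    """Fan every published event out to all matching subscribers."""

    def __init__(self) -> None:
        self._subs: List[Tuple[str, Handler, Optional[Filter]]] = []

    def subscribe(self, pattern: str, handler: Handler, where: Optional[Filter] = None) -> None:
        self._subs.append((pattern, handler, where))

    def publish(self, topic: str, event: Event) -> int:
        """Deliver the event and return how many subscribers received it."""
        delivered = 0
        for pattern, handler, where in self._subs:
            if fnmatchcase(topic, pattern) and (where is None or where(event)):
                handler(event)
                delivered += 1
        return delivered


inbox: List[Event] = []
hot: List[Event] = []
bus = Bus()
bus.subscribe("sensors.*", inbox.append)                                  # wildcard topic
bus.subscribe("sensors.temp", hot.append, where=lambda e: e["celsius"] > 30)  # filtered

cool_count = bus.publish("sensors.temp", {"celsius": 21})
hot_count = bus.publish("sensors.temp", {"celsius": 35})
other_count = bus.publish("genomics.fasta", {"id": "seq1"})
```

Note that with `fnmatch` the `*` also matches dots, so `sensors.*` would catch `sensors.temp.kitchen` as well.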

Dataset job engine

The big-data side.

Upload a file, create a job, watch it stream results. The reader never loads the full file; peak memory is one chunk plus the already-loaded embedding models.
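
The one-chunk-at-a-time idea, sketched for CSV with the standard library. The function name and chunk size are illustrative; the other formats would need their own readers.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def read_chunks(handle: TextIO, chunk_rows: int = 1000) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_rows` rows; only one chunk is held in memory."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_rows))
        if not chunk:
            return
        yield chunk


# A tiny in-memory file stands in for a multi-gigabyte catalog.
sample = io.StringIO("id,flux\n1,0.5\n2,0.7\n3,0.9\n")
chunk_sizes = [len(chunk) for chunk in read_chunks(sample, chunk_rows=2)]
```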

Supported formats: .csv, .tsv, .jsonl, .fasta, .npy, .npz, .fits (astropy), .h5/.hdf5 (h5py), .parquet (pyarrow), plain text.

Jobs are resumable across restarts with checkpoint recovery. Results stream live, so you don't wait for a 50 GB file to finish before seeing the first flagged row. Every output port exposes /rows, /download, /csv, /embeddings (as a numpy file), and /search (semantic search within the results, while the job is still running).
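
One way checkpoint recovery can work, as a sketch: persist the number of finished chunks after each one, and skip that many on restart. The checkpoint file layout and `run_job` are assumptions for illustration, not the job engine's actual format.

```python
import json
import os
import tempfile
from typing import Callable, Iterable


def run_job(chunks: Iterable[list], process: Callable[[list], None], checkpoint_path: str) -> int:
    """Process chunks in order, saving progress after each. Returns chunks processed this run."""
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["chunks_done"]
    processed = 0
    for index, chunk in enumerate(chunks):
        if index < done:
            continue  # finished before the restart
        process(chunk)
        # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"chunks_done": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)
        processed += 1
    return processed


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "job.checkpoint.json")
    seen: list = []
    first_run = run_job([["a"], ["b"]], seen.append, path)          # stops after two chunks
    second_run = run_job([["a"], ["b"], ["c"]], seen.append, path)  # restart over the full input
```

This version still iterates over the skipped chunks; a reader that can seek would store a byte offset instead and jump straight to it.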

Embedding strategies for scientific data

imagebind_audio

Treat any 1D array as a waveform

A stellar spectrum, a protein expression profile, a seismic trace, an EEG channel, all become 1024d vectors via ImageBind's audio encoder. They now coexist in the same space as sounds, images, and text. This is the strategy that makes searching spectra by plain English description possible.

imagebind_image

2D slices into the shared space

Telescope images, microscopy slides, medical scans, FITS image HDUs. Each 2D array → 1024d ImageBind vector. Cross-queryable with text, audio, and any other embedded modality.

numeric_direct

Pure numerical vectors

For gene expression profiles, physics simulation states, financial time-series, anywhere the numbers themselves are the semantic content. Random projection into 768d, then normalization onto the unit hypersphere. The projection approximately preserves the angles between inputs; normalization discards overall magnitude.
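
A pure-Python sketch of random projection followed by normalization. The Gaussian matrix, the fixed seed, and the function names are assumptions; the source only states "random projection into 768d, normalized".

```python
import math
import random
from typing import List, Sequence


def make_projection(in_dim: int, out_dim: int = 768, seed: int = 0) -> List[List[float]]:
    """Fixed Gaussian matrix; the same seed must be reused so all vectors share one space."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]


def numeric_direct(values: Sequence[float], projection: List[List[float]]) -> List[float]:
    """Project a numeric vector and scale it onto the unit hypersphere."""
    projected = [sum(w * x for w, x in zip(row, values)) for row in projection]
    norm = math.sqrt(sum(p * p for p in projected))
    if norm == 0.0:
        raise ValueError("cannot place an all-zero vector on the sphere")
    return [p / norm for p in projected]


projection = make_projection(in_dim=4)
profile = numeric_direct([1.0, 2.0, 3.0, 4.0], projection)
doubled = numeric_direct([2.0, 4.0, 6.0, 8.0], projection)
```

`profile` and `doubled` land on the same point: normalization keeps direction and drops scale, which matters if absolute magnitude carries meaning in your data.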

row_to_text

Template-based conversion

Catalog rows become descriptive sentences via user templates: "galaxy ra {ra} dec {dec} redshift {z:.2f} type {class}". The resulting sentence is embedded via nomic-text. Suited to mixed-type tabular data where the meaning of each row matters.
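
The template above happens to be valid Python format syntax, so the conversion step can be sketched with `str.format_map`. Whether leOS uses Python formatting internally is an assumption; the row values are invented.

```python
from typing import Any, Mapping

TEMPLATE = "galaxy ra {ra} dec {dec} redshift {z:.2f} type {class}"


def row_to_text(row: Mapping[str, Any], template: str = TEMPLATE) -> str:
    """Fill the template from one catalog row; a missing column raises KeyError."""
    return template.format_map(row)


sentence = row_to_text({"ra": 150.1, "dec": 2.2, "z": 0.3456, "class": "spiral"})
print(sentence)  # galaxy ra 150.1 dec 2.2 redshift 0.35 type spiral
```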

Scientific use cases

Astronomy

SDSS spectra anomaly scan

Upload a FITS catalog. The port embeds each spectrum's flux array via the ImageBind audio encoder. The flag_void_region analysis bone identifies spectra landing in low-density regions of the embedding space: anomalous observations that don't cluster with known types. No labeled training set required.

POST /outputs/<id>/search "broadened emission lines consistent with AGN outflow"
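
How low-density flagging can work without labels, as a brute-force sketch. This is not the flag_void_region implementation; it assumes density is measured as mean cosine similarity to the k nearest neighbours, and `k` and `min_density` are illustrative.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def flag_low_density(vectors: List[List[float]], k: int = 3, min_density: float = 0.5) -> List[int]:
    """Indices of unit vectors whose k nearest neighbours are, on average, dissimilar."""
    if len(vectors) < 2:
        return []
    flagged = []
    for i, v in enumerate(vectors):
        sims = sorted(
            (sum(a * b for a, b in zip(v, w)) for j, w in enumerate(vectors) if j != i),
            reverse=True,
        )
        nearest = sims[:k]
        density = sum(nearest) / len(nearest)
        if density < min_density:
            flagged.append(i)
    return flagged


# Four spectra that cluster together and one that sits alone.
spectra = [unit(v) for v in ([1, 0], [1, 0.1], [1, -0.1], [1, 0.05], [0, 1])]
anomalies = flag_low_density(spectra, k=2)
```

The pairwise loop is quadratic; at catalog scale an approximate nearest-neighbour index would replace it.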

Genomics

FASTA sequence sorting

Stream a FASTA file sequence-by-sequence. Embed via ImageBind audio (sequences as waveforms) or numeric_direct. Void detection flags unusual sequences automatically. A bot wakes up, escalates the most unusual ones to the main LLM for interpretation, and writes the interpretations back to the knowledge base.

Physics · Time-series

Simulation state tracking

HDF5 snapshots via h5py, NumPy arrays via mmap. The same object at different time steps produces a displacement vector that encodes what changed and in what semantic direction. After 10,000 snapshots, the reflex arc has learned which regions correspond to which regimes, so subsequent runs route through cache.

Chemistry · Biology

Cross-modal correlation

Because everything lands in the same space, a molecular descriptor (numeric_direct) and a paper abstract (Qwen3) and a crystal structure image (imagebind_image) are all cosine-comparable. Find papers related to a compound you've never seen before, by its properties, not its name.

MCP protocol

Claude Desktop, Cursor, Windsurf

leOS exposes an MCP server with tools for store, search, cross-modal search, embed, status, media-ingest, and arbitrary kernel execution. Any MCP-compatible client can use leOS as a remote brain with full cross-modal semantic search and the learning substrate behind it.

Self-learning adapters

Learn APIs by description

API_LEARN ingests an API spec and stores it as a reusable adapter. The spec itself gets embedded, so agents find the right adapter semantically. leOS publishes its own API as a learnable spec, so other leOS instances can learn it and call it.

Traditional tools (pandas, NumPy, scikit-learn) treat a million-row dataset as a matrix to be filtered by explicit rules. The researcher must know what they're looking for before they look. leOS treats the same dataset as a million points in semantic space. Anomaly detection requires no labeled training set; low-density regions of the embedding space are unusual by definition. The second million-row dataset of the same type processes faster than the first. That isn't incremental improvement; it's a fundamentally different relationship between a researcher and their data.

11 · The Learning Loop

Six mechanisms,
one growing substrate.

Every interaction feeds back into one or more of these systems. None require manual training. The substrate gets faster, smarter, and more knowledgeable automatically.

Mechanism 01

Displacement codec

Every task-to-response trajectory is recorded as a tangent vector. Similar trajectories compress via I/P/B frames. The codec stores the pattern of transformation, not the output.
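
What "recorded as a tangent vector" can mean on a unit sphere: the logarithmic map gives the displacement from the task point toward the response point, and the exponential map replays it. This is the standard sphere geometry, offered as a sketch; how the codec actually stores displacements is not specified here.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_map(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Tangent vector at p pointing toward q; its length is the angle between them."""
    cos_theta = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(cos_theta)
    residual = [qi - cos_theta * pi for pi, qi in zip(p, q)]
    norm = math.sqrt(dot(residual, residual))
    if norm < 1e-12:
        return [0.0] * len(p)  # same point (or antipodal, where the direction is undefined)
    return [theta * r / norm for r in residual]


def exp_map(p: Sequence[float], v: Sequence[float]) -> List[float]:
    """Follow tangent vector v from p along the sphere."""
    length = math.sqrt(dot(v, v))
    if length < 1e-12:
        return list(p)
    return [math.cos(length) * pi + math.sin(length) * vi / length for pi, vi in zip(p, v)]


task = [1.0, 0.0, 0.0]
response = [0.0, 1.0, 0.0]
displacement = log_map(task, response)   # what changed, and in which direction
replayed = exp_map(task, displacement)   # applying it again recovers the response
```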

Mechanism 02

Reflex arc

Enough consistent displacements in a region graduate into cached responses with conformal confidence bounds. Familiar patterns bypass the LLM in microseconds.
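
One way a conformal bound could gate graduation, sketched with split conformal prediction. The scoring rule (for example, 1 minus the cosine between a new displacement and the region's typical one), `tolerance`, and `min_samples` are all assumptions; only the idea of a conformal confidence bound comes from the text.

```python
import math
from typing import Sequence


def conformal_bound(scores: Sequence[float], alpha: float = 0.1) -> float:
    """Split-conformal quantile: a new exchangeable score stays at or below this
    value with probability at least 1 - alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few observations to certify anything at this alpha
    return sorted(scores)[k - 1]


def graduates(
    scores: Sequence[float],
    alpha: float = 0.1,
    tolerance: float = 0.05,
    min_samples: int = 20,
) -> bool:
    """A region becomes a reflex once enough samples exist and the bound is tight."""
    return len(scores) >= min_samples and conformal_bound(scores, alpha) <= tolerance


# Hypothetical nonconformity scores observed in one region of the space.
consistent = [0.01] * 25
scattered = [0.01 * i for i in range(1, 21)]
```

A consistent region graduates; a scattered one has a loose bound and keeps escalating to the LLM.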

Mechanism 03

Skeleton library

Successful bone chains become pre-validated patterns. FABRIK tries known skeletons first (similarity ≥ 0.80) before assembling anything new.

Mechanism 04

Tool-selection memory

Every session records which tools got used. Usage history feeds back as a 20% weight in scoring. The system learns that "PDF table extraction" reliably needs doc_query.
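
A sketch of blending usage history into tool scores at a 20% weight. The linear blend and the use of usage share as the history term are assumptions; the source gives only the weight.

```python
from typing import Dict, List, Tuple


def score_tools(
    semantic: Dict[str, float],
    usage_counts: Dict[str, int],
    history_weight: float = 0.20,
) -> List[Tuple[str, float]]:
    """Rank tools by (1 - w) * semantic similarity + w * share of past usage."""
    total = sum(usage_counts.values())
    ranked = []
    for tool, similarity in semantic.items():
        usage_share = usage_counts.get(tool, 0) / total if total else 0.0
        ranked.append((tool, (1 - history_weight) * similarity + history_weight * usage_share))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for a "PDF table extraction" task.
semantic = {"doc_query": 0.60, "web_search": 0.65}
fresh = score_tools(semantic, usage_counts={})
experienced = score_tools(semantic, usage_counts={"doc_query": 9, "web_search": 1})
```

With no history, the slightly better semantic match wins; after nine successful uses of doc_query on similar tasks, history tips the ranking.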

Mechanism 05

Skill assimilator

Tracks capability gaps, tasks where no tool scored well. Gap vectors cluster naturally. When a cluster crosses a frequency threshold, the system can generate a new tool from existing parts.
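
Gap clustering with a frequency threshold, sketched as simple leader clustering over unit vectors. The algorithm choice, `join_similarity`, and `min_count` are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def recurring_gaps(
    gap_vectors: List[List[float]],
    join_similarity: float = 0.9,
    min_count: int = 3,
) -> List[List[List[float]]]:
    """Group gap vectors around the first vector seen in each region and
    return only the groups that recur at least `min_count` times."""
    leaders: List[List[float]] = []
    members: List[List[List[float]]] = []
    for v in gap_vectors:
        for i, leader in enumerate(leaders):
            if sum(a * b for a, b in zip(v, leader)) >= join_similarity:
                members[i].append(v)
                break
        else:
            # No existing cluster is close enough: this gap starts a new one.
            leaders.append(v)
            members.append([v])
    return [group for group in members if len(group) >= min_count]


# Three failed tasks in one neighbourhood, one isolated failure elsewhere.
gaps = [unit(v) for v in ([1, 0], [1, 0.1], [0, 1], [1, -0.1])]
candidates = recurring_gaps(gaps)
```

Only the recurring cluster is returned, which is the point at which generating a new tool becomes worth considering.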

Mechanism 06

Self-extending instructions

When an LLM escalation succeeds on a novel task, the displacement compiler captures the trajectory and creates a permanent reflex entry. One successful call teaches the system to handle all similar tasks without LLM involvement.

12 · Peer-Reviewed

Not speculative.
Grounded in published work.

The approach leOS takes is built on recent work across several fields. The mathematical proofs exist and the experimental results are published.

01

VSAs are Turing complete

Kleyko, Davies, Frady, Kanerva et al., Proc. IEEE, 2022

Proved by emulating a (2,4) Turing machine and Rule 110 cellular automaton using only bundling, binding, and permutation. The emulated machine executed over 10⁹ error-free updates.

02

Coconut: reasoning without tokens

Meta AI, December 2024

LLM reasoning carried out in continuous latent space instead of tokens, outperforming chain-of-thought on logical reasoning tasks that require planning. Continuous thought vectors encode multiple alternative reasoning paths simultaneously, which amounts to breadth-first search natively in continuous space.

03

nGPT: the hypersphere pays off

NVIDIA, 2024

Constraining representations to unit norm and expressing transformations as hypersphere displacements cut the number of training steps needed to reach the same accuracy by a factor of 4 to 20, depending on sequence length.

04

Neural Field Turing Machine

Malhotra et al., August 2025

A differentiable, continuous-field computer. O(N) scaling with Turing completeness. Demonstrates cellular automata, PDE solving, and image refinement in one architecture.

05

Residue Hyperdimensional Computing

Kymn et al., Neural Computation, Jan 2025

Unified residue number systems with HD vectors. Addition and multiplication as separate binding operators. Resources scale only logarithmically with numeric range. Solves NP-complete subset-sum via resonator networks.

06

RenderFormer: neural pipeline

Microsoft Research, SIGGRAPH 2025

First model to learn a complete graphics pipeline without ray tracing or rasterization. Scenes as triangle tokens. Rendering is pure attention over embeddings.

07

LatentMAS: shared-vector collaboration

Zou et al., Princeton / Stanford, Nov 2025

LLM agents collaborating through a shared continuous latent space achieve up to 14.6% higher accuracy, 70-84% fewer output tokens, and about 4× faster inference. The shared space is the coordination mechanism.

08

LangSplat: CLIP on Gaussians

CVPR 2024 · LangSplatV2 NeurIPS 2025

Compresses 512d CLIP embeddings to 3d per Gaussian via scene-specific autoencoder. LangSplatV2 reaches 476 FPS for feature splatting, a 42× speedup. Points in a 3D scene carry natural-language meaning.

13 · Direction

Where we're taking it.

leOS is a single-developer project built in the open. The mission is to build the growing, adapting substrate that AI agents need to become genuinely capable: not a static tool library, but a living system that gets smarter, faster, and more knowledgeable with every interaction.

The current milestone is a clean public release: a first-time user should be able to install, launch, and complete real tasks (token reports, web research, scientific dataset ingestion, small app creation) without tripping on anything.

  1. Public release. One-command install with everything automated. No manual setup steps for end users.
  2. Platform, not tool. Enough exposed surface area that developers in any industry can build their own systems on top of leOS: embedding-indexed registries, plug-in bones, hostable services, MCP access, and the full membrane protocol.
  3. Continuous learning. Agents stopped only by detected problems (loops, stalls, drift), never by time alone. Every task grows the skeleton library.
  4. Hosted-model support. Once the substrate is fully tuned, wire in optional adapters for hosted LLMs. The local models remain the default. A hosted option unlocks several things at once: multiple concurrent agents running in parallel on genuinely heavy reasoning, the ability to distill the larger model's behaviour back into the substrate as new reflexes and skeletons (the small local model inherits what the big model figured out), and faster turnaround on the few queries that actually need that horsepower. Holding this back until the substrate is stable avoids burning inference tokens on a system that's still learning its own habits.
  5. Self-funding development. Use LP fees from the associated Solana token to sustain solo development indefinitely, free of investor pressure or roadmap capture.

14 · Funding

The $leOS token.

leOS is free and open source. The work that keeps it free is paid for by liquidity-pool fees from the $leOS token on Solana. $100K starting market cap, concentrated liquidity on Raydium CLMM, three pairs running at once. Every swap feeds the development wallet. If you want to support the project, the most direct way is to trade the token, because fees come from volume, not price.

Live on Solana

Trade, don't
just hold.

Every swap on the official liquidity pools sends a share of fees to the development wallet. No subscription, no paywall, no VC pressure, no roadmap dictated by anyone's exit plan. Holding the token is fine, but holding alone doesn't generate fees. Trading does. Every buy, every sell, every rotation between the stable, $satfi, and SOL pairs produces fees, regardless of which direction the price moves. What the token costs matters far less than how often it changes hands.

Most community tokens launch around a $4-10K market cap on full-range AMM pools. With a standard x·y=k curve stretching liquidity from zero to infinity, the share of tokens left in the pool shrinks with the square root of the starting cap divided by the current cap. Assuming the full supply is seeded into the pool, roughly 75-85% of supply ends up purchased by the time the cap hits $150K, concentrating ownership in a few wallets before anyone else shows up. Teams paper over this with buybacks and bundle trades that burn capital just to push the number up.

$leOS launches at $100K directly, using Raydium's tick-based concentrated liquidity pools. The same capital provides roughly 10× deeper liquidity inside the active price range, so a realistic starting valuation holds organically from the first block. No capital waste, no supply-capture spiral, no stunts.

Ticker: $leOS
Chain: Solana
Supply: 1,000,000,000
Launch MCap: $100,000
LP mechanism: Raydium CLMM
Contract: 5xgsnby6P9zqGK71J7H4yJLxzqPvNbC7rDZxNzjHmj7e

Three pools, three purposes

One token, paired three different ways.

At launch $leOS runs three separate Raydium CLMM pools in parallel. Each pool serves a different job. Together they give the token a dollar reference, a long-horizon store-of-value correlation, and full native composability across every DEX, aggregator, and DeFi protocol on Solana.

Pool 01 · Stable

The dollar anchor.

$leOS paired with a stablecoin gives the token a reliable price reference in dollars. This is where conventional price discovery happens and where traders arriving from centralized exchanges land first.

The stable pair's concentrated range defines the realistic trading corridor. Capital deployed inside the tick range works at the full efficiency of CLMM mathematics instead of being smeared across a curve from $0 to $∞.

Pool 02 · $satfi

Bitcoin-aligned store of value.

$satfi is a Bitcoin Supply Shock Token on Solana paired with native wBTC. When it pumps, the token absorbs more wBTC into its liquidity pool, reducing circulating Bitcoin supply on Solana. LP yield funds a perpetual buy-and-burn targeting 21M tokens, Bitcoin's max supply.

Pairing $leOS directly with $satfi means a slice of every trade produces Bitcoin-correlated exposure with a long-horizon thesis attached, not just short-term price action. The two projects share a launch-mechanics philosophy deliberately.

Pool 03 · SOL

Native ecosystem access.

The SOL pair makes $leOS directly composable with every DEX, aggregator, and DeFi protocol on Solana. No bridge, no wrapper, no intermediary hop. Native liquidity, native speed.

For smart-order routing: the SOL pool gives Jupiter and Raydium's aggregator direct paths into $leOS from anywhere in the ecosystem. This prevents cold-start slippage, makes the token discoverable through Solana-native workflows from day one, and ensures fair pricing across the three pools via arbitrage.

Every swap across any of the three pools contributes fees to the development wallet that keeps leOS free and open source. Buy, sell, rotate between pairs, any of it works. The contract is verifiable on Solscan and tradeable through Jupiter, Raydium, and any Solana-native aggregator.

15 · Join in

Read the code.
Watch it grow.

leOS is free and open source, developed in public, and funded by liquidity-pool fees from the $leOS token. Clone the repo, run it locally, trade the token if you want to help keep the lights on. All three feed the same loop.