Spec 004: Storage Model And Object Graph

Status: Draft Date: 2026-05-27 Project: Glyph

Thesis

Glyph needs a canonical object graph that can represent source, work, visibility, policy, and provenance without inheriting Git’s object model. The storage model should be content-addressed where useful, append-friendly for auditability, and flexible enough to support projections, work contexts, and translators.

Problem

Git stores blobs, trees, commits, and refs. That model is excellent for distributed snapshots, but Glyph needs additional native concepts:

Realms
Projections
Work contexts
Automatic snapshots
Visibility labels
Publication events
Agent provenance
Policy decisions
Git import/export metadata

Trying to encode these as branches, commits, refs, and commit messages would make Glyph a Git wrapper. Glyph needs its own object graph.

Goals

Define the major object types Glyph stores.
Separate content storage from visibility and work metadata.
Support efficient projections.
Support audit records for import, publication, and policy changes.
Allow local-first prototypes.
Leave room for future distributed sync.

Non-Goals

Selecting the final database technology.
Defining cryptographic signing in detail.
Defining distributed consensus.
Optimizing for huge monorepos in the first prototype.
Finalizing wire formats.

Object Types

Content Object

A content object stores bytes and basic content metadata. Fields:

Object ID
Content hash
Byte length
Media type or detected kind
Optional encoding metadata

Content objects do not by themselves imply visibility.

Tree Object

A tree object maps paths to content objects or other tree objects. Fields:

Tree ID
Entries
Parent tree references where useful
Generated or materialized status

Source Object

A source object is a policy-aware unit of source identity. It can represent:

File
Directory
Generated artifact
External import
Metadata object

Fields:

Source object ID
Current content or tree reference
Path identity
Labels
Ownership metadata
Provenance links

External Mount Object

An external mount object records a source root mounted at a path inside another source graph. Fields:

Mount ID
Mount path
Source type
Remote origin
Pinned revision or source graph reference
Import policy
Export policy
Allowed realms
Local write policy

External mount objects preserve the boundary between a parent project and a separately maintained dependency or subproject.

Realm Primitive Object

A realm object defines a named projection policy. Fields:

Realm ID
Name
Description
Policy reference
Grants
Default labels
Publication rules

Work Context Object

A work context object records active or historical work. Fields:

Work context ID
Owner identity
Agent or human provenance
Base projection
Overlay references
Snapshots
Intent
Status

Snapshot Object

A snapshot object records a work context state at a point in time. Fields:

Snapshot ID
Work context ID
Tree reference
Timestamp
Trigger
Actor
Parent snapshot references

Publication Object

A publication object records visibility widening or source promotion. Fields:

Publication ID
Source realm
Destination realm
Included objects
Excluded objects
Actor
Review approvals
Policy checks
Timestamp
Translator outputs

Policy Object

A policy object records versioned access and publication rules. Fields:

Policy ID
Policy document
Version
Author
Effective time
Superseded policy reference

Provenance Object

A provenance object records where work or content came from. Examples:

Human edit
Agent write
Git import
Generated file
Refactoring tool
Package update

Storage Invariants

Content is not visibility Possessing bytes in storage does not mean they are visible in a realm.
Policies are versioned Policy changes are stored as objects with history.
Publication is auditable Every visibility-widening operation creates a publication object.
Imports are honest Imported Git data is provenance, not native Glyph truth unless adopted through realm objects.
Objects are addressable Important objects have stable IDs suitable for audit, references, and translators.
Materialized projections are disposable Filesystem checkouts and Git exports can be regenerated from canonical objects.

Storage Decisions

Object IDs should use typed hash-based identifiers by default, such as content:sha256:..., tree:sha256:..., and work:.... Content-like objects should be content-addressed. Event-like objects may include typed generated IDs plus content hashes for integrity. Glyph should borrow Git’s plumbing discipline without inheriting Git’s porcelain model. The useful lesson from “Write Yourself a Git” is that a small version-control core can be built from simple, inspectable primitives: repository discovery, object read/write, content hashing, tree materialization, reference-like pointers, status comparisons, and export commands. Glyph should keep similarly sharp plumbing commands for agents and debugging, while replacing commits, branches, staging, and refs with work contexts, realms, snapshots, publications, and policy. The canonical object graph should store complete file states logically, not patch chains. Diffs are derived review artifacts and may be cached, but source reconstruction should not depend on replaying diffs. Complete logical snapshots must not mean physically copying every file every time. Snapshot and tree objects should store path-to-object mappings, while file bytes are stored as content-addressed objects and deduplicated by hash. If two snapshots reference identical file content, both snapshots point to the same content object. For large files, Glyph should support chunked content objects so an update to one region does not require storing another full copy of the file. The first prototype may begin with whole-file content addressing and add chunking behind the same content object abstraction later. Path identity should survive renames as a stable source object. Paths can change, but the source object remains the continuity anchor. Policies should be stored as source graph objects and can be visible through realms according to policy. Public projects should normally expose public policy, while private policy details may remain restricted. The first prototype should make policy objects, publication objects, audit events, and imported provenance append-only. Source content can be superseded by newer objects rather than mutated. Local storage should use SQLite as the canonical local source-control database. In Git terms, .glyph/store.db is the closest equivalent to the .git directory’s metadata and object graph: it records source objects, realms, work contexts, snapshots, publications, remotes, mounts, schema version, and indexes. Large byte content may live as content-addressed files beside the SQLite database under .glyph/content/, but those blobs are still part of the .glyph store and are addressed from SQLite. Audit events live under .glyph/audit/ as append-only JSONL so they remain easy to inspect and stream. Large binary files should be represented as content objects with metadata and optional chunking. The first prototype can store them as external content-addressed blobs and avoid diffing them deeply. Glyph should keep an index-like table for fast status and projection operations, but it should not expose a staging area as a user primitive. Unlike Git’s index, Glyph’s index is an implementation detail for answering questions such as:

Which paths changed in a workspace projection?
Which content hashes belong to this snapshot?
Which source objects differ between a workspace and its base realm?
Which workspace projections are stale, conflicting, or ready to publish?

The index may represent incomplete or conflicted states. Published tree and realm projections must remain complete, unambiguous states.

Plumbing Commands

Glyph should expose a small set of low-level commands for debugging, agent integration, and compatibility testing. These commands should be stable enough for tooling but clearly lower-level than the human workflow. Candidate plumbing commands:

glyph object hash <path> --json
glyph object read <id> --json
glyph object write --type content --json < file
glyph tree read <id> --json
glyph tree materialize <id> --out ./dir --json
glyph store find --json
glyph store fsck --json

These mirror the educational shape of Git plumbing such as object hashing, object inspection, tree reading, checkout/materialization, and repository discovery. They should operate on Glyph objects and policies, not Git objects.

Prototype Storage

The first prototype can use a simple local directory store:

.glyph/
  store.db
  manifest.yaml
  objects/
  content/
  policies/
  work/
  realms/
  publications/
  audit/

This directory is not Git. It is Glyph’s initial local source-control database. SQLite is the canonical local index and graph store; generated filesystem projections and Git exports are disposable materializations. The exact layout may change, but the prototype should keep canonical state separate from materialized workspaces.

Success Criteria

This spec is successful if a prototype can:

Store file content separately from visibility policy.
Create source, realm, work context, snapshot, and publication objects.
Create external mount objects for linked repositories.
Regenerate a public projection from stored objects.
Record a policy change.
Record an agent edit as provenance.
Export public state without reading hidden objects directly.

​Thesis

​Problem

​Goals

​Non-Goals

​Object Types

​Content Object

​Tree Object

​Source Object

​External Mount Object

​Realm Primitive Object

​Work Context Object

​Snapshot Object

​Publication Object

​Policy Object

​Provenance Object

​Storage Invariants

​Storage Decisions

​Plumbing Commands

​Prototype Storage

​Success Criteria