Agent Memory System

Active Development

The memory system is under active development. Core storage and semantic search are working; knowledge ingestion connectors and the full RAG pipeline are still evolving.

Aether's memory system gives agents persistent, context-aware knowledge across task executions. Rather than starting each task from scratch, agents can remember past decisions, access project documentation, and search for relevant information via semantic similarity.

Architecture

```
┌────────────────────────────────────────────────────────┐
│                 Agent LLM Pipeline                      │
│   System Prompt + Injected Memories + Task Context      │
├────────────────────────┬───────────────────────────────┤
│    Memory Service      │    Semantic Search Service     │
│    Scoping & ACL       │    (application/memory/)       │
├────────────────────────┼───────────────────────────────┤
│                    Storage Layer                        │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐  │
│  │   pgvector   │  │  PostgreSQL  │  │    Redis    │  │
│  │  (semantic)  │  │  (long-term) │  │ (short-term)│  │
│  └──────────────┘  └──────────────┘  └─────────────┘  │
├────────────────────────────────────────────────────────┤
│                 Ingestion Pipeline                      │
│  ┌────────────┐  ┌─────────────────────────────────┐  │
│  │ Confluence │  │  WikiJS · BookStack · HTML       │  │
│  │ (working)  │  │  (planned — see roadmap)        │  │
│  └────────────┘  └─────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘
```

Memory Tiers

| Tier | Storage | TTL | Status | Use Case |
|------|---------|-----|--------|----------|
| Short-term | Redis | Minutes to hours (configurable) | Working | Current task context, session state |
| Long-term | PostgreSQL | Persistent | Working | Agent memories, project decisions |
| Semantic search | pgvector | Persistent | Working | Find memories by meaning, not just keyword |
| Full-text search | PostgreSQL | Persistent | Working | Keyword-based memory search |

Memory Scopes

Memories are scoped to control visibility and access:

| Scope | Description | Accessible By |
|-------|-------------|---------------|
| task | Specific task execution | That task only |
| agent | Specific agent across tasks | That agent |
| project | Specific project across agents | All agents on the project |
| organizational | Global | All agents, all projects |
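
To make the visibility rules concrete, here is a minimal sketch of how a scope filter might translate into a query predicate. The column names mirror the agent_memories schema shown below, but the predicate itself is illustrative, not Aether's actual implementation:

```go
// Hypothetical visibility predicate for a query run on behalf of a task:
// the task sees its own memories plus the agent-, project-, and
// organization-level memories that apply to it. Bind $1..$3 to the
// current task, agent, and project.
const scopeFilter = `
       (scope = 'task' AND task_id = $1)
    OR (scope = 'agent' AND agent_id = $2)
    OR (scope = 'project' AND project_id = $3)
    OR (scope = 'organizational')`
```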

How Memory Is Used

Automatic Injection

When the LLM service processes a task, it automatically:

  1. Runs semantic search over long-term memories, using the task content as the query
  2. Appends the most relevant memories to the agent's system prompt as context

The agent "remembers" without making any explicit API calls.

The number of memories injected is controlled by MEMORY_SEARCH_LIMIT (default: 10).
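
A minimal sketch of what this injection step could look like. The interface and type names (MemorySearcher, MemoryHit) are illustrative, not Aether's internals:

```go
package promptinject

import (
	"context"
	"fmt"
	"strings"
)

// Illustrative types; Aether's real interfaces may differ.
type MemoryHit struct{ Content string }

type MemorySearcher interface {
	Search(ctx context.Context, query string, limit int) ([]MemoryHit, error)
}

const memorySearchLimit = 10 // MEMORY_SEARCH_LIMIT

// buildSystemPrompt runs semantic search with the task content as the
// query and appends the hits to the agent's base system prompt.
func buildSystemPrompt(ctx context.Context, base, taskContent string, mem MemorySearcher) (string, error) {
	hits, err := mem.Search(ctx, taskContent, memorySearchLimit)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	b.WriteString(base)
	b.WriteString("\n\n## Relevant memories\n")
	for _, h := range hits {
		fmt.Fprintf(&b, "- %s\n", h.Content)
	}
	return b.String(), nil
}
```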

Agent-Driven Memory

External agents (via gRPC) and internal processing can explicitly store memories:

```go
// Store a decision for later
_, err := client.MemoryStore(ctx, &pb.MemoryStoreRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
    Content:   "Chose JWT with 15-minute expiry and HttpOnly refresh cookies",
    Scope:     pb.MemoryScope_PROJECT,
})
if err != nil {
    log.Fatalf("store memory: %v", err)
}

// Retrieve it later by key
mem, err := client.MemoryRetrieve(ctx, &pb.MemoryRetrieveRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
})
if err != nil {
    log.Fatalf("retrieve memory: %v", err)
}
fmt.Println(mem.Content)
```

REST API

Search Memories

Search memories using natural language:

```bash
curl -X POST http://localhost:8000/api/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "authentication implementation decisions",
    "agent_id": "ta_leo",
    "limit": 5
  }'
```

Example response:

```json
[
  {
    "key": "auth-decision-q1-2026",
    "content": "Chose JWT with 15-minute expiry...",
    "similarity": 0.92,
    "scope": "project",
    "created_at": "2026-01-15T10:00:00Z"
  }
]
```

Store Memory

```bash
curl -X POST http://localhost:8000/api/memory \
  -H "Content-Type: application/json" \
  -d '{
    "key": "api-design-decision",
    "content": "Using REST over GraphQL for simplicity",
    "agent_id": "ta_leo",
    "scope": "project"
  }'
```

List Memories

```bash
curl "http://localhost:8000/api/memory?agent_id=ta_leo&scope=project"
```

Delete Memory

```bash
curl -X DELETE http://localhost:8000/api/memory/api-design-decision
```

Database Schema

agent_memories Table

```sql
CREATE TABLE agent_memories (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    key         TEXT NOT NULL UNIQUE,
    content     TEXT NOT NULL,
    embedding   VECTOR(1536),     -- pgvector field for semantic search
    scope       TEXT NOT NULL,    -- task, agent, project, organizational
    agent_id    TEXT REFERENCES agents(id),
    project_id  INTEGER REFERENCES projects(id),
    task_id     TEXT,
    created_at  TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at  TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Index for semantic similarity search
CREATE INDEX ON agent_memories USING ivfflat (embedding vector_cosine_ops);

-- Full-text search index
CREATE INDEX ON agent_memories USING gin(to_tsvector('english', content));
```
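
For reference, a similarity query against this table might look like the following sketch. It uses the pgvector-go client; the DSN and the query embedding are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
	pgvector "github.com/pgvector/pgvector-go"
	pgxvector "github.com/pgvector/pgvector-go/pgx"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, "postgres://localhost/aether") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)
	if err := pgxvector.RegisterTypes(ctx, conn); err != nil {
		log.Fatal(err)
	}

	// In practice this comes from the embedding endpoint for the query text.
	queryEmbedding := make([]float32, 1536)

	// <=> is pgvector's cosine distance operator; 1 - distance gives the
	// similarity score reported by the search API.
	rows, err := conn.Query(ctx,
		`SELECT key, content, 1 - (embedding <=> $1) AS similarity
		   FROM agent_memories
		  ORDER BY embedding <=> $1
		  LIMIT 5`,
		pgvector.NewVector(queryEmbedding))
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var key, content string
		var similarity float64
		if err := rows.Scan(&key, &content, &similarity); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s (%.2f): %s\n", key, similarity, content)
	}
}
```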

document_chunks Table

Knowledge ingested from external sources (Confluence, etc.) is stored here:

```sql
CREATE TABLE document_chunks (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    knowledge_source_id UUID REFERENCES knowledge_sources(id),
    content         TEXT NOT NULL,
    embedding       VECTOR(1536),
    metadata        JSONB,
    chunk_index     INTEGER,
    created_at      TIMESTAMP NOT NULL DEFAULT NOW()
);
```

Knowledge Ingestion

Confluence (Implemented)

Aether can ingest Confluence pages into the memory system for use as agent context:

  1. Configure the knowledge source via API (Confluence Cloud or Server)
  2. Trigger ingestion — Aether fetches pages, splits them into chunks, and generates embeddings
  3. Incremental sync — Subsequent ingestions only process pages modified since the last sync
  4. Retrieval — Embedded chunks are searchable alongside regular memories

Knowledge sources are tracked in the knowledge_sources table. Ingestion jobs and their status are tracked in ingestion_jobs.
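
The chunking in step 2 could be as simple as fixed-size windows with overlap. The sizes below are illustrative; Aether's actual chunking parameters are not documented here:

```go
// chunkText splits a document into overlapping fixed-size chunks prior to
// embedding. size and overlap are in runes; size=1000, overlap=200 are
// plausible values, not Aether's documented defaults.
func chunkText(text string, size, overlap int) []string {
	if size <= overlap {
		overlap = 0 // guard against a non-advancing window
	}
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += size - overlap {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}
```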

INFO

Connectors for WikiJS, BookStack, and raw HTML are on the roadmap. See Roadmap.

Embeddings Configuration

Semantic search requires an embedding model:

```bash
# Model used for generating embeddings
EMBEDDING_MODEL=text-embedding-3-small

# Must match the model's output dimension
# text-embedding-3-small: 1536
# text-embedding-3-large: 3072
EMBEDDING_DIMENSION=1536
```

The embedding model is configured in LiteLLM just like the chat model. Aether calls the embedding endpoint before storing memories and during search.
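
Because LiteLLM exposes an OpenAI-compatible API, that embedding call is a plain HTTP POST. A sketch, assuming a LiteLLM proxy at a placeholder URL:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type embeddingRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

func main() {
	body, _ := json.Marshal(embeddingRequest{
		Model: "text-embedding-3-small", // EMBEDDING_MODEL
		Input: []string{"authentication implementation decisions"},
	})
	// Placeholder URL: point this at your LiteLLM proxy.
	resp, err := http.Post("http://localhost:4000/v1/embeddings",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out embeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	if len(out.Data) == 0 {
		log.Fatal("no embedding returned")
	}
	fmt.Println("dimension:", len(out.Data[0].Embedding)) // should match EMBEDDING_DIMENSION
}
```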

In addition to semantic (vector) search, Aether supports full-text search over memories using PostgreSQL's built-in tsvector indexes. This complements semantic search for cases where exact keyword matches are needed.

Both search modes are available via the /api/memory/search endpoint.
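
As a sketch, the full-text path boils down to a tsquery against the GIN index shown earlier (reusing a pgx connection like the one in the pgvector example above):

```go
// Rank memories by full-text relevance. plainto_tsquery ANDs together the
// words of the user's query; ts_rank orders the matches.
rows, err := conn.Query(ctx,
	`SELECT key, content,
	        ts_rank(to_tsvector('english', content),
	                plainto_tsquery('english', $1)) AS rank
	   FROM agent_memories
	  WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
	  ORDER BY rank DESC
	  LIMIT 5`,
	"jwt expiry")
if err != nil {
	log.Fatal(err)
}
defer rows.Close()
```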

Memory via gRPC (External Agents)

External agents connected via the Agent Gateway have full access to the memory system through the Memory Hub RPCs:

| RPC | Description |
|-----|-------------|
| MemoryStore | Save a memory with key, content, scope, and optional TTL |
| MemoryRetrieve | Get a specific memory by key |
| MemoryList | List memories matching filters |
| MemoryDelete | Delete a memory by key |
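
Rounding out the earlier MemoryStore/MemoryRetrieve example, a sketch of the list and delete calls. The message and field names (MemoryListRequest, list.Memories, and so on) are assumptions to be checked against the gRPC API reference:

```go
// List project-scoped memories for this session, then delete one by key.
list, err := client.MemoryList(ctx, &pb.MemoryListRequest{
    SessionId: sessionID,
    Scope:     pb.MemoryScope_PROJECT,
})
if err != nil {
    log.Fatalf("list memories: %v", err)
}
for _, m := range list.Memories { // response field name assumed
    fmt.Println(m.Key)
}

if _, err := client.MemoryDelete(ctx, &pb.MemoryDeleteRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
}); err != nil {
    log.Fatalf("delete memory: %v", err)
}
```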

See Agent Runtime Protocol and gRPC API for details.

Memory Flow (API + Ingestion)

```
Developer / Agent --> REST / gRPC Memory APIs --> Memory Service -->
  ├─ Redis (short-term, TTL)
  ├─ PostgreSQL (long-term)
  └─ pgvector (semantic search)

Ingestion Sources (Confluence, Docs Folder, etc.) --> Chunking + Embeddings --> pgvector
```

Configuration (TTL, Embeddings, Storage)

| Variable | Description | Default |
|----------|-------------|---------|
| REDIS_URL | Redis instance for short-term memory + rate limiting | redis://localhost:6379 |
| REDIS_MEMORY_TTL | TTL (seconds) for short-term memories | 3600 |
| MEMORY_SEARCH_LIMIT | Max memories injected into prompts | 10 |
| EMBEDDING_MODEL | LiteLLM embedding model name | text-embedding-3-small |
| EMBEDDING_DIMENSION | Must match embedding model dimension | 1536 |
| PGVECTOR_DISTANCE_METRIC | cosine, l2, or ip | cosine |
| PGVECTOR_IVFFLAT_PROBES | IVF probes: search recall vs. speed trade-off | 10 |
| MEMORY_TOP_K | Results returned per search | 10 |

Scoping & TTL Rules

  • task scope always respects TTL (Redis first).
  • agent / project / organizational scopes persist in Postgres + pgvector; TTL only if provided.
  • Semantic search always runs against pgvector; full-text search uses Postgres tsvector.
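
Encoded as a routing decision, the rules above might look like this sketch; how expiry is enforced for persistent scopes is an assumption, not documented behavior:

```go
// storageFor maps a memory's scope and optional TTL to its backing store,
// per the rules above. Illustrative only.
func storageFor(scope string, ttlSeconds int) string {
	if scope == "task" {
		return "redis" // TTL always applied
	}
	if ttlSeconds > 0 {
		return "postgres+pgvector (with expiry)"
	}
	return "postgres+pgvector (persistent)"
}
```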

Developer Workflows

Store a memory (REST)

```bash
curl -X POST http://localhost:8000/api/memory \
  -H "Content-Type: application/json" \
  -d '{
    "key": "auth-decision-q1",
    "content": "Chose JWT with 15m access + refresh cookies",
    "scope": "project",
    "agent_id": "ta_leo",
    "project_id": 42,
    "ttl_seconds": 3600
  }'
```

Retrieve / list (REST)

```bash
# Retrieve by key
curl http://localhost:8000/api/memory/auth-decision-q1

# List filtered by scope + project
curl "http://localhost:8000/api/memory?project_id=42&scope=project&limit=20"
```

Semantic search (REST)

```bash
curl -X POST http://localhost:8000/api/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "jwt token expiry decision",
    "project_id": 42,
    "limit": 5
  }'
```

gRPC equivalents

  • MemoryStore with scope + optional ttl_seconds
  • MemoryRetrieve by key
  • MemoryList with scope, agent_id, project_id, limit
  • MemoryDelete to hard-delete

See gRPC API for field tables.

SDK code examples

See Memory SDK Examples for Python, Go, and TypeScript snippets covering store, retrieve, and semantic search.
