Agent Memory System

Active Development

The memory system is under active development. Core storage and semantic search are working; knowledge ingestion connectors and the full RAG pipeline are still evolving.

Aether's memory system gives agents persistent, context-aware knowledge across task executions. Rather than starting each task from scratch, agents can remember past decisions, access project documentation, and search for relevant information via semantic similarity.

Architecture

```
┌────────────────────────────────────────────────────────┐
│                 Agent LLM Pipeline                      │
│   System Prompt + Injected Memories + Task Context      │
├────────────────────────┬───────────────────────────────┤
│    Memory Service      │    Semantic Search Service     │
│    Scoping & ACL       │    (application/memory/)       │
├────────────────────────┼───────────────────────────────┤
│                    Storage Layer                        │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐  │
│  │   pgvector   │  │  PostgreSQL  │  │    Redis    │  │
│  │  (semantic)  │  │  (long-term) │  │ (short-term)│  │
│  └──────────────┘  └──────────────┘  └─────────────┘  │
├────────────────────────────────────────────────────────┤
│                 Ingestion Pipeline                      │
│  ┌────────────┐  ┌─────────────────────────────────┐  │
│  │ Confluence │  │  WikiJS · BookStack · HTML       │  │
│  │ (working)  │  │  (planned — see roadmap)        │  │
│  └────────────┘  └─────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘
```

Memory Tiers

| Tier | Storage | TTL | Status | Use Case |
|------|---------|-----|--------|----------|
| Short-term | Redis | Minutes to hours (configurable) | Working | Current task context, session state |
| Long-term | PostgreSQL | Persistent | Working | Agent memories, project decisions |
| Semantic search | pgvector | Persistent | Working | Find memories by meaning, not just keyword |
| Full-text search | PostgreSQL | Persistent | Working | Keyword-based memory search |

Memory Scopes

Memories are scoped to control visibility and access:

| Scope | Description | Accessible By |
|-------|-------------|---------------|
| task | Specific task execution | That task only |
| agent | Specific agent across tasks | That agent |
| project | Specific project across agents | All agents on the project |
| organizational | Global | All agents, all projects |
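
To make the visibility rules concrete, here is a minimal sketch of how a scope filter might translate into a query predicate. The column names mirror the agent_memories schema shown below, but the predicate itself is illustrative, not Aether's actual implementation:

```go
// Hypothetical visibility predicate for a query run on behalf of a task:
// the task sees its own memories plus the agent-, project-, and
// organization-level memories that apply to it. Bind $1..$3 to the
// current task, agent, and project.
const scopeFilter = `
       (scope = 'task' AND task_id = $1)
    OR (scope = 'agent' AND agent_id = $2)
    OR (scope = 'project' AND project_id = $3)
    OR (scope = 'organizational')`
```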

How Memory Is Used

Automatic Injection

When the LLM service processes a task, it automatically:

  1. Runs semantic search over long-term memories, using the task content as the query
  2. Appends the most relevant memories to the agent's system prompt as context

The agent "remembers" without making any explicit API calls.

The number of memories injected is controlled by MEMORY_SEARCH_LIMIT (default: 10).
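
A minimal sketch of what this injection step could look like. The interface and type names (MemorySearcher, MemoryHit) are illustrative, not Aether's internals:

```go
package promptinject

import (
	"context"
	"fmt"
	"strings"
)

// Illustrative types; Aether's real interfaces may differ.
type MemoryHit struct{ Content string }

type MemorySearcher interface {
	Search(ctx context.Context, query string, limit int) ([]MemoryHit, error)
}

const memorySearchLimit = 10 // MEMORY_SEARCH_LIMIT

// buildSystemPrompt runs semantic search with the task content as the
// query and appends the hits to the agent's base system prompt.
func buildSystemPrompt(ctx context.Context, base, taskContent string, mem MemorySearcher) (string, error) {
	hits, err := mem.Search(ctx, taskContent, memorySearchLimit)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	b.WriteString(base)
	b.WriteString("\n\n## Relevant memories\n")
	for _, h := range hits {
		fmt.Fprintf(&b, "- %s\n", h.Content)
	}
	return b.String(), nil
}
```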

Agent-Driven Memory

External agents (via gRPC) and internal processing can explicitly store memories:

```go
// Store a decision for later
_, err := client.MemoryStore(ctx, &pb.MemoryStoreRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
    Content:   "Chose JWT with 15-minute expiry and HttpOnly refresh cookies",
    Scope:     pb.MemoryScope_PROJECT,
})
if err != nil {
    log.Fatalf("store memory: %v", err)
}

// Retrieve it later by key
mem, err := client.MemoryRetrieve(ctx, &pb.MemoryRetrieveRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
})
if err != nil {
    log.Fatalf("retrieve memory: %v", err)
}
fmt.Println(mem.Content)
```

REST API

Search Memories

Search memories using natural language:

```bash
curl -X POST http://localhost:8000/api/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "authentication implementation decisions",
    "agent_id": "ta_leo",
    "limit": 5
  }'
```

Example response:

```json
[
  {
    "key": "auth-decision-q1-2026",
    "content": "Chose JWT with 15-minute expiry...",
    "similarity": 0.92,
    "scope": "project",
    "created_at": "2026-01-15T10:00:00Z"
  }
]
```

Store Memory

```bash
curl -X POST http://localhost:8000/api/memory \
  -H "Content-Type: application/json" \
  -d '{
    "key": "api-design-decision",
    "content": "Using REST over GraphQL for simplicity",
    "agent_id": "ta_leo",
    "scope": "project"
  }'
```

List Memories

```bash
curl "http://localhost:8000/api/memory?agent_id=ta_leo&scope=project"
```

Delete Memory

```bash
curl -X DELETE http://localhost:8000/api/memory/api-design-decision
```

Database Schema

agent_memories Table

```sql
CREATE TABLE agent_memories (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    key         TEXT NOT NULL UNIQUE,
    content     TEXT NOT NULL,
    embedding   VECTOR(1536),     -- pgvector field for semantic search
    scope       TEXT NOT NULL,    -- task, agent, project, organizational
    agent_id    TEXT REFERENCES agents(id),
    project_id  INTEGER REFERENCES projects(id),
    task_id     TEXT,
    created_at  TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at  TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Index for semantic similarity search
CREATE INDEX ON agent_memories USING ivfflat (embedding vector_cosine_ops);

-- Full-text search index
CREATE INDEX ON agent_memories USING gin(to_tsvector('english', content));
```
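
For reference, a similarity query against this table might look like the following sketch. It uses the pgvector-go client; the DSN and the query embedding are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
	pgvector "github.com/pgvector/pgvector-go"
	pgxvector "github.com/pgvector/pgvector-go/pgx"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, "postgres://localhost/aether") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)
	if err := pgxvector.RegisterTypes(ctx, conn); err != nil {
		log.Fatal(err)
	}

	// In practice this comes from the embedding endpoint for the query text.
	queryEmbedding := make([]float32, 1536)

	// <=> is pgvector's cosine distance operator; 1 - distance gives the
	// similarity score reported by the search API.
	rows, err := conn.Query(ctx,
		`SELECT key, content, 1 - (embedding <=> $1) AS similarity
		   FROM agent_memories
		  ORDER BY embedding <=> $1
		  LIMIT 5`,
		pgvector.NewVector(queryEmbedding))
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var key, content string
		var similarity float64
		if err := rows.Scan(&key, &content, &similarity); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s (%.2f): %s\n", key, similarity, content)
	}
}
```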

document_chunks Table

Knowledge ingested from external sources (Confluence, etc.) is stored here:

```sql
CREATE TABLE document_chunks (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    knowledge_source_id UUID REFERENCES knowledge_sources(id),
    content         TEXT NOT NULL,
    embedding       VECTOR(1536),
    metadata        JSONB,
    chunk_index     INTEGER,
    created_at      TIMESTAMP NOT NULL DEFAULT NOW()
);
```

Knowledge Ingestion

Confluence (Implemented)

Aether can ingest Confluence pages into the memory system for use as agent context:

  1. Configure the knowledge source via API (Confluence Cloud or Server)
  2. Trigger ingestion — Aether fetches pages, splits them into chunks, and generates embeddings
  3. Incremental sync — Subsequent ingestions only process pages modified since the last sync
  4. Retrieval — Embedded chunks are searchable alongside regular memories

Knowledge sources are tracked in the knowledge_sources table. Ingestion jobs and their status are tracked in ingestion_jobs.
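
The chunking in step 2 could be as simple as fixed-size windows with overlap. The sizes below are illustrative; Aether's actual chunking parameters are not documented here:

```go
// chunkText splits a document into overlapping fixed-size chunks prior to
// embedding. size and overlap are in runes; size=1000, overlap=200 are
// plausible values, not Aether's documented defaults.
func chunkText(text string, size, overlap int) []string {
	if size <= overlap {
		overlap = 0 // guard against a non-advancing window
	}
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += size - overlap {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}
```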

INFO

Connectors for WikiJS, BookStack, and raw HTML are on the roadmap. See Roadmap.

Embeddings Configuration

Semantic search requires an embedding model:

```bash
# Model used for generating embeddings
EMBEDDING_MODEL=text-embedding-3-small

# Must match the model's output dimension
# text-embedding-3-small: 1536
# text-embedding-3-large: 3072
EMBEDDING_DIMENSION=1536
```

The embedding model is configured in LiteLLM just like the chat model. Aether calls the embedding endpoint before storing memories and during search.
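
Because LiteLLM exposes an OpenAI-compatible API, that embedding call is a plain HTTP POST. A sketch, assuming a LiteLLM proxy at a placeholder URL:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type embeddingRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

func main() {
	body, _ := json.Marshal(embeddingRequest{
		Model: "text-embedding-3-small", // EMBEDDING_MODEL
		Input: []string{"authentication implementation decisions"},
	})
	// Placeholder URL: point this at your LiteLLM proxy.
	resp, err := http.Post("http://localhost:4000/v1/embeddings",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out embeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	if len(out.Data) == 0 {
		log.Fatal("no embedding returned")
	}
	fmt.Println("dimension:", len(out.Data[0].Embedding)) // should match EMBEDDING_DIMENSION
}
```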

In addition to semantic (vector) search, Aether supports full-text search over memories using PostgreSQL's built-in tsvector indexes. This complements semantic search for cases where exact keyword matches are needed.

Both search modes are available via the /api/memory/search endpoint.
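
As a sketch, the full-text path boils down to a tsquery against the GIN index shown earlier (reusing a pgx connection like the one in the pgvector example above):

```go
// Rank memories by full-text relevance. plainto_tsquery ANDs together the
// words of the user's query; ts_rank orders the matches.
rows, err := conn.Query(ctx,
	`SELECT key, content,
	        ts_rank(to_tsvector('english', content),
	                plainto_tsquery('english', $1)) AS rank
	   FROM agent_memories
	  WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
	  ORDER BY rank DESC
	  LIMIT 5`,
	"jwt expiry")
if err != nil {
	log.Fatal(err)
}
defer rows.Close()
```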

Memory via gRPC (External Agents)

External agents connected via the Agent Gateway have full access to the memory system through the Memory Hub RPCs:

| RPC | Description |
|-----|-------------|
| MemoryStore | Save a memory with key, content, scope, and optional TTL |
| MemoryRetrieve | Get a specific memory by key |
| MemoryList | List memories matching filters |
| MemoryDelete | Delete a memory by key |
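
Rounding out the earlier MemoryStore/MemoryRetrieve example, a sketch of the list and delete calls. The message and field names (MemoryListRequest, list.Memories, and so on) are assumptions to be checked against the gRPC API reference:

```go
// List project-scoped memories for this session, then delete one by key.
list, err := client.MemoryList(ctx, &pb.MemoryListRequest{
    SessionId: sessionID,
    Scope:     pb.MemoryScope_PROJECT,
})
if err != nil {
    log.Fatalf("list memories: %v", err)
}
for _, m := range list.Memories { // response field name assumed
    fmt.Println(m.Key)
}

if _, err := client.MemoryDelete(ctx, &pb.MemoryDeleteRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
}); err != nil {
    log.Fatalf("delete memory: %v", err)
}
```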

See Agent Runtime Protocol and gRPC API for details.

Memory Flow (API + Ingestion)

```
Developer / Agent --> REST / gRPC Memory APIs --> Memory Service -->
  ├─ Redis (short-term, TTL)
  ├─ PostgreSQL (long-term)
  └─ pgvector (semantic search)

Ingestion Sources (Confluence, Docs Folder, etc.) --> Chunking + Embeddings --> pgvector
```

Configuration (TTL, Embeddings, Storage)

| Variable | Description | Default |
|----------|-------------|---------|
| REDIS_URL | Redis instance for short-term memory + rate limiting | redis://localhost:6379 |
| REDIS_MEMORY_TTL | TTL (seconds) for short-term memories | 3600 |
| MEMORY_SEARCH_LIMIT | Max memories injected into prompts | 10 |
| EMBEDDING_MODEL | LiteLLM embedding model name | text-embedding-3-small |
| EMBEDDING_DIMENSION | Must match embedding model dimension | 1536 |
| PGVECTOR_DISTANCE_METRIC | cosine, l2, or ip | cosine |
| PGVECTOR_IVFFLAT_PROBES | IVF probes: search recall vs. speed trade-off | 10 |
| MEMORY_TOP_K | Results returned per search | 10 |

Scoping & TTL Rules

  • task scope always respects TTL (Redis first).
  • agent / project / organizational scopes persist in Postgres + pgvector; TTL only if provided.
  • Semantic search always runs against pgvector; full-text search uses Postgres tsvector.
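
Encoded as a routing decision, the rules above might look like this sketch; how expiry is enforced for persistent scopes is an assumption, not documented behavior:

```go
// storageFor maps a memory's scope and optional TTL to its backing store,
// per the rules above. Illustrative only.
func storageFor(scope string, ttlSeconds int) string {
	if scope == "task" {
		return "redis" // TTL always applied
	}
	if ttlSeconds > 0 {
		return "postgres+pgvector (with expiry)"
	}
	return "postgres+pgvector (persistent)"
}
```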

Developer Workflows

Store a memory (REST)

```bash
curl -X POST http://localhost:8000/api/memory \
  -H "Content-Type: application/json" \
  -d '{
    "key": "auth-decision-q1",
    "content": "Chose JWT with 15m access + refresh cookies",
    "scope": "project",
    "agent_id": "ta_leo",
    "project_id": 42,
    "ttl_seconds": 3600
  }'
```

Retrieve / list (REST)

```bash
# Retrieve by key
curl http://localhost:8000/api/memory/auth-decision-q1

# List filtered by scope + project
curl "http://localhost:8000/api/memory?project_id=42&scope=project&limit=20"
```

Semantic search (REST)

```bash
curl -X POST http://localhost:8000/api/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "jwt token expiry decision",
    "project_id": 42,
    "limit": 5
  }'
```

gRPC equivalents

  • MemoryStore with scope + optional ttl_seconds
  • MemoryRetrieve by key
  • MemoryList with scope, agent_id, project_id, limit
  • MemoryDelete to hard-delete

See gRPC API for field tables.

SDK code examples

See Memory SDK Examples for Python, Go, and TypeScript snippets covering store, retrieve, and semantic search.
