Agent Memory System
Active Development
The memory system is under active development. Core storage and semantic search are working; knowledge ingestion connectors and the full RAG pipeline are still evolving.
Aether's memory system gives agents persistent, context-aware knowledge across task executions. Rather than starting each task from scratch, agents can remember past decisions, access project documentation, and search for relevant information via semantic similarity.
Architecture
┌────────────────────────────────────────────────────────┐
│                   Agent LLM Pipeline                   │
│    System Prompt + Injected Memories + Task Context    │
├────────────────────────┬───────────────────────────────┤
│     Memory Service     │    Semantic Search Service    │
│     Scoping & ACL      │     (application/memory/)     │
├────────────────────────┼───────────────────────────────┤
│                     Storage Layer                      │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐   │
│  │   pgvector   │  │  PostgreSQL  │  │    Redis    │   │
│  │  (semantic)  │  │ (long-term)  │  │ (short-term)│   │
│  └──────────────┘  └──────────────┘  └─────────────┘   │
├────────────────────────────────────────────────────────┤
│                   Ingestion Pipeline                   │
│  ┌────────────┐  ┌─────────────────────────────────┐   │
│  │ Confluence │  │    WikiJS · BookStack · HTML    │   │
│  │ (working)  │  │     (planned — see roadmap)     │   │
│  └────────────┘  └─────────────────────────────────┘   │
└────────────────────────────────────────────────────────┘

Memory Tiers
| Tier | Storage | TTL | Status | Use Case |
|---|---|---|---|---|
| Short-term | Redis | Minutes to hours (configurable) | Working | Current task context, session state |
| Long-term | PostgreSQL | Persistent | Working | Agent memories, project decisions |
| Semantic search | pgvector | Persistent | Working | Find memories by meaning, not just keyword |
| Full-text search | PostgreSQL | Persistent | Working | Keyword-based memory search |
Memory Scopes
Memories are scoped to control visibility and access:
| Scope | Description | Accessible By |
|---|---|---|
| task | Specific task execution | That task only |
| agent | Specific agent across tasks | That agent |
| project | Specific project across agents | All agents on the project |
| organizational | Global | All agents, all projects |
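
The scope chosen at store time determines who can later read the memory. Below is a minimal sketch using the gRPC client introduced under Agent-Driven Memory further down; only MemoryScope_PROJECT appears elsewhere on this page, so the other enum values are assumed to follow the same naming pattern:

// Visible only to agent ta_leo, across all of its tasks
// (MemoryScope_AGENT is an assumed enum name).
client.MemoryStore(ctx, &pb.MemoryStoreRequest{
    SessionId: sessionID,
    Key:       "preferred-lint-config",
    Content:   "Use golangci-lint with the project ruleset",
    Scope:     pb.MemoryScope_AGENT,
})

// Visible to every agent in every project
// (MemoryScope_ORGANIZATIONAL is an assumed enum name).
client.MemoryStore(ctx, &pb.MemoryStoreRequest{
    SessionId: sessionID,
    Key:       "org-coding-standards",
    Content:   "All services expose /healthz and /readyz endpoints",
    Scope:     pb.MemoryScope_ORGANIZATIONAL,
})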
How Memory Is Used
Automatic Injection
When the LLM service processes a task, it automatically:
- Runs semantic search over long-term memories using the task content as the query
- Appends the most relevant memories to the agent's system prompt as context
- Lets the agent "remember" without any explicit API calls
The number of memories injected is controlled by MEMORY_SEARCH_LIMIT (default: 10).
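
Conceptually, the injection step behaves like the sketch below. This illustrates the behavior rather than Aether's actual internals; Memory, searchMemories, and the prompt layout are hypothetical stand-ins:

package main

import (
	"context"
	"fmt"
	"strings"
)

// Memory is a hypothetical shape for a stored memory.
type Memory struct {
	Scope   string
	Content string
}

// searchMemories stands in for the semantic-search call (hypothetical).
func searchMemories(ctx context.Context, query string, limit int) ([]Memory, error) {
	return []Memory{{Scope: "project", Content: "Chose JWT with 15-minute expiry"}}, nil
}

// buildSystemPrompt sketches how relevant memories are appended to the
// agent's system prompt before the task reaches the LLM.
func buildSystemPrompt(ctx context.Context, basePrompt, taskContent string) (string, error) {
	memories, err := searchMemories(ctx, taskContent, 10) // mirrors MEMORY_SEARCH_LIMIT
	if err != nil {
		return "", err
	}
	var b strings.Builder
	b.WriteString(basePrompt)
	b.WriteString("\n\n## Relevant memories\n")
	for _, m := range memories {
		fmt.Fprintf(&b, "- [%s] %s\n", m.Scope, m.Content)
	}
	return b.String(), nil
}

func main() {
	prompt, _ := buildSystemPrompt(context.Background(),
		"You are a senior engineer.", "Implement login token refresh")
	fmt.Println(prompt)
}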
Agent-Driven Memory
External agents (via gRPC) and internal processing can explicitly store memories:
// Store a decision for later
client.MemoryStore(ctx, &pb.MemoryStoreRequest{
SessionId: sessionID,
Key: "auth-decision-q1-2026",
Content: "Chose JWT with 15-minute expiry and HttpOnly refresh cookies",
Scope: pb.MemoryScope_PROJECT,
})
// Retrieve later
mem, _ := client.MemoryRetrieve(ctx, &pb.MemoryRetrieveRequest{
SessionId: sessionID,
Key: "auth-decision-q1-2026",
})

REST API
Semantic Search
Search memories using natural language:
curl -X POST http://localhost:8000/api/memory/search \
-H "Content-Type: application/json" \
-d '{
"query": "authentication implementation decisions",
"agent_id": "ta_leo",
"limit": 5
}'

Example response:

[
{
"key": "auth-decision-q1-2026",
"content": "Chose JWT with 15-minute expiry...",
"similarity": 0.92,
"scope": "project",
"created_at": "2026-01-15T10:00:00Z"
}
]

Store Memory
curl -X POST http://localhost:8000/api/memory \
-H "Content-Type: application/json" \
-d '{
"key": "api-design-decision",
"content": "Using REST over GraphQL for simplicity",
"agent_id": "ta_leo",
"scope": "project"
}'

List Memories
curl "http://localhost:8000/api/memory?agent_id=ta_leo&scope=project"Delete Memory
curl -X DELETE http://localhost:8000/api/memory/api-design-decision

Database Schema
agent_memories Table
CREATE TABLE agent_memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
key TEXT NOT NULL UNIQUE,
content TEXT NOT NULL,
embedding VECTOR(1536), -- pgvector field for semantic search
scope TEXT NOT NULL, -- task, agent, project, organizational
agent_id TEXT REFERENCES agents(id),
project_id INTEGER REFERENCES projects(id),
task_id TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- Index for semantic similarity search
CREATE INDEX ON agent_memories USING ivfflat (embedding vector_cosine_ops);
-- Full-text search index
CREATE INDEX ON agent_memories USING gin(to_tsvector('english', content));

document_chunks Table
Knowledge ingested from external sources (Confluence, etc.) is stored here:
CREATE TABLE document_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
knowledge_source_id UUID REFERENCES knowledge_sources(id),
content TEXT NOT NULL,
embedding VECTOR(1536),
metadata JSONB,
chunk_index INTEGER,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Knowledge Ingestion
Confluence (Implemented)
Aether can ingest Confluence pages into the memory system for use as agent context:
- Configure the knowledge source via API (Confluence Cloud or Server)
- Trigger ingestion — Aether fetches pages, splits them into chunks, generates embeddings
- Incremental sync — Subsequent ingestions only process pages modified since the last sync
- Retrieval — Embedded chunks are searchable alongside regular memories
Knowledge sources are tracked in the knowledge_sources table. Ingestion jobs and their status are tracked in ingestion_jobs.
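
At a high level, an ingestion job fetches pages, splits them into chunks, embeds each chunk, and writes rows into the document_chunks table shown above. A simplified sketch, in which fetchPages, chunkText, and embed are hypothetical helpers (pgvector.NewVector comes from github.com/pgvector/pgvector-go):

// Simplified ingestion sketch (hypothetical helper functions).
func ingestSource(ctx context.Context, db *sql.DB, sourceID string, since time.Time) error {
	pages, err := fetchPages(ctx, sourceID, since) // only pages modified since last sync
	if err != nil {
		return err
	}
	for _, page := range pages {
		for i, chunk := range chunkText(page.Body) {
			vec, err := embed(ctx, chunk) // calls the configured embedding model
			if err != nil {
				return err
			}
			_, err = db.ExecContext(ctx,
				`INSERT INTO document_chunks
				   (knowledge_source_id, content, embedding, chunk_index)
				 VALUES ($1, $2, $3, $4)`,
				sourceID, chunk, pgvector.NewVector(vec), i)
			if err != nil {
				return err
			}
		}
	}
	return nil
}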
INFO
Connectors for WikiJS, BookStack, and raw HTML are on the roadmap. See Roadmap.
Embeddings Configuration
Semantic search requires an embedding model:
# Model used for generating embeddings
EMBEDDING_MODEL=text-embedding-3-small
# Must match the model's output dimension
# text-embedding-3-small: 1536
# text-embedding-3-large: 3072
EMBEDDING_DIMENSION=1536

The embedding model is configured in LiteLLM just like the chat model. Aether calls the embedding endpoint before storing memories and during search.
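
To see what Aether receives back, you can call the embedding route on the LiteLLM proxy directly. A minimal sketch, assuming the proxy runs on its default port 4000 and exposes the OpenAI-compatible /v1/embeddings route without auth:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Request body follows the OpenAI embeddings API shape.
	body, _ := json.Marshal(map[string]any{
		"model": "text-embedding-3-small",
		"input": "authentication implementation decisions",
	})
	resp, err := http.Post("http://localhost:4000/v1/embeddings",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Data []struct {
			Embedding []float32 `json:"embedding"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	// Dimension must match EMBEDDING_DIMENSION (1536 for text-embedding-3-small).
	fmt.Println("dimension:", len(out.Data[0].Embedding))
}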
Full-Text Search
In addition to semantic (vector) search, Aether supports full-text search over memories using PostgreSQL's built-in tsvector indexes. This complements semantic search for cases where exact keyword matches are needed.
Both search modes are available via the /api/memory/search endpoint.
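
Under the hood, the two modes map onto the indexes from the schema section. A sketch of the raw queries, run here through database/sql; the connection string is a placeholder and the query vector would normally come from the embedding model:

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/aether?sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Full-text search via the GIN/tsvector index: exact keyword matching.
	ftRows, err := db.Query(`
		SELECT key, content
		  FROM agent_memories
		 WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
		 LIMIT 10`, "jwt expiry")
	if err != nil {
		panic(err)
	}
	defer ftRows.Close()

	// Semantic search via the ivfflat index: <=> is pgvector's cosine
	// distance operator. The argument must be a 1536-dim vector literal,
	// normally produced by embedding the query text first.
	queryVec := "[0.01, -0.02, 0.03]" // placeholder; real vectors have 1536 dims
	semRows, err := db.Query(`
		SELECT key, content, 1 - (embedding <=> $1::vector) AS similarity
		  FROM agent_memories
		 ORDER BY embedding <=> $1::vector
		 LIMIT 10`, queryVec)
	if err != nil {
		panic(err)
	}
	defer semRows.Close()

	for semRows.Next() {
		var key, content string
		var sim float64
		if err := semRows.Scan(&key, &content, &sim); err != nil {
			panic(err)
		}
		fmt.Printf("%.2f  %s\n", sim, key)
	}
}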
Memory via gRPC (External Agents)
External agents connected via the Agent Gateway have full access to the memory system through the Memory Hub RPCs:
| RPC | Description |
|---|---|
| MemoryStore | Save a memory with key, content, scope, and optional TTL |
| MemoryRetrieve | Get a specific memory by key |
| MemoryList | List memories matching filters |
| MemoryDelete | Delete a memory by key |
See Agent Runtime Protocol and gRPC API for details.
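
The list and delete RPCs follow the same pattern as the store/retrieve snippet above; the request field names here are assumed from the table rather than taken from the proto definition:

// List project-scoped memories for one agent (field names assumed).
list, _ := client.MemoryList(ctx, &pb.MemoryListRequest{
    SessionId: sessionID,
    Scope:     pb.MemoryScope_PROJECT,
    AgentId:   "ta_leo",
    Limit:     20,
})
_ = list // iterate the returned memories as needed

// Hard-delete a memory by key.
client.MemoryDelete(ctx, &pb.MemoryDeleteRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
})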
Memory Flow (API + Ingestion)
Developer / Agent --> REST / gRPC Memory APIs --> Memory Service -->
├─ Redis (short-term, TTL)
├─ PostgreSQL (long-term)
└─ pgvector (semantic search)
Ingestion Sources (Confluence, Docs Folder, etc.) --> Chunking + Embeddings --> pgvector

Configuration (TTL, Embeddings, Storage)
| Variable | Description | Default |
|---|---|---|
REDIS_URL | Redis instance for short-term memory + rate limiting | redis://localhost:6379 |
REDIS_MEMORY_TTL | TTL (seconds) for short-term memories | 3600 |
MEMORY_SEARCH_LIMIT | Max memories injected into prompts | 10 |
EMBEDDING_MODEL | LiteLLM embedding model name | text-embedding-3-small |
EMBEDDING_DIMENSION | Must match embedding model dimension | 1536 |
PGVECTOR_DISTANCE_METRIC | cosine, l2, or ip | cosine |
PGVECTOR_IVFFLAT_PROBES | IVF probes for search recall vs speed | 10 |
MEMORY_TOP_K | Results returned per search | 10 |
Scoping & TTL rules
- task scope always respects TTL (Redis first).
- agent/project/organizational scopes persist in Postgres + pgvector; TTL applies only if provided.
- Semantic search always runs against pgvector; full-text search uses Postgres tsvector.
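
As a concrete illustration of these rules: a task-scoped memory stored with a TTL is served from Redis and expires, while a project-scoped memory without one persists in Postgres + pgvector. MemoryScope_TASK and TtlSeconds are assumed proto names:

// Task-scoped + TTL: short-term, served from Redis, expires after an hour.
client.MemoryStore(ctx, &pb.MemoryStoreRequest{
    SessionId:  sessionID,
    Key:        "current-branch",
    Content:    "Working on feature/token-refresh",
    Scope:      pb.MemoryScope_TASK,
    TtlSeconds: 3600,
})

// Project-scoped, no TTL: persists in Postgres + pgvector.
client.MemoryStore(ctx, &pb.MemoryStoreRequest{
    SessionId: sessionID,
    Key:       "auth-decision-q1-2026",
    Content:   "Chose JWT with 15-minute expiry",
    Scope:     pb.MemoryScope_PROJECT,
})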
Developer Workflows
Store a memory (REST)
curl -X POST http://localhost:8000/api/memory \
-H "Content-Type: application/json" \
-d '{
"key": "auth-decision-q1",
"content": "Chose JWT with 15m access + refresh cookies",
"scope": "project",
"agent_id": "ta_leo",
"project_id": 42,
"ttl_seconds": 3600
}'

Retrieve / list (REST)
# Retrieve by key
curl http://localhost:8000/api/memory/auth-decision-q1
# List filtered by scope + project
curl "http://localhost:8000/api/memory?project_id=42&scope=project&limit=20"Semantic search (REST)
curl -X POST http://localhost:8000/api/memory/search \
-H "Content-Type: application/json" \
-d '{
"query": "jwt token expiry decision",
"project_id": 42,
"limit": 5
}'

gRPC equivalents
- MemoryStore with scope + optional ttl_seconds
- MemoryRetrieve by key
- MemoryList with scope, agent_id, project_id, limit
- MemoryDelete to hard-delete
See gRPC API for field tables.
SDK code examples
See Memory SDK Examples for Python, Go, and TypeScript snippets covering store, retrieve, and semantic search.
See Also
- Memory & Knowledge (User Guide) — User-facing guide to managing agent memory
- gRPC API — Memory Hub RPCs
- REST API — Memory REST endpoints
