AI Agentic Engine: Data Dictionary

This document defines the schema, logic, and purpose for every entity in the AI Engine. It is designed to support the Tri-State Architecture (Pipeline, Orchestrator, Persona), JIT Execution, and Strict Compliance Versioning.

1. Core Assets (The Building Blocks)

`Engine_KnowledgeBase`

Represents a collection of documents used for RAG (Retrieval Augmented Generation).

id (UUID): Primary Key.
name (String): Display name (e.g., "AML Policy Documents").
vector_provider (Enum): The backend service hosting the vectors (e.g., PINECONE, QDRANT, PGVECTOR).
embedding_model (String): The specific model used to embed the text (e.g., text-embedding-3-small). Critical: every document in an index must be embedded with the same model, because vectors produced by different models are not comparable.
retrieval_config (JSON): Default settings for querying this base (e.g., { "top_k": 5, "threshold": 0.7 }).
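
As a minimal sketch of how retrieval_config might be applied, the snippet below filters and ranks scored matches returned by a vector store. Only top_k and threshold come from the example above; the ScoredChunk shape and the "higher score means more relevant" convention are assumptions.

```typescript
// Mirrors the example retrieval_config: { "top_k": 5, "threshold": 0.7 }.
interface RetrievalConfig {
  top_k: number;
  threshold: number;
}

// Hypothetical shape of a match returned by the vector provider.
interface ScoredChunk {
  text: string;
  score: number; // assumed: higher means more relevant
}

// Keep matches at or above the threshold, best first, capped at top_k.
function applyRetrievalConfig(
  matches: ScoredChunk[],
  config: RetrievalConfig,
): ScoredChunk[] {
  return matches
    .filter((m) => m.score >= config.threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, config.top_k);
}
```

A node-level override could be merged over these defaults before calling the function, which is why the field is described as "default settings".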

`Engine_Document`

Represents a single file inside a Knowledge Base.

id (UUID): Primary Key.
kb_id (UUID): Foreign Key to Engine_KnowledgeBase.
source_url (String): The S3 path or original URL of the uploaded file.
status (Enum): INDEXING | READY | ERROR. Used to show UI spinners.
token_count (Int): Used for billing storage costs.

`Engine_Tool` (Container)

The parent container for a capability. Does not hold code, only identity.

id (UUID): Primary Key.
name (String): Unique identifier (e.g., company_house_search).
type (Enum):
- CODE: Internal JavaScript/TypeScript executed in a sandbox.
- API: External HTTP call defined by OpenAPI spec.
- MCP: Connection to a Model Context Protocol server.

`Engine_ToolVersion` (Immutable Snapshot)

The actual logic of a tool. Versioned so that changes to a tool cannot break flows that reference an earlier version.

id (UUID): Primary Key.
tool_id (UUID): Foreign Key to Engine_Tool.
version (Int): Sequential version number.
status (Enum): DRAFT (Mutable) | PUBLISHED (Immutable).
input_schema (JSON): Zod/JSON Schema defining arguments (e.g., { "company_number": "string" }). Passed to the LLM for function calling.
source_code (String): The executable logic (if type=CODE).
api_spec (String): The OpenAPI JSON (if type=API).
mcp_endpoint (String): The URL of the MCP server (if type=MCP).
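
The DRAFT/PUBLISHED rule above can be sketched as a guard on writes. This is an illustration under assumptions, not the engine's actual code: the function names updateSourceCode and nextDraft are hypothetical, and only a subset of the fields is modelled.

```typescript
type VersionStatus = "DRAFT" | "PUBLISHED";

// Subset of Engine_ToolVersion used for this sketch.
interface ToolVersion {
  id: string;
  tool_id: string;
  version: number;
  status: VersionStatus;
  source_code?: string;
}

// Edits are only allowed while the version is a DRAFT.
function updateSourceCode(v: ToolVersion, source: string): ToolVersion {
  if (v.status === "PUBLISHED") {
    throw new Error(`Tool version ${v.version} is published and immutable`);
  }
  return { ...v, source_code: source };
}

// Changing a published tool means starting the next sequential draft.
function nextDraft(latest: ToolVersion, newId: string): ToolVersion {
  return { ...latest, id: newId, version: latest.version + 1, status: "DRAFT" };
}
```

Because published rows never change, a flow node that stores a tool_version_id keeps running the exact logic it was tested against.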

`Engine_ToolSecret`

Maps abstract keys used in the tool's code to secrets that are resolved at runtime.

id (UUID): Primary Key.
tool_version_id (UUID): Foreign Key to Engine_ToolVersion.
key_reference (String): The variable name used in the code (e.g., API_KEY). The runner looks this up in the secure vault at runtime.
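
A minimal sketch of the runtime lookup described above. The SecretVault interface is hypothetical, since this document does not describe the vault client; the point is only that the tool version stores key names, never values.

```typescript
interface ToolSecret {
  id: string;
  tool_version_id: string;
  key_reference: string;
}

// Hypothetical vault interface; the real client is not specified here.
interface SecretVault {
  get(name: string): string | undefined;
}

// Resolve every declared key, failing fast if one is missing.
function resolveSecrets(
  secrets: ToolSecret[],
  vault: SecretVault,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const secret of secrets) {
    const value = vault.get(secret.key_reference);
    if (value === undefined) {
      throw new Error(`Missing secret: ${secret.key_reference}`);
    }
    resolved[secret.key_reference] = value;
  }
  return resolved;
}
```

Failing before the tool runs, rather than letting the code see an undefined key, keeps the error attributable to configuration instead of tool logic.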

2. Agent Definition (The "Brain")

`Engine_Agent` (Container)

The parent identity of an AI Worker (e.g., "Kira").

id (UUID): Primary Key.
name (String): Display name.

`Engine_AgentVersion` (Immutable Snapshot)

The configuration of the AI at a specific point in time.

id (UUID): Primary Key.
agent_id (UUID): Foreign Key to Engine_Agent.
version (Int): Sequential version number.
status (Enum): DRAFT | PUBLISHED.
builder_type (Enum): FORM (Simple) | FLOW (Built via Type C Flow).
source_flow_id (UUID): If builder_type=FLOW, links to the definition.
model_provider (String): e.g., OPENAI, ANTHROPIC.
model_name (String): e.g., gpt-4-turbo.
temperature (Float): 0.0 to 1.0. Controls randomness.
max_tokens (Int): Hard limit on output length.
system_prompt (String): The compiled instructions.
knowledge_base_id (UUID): Optional link to RAG data.
response_format (JSON): JSON Schema for Structured Output (e.g., force the agent to return valid JSON).
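
The constraints stated in the field list (temperature range, a hard token limit, and source_flow_id when builder_type=FLOW) could be checked before a draft is published. The sketch below is one possible validator; the function name and the error wording are assumptions.

```typescript
// Subset of Engine_AgentVersion fields relevant to validation.
interface AgentVersionDraft {
  builder_type: "FORM" | "FLOW";
  source_flow_id?: string;
  temperature: number;
  max_tokens: number;
}

// Returns a list of problems; an empty list means the draft passes.
function validateAgentVersion(v: AgentVersionDraft): string[] {
  const problems: string[] = [];
  if (!(v.temperature >= 0 && v.temperature <= 1)) {
    problems.push("temperature must be between 0.0 and 1.0");
  }
  if (v.max_tokens <= 0 || Math.floor(v.max_tokens) !== v.max_tokens) {
    problems.push("max_tokens must be a positive integer");
  }
  if (v.builder_type === "FLOW" && !v.source_flow_id) {
    problems.push("source_flow_id is required when builder_type is FLOW");
  }
  return problems;
}
```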

`Engine_AgentToolLink`

Many-to-Many relationship defining which tools an agent can use.

agent_version_id (UUID): FK.
tool_version_id (UUID): FK.

3. Flow Definition (The Architecture)

`Engine_Flow` (Container)

The parent container for a process (e.g., "Vera Due Diligence").

id (UUID): Primary Key.
name (String): Display name.
type (Enum):
- PIPELINE (Type A): Linear, no loops.
- ORCHESTRATOR (Type B): State machine, loops allowed.
- PERSONA (Type C): Compiles to System Prompt.
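
The "no loops" rule for PIPELINE flows can be enforced with a cycle check over the flow's edges. The sketch below uses a depth-first search; it only checks for cycles and does not enforce any stricter reading of "linear" (such as forbidding branches). The function names are hypothetical.

```typescript
type FlowType = "PIPELINE" | "ORCHESTRATOR" | "PERSONA";

interface EdgeRef {
  source_node_id: string;
  target_node_id: string;
}

// Depth-first search with three states per node:
// 0 = unvisited, 1 = on the current path, 2 = fully explored.
function hasCycle(edges: EdgeRef[]): boolean {
  const next: Record<string, string[]> = {};
  for (const e of edges) {
    const targets = next[e.source_node_id] ?? [];
    targets.push(e.target_node_id);
    next[e.source_node_id] = targets;
  }
  const state: Record<string, number> = {};
  const visit = (node: string): boolean => {
    state[node] = 1;
    for (const target of next[node] ?? []) {
      const s = state[target] ?? 0;
      if (s === 1) return true; // target is already on the current path
      if (s === 0 && visit(target)) return true;
    }
    state[node] = 2;
    return false;
  };
  return Object.keys(next).some((n) => (state[n] ?? 0) === 0 && visit(n));
}

// Only PIPELINE flows reject loops; ORCHESTRATOR flows allow them.
function validateFlowGraph(type: FlowType, edges: EdgeRef[]): void {
  if (type === "PIPELINE" && hasCycle(edges)) {
    throw new Error("PIPELINE flows must not contain loops");
  }
}
```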

`Engine_FlowTrigger`

Defines how a flow starts.

id (UUID): Primary Key.
flow_id (UUID): FK.
type (Enum): WEBHOOK | SCHEDULE | MANUAL | EVENT.
cron_expression (String): Standard Cron syntax (e.g., 0 9 * * 1).
webhook_slug (String): The URL path component (e.g., /api/hooks/v1/{slug}).
input_mapping (JSON): Maps incoming webhook payload fields to Flow Input Variables.
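
One way input_mapping could work is as a map from Flow Input Variable names to dot-separated paths in the webhook payload. That shape is an assumption made for illustration; the document does not specify the mapping syntax.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Walk a dot-separated path (e.g. "company.number") through the payload.
function getPath(payload: Json, path: string): Json | undefined {
  let current: Json | undefined = payload;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Assumed mapping shape: { flowInputVariable: "path.in.payload" }.
function mapTriggerInput(
  payload: Json,
  inputMapping: Record<string, string>,
): Record<string, Json> {
  const inputs: Record<string, Json> = {};
  for (const variable of Object.keys(inputMapping)) {
    const value = getPath(payload, inputMapping[variable]);
    if (value !== undefined) inputs[variable] = value;
  }
  return inputs;
}
```

The resulting object can then be checked against the flow version's input_schema before an execution is created.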

`Engine_FlowVariable`

Environment-specific configuration constants.

id (UUID): Primary Key.
flow_id (UUID): FK.
key (String): Variable name (e.g., RISK_THRESHOLD).
value (String): The value (e.g., 80).
environment (Enum): DEV | PROD. Allows testing different settings without changing the graph.
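
A minimal sketch of how a runner might load variables for one environment. The function name resolveVariables is hypothetical; the fields match the list above.

```typescript
type Environment = "DEV" | "PROD";

interface FlowVariable {
  id: string;
  flow_id: string;
  key: string;
  value: string;
  environment: Environment;
}

// Collect the key/value pairs that apply to one flow in one environment.
function resolveVariables(
  variables: FlowVariable[],
  flowId: string,
  environment: Environment,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const v of variables) {
    if (v.flow_id === flowId && v.environment === environment) {
      resolved[v.key] = v.value;
    }
  }
  return resolved;
}
```

Because value is stored as a String, a consumer that needs a number has to parse it, for example `Number(vars["RISK_THRESHOLD"])`.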

`Engine_FlowVersion` (The Executable Graph)

id (UUID): Primary Key.
flow_id (UUID): FK.
version (Int): Sequential number.
status (Enum): DRAFT | PUBLISHED.
input_schema (JSON): Defines what data is required to start this flow. Used to auto-generate UI forms.
output_schema (JSON): Defines the guaranteed shape of the final result.
compiled_graph (JSON): The optimized adjacency list used by the JIT Runner. Strips out UI metadata.
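
To illustrate what "adjacency list" and "strips out UI metadata" could mean in practice, the sketch below compiles node and edge rows into a map keyed by node ID and drops the UI label. The CompiledGraph shape is an assumption; the actual compiled format is not specified in this document.

```typescript
// Minimal views of Engine_Node and Engine_Edge rows.
interface DraftNode {
  id: string;
  type: string;
  label: string; // UI metadata
}

interface DraftEdge {
  source_node_id: string;
  target_node_id: string;
  type: string;
  condition_value?: string;
}

// Assumed compiled shape: node ID -> node type plus outgoing edges.
interface CompiledEdge {
  target: string;
  type: string;
  condition_value: string | null;
}

type CompiledGraph = Record<string, { type: string; edges: CompiledEdge[] }>;

function compileGraph(nodes: DraftNode[], edges: DraftEdge[]): CompiledGraph {
  const graph: CompiledGraph = {};
  for (const n of nodes) {
    graph[n.id] = { type: n.type, edges: [] }; // label is intentionally dropped
  }
  for (const e of edges) {
    const source = graph[e.source_node_id];
    if (!source || !graph[e.target_node_id]) {
      throw new Error(
        `Edge references unknown node: ${e.source_node_id} -> ${e.target_node_id}`,
      );
    }
    source.edges.push({
      target: e.target_node_id,
      type: e.type,
      condition_value: e.condition_value ?? null,
    });
  }
  return graph;
}
```

With this layout the runner can find a node's outgoing edges with one lookup instead of scanning the edge table on every step.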

`Engine_Node` (The Step)

A single unit of work in the graph.

id (UUID): Primary Key.
version_id (UUID): FK to Engine_FlowVersion.
label (String): UI Label.
type (Enum):
- AGENT: Calls an LLM.
- TOOL: Calls a specific tool directly.
- SUBFLOW: Triggers another Flow.
- ROUTER: Semantic (LLM) decision.
- LOGIC: Deterministic (Code) decision.
- HANDOVER: Pauses for user interaction.
- WEBHOOK_WAIT: Pauses for external event.
ref_agent_version_id (UUID): FK. If set, uses a specific version of a reusable Agent.
ref_tool_version_id (UUID): FK. If set, uses a specific version of a reusable Tool.
inline_system_prompt (String): If no reference, defines the prompt here.
router_rules (JSON): For Semantic Routers. Maps intents to Target Node IDs.
logic_conditions (JSON): For Logic Gates. JS expressions (e.g., input.score > 50).
wait_event_slug (String): For WEBHOOK_WAIT. The event name to listen for.
input_map (JSON): Maps outputs from previous nodes to inputs of this node (e.g., {{node_1.output}}).
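
The `{{node_1.output}}` placeholder syntax in input_map can be resolved with a small template function. This sketch assumes placeholders have the form `{{nodeId.field}}`, that previous results are available as a map of node ID to output object, and that resolved values are inserted as strings; none of these details are confirmed by this document.

```typescript
// Assumed shape: node ID -> that node's output fields.
type NodeOutputs = Record<string, Record<string, unknown>>;

// Matches {{nodeId.field}}; IDs may contain letters, digits, "_" and "-".
const PLACEHOLDER = /\{\{\s*([\w-]+)\.([\w-]+)\s*\}\}/g;

function renderTemplate(template: string, outputs: NodeOutputs): string {
  return template.replace(PLACEHOLDER, (_match, nodeId: string, field: string) => {
    const node = outputs[nodeId];
    if (!node || !(field in node)) {
      throw new Error(`Unresolved reference: ${nodeId}.${field}`);
    }
    return String(node[field]);
  });
}

// Resolve every entry of a node's input_map against earlier outputs.
function resolveInputMap(
  inputMap: Record<string, string>,
  outputs: NodeOutputs,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const key of Object.keys(inputMap)) {
    resolved[key] = renderTemplate(inputMap[key], outputs);
  }
  return resolved;
}
```

Throwing on an unresolved reference surfaces wiring mistakes at the node that has them, rather than passing a literal `{{...}}` string downstream.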

`Engine_Edge` (The Connection)

id (UUID): Primary Key.
source_node_id (UUID): FK.
target_node_id (UUID): FK.
type (Enum):
- DEFAULT: Standard path.
- SEMANTIC: Chosen by LLM Router.
- CONDITIONAL: Chosen by Logic Gate.
- ERROR: Followed if the Source Node fails/crashes.
condition_value (String): The value that triggers this path (e.g., "High Risk" or "true").
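
The four edge types suggest a routing rule along these lines: follow the ERROR edge on failure, otherwise follow the SEMANTIC or CONDITIONAL edge whose condition_value matches the node's decision, otherwise fall back to DEFAULT. The sketch below encodes that reading; the NodeResult shape and the fallback order are assumptions.

```typescript
type EdgeType = "DEFAULT" | "SEMANTIC" | "CONDITIONAL" | "ERROR";

interface Edge {
  id: string;
  source_node_id: string;
  target_node_id: string;
  type: EdgeType;
  condition_value?: string;
}

// Hypothetical summary of what a node produced.
interface NodeResult {
  failed: boolean;
  decision?: string; // e.g. "High Risk" from a router, "true" from a logic gate
}

function selectNextEdge(
  edges: Edge[],
  sourceNodeId: string,
  result: NodeResult,
): Edge | undefined {
  const outgoing = edges.filter((e) => e.source_node_id === sourceNodeId);
  if (result.failed) {
    return outgoing.filter((e) => e.type === "ERROR")[0];
  }
  if (result.decision !== undefined) {
    const matched = outgoing.filter(
      (e) =>
        (e.type === "SEMANTIC" || e.type === "CONDITIONAL") &&
        e.condition_value === result.decision,
    )[0];
    if (matched) return matched;
  }
  return outgoing.filter((e) => e.type === "DEFAULT")[0];
}
```

A return value of undefined means there is no path to follow, which a runner could treat as the end of the flow or, after a failure, as an unhandled error.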

4. Execution (The Runtime)

`Engine_Execution` (The Job)

id (UUID): Primary Key.
flow_version_id (UUID): FK. Links to the specific version used.
trigger_id (UUID): FK. Which trigger started this?
status (Enum): RUNNING | SLEEPING | PAUSED | COMPLETED | FAILED.
session_id (String): External ID for grouping (e.g., User Session).
global_memory (JSON): The "State Object." Accumulates data as the flow runs.

`Engine_StepState` (The Checkpoint)

id (UUID): Primary Key.
execution_id (UUID): FK.
current_node_id (String): The ID of the node that just finished or is about to run.
status (Enum): ACTIVE | AWAITING_CALLBACK | AWAITING_USER.
node_outputs (JSON): A map of every node ID to its result. Used for "Time Travel" debugging.
stack_trace (JSON): If failed, the error stack.
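
A rough sketch of how a checkpoint might be advanced after a node finishes. It assumes each checkpoint is written as a new value rather than updated in place, which is one way to keep earlier states inspectable for "Time Travel" debugging; the function name and that design choice are not taken from this document.

```typescript
type StepStatus = "ACTIVE" | "AWAITING_CALLBACK" | "AWAITING_USER";

// Subset of Engine_StepState used for this sketch.
interface StepState {
  execution_id: string;
  current_node_id: string;
  status: StepStatus;
  node_outputs: Record<string, unknown>;
}

// Record the finished node's output and point at the next node.
// The previous checkpoint object is left untouched.
function advanceCheckpoint(
  prev: StepState,
  finishedNodeId: string,
  output: unknown,
  nextNodeId: string,
): StepState {
  return {
    ...prev,
    current_node_id: nextNodeId,
    node_outputs: { ...prev.node_outputs, [finishedNodeId]: output },
  };
}
```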

`Engine_RuntimeArtifact` (Generated Files)

id (UUID): Primary Key.
execution_id (UUID): FK.
node_id (UUID): FK. Which node created this file?
filename (String): e.g., due_diligence_report.pdf.
storage_path (String): S3 Key.
mime_type (String): e.g., application/pdf.

`Engine_StepLog` (Billing & Debugging)

id (UUID): Primary Key.
execution_id (UUID): FK.
node_id (UUID): FK.
input_tokens (Int): Tokens sent to LLM.
output_tokens (Int): Tokens received from LLM.
cost_usd (Float): Calculated cost of this step.
duration_ms (Float): Execution time.
provider_response_id (String): OpenAI/Anthropic Request ID (for tracing).
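
As an illustration of how cost_usd could be derived from the token counts, the sketch below multiplies each count by a per-million-token price. The TokenPricing shape is hypothetical and the prices in the usage note are placeholders, not real provider rates.

```typescript
// Hypothetical pricing record; real rates depend on provider and model.
interface TokenPricing {
  input_per_million_usd: number;
  output_per_million_usd: number;
}

// Cost of one step from its token counts.
function computeCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: TokenPricing,
): number {
  return (
    (inputTokens * pricing.input_per_million_usd +
      outputTokens * pricing.output_per_million_usd) /
    1e6
  );
}

// Total cost of an execution is the sum over its step logs.
function totalCostUsd(logs: { cost_usd: number }[]): number {
  return logs.reduce((sum, log) => sum + log.cost_usd, 0);
}
```

With placeholder prices of 10 USD per million input tokens and 30 USD per million output tokens, a step with 1,000 input and 500 output tokens costs (1,000 × 10 + 500 × 30) / 1,000,000 = 0.025 USD.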

5. Interaction (Handover)

`Engine_ChatSession`

id (UUID): Primary Key.
execution_id (UUID): FK.
mode (Enum):
- CO_PILOT: Iterative. User can trigger re-runs.
- ANALYST: Consultative. Read-only report, Q&A only.
is_active (Boolean): True if the session is currently open.
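
The difference between the two modes can be expressed as a permission check. The sketch below assumes that re-runs also require an open session (is_active); that extra condition is an assumption, as is the function name.

```typescript
type ChatMode = "CO_PILOT" | "ANALYST";

interface ChatSession {
  id: string;
  execution_id: string;
  mode: ChatMode;
  is_active: boolean;
}

// Only CO_PILOT sessions may trigger re-runs; ANALYST is Q&A only.
function canTriggerRerun(session: ChatSession): boolean {
  return session.is_active && session.mode === "CO_PILOT";
}
```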

`Engine_ChatMessage`

id (UUID): Primary Key.
session_id (UUID): FK.
role (Enum): USER | ASSISTANT | SYSTEM | TOOL.
content (String): The message text.
tool_calls (JSON): If the agent used a tool (e.g., "Re-run Node"), the call details are stored here.
