
AI Agentic Engine: Data Dictionary

This document defines the schema, logic, and purpose for every entity in the AI Engine. It is designed to support the Tri-State Architecture (Pipeline, Orchestrator, Persona), JIT Execution, and Strict Compliance Versioning.


1. Core Assets (The Building Blocks)

Engine_KnowledgeBase

Represents a collection of documents used for RAG (Retrieval-Augmented Generation).

  • id (UUID): Primary Key.
  • name (String): Display name (e.g., "AML Policy Documents").
  • vector_provider (Enum): The backend service hosting the vectors (e.g., PINECONE, QDRANT, PGVECTOR).
  • embedding_model (String): The specific model used to embed the text (e.g., text-embedding-3-small). Critical: every vector in an index must come from the same model, because embeddings produced by different models are not comparable.
  • retrieval_config (JSON): Default settings for querying this base (e.g., { "top_k": 5, "threshold": 0.7 }).
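
As an illustration of how retrieval_config might be applied at query time, the sketch below filters and ranks scored chunks. Only top_k and threshold come from the dictionary; the ScoredChunk shape, the function name, and the reading of threshold as a minimum similarity score are assumptions.

```typescript
// Only top_k and threshold are taken from the dictionary above.
interface RetrievalConfig {
  top_k: number;
  threshold: number;
}

// Hypothetical shape for a vector search hit.
interface ScoredChunk {
  text: string;
  score: number; // similarity score, assumed higher is better
}

// Keep chunks at or above the threshold, best first, capped at top_k.
function applyRetrievalConfig(hits: ScoredChunk[], config: RetrievalConfig): ScoredChunk[] {
  return hits
    .filter((hit) => hit.score >= config.threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, config.top_k);
}

const selected = applyRetrievalConfig(
  [
    { text: "Section 4.2", score: 0.91 },
    { text: "Appendix B", score: 0.42 },
    { text: "Section 1.1", score: 0.78 },
  ],
  { top_k: 5, threshold: 0.7 },
);
```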

Engine_Document

Represents a single file inside a Knowledge Base.

  • id (UUID): Primary Key.
  • kb_id (UUID): Foreign Key to Engine_KnowledgeBase.
  • source_url (String): The S3 path or original URL of the uploaded file.
  • status (Enum): INDEXING | READY | ERROR. Used to show UI spinners.
  • token_count (Int): Used for billing storage costs.

Engine_Tool (Container)

The parent container for a capability. Does not hold code, only identity.

  • id (UUID): Primary Key.
  • name (String): Unique identifier (e.g., company_house_search).
  • type (Enum):
    • CODE: Internal JavaScript/TypeScript executed in a sandbox.
    • API: External HTTP call defined by OpenAPI spec.
    • MCP: Connection to a Model Context Protocol server.

Engine_ToolVersion (Immutable Snapshot)

The actual logic of a tool. Versioned so that changes to a tool cannot break flows that reference an earlier version.

  • id (UUID): Primary Key.
  • tool_id (UUID): Foreign Key to Engine_Tool.
  • version (Int): Sequential version number.
  • status (Enum): DRAFT (Mutable) | PUBLISHED (Immutable).
  • input_schema (JSON): Zod/JSON Schema defining the tool's arguments (e.g., { "type": "object", "properties": { "company_number": { "type": "string" } } }). Passed to the LLM for function calling.
  • source_code (String): The executable logic (if type=CODE).
  • api_spec (String): The OpenAPI JSON (if type=API).
  • mcp_endpoint (String): The URL of the MCP server (if type=MCP).
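
The DRAFT/PUBLISHED rule can be pictured as a guard on writes. This is a minimal sketch, not the engine's actual code: the function names are hypothetical, and it assumes that changing a published tool means creating a new DRAFT with the next version number.

```typescript
type VersionStatus = "DRAFT" | "PUBLISHED";

interface ToolVersion {
  tool_id: string;
  version: number;
  status: VersionStatus;
  source_code: string;
}

// Edits are only accepted while the version is still a DRAFT.
function editSourceCode(v: ToolVersion, source_code: string): ToolVersion {
  if (v.status === "PUBLISHED") {
    throw new Error(`Tool version ${v.version} is PUBLISHED and cannot be modified`);
  }
  return { ...v, source_code };
}

// Publishing returns a frozen copy of the snapshot.
function publish(v: ToolVersion): ToolVersion {
  return Object.freeze({ ...v, status: "PUBLISHED" as const });
}

// Assumption: a change to a published tool starts a new DRAFT at version + 1.
function nextDraft(latest: ToolVersion): ToolVersion {
  return { ...latest, version: latest.version + 1, status: "DRAFT" };
}
```

Flows that pin a published tool_version_id keep running the frozen snapshot while the new draft is edited.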

Engine_ToolSecret

Maps abstract keys in the code to environment variables.

  • id (UUID): Primary Key.
  • tool_version_id (UUID): Foreign Key to Engine_ToolVersion.
  • key_reference (String): The variable name used in the code (e.g., API_KEY). The runner looks this up in the secure vault at runtime.
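
A rough sketch of the runtime lookup follows. The vault here is a plain object standing in for the secure vault, and resolveSecrets is a hypothetical helper; the real runner's interface is not described in this document.

```typescript
interface ToolSecret {
  tool_version_id: string;
  key_reference: string;
}

// Hypothetical stand-in for the secure vault: key_reference -> secret value.
type Vault = Record<string, string>;

// Build the secrets object handed to the sandbox for one tool version.
function resolveSecrets(secrets: ToolSecret[], toolVersionId: string, vault: Vault): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const secret of secrets) {
    if (secret.tool_version_id !== toolVersionId) continue;
    const value = vault[secret.key_reference];
    if (value === undefined) {
      // Fail before the tool runs rather than passing an empty credential.
      throw new Error(`Missing secret for key_reference "${secret.key_reference}"`);
    }
    resolved[secret.key_reference] = value;
  }
  return resolved;
}
```

Because only the key name is stored in Engine_ToolSecret, the secret value never appears in source_code or in the database row.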

2. Agent Definition (The "Brain")

Engine_Agent (Container)

The parent identity of an AI Worker (e.g., "Kira").

  • id (UUID): Primary Key.
  • name (String): Display name.

Engine_AgentVersion (Immutable Snapshot)

The configuration of the AI at a specific point in time.

  • id (UUID): Primary Key.
  • agent_id (UUID): Foreign Key to Engine_Agent.
  • version (Int): Sequential version number.
  • status (Enum): DRAFT | PUBLISHED.
  • builder_type (Enum): FORM (Simple) | FLOW (Built via Type C Flow).
  • source_flow_id (UUID): If builder_type=FLOW, links to the definition.
  • model_provider (String): e.g., OPENAI, ANTHROPIC.
  • model_name (String): e.g., gpt-4-turbo.
  • temperature (Float): 0.0 to 1.0. Controls randomness.
  • max_tokens (Int): Hard limit on output length.
  • system_prompt (String): The compiled instructions.
  • knowledge_base_id (UUID): Optional Foreign Key to Engine_KnowledgeBase (RAG data).
  • response_format (JSON): JSON Schema for Structured Output. When set, the agent is forced to return JSON that matches this schema.
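
The constraints stated in the field list (temperature range, builder_type=FLOW requiring a source flow) could be checked before a version is saved. The sketch below is one possible validator; the function name, the error messages, and the requirement that max_tokens be a positive integer are assumptions.

```typescript
interface AgentVersionDraft {
  builder_type: "FORM" | "FLOW";
  source_flow_id?: string;
  temperature: number;
  max_tokens: number;
}

// Returns a list of problems; an empty list means the draft passes these checks.
function validateAgentVersion(v: AgentVersionDraft): string[] {
  const errors: string[] = [];
  if (v.temperature < 0 || v.temperature > 1) {
    errors.push("temperature must be between 0.0 and 1.0");
  }
  if (Math.floor(v.max_tokens) !== v.max_tokens || v.max_tokens <= 0) {
    errors.push("max_tokens must be a positive integer");
  }
  if (v.builder_type === "FLOW" && !v.source_flow_id) {
    errors.push("source_flow_id is required when builder_type is FLOW");
  }
  return errors;
}
```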

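Join table for the many-to-many relationship that defines which tool versions an agent version can use.

  • agent_version_id (UUID): Foreign Key to Engine_AgentVersion.
  • tool_version_id (UUID): Foreign Key to Engine_ToolVersion.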

3. Flow Definition (The Architecture)

Engine_Flow (Container)

The parent container for a process (e.g., "Vera Due Diligence").

  • id (UUID): Primary Key.
  • name (String): Display name.
  • type (Enum):
    • PIPELINE (Type A): Linear, no loops.
    • ORCHESTRATOR (Type B): State machine, loops allowed.
    • PERSONA (Type C): Compiles to System Prompt.
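
The "no loops" rule for PIPELINE flows can be enforced with a cycle check over the flow's edges. The sketch below uses a depth-first search; the EdgeRef shape borrows source_node_id and target_node_id from Engine_Edge, while the function names and the point at which the check runs are assumptions.

```typescript
type FlowType = "PIPELINE" | "ORCHESTRATOR" | "PERSONA";

interface EdgeRef {
  source_node_id: string;
  target_node_id: string;
}

// Depth-first search that reports a cycle when it meets a node still in progress.
function hasCycle(edges: EdgeRef[]): boolean {
  const adjacency: Record<string, string[]> = {};
  for (const edge of edges) {
    const targets = adjacency[edge.source_node_id] ?? [];
    targets.push(edge.target_node_id);
    adjacency[edge.source_node_id] = targets;
  }
  const state: Record<string, "in_progress" | "done"> = {};
  const visit = (node: string): boolean => {
    if (state[node] === "in_progress") return true; // back edge: loop found
    if (state[node] === "done") return false;
    state[node] = "in_progress";
    for (const next of adjacency[node] ?? []) {
      if (visit(next)) return true;
    }
    state[node] = "done";
    return false;
  };
  for (const node of Object.keys(adjacency)) {
    if (visit(node)) return true;
  }
  return false;
}

// Only PIPELINE flows are rejected for containing loops.
function validateFlowGraph(type: FlowType, edges: EdgeRef[]): void {
  if (type === "PIPELINE" && hasCycle(edges)) {
    throw new Error("PIPELINE flows must not contain loops");
  }
}
```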

Engine_FlowTrigger

Defines how a flow starts.

  • id (UUID): Primary Key.
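  • flow_id (UUID): FK to Engine_Flow.
  • type (Enum): WEBHOOK | SCHEDULE | MANUAL | EVENT.
  • cron_expression (String): Standard Cron syntax, used when type=SCHEDULE (e.g., 0 9 * * 1 runs every Monday at 09:00).
  • webhook_slug (String): The URL path component that identifies this trigger, used when type=WEBHOOK (e.g., the {slug} in /api/hooks/v1/{slug}).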
  • input_mapping (JSON): Maps incoming webhook payload fields to Flow Input Variables.
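
The dictionary does not fix a format for input_mapping. As one possible reading, the sketch below assumes each entry maps a Flow Input Variable name to a dot-separated path into the webhook payload; getPath and mapWebhookPayload are hypothetical helpers.

```typescript
// Walk a dot-separated path such as "company.number" through a parsed JSON payload.
function getPath(payload: unknown, path: string): unknown {
  let current: unknown = payload;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Assumed mapping format: { flowInputName: "path.in.payload" }.
function mapWebhookPayload(payload: unknown, inputMapping: Record<string, string>): Record<string, unknown> {
  const inputs: Record<string, unknown> = {};
  for (const inputName of Object.keys(inputMapping)) {
    inputs[inputName] = getPath(payload, inputMapping[inputName]);
  }
  return inputs;
}

const flowInputs = mapWebhookPayload(
  { company: { number: "12345678" }, requested_by: "ops" },
  { company_number: "company.number", requester: "requested_by" },
);
```

Paths that do not exist in the payload resolve to undefined, so the result can then be validated against the flow version's input_schema.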

Engine_FlowVariable

Environment-specific configuration constants.

  • id (UUID): Primary Key.
  • flow_id (UUID): FK to Engine_Flow.
  • key (String): Variable name (e.g., RISK_THRESHOLD).
  • value (String): The value, stored as a string (e.g., "80").
  • environment (Enum): DEV | PROD. Allows testing different settings without changing the graph.
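
A minimal sketch of how a runner might pick the variables for one environment is shown below. The resolveVariables helper is hypothetical; the row fields come from the list above.

```typescript
type Environment = "DEV" | "PROD";

interface FlowVariable {
  flow_id: string;
  key: string;
  value: string;
  environment: Environment;
}

// Collect the key/value pairs that apply to one flow in one environment.
function resolveVariables(rows: FlowVariable[], flowId: string, env: Environment): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const row of rows) {
    if (row.flow_id === flowId && row.environment === env) {
      resolved[row.key] = row.value;
    }
  }
  return resolved;
}

const variableRows: FlowVariable[] = [
  { flow_id: "f1", key: "RISK_THRESHOLD", value: "60", environment: "DEV" },
  { flow_id: "f1", key: "RISK_THRESHOLD", value: "80", environment: "PROD" },
];

// Values are strings, so numeric settings are parsed at the point of use.
const prodThreshold = Number(resolveVariables(variableRows, "f1", "PROD")["RISK_THRESHOLD"]);
```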

Engine_FlowVersion (The Executable Graph)

  • id (UUID): Primary Key.
  • flow_id (UUID): FK to Engine_Flow.
  • version (Int): Sequential number.
  • status (Enum): DRAFT | PUBLISHED.
  • input_schema (JSON): Defines what data is required to start this flow. Used to auto-generate UI forms.
  • output_schema (JSON): Defines the guaranteed shape of the final result.
  • compiled_graph (JSON): The optimized adjacency list used by the JIT Runner. UI metadata is stripped out during compilation.
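
The exact layout of compiled_graph is not specified here. The sketch below shows one way an adjacency list could be built from node and edge rows while dropping UI-only fields; the CompiledNode shape and the position field are assumptions, and label is treated as UI metadata.

```typescript
interface DraftNode {
  id: string;
  type: string;
  label: string; // UI label, dropped at compile time
  position?: { x: number; y: number }; // hypothetical canvas coordinates
}

interface DraftEdge {
  source_node_id: string;
  target_node_id: string;
  type: string;
  condition_value?: string;
}

// Hypothetical compiled shape: each node ID maps to its type and outgoing edges.
interface CompiledNode {
  type: string;
  next: { target: string; type: string; condition_value?: string }[];
}

function compileGraph(nodes: DraftNode[], edges: DraftEdge[]): Record<string, CompiledNode> {
  const graph: Record<string, CompiledNode> = {};
  for (const node of nodes) {
    graph[node.id] = { type: node.type, next: [] }; // label and position are not copied
  }
  for (const edge of edges) {
    const source = graph[edge.source_node_id];
    if (!source || !graph[edge.target_node_id]) {
      throw new Error("Edge references a node that is not in this flow version");
    }
    source.next.push({
      target: edge.target_node_id,
      type: edge.type,
      condition_value: edge.condition_value,
    });
  }
  return graph;
}
```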

Engine_Node (The Step)

A single unit of work in the graph.

  • id (UUID): Primary Key.
  • version_id (UUID): FK to Engine_FlowVersion.
  • label (String): UI Label.
  • type (Enum):
    • AGENT: Calls an LLM.
    • TOOL: Calls a specific tool directly.
    • SUBFLOW: Triggers another Flow.
    • ROUTER: Semantic (LLM) decision.
    • LOGIC: Deterministic (Code) decision.
    • HANDOVER: Pauses for user interaction.
    • WEBHOOK_WAIT: Pauses for external event.
  • ref_agent_version_id (UUID): FK. If set, uses a specific version of a reusable Agent.
  • ref_tool_version_id (UUID): FK. If set, uses a specific version of a reusable Tool.
  • inline_system_prompt (String): If ref_agent_version_id is not set, the prompt is defined here.
  • router_rules (JSON): For ROUTER nodes. Maps intents to target node IDs.
  • logic_conditions (JSON): For LOGIC nodes. JS expressions (e.g., input.score > 50).
  • wait_event_slug (String): For WEBHOOK_WAIT. The event name to listen for.
  • input_map (JSON): Maps outputs from previous nodes to inputs of this node (e.g., {{node_1.output}}).
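
To make the input_map placeholder concrete, the sketch below resolves {{node_id.output}} references against a map of earlier node results. The dictionary only shows the {{node_1.output}} form; support for a trailing field path such as {{node_1.output.score}} and the helper names are assumptions.

```typescript
// Resolve one input_map value. Strings that are not placeholders pass through unchanged.
function resolveTemplate(template: string, nodeOutputs: Record<string, unknown>): unknown {
  const match = /^\{\{\s*([\w-]+)\.output((?:\.[\w-]+)*)\s*\}\}$/.exec(template);
  if (!match) return template;
  const nodeId = match[1];
  const path = match[2] ? match[2].slice(1).split(".") : [];
  let value: unknown = nodeOutputs[nodeId];
  for (const key of path) {
    if (typeof value !== "object" || value === null) return undefined;
    value = (value as Record<string, unknown>)[key];
  }
  return value;
}

// Resolve every entry of a node's input_map.
function resolveInputMap(inputMap: Record<string, string>, nodeOutputs: Record<string, unknown>): Record<string, unknown> {
  const inputs: Record<string, unknown> = {};
  for (const inputName of Object.keys(inputMap)) {
    inputs[inputName] = resolveTemplate(inputMap[inputName], nodeOutputs);
  }
  return inputs;
}

const nodeInputs = resolveInputMap(
  { risk_score: "{{node_1.output.score}}", company: "{{node_1.output}}", mode: "strict" },
  { node_1: { score: 72, company_name: "Acme Ltd" } },
);
```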

Engine_Edge (The Connection)

  • id (UUID): Primary Key.
  • source_node_id (UUID): FK to Engine_Node.
  • target_node_id (UUID): FK to Engine_Node.
  • type (Enum):
    • DEFAULT: Standard path.
    • SEMANTIC: Chosen by LLM Router.
    • CONDITIONAL: Chosen by Logic Gate.
    • ERROR: Followed if the Source Node fails/crashes.
  • condition_value (String): The value that triggers this path (e.g., "High Risk" or "true").
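
The edge types suggest a selection order when a node finishes. The sketch below is one plausible interpretation, not the documented runner behaviour: it assumes a failed node follows its ERROR edge, a routed decision is matched against condition_value, and DEFAULT is the fallback. The NodeResult shape is hypothetical.

```typescript
type EdgeType = "DEFAULT" | "SEMANTIC" | "CONDITIONAL" | "ERROR";

interface Edge {
  source_node_id: string;
  target_node_id: string;
  type: EdgeType;
  condition_value?: string;
}

// Hypothetical result shape: decision holds the router intent or the
// stringified logic result (e.g., "High Risk" or "true").
interface NodeResult {
  failed: boolean;
  decision?: string;
}

// Returns the next node ID, or null when no outgoing edge applies.
function pickNextNode(nodeId: string, result: NodeResult, edges: Edge[]): string | null {
  const outgoing = edges.filter((e) => e.source_node_id === nodeId);
  if (result.failed) {
    for (const e of outgoing) {
      if (e.type === "ERROR") return e.target_node_id;
    }
    return null; // no ERROR edge: the execution would be marked FAILED
  }
  if (result.decision !== undefined) {
    for (const e of outgoing) {
      const routed = e.type === "SEMANTIC" || e.type === "CONDITIONAL";
      if (routed && e.condition_value === result.decision) return e.target_node_id;
    }
  }
  for (const e of outgoing) {
    if (e.type === "DEFAULT") return e.target_node_id;
  }
  return null;
}
```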

4. Execution (The Runtime)

Engine_Execution (The Job)

  • id (UUID): Primary Key.
  • flow_version_id (UUID): FK to Engine_FlowVersion. Links to the specific version used.
  • trigger_id (UUID): FK to Engine_FlowTrigger. Identifies the trigger that started this execution.
  • status (Enum): RUNNING | SLEEPING | PAUSED | COMPLETED | FAILED.
  • session_id (String): External ID for grouping (e.g., User Session).
  • global_memory (JSON): The "State Object." Accumulates data as the flow runs.

Engine_StepState (The Checkpoint)

  • id (UUID): Primary Key.
  • execution_id (UUID): FK to Engine_Execution.
  • current_node_id (UUID): FK to Engine_Node. The ID of the node that just finished or is about to run.
  • status (Enum): ACTIVE | AWAITING_CALLBACK | AWAITING_USER.
  • node_outputs (JSON): A map of every node ID to its result. Used for "Time Travel" debugging.
  • stack_trace (JSON): If failed, the error stack.

Engine_RuntimeArtifact (Generated Files)

  • id (UUID): Primary Key.
  • execution_id (UUID): FK to Engine_Execution.
  • node_id (UUID): FK to Engine_Node. The node that created this file.
  • filename (String): e.g., due_diligence_report.pdf.
  • storage_path (String): S3 Key.
  • mime_type (String): e.g., application/pdf.

Engine_StepLog (Billing & Debugging)

  • id (UUID): Primary Key.
  • execution_id (UUID): FK to Engine_Execution.
  • node_id (UUID): FK to Engine_Node.
  • input_tokens (Int): Tokens sent to LLM.
  • output_tokens (Int): Tokens received from LLM.
  • cost_usd (Float): Calculated cost of this step.
  • duration_ms (Float): Execution time in milliseconds.
  • provider_response_id (String): OpenAI/Anthropic Request ID (for tracing).
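
As a sketch of how cost_usd might be derived and rolled up for billing, the code below prices a step from its token counts and sums the logs of one execution. The ModelPricing shape and the per-million-token rates are hypothetical; real provider prices are not part of this dictionary.

```typescript
interface StepLog {
  execution_id: string;
  node_id: string;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
  duration_ms: number;
}

// Hypothetical pricing record: USD per one million tokens.
interface ModelPricing {
  input_per_million_usd: number;
  output_per_million_usd: number;
}

// Cost of a single LLM call from its token counts.
function stepCostUsd(inputTokens: number, outputTokens: number, pricing: ModelPricing): number {
  return (inputTokens * pricing.input_per_million_usd + outputTokens * pricing.output_per_million_usd) / 1e6;
}

// Sum cost and duration over every step of one execution.
function totalsForExecution(logs: StepLog[], executionId: string): { cost_usd: number; duration_ms: number } {
  let cost_usd = 0;
  let duration_ms = 0;
  for (const log of logs) {
    if (log.execution_id !== executionId) continue;
    cost_usd += log.cost_usd;
    duration_ms += log.duration_ms;
  }
  return { cost_usd, duration_ms };
}
```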

5. Interaction (Handover)

Engine_ChatSession

  • id (UUID): Primary Key.
  • execution_id (UUID): FK to Engine_Execution.
  • mode (Enum):
    • CO_PILOT: Iterative. User can trigger re-runs.
    • ANALYST: Consultative. Read-only report, Q&A only.
  • is_active (Boolean): True if the session is currently open.
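
The difference between the two modes can be expressed as a permission check. In this sketch the action names are hypothetical, and the rule that a closed session accepts no actions is an assumption.

```typescript
type ChatMode = "CO_PILOT" | "ANALYST";

// Hypothetical action names for what a user can ask of the session.
type ChatAction = "ASK_QUESTION" | "RERUN_NODE";

interface ChatSession {
  mode: ChatMode;
  is_active: boolean;
}

function isActionAllowed(session: ChatSession, action: ChatAction): boolean {
  if (!session.is_active) return false; // assumption: closed sessions accept nothing
  if (action === "ASK_QUESTION") return true; // Q&A is available in both modes
  return session.mode === "CO_PILOT"; // only CO_PILOT may trigger re-runs
}
```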

Engine_ChatMessage

  • id (UUID): Primary Key.
  • session_id (UUID): FK to Engine_ChatSession.
  • role (Enum): USER | ASSISTANT | SYSTEM | TOOL.
  • content (String): The message text.
  • tool_calls (JSON): If the agent used a tool (e.g., "Re-run Node"), the call details are stored here.