AI Core Component Documentation

Overview

The aicore module is the centralized AI infrastructure layer that provides a plugin-based architecture for integrating multiple AI providers (OpenAI, Anthropic, Perplexity, Tavily) into the application. It acts as an abstraction layer between high-level AI services and specific AI provider APIs, enabling dynamic model discovery, intelligent model selection, and automatic failover.

Key Responsibilities:

  • Dynamic discovery and registration of AI connectors (plugins)
  • Model registry with unified model metadata
  • Intelligent model selection based on operation type, context size, and optimization criteria
  • Automatic failover between models
  • Standardized interface for AI operations across all providers

Architecture

System Architecture Overview

graph TB
    subgraph "Application Layer"
        Routes[FastAPI Routes<br/>routeWorkflows.py<br/>routeChatPlayground.py]
    end
    
    subgraph "Service Layer"
        AiService[AiService<br/>mainServiceAi.py]
        Methods[callAiPlanning<br/>callAiDocuments<br/>callAiText]
        AiService --> Methods
    end
    
    subgraph "Interface Layer"
        AiObjects[AiObjects<br/>interfaceAiObjects.py]
        CallHandler[call request<br/>Handles failover & model calls]
        AiObjects --> CallHandler
    end
    
    subgraph "AI Core Layer"
        Registry[ModelRegistry<br/>discoverConnectors<br/>registerConnector<br/>getAvailableModels]
        Selector[ModelSelector<br/>selectModel<br/>getFailoverModelList<br/>scoring logic]
        Base[BaseConnectorAi<br/>getModels<br/>getConnectorType<br/>getCachedModels]
        
        Registry -.-> Selector
        Selector -.-> Base
    end
    
    subgraph "Plugin Connectors"
        OpenAI[aicorePluginOpenai]
        Anthropic[aicorePluginAnthropic]
        Perplexity[aicorePluginPerplexity]
        Tavily[aicorePluginTavily]
    end
    
    subgraph "AI Provider APIs"
        OpenAI_API[OpenAI API<br/>api.openai.com]
        Anthropic_API[Anthropic API<br/>api.anthropic.com]
        Perplexity_API[Perplexity API<br/>api.perplexity.ai]
        Tavily_API[Tavily API<br/>api.tavily.com]
    end
    
    Routes --> AiService
    AiService --> AiObjects
    AiObjects --> Registry
    AiObjects --> Selector
    
    Base --> OpenAI
    Base --> Anthropic
    Base --> Perplexity
    Base --> Tavily
    
    OpenAI --> OpenAI_API
    Anthropic --> Anthropic_API
    Perplexity --> Perplexity_API
    Tavily --> Tavily_API
    
    style Routes fill:#e1f5ff
    style AiService fill:#fff3e0
    style AiObjects fill:#f3e5f5
    style Registry fill:#e8f5e9
    style Selector fill:#e8f5e9
    style Base fill:#e8f5e9
    style OpenAI fill:#fff9c4
    style Anthropic fill:#fff9c4
    style Perplexity fill:#fff9c4
    style Tavily fill:#fff9c4

Component Structure

The aicore module is organized into several key files:

  • aicoreBase.py: Defines the abstract base class that all AI connectors must inherit from, establishing the contract for plugin implementations
  • aicoreModelRegistry.py: Manages the centralized registry of all available AI models across all connectors
  • aicoreModelSelector.py: Implements the intelligent model selection algorithm based on multiple criteria
  • aicorePlugin*.py files: Individual connector implementations for each AI provider (OpenAI, Anthropic, Perplexity, Tavily, and potentially internal systems)

Each plugin file follows the naming convention aicorePlugin<Provider>.py, which enables the automatic discovery mechanism to find and register them at startup without requiring manual configuration.

Core Components

1. BaseConnectorAi (aicoreBase.py)

The abstract base class that establishes the contract for all AI connector implementations. This class ensures that every AI provider plugin implements a consistent interface, making the system extensible and maintainable.

Core Responsibilities:

The base connector defines several essential methods that every plugin must implement:

  • Model Discovery: Each connector provides its list of available models through getModels(), which returns comprehensive metadata about each model including capabilities, costs, and performance characteristics
  • Connector Identification: The getConnectorType() method returns a unique identifier string for the connector (such as "openai" or "anthropic"), used throughout the system for routing and logging
  • Cached Model Access: The getCachedModels() method provides performance optimization by returning cached model metadata with automatic TTL (Time-To-Live) validation
  • Model Lookup: Utility methods like getModelByDisplayName() enable quick retrieval of specific models by their unique identifiers
  • Cache Management: The clearCache() method allows manual cache invalidation when model configurations need immediate refresh

Critical Design Principle - Unique Display Names:

The system enforces a strict uniqueness constraint on model display names across all connectors. While the name field (used for actual API calls) can be duplicated across different model instances (for example, "gpt-4o" might have multiple instances for different use cases), the displayName must be globally unique. This serves as the primary key in the model registry and prevents configuration conflicts. Examples of unique display names include "OpenAI GPT-4o", "OpenAI GPT-4o Instance Vision", and "Anthropic Claude 3 Opus".

Performance Optimization Through Caching:

To minimize unnecessary operations, the base connector implements a sophisticated caching mechanism with a 5-minute TTL. When getCachedModels() is called, the system checks if cached data exists and if the last update timestamp is within the 300-second window. If the cache is still valid, it returns the cached models immediately, avoiding the overhead of regenerating model metadata. If the cache has expired, it automatically refreshes by calling getModels() and updates both the cache and timestamp. This approach significantly reduces computational overhead during high-frequency operations while ensuring data freshness.
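
A minimal sketch of this TTL logic is shown below. The internal attribute names (_modelsCache, _lastCacheUpdate) and class name are illustrative assumptions; the actual implementation in aicoreBase.py may differ in detail:

```python
import time

CACHE_TTL_SECONDS = 300  # 5-minute TTL described above


class CachingConnectorSketch:
    def __init__(self):
        self._modelsCache = None      # cached list of AiModel objects
        self._lastCacheUpdate = 0.0   # timestamp of the last refresh

    def getModels(self):
        raise NotImplementedError     # provided by each concrete plugin

    def getCachedModels(self):
        # Serve from cache while the 300-second window is still valid
        if self._modelsCache is not None and time.time() - self._lastCacheUpdate < CACHE_TTL_SECONDS:
            return self._modelsCache
        # Cache missing or expired: regenerate metadata and update the timestamp
        self._modelsCache = self.getModels()
        self._lastCacheUpdate = time.time()
        return self._modelsCache

    def clearCache(self):
        # Manual invalidation forces the next call to rebuild the cache
        self._modelsCache = None
        self._lastCacheUpdate = 0.0
```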

2. ModelRegistry (aicoreModelRegistry.py)

The centralized registry serves as the single source of truth for all available AI models in the system. It acts as a dynamic inventory management system, automatically discovering, validating, and organizing models from all registered connectors.

Automatic Plugin Discovery:

The registry implements a sophisticated auto-discovery mechanism that scans the aicore directory for any files matching the pattern aicorePlugin*.py. This pattern-based discovery enables zero-configuration extensibility - developers can add new AI providers simply by creating a properly named file, and the system automatically detects and integrates it during startup. The discovery process imports each plugin module, inspects its classes to find those inheriting from BaseConnectorAi, and instantiates them for registration.
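
The following sketch illustrates this pattern-based discovery; the function name and package layout are assumptions, not the exact code in aicoreModelRegistry.py:

```python
import importlib
import inspect
from pathlib import Path

from modules.aicore.aicoreBase import BaseConnectorAi


def discoverConnectorsSketch(aicoreDir: Path, basePackage: str = "modules.aicore") -> list:
    """Find aicorePlugin*.py files, import them, and instantiate their connector classes."""
    connectors = []
    for pluginFile in sorted(aicoreDir.glob("aicorePlugin*.py")):
        module = importlib.import_module(f"{basePackage}.{pluginFile.stem}")
        for _, cls in inspect.getmembers(module, inspect.isclass):
            # Only concrete subclasses of BaseConnectorAi are registered
            if issubclass(cls, BaseConnectorAi) and cls is not BaseConnectorAi:
                connectors.append(cls())
    return connectors
```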

Dynamic Registration and Validation:

When a connector is registered through registerConnector(), the registry performs critical validation steps. It calls the connector's getCachedModels() method to retrieve all available models, then validates that each model's displayName is unique across the entire registry. If a duplicate is detected, the registration fails with a detailed error message identifying both the existing and conflicting model configurations. This strict validation prevents configuration errors that could lead to unpredictable model selection behavior.
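
A simplified sketch of this validation step, assuming the registry keeps dictionaries keyed by displayName and that duplicates raise an error (the real code may report the failure differently):

```python
def registerConnectorSketch(registryModels: dict, registryConnectors: dict, connector) -> None:
    """Validate displayName uniqueness before storing a connector's models (illustrative)."""
    for model in connector.getCachedModels():
        existing = registryModels.get(model.displayName)
        if existing is not None:
            raise ValueError(
                f"Duplicate displayName '{model.displayName}': already provided by connector "
                f"'{existing.connectorType}', conflicting with '{connector.getConnectorType()}'"
            )
        registryModels[model.displayName] = model          # displayName is the primary key
        registryConnectors[model.displayName] = connector  # enables getConnectorForModel() reverse lookup
```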

Intelligent Refresh Mechanism:

The registry maintains model freshness through a dual-refresh strategy. First, it implements automatic periodic refresh with a 5-minute interval - when any query method is called, the system checks if the last refresh timestamp exceeds this threshold and triggers an automatic update if needed. Second, it provides a refreshModels() method with a force parameter, allowing manual refresh operations that bypass the TTL check. This is particularly useful during development or when connector configurations change dynamically.

Comprehensive Query Interface:

The registry exposes a rich query interface for model retrieval:

  • Direct Lookup: getModel(displayName) provides O(1) access to specific models using their unique identifier
  • Complete Inventory: getModels() returns the full catalog of registered models
  • Connector Filtering: getModelsByConnector(connectorType) enables retrieval of all models from a specific provider
  • Availability Filtering: getAvailableModels() returns only models currently marked as available, filtering out any disabled or problematic models
  • Reverse Lookup: getConnectorForModel(displayName) retrieves the connector instance responsible for a specific model, enabling direct connector interaction
  • Statistical Analysis: getModelStats() provides aggregate metrics including model counts by connector, capability, and priority

Singleton Pattern:

The registry is implemented as a global singleton instance (modelRegistry) that can be imported and used throughout the application, ensuring consistent model access and preventing duplicate registries.

3. ModelSelector (aicoreModelSelector.py)

The intelligent model selection engine implements a sophisticated scoring algorithm that evaluates available models against multiple criteria to determine the optimal choice for each AI operation. Rather than using hard-coded rules or simple priority lists, the selector employs a weighted scoring system that considers operation compatibility, resource constraints, and performance preferences to create a ranked failover list.

Selection Algorithm:

flowchart TD
    Start[AI Call Request] --> GetModels[Get Available Models<br/>from Registry]
    GetModels --> OpFilter[Filter by Operation Type<br/>MUST support requested operation]
    OpFilter --> SizeFilter[Filter by Prompt Size<br/>Prompt must fit within 80% of context]
    SizeFilter --> Scoring[Calculate Score for Each Model]
    
    Scoring --> Score1[Operation Type Rating × 1000<br/>PRIMARY sorting criteria]
    Scoring --> Score2[Size Rating<br/>How well prompt+context fits]
    Scoring --> Score3[Processing Mode Rating<br/>Compatibility score]
    Scoring --> Score4[Priority Rating<br/>Speed/Quality/Cost preference]
    
    Score1 --> Combine[Combine All Scores]
    Score2 --> Combine
    Score3 --> Combine
    Score4 --> Combine
    
    Combine --> Sort[Sort by Total Score<br/>Descending]
    Sort --> Failover[Create Failover List]
    Failover --> Return[Return Best Model<br/>+ Fallback Models]
    
    style Start fill:#e1f5ff
    style OpFilter fill:#fff3e0
    style SizeFilter fill:#fff3e0
    style Scoring fill:#f3e5f5
    style Sort fill:#e8f5e9
    style Return fill:#c8e6c9

Detailed Algorithm Process:

Phase 1: Operation Type Filtering (Mandatory Constraint)

The first filtering phase is absolute - a model must explicitly support the requested operation type to be considered. Each model in the registry declares its supported operations through an operationTypes list, where each operation (such as PLAN, DATA_ANALYSE, DATA_GENERATE, IMAGE_ANALYSE) is associated with a performance rating from 1-10. Models lacking the required operation type are immediately excluded from consideration, regardless of their other characteristics. This ensures that specialized operations like image analysis are only routed to vision-capable models, and web search operations are directed to appropriate connectors.

Phase 2: Context Size Validation (Resource Constraint)

After operation filtering, the selector validates that each remaining model can physically accommodate the input. The system calculates the approximate token count for both the prompt and context (using a 4-byte-per-token approximation), then compares this against 80% of each model's declared context length. This 80% threshold provides a safety margin for message formatting overhead, system prompts, and output token reservation. Models with insufficient context capacity are filtered out, preventing runtime failures due to context length violations. For models with zero context length (indicating unlimited capacity), this check is bypassed.
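
In code form, the check described above amounts to something like the following sketch (the function name is illustrative):

```python
BYTES_PER_TOKEN = 4          # approximation described above (~4 bytes per token)
CONTEXT_SAFETY_FACTOR = 0.8  # only 80% of the declared context window is treated as usable


def fitsContext(prompt: str, context: str, model) -> bool:
    """Return True if prompt + context fit within 80% of the model's context window."""
    if model.contextLength == 0:
        return True  # zero context length signals unlimited capacity; the check is bypassed
    estimatedTokens = (len(prompt.encode("utf-8")) + len(context.encode("utf-8"))) / BYTES_PER_TOKEN
    return estimatedTokens <= model.contextLength * CONTEXT_SAFETY_FACTOR
```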

Phase 3: Multi-Factor Scoring (Quality Assessment)

Each model that passes both mandatory filters receives a composite score calculated from four weighted components:

  • Operation Type Rating (Primary Factor): Multiplied by 1000 to establish it as the dominant sorting criterion. A model rated 9/10 for DATA_ANALYSE will score 9000 points from this factor alone, while a model rated 7/10 scores only 7000. This massive weighting ensures that operation-specific optimization takes precedence over other factors.

  • Size Efficiency Rating: Measures how efficiently the model utilizes its context window. If the prompt+context fits comfortably (total size ≤ 80% of capacity), the rating equals (actual size / maximum allowed size), rewarding larger models for handling substantial content. If the content exceeds the limit (shouldn't happen after filtering, but serves as safety), the rating inverts to (maximum / actual), penalizing undersized models.

  • Processing Mode Compatibility: Evaluates alignment between the model's processing mode (BASIC, ADVANCED, DETAILED) and the requested mode. Perfect matches score 1.0, while compatible mismatches receive fractional scores (e.g., 0.5 for ADVANCED model handling BASIC request). This allows flexible matching while preferring mode-appropriate models.

  • Priority Optimization: Applies user preference for speed, quality, or cost efficiency. For SPEED priority, models with high speedRating values score better. For QUALITY, qualityRating dominates. For COST, the system inverts cost metrics to favor inexpensive models while adding weighted bonuses for speed and quality. BALANCED priority treats all factors equally.

Phase 4: Ranking and Failover List Generation

After scoring, models are sorted in descending order by their composite scores. The resulting list represents an optimal failover chain - the first model is the best match for the specific request, while subsequent models serve as progressively less optimal but still viable alternatives. This ranked list is returned for use by the call handler, which attempts models in order until one succeeds.
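
A condensed sketch of the scoring and ranking logic follows. The weighting (operation rating × 1000) comes from the description above, but the field names on the OperationTypeRating entries and the simplified COST/BALANCED handling are assumptions:

```python
from modules.datamodels.datamodelAi import PriorityEnum


def scoreAndRankSketch(models: list, prompt: str, context: str, options) -> list:
    """Combine the four weighted components described above and sort into a failover chain."""
    contentTokens = (len(prompt) + len(context)) / 4               # rough token estimate
    scored = []
    for model in models:
        # Operation rating: the model's declared 1-10 rating for the requested operation
        operationRating = next(
            (op.rating for op in model.operationTypes if op.operationType == options.operationType), 0
        )
        # Size rating: how much of the usable (80%) window the content occupies
        maxAllowed = model.contextLength * 0.8
        sizeRating = contentTokens / maxAllowed if maxAllowed else 1.0
        # Processing mode rating: perfect match scores 1.0, compatible mismatch a fraction
        modeRating = 1.0 if model.processingMode == options.processingMode else 0.5
        # Priority rating: simplified version of the speed/quality/cost preference handling
        if options.priority == PriorityEnum.SPEED:
            priorityRating = model.speedRating
        elif options.priority == PriorityEnum.QUALITY:
            priorityRating = model.qualityRating
        elif options.priority == PriorityEnum.COST:
            priorityRating = 1.0 / (model.costPer1kTokensInput + model.costPer1kTokensOutput + 0.001)
        else:  # BALANCED
            priorityRating = (model.speedRating + model.qualityRating) / 2

        score = operationRating * 1000 + sizeRating + modeRating + priorityRating
        scored.append((score, model))

    scored.sort(key=lambda pair: pair[0], reverse=True)             # best model first
    return [model for _, model in scored]
```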

Primary Methods:

The selector exposes two main methods: selectModel() returns only the top-ranked model (index 0 of the failover list), while getFailoverModelList() returns the complete ranked list for failover handling. Both methods accept the same parameters: the prompt text, context data, AI call options, and the list of available models.

Global Singleton:

Like the registry, the selector is implemented as a global singleton (modelSelector) for consistent access throughout the application.

4. Plugin Connectors (aicorePlugin*.py)

Each plugin connector represents a concrete implementation of the BaseConnectorAi interface, tailored to a specific AI provider's API specifications and capabilities. These plugins serve as translation layers between the system's standardized interface and the provider-specific API requirements.

Architectural Pattern:

Each connector follows a consistent architectural pattern with four main components:

Initialization and Configuration: The constructor loads provider-specific configuration from the application's environment settings, including API keys, endpoint URLs, and any provider-specific parameters. It also initializes an HTTP client (typically using httpx) with appropriate timeouts, retry logic, and authentication headers. This separation of configuration from code enables easy deployment across different environments without code changes.

Connector Identification: The getConnectorType() method returns a simple string identifier for the connector, such as "openai", "anthropic", "perplexity", or "tavily". This identifier is used throughout the system for logging, routing, and model attribution. It must be unique across all connectors and is stored in every model's metadata.

Model Catalog Definition: The getModels() method returns a comprehensive list of AiModel instances, each representing a distinct AI model or model configuration. Each model entry includes:

  • Identity: Unique displayName (e.g., "OpenAI GPT-4o") and API name (e.g., "gpt-4o")
  • Technical Specifications: Context window size in tokens, maximum output tokens, default temperature
  • Economic Metrics: Cost per 1000 input tokens and output tokens, enabling accurate cost tracking
  • Performance Characteristics: Speed rating (1-10) indicating response time, quality rating (1-10) for output quality
  • Operational Capabilities: List of supported operation types with performance ratings for each
  • Execution Reference: A callable reference (functionCall) pointing to the method that handles API communication
  • Strategic Attributes: Priority classification (SPEED, QUALITY, COST, BALANCED) and processing mode (BASIC, ADVANCED, DETAILED)

API Communication Implementation:

Connectors implement one or more call methods (such as callAiBasic(), callAiImage(), or specialized methods) that handle the actual communication with the AI provider's API. These methods:

  • Accept standardized AiModelCall objects containing messages, model reference, and options
  • Transform the standardized request format into the provider's specific API format (different providers use varying JSON schemas for requests)
  • Execute HTTP requests with appropriate error handling, timeouts, and retry logic
  • Parse provider-specific response formats back into standardized AiModelResponse objects
  • Calculate actual costs based on token usage reported by the provider
  • Handle provider-specific error codes and translate them into meaningful exceptions

Provider-Specific Adaptations:

Each connector adapts to its provider's unique characteristics:

  • OpenAI Connectors: Support both text completion and vision capabilities, handle rate limiting, manage model versioning
  • Anthropic Connectors: Implement Claude-specific message formatting, handle thinking tokens, manage conversation context
  • Perplexity Connectors: Integrate web search capabilities, handle citation extraction, manage search-enhanced responses
  • Tavily Connectors: Implement web crawling protocols, handle structured data extraction, manage crawl depth and scope

Connection to serviceAi

The aicore module is the foundation layer that serviceAi (AI Service) builds upon. Here's how they connect:

Integration Flow

sequenceDiagram
    participant App as Application<br/>(app.py)
    participant Service as Service Layer<br/>(mainServiceAi.py)
    participant Interface as Interface Layer<br/>(interfaceAiObjects.py)
    participant Core as AI Core<br/>(aicore/)
    participant Provider as AI Provider APIs
    
    App->>Service: HTTP Request
    Service->>Interface: callAiDocuments/Planning
    Interface->>Core: AiCallRequest
    Core->>Core: Model Selection
    Core->>Provider: API Call
    Provider-->>Core: API Response
    Core-->>Interface: AiCallResponse
    Interface-->>Service: Processed Result
    Service-->>App: HTTP Response

Initialization Sequence

sequenceDiagram
    participant App as app.py
    participant Lifecycle as featuresLifecycle
    participant Service as AiService
    participant AiObjects as AiObjects
    participant Registry as ModelRegistry
    participant Plugins as Plugin Connectors
    
    App->>Lifecycle: lifespan startup
    Lifecycle->>Lifecycle: start()
    Lifecycle->>Service: create AiService
    Service->>AiObjects: AiObjects.create()
    
    AiObjects->>AiObjects: _discoverAndRegisterConnectors()
    AiObjects->>Registry: discoverConnectors()
    
    Registry->>Registry: Scan aicore folder<br/>for aicorePlugin*.py
    Registry->>Plugins: Import & instantiate connectors
    
    loop For each discovered connector
        AiObjects->>Registry: registerConnector(connector)
        Registry->>Plugins: connector.getModels()
        Plugins-->>Registry: List[AiModel]
        Registry->>Registry: Validate displayName uniqueness
        Registry->>Registry: Store models with displayName as key
    end
    
    Registry-->>AiObjects: Registration complete
    AiObjects-->>Service: Initialized with all models
    Service-->>Lifecycle: AiService ready
    Lifecycle-->>App: Startup complete
    
    Note over Registry: Models cached for 5 minutes<br/>with auto-refresh

Service-to-Core Communication

The communication between the service layer and aicore follows a well-defined request-response pattern with multiple abstraction layers, each serving a specific purpose in the overall architecture.

High-Level Service Operations:

The AiService class (in mainServiceAi.py) provides domain-specific methods that application features and workflows can invoke. These methods abstract away the complexity of AI operations, presenting simple interfaces like callAiPlanning() for task planning and callAiDocuments() for document processing.

When callAiPlanning() is invoked, it handles prompt construction by integrating placeholders and building a complete prompt string. It then creates an AiCallRequest object configured specifically for planning operations - with operation type set to PLAN, priority set to QUALITY (since planning requires accurate reasoning), and processing mode set to DETAILED (to ensure comprehensive analysis). This request is passed to aiObjects.call(), initiating the core AI processing chain.

The callAiDocuments() method follows a similar pattern but with more flexibility - it accepts custom options, handles document attachments, and can process various output formats. It manages document extraction, prompt building with continuation contexts, and result formatting, while delegating the actual AI communication to the aicore layer.

Interface Layer Orchestration:

The AiObjects class (in interfaceAiObjects.py) serves as the orchestration layer, coordinating between the service layer's high-level requests and the aicore's model selection and execution capabilities. When its call() method receives an AiCallRequest, it follows a three-phase process:

Phase 1 - Model Selection: The interface queries the modelRegistry to retrieve all currently available models. It then invokes the modelSelector's getFailoverModelList() method, passing the request's prompt, context, and options. The selector returns a prioritized list of suitable models, ranked from most to least optimal for the specific request characteristics.

Phase 2 - Failover Execution: The interface iterates through the failover list, attempting each model in sequence. For each attempt, it calls the internal _callWithModel() method, which constructs a standardized AiModelCall object and invokes the model's functionCall reference. This reference points to the connector's API communication method, which executes the actual HTTP request to the AI provider.

If the model call succeeds, the interface immediately returns the AiCallResponse to the service layer, completing the request. If an exception occurs (due to API errors, rate limits, or other issues), the interface logs the error with detailed context and proceeds to the next model in the failover list.

Phase 3 - Completion or Failure: If any model succeeds, the operation completes successfully. If all models in the failover list fail (a rare but possible scenario during API outages or configuration errors), the interface returns an AiCallResponse with an error message and error count, allowing the service layer to handle the failure gracefully.
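
A condensed sketch of this failover loop is shown below. Metrics collection and progress tracking are omitted, and the exact AiCallResponse constructor arguments are assumptions:

```python
import logging

from modules.aicore.aicoreModelRegistry import modelRegistry
from modules.aicore.aicoreModelSelector import modelSelector
from modules.datamodels.datamodelAi import AiCallResponse

logger = logging.getLogger(__name__)


class AiObjectsSketch:
    async def call(self, request) -> AiCallResponse:
        """Condensed failover loop; the real method also records metrics and progress updates."""
        models = modelRegistry.getAvailableModels()
        failoverList = modelSelector.getFailoverModelList(
            request.prompt, request.context, request.options, models
        )
        errorCount = 0
        for model in failoverList:
            try:
                # _callWithModel builds the AiModelCall and invokes the connector's functionCall
                return await self._callWithModel(model, request)
            except Exception as exc:
                errorCount += 1
                logger.warning("Model '%s' failed: %s - trying next model", model.displayName, exc)
        # Entire list exhausted: return a structured error response instead of raising
        return AiCallResponse(content="All models failed", modelName="", errorCount=errorCount)
```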

Cross-Cutting Concerns:

Throughout this communication flow, several cross-cutting concerns are handled automatically:

  • Metrics Collection: Every AI call records timing, token usage, costs, and error counts for monitoring and optimization
  • Progress Tracking: Long-running operations emit progress updates through callbacks for user feedback
  • Content Chunking: Large content that exceeds model context limits is automatically chunked and processed in segments
  • Token Management: The system calculates token usage estimates and reserves appropriate context space for prompts, system messages, and expected outputs

Key Integration Points

  1. Model Selection: serviceAi delegates to modelSelector for choosing the right model
  2. Failover Handling: AiObjects.call() automatically tries multiple models if one fails
  3. Operation Types: serviceAi defines operation types (PLAN, DATA_ANALYSE, etc.) that aicore uses for selection
  4. Standardized Interface: All AI calls go through AiCallRequest/AiCallResponse regardless of provider

Connection to the Application

Application Flow

sequenceDiagram
    participant User
    participant Route as FastAPI Route
    participant Workflow as Workflow/Feature
    participant Service as AiService
    participant Objects as AiObjects
    participant Registry as ModelRegistry
    participant Selector as ModelSelector
    participant Plugin as Plugin Connector
    participant API as AI Provider API
    
    User->>Route: HTTP Request
    Route->>Workflow: Call workflow
    Workflow->>Service: callAiDocuments()
    Service->>Objects: aiObjects.call(request)
    Objects->>Registry: getAvailableModels()
    Registry-->>Objects: List of models
    Objects->>Selector: getFailoverModelList()
    Selector-->>Objects: Sorted model list
    
    loop Try each model until success
        Objects->>Plugin: model.functionCall()
        Plugin->>API: HTTP Request
        
        alt Success
            API-->>Plugin: Response
            Plugin-->>Objects: AiModelResponse
            Objects-->>Service: AiCallResponse
        else Error
            API-->>Plugin: Error
            Plugin-->>Objects: Exception
            Objects->>Objects: Try next model
        end
    end
    
    Service-->>Workflow: Result
    Workflow-->>Route: Response
    Route-->>User: HTTP Response
    
    Note over Objects,Plugin: Automatic failover<br/>tries next best model

Example: Chat Workflow

User Request: "Analyze this document and extract key information"

sequenceDiagram
    participant User
    participant Route as Route Handler<br/>routeChatPlayground.py
    participant Workflow as Workflow Layer
    participant AiService as AiService<br/>mainServiceAi.py
    participant AiObjects as AiObjects<br/>interfaceAiObjects.py
    participant Registry as ModelRegistry
    participant Selector as ModelSelector
    participant Connector as aicorePluginOpenai.py
    participant OpenAI as OpenAI API
    
    User->>Route: POST /chat/message<br/>"Analyze document"
    Route->>Workflow: featureWorkflow.run(request)
    
    Workflow->>AiService: callAiDocuments()<br/>operationType=DATA_EXTRACT
    Note over AiService: Build prompt with placeholders
    
    AiService->>AiObjects: aiObjects.call(request)
    AiObjects->>Registry: getAvailableModels()
    Registry-->>AiObjects: List of models
    
    AiObjects->>Selector: getFailoverModelList()
    Note over Selector: Filter by DATA_EXTRACT<br/>Score and sort models
    Selector-->>AiObjects: [GPT-3.5, GPT-4, ...]
    
    AiObjects->>Connector: model.functionCall(AiModelCall)
    Note over Connector: Format for OpenAI API
    
    Connector->>OpenAI: HTTP POST with messages
    OpenAI-->>Connector: JSON response
    
    Connector-->>AiObjects: AiModelResponse
    AiObjects-->>AiService: AiCallResponse
    Note over AiService: Handle looping if needed
    
    AiService-->>Workflow: Extracted content
    Workflow-->>Route: Result with documents
    Route-->>User: HTTP 200 + JSON response
    
    Note over User,OpenAI: Full request/response cycle<br/>with automatic failover

Detailed Flow Breakdown:

Step 1: HTTP Request Reception When a user sends a chat message through the frontend, it arrives as an HTTP POST request to the /chat/message endpoint defined in routeChatPlayground.py. The route handler receives a ChatMessageRequest containing the user's message, any attached documents, and conversation context. The handler immediately delegates to the workflow system by calling featureWorkflow.run(request), which orchestrates the entire chat processing pipeline.

Step 2: Workflow Orchestration The workflow layer (sitting between routes and services) analyzes the user's request to determine the appropriate processing strategy. For a document analysis request, it identifies that document extraction is needed and invokes serviceCenter.ai.callAiDocuments(). This call includes the constructed prompt ("Extract key information from documents"), the attached chat documents, and explicitly configured options specifying DATA_EXTRACT as the operation type - signaling that this is an information extraction task rather than generation or analysis.

Step 3: Service Layer Processing The AiService receives the document processing request and performs several preparatory operations. It builds the complete prompt by replacing any placeholder markers with actual content (such as document titles, user context, or system instructions). It validates the documents and converts them into the appropriate format for AI processing. For lengthy responses that might span multiple AI generations, it sets up a looping mechanism that can handle continuation contexts. Finally, it creates an AiCallRequest object and passes it to aiObjects.call(), transitioning into the core AI layer.

Step 4: Intelligent Model Selection The AiObjects interface queries the modelRegistry to retrieve all currently available and healthy models. It then invokes the modelSelector with the full request context - passing the prompt text, any additional context, and the configured options. The selector executes its multi-phase filtering and scoring algorithm, ultimately returning a prioritized failover list. For a DATA_EXTRACT operation, this list typically starts with fast, cost-efficient models (like GPT-3.5 Turbo or Claude Haiku) since extraction doesn't require the highest reasoning capabilities.

Step 5: Model Execution with Failover AiObjects begins iterating through the failover list, attempting each model in sequence. For the first model (assume GPT-3.5 Turbo from OpenAI), it constructs an AiModelCall object containing the formatted messages and invokes the model's registered functionCall, which points to the OpenAI connector's API method. The connector transforms the standardized request into OpenAI's specific JSON format, adds authentication headers, and sends an HTTP POST request to api.openai.com/v1/chat/completions.

If the OpenAI API responds successfully, the connector parses the JSON response, extracts the generated text, calculates costs based on reported token usage, and wraps everything in an AiModelResponse object. This response flows back through AiObjects, which converts it to an AiCallResponse and returns it to the service layer.

If the API call fails (network timeout, rate limit, API error), the connector throws an exception. AiObjects catches this exception, logs detailed error information including the model name and error type, and immediately proceeds to the next model in the failover list. This process continues until either a model succeeds or the entire list is exhausted.

Step 6: Response Assembly and Delivery Once the AiService receives a successful AiCallResponse, it processes the content according to the request specifications. For document extraction, this might involve parsing structured JSON from the AI's response, validating the extracted data against expected schemas, and formatting it for frontend consumption. The processed result flows back up through the workflow layer, which adds any workflow-specific metadata (execution time, step logs), and finally reaches the route handler. The handler constructs an HTTP response with appropriate status codes and headers, delivering the extracted information back to the waiting frontend client.

Error Handling Throughout: At every step, comprehensive error handling ensures graceful degradation. If document processing fails, the workflow might retry with different parameters or return a helpful error message. If all AI models fail, the system returns a structured error response rather than crashing. Each failure point is logged with sufficient context for debugging and monitoring.

Configuration

Environment-Based Secrets Management:

The aicore system loads all sensitive configuration through the application's central APP_CONFIG system, which reads from environment files (env_dev.env, env_int.env, env_prod.env) based on the deployment environment. Each AI provider connector requires its API key stored under a standardized naming convention: Connector_Ai<Provider>_API_SECRET. For example, the OpenAI connector looks for Connector_AiOpenai_API_SECRET, while Anthropic uses Connector_AiAnthropic_API_SECRET. This convention enables consistent configuration management across all providers and environments.

Additional provider-specific settings follow similar naming patterns with descriptive suffixes. The SECRET suffix indicates that these values contain sensitive information and should never be committed to version control or exposed in logs. Configuration loading happens during connector initialization, allowing different API keys per environment without code changes.
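
For example, a development environment file might contain entries like the following. The OpenAI and Anthropic key names are stated above; the Perplexity and Tavily names are inferred from the same convention, and all values are placeholders:

```
# env_dev.env (illustrative; real keys are never committed to version control)
Connector_AiOpenai_API_SECRET=...
Connector_AiAnthropic_API_SECRET=...
Connector_AiPerplexity_API_SECRET=...
Connector_AiTavily_API_SECRET=...
```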

Plugin-Level Model Configuration:

Each plugin file contains hard-coded model definitions specifying technical and economic characteristics. These configurations include:

  • Capacity Parameters: Context window sizes (in tokens) define maximum input lengths, while max token settings limit output generation length
  • Economic Metrics: Input and output costs per 1000 tokens enable accurate cost tracking and budget management
  • Performance Characteristics: Speed ratings (1-10 scale) indicate typical response time, while quality ratings reflect output sophistication and accuracy
  • Operational Capabilities: Operation type ratings specify which tasks each model handles well, with ratings from 1-10 for supported operations
  • Strategic Classifications: Priority tags (SPEED, QUALITY, COST, BALANCED) and processing mode designations (BASIC, ADVANCED, DETAILED) guide selection algorithms

These plugin-level configurations represent the static characteristics of models and change only when model capabilities are updated or new models are added. They're versioned with the code rather than stored in environment variables, since they're not environment-specific or sensitive.

Key Features

1. Dynamic Plugin Architecture

graph LR
    subgraph "Auto-Discovery Process"
        Scan[Scan aicore folder<br/>for aicorePlugin*.py]
        Import[Import module dynamically]
        Find[Find BaseConnectorAi<br/>subclasses]
        Instantiate[Instantiate connector]
        Register[Register in ModelRegistry]
    end
    
    subgraph "Plugin Files"
        P1[aicorePluginOpenai.py]
        P2[aicorePluginAnthropic.py]
        P3[aicorePluginPerplexity.py]
        P4[aicorePluginTavily.py]
        P5[aicorePlugin*.py<br/>Add new plugins here]
    end
    
    Scan --> P1
    Scan --> P2
    Scan --> P3
    Scan --> P4
    Scan --> P5
    
    P1 --> Import
    P2 --> Import
    P3 --> Import
    P4 --> Import
    P5 --> Import
    
    Import --> Find
    Find --> Instantiate
    Instantiate --> Register
    
    Register --> Models[All Models Available<br/>in ModelRegistry]
    
    style Scan fill:#e1f5ff
    style Register fill:#c8e6c9
    style P5 fill:#fff9c4
    style Models fill:#e8f5e9

Key Benefits:

  • New AI providers can be added by creating aicorePlugin*.py files
  • No code changes needed in core logic
  • Automatic discovery and registration

2. Intelligent Model Selection

The model selection engine goes far beyond simple rule-based routing by implementing a sophisticated multi-criteria decision system:

Holistic Evaluation: Rather than selecting models based on a single factor, the selector considers operation type compatibility (can this model handle planning vs. extraction?), resource constraints (will the prompt fit?), performance preferences (does the user prioritize speed or quality?), and cost implications. Each factor contributes to a weighted score that reflects the model's overall suitability.

Context-Aware Decisions: The selector analyzes not just what operation is requested, but also the size and complexity of the input. A simple data extraction from a small document might route to a fast, economical model like GPT-3.5 Turbo, while complex multi-document analysis with a large prompt routes to more capable models like GPT-4 or Claude Opus. This context-awareness optimizes the trade-off between cost and capability.

Ranked Failover Lists: Instead of returning a single "best" model, the selector produces a complete ranked list representing a spectrum from optimal to acceptable. This ranked list serves as a failover chain - if the top model fails due to rate limits or transient errors, the system immediately tries the second-ranked model without user intervention or workflow delays. This approach significantly improves system reliability and reduces user-facing errors.

3. Automatic Failover

flowchart TD
    Start[AI Call Request] --> GetList[Get Failover Model List<br/>Sorted by Score]
    GetList --> Loop{Models<br/>Available?}
    
    Loop -->|Yes| Try[Try Model #N]
    Try --> Call[Call model.functionCall]
    
    Call --> Success{Success?}
    Success -->|Yes| Return[Return Response]
    Success -->|No| Log[Log Error with Details]
    
    Log --> More{More Models<br/>in List?}
    More -->|Yes| Next[Try Next Model]
    Next --> Loop
    More -->|No| Fail[All Models Failed]
    
    Loop -->|No| Error[Return Error Response]
    Fail --> Error
    Return --> End[Response to Caller]
    Error --> End
    
    style Start fill:#e1f5ff
    style Try fill:#fff3e0
    style Success fill:#f3e5f5
    style Return fill:#c8e6c9
    style Error fill:#ffcdd2
    style Next fill:#fff9c4

Key Benefits:

  • If primary model fails, automatically tries next best
  • Logs each attempt with detailed error information
  • Ensures high availability of AI operations
  • No manual intervention required

4. Model Caching

stateDiagram-v2
    [*] --> Empty: System Start
    Empty --> Loading: First Request
    Loading --> Cached: getModels() called
    Cached --> Valid: Check TTL
    Valid --> Cached: TTL < 5 min
    Valid --> Expired: TTL >= 5 min
    Expired --> Loading: Refresh
    Loading --> Cached: Cache Updated
    Cached --> [*]: Return Models
    
    note right of Cached
        Models cached for 5 minutes
        Reduces API calls
        Improves performance
    end note
    
    note right of Loading
        Calls connector.getModels()
        Updates _last_cache_update
        Stores in _models_cache
    end note

Key Benefits:

  • 5-minute TTL cache for model metadata
  • Reduces repeated API calls
  • Improves performance
  • Manual cache clearing available via clearCache()

5. Unified Interface

One of the aicore system's most powerful design principles is its provider-agnostic abstraction layer:

Universal Request Format: Regardless of whether the eventual API call goes to OpenAI, Anthropic, Perplexity, or any other provider, the requesting code always uses the same AiCallRequest structure. This insulates application code from the complexity and variability of different provider APIs. Developers can write workflow logic once, and the system handles all provider-specific transformations transparently.

Standardized Response Structure: Every AI operation returns an AiCallResponse object with the same structure and semantics, whether it came from GPT-4, Claude, or a specialized search model. This consistency simplifies response handling code - no need for provider-specific parsing logic or conditional handling based on which model was used.

Consistent Error Semantics: Different AI providers report errors in vastly different formats - OpenAI uses different status codes and error structures than Anthropic, which differs from Perplexity. The aicore connectors translate all these provider-specific error formats into consistent error responses with standardized error counts and messages. This enables unified error handling logic throughout the application.

Normalized Metrics: Cost calculations, timing measurements, and token usage reporting follow the same format regardless of provider. This enables apples-to-apples comparisons of different models' performance and economics, facilitating data-driven decisions about model selection strategies.

6. Operation Type System

The operation type taxonomy provides semantic categorization of AI tasks, enabling intelligent routing and specialized model selection:

Task-Based Classification: Rather than selecting models based on generic "intelligence" levels, the system classifies each request by what it's trying to accomplish. This task-based approach recognizes that different models excel at different types of operations - a model optimized for rapid extraction might not be ideal for deep analytical reasoning, even if both are "capable" in an abstract sense.

Operation Type Catalog:

  • PLAN: Strategic reasoning operations including task decomposition, action sequencing, and decision planning. These operations require strong logical reasoning and the ability to consider multiple factors simultaneously. Typically routed to high-capability models like GPT-4 or Claude Opus.

  • DATA_ANALYSE: Analytical operations that examine data to identify patterns, draw insights, or make assessments. Requires good comprehension and reasoning but not necessarily creative generation. Often uses balanced models that provide good analysis without premium costs.

  • DATA_GENERATE: Creative content generation including report writing, document creation, and structured output generation. Emphasizes coherent, well-structured output over analytical depth. Can often use mid-tier models effectively.

  • DATA_EXTRACT: Information extraction and parsing operations that pull structured data from unstructured sources. Speed and accuracy matter more than sophisticated reasoning. Frequently routed to fast, economical models like GPT-3.5 Turbo or Claude Haiku.

  • IMAGE_ANALYSE: Vision operations including image understanding, OCR, visual question answering, and scene description. Requires specialized vision-capable models with multimodal understanding. Automatically routes to GPT-4 Vision, Claude Vision, or similar models.

  • IMAGE_GENERATE: Image creation and generation operations. Routes to specialized generative models like DALL-E or Stable Diffusion connectors.

  • WEB_SEARCH: Real-time web search operations that query current information. Routes to search-specialized connectors like Perplexity that integrate web search APIs.

  • WEB_CRAWL: Web content extraction and crawling operations. Routes to specialized web crawling connectors like Tavily that handle website traversal and content extraction.

Performance Rating System: Each model declares not just which operations it supports, but how well it performs each operation on a 1-10 scale. A model might rate 9/10 for DATA_ANALYSE but only 6/10 for DATA_GENERATE, reflecting its strengths in analytical over creative tasks. These ratings form the primary sorting criterion in model selection, ensuring task-appropriate routing.

7. Content-Aware Chunking

When content exceeds a model's context capacity, the system employs sophisticated chunking strategies rather than simply failing:

Model-Specific Chunk Sizing: Chunking decisions are based on each model's specific capabilities rather than using universal chunk sizes. A model with a 128K token context window receives much larger chunks than one with a 16K limit. The system calculates optimal chunk sizes by considering the model's total context length, subtracting reserved space for prompts and system messages, and applying a safety margin (typically 70-80% utilization).

Comprehensive Token Accounting: Naive chunking might only consider content size, but the aicore system accounts for all token consumers: the user prompt (which repeats with each chunk), system message overhead (message formatting and instructions), output token reservation (space the model needs for its response), and protocol overhead (JSON structure and metadata). This comprehensive accounting prevents context overflow errors during generation.
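
The budget calculation described above can be sketched as follows; the function name and parameter breakdown are illustrative:

```python
def availableChunkTokensSketch(model, promptTokens: int, systemTokens: int,
                               outputReserveTokens: int, protocolOverheadTokens: int,
                               utilization: float = 0.75) -> int:
    """Tokens left for document content per chunk after all other consumers are accounted for."""
    usable = int(model.contextLength * utilization)  # 70-80% safety margin on the context window
    remaining = usable - promptTokens - systemTokens - outputReserveTokens - protocolOverheadTokens
    return max(remaining, 0)
```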

Intelligent Result Merging: After processing multiple chunks, their results must be intelligently combined. Simple concatenation can produce disjointed or redundant output. The system employs content-type-aware merging strategies - text chunks are merged with appropriate spacing and deduplication, structured data is merged while preserving relationships, and vision results are aggregated with context preservation. The merging system maintains coherence across chunk boundaries, producing results that read as unified responses rather than fragmented pieces.

Progressive Processing: For very large documents, chunking enables progressive processing where each chunk can be processed as soon as it's prepared, rather than waiting for the entire document. This streaming approach reduces perceived latency and enables progress reporting to users, showing incremental completion rather than a black box wait.

Data Models

Core Data Models (datamodelAi.py)

classDiagram
    class AiModel {
        +string name
        +string displayName
        +string connectorType
        +string apiUrl
        +float temperature
        +int maxTokens
        +int contextLength
        +float costPer1kTokensInput
        +float costPer1kTokensOutput
        +int speedRating
        +int qualityRating
        +callable functionCall
        +PriorityEnum priority
        +ProcessingModeEnum processingMode
        +List~OperationTypeRating~ operationTypes
        +string version
        +callable calculatePriceUsd
    }
    
    class AiCallRequest {
        +string prompt
        +string context
        +AiCallOptions options
        +List~ContentPart~ contentParts
    }
    
    class AiCallOptions {
        +OperationTypeEnum operationType
        +PriorityEnum priority
        +ProcessingModeEnum processingMode
        +bool compressPrompt
        +bool compressContext
    }
    
    class AiCallResponse {
        +string content
        +string modelName
        +float priceUsd
        +float processingTime
        +int bytesSent
        +int bytesReceived
        +int errorCount
    }
    
    class OperationTypeEnum {
        <<enumeration>>
        PLAN
        DATA_ANALYSE
        DATA_GENERATE
        DATA_EXTRACT
        IMAGE_ANALYSE
        IMAGE_GENERATE
        WEB_SEARCH
        WEB_CRAWL
    }
    
    class PriorityEnum {
        <<enumeration>>
        BALANCED
        SPEED
        QUALITY
        COST
    }
    
    class ProcessingModeEnum {
        <<enumeration>>
        BASIC
        ADVANCED
        DETAILED
    }
    
    AiCallRequest --> AiCallOptions
    AiCallOptions --> OperationTypeEnum
    AiCallOptions --> PriorityEnum
    AiCallOptions --> ProcessingModeEnum
    AiModel --> PriorityEnum
    AiModel --> ProcessingModeEnum
    
    note for AiModel "Unique displayName required\nacross all connectors"
    note for AiCallRequest "Input to AI system"
    note for AiCallResponse "Output from AI system"

Core Data Model Descriptions:

AiModel: Represents a complete model configuration with all metadata required for selection, execution, and cost tracking. The name field contains the API-level identifier used in actual provider calls, while displayName serves as the globally unique identifier within the registry. Technical specifications like contextLength (maximum input tokens) and maxTokens (maximum output tokens) inform chunking and validation logic. Economic fields (costPer1kTokensInput, costPer1kTokensOutput) enable precise cost tracking across all operations. Performance metrics (speedRating, qualityRating) influence selection algorithms. The functionCall field holds a callable reference to the connector method that executes API communication. The operationTypes list defines which operation types this model supports and how well it performs each, using ratings from 1-10.

AiCallRequest: Encapsulates all information needed to execute an AI operation. The prompt contains the primary instruction or question, while optional context provides supporting information. The options object configures operation behavior including type, priority, and processing mode. For multi-modal requests (like vision operations), the contentParts list can contain multiple pieces of content with different MIME types.

AiCallOptions: Configures how an AI operation should be executed. The operationType determines what kind of operation this is (planning, analysis, generation, etc.), which drives model selection. The priority indicates whether to optimize for speed, quality, cost, or balance. The processingMode suggests the depth of processing required (basic for simple tasks, detailed for complex reasoning). Boolean flags like compressPrompt and compressContext control whether the system should attempt content compression to fit context limits.

AiCallResponse: Contains the complete result of an AI operation including the generated content, the modelName that produced it, and comprehensive metrics. Cost tracking is provided via priceUsd, calculated based on actual token usage reported by the provider. Performance metrics include processingTime (wall-clock time for the operation), bytesSent and bytesReceived (for network monitoring), and errorCount (zero for success, greater than zero indicating partial or complete failure).

Best Practices

Adding a New AI Provider

The plugin architecture makes adding new AI providers straightforward through a four-step process:

Step 1: Create the Plugin File

Create a new file in the modules/aicore directory following the naming convention aicorePlugin<Provider>.py, where <Provider> is a descriptive name for the AI service (e.g., aicorePluginCohere for Cohere AI). The filename itself triggers automatic discovery - the system scans for any file matching the aicorePlugin*.py pattern during initialization.

Step 2: Implement the Connector Class

Within your plugin file, create a class that inherits from BaseConnectorAi. This class must implement several required methods:

Connector Identification: The getConnectorType() method returns a simple string identifier (lowercase, no spaces) that uniquely identifies this connector throughout the system. This identifier appears in logs, model metadata, and routing decisions.

Model Catalog Definition: The getModels() method returns a list of AiModel instances, one for each model configuration you want to expose. Each AiModel requires comprehensive metadata including:

  • A unique displayName that differs from all other models in the system (e.g., "Cohere Command-R Plus")
  • The API model name used in actual API calls
  • Technical specifications (context length, max output tokens, temperature)
  • Economic data (input and output costs per 1000 tokens)
  • Performance ratings (speed and quality on 1-10 scales)
  • Operational capabilities defined via createOperationTypeRatings(), specifying which operation types the model supports and how well (rating 1-10 for each)
  • A reference to the callable method that handles API communication (typically a method on your connector class)

API Communication Method: Implement one or more async methods (like callAi()) that accept an AiModelCall object and return an AiModelResponse. This method handles the actual HTTP communication with your provider's API. It must:

  • Extract messages from the AiModelCall
  • Transform them into the provider's expected JSON format
  • Execute the HTTP request with proper authentication and error handling
  • Parse the provider's response format
  • Extract the generated text and any usage statistics
  • Calculate costs based on token usage
  • Return everything wrapped in an AiModelResponse object
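
The requirements above can be combined into a minimal connector skeleton. Everything below is illustrative: the APP_CONFIG access style, the createOperationTypeRatings signature, the import location of AiModelResponse, the provider endpoint and payload schema, and all numeric ratings and costs are assumptions that must be checked against the existing plugin files:

```python
# modules/aicore/aicorePluginCohere.py - illustrative skeleton, not a tested implementation
import httpx

from modules.aicore.aicoreBase import BaseConnectorAi
from modules.datamodels.datamodelAi import (
    AiModel, AiModelResponse, OperationTypeEnum, PriorityEnum, ProcessingModeEnum,
)
# createOperationTypeRatings is the helper mentioned in this guide (import path assumed)


class ConnectorCohere(BaseConnectorAi):
    def __init__(self):
        super().__init__()
        self.apiKey = APP_CONFIG["Connector_AiCohere_API_SECRET"]  # hypothetical key and access style
        self.apiUrl = "https://..."                                # provider endpoint from configuration
        self.client = httpx.AsyncClient(timeout=60.0)

    def getConnectorType(self) -> str:
        return "cohere"

    def getModels(self) -> list:
        return [
            AiModel(
                displayName="Cohere Command-R Plus",   # must be globally unique
                name="command-r-plus",                 # identifier used in the actual API call
                connectorType=self.getConnectorType(),
                apiUrl=self.apiUrl,
                contextLength=128000,                  # illustrative numbers, not verified specs
                maxTokens=4096,
                temperature=0.3,
                costPer1kTokensInput=0.003,
                costPer1kTokensOutput=0.015,
                speedRating=7,
                qualityRating=8,
                priority=PriorityEnum.BALANCED,
                processingMode=ProcessingModeEnum.ADVANCED,
                operationTypes=createOperationTypeRatings(
                    {OperationTypeEnum.DATA_ANALYSE: 8, OperationTypeEnum.DATA_GENERATE: 7}
                ),
                functionCall=self.callAi,              # invoked by AiObjects during failover
            )
        ]

    async def callAi(self, modelCall) -> AiModelResponse:
        # Transform the standardized AiModelCall into the provider's request schema (schema assumed)
        payload = {"model": modelCall.model.name, "messages": modelCall.messages}
        response = await self.client.post(
            self.apiUrl,
            headers={"Authorization": f"Bearer {self.apiKey}"},
            json=payload,
        )
        response.raise_for_status()
        data = response.json()
        # Map the provider response back into the standardized object (field mapping assumed)
        return AiModelResponse(content=data["text"])
```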

Step 3: Configure Environment Variables

Add the necessary configuration to your environment files (env_dev.env, env_int.env, env_prod.env). At minimum, this includes the API key for authentication, but might also include endpoint URLs, organization IDs, or other provider-specific settings. Use descriptive configuration key names following the convention Connector_Ai<Provider>_<SettingName>_SECRET for sensitive values.

Step 4: Automatic Integration

No manual registration or configuration code changes are required. When the application next starts, the modelRegistry's discovery mechanism automatically:

  • Scans the aicore directory
  • Finds your new plugin file
  • Imports the module
  • Instantiates your connector class
  • Calls getModels() to retrieve available models
  • Validates displayName uniqueness
  • Registers all models in the global registry

Your new AI provider is now fully integrated and will participate in model selection for appropriate operation types. The system logs will show discovery and registration messages confirming successful integration.

Model Selection Guidelines

  • PLAN operations: Use high-quality models (GPT-4, Claude 3 Opus)
  • DATA_GENERATE: Balanced models for quality/cost trade-off
  • DATA_EXTRACT: Speed-optimized models for bulk processing
  • IMAGE_ANALYSE: Vision-capable models only
  • WEB_SEARCH: Specialized search connectors (Perplexity, Tavily)

Error Handling Philosophy

The aicore system implements a comprehensive error handling strategy designed for resilience and observability:

Automatic Failover: When you invoke aiObjects.call() with a request, the system automatically attempts multiple models from the failover list until one succeeds. Each failure is logged with detailed context (model name, error type, error message) but doesn't interrupt the execution flow. Only if all models in the failover list fail does the method return an error response.

Graceful Degradation: Rather than throwing exceptions that crash workflows, the system returns AiCallResponse objects even in failure scenarios. These error responses have errorCount greater than zero and contain descriptive error messages in the content field. This allows calling code to inspect the errorCount property and decide how to handle partial failures - whether to retry with different parameters, fall back to alternative processing paths, or present user-friendly error messages.

Comprehensive Logging: Every error is logged with sufficient context for debugging: the attempted model's displayName, the operation type, the error type (network timeout, API error, rate limit, etc.), and the full error message. This creates an audit trail for troubleshooting production issues without requiring verbose debug logging during normal operations.

Error Classification: The system distinguishes between transient errors (network timeouts, temporary API issues) that warrant trying another model, and permanent errors (authentication failures, malformed requests) that indicate configuration problems requiring immediate attention. Transient errors trigger failover silently, while permanent errors are logged at higher severity levels.

Performance Considerations

Caching

  • Model registry caches for 5 minutes
  • Connector models cached individually
  • Reduces discovery overhead

Failover Strategy

  • Models sorted by score (best first)
  • Failed models logged with detailed errors
  • Next best model tried automatically

Chunking

  • Large content automatically chunked based on model limits
  • Conservative 70-80% utilization for safety
  • Intelligent merging of chunk results

Cost Optimization

  • Model selector considers cost ratings
  • Price calculated per call for tracking
  • Can prioritize by cost with PriorityEnum.COST

Troubleshooting

Common Issues

  1. "No models available"

    • Check API keys in environment configuration
    • Verify connector plugins exist in aicore/ folder
    • Check logs for connector initialization errors
  2. "No suitable model found"

    • Check if operation type is supported by any model
    • Verify prompt size isn't too large for all models
    • Review model filtering criteria in logs
  3. "All models failed"

    • Check API connectivity and keys
    • Review model-specific error messages in logs
    • Verify request format is correct
  4. "Duplicate displayName"

    • Each model must have unique displayName
    • Check all plugin files for name conflicts
    • Naming convention: <Provider> <Model Name>

Future Enhancements

  • Streaming Support: Real-time response streaming for chat interfaces
  • Model Health Monitoring: Track success rates and performance metrics
  • Cost Budgets: Automatic model selection based on budget constraints
  • Custom Scoring: User-defined scoring functions for model selection
  • A/B Testing: Compare different models for the same operation
  • Rate Limiting: Built-in rate limit handling per provider

Quick Reference

Common Usage Patterns

1. Making AI Calls:

There are two primary approaches for invoking AI operations in the system:

Via AiService (Recommended Approach): The recommended pattern uses the high-level service methods like callAiPlanning(), callAiDocuments(), or callAiText(). These methods are accessed through the serviceCenter and handle all complexity internally. For planning operations, you call serviceCenter.ai.callAiPlanning() with a prompt string and optional placeholder list. Placeholders allow dynamic content injection - the system replaces markers like {TASK} with actual content before sending to the AI. This approach provides automatic prompt building, placeholder resolution, and response formatting.

Direct via AiObjects (Advanced Use): For specialized scenarios requiring fine-grained control, you can construct an AiCallRequest manually and invoke aiObjects.call() directly. This requires creating an AiCallOptions object with explicit operation type and priority settings, then awaiting the call. The response object contains the generated content plus metrics like token usage, processing time, and costs. This approach is typically used within service implementations or for custom AI workflows.
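
Both patterns are sketched below. Method names come from this document, but the callAiPlanning() signature, the placeholder format, and access to an initialized aiObjects instance are simplifying assumptions:

```python
# Via the service layer (recommended): placeholder format and exact signature are assumptions
planningResult = await serviceCenter.ai.callAiPlanning(
    prompt="Create a plan for {TASK}",
    placeholders=[("TASK", "quarterly report generation")],
)

# Direct via AiObjects (advanced use): assumes access to an initialized aiObjects instance
from modules.datamodels.datamodelAi import AiCallRequest, AiCallOptions, OperationTypeEnum, PriorityEnum

request = AiCallRequest(
    prompt="Extract the key figures from the attached report",
    context="",
    options=AiCallOptions(
        operationType=OperationTypeEnum.DATA_EXTRACT,
        priority=PriorityEnum.SPEED,
    ),
)
response = await aiObjects.call(request)
print(response.modelName, response.priceUsd, response.errorCount)
```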

2. Querying Available Models:

The modelRegistry provides comprehensive model inventory access:

Complete Inventory Access: Calling modelRegistry.getAvailableModels() returns all currently available and healthy models across all registered connectors. This list automatically excludes any models marked as unavailable due to configuration issues or connector errors.

Connector-Specific Filtering: Use modelRegistry.getModelsByConnector("openai") to retrieve only models from a specific provider. This is useful when implementing provider-specific features or debugging connector issues. Pass the connector type string (openai, anthropic, perplexity, tavily) as the parameter.

Direct Model Lookup: For retrieving a specific model's full metadata, use modelRegistry.getModel("OpenAI GPT-4o") with the exact displayName. This returns the complete AiModel object including capabilities, costs, ratings, and the functionCall reference.

Statistical Overview: The modelRegistry.getModelStats() method provides aggregate statistics including total model count, availability counts, breakdowns by connector type, capability distribution, and priority classifications. This is valuable for monitoring system health and model distribution.
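
These queries can be combined as follows (import path per the Module Import Structure section below; the displayName is only an example):

```python
from modules.aicore.aicoreModelRegistry import modelRegistry

availableModels = modelRegistry.getAvailableModels()            # all currently healthy models
openaiModels = modelRegistry.getModelsByConnector("openai")     # models from one provider
gpt4o = modelRegistry.getModel("OpenAI GPT-4o")                 # exact displayName lookup
stats = modelRegistry.getModelStats()                           # aggregate counts and breakdowns
```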

3. Understanding Model Selection:

To understand how the system selects models for specific requests:

Generating Failover Lists: Invoke modelSelector.getFailoverModelList() with your prompt, context, options, and the list of available models. The selector executes its full filtering and scoring algorithm, returning a ranked list ordered from most to least suitable. The first element represents the optimal choice, while subsequent elements serve as fallback options.

Analyzing Selection Results: Each model in the failover list has been validated for operation type compatibility and context size constraints. Their ordering reflects the composite score from operation ratings, size efficiency, processing mode alignment, and priority preferences. Examining this list helps understand why specific models were chosen or excluded for particular operations.
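
For example, given a prompt, context, and options like those built above, the failover list can be inspected directly:

```python
from modules.aicore.aicoreModelRegistry import modelRegistry
from modules.aicore.aicoreModelSelector import modelSelector

models = modelRegistry.getAvailableModels()
failoverList = modelSelector.getFailoverModelList(prompt, context, options, models)
for rank, model in enumerate(failoverList, start=1):
    print(rank, model.displayName, model.connectorType)   # best-ranked model first
```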

Operation Types Reference

| Operation Type | Description | Best Models | Use Case |
| --- | --- | --- | --- |
| PLAN | Task planning, action selection | GPT-4, Claude Opus | Workflow planning, decision making |
| DATA_ANALYSE | Data analysis and insights | GPT-4, Claude Sonnet | Document analysis, pattern detection |
| DATA_GENERATE | Content generation | GPT-4, Claude Sonnet | Report creation, document generation |
| DATA_EXTRACT | Information extraction | GPT-3.5, Claude Haiku | Text extraction, data parsing |
| IMAGE_ANALYSE | Image/vision analysis | GPT-4 Vision, Claude Vision | Image understanding, OCR |
| IMAGE_GENERATE | Image generation | DALL-E, Stable Diffusion | Image creation |
| WEB_SEARCH | Web search operations | Perplexity | Real-time web search |
| WEB_CRAWL | Web crawling | Tavily | Website content extraction |

Priority Reference

| Priority | Description | Selection Behavior |
| --- | --- | --- |
| BALANCED | Balance speed, quality, cost | Default selection |
| SPEED | Prioritize fast response | Favor high speedRating models |
| QUALITY | Prioritize high-quality output | Favor high qualityRating models |
| COST | Prioritize low cost | Favor low-cost models |

Processing Mode Reference

| Mode | Description | When to Use |
| --- | --- | --- |
| BASIC | Simple, straightforward processing | Quick tasks, simple questions |
| ADVANCED | Complex reasoning required | Multi-step tasks, analysis |
| DETAILED | Comprehensive, thorough output | Planning, detailed generation |

Module Import Structure

The aicore system is organized across several module paths for clean separation of concerns:

Core Infrastructure Components:

  • The base connector interface lives at modules.aicore.aicoreBase and exports BaseConnectorAi
  • The global model registry singleton is imported from modules.aicore.aicoreModelRegistry as modelRegistry
  • The global model selector singleton is imported from modules.aicore.aicoreModelSelector as modelSelector

Data Model Definitions: All AI-related data models are centralized in modules.datamodels.datamodelAi, including:

  • AiModel: Complete model metadata and configuration
  • AiCallRequest and AiCallResponse: Request/response wrapper objects
  • AiCallOptions: Configuration options for AI operations
  • OperationTypeEnum, PriorityEnum, ProcessingModeEnum: Enumeration types for operation classification

Interface and Service Layers:

  • The AiObjects interface class is available at modules.interfaces.interfaceAiObjects
  • The high-level AiService class is located at modules.services.serviceAi.mainServiceAi

Most application code interacts with the service layer rather than importing core components directly, maintaining proper architectural separation.

Summary

The aicore module is the backbone of AI operations in the application, providing:

  • Abstraction: Single interface for multiple AI providers
  • Intelligence: Smart model selection and automatic failover
  • Flexibility: Plugin architecture for easy provider addition
  • Reliability: Caching, failover, and error handling
  • Performance: Context-aware chunking and optimization

It connects to serviceAi as the foundation layer, enabling high-level AI services to operate without knowledge of specific AI provider implementations. The entire system integrates seamlessly into the application through the service layer architecture.


Related Documentation: