# AI Core Component Documentation

## Overview

The `aicore` module is the **centralized AI infrastructure layer** that provides a **plugin-based architecture** for integrating multiple AI providers (OpenAI, Anthropic, Perplexity, Tavily) into the application. It acts as an abstraction layer between high-level AI services and specific AI provider APIs, enabling dynamic model discovery, intelligent model selection, and automatic failover.

**Key Responsibilities:**

- Dynamic discovery and registration of AI connectors (plugins)
- Model registry with unified model metadata
- Intelligent model selection based on operation type, context size, and optimization criteria
- Automatic failover between models
- Standardized interface for AI operations across all providers

## Architecture

### System Architecture Overview

```mermaid
graph TB
    subgraph "Application Layer"
        Routes[FastAPI Routes<br/>routeWorkflows.py<br/>routeChatPlayground.py]
    end

    subgraph "Service Layer"
        AiService[AiService<br/>mainServiceAi.py]
        Methods[callAiPlanning<br/>callAiDocuments<br/>callAiText]
        AiService --> Methods
    end

    subgraph "Interface Layer"
        AiObjects[AiObjects<br/>interfaceAiObjects.py]
        CallHandler[call request<br/>Handles failover & model calls]
        AiObjects --> CallHandler
    end

    subgraph "AI Core Layer"
        Registry[ModelRegistry<br/>discoverConnectors<br/>registerConnector<br/>getAvailableModels]
        Selector[ModelSelector<br/>selectModel<br/>getFailoverModelList<br/>scoring logic]
        Base[BaseConnectorAi<br/>getModels<br/>getConnectorType<br/>getCachedModels]

        Registry -.-> Selector
        Selector -.-> Base
    end

    subgraph "Plugin Connectors"
        OpenAI[aicorePluginOpenai]
        Anthropic[aicorePluginAnthropic]
        Perplexity[aicorePluginPerplexity]
        Tavily[aicorePluginTavily]
    end

    subgraph "AI Provider APIs"
        OpenAI_API[OpenAI API<br/>api.openai.com]
        Anthropic_API[Anthropic API<br/>api.anthropic.com]
        Perplexity_API[Perplexity API<br/>api.perplexity.ai]
        Tavily_API[Tavily API<br/>api.tavily.com]
    end

    Routes --> AiService
    AiService --> AiObjects
    AiObjects --> Registry
    AiObjects --> Selector

    Base --> OpenAI
    Base --> Anthropic
    Base --> Perplexity
    Base --> Tavily

    OpenAI --> OpenAI_API
    Anthropic --> Anthropic_API
    Perplexity --> Perplexity_API
    Tavily --> Tavily_API

    style Routes fill:#e1f5ff
    style AiService fill:#fff3e0
    style AiObjects fill:#f3e5f5
    style Registry fill:#e8f5e9
    style Selector fill:#e8f5e9
    style Base fill:#e8f5e9
    style OpenAI fill:#fff9c4
    style Anthropic fill:#fff9c4
    style Perplexity fill:#fff9c4
    style Tavily fill:#fff9c4
```

### Component Structure

The aicore module is organized into several key files:

- **aicoreBase.py**: Defines the abstract base class that all AI connectors must inherit from, establishing the contract for plugin implementations
- **aicoreModelRegistry.py**: Manages the centralized registry of all available AI models across all connectors
- **aicoreModelSelector.py**: Implements the intelligent model selection algorithm based on multiple criteria
- **aicorePlugin*.py**: Individual connector implementations for each AI provider (OpenAI, Anthropic, Perplexity, Tavily, and potentially internal systems)

Each plugin file follows the naming convention `aicorePlugin<Provider>.py`, which enables the automatic discovery mechanism to find and register plugins at startup without manual configuration.

### Core Components

#### 1. **BaseConnectorAi** (`aicoreBase.py`)
|
||
The abstract base class that establishes the contract for all AI connector implementations. This class ensures that every AI provider plugin implements a consistent interface, making the system extensible and maintainable.
|
||
|
||
**Core Responsibilities:**
|
||
|
||
The base connector defines several essential methods that every plugin must implement:
|
||
|
||
- **Model Discovery**: Each connector provides its list of available models through `getModels()`, which returns comprehensive metadata about each model including capabilities, costs, and performance characteristics
|
||
- **Connector Identification**: The `getConnectorType()` method returns a unique identifier string for the connector (such as "openai" or "anthropic"), used throughout the system for routing and logging
|
||
- **Cached Model Access**: The `getCachedModels()` method provides performance optimization by returning cached model metadata with automatic TTL (Time-To-Live) validation
|
||
- **Model Lookup**: Utility methods like `getModelByDisplayName()` enable quick retrieval of specific models by their unique identifiers
|
||
- **Cache Management**: The `clearCache()` method allows manual cache invalidation when model configurations need immediate refresh
|
||
|
||
**Critical Design Principle - Unique Display Names:**
|
||
|
||
The system enforces a strict uniqueness constraint on model display names across all connectors. While the `name` field (used for actual API calls) can be duplicated across different model instances (for example, "gpt-4o" might have multiple instances for different use cases), the `displayName` must be globally unique. This serves as the primary key in the model registry and prevents configuration conflicts. Examples of unique display names include "OpenAI GPT-4o", "OpenAI GPT-4o Instance Vision", and "Anthropic Claude 3 Opus".
|
||
|
||
**Performance Optimization Through Caching:**
|
||
|
||
To minimize unnecessary operations, the base connector implements a sophisticated caching mechanism with a 5-minute TTL. When `getCachedModels()` is called, the system checks if cached data exists and if the last update timestamp is within the 300-second window. If the cache is still valid, it returns the cached models immediately, avoiding the overhead of regenerating model metadata. If the cache has expired, it automatically refreshes by calling `getModels()` and updates both the cache and timestamp. This approach significantly reduces computational overhead during high-frequency operations while ensuring data freshness.
|
||
|
||
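
The caching contract above can be sketched as follows. The class and method names come from this documentation; the internals (dict-based model records, `time.time()` timestamps, the demo connector) are illustrative assumptions, not the actual implementation.

```python
import time
from abc import ABC, abstractmethod


class BaseConnectorAi(ABC):
    CACHE_TTL_SECONDS = 300  # 5-minute TTL described above

    def __init__(self):
        self._cachedModels = None
        self._cacheTimestamp = 0.0

    @abstractmethod
    def getModels(self):
        """Return the full list of model metadata records."""

    @abstractmethod
    def getConnectorType(self):
        """Return a unique connector identifier such as 'openai'."""

    def getCachedModels(self):
        # Refresh only when the cache is missing or older than the TTL
        now = time.time()
        if self._cachedModels is None or now - self._cacheTimestamp > self.CACHE_TTL_SECONDS:
            self._cachedModels = self.getModels()
            self._cacheTimestamp = now
        return self._cachedModels

    def getModelByDisplayName(self, displayName):
        return next((m for m in self.getCachedModels()
                     if m["displayName"] == displayName), None)

    def clearCache(self):
        self._cachedModels = None
        self._cacheTimestamp = 0.0


class DemoConnector(BaseConnectorAi):
    """Minimal concrete plugin used only to demonstrate the caching."""

    def __init__(self):
        super().__init__()
        self.calls = 0  # counts how often getModels() actually runs

    def getConnectorType(self):
        return "demo"

    def getModels(self):
        self.calls += 1
        return [{"displayName": "Demo Model", "name": "demo-1"}]


connector = DemoConnector()
connector.getCachedModels()
connector.getCachedModels()  # second call is served from the cache
```

The second `getCachedModels()` call does not hit `getModels()` again, which is exactly the behavior that keeps high-frequency registry queries cheap.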
#### 2. **ModelRegistry** (`aicoreModelRegistry.py`)
The centralized registry serves as the single source of truth for all available AI models in the system. It acts as a dynamic inventory, automatically discovering, validating, and organizing models from all registered connectors.

**Automatic Plugin Discovery:**

The registry auto-discovers plugins by scanning the aicore directory for files matching the pattern `aicorePlugin*.py`. This pattern-based discovery enables zero-configuration extensibility: developers add a new AI provider simply by creating a properly named file, and the system detects and integrates it during startup. The discovery process imports each plugin module, inspects its classes for those inheriting from BaseConnectorAi, and instantiates them for registration.
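
A minimal sketch of the pattern-based file discovery, run against a throwaway directory standing in for the real aicore folder; the subsequent import-and-inspect step is omitted here.

```python
import tempfile
from pathlib import Path


def findPluginModules(directory: Path) -> list[str]:
    # Files matching aicorePlugin*.py become candidate plugin modules;
    # the real registry then imports each one and instantiates any
    # class inheriting from BaseConnectorAi.
    return sorted(p.stem for p in directory.glob("aicorePlugin*.py"))


# Demo against a temporary directory instead of the actual aicore folder
demoDir = Path(tempfile.mkdtemp())
(demoDir / "aicorePluginOpenai.py").touch()
(demoDir / "aicorePluginTavily.py").touch()
(demoDir / "aicoreBase.py").touch()  # not a plugin; ignored by the pattern
discovered = findPluginModules(demoDir)
```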
**Dynamic Registration and Validation:**

When a connector is registered through `registerConnector()`, the registry calls the connector's `getCachedModels()` method to retrieve all available models, then validates that each model's `displayName` is unique across the entire registry. If a duplicate is detected, registration fails with a detailed error message identifying both the existing and the conflicting model configuration. This strict validation prevents configuration errors that could lead to unpredictable model selection.

**Intelligent Refresh Mechanism:**

The registry keeps models fresh through a dual-refresh strategy. First, it refreshes automatically on a 5-minute interval: when any query method is called, the registry checks whether the last refresh timestamp exceeds this threshold and triggers an update if needed. Second, it provides a `refreshModels()` method with a force parameter that bypasses the TTL check, which is useful during development or when connector configurations change dynamically.

**Comprehensive Query Interface:**

The registry exposes a rich query interface for model retrieval:

- **Direct Lookup**: `getModel(displayName)` provides O(1) access to a specific model by its unique identifier
- **Complete Inventory**: `getModels()` returns the full catalog of registered models
- **Connector Filtering**: `getModelsByConnector(connectorType)` retrieves all models from a specific provider
- **Availability Filtering**: `getAvailableModels()` returns only models currently marked as available, filtering out disabled or problematic models
- **Reverse Lookup**: `getConnectorForModel(displayName)` retrieves the connector instance responsible for a specific model, enabling direct connector interaction
- **Statistical Analysis**: `getModelStats()` provides aggregate metrics, including model counts by connector, capability, and priority

**Singleton Pattern:**

The registry is implemented as a global singleton instance (`modelRegistry`) that can be imported and used throughout the application, ensuring consistent model access and preventing duplicate registries.
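
The registration, uniqueness validation, and query interface described above might look roughly like this sketch. Method names follow the documentation; the dict-based storage and the stub connector are assumptions for illustration, and refresh timing and statistics are omitted.

```python
class ModelRegistry:
    def __init__(self):
        self._models = {}       # displayName -> model metadata
        self._connectors = {}   # displayName -> owning connector

    def registerConnector(self, connector):
        for model in connector.getCachedModels():
            displayName = model["displayName"]
            if displayName in self._models:
                # Strict uniqueness: displayName is the registry's primary key
                raise ValueError(
                    f"Duplicate displayName '{displayName}' from connector "
                    f"'{connector.getConnectorType()}'")
            self._models[displayName] = model
            self._connectors[displayName] = connector

    def getModel(self, displayName):
        return self._models.get(displayName)

    def getModels(self):
        return list(self._models.values())

    def getAvailableModels(self):
        return [m for m in self._models.values() if m.get("available", True)]

    def getModelsByConnector(self, connectorType):
        return [m for name, m in self._models.items()
                if self._connectors[name].getConnectorType() == connectorType]

    def getConnectorForModel(self, displayName):
        return self._connectors.get(displayName)


class StubConnector:
    """Stand-in for a real plugin, just for this demo."""

    def __init__(self, connectorType, models):
        self._type, self._models = connectorType, models

    def getConnectorType(self):
        return self._type

    def getCachedModels(self):
        return self._models


registry = ModelRegistry()
registry.registerConnector(StubConnector("openai", [
    {"displayName": "OpenAI GPT-4o", "name": "gpt-4o"},
    {"displayName": "OpenAI GPT-4o Instance Vision", "name": "gpt-4o"},
]))
```

Note that both entries share the API `name` "gpt-4o" but carry distinct `displayName` keys, matching the uniqueness rule above.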
#### 3. **ModelSelector** (`aicoreModelSelector.py`)
The intelligent model selection engine implements a scoring algorithm that evaluates available models against multiple criteria to determine the optimal choice for each AI operation. Rather than using hard-coded rules or simple priority lists, the selector employs a weighted scoring system that considers operation compatibility, resource constraints, and performance preferences to produce a ranked failover list.

**Selection Algorithm:**

```mermaid
flowchart TD
    Start[AI Call Request] --> GetModels[Get Available Models<br/>from Registry]
    GetModels --> OpFilter[Filter by Operation Type<br/>MUST support requested operation]
    OpFilter --> SizeFilter[Filter by Prompt Size<br/>Prompt must fit within 80% of context]
    SizeFilter --> Scoring[Calculate Score for Each Model]

    Scoring --> Score1[Operation Type Rating × 1000<br/>PRIMARY sorting criteria]
    Scoring --> Score2[Size Rating<br/>How well prompt+context fits]
    Scoring --> Score3[Processing Mode Rating<br/>Compatibility score]
    Scoring --> Score4[Priority Rating<br/>Speed/Quality/Cost preference]

    Score1 --> Combine[Combine All Scores]
    Score2 --> Combine
    Score3 --> Combine
    Score4 --> Combine

    Combine --> Sort[Sort by Total Score<br/>Descending]
    Sort --> Failover[Create Failover List]
    Failover --> Return[Return Best Model<br/>+ Fallback Models]

    style Start fill:#e1f5ff
    style OpFilter fill:#fff3e0
    style SizeFilter fill:#fff3e0
    style Scoring fill:#f3e5f5
    style Sort fill:#e8f5e9
    style Return fill:#c8e6c9
```

**Detailed Algorithm Process:**

**Phase 1: Operation Type Filtering (Mandatory Constraint)**

The first filtering phase is absolute: a model must explicitly support the requested operation type to be considered. Each model in the registry declares its supported operations through an `operationTypes` list, where each operation (such as PLAN, DATA_ANALYSE, DATA_GENERATE, IMAGE_ANALYSE) carries a performance rating from 1-10. Models lacking the required operation type are excluded immediately, regardless of their other characteristics. This ensures that specialized operations like image analysis are only routed to vision-capable models, and that web search operations go to appropriate connectors.

**Phase 2: Context Size Validation (Resource Constraint)**

After operation filtering, the selector validates that each remaining model can physically accommodate the input. The system estimates the token count for both the prompt and the context (using a 4-bytes-per-token approximation), then compares the total against 80% of each model's declared context length. The 80% threshold leaves a safety margin for message formatting overhead, system prompts, and output token reservation. Models with insufficient context capacity are filtered out, preventing runtime failures from context length violations. Models with a context length of zero (indicating unlimited capacity) bypass this check.
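
Under the stated approximation, the Phase 2 check can be sketched in a few lines; function names are illustrative, not the actual API.

```python
def estimateTokens(text: str) -> int:
    # 4-bytes-per-token approximation from the docs (here: 4 characters)
    return len(text) // 4


def fitsContext(prompt: str, context: str, contextLength: int) -> bool:
    # Zero context length signals unlimited capacity, so the check is bypassed
    if contextLength == 0:
        return True
    needed = estimateTokens(prompt) + estimateTokens(context)
    return needed <= contextLength * 0.8  # 20% safety margin
```

For example, a 400-character prompt (~100 tokens) fits a 200-token model (limit 160), while an 800-character prompt (~200 tokens) does not.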
**Phase 3: Multi-Factor Scoring (Quality Assessment)**

Each model that passes both mandatory filters receives a composite score calculated from four weighted components:

- **Operation Type Rating (Primary Factor)**: Multiplied by 1000 to establish it as the dominant sorting criterion. A model rated 9/10 for DATA_ANALYSE scores 9000 points from this factor alone, while a model rated 7/10 scores only 7000. This weighting ensures that operation-specific optimization takes precedence over all other factors.

- **Size Efficiency Rating**: Measures how efficiently the model's context window is utilized. If the prompt+context fits comfortably (total size ≤ 80% of capacity), the rating equals (actual size / maximum allowed size), favoring models whose capacity is well matched to the content rather than vastly exceeding it. If the content exceeds the limit (which should not happen after filtering, but serves as a safety net), the rating inverts to (maximum / actual), penalizing undersized models.

- **Processing Mode Compatibility**: Evaluates alignment between the model's processing mode (BASIC, ADVANCED, DETAILED) and the requested mode. Perfect matches score 1.0, while compatible mismatches receive fractional scores (e.g., 0.5 for an ADVANCED model handling a BASIC request). This allows flexible matching while preferring mode-appropriate models.

- **Priority Optimization**: Applies the requested preference for speed, quality, or cost efficiency. For SPEED priority, models with high `speedRating` values score better; for QUALITY, `qualityRating` dominates. For COST, the system inverts cost metrics to favor inexpensive models while adding weighted bonuses for speed and quality. BALANCED treats all factors equally.

**Phase 4: Ranking and Failover List Generation**

After scoring, models are sorted in descending order by composite score. The resulting list is an optimal failover chain: the first model is the best match for the request, and subsequent models are progressively less optimal but still viable alternatives. The ranked list is returned to the call handler, which attempts models in order until one succeeds.

**Primary Methods:**

The selector exposes two main methods: `selectModel()` returns only the top-ranked model (index 0 of the failover list), while `getFailoverModelList()` returns the complete ranked list for failover handling. Both accept the same parameters: the prompt text, context data, AI call options, and the list of available models.

**Global Singleton:**

Like the registry, the selector is implemented as a global singleton (`modelSelector`) for consistent access throughout the application.
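
Putting the four phases together, a deliberately simplified version of the scoring and ranking might look like this. The ×1000 operation weighting and the 80% size cutoff come from the documentation; the rest of the weighting (mode compatibility, cost inversion) is omitted, and the demo model dicts are invented for illustration.

```python
def scoreModel(model: dict, operationType: str, promptTokens: int,
               priority: str = "BALANCED"):
    # Phase 1: mandatory operation filter
    opRating = model["operationTypes"].get(operationType)
    if opRating is None:
        return None
    # Phase 2: mandatory size filter (zero contextLength = unlimited)
    maxAllowed = model["contextLength"] * 0.8
    if model["contextLength"] and promptTokens > maxAllowed:
        return None
    # Phase 3: operation rating dominates via the x1000 weighting
    score = opRating * 1000
    if model["contextLength"]:
        score += promptTokens / maxAllowed  # size efficiency component
    if priority == "SPEED":
        score += model["speedRating"]
    elif priority == "QUALITY":
        score += model["qualityRating"]
    return score


def getFailoverModelList(models, operationType, promptTokens,
                         priority="BALANCED"):
    # Phase 4: rank the surviving models by descending composite score
    scored = [(scoreModel(m, operationType, promptTokens, priority), m)
              for m in models]
    return [m for s, m in sorted((p for p in scored if p[0] is not None),
                                 key=lambda p: p[0], reverse=True)]


fast = {"displayName": "Fast Model", "contextLength": 16000,
        "speedRating": 9, "qualityRating": 6,
        "operationTypes": {"DATA_EXTRACT": 7}}
strong = {"displayName": "Strong Model", "contextLength": 128000,
          "speedRating": 5, "qualityRating": 9,
          "operationTypes": {"DATA_EXTRACT": 9, "PLAN": 9}}
ranked = getFailoverModelList([fast, strong], "DATA_EXTRACT", promptTokens=1000)
```

The 9-rated model outranks the 7-rated one for DATA_EXTRACT, and only operation-capable models survive the PLAN filter.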
#### 4. **Plugin Connectors** (`aicorePlugin*.py`)
Each plugin connector is a concrete implementation of the BaseConnectorAi interface, tailored to a specific AI provider's API and capabilities. Plugins serve as translation layers between the system's standardized interface and the provider-specific API requirements.

**Architectural Pattern:**

Each connector follows a consistent architectural pattern with four main components:

**Initialization and Configuration:**

The constructor loads provider-specific configuration from the application's environment settings, including API keys, endpoint URLs, and any provider-specific parameters. It also initializes an HTTP client (typically httpx) with appropriate timeouts, retry logic, and authentication headers. Separating configuration from code enables deployment across different environments without code changes.

**Connector Identification:**

The `getConnectorType()` method returns a simple string identifier for the connector, such as "openai", "anthropic", "perplexity", or "tavily". This identifier is used throughout the system for logging, routing, and model attribution. It must be unique across all connectors and is stored in every model's metadata.

**Model Catalog Definition:**

The `getModels()` method returns a list of AiModel instances, each representing a distinct AI model or model configuration. Each model entry includes:

- **Identity**: Unique `displayName` (e.g., "OpenAI GPT-4o") and API `name` (e.g., "gpt-4o")
- **Technical Specifications**: Context window size in tokens, maximum output tokens, default temperature
- **Economic Metrics**: Cost per 1000 input tokens and output tokens, enabling accurate cost tracking
- **Performance Characteristics**: Speed rating (1-10) indicating response time, quality rating (1-10) for output quality
- **Operational Capabilities**: List of supported operation types with a performance rating for each
- **Execution Reference**: A callable reference (`functionCall`) pointing to the method that handles API communication
- **Strategic Attributes**: Priority classification (SPEED, QUALITY, COST, BALANCED) and processing mode (BASIC, ADVANCED, DETAILED)
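
The fields above suggest a model record shaped roughly like this dataclass sketch. Field names follow the bullets; the exact types, defaults, and the sample cost and rating values are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class AiModel:
    displayName: str                  # globally unique registry key
    name: str                         # model name sent to the provider API
    contextLength: int                # context window size in tokens
    maxOutputTokens: int
    costInPer1k: float                # cost per 1000 input tokens
    costOutPer1k: float               # cost per 1000 output tokens
    speedRating: int                  # 1-10
    qualityRating: int                # 1-10
    operationTypes: Dict[str, int] = field(default_factory=dict)
    functionCall: Optional[Callable] = None
    priority: str = "BALANCED"        # SPEED | QUALITY | COST | BALANCED
    processingMode: str = "BASIC"     # BASIC | ADVANCED | DETAILED
    temperature: float = 0.7
    available: bool = True


# Example entry (numbers are placeholders, not real pricing)
gpt4o = AiModel(
    displayName="OpenAI GPT-4o", name="gpt-4o",
    contextLength=128000, maxOutputTokens=4096,
    costInPer1k=0.005, costOutPer1k=0.015,
    speedRating=7, qualityRating=9,
    operationTypes={"PLAN": 9, "DATA_ANALYSE": 9, "IMAGE_ANALYSE": 8},
)
```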
**API Communication Implementation:**

Connectors implement one or more call methods (such as `callAiBasic()`, `callAiImage()`, or specialized methods) that handle the actual communication with the AI provider's API. These methods:

- Accept standardized `AiModelCall` objects containing messages, model reference, and options
- Transform the standardized request format into the provider's specific API format (different providers use varying JSON schemas for requests)
- Execute HTTP requests with appropriate error handling, timeouts, and retry logic
- Parse provider-specific response formats back into standardized `AiModelResponse` objects
- Calculate actual costs based on token usage reported by the provider
- Handle provider-specific error codes and translate them into meaningful exceptions
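
The translation steps in these bullets can be sketched as two pure functions, using an OpenAI-style chat-completions wire format as the example. The field names on the standardized side (`costInPer1k`, `tokensIn`, and so on) are assumptions; the HTTP transport, retries, and error mapping are left out.

```python
def buildPayload(model: dict, messages: list, options: dict) -> dict:
    # Standardized call -> provider-specific request body
    return {
        "model": model["name"],
        "messages": messages,
        "temperature": options.get("temperature", model.get("temperature", 0.7)),
        "max_tokens": model["maxOutputTokens"],
    }


def parseResponse(raw: dict, model: dict) -> dict:
    # Provider response -> standardized AiModelResponse-style dict,
    # with cost computed from the reported token usage
    usage = raw["usage"]
    cost = (usage["prompt_tokens"] / 1000 * model["costInPer1k"]
            + usage["completion_tokens"] / 1000 * model["costOutPer1k"])
    return {
        "content": raw["choices"][0]["message"]["content"],
        "tokensIn": usage["prompt_tokens"],
        "tokensOut": usage["completion_tokens"],
        "cost": round(cost, 6),
    }


demoModel = {"name": "gpt-4o", "maxOutputTokens": 4096,
             "costInPer1k": 0.005, "costOutPer1k": 0.015}
demoRaw = {"choices": [{"message": {"content": "Hello"}}],
           "usage": {"prompt_tokens": 1000, "completion_tokens": 200}}
parsed = parseResponse(demoRaw, demoModel)
```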
**Provider-Specific Adaptations:**

Each connector adapts to its provider's unique characteristics:

- **OpenAI Connectors**: Support both text completion and vision capabilities, handle rate limiting, manage model versioning
- **Anthropic Connectors**: Implement Claude-specific message formatting, handle thinking tokens, manage conversation context
- **Perplexity Connectors**: Integrate web search capabilities, handle citation extraction, manage search-enhanced responses
- **Tavily Connectors**: Implement web crawling protocols, handle structured data extraction, manage crawl depth and scope

## Connection to serviceAi

The `aicore` module is the **foundation layer** that `serviceAi` (AI Service) builds upon. Here is how they connect:

### Integration Flow

```mermaid
sequenceDiagram
    participant App as Application<br/>(app.py)
    participant Service as Service Layer<br/>(mainServiceAi.py)
    participant Interface as Interface Layer<br/>(interfaceAiObjects.py)
    participant Core as AI Core<br/>(aicore/)
    participant Provider as AI Provider APIs

    App->>Service: HTTP Request
    Service->>Interface: callAiDocuments/Planning
    Interface->>Core: AiCallRequest
    Core->>Core: Model Selection
    Core->>Provider: API Call
    Provider-->>Core: API Response
    Core-->>Interface: AiCallResponse
    Interface-->>Service: Processed Result
    Service-->>App: HTTP Response
```

### Initialization Sequence

```mermaid
sequenceDiagram
    participant App as app.py
    participant Lifecycle as featuresLifecycle
    participant Service as AiService
    participant AiObjects as AiObjects
    participant Registry as ModelRegistry
    participant Plugins as Plugin Connectors

    App->>Lifecycle: lifespan startup
    Lifecycle->>Lifecycle: start()
    Lifecycle->>Service: create AiService
    Service->>AiObjects: AiObjects.create()

    AiObjects->>AiObjects: _discoverAndRegisterConnectors()
    AiObjects->>Registry: discoverConnectors()

    Registry->>Registry: Scan aicore folder<br/>for aicorePlugin*.py
    Registry->>Plugins: Import & instantiate connectors

    loop For each discovered connector
        AiObjects->>Registry: registerConnector(connector)
        Registry->>Plugins: connector.getModels()
        Plugins-->>Registry: List[AiModel]
        Registry->>Registry: Validate displayName uniqueness
        Registry->>Registry: Store models with displayName as key
    end

    Registry-->>AiObjects: Registration complete
    AiObjects-->>Service: Initialized with all models
    Service-->>Lifecycle: AiService ready
    Lifecycle-->>App: Startup complete

    Note over Registry: Models cached for 5 minutes<br/>with auto-refresh
```

### Service-to-Core Communication

Communication between the service layer and aicore follows a well-defined request-response pattern with multiple abstraction layers, each serving a specific purpose in the overall architecture.

**High-Level Service Operations:**

The AiService class (in `mainServiceAi.py`) provides domain-specific methods that application features and workflows can invoke. These methods abstract away the complexity of AI operations, presenting simple interfaces like `callAiPlanning()` for task planning and `callAiDocuments()` for document processing.

When `callAiPlanning()` is invoked, it handles prompt construction by integrating placeholders and building a complete prompt string. It then creates an AiCallRequest configured specifically for planning: operation type PLAN, priority QUALITY (planning requires accurate reasoning), and processing mode DETAILED (to ensure comprehensive analysis). This request is passed to `aiObjects.call()`, initiating the core AI processing chain.

The `callAiDocuments()` method follows a similar pattern with more flexibility: it accepts custom options, handles document attachments, and can produce various output formats. It manages document extraction, prompt building with continuation contexts, and result formatting, while delegating the actual AI communication to the aicore layer.

**Interface Layer Orchestration:**

The AiObjects class (in `interfaceAiObjects.py`) is the orchestration layer, coordinating between the service layer's high-level requests and the aicore's model selection and execution capabilities. When its `call()` method receives an AiCallRequest, it follows a three-phase process:

**Phase 1 - Model Selection:**

The interface queries the modelRegistry for all currently available models, then invokes the modelSelector's `getFailoverModelList()` method with the request's prompt, context, and options. The selector returns a prioritized list of suitable models, ranked from most to least optimal for the request's characteristics.

**Phase 2 - Failover Execution:**

The interface iterates through the failover list, attempting each model in sequence. For each attempt, it calls the internal `_callWithModel()` method, which constructs a standardized AiModelCall object and invokes the model's `functionCall` reference. That reference points to the connector's API communication method, which executes the actual HTTP request to the AI provider.

If the model call succeeds, the interface immediately returns the AiCallResponse to the service layer, completing the request. If an exception occurs (API errors, rate limits, or other issues), the interface logs the error with detailed context and proceeds to the next model in the failover list.

**Phase 3 - Completion or Failure:**

If any model succeeds, the operation completes successfully. If all models in the failover list fail (rare, but possible during API outages or configuration errors), the interface returns an AiCallResponse with an error message and error count, allowing the service layer to handle the failure gracefully.
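
The three phases reduce to a loop like the following sketch; the response fields, error handling, and the stand-in connector functions are simplified assumptions.

```python
def callWithFailover(failoverList: list, modelCall: dict) -> dict:
    # Try each ranked model in order; the first success wins, otherwise
    # report a structured failure after exhausting the list.
    errorCount = 0
    for model in failoverList:
        try:
            content = model["functionCall"](modelCall)
            return {"content": content, "model": model["displayName"],
                    "errorCount": errorCount}
        except Exception as exc:
            errorCount += 1
            print(f"Model {model['displayName']} failed: {exc}; trying next")
    return {"content": None, "error": "All models failed",
            "errorCount": errorCount}


def flaky(call):  # stands in for a connector hitting a rate limit
    raise RuntimeError("rate limited")


def healthy(call):  # stands in for a connector that responds normally
    return "ok"


result = callWithFailover(
    [{"displayName": "Primary", "functionCall": flaky},
     {"displayName": "Fallback", "functionCall": healthy}], {})
```

Here the primary model fails, the error is counted and logged, and the fallback model's response is returned to the caller.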
**Cross-Cutting Concerns:**

Throughout this communication flow, several cross-cutting concerns are handled automatically:

- **Metrics Collection**: Every AI call records timing, token usage, costs, and error counts for monitoring and optimization
- **Progress Tracking**: Long-running operations emit progress updates through callbacks for user feedback
- **Content Chunking**: Large content that exceeds model context limits is automatically chunked and processed in segments
- **Token Management**: The system estimates token usage and reserves context space for prompts, system messages, and expected outputs

### Key Integration Points

1. **Model Selection**: `serviceAi` delegates to `modelSelector` for choosing the right model
2. **Failover Handling**: `AiObjects.call()` automatically tries multiple models if one fails
3. **Operation Types**: `serviceAi` defines operation types (PLAN, DATA_ANALYSE, etc.) that `aicore` uses for selection
4. **Standardized Interface**: All AI calls go through `AiCallRequest`/`AiCallResponse` regardless of provider
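
The standardized envelope from point 4 might be shaped like this sketch; field names and defaults are assumptions based on the options discussed earlier, not the actual definitions.

```python
from dataclasses import dataclass


@dataclass
class AiCallRequest:
    prompt: str
    context: str = ""
    operationType: str = "DATA_ANALYSE"  # PLAN, DATA_ANALYSE, DATA_GENERATE, ...
    priority: str = "BALANCED"           # SPEED | QUALITY | COST | BALANCED
    processingMode: str = "BASIC"        # BASIC | ADVANCED | DETAILED


@dataclass
class AiCallResponse:
    content: str = ""
    error: str = ""
    errorCount: int = 0
    cost: float = 0.0


# The planning call described earlier would configure the request like this:
planningRequest = AiCallRequest(
    prompt="Plan the rollout", operationType="PLAN",
    priority="QUALITY", processingMode="DETAILED")
```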
## Connection to the Application

### Application Flow

```mermaid
sequenceDiagram
    participant User
    participant Route as FastAPI Route
    participant Workflow as Workflow/Feature
    participant Service as AiService
    participant Objects as AiObjects
    participant Registry as ModelRegistry
    participant Selector as ModelSelector
    participant Plugin as Plugin Connector
    participant API as AI Provider API

    User->>Route: HTTP Request
    Route->>Workflow: Call workflow
    Workflow->>Service: callAiDocuments()
    Service->>Objects: aiObjects.call(request)
    Objects->>Registry: getAvailableModels()
    Registry-->>Objects: List of models
    Objects->>Selector: getFailoverModelList()
    Selector-->>Objects: Sorted model list

    loop Try each model until success
        Objects->>Plugin: model.functionCall()
        Plugin->>API: HTTP Request

        alt Success
            API-->>Plugin: Response
            Plugin-->>Objects: AiModelResponse
            Objects-->>Service: AiCallResponse
        else Error
            API-->>Plugin: Error
            Plugin-->>Objects: Exception
            Objects->>Objects: Try next model
        end
    end

    Service-->>Workflow: Result
    Workflow-->>Route: Response
    Route-->>User: HTTP Response

    Note over Objects,Plugin: Automatic failover<br/>tries next best model
```

### Example: Chat Workflow

**User Request**: "Analyze this document and extract key information"

```mermaid
sequenceDiagram
    participant User
    participant Route as Route Handler<br/>routeChatPlayground.py
    participant Workflow as Workflow Layer
    participant AiService as AiService<br/>mainServiceAi.py
    participant AiObjects as AiObjects<br/>interfaceAiObjects.py
    participant Registry as ModelRegistry
    participant Selector as ModelSelector
    participant Connector as aicorePluginOpenai.py
    participant OpenAI as OpenAI API

    User->>Route: POST /chat/message<br/>"Analyze document"
    Route->>Workflow: featureWorkflow.run(request)

    Workflow->>AiService: callAiDocuments()<br/>operationType=DATA_EXTRACT
    Note over AiService: Build prompt with placeholders

    AiService->>AiObjects: aiObjects.call(request)
    AiObjects->>Registry: getAvailableModels()
    Registry-->>AiObjects: List of models

    AiObjects->>Selector: getFailoverModelList()
    Note over Selector: Filter by DATA_EXTRACT<br/>Score and sort models
    Selector-->>AiObjects: [GPT-3.5, GPT-4, ...]

    AiObjects->>Connector: model.functionCall(AiModelCall)
    Note over Connector: Format for OpenAI API

    Connector->>OpenAI: HTTP POST with messages
    OpenAI-->>Connector: JSON response

    Connector-->>AiObjects: AiModelResponse
    AiObjects-->>AiService: AiCallResponse
    Note over AiService: Handle looping if needed

    AiService-->>Workflow: Extracted content
    Workflow-->>Route: Result with documents
    Route-->>User: HTTP 200 + JSON response

    Note over User,OpenAI: Full request/response cycle<br/>with automatic failover
```
|
||
|
||
**Detailed Flow Breakdown:**
|
||
|
||
**Step 1: HTTP Request Reception**
|
||
When a user sends a chat message through the frontend, it arrives as an HTTP POST request to the `/chat/message` endpoint defined in `routeChatPlayground.py`. The route handler receives a ChatMessageRequest containing the user's message, any attached documents, and conversation context. The handler immediately delegates to the workflow system by calling `featureWorkflow.run(request)`, which orchestrates the entire chat processing pipeline.
|
||
|
||
**Step 2: Workflow Orchestration**
|
||
The workflow layer (living between routes and services) analyzes the user's request to determine the appropriate processing strategy. For a document analysis request, it identifies that document extraction is needed and invokes `serviceCenter.ai.callAiDocuments()`. This call includes the constructed prompt ("Extract key information from documents"), the attached chat documents, and explicitly configured options specifying DATA_EXTRACT as the operation type - signaling that this is an information extraction task rather than generation or analysis.

**Step 3: Service Layer Processing**

The AiService receives the document processing request and performs several preparatory operations. It builds the complete prompt by replacing any placeholder markers with actual content (such as document titles, user context, or system instructions). It validates the documents and converts them into the appropriate format for AI processing. For lengthy responses that might span multiple AI generations, it sets up a looping mechanism that can handle continuation contexts. Finally, it creates an AiCallRequest object and passes it to `aiObjects.call()`, transitioning into the core AI layer.

**Step 4: Intelligent Model Selection**

The AiObjects interface queries the modelRegistry to retrieve all currently available and healthy models. It then invokes the modelSelector with the full request context - passing the prompt text, any additional context, and the configured options. The selector executes its multi-phase filtering and scoring algorithm, ultimately returning a prioritized failover list. For a DATA_EXTRACT operation, this list typically starts with fast, cost-efficient models (like GPT-3.5 Turbo or Claude Haiku) since extraction doesn't require the highest reasoning capabilities.

**Step 5: Model Execution with Failover**

AiObjects begins iterating through the failover list, attempting each model in sequence. For the first model (assume GPT-3.5 Turbo from OpenAI), it constructs an AiModelCall object containing the formatted messages and invokes the model's registered `functionCall`, which points to the OpenAI connector's API method. The connector transforms the standardized request into OpenAI's specific JSON format, adds authentication headers, and sends an HTTP POST request to `api.openai.com/v1/chat/completions`.

If the OpenAI API responds successfully, the connector parses the JSON response, extracts the generated text, calculates costs based on reported token usage, and wraps everything in an AiModelResponse object. This response flows back through AiObjects, which converts it to an AiCallResponse and returns it to the service layer.

If the API call fails (network timeout, rate limit, API error), the connector throws an exception. AiObjects catches this exception, logs detailed error information including the model name and error type, and immediately proceeds to the next model in the failover list. This process continues until either a model succeeds or the entire list is exhausted.
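
The failover loop described in Step 5 can be sketched as follows. The names `functionCall` and `displayName` follow the data models documented later in this file, but the loop itself is an illustrative simplification, not the actual `aiObjects.call()` implementation:

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional

logger = logging.getLogger("aicore.sketch")

@dataclass
class Model:
    displayName: str
    functionCall: Callable[[dict], str]  # stands in for the connector's API method

def callWithFailover(models: List[Model], request: dict) -> Optional[str]:
    """Try each model in ranked order; return the first successful response."""
    for model in models:
        try:
            return model.functionCall(request)
        except Exception as exc:
            # Log the failure with model name and error type, then try the next model.
            logger.warning("Model %s failed: %s: %s",
                           model.displayName, type(exc).__name__, exc)
    return None  # all models failed; the real system returns a structured error response
```

Each failure is logged but never interrupts the loop; only exhausting the whole list produces a failure result.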

**Step 6: Response Assembly and Delivery**

Once the AiService receives a successful AiCallResponse, it processes the content according to the request specifications. For document extraction, this might involve parsing structured JSON from the AI's response, validating the extracted data against expected schemas, and formatting it for frontend consumption. The processed result flows back up through the workflow layer, which adds any workflow-specific metadata (execution time, step logs), and finally reaches the route handler. The handler constructs an HTTP response with appropriate status codes and headers, delivering the extracted information back to the waiting frontend client.

**Error Handling Throughout:**

At every step, comprehensive error handling ensures graceful degradation. If document processing fails, the workflow might retry with different parameters or return a helpful error message. If all AI models fail, the system returns a structured error response rather than crashing. Each failure point is logged with sufficient context for debugging and monitoring.

### Configuration

**Environment-Based Secrets Management:**

The aicore system loads all sensitive configuration through the application's central `APP_CONFIG` system, which reads from environment files (env_dev.env, env_int.env, env_prod.env) based on the deployment environment. Each AI provider connector requires its API key stored under a standardized naming convention: `Connector_Ai<Provider>_API_SECRET`. For example, the OpenAI connector looks for `Connector_AiOpenai_API_SECRET`, while Anthropic uses `Connector_AiAnthropic_API_SECRET`. This convention enables consistent configuration management across all providers and environments.

Additional provider-specific settings follow similar naming patterns with descriptive suffixes. The SECRET suffix indicates that these values contain sensitive information and should never be committed to version control or exposed in logs. Configuration loading happens during connector initialization, allowing different API keys per environment without code changes.

**Plugin-Level Model Configuration:**

Each plugin file contains hard-coded model definitions specifying technical and economic characteristics. These configurations include:

- **Capacity Parameters**: Context window sizes (in tokens) define maximum input lengths, while max token settings limit output generation length
- **Economic Metrics**: Input and output costs per 1000 tokens enable accurate cost tracking and budget management
- **Performance Characteristics**: Speed ratings (1-10 scale) indicate typical response time, while quality ratings reflect output sophistication and accuracy
- **Operational Capabilities**: Operation type ratings specify which tasks each model handles well, with ratings from 1-10 for supported operations
- **Strategic Classifications**: Priority tags (SPEED, QUALITY, COST, BALANCED) and processing mode designations (BASIC, ADVANCED, DETAILED) guide selection algorithms

These plugin-level configurations represent the static characteristics of models and change only when model capabilities are updated or new models are added. They're versioned with the code rather than stored in environment variables, since they're not environment-specific or sensitive.
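
A plugin-level model definition covering these characteristics might look like the sketch below. The field names mirror the AiModel data model documented later; the `ModelConfig` class and all concrete values (pricing, ratings, limits) are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    # Field names mirror the AiModel data model; all values used below are
    # illustrative placeholders, not real pricing or ratings.
    name: str                    # API-level model identifier
    displayName: str             # globally unique registry name
    contextLength: int           # max input tokens
    maxTokens: int               # max output tokens
    costPer1kTokensInput: float
    costPer1kTokensOutput: float
    speedRating: int             # 1-10
    qualityRating: int           # 1-10
    priority: str = "BALANCED"
    processingMode: str = "ADVANCED"
    operationTypes: dict = field(default_factory=dict)  # e.g. {"DATA_EXTRACT": 8}

exampleModel = ModelConfig(
    name="example-model-v1",
    displayName="Example Provider Model V1",
    contextLength=128_000,
    maxTokens=4_096,
    costPer1kTokensInput=0.001,
    costPer1kTokensOutput=0.002,
    speedRating=8,
    qualityRating=7,
    operationTypes={"DATA_EXTRACT": 8, "DATA_ANALYSE": 6},
)
```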

## Key Features

### 1. **Dynamic Plugin Architecture**

```mermaid
graph LR
    subgraph "Auto-Discovery Process"
        Scan[Scan aicore folder<br/>for aicorePlugin*.py]
        Import[Import module dynamically]
        Find[Find BaseConnectorAi<br/>subclasses]
        Instantiate[Instantiate connector]
        Register[Register in ModelRegistry]
    end

    subgraph "Plugin Files"
        P1[aicorePluginOpenai.py]
        P2[aicorePluginAnthropic.py]
        P3[aicorePluginPerplexity.py]
        P4[aicorePluginTavily.py]
        P5[aicorePlugin*.py<br/>Add new plugins here]
    end

    Scan --> P1
    Scan --> P2
    Scan --> P3
    Scan --> P4
    Scan --> P5

    P1 --> Import
    P2 --> Import
    P3 --> Import
    P4 --> Import
    P5 --> Import

    Import --> Find
    Find --> Instantiate
    Instantiate --> Register

    Register --> Models[All Models Available<br/>in ModelRegistry]

    style Scan fill:#e1f5ff
    style Register fill:#c8e6c9
    style P5 fill:#fff9c4
    style Models fill:#e8f5e9
```

**Key Benefits:**
- New AI providers can be added by creating `aicorePlugin*.py` files
- No code changes needed in core logic
- Automatic discovery and registration
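
The scan-import-find-instantiate pipeline in the diagram can be sketched as below. This is an illustrative stand-in, not the real registry code (which lives in `modules.aicore.aicoreModelRegistry` and uses BaseConnectorAi directly rather than a `baseClass` parameter):

```python
import importlib
import inspect
from pathlib import Path

def discoverConnectors(pluginDir: str, baseClass: type) -> list:
    """Scan pluginDir for aicorePlugin*.py files, import each one, and
    instantiate every subclass of baseClass found inside.

    Sketch only: the real discovery is performed by the modelRegistry and
    takes no baseClass argument.
    """
    connectors = []
    for path in sorted(Path(pluginDir).glob("aicorePlugin*.py")):
        # assumes pluginDir is already on sys.path so the module resolves
        module = importlib.import_module(path.stem)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            if issubclass(cls, baseClass) and cls is not baseClass:
                connectors.append(cls())
    return connectors
```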

### 2. **Intelligent Model Selection**

The model selection engine goes far beyond simple rule-based routing by implementing a sophisticated multi-criteria decision system:

**Holistic Evaluation:**
Rather than selecting models based on a single factor, the selector considers operation type compatibility (can this model handle planning vs. extraction?), resource constraints (will the prompt fit?), performance preferences (does the user prioritize speed or quality?), and cost implications. Each factor contributes to a weighted score that reflects the model's overall suitability.

**Context-Aware Decisions:**
The selector analyzes not just what operation is requested, but also the size and complexity of the input. A simple data extraction from a small document might route to a fast, economical model like GPT-3.5 Turbo, while complex multi-document analysis with a large prompt routes to more capable models like GPT-4 or Claude Opus. This context-awareness optimizes the trade-off between cost and capability.

**Ranked Failover Lists:**
Instead of returning a single "best" model, the selector produces a complete ranked list representing a spectrum from optimal to acceptable. This ranked list serves as a failover chain - if the top model fails due to rate limits or transient errors, the system immediately tries the second-ranked model without user intervention or workflow delays. This approach significantly improves system reliability and reduces user-facing errors.
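
A weighted multi-criteria score of this kind might look like the sketch below. The inputs (operation ratings, speed/quality ratings, cost, priority) come from the documentation, but the specific weights and formula are assumptions, not the real selector's scoring logic:

```python
def scoreModel(model: dict, operationType: str, priority: str) -> float:
    """Composite suitability score for one model (illustrative weights)."""
    operationRating = model["operationTypes"].get(operationType, 0)
    if operationRating == 0:
        return 0.0  # model does not support this operation type at all
    score = operationRating * 10.0  # primary criterion: task fit
    if priority == "SPEED":
        score += model["speedRating"] * 5.0
    elif priority == "QUALITY":
        score += model["qualityRating"] * 5.0
    elif priority == "COST":
        score += (1.0 / max(model["costPer1kTokensOutput"], 0.0001)) * 0.01
    else:  # BALANCED
        score += (model["speedRating"] + model["qualityRating"]) * 2.5
    return score

def rankModels(models: list, operationType: str, priority: str) -> list:
    """Return supporting models sorted best-first; the result doubles as the failover chain."""
    scored = [(scoreModel(m, operationType, priority), m) for m in models]
    return [m for s, m in sorted(scored, key=lambda pair: -pair[0]) if s > 0]
```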

### 3. **Automatic Failover**

```mermaid
flowchart TD
    Start[AI Call Request] --> GetList[Get Failover Model List<br/>Sorted by Score]
    GetList --> Loop{Models<br/>Available?}

    Loop -->|Yes| Try[Try Model #N]
    Try --> Call[Call model.functionCall]

    Call --> Success{Success?}
    Success -->|Yes| Return[Return Response]
    Success -->|No| Log[Log Error with Details]

    Log --> More{More Models<br/>in List?}
    More -->|Yes| Next[Try Next Model]
    Next --> Loop
    More -->|No| Fail[All Models Failed]

    Loop -->|No| Error[Return Error Response]
    Fail --> Error
    Return --> End[Response to Caller]
    Error --> End

    style Start fill:#e1f5ff
    style Try fill:#fff3e0
    style Success fill:#f3e5f5
    style Return fill:#c8e6c9
    style Error fill:#ffcdd2
    style Next fill:#fff9c4
```

**Key Benefits:**
- If the primary model fails, the next best is tried automatically
- Logs each attempt with detailed error information
- Ensures high availability of AI operations
- No manual intervention required

### 4. **Model Caching**

```mermaid
stateDiagram-v2
    [*] --> Empty: System Start
    Empty --> Loading: First Request
    Loading --> Cached: getModels() called
    Cached --> Valid: Check TTL
    Valid --> Cached: TTL < 5 min
    Valid --> Expired: TTL >= 5 min
    Expired --> Loading: Refresh
    Loading --> Cached: Cache Updated
    Cached --> [*]: Return Models

    note right of Cached
        Models cached for 5 minutes
        Reduces API calls
        Improves performance
    end note

    note right of Loading
        Calls connector.getModels()
        Updates _last_cache_update
        Stores in _models_cache
    end note
```

**Key Benefits:**
- 5-minute TTL cache for model metadata
- Reduces repeated API calls
- Improves performance
- Manual cache clearing available via `clearCache()`
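
The 5-minute TTL cache can be sketched as follows. The attribute names (`_models_cache`, `_last_cache_update`) and `clearCache()` appear in the documentation above; the class and method bodies are illustrative, not the real BaseConnectorAi implementation:

```python
import time

class CachedModelProvider:
    """Minimal sketch of a 5-minute TTL cache around a model loader."""
    TTL_SECONDS = 300  # 5 minutes

    def __init__(self, loader):
        self._loader = loader          # stands in for connector.getModels()
        self._models_cache = None
        self._last_cache_update = 0.0

    def getCachedModels(self):
        now = time.time()
        if self._models_cache is None or now - self._last_cache_update >= self.TTL_SECONDS:
            # Cache empty or expired: reload and record the refresh time.
            self._models_cache = self._loader()
            self._last_cache_update = now
        return self._models_cache

    def clearCache(self):
        self._models_cache = None
        self._last_cache_update = 0.0
```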

### 5. **Unified Interface**

One of the aicore system's most powerful design principles is its provider-agnostic abstraction layer:

**Universal Request Format:**
Regardless of whether the eventual API call goes to OpenAI, Anthropic, Perplexity, or any other provider, the requesting code always uses the same AiCallRequest structure. This insulates application code from the complexity and variability of different provider APIs. Developers can write workflow logic once, and the system handles all provider-specific transformations transparently.

**Standardized Response Structure:**
Every AI operation returns an AiCallResponse object with the same structure and semantics, whether it came from GPT-4, Claude, or a specialized search model. This consistency simplifies response handling code - no need for provider-specific parsing logic or conditional handling based on which model was used.

**Consistent Error Semantics:**
Different AI providers report errors in vastly different formats - OpenAI uses different status codes and error structures than Anthropic, which differs from Perplexity. The aicore connectors translate all these provider-specific error formats into consistent error responses with standardized error counts and messages. This enables unified error handling logic throughout the application.

**Normalized Metrics:**
Cost calculations, timing measurements, and token usage reporting follow the same format regardless of provider. This enables apples-to-apples comparisons of different models' performance and economics, facilitating data-driven decisions about model selection strategies.

### 6. **Operation Type System**

The operation type taxonomy provides semantic categorization of AI tasks, enabling intelligent routing and specialized model selection:

**Task-Based Classification:**
Rather than selecting models based on generic "intelligence" levels, the system classifies each request by what it's trying to accomplish. This task-based approach recognizes that different models excel at different types of operations - a model optimized for rapid extraction might not be ideal for deep analytical reasoning, even if both are "capable" in an abstract sense.

**Operation Type Catalog:**

- **PLAN**: Strategic reasoning operations including task decomposition, action sequencing, and decision planning. These operations require strong logical reasoning and the ability to consider multiple factors simultaneously. Typically routed to high-capability models like GPT-4 or Claude Opus.

- **DATA_ANALYSE**: Analytical operations that examine data to identify patterns, draw insights, or make assessments. Requires good comprehension and reasoning but not necessarily creative generation. Often uses balanced models that provide good analysis without premium costs.

- **DATA_GENERATE**: Creative content generation including report writing, document creation, and structured output generation. Emphasizes coherent, well-structured output over analytical depth. Can often use mid-tier models effectively.

- **DATA_EXTRACT**: Information extraction and parsing operations that pull structured data from unstructured sources. Speed and accuracy matter more than sophisticated reasoning. Frequently routed to fast, economical models like GPT-3.5 Turbo or Claude Haiku.

- **IMAGE_ANALYSE**: Vision operations including image understanding, OCR, visual question answering, and scene description. Requires specialized vision-capable models with multimodal understanding. Automatically routes to GPT-4 Vision, Claude Vision, or similar models.

- **IMAGE_GENERATE**: Image creation and generation operations. Routes to specialized generative models like DALL-E or Stable Diffusion connectors.

- **WEB_SEARCH**: Real-time web search operations that query current information. Routes to search-specialized connectors like Perplexity that integrate web search APIs.

- **WEB_CRAWL**: Web content extraction and crawling operations. Routes to specialized web crawling connectors like Tavily that handle website traversal and content extraction.

**Performance Rating System:**
Each model declares not just which operations it supports, but how well it performs each operation on a 1-10 scale. A model might rate 9/10 for DATA_ANALYSE but only 6/10 for DATA_GENERATE, reflecting its strengths in analytical over creative tasks. These ratings form the primary sorting criterion in model selection, ensuring task-appropriate routing.
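
Using the per-operation rating as the primary sorting criterion can be sketched as below; the real selector additionally folds in size, processing mode, and priority factors, so this isolated sort is an illustrative simplification:

```python
def sortByOperationRating(models: list, operation: str) -> list:
    """Keep only models that support `operation` (rating > 0) and sort
    them best-first by their declared 1-10 rating for it."""
    supported = [m for m in models if m["operationTypes"].get(operation, 0) > 0]
    return sorted(supported,
                  key=lambda m: m["operationTypes"][operation],
                  reverse=True)
```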

### 7. **Content-Aware Chunking**

When content exceeds a model's context capacity, the system employs sophisticated chunking strategies rather than simply failing:

**Model-Specific Chunk Sizing:**
Chunking decisions are based on each model's specific capabilities rather than using universal chunk sizes. A model with a 128K token context window receives much larger chunks than one with a 16K limit. The system calculates optimal chunk sizes by considering the model's total context length, subtracting reserved space for prompts and system messages, and applying a safety margin (typically 70-80% utilization).

**Comprehensive Token Accounting:**
Naive chunking might only consider content size, but the aicore system accounts for all token consumers: the user prompt (which repeats with each chunk), system message overhead (message formatting and instructions), output token reservation (space the model needs for its response), and protocol overhead (JSON structure and metadata). This comprehensive accounting prevents context overflow errors during generation.

**Intelligent Result Merging:**
After processing multiple chunks, their results must be intelligently combined. Simple concatenation can produce disjointed or redundant output. The system employs content-type-aware merging strategies - text chunks are merged with appropriate spacing and deduplication, structured data is merged while preserving relationships, and vision results are aggregated with context preservation. The merging system maintains coherence across chunk boundaries, producing results that read as unified responses rather than fragmented pieces.

**Progressive Processing:**
For very large documents, chunking enables progressive processing where each chunk can be processed as soon as it's prepared, rather than waiting for the entire document. This streaming approach reduces perceived latency and enables progress reporting to users, showing incremental completion rather than a black-box wait.
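
The chunk-size arithmetic described above can be sketched as a single budget calculation. The 0.75 default reflects the documented 70-80% safety utilization; the function name and parameter names are illustrative:

```python
def maxChunkTokens(contextLength: int,
                   promptTokens: int,
                   systemTokens: int,
                   outputReservation: int,
                   utilization: float = 0.75) -> int:
    """Content token budget per chunk for one model.

    Subtracts every token consumer that repeats with each chunk (prompt,
    system overhead, reserved output space) from the model's context
    window, then applies the safety margin.
    """
    available = contextLength - promptTokens - systemTokens - outputReservation
    return max(0, int(available * utilization))
```

A 128K-context model thus receives far larger chunks than a 16K one for the same prompt overhead.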

## Data Models

### Core Data Models (`datamodelAi.py`)

```mermaid
classDiagram
    class AiModel {
        +string name
        +string displayName
        +string connectorType
        +string apiUrl
        +float temperature
        +int maxTokens
        +int contextLength
        +float costPer1kTokensInput
        +float costPer1kTokensOutput
        +int speedRating
        +int qualityRating
        +callable functionCall
        +PriorityEnum priority
        +ProcessingModeEnum processingMode
        +List~OperationTypeRating~ operationTypes
        +string version
        +callable calculatePriceUsd
    }

    class AiCallRequest {
        +string prompt
        +string context
        +AiCallOptions options
        +List~ContentPart~ contentParts
    }

    class AiCallOptions {
        +OperationTypeEnum operationType
        +PriorityEnum priority
        +ProcessingModeEnum processingMode
        +bool compressPrompt
        +bool compressContext
    }

    class AiCallResponse {
        +string content
        +string modelName
        +float priceUsd
        +float processingTime
        +int bytesSent
        +int bytesReceived
        +int errorCount
    }

    class OperationTypeEnum {
        <<enumeration>>
        PLAN
        DATA_ANALYSE
        DATA_GENERATE
        DATA_EXTRACT
        IMAGE_ANALYSE
        IMAGE_GENERATE
        WEB_SEARCH
        WEB_CRAWL
    }

    class PriorityEnum {
        <<enumeration>>
        BALANCED
        SPEED
        QUALITY
        COST
    }

    class ProcessingModeEnum {
        <<enumeration>>
        BASIC
        ADVANCED
        DETAILED
    }

    AiCallRequest --> AiCallOptions
    AiCallOptions --> OperationTypeEnum
    AiCallOptions --> PriorityEnum
    AiCallOptions --> ProcessingModeEnum
    AiModel --> PriorityEnum
    AiModel --> ProcessingModeEnum

    note for AiModel "Unique displayName required\nacross all connectors"
    note for AiCallRequest "Input to AI system"
    note for AiCallResponse "Output from AI system"
```

**Core Data Model Descriptions:**

**AiModel:** Represents a complete model configuration with all metadata required for selection, execution, and cost tracking. The `name` field contains the API-level identifier used in actual provider calls, while `displayName` serves as the globally unique identifier within the registry. Technical specifications like `contextLength` (maximum input tokens) and `maxTokens` (maximum output tokens) inform chunking and validation logic. Economic fields (`costPer1kTokensInput`, `costPer1kTokensOutput`) enable precise cost tracking across all operations. Performance metrics (`speedRating`, `qualityRating`) influence selection algorithms. The `functionCall` field holds a callable reference to the connector method that executes API communication. The `operationTypes` list defines which operation types this model supports and how well it performs each, using ratings from 1-10.

**AiCallRequest:** Encapsulates all information needed to execute an AI operation. The `prompt` contains the primary instruction or question, while the optional `context` provides supporting information. The `options` object configures operation behavior including type, priority, and processing mode. For multi-modal requests (like vision operations), the `contentParts` list can contain multiple pieces of content with different MIME types.

**AiCallOptions:** Configures how an AI operation should be executed. The `operationType` determines what kind of operation this is (planning, analysis, generation, etc.), which drives model selection. The `priority` indicates whether to optimize for speed, quality, cost, or balance. The `processingMode` suggests the depth of processing required (basic for simple tasks, detailed for complex reasoning). Boolean flags like `compressPrompt` and `compressContext` control whether the system should attempt content compression to fit context limits.

**AiCallResponse:** Contains the complete result of an AI operation including the generated `content`, the `modelName` that produced it, and comprehensive metrics. Cost tracking is provided via `priceUsd`, calculated based on actual token usage reported by the provider. Performance metrics include `processingTime` (wall-clock time for the operation), `bytesSent` and `bytesReceived` (for network monitoring), and `errorCount` (zero for success, greater than zero indicating partial or complete failure).
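
The per-call cost computation behind `priceUsd` follows directly from the per-1k-token fields. The standalone signature below is illustrative (the real `calculatePriceUsd` is a callable on AiModel):

```python
def calculatePriceUsd(inputTokens: int,
                      outputTokens: int,
                      costPer1kTokensInput: float,
                      costPer1kTokensOutput: float) -> float:
    """Price one call from reported token usage and per-1k-token rates."""
    return (inputTokens / 1000.0) * costPer1kTokensInput \
         + (outputTokens / 1000.0) * costPer1kTokensOutput
```

For example, 2000 input tokens at $0.001/1k plus 500 output tokens at $0.002/1k prices the call at $0.003.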

## Best Practices

### Adding a New AI Provider

The plugin architecture makes adding new AI providers straightforward through a four-step process:

**Step 1: Create the Plugin File**

Create a new file in the `modules/aicore` directory following the naming convention `aicorePlugin<Provider>.py`, where `<Provider>` is a descriptive name for the AI service (e.g., `aicorePluginCohere` for Cohere AI). The filename itself triggers automatic discovery - the system scans for any file matching the `aicorePlugin*.py` pattern during initialization.

**Step 2: Implement the Connector Class**

Within your plugin file, create a class that inherits from BaseConnectorAi. This class must implement several required methods:

**Connector Identification:**
The `getConnectorType()` method returns a simple string identifier (lowercase, no spaces) that uniquely identifies this connector throughout the system. This identifier appears in logs, model metadata, and routing decisions.

**Model Catalog Definition:**
The `getModels()` method returns a list of AiModel instances, one for each model configuration you want to expose. Each AiModel requires comprehensive metadata including:
- A unique displayName that differs from all other models in the system (e.g., "Cohere Command-R Plus")
- The API model name used in actual API calls
- Technical specifications (context length, max output tokens, temperature)
- Economic data (input and output costs per 1000 tokens)
- Performance ratings (speed and quality on 1-10 scales)
- Operational capabilities defined via `createOperationTypeRatings()`, specifying which operation types the model supports and how well (rating 1-10 for each)
- A reference to the callable method that handles API communication (typically a method on your connector class)

**API Communication Method:**
Implement one or more async methods (like `callAi()`) that accept an AiModelCall object and return an AiModelResponse. This method handles the actual HTTP communication with your provider's API. It must:
- Extract messages from the AiModelCall
- Transform them into the provider's expected JSON format
- Execute the HTTP request with proper authentication and error handling
- Parse the provider's response format
- Extract the generated text and any usage statistics
- Calculate costs based on token usage
- Return everything wrapped in an AiModelResponse object

**Step 3: Configure Environment Variables**

Add the necessary configuration to your environment files (env_dev.env, env_int.env, env_prod.env). At minimum, this includes the API key for authentication, but it might also include endpoint URLs, organization IDs, or other provider-specific settings. Use descriptive configuration key names following the convention `Connector_Ai<Provider>_<SettingName>_SECRET` for sensitive values.

**Step 4: Automatic Integration**

No manual registration or configuration code changes are required. When the application next starts, the modelRegistry's discovery mechanism automatically:
- Scans the aicore directory
- Finds your new plugin file
- Imports the module
- Instantiates your connector class
- Calls getModels() to retrieve available models
- Validates displayName uniqueness
- Registers all models in the global registry

Your new AI provider is now fully integrated and will participate in model selection for appropriate operation types. The system logs will show discovery and registration messages confirming successful integration.

### Model Selection Guidelines

- **PLAN operations**: Use high-quality models (GPT-4, Claude 3 Opus)
- **DATA_GENERATE**: Balanced models for a quality/cost trade-off
- **DATA_EXTRACT**: Speed-optimized models for bulk processing
- **IMAGE_ANALYSE**: Vision-capable models only
- **WEB_SEARCH**: Specialized search connectors (Perplexity, Tavily)

### Error Handling Philosophy

The aicore system implements a comprehensive error handling strategy designed for resilience and observability:

**Automatic Failover:**
When you invoke `aiObjects.call()` with a request, the system automatically attempts multiple models from the failover list until one succeeds. Each failure is logged with detailed context (model name, error type, error message) but doesn't interrupt the execution flow. Only if all models in the failover list fail does the method return an error response.

**Graceful Degradation:**
Rather than throwing exceptions that crash workflows, the system returns AiCallResponse objects even in failure scenarios. These error responses have `errorCount` greater than zero and contain descriptive error messages in the `content` field. This allows calling code to inspect the errorCount property and decide how to handle partial failures - whether to retry with different parameters, fall back to alternative processing paths, or present user-friendly error messages.

**Comprehensive Logging:**
Every error is logged with sufficient context for debugging: the attempted model's displayName, the operation type, the error type (network timeout, API error, rate limit, etc.), and the full error message. This creates an audit trail for troubleshooting production issues without requiring verbose debug logging during normal operations.

**Error Classification:**
The system distinguishes between transient errors (network timeouts, temporary API issues) that warrant trying another model, and permanent errors (authentication failures, malformed requests) that indicate configuration problems requiring immediate attention. Transient errors trigger failover silently, while permanent errors are logged at higher severity levels.
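
The caller-side pattern for graceful degradation - inspecting `errorCount` instead of catching exceptions - can be sketched as below. The dataclass is a local stand-in for the documented response fields, and the fallback message is an assumption, not prescribed behavior:

```python
from dataclasses import dataclass

@dataclass
class AiCallResponse:
    # Local stand-in mirroring the documented response fields
    content: str
    errorCount: int = 0

def handleResponse(response: AiCallResponse) -> str:
    """Branch on errorCount rather than relying on exceptions."""
    if response.errorCount > 0:
        # In failure scenarios, content carries a descriptive error message.
        return f"AI operation failed ({response.errorCount} errors): {response.content}"
    return response.content
```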

## Performance Considerations

### Caching
- Model registry caches for 5 minutes
- Connector models cached individually
- Reduces discovery overhead

### Failover Strategy
- Models sorted by score (best first)
- Failed models logged with detailed errors
- Next best model tried automatically

### Chunking
- Large content automatically chunked based on model limits
- Conservative 70-80% utilization for safety
- Intelligent merging of chunk results

### Cost Optimization
- Model selector considers cost ratings
- Price calculated per call for tracking
- Can prioritize by cost with `PriorityEnum.COST`

## Troubleshooting

### Common Issues

1. **"No models available"**
   - Check API keys in environment configuration
   - Verify connector plugins exist in `aicore/` folder
   - Check logs for connector initialization errors

2. **"No suitable model found"**
   - Check if the operation type is supported by any model
   - Verify the prompt size isn't too large for all models
   - Review model filtering criteria in logs

3. **"All models failed"**
   - Check API connectivity and keys
   - Review model-specific error messages in logs
   - Verify the request format is correct

4. **"Duplicate displayName"**
   - Each model must have a unique `displayName`
   - Check all plugin files for name conflicts
   - Naming convention: `<Provider> <Model Name>`

## Future Enhancements

- **Streaming Support**: Real-time response streaming for chat interfaces
- **Model Health Monitoring**: Track success rates and performance metrics
- **Cost Budgets**: Automatic model selection based on budget constraints
- **Custom Scoring**: User-defined scoring functions for model selection
- **A/B Testing**: Compare different models for the same operation
- **Rate Limiting**: Built-in rate limit handling per provider

## Quick Reference

### Common Usage Patterns

**1. Making AI Calls:**

There are two primary approaches for invoking AI operations in the system:

**Via AiService (Recommended Approach):**
The recommended pattern uses the high-level service methods like `callAiPlanning()`, `callAiDocuments()`, or `callAiText()`. These methods are accessed through the serviceCenter and handle all complexity internally. For planning operations, you call `serviceCenter.ai.callAiPlanning()` with a prompt string and an optional placeholder list. Placeholders allow dynamic content injection - the system replaces markers like `{TASK}` with actual content before sending the prompt to the AI. This approach provides automatic prompt building, placeholder resolution, and response formatting.

**Direct via AiObjects (Advanced Use):**
For specialized scenarios requiring fine-grained control, you can construct an AiCallRequest manually and invoke `aiObjects.call()` directly. This requires creating an AiCallOptions object with explicit operation type and priority settings, then awaiting the call. The response object contains the generated content plus metrics like token usage, processing time, and costs. This approach is typically used within service implementations or for custom AI workflows.
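
The direct-call pattern can be sketched as below. The dataclasses are local stand-ins mirroring the documented request objects (the real ones live in `datamodelAi.py`), and the commented-out `await aiObjects.call(request)` line shows where the real entry point would be used:

```python
from dataclasses import dataclass, field

# Local stand-ins mirroring the documented request objects.
@dataclass
class AiCallOptions:
    operationType: str = "DATA_EXTRACT"
    priority: str = "BALANCED"
    processingMode: str = "BASIC"

@dataclass
class AiCallRequest:
    prompt: str
    context: str = ""
    options: AiCallOptions = field(default_factory=AiCallOptions)

request = AiCallRequest(
    prompt="Extract key information from documents",
    context="<document text here>",
    options=AiCallOptions(operationType="DATA_EXTRACT", priority="SPEED"),
)

# In application code this would then be awaited against the real interface:
#     response = await aiObjects.call(request)
#     if response.errorCount == 0:
#         print(response.content)
```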

**2. Querying Available Models:**

The modelRegistry provides comprehensive model inventory access:

**Complete Inventory Access:**
Calling `modelRegistry.getAvailableModels()` returns all currently available and healthy models across all registered connectors. This list automatically excludes any models marked as unavailable due to configuration issues or connector errors.

**Connector-Specific Filtering:**
Use `modelRegistry.getModelsByConnector("openai")` to retrieve only models from a specific provider. This is useful when implementing provider-specific features or debugging connector issues. Pass the connector type string (openai, anthropic, perplexity, tavily) as the parameter.

**Direct Model Lookup:**
To retrieve a specific model's full metadata, use `modelRegistry.getModel("OpenAI GPT-4o")` with the exact displayName. This returns the complete AiModel object including capabilities, costs, ratings, and the functionCall reference.

**Statistical Overview:**
The `modelRegistry.getModelStats()` method provides aggregate statistics including total model count, availability counts, breakdowns by connector type, capability distribution, and priority classifications. This is valuable for monitoring system health and model distribution.

**3. Understanding Model Selection:**

To understand how the system selects models for specific requests:

**Generating Failover Lists:**
Invoke `modelSelector.getFailoverModelList()` with your prompt, context, options, and the list of available models. The selector executes its full filtering and scoring algorithm, returning a ranked list ordered from most to least suitable. The first element represents the optimal choice, while subsequent elements serve as fallback options.

**Analyzing Selection Results:**
Each model in the failover list has been validated for operation type compatibility and context size constraints. Their ordering reflects the composite score from operation ratings, size efficiency, processing mode alignment, and priority preferences. Examining this list helps understand why specific models were chosen or excluded for particular operations.
|
||
|
||
### Operation Types Reference
|
||
|
||
| Operation Type | Description | Best Models | Use Case |
|---------------|-------------|-------------|----------|
| `PLAN` | Task planning, action selection | GPT-4, Claude Opus | Workflow planning, decision making |
| `DATA_ANALYSE` | Data analysis and insights | GPT-4, Claude Sonnet | Document analysis, pattern detection |
| `DATA_GENERATE` | Content generation | GPT-4, Claude Sonnet | Report creation, document generation |
| `DATA_EXTRACT` | Information extraction | GPT-3.5, Claude Haiku | Text extraction, data parsing |
| `IMAGE_ANALYSE` | Image/vision analysis | GPT-4 Vision, Claude Vision | Image understanding, OCR |
| `IMAGE_GENERATE` | Image generation | DALL-E, Stable Diffusion | Image creation |
| `WEB_SEARCH` | Web search operations | Perplexity | Real-time web search |
| `WEB_CRAWL` | Web crawling | Tavily | Website content extraction |
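
The operation types above map naturally onto a string-valued enum. The definition below mirrors the table, but is an assumption about how `OperationTypeEnum` is declared in `modules.datamodels.datamodelAi`:

```python
from enum import Enum

class OperationTypeEnum(str, Enum):
    """Illustrative mirror of the operation types table (actual declaration assumed)."""
    PLAN = "PLAN"
    DATA_ANALYSE = "DATA_ANALYSE"
    DATA_GENERATE = "DATA_GENERATE"
    DATA_EXTRACT = "DATA_EXTRACT"
    IMAGE_ANALYSE = "IMAGE_ANALYSE"
    IMAGE_GENERATE = "IMAGE_GENERATE"
    WEB_SEARCH = "WEB_SEARCH"
    WEB_CRAWL = "WEB_CRAWL"

# str-mixin enums compare equal to their plain string values,
# which keeps serialized requests readable
print(OperationTypeEnum.WEB_SEARCH == "WEB_SEARCH")  # → True
```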
### Priority Reference
| Priority | Description | Selection Behavior |
|----------|-------------|-------------------|
| `BALANCED` | Balance speed, quality, cost | Default selection |
| `SPEED` | Prioritize fast response | Favor high speedRating models |
| `QUALITY` | Prioritize high-quality output | Favor high qualityRating models |
| `COST` | Prioritize low cost | Favor low-cost models |
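
One way to picture this behavior is a priority-dependent weighting of the three model ratings. The weight values here are illustrative, not the selector's real numbers, and `costRating` is treated as "higher is cheaper" so that all three factors add positively:

```python
from enum import Enum

class PriorityEnum(str, Enum):
    BALANCED = "BALANCED"
    SPEED = "SPEED"
    QUALITY = "QUALITY"
    COST = "COST"

# Illustrative weighting: which rating each priority emphasises
PRIORITY_WEIGHTS = {
    PriorityEnum.BALANCED: {"speed": 1.0, "quality": 1.0, "cost": 1.0},
    PriorityEnum.SPEED:    {"speed": 2.0, "quality": 1.0, "cost": 1.0},
    PriorityEnum.QUALITY:  {"speed": 1.0, "quality": 2.0, "cost": 1.0},
    PriorityEnum.COST:     {"speed": 1.0, "quality": 1.0, "cost": 2.0},
}

def priorityScore(priority, speedRating, qualityRating, costRating):
    # Double-weight the rating the caller's priority favors
    w = PRIORITY_WEIGHTS[priority]
    return w["speed"] * speedRating + w["quality"] * qualityRating + w["cost"] * costRating

print(priorityScore(PriorityEnum.SPEED, 9, 6, 5))  # → 29.0
```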
### Processing Mode Reference
| Mode | Description | When to Use |
|------|-------------|-------------|
| `BASIC` | Simple, straightforward processing | Quick tasks, simple questions |
| `ADVANCED` | Complex reasoning required | Multi-step tasks, analysis |
| `DETAILED` | Comprehensive, thorough output | Planning, detailed generation |
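
Putting the two enums together, a call-options object along these lines would carry mode and priority into a request. The field names and defaults are assumptions for illustration, not taken from the real `AiCallOptions`:

```python
from dataclasses import dataclass, field
from enum import Enum

class ProcessingModeEnum(str, Enum):
    BASIC = "BASIC"
    ADVANCED = "ADVANCED"
    DETAILED = "DETAILED"

class PriorityEnum(str, Enum):
    BALANCED = "BALANCED"
    SPEED = "SPEED"
    QUALITY = "QUALITY"
    COST = "COST"

@dataclass
class AiCallOptions:
    """Sketch of an options object; real field names and defaults may differ."""
    processingMode: ProcessingModeEnum = ProcessingModeEnum.BASIC
    priority: PriorityEnum = PriorityEnum.BALANCED

# A planning request would lean on DETAILED output and quality-first selection
options = AiCallOptions(ProcessingModeEnum.DETAILED, PriorityEnum.QUALITY)
print(options.processingMode.value, options.priority.value)  # → DETAILED QUALITY
```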
### Module Import Structure
The aicore system is organized across several module paths for clean separation of concerns:
**Core Infrastructure Components:**

- The base connector interface lives at `modules.aicore.aicoreBase` and exports `BaseConnectorAi`
- The global model registry singleton is imported from `modules.aicore.aicoreModelRegistry` as `modelRegistry`
- The global model selector singleton is imported from `modules.aicore.aicoreModelSelector` as `modelSelector`

**Data Model Definitions:**

All AI-related data models are centralized in `modules.datamodels.datamodelAi`, including:

- `AiModel`: Complete model metadata and configuration
- `AiCallRequest` and `AiCallResponse`: Request/response wrapper objects
- `AiCallOptions`: Configuration options for AI operations
- `OperationTypeEnum`, `PriorityEnum`, `ProcessingModeEnum`: Enumeration types for operation classification

**Interface and Service Layers:**

- The `AiObjects` interface class is available at `modules.interfaces.interfaceAiObjects`
- The high-level `AiService` class is located at `modules.services.serviceAi.mainServiceAi`

Most application code interacts with the service layer rather than importing core components directly, maintaining proper architectural separation.
## Summary

The `aicore` module is the **backbone of AI operations** in the application, providing:

- **Abstraction**: Single interface for multiple AI providers
- **Intelligence**: Smart model selection and automatic failover
- **Flexibility**: Plugin architecture for easy provider addition
- **Reliability**: Caching, failover, and error handling
- **Performance**: Context-aware chunking and optimization

It connects to `serviceAi` as the **foundation layer**, enabling high-level AI services to operate without knowledge of specific AI provider implementations. The entire system integrates into the application through the service layer architecture.

---

**Related Documentation:**

- [Services API Reference](./services-api-reference.md)
- [Architecture Overview](./architecture-overview.md)
- [Security Component](./security-component.md)