# Neutralizer Module Structure

This module provides GDPR (DSGVO)-compliant data anonymization for AI agent systems. The code has been refactored into specialized sub-modules for better maintainability and code reuse.

## Module Overview

### Core Module
- **`neutralizer.py`** - Main `DataAnonymizer` class that orchestrates all processing

### Specialized Processors
- **`subProcessText.py`** - Handles plain text that carries no header information
- **`subProcessList.py`** - Handles structured data with headers (CSV, JSON, XML)
- **`subProcessBinary.py`** - Handles binary data types (images, audio, video, etc.)

### Utility Modules
- **`subParseString.py`** - String parsing and replacement utilities for emails, phone numbers, addresses, IDs and names
- **`subProcessCommon.py`** - Common utilities and data structures shared across modules
- **`patterns.py`** - Pattern definitions for data anonymization

## Key Features

### 1. Modular Architecture
- **Separation of Concerns**: Each module handles a specific type of data processing
- **Code Reuse**: Common functionality is centralized in utility modules
- **Maintainability**: Individual components are easier to modify and extend

### 2. Processing Order
1. **Pattern-based matches** (emails, phone numbers, addresses, etc.) are processed FIRST
2. **Custom names** from the user list are processed SECOND
3. **Already anonymized content** (placeholders) is skipped

### 3. Supported Data Types
- **Text**: Plain text documents, emails, etc.
- **Structured Data**: CSV, JSON, XML with headers
- **Binary Data**: Images, audio, video (framework ready, implementation pending)

### 4. Placeholder Protection
- Prevents re-anonymization of already processed content
- Uses the format `[tag.uuid]` for placeholders
- Validates the placeholder format before processing

## Usage Example

```python
from modules.neutralizer import DataAnonymizer

# Sample input
content = "Contact John Doe at john.doe@example.com"

# Initialize with custom names
anonymizer = DataAnonymizer(names_to_parse=['John Doe', 'Jane Smith'])

# Process content as plain text
result = anonymizer.process_content(content, content_type='text')

# Or specify another content type explicitly, e.g. CSV
result = anonymizer.process_content(content, content_type='csv')

# Get mapping of original values to placeholders
mapping = anonymizer.get_mapping()
```

## Module Dependencies

```
neutralizer.py
├── subProcessCommon.py (ProcessResult, CommonUtils)
├── subProcessText.py (TextProcessor)
├── subProcessList.py (ListProcessor)
├── subProcessBinary.py (BinaryProcessor)
└── patterns.py (Pattern definitions)

subProcessText.py
└── subParseString.py (StringParser)

subProcessList.py
├── subParseString.py (StringParser)
└── patterns.py (HeaderPatterns)

subProcessBinary.py
└── (standalone)

subParseString.py
└── patterns.py (DataPatterns)
```

## Benefits of New Structure

1. **Single Responsibility**: Each module has one clear purpose
2. **DRY Principle**: Shared logic lives in the utility modules instead of being duplicated
3. **Testability**: Individual modules can be tested in isolation
4. **Extensibility**: New data types or processing methods can be added as separate processors
5. **Maintainability**: Changes stay local to the module they concern
6. **Performance**: Specialized processors are optimized for their data types
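
## Illustrative Sketch: Processing Order and Placeholder Protection

The processing order and the `[tag.uuid]` placeholder protection described under Key Features can be sketched as follows. This is a minimal, standalone illustration and not the code in `neutralizer.py`: the names `EMAIL_RE`, `PLACEHOLDER_RE`, `anonymize_text` and the helper functions are hypothetical, the email pattern is a simplified stand-in for the definitions in `patterns.py`, and the assumption that the `uuid` part is a standard UUID4 string is ours.

```python
import re
import uuid

# Hypothetical, simplified email pattern; the real patterns live in patterns.py.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Assumed placeholder shape: [tag.uuid], with a lowercase tag and a UUID string.
PLACEHOLDER_RE = re.compile(
    r"\[[a-z]+\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\]"
)


def _substitute(segment, pattern, tag, mapping):
    """Replace every match of `pattern` in `segment` with a placeholder."""
    def repl(match):
        original = match.group(0)
        # Reuse the same placeholder when a value occurs more than once.
        if original not in mapping:
            mapping[original] = f"[{tag}.{uuid.uuid4()}]"
        return mapping[original]
    return pattern.sub(repl, segment)


def _replace_outside_placeholders(text, pattern, tag, mapping):
    """Apply `pattern` only to text that is not already a placeholder."""
    parts = []
    last = 0
    for placeholder in PLACEHOLDER_RE.finditer(text):
        # Process the text before the placeholder, then copy it verbatim.
        parts.append(_substitute(text[last:placeholder.start()], pattern, tag, mapping))
        parts.append(placeholder.group(0))
        last = placeholder.end()
    parts.append(_substitute(text[last:], pattern, tag, mapping))
    return "".join(parts)


def anonymize_text(text, names=()):
    """Return (anonymized_text, mapping of original value -> placeholder)."""
    mapping = {}
    # 1. Pattern-based matches are processed first (only emails in this sketch).
    text = _replace_outside_placeholders(text, EMAIL_RE, "email", mapping)
    # 2. Custom names are processed second, as whole-word matches.
    for name in names:
        name_re = re.compile(r"\b" + re.escape(name) + r"\b")
        text = _replace_outside_placeholders(text, name_re, "name", mapping)
    # 3. Existing placeholders were skipped in both passes above.
    return text, mapping


if __name__ == "__main__":
    anonymized, mapping = anonymize_text(
        "Contact John Doe at john.doe@example.com.", names=["John Doe"]
    )
    print(anonymized)
    print(mapping)
```

Splitting the text around existing placeholders before each pass is one simple way to guarantee that a later pattern or name cannot match inside a placeholder created earlier; running the sketch a second time on its own output leaves the text unchanged.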