Neutralizer Module Structure
This module provides GDPR-compliant (DSGVO) data anonymization for AI agent systems. The code has been refactored into specialized sub-modules for better maintainability and code reuse.
Module Overview
Core Module
- neutralizer.py: Main DataAnonymizer class that orchestrates all processing
Specialized Processors
- subProcessText.py: Handles plain text processing without header information
- subProcessList.py: Handles structured data with headers (CSV, JSON, XML)
- subProcessBinary.py: Handles binary data types (images, audio, video, etc.)
Utility Modules
- subParseString.py: String parsing and replacement utilities for emails, phones, addresses, IDs, and names
- subProcessCommon.py: Common utilities and data structures shared across modules
- subPatterns.py: Pattern definitions for data anonymization
Key Features
1. Modular Architecture
- Separation of Concerns: Each module handles a specific type of data processing
- Code Reuse: Common functionality is centralized in utility modules
- Maintainability: Easier to modify and extend individual components
2. Processing Order
- Pattern-based matches (emails, phones, addresses, etc.) are processed FIRST
- Custom names from the user list are processed SECOND
- Already anonymized content (placeholders) is skipped
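The ordering above can be sketched as follows. This is a minimal illustration, not the module's actual code: the email regex, the placeholder regex, and the helper names `replace_outside_placeholders` and `anonymize` are assumptions made for the example.

```python
import re
import uuid

# Illustrative patterns (assumptions); the module's real definitions are not shown here.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PLACEHOLDER_RE = re.compile(r"\[[a-z]+\.[0-9a-f-]{36}\]")


def _substitute(segment, pattern, tag, mapping):
    """Replace every match inside one placeholder-free segment."""
    def repl(match):
        original = match.group(0)
        if original not in mapping:
            mapping[original] = f"[{tag}.{uuid.uuid4()}]"
        return mapping[original]
    return pattern.sub(repl, segment)


def replace_outside_placeholders(text, pattern, tag, mapping):
    """Apply the pattern only to text that lies between existing placeholders."""
    pieces, last = [], 0
    for found in PLACEHOLDER_RE.finditer(text):
        pieces.append(_substitute(text[last:found.start()], pattern, tag, mapping))
        pieces.append(found.group(0))  # keep the placeholder untouched
        last = found.end()
    pieces.append(_substitute(text[last:], pattern, tag, mapping))
    return "".join(pieces)


def anonymize(text, names):
    mapping = {}
    # 1. Pattern-based matches first (only emails in this sketch)
    text = replace_outside_placeholders(text, EMAIL_RE, "email", mapping)
    # 2. Custom names second; placeholders created in step 1 are skipped
    for name in names:
        name_re = re.compile(re.escape(name))
        text = replace_outside_placeholders(text, name_re, "name", mapping)
    return text, mapping
```

Running the patterns first means a name such as "john" cannot split an email address like john.doe@example.com, because by the time names are processed the address is already a placeholder and is skipped.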
3. Supported Data Types
- Text: Plain text documents, emails, etc.
- Structured Data: CSV, JSON, XML with headers
- Binary Data: Images, audio, video (framework ready, implementation pending)
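For structured data, the headers can decide which columns are anonymized. The sketch below shows that idea for CSV only, using the standard library. The `SENSITIVE_HEADERS` set and the `anonymize_csv` function are hypothetical names for this example and are not taken from subProcessList.py.

```python
import csv
import io
import uuid

# Assumed header names that mark a column as sensitive (illustrative only)
SENSITIVE_HEADERS = {"name", "email", "phone", "address"}


def anonymize_csv(text):
    """Replace values in sensitive columns with [tag.uuid] placeholders."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    mapping = {}
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        for header in fieldnames:
            tag = header.strip().lower()
            value = row.get(header)
            if tag in SENSITIVE_HEADERS and value:
                # Reuse the placeholder when the same value appears again
                if value not in mapping:
                    mapping[value] = f"[{tag}.{uuid.uuid4()}]"
                row[header] = mapping[value]
        writer.writerow(row)
    return out.getvalue(), mapping
```

Reusing one placeholder per distinct value keeps rows that refer to the same person linkable after anonymization.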
4. Placeholder Protection
- Prevents re-anonymization of already processed content
- Uses the format [tag.uuid] for placeholders
- Validates placeholder format before processing
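A placeholder check along these lines could look like the sketch below. It assumes the tag is made of lowercase letters and the UUID uses the standard 36-character form. Neither detail is confirmed by this README, and `make_placeholder` and `is_placeholder` are hypothetical helper names.

```python
import re
import uuid

# Assumed shape: [tag.uuid] with a lowercase tag and a 36-character UUID
PLACEHOLDER_RE = re.compile(r"\[(?P<tag>[a-z]+)\.(?P<uuid>[0-9a-fA-F-]{36})\]")


def make_placeholder(tag):
    """Build a fresh placeholder for the given tag."""
    return f"[{tag}.{uuid.uuid4()}]"


def is_placeholder(value):
    """Return True if the whole string has the placeholder shape and its UUID part parses."""
    match = PLACEHOLDER_RE.fullmatch(value)
    if match is None:
        return False
    try:
        uuid.UUID(match.group("uuid"))
    except ValueError:
        return False
    return True
```

A processor can call `is_placeholder` on a candidate value and skip it when the result is true, which is what prevents re-anonymization.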
Usage Example
from modules.neutralizer import DataAnonymizer

# Sample input (illustrative)
content = "Contact John Doe at john.doe@example.com"

# Initialize with custom names
anonymizer = DataAnonymizer(names_to_parse=['John Doe', 'Jane Smith'])

# Process content as plain text
result = anonymizer.process_content(content, content_type='text')

# Or pass a different content type, e.g. CSV
result = anonymizer.process_content(content, content_type='csv')

# Get mapping of original values to placeholders
mapping = anonymizer.get_mapping()
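One possible use of the mapping is to put the original values back into text that still contains placeholders. The sketch below assumes `get_mapping()` returns a plain dict from original value to placeholder string, which this README suggests but does not confirm. `restore` is a hypothetical helper, not part of the module.

```python
def restore(text, mapping):
    """Swap each placeholder in the text back to its original value."""
    for original, placeholder in mapping.items():
        text = text.replace(placeholder, original)
    return text


# Example with a hand-written mapping in the assumed shape
example_mapping = {"John Doe": "[name.1b4e28ba-2fa1-11d2-883f-0016d3cca427]"}
restored = restore(
    "Hello [name.1b4e28ba-2fa1-11d2-883f-0016d3cca427]!", example_mapping
)
```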
Module Dependencies
neutralizer.py
├── subProcessCommon.py (ProcessResult, CommonUtils)
├── subProcessText.py (TextProcessor)
├── subProcessList.py (ListProcessor)
├── subProcessBinary.py (BinaryProcessor)
└── subPatterns.py (Pattern definitions)
subProcessText.py
└── subParseString.py (StringParser)
subProcessList.py
├── subParseString.py (StringParser)
└── subPatterns.py (HeaderPatterns)
subProcessBinary.py
└── (standalone)
subParseString.py
└── subPatterns.py (DataPatterns)
Benefits of New Structure
- Single Responsibility: Each module has one clear purpose
- DRY Principle: No code duplication across modules
- Testability: Individual modules can be tested in isolation
- Extensibility: Easy to add new data types or processing methods
- Maintainability: Changes stay local to one module as long as its interface is unchanged
- Performance: Specialized processors are optimized for their data types