new features

This commit is contained in:
ValueOn AG 2025-12-29 23:54:13 +01:00
parent 0eaaeb3550
commit b7f8bfe3e4
4 changed files with 3085 additions and 0 deletions

View file

@ -0,0 +1,319 @@
# Progress Reporting for Document Generation - Analysis & Proposal
## Current State Analysis
### Existing Progress Reporting System
**ProgressLogger** (`gateway/modules/shared/progressLogger.py`):
- Centralized progress logging with hierarchical support
- Methods: `startOperation()`, `updateOperation()`, `finishOperation()`
- Supports parent-child relationships via `parentOperationId`
- Creates `ChatLog` entries with progress (0.0-1.0), status, and hierarchical structure
**ChatLog Model** (`gateway/modules/datamodels/datamodelChat.py`):
- Fields: `progress` (0.0-1.0), `status`, `parentId`, `operationId`
- Hierarchical display support via `parentId` (references parent's `operationId`)
### Current Generation Progress Reporting
**Chapter Structure Generation** (`subStructureGeneration.py`):
- ✅ Has progress logging: `startOperation()` and `finishOperation()`
- ❌ No progress updates during generation
- ❌ No chapter-level granularity
**Structure Filling** (`subStructureFilling.py`):
- ✅ Overall operation: `startOperation()` and `finishOperation()` for "Chapter Content Generation"
- ✅ Individual sections: `startOperation()` and `finishOperation()` for each section
- ❌ **Missing**: Chapter-level progress operations
- ❌ **Missing**: Section progress updates (only start/finish, no intermediate progress)
- ❌ **Missing**: Overall progress calculation showing "Chapter X/Y, Section Z/W"
**Content Generation** (`mainServiceGeneration.py`):
- ✅ Has `contentProgressCallback` that reports section-level progress
- ✅ Maps section progress to overall progress (30-90%)
- ❌ But only at section level, not chapter level
- ❌ No hierarchical display of chapters and sections
## Gap Analysis
### What Users Currently See:
```
[Progress] Chapter Content Generation - Filling (0%)
[Progress] Section Generation - Section (0%)
[Progress] Section Generation - Section (0%)
[Progress] Section Generation - Section (0%)
...
[Progress] Chapter Content Generation - Filling (100%)
```
### What Users Should See:
```
[Progress] Chapter Content Generation - Filling (0%)
[Progress] Chapter 1/5: Introduction (0%)
[Progress] Section 1/3: Overview (0%)
[Progress] Section 1/3: Overview (50%) - Generating content...
[Progress] Section 1/3: Overview (100%)
[Progress] Section 2/3: Background (0%)
...
[Progress] Chapter 1/5: Introduction (100%)
[Progress] Chapter 2/5: Analysis (0%)
...
```
## Proposed Integration
### 1. Chapter-Level Progress Operations
**Location**: `subStructureFilling.py``_fillChapterSections()`
**Changes**:
- Create chapter-level operation IDs
- Start chapter operation before processing sections
- Update chapter progress as sections complete
- Finish chapter operation when all sections done
**Implementation**:
```python
# Before processing chapters
totalChapters = sum(len(doc.get("chapters", [])) for doc in chapterStructure.get("documents", []))
chapterIndex = 0
for doc in chapterStructure.get("documents", []):
for chapter in doc.get("chapters", []):
chapterIndex += 1
chapterId = chapter.get("id", "unknown")
chapterTitle = chapter.get("title", "Untitled")
# Start chapter operation
chapterOperationId = f"{fillOperationId}_chapter_{chapterId}"
self.services.chat.progressLogStart(
chapterOperationId,
"Chapter Generation",
f"Chapter {chapterIndex}/{totalChapters}",
f"{chapterTitle}",
parentOperationId=fillOperationId
)
# Process sections within chapter
sections = chapter.get("sections", [])
totalSections = len(sections)
for sectionIndex, section in enumerate(sections):
# ... existing section processing ...
# Update chapter progress after each section
chapterProgress = (sectionIndex + 1) / totalSections if totalSections > 0 else 1.0
self.services.chat.progressLogUpdate(
chapterOperationId,
chapterProgress,
f"Section {sectionIndex + 1}/{totalSections} completed"
)
# Finish chapter operation
self.services.chat.progressLogFinish(chapterOperationId, True)
```
### 2. Section Progress Updates
**Location**: `subStructureFilling.py``_fillChapterSections()`
**Changes**:
- Add progress updates during section processing
- Report progress at key stages:
- 0.2: Building prompt
- 0.4: Calling AI
- 0.6: Processing response
- 0.8: Validating content
- 1.0: Complete
**Implementation**:
```python
# Start section operation (existing)
sectionOperationId = f"{fillOperationId}_section_{sectionId}"
self.services.chat.progressLogStart(
sectionOperationId,
"Section Generation",
"Section",
f"Generating section {sectionId}",
parentOperationId=chapterOperationId # Parent is chapter, not fillOperationId
)
try:
# Update: Building prompt
self.services.chat.progressLogUpdate(sectionOperationId, 0.2, "Building generation prompt")
generationPrompt = self._buildSectionGenerationPrompt(...)
# Update: Calling AI
self.services.chat.progressLogUpdate(sectionOperationId, 0.4, "Calling AI for content generation")
aiResponse = await self.aiService.callAi(...)
# Update: Processing response
self.services.chat.progressLogUpdate(sectionOperationId, 0.6, "Processing AI response")
# Parse and validate
generatedElements = json.loads(...)
# Update: Validating content
self.services.chat.progressLogUpdate(sectionOperationId, 0.8, "Validating generated content")
elements.extend(generatedElements)
# Finish section (existing)
self.services.chat.progressLogFinish(sectionOperationId, True)
except Exception as e:
self.services.chat.progressLogFinish(sectionOperationId, False)
# ... error handling ...
```
### 3. Overall Progress Calculation
**Location**: `subStructureFilling.py``_fillChapterSections()`
**Changes**:
- Calculate overall progress based on chapters and sections
- Update parent operation (`fillOperationId`) with overall progress
**Implementation**:
```python
# Calculate overall progress
def calculateOverallProgress(chapterIndex, totalChapters, sectionIndex, totalSections):
"""Calculate overall progress: 0.0 to 1.0"""
if totalChapters == 0:
return 1.0
# Progress from completed chapters
completedChaptersProgress = chapterIndex / totalChapters
# Progress from current chapter
currentChapterProgress = (sectionIndex / totalSections) / totalChapters if totalSections > 0 else 0
return completedChaptersProgress + currentChapterProgress
# Update overall progress after each section
overallProgress = calculateOverallProgress(
chapterIndex - 1, # -1 because we're processing current chapter
totalChapters,
sectionIndex,
totalSections
)
self.services.chat.progressLogUpdate(
fillOperationId,
overallProgress,
f"Chapter {chapterIndex}/{totalChapters}, Section {sectionIndex + 1}/{totalSections}"
)
```
### 4. Chapter Structure Generation Progress
**Location**: `subStructureFilling.py``_generateChapterSectionsStructure()`
**Changes**:
- Add progress reporting for chapter structure generation
- Report progress per chapter
**Implementation**:
```python
# Count total chapters
totalChapters = sum(len(doc.get("chapters", [])) for doc in chapterStructure.get("documents", []))
chapterIndex = 0
for doc in chapterStructure.get("documents", []):
for chapter in doc.get("chapters", []):
chapterIndex += 1
chapterId = chapter.get("id", "unknown")
chapterTitle = chapter.get("title", "Untitled")
# Update progress
progress = chapterIndex / totalChapters if totalChapters > 0 else 1.0
self.services.chat.progressLogUpdate(
parentOperationId, # Use parent operation (structure generation)
progress,
f"Generating sections for Chapter {chapterIndex}/{totalChapters}: {chapterTitle}"
)
# ... existing chapter structure generation ...
```
## Implementation Plan
### Phase 1: Chapter-Level Progress (High Priority)
1. Add chapter operation tracking in `_fillChapterSections()`
2. Create chapter-level operations with proper parent hierarchy
3. Update chapter progress as sections complete
### Phase 2: Section Progress Updates (High Priority)
1. Add progress updates during section processing
2. Report progress at key stages (prompt building, AI call, response processing)
3. Update section parent to be chapter operation (not fill operation)
### Phase 3: Overall Progress Calculation (Medium Priority)
1. Implement overall progress calculation function
2. Update parent operation with overall progress
3. Include chapter/section counts in status messages
### Phase 4: Chapter Structure Generation Progress (Low Priority)
1. Add progress updates during chapter structure generation
2. Report progress per chapter
## Benefits
1. **User Visibility**: Users can see exactly which chapter and section is being processed
2. **Better UX**: Hierarchical progress display shows document structure
3. **Debugging**: Easier to identify where generation is stuck
4. **Performance Monitoring**: Can track time per chapter/section
5. **Consistency**: Uses existing ProgressLogger infrastructure
## Example Progress Display
**Before**:
```
[0%] Chapter Content Generation - Filling
[0%] Section Generation - Section
[100%] Section Generation - Section
[0%] Section Generation - Section
[100%] Section Generation - Section
...
[100%] Chapter Content Generation - Filling
```
**After**:
```
[0%] Chapter Content Generation - Filling
[0%] Chapter 1/5: Introduction
[0%] Section 1/3: Overview
[20%] Section 1/3: Overview - Building prompt
[40%] Section 1/3: Overview - Calling AI
[60%] Section 1/3: Overview - Processing response
[80%] Section 1/3: Overview - Validating content
[100%] Section 1/3: Overview
[33%] Chapter 1/5: Introduction - Section 1/3 completed
[0%] Section 2/3: Background
...
[100%] Chapter 1/5: Introduction
[20%] Chapter Content Generation - Chapter 1/5 completed
[0%] Chapter 2/5: Analysis
...
```
## Files to Modify
1. `gateway/modules/services/serviceAi/subStructureFilling.py`
- `_fillChapterSections()`: Add chapter and section progress
- `_generateChapterSectionsStructure()`: Add chapter structure progress
2. No changes needed to:
- `progressLogger.py` (already supports hierarchy)
- `datamodelChat.py` (already supports parentId)
- `mainServiceChat.py` (wrapper methods already exist)
## Testing Considerations
1. Test with documents containing multiple chapters
2. Test with chapters containing multiple sections
3. Verify hierarchical display in frontend
4. Test error handling (failed sections should not break progress)
5. Test with single chapter/section documents

View file

@ -0,0 +1,334 @@
# Critical Review: User Authentication Process Concept
## Issues Found
### 1. **CRITICAL: Data Model Inconsistency**
**Location**: Line 277 in concept document
**Issue**: `resetTokenExpires` is shown as `Optional[str]` but should be `Optional[float]`
```python
# WRONG (line 277):
resetTokenExpires: Optional[str] = Field(None, description="Reset token expiration (ISO datetime)")
# CORRECT:
resetTokenExpires: Optional[float] = Field(None, description="Reset token expiration (UTC timestamp in seconds)")
```
**Impact**: Implementation confusion, type errors
### 2. **CRITICAL: Config Value Inconsistency**
**Location**: Line 429 in concept document
**Issue**: Config shows `48` hours but requirement is `24` hours
```ini
# WRONG (line 429):
Auth_RESET_TOKEN_EXPIRY_HOURS = 48
# CORRECT:
Auth_RESET_TOKEN_EXPIRY_HOURS = 24
```
**Impact**: Wrong expiration time in implementation
### 3. **SECURITY: Missing Rate Limiting**
**Issue**: Password reset request endpoint needs stricter rate limiting than registration
**Current**: Registration has `@limiter.limit("10/minute")`
**Needed**: Password reset request should have stricter limits to prevent:
- Email enumeration attacks
- Spam/DoS attacks
- Brute force attempts
**Recommendation**:
- Password reset request: `@limiter.limit("5/minute")` or `@limiter.limit("3/minute")`
- Password reset (token usage): `@limiter.limit("10/minute")` (less critical, token is single-use)
### 4. **SECURITY: Token Invalidation Strategy** ✅ CLARIFIED
**Issue**: What happens when user requests password reset multiple times?
**Current**: Concept doesn't specify behavior
**Clarification**:
- Token is stored in the `UserInDB` record (single field: `resetToken`)
- When new reset token is generated, **old token is overwritten** in the database
- **No risk**: Only one token can exist per user at any time (the last one generated)
- The old token is automatically invalidated because it no longer exists in the database
**Status**: ✅ No risk - token overwriting ensures only one valid token exists
### 5. **SECURITY: Token Reuse Prevention** ✅ CLARIFIED
**Issue**: What prevents token from being used multiple times?
**Current**: Concept says "Tokens are single-use (cleared after password reset)" but doesn't specify validation
**Clarification**:
- Token is **cleared immediately after successful password reset** (atomic operation)
- Once cleared, token no longer exists in database
- **No risk**: If token is used again, validation will fail because token doesn't exist
- Token reuse is prevented by the fact that token is removed from database after use
**Status**: ✅ No risk - token clearing after use prevents reuse automatically
### 6. **SECURITY: Email Case Sensitivity**
**Issue**: Email uniqueness check - are emails case-sensitive?
**Current**: Concept doesn't specify
**Risk**:
- `User@Example.com` and `user@example.com` could be treated as different
- Email enumeration could reveal accounts
**Recommendation**:
- Normalize emails to lowercase before storage and comparison
- Use `email.lower().strip()` in all email operations
- Document this in the concept
### 7. **SECURITY: Email Enumeration Prevention** ✅ APPROACH SELECTED
**Issue**: Registration endpoint reveals if email exists
**Current**: Registration rejects if email exists → reveals email is registered
**Risk**: Attacker can enumerate registered emails
**Selected Approach**:
- ✅ **If email exists for LOCAL auth user**: Send password reset email to existing user, don't create duplicate account
- ✅ **Return generic success message**: "Registration successful. Please check your email to set your password."
- ✅ **Don't reveal**: Whether email already existed or new account was created
- This prevents email enumeration while providing good UX
**Status**: ✅ Approach selected - send reset email to existing user if email exists
### 8. **EDGE CASE: Multiple Reset Requests** ✅ CLARIFIED
**Issue**: User requests reset multiple times before using first token
**Current**: Concept doesn't specify behavior
**Clarification**:
- Token is stored in `UserInDB` record (single field)
- **Only one token can exist at a time** - the last one generated
- When new reset request comes in, **old token is overwritten** in database
- **Only the last token is valid** - previous tokens are automatically invalidated
- No special handling needed - database structure ensures single token
**Status**: ✅ No risk - database structure prevents multiple tokens
### 9. **EDGE CASE: User State Changes During Reset**
**Issue**: What happens if user state changes while reset token is active?
**Scenarios**:
- User is disabled (`enabled=False`) while reset token is active
- User changes email while reset token is active
- User account is deleted while reset token is active
- User sets password via admin while reset token is active
**Recommendation**:
- On password reset, check user still exists and is enabled
- If user is disabled, allow reset but keep `enabled=False` (admin must activate)
- If email changed, token should still work (token is tied to user, not email)
- If user deleted, token validation should fail (user not found)
### 10. **EDGE CASE: Expired Token Cleanup** ✅ APPROACH SELECTED
**Issue**: Expired tokens remain in database indefinitely
**Current**: Concept doesn't mention cleanup
**Risk**: Database bloat, potential security issues
**Selected Approach**:
- ✅ **Check token expiration during user authentication** (on each login attempt)
- ✅ **Clear expired token if found** during authentication check
- ✅ **Also check during token validation** (password reset endpoint) and clear if expired
- ✅ **Simplest approach**: No separate cleanup job needed, tokens cleaned on access
- This ensures expired tokens are cleaned automatically without additional jobs
**Status**: ✅ Approach selected - check and clean expired tokens during auth/validation
### 11. **IMPLEMENTATION: Cross-Mandate Search**
**Issue**: How exactly to search across all mandates?
**Current**: Concept says "use root interface" but doesn't specify implementation
**Questions**:
- Does `getRecordsetWithRBAC` work with root interface?
- Should we bypass RBAC for this search?
- How to ensure we search ALL mandates?
**Recommendation**:
- Use root interface with `mandateId=None` or empty filter
- Use direct database query if RBAC filtering interferes
- Document exact implementation approach
### 12. **IMPLEMENTATION: Email Search Method**
**Issue**: No existing method to find user by email across mandates
**Current**: Concept mentions `findUserByEmailLocalAuth()` but doesn't specify how it works
**Questions**:
- Does database support case-insensitive email search?
- Should we use exact match or LIKE?
- How to handle NULL emails?
**Recommendation**:
- Implement method that:
1. Uses root interface
2. Searches with `recordFilter={"email": email.lower(), "authenticationAuthority": AuthAuthority.LOCAL}`
3. Returns first match or None
4. Handles NULL emails gracefully
### 13. **USER EXPERIENCE: Email Delivery Failure** ✅ MESSAGE TO ADD
**Issue**: What if user never receives email?
**Current**: Concept doesn't address this
**Scenarios**:
- Email goes to spam
- Email server is down
- User enters wrong email
- Email delivery is delayed
**Selected Approach**:
- ✅ **Show message about spam folder**: "If you don't receive an email within 5 minutes, please check your spam folder"
- ✅ Display this message after registration and password reset request
- Future enhancement: Consider "Resend email" functionality (with rate limiting)
**Status**: ✅ Message to add - show spam folder reminder
### 14. **USER EXPERIENCE: Password Reset After Registration** ✅ NO SPECIAL HANDLING
**Issue**: User registers, sets password, then immediately forgets it
**Current**: User must use password reset flow
**Decision**:
- ✅ **No special handling required** - use standard password reset flow
- ✅ User can request password reset like any other user
- ✅ Same token expiration (24 hours) applies to all reset tokens
**Status**: ✅ No special handling needed
### 15. **MISSING: Token Validation Order**
**Issue**: Order of validation checks not specified
**Current**: Concept lists validations but not order
**Risk**: Inefficient validation, potential security issues
**Recommendation**:
- Validate in this order:
1. Token format (UUID format check)
2. Token exists in database
3. User exists and is not deleted
4. Token matches user's resetToken
5. Token not expired (compare timestamps)
6. Password strength
7. Set password and clear token (atomic operation)
### 16. **MISSING: Atomic Operations** ✅ CONFIRMED
**Issue**: Password reset involves multiple database operations
**Current**: Concept doesn't specify transaction handling
**Risk**:
- Partial updates if operation fails
- Token cleared but password not set
- Race conditions
**Confirmation**:
- ✅ **Use database transaction** for password reset operation
- ✅ **Atomic operation** ensures all-or-nothing:
- Setting passwordHash
- Clearing resetToken and resetTokenExpires
- Setting enabled=True
- ✅ All operations succeed or all fail together
**Status**: ✅ Confirmed - atomic operations required
### 17. **MISSING: Audit Logging**
**Issue**: No mention of audit logging for security events
**Current**: Concept doesn't specify logging
**Risk**: No audit trail for security incidents
**Recommendation**:
- Log all password reset requests (with email hash for privacy)
- Log successful password resets
- Log failed token validations
- Log token generation events
### 18. **MISSING: Error Messages**
**Issue**: Error messages could reveal too much information
**Current**: Some error messages are specified, but not all
**Risk**: Information leakage
**Recommendation**:
- Use generic error messages:
- "Invalid or expired reset token" (don't specify which)
- "Registration failed" (don't specify why)
- "Password reset request processed" (always same message)
- Log detailed errors server-side only
### 19. **MISSING: Frontend Token Validation**
**Issue**: Frontend should validate token format before API call
**Current**: Concept doesn't specify frontend validation
**Risk**: Unnecessary API calls, poor UX
**Recommendation**:
- Validate UUID format in frontend before calling API
- Show error immediately if token format is invalid
- Extract token from URL and validate on page load
### 20. **MISSING: Magic Link Base URL** ✅ LOCATION SPECIFIED
**Issue**: Magic link generation needs frontend base URL
**Current**: Concept mentions "Should use frontend base URL from config" but doesn't specify
**Selected Approach**:
- ✅ **Add frontend URL to frontend env files**: `frontend_agents/public/config/env_*.env`
- ✅ **Files to update**:
- `env_int.env` - Integration environment URL
- `env_prod.env` - Production environment URL
- (Add `env_dev.env` if exists)
- ✅ **Variable name**: `APP_FRONTEND_URL` or `Frontend_BASE_URL`
- ✅ **Usage**: Backend reads from frontend config or gets from request origin
- ✅ **Environment-specific**: Each env file has its own URL
**Status**: ✅ Location specified - add to frontend env files
## Recommendations Summary
### High Priority (Security & Correctness)
1. ✅ Fix data model inconsistency (resetTokenExpires type)
2. ✅ Fix config value (24 hours, not 48)
3. ✅ Add rate limiting to password reset endpoints
4. ✅ Implement token invalidation on new reset request
5. ✅ Normalize emails to lowercase
6. ✅ Prevent token reuse (clear immediately after use)
7. ✅ Add atomic transaction for password reset
### Medium Priority (Edge Cases)
8. ✅ Handle user state changes during reset
9. ✅ Implement expired token cleanup
10. ✅ Add email delivery failure handling
11. ✅ Specify cross-mandate search implementation
12. ✅ Add audit logging
### Low Priority (UX & Polish)
13. ✅ Add frontend token validation
14. ✅ Specify magic link base URL configuration
15. ✅ Improve error messages (generic, secure)
16. ✅ Add resend email functionality (future enhancement)
## Security Checklist
- [ ] Rate limiting on all endpoints
- [ ] Token invalidation strategy implemented
- [ ] Token reuse prevention
- [ ] Email normalization (lowercase)
- [ ] Email enumeration prevention
- [ ] Atomic database operations
- [ ] Audit logging
- [ ] Generic error messages
- [ ] Token expiration validation
- [ ] User state validation during reset
- [ ] Cross-mandate search security (RBAC bypass only for root operations)
## Testing Checklist (Additional)
- [ ] Multiple reset requests (old token invalidated)
- [ ] Token reuse attempt (should fail)
- [ ] Expired token usage (should fail)
- [ ] User disabled during reset (should handle gracefully)
- [ ] Email case variations (User@Example.com vs user@example.com)
- [ ] Concurrent reset requests (race condition handling)
- [ ] Email delivery failure scenarios
- [ ] Invalid token format handling
- [ ] Cross-mandate email search accuracy
- [ ] Atomic operation rollback on failure

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff