gateway

Ida dff3d41845 fix(rag): stable ingestion idempotency across re-extractions (AC4)	2026-04-29 14:39:40 +02:00

Re-indexing the same file always triggered a full embedding run; ingestion.skipped.duplicate never fired. Two independent causes:

1. _computeIngestionHash included contentObjectId in its payload, but extractors generate fresh uuid4() per run, making the hash a per-run nonce. Now hashed over (contentType, data) in extractor order: stable across re-extractions, sensitive to content, ordering, and type changes.
2. _autoIndexFile upserted the fresh pre-scan FileContentIndex before requestIngestion's duplicate check, wiping structure._ingestion and status=indexed from the prior run. The pre-upsert now merges the existing _ingestion metadata and preserves the indexed status.

Verified end-to-end: second PATCH /scope on an already-indexed file logs and returns in ~2s with zero embedding API calls.

Adds test_ingestion_hash_stability.py (5 cases).
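
The hashing change described above can be illustrated with a minimal sketch. This is not the repository's _computeIngestionHash, whose code is not shown here. The names compute_ingestion_hash and fake_extract, the choice of SHA-256, and the canonical-JSON encoding are all assumptions made for illustration. The sketch only shows the idea from the commit message: hash (contentType, data) pairs in extractor order and leave the per-run contentObjectId out.

```python
import hashlib
import json
import uuid
from typing import Any, Iterable, List, Tuple


def compute_ingestion_hash(parts: Iterable[Tuple[str, Any]]) -> str:
    """Hypothetical stand-in: digest over (contentType, data) in the given order."""
    hasher = hashlib.sha256()
    for content_type, data in parts:
        # Canonical JSON so that dict key order inside `data` does not matter.
        # Assumes `data` is JSON-serializable.
        encoded = json.dumps(
            [content_type, data],
            sort_keys=True,
            ensure_ascii=False,
            separators=(",", ":"),
        ).encode("utf-8")
        # Length prefix keeps the boundary between consecutive parts unambiguous.
        hasher.update(len(encoded).to_bytes(8, "big"))
        hasher.update(encoded)
    return hasher.hexdigest()


def fake_extract(pages: List[Tuple[str, Any]]) -> List[dict]:
    """Hypothetical extractor: assigns a fresh uuid4 to every object on each run."""
    return [
        {"contentObjectId": str(uuid.uuid4()), "contentType": t, "data": d}
        for t, d in pages
    ]


def hash_of(objects: List[dict]) -> str:
    # contentObjectId is deliberately excluded from the hashed payload.
    return compute_ingestion_hash((o["contentType"], o["data"]) for o in objects)


pages = [("text", "page one"), ("text", "page two"), ("table", {"rows": 2})]
first_run = fake_extract(pages)
second_run = fake_extract(pages)
```

Under these assumptions, hash_of(first_run) equals hash_of(second_run) even though every contentObjectId differs between the two runs, while reordering the pages, editing their data, or changing a contentType yields a different digest.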
..
test_extraction_merge_strategy.py	fix(rag): preserve per-page granularity + remove on-demand extraction fallbacks	2026-04-29 14:39:40 +02:00
test_featureDataAgent_schema.py	trustee agent fix	2026-04-27 08:07:37 +02:00
test_ingestion_hash_stability.py	fix(rag): stable ingestion idempotency across re-extractions (AC4)	2026-04-29 14:39:40 +02:00
test_json_extraction_merging.py	fixes	2026-04-23 23:09:38 +02:00
test_renderer_pdf_smoke.py	Graph and data class alignment strict	2026-04-26 22:53:44 +02:00