Re-indexing the same file always triggered a full embedding run —
ingestion.skipped.duplicate never fired. Two independent causes:
1. _computeIngestionHash included contentObjectId in its payload, but
extractors generate fresh uuid4() per run, making the hash a
per-run nonce. Now hashed over (contentType, data) in extractor
order — stable across re-extractions, sensitive to content,
ordering, and type changes.
2. _autoIndexFile upserted the fresh pre-scan FileContentIndex before
requestIngestion's duplicate check, wiping structure._ingestion
and status=indexed from the prior run. The pre-upsert now merges
the existing _ingestion metadata and preserves the indexed status.
Verified end-to-end: second PATCH /scope on an already-indexed file
logs and returns in ~2s
with zero embedding API calls.
Adds test_ingestion_hash_stability.py (5 cases).