- connection.established/revoked callbacks from OAuth routes and
connection management endpoints
- KnowledgeIngestionConsumer dispatches bootstrap job (established)
and synchronous purge (revoked)
- FileContentIndex: add connectionId + sourceKind columns
- SharePoint bootstrap with @odata.nextLink pagination and eTag-based
idempotency
- Outlook bootstrap treats messages as virtual documents with
cleanEmailBody for HTML/quote/signature stripping
- fix(rag): lower buildAgentContext minScore thresholds from
0.55/0.65/0.70 to 0.35; the previous values blocked all real matches
from text-embedding-3-small
- 24 new unit tests covering purge, consumer dispatch, email cleaning
and both bootstrap paths
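
A minimal sketch of the SharePoint bootstrap loop described above,
assuming Graph-style pages with a "value" array and an optional
"@odata.nextLink". The names iterItems, bootstrapItems, fetch, ingest
and seenETags are hypothetical and are not taken from the codebase:

```python
from typing import Callable, Dict, Iterator, Optional


def iterItems(fetch: Callable[[str], dict], firstUrl: str) -> Iterator[dict]:
    """Yield every item across pages by following @odata.nextLink."""
    url: Optional[str] = firstUrl
    while url:
        page = fetch(url)
        yield from page.get("value", [])
        # The last page carries no nextLink, which ends the loop.
        url = page.get("@odata.nextLink")


def bootstrapItems(
    fetch: Callable[[str], dict],
    firstUrl: str,
    seenETags: Dict[str, Optional[str]],
    ingest: Callable[[dict], None],
) -> int:
    """Ingest items whose eTag changed since the last run; return the count."""
    ingested = 0
    for item in iterItems(fetch, firstUrl):
        itemId, eTag = item["id"], item.get("eTag")
        # Hypothetical idempotency rule: skip when the stored eTag matches.
        if eTag is not None and seenETags.get(itemId) == eTag:
            continue
        ingest(item)
        seenETags[itemId] = eTag
        ingested += 1
    return ingested
```

Re-running the same bootstrap with an unchanged seenETags map ingests
nothing, which is the property the eTag check is meant to give.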

The default MergeStrategy concatenates every extracted text part into a
single ContentPart, collapsing a 500-page PDF into one chunk with a
blurred average embedding. As a result, RAG retrieval was effectively
broken.
- ExtractionOptions.mergeStrategy is now Optional[MergeStrategy]; passing
None preserves per-part granularity. Default factory kept for
backward compatibility.
- routeDataFiles._autoIndexFile, _workspaceTools.readFile, and
_documentTools.describeImage explicitly pass mergeStrategy=None.
- Agent tools no longer carry redundant extraction + requestIngestion
fallback paths: the unified ingestion lane owns all corpus writes,
and readFile/describeImage are pure consumers of the knowledge store.
- Unit test asserts runExtraction(mergeStrategy=None) keeps every part.
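
A rough sketch of the Optional[MergeStrategy] behaviour, for
illustration only. The class shapes, the concatAll helper and the
runExtraction signature below are assumptions and may differ from the
real code:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ContentPart:
    text: str


# Assumed shape: a strategy maps the extracted parts to the parts to keep.
MergeStrategy = Callable[[List[ContentPart]], List[ContentPart]]


def concatAll(parts: List[ContentPart]) -> List[ContentPart]:
    """Stand-in for the default strategy: merge everything into one part."""
    return [ContentPart("\n".join(p.text for p in parts))] if parts else []


@dataclass
class ExtractionOptions:
    # The default factory keeps the old merging behaviour for callers
    # that do not set the field.
    mergeStrategy: Optional[MergeStrategy] = field(
        default_factory=lambda: concatAll
    )


def runExtraction(
    parts: List[ContentPart],
    options: Optional[ExtractionOptions] = None,
) -> List[ContentPart]:
    """Apply the merge strategy, or keep every part when it is None."""
    if options is None:
        options = ExtractionOptions()
    if options.mergeStrategy is None:
        return list(parts)
    return options.mergeStrategy(parts)
```

With mergeStrategy=None a two-part input stays two parts, so each part
can be embedded on its own; omitting the option still merges to one.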