i18n tags tools

ValueOn AG 2026-04-13 09:43:56 +02:00
parent 61f04a6049
commit e952b4c9ee

@@ -90,16 +90,24 @@ class NeutralizationService:
 _NEUT_INSTRUCTION = (
 "Analyze the following text and identify ALL sensitive content that must be neutralized:\n"
-"1. Personal data (PII): names of persons, email addresses, phone numbers, "
-"physical addresses, ID numbers, dates of birth, financial data (IBAN, account numbers), "
-"social security numbers\n"
+"1. Personal data (PII):\n"
+" - Full names of persons\n"
+" - Email addresses\n"
+" - Phone numbers\n"
+" - Physical addresses (street, city, postal code)\n"
+" - ID numbers (passport, driver license, AHV/SSN)\n"
+" - Dates of birth (e.g. '14.03.1982', '1982-03-14', 'March 14, 1982', 'born in 1982')\n"
+" - Age when it identifies a person\n"
+" - Financial data (IBAN, account numbers, salary, balances)\n"
+" - Nationality, citizenship, place of origin\n"
 "2. Protected business logic: proprietary algorithms, trade secrets, confidential "
 "processes, internal procedures, code snippets that reveal implementation details\n"
 "3. Named entities: company names, product names, project names, brand names\n\n"
 "Return ONLY a JSON array (no markdown, no explanation):\n"
-'[{"text":"exact substring","type":"name|email|phone|address|id|financial|logic|company|product|location|other"}]\n\n'
+'[{"text":"exact substring","type":"name|email|phone|address|id|dob|financial|nationality|logic|company|product|location|other"}]\n\n'
 "Rules:\n"
 "- Every entry's 'text' must be an exact, verbatim substring of the input.\n"
+"- Dates of birth MUST always be captured — use type 'dob'.\n"
 "- Do NOT include generic words, common language constructs or non-sensitive terms.\n"
 "- If nothing is sensitive, return [].\n\n"
 )
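
The prompt asks the model for a bare JSON array whose `text` values are verbatim substrings of the input and whose `type` values come from a fixed list. A caller would probably want to check those rules before acting on the response. The following is a minimal sketch of such a check, not code from this commit: `parse_findings` and `ALLOWED_TYPES` are hypothetical names, and falling back to `other` for an unknown type is an assumption about the desired behavior.

```python
import json

# Type labels copied from the updated prompt's JSON schema line.
ALLOWED_TYPES = {
    "name", "email", "phone", "address", "id", "dob", "financial",
    "nationality", "logic", "company", "product", "location", "other",
}


def parse_findings(raw_response: str, source_text: str) -> list[dict]:
    """Hypothetical helper: keep only entries that satisfy the prompt's rules."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        # The model ignored "no markdown, no explanation"; treat as no findings.
        return []
    if not isinstance(data, list):
        return []

    findings = []
    for entry in data:
        if not isinstance(entry, dict):
            continue
        text = entry.get("text")
        kind = entry.get("type")
        # Rule: 'text' must be an exact, verbatim substring of the input.
        if not isinstance(text, str) or not text or text not in source_text:
            continue
        # Assumption: unknown or malformed labels are mapped to "other".
        if not isinstance(kind, str) or kind not in ALLOWED_TYPES:
            kind = "other"
        findings.append({"text": text, "type": kind})
    return findings
```

Dropping entries that are not verbatim substrings guards against the model paraphrasing or reformatting a value, for example rewriting a date of birth into another date format, which would otherwise make a later search-and-replace step miss the original text.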