Changelog

All notable changes to this project will be documented in this file.

unreleased

Analyzer

Added

Canadian SIN (CA_SIN) recognizer for the Canadian Social Insurance Number, using regex pattern matching, context words (English and French), and Luhn checksum validation. Disabled by default.
Swedish PII recognizers for SE_PERSONNUMMER to identify Swedish Personal ID Numbers using pattern match and checksum. The recognizer also supports Swedish coordination numbers (samordningsnummer), issued to individuals who are not registered residents in Sweden but require identification. All disabled by default.
German PII recognizers for DE_TAX_ID (Steueridentifikationsnummer, §§ 139a–139e AO, ISO 7064 Mod 11,10 checksum), DE_TAX_NUMBER (Steuernummer, § 139a AO, ELSTER and slash formats), DE_PASSPORT (Reisepassnummer, PassG § 4, ICAO Doc 9303), DE_ID_CARD (Personalausweisnummer, PAuswG), DE_SOCIAL_SECURITY (Rentenversicherungsnummer, § 147 SGB VI, DRV checksum), DE_HEALTH_INSURANCE (Krankenversicherungsnummer/KVNR, § 290 SGB V, GKV checksum), DE_KFZ (KFZ-Kennzeichen, FZV § 8), DE_HANDELSREGISTER (Handelsregisternummer HRA/HRB, §§ 9/14 HGB), and DE_PLZ (Postleitzahl, very low base confidence, context-only). All disabled by default.
Added recognizer for Swedish Organisationsnummer, the ID number for all Swedish organisations.
Added recognizer for Spanish Passport (ES_PASSPORT).
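
The Luhn checksum mentioned for CA_SIN can be sketched in a few lines. This is a minimal illustration, not Presidio's implementation: `luhn_valid` is a hypothetical helper name, and 046 454 286 is a commonly cited fictitious SIN that passes the check.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Hypothetical helper for illustration; separators such as spaces
    or dashes are ignored.
    """
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not digits:
        return False
    total = 0
    # Walk right to left and double every second digit, starting with
    # the digit immediately to the left of the check digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("046 454 286"))  # True
print(luhn_valid("046 454 287"))  # False
```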

Fixed

Fixed incorrect Prüfziffer algorithm in DeHealthInsuranceRecognizer (KVNR); now uses alternating factors [1,2,…,1,2] per § 290 SGB V Anlage 1 (#1972).
Fixed incorrect check-digit weights in DeSocialSecurityRecognizer (RVNR); now uses VKVV § 4 weights [2,1,2,5,7,1,2,1,2,1,2,1]. Previous weights diverged from the Deutsche Rentenversicherung specification and rejected the canonical DRV example 15070649C103.
Fixed incorrect check-digit algorithm in DeLanrRecognizer; now uses KBV Arztnummern-Richtlinie weights [4,9,4,9,4,9] without the spurious Quersumme step, and the complement-to-10 formula (10 − sum mod 10) mod 10. Previous weights and formula were internally self-consistent only.
Enforced post-2016 BZSt repetition rule in DeTaxIdRecognizer (no digit may appear more than three times in positions 1–10).
Registered DeLanrRecognizer, DeBsnrRecognizer, DeVatIdRecognizer and DeFuehrerscheinRecognizer in the default registry (previously imported but missing from conf/default_recognizers.yaml, so they were unreachable via the default registry).
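
As a rough sketch of the corrected LANR rule described above (weights [4,9,4,9,4,9], no Quersumme step, complement to 10), the check digit could be computed as follows. The function names are hypothetical, and the layout (six-digit number, check digit in position 7, two-digit specialty code) is an assumption made for this example rather than something stated in this changelog.

```python
LANR_WEIGHTS = (4, 9, 4, 9, 4, 9)


def lanr_check_digit(first_six: str) -> int:
    """Compute the check digit for the first six LANR digits."""
    if len(first_six) != 6 or not (first_six.isascii() and first_six.isdigit()):
        raise ValueError("expected exactly six ASCII digits")
    # Plain weighted sum: the products are NOT reduced to their digit sums.
    weighted_sum = sum(int(d) * w for d, w in zip(first_six, LANR_WEIGHTS))
    # Complement-to-10 formula; the outer modulo maps 10 back to 0.
    return (10 - weighted_sum % 10) % 10


def lanr_valid(lanr: str) -> bool:
    """Assumed layout: 6 digits + check digit + 2-digit specialty code."""
    if len(lanr) != 9 or not (lanr.isascii() and lanr.isdigit()):
        return False
    return lanr_check_digit(lanr[:6]) == int(lanr[6])


# "123456" is a made-up number used only to exercise the arithmetic:
# 4*1 + 9*2 + 4*3 + 9*4 + 4*5 + 9*6 = 144, so the check digit is 6.
print(lanr_check_digit("123456"))
```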

Added

ISO 7064 Mod 11,10 structural checksum in DeVatIdRecognizer. Algorithm identical to DeTaxIdRecognizer; widely used by community validators (python-stdnum, VIES-adjacent).
ICAO Doc 9303 MRZ checksum validation in DePassportRecognizer and DeIdCardRecognizer (weights 7, 3, 1 repeating; letters A=10…Z=35; sum mod 10).
Structural validation improvements in DeBsnrRecognizer per KBV Arztnummern-Richtlinie Anlage 1; valid KV regional codes are defined for defense-in-depth/documentation purposes, but unknown prefixes are not currently rejected (no public checksum exists for BSNR).
Turkish PII recognizer for TR_NATIONAL_ID (TCKN) to identify Turkish National Identification Numbers using pattern match, context, and NVI checksum validation. Disabled by default.
Turkish PII recognizer for TR_LICENSE_PLATE (plaka) to identify Turkish vehicle license plates using pattern match, context, and province code validation (01-81). Disabled by default.
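
The MRZ check digit rule quoted above (weights 7, 3, 1 repeating; letters A=10…Z=35; sum mod 10) can be illustrated with a short sketch. The helper names are hypothetical and this is not the recognizers' actual code; treating the filler character `<` as 0 follows ICAO Doc 9303, and `L898902C3` is the specimen document number used in that specification.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


print(mrz_check_digit("L898902C3"))  # 6
```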

2.2.362 - 2026-03-15

General

Added

Published presidio as a PyPI meta-package that installs presidio-analyzer and presidio-anonymizer, making pip install presidio work as expected. Inspired by and thanks to Sakthi Santhosh Anumand and Harsha Vardhan for the original idea. (#1889) (Thanks @Copilot)

Changed

Pinned all CI/CD GitHub Actions and Docker base images to commit SHAs to mitigate supply chain attacks (#1861) (Thanks @Copilot)
Pinned ruff and build pip installs with SHA256 hashes for OSSF scorecard compliance (#1864) (Thanks @Copilot)
Updated GitHub Actions dependencies (actions/checkout, actions/setup-python, actions/setup-dotnet, actions/cache, actions/github-script, actions/dependency-review-action, azure/login, docker/setup-buildx-action, github/codeql-action, microsoft/security-devops-action) and base Python Docker images (#1870, #1871, #1872, #1873, #1874, #1875, #1876, #1877, #1878, #1879, #1885, #1886, #1887, #1895, #1896, #1897, #1898) (Thanks @dependabot)
Updated README to clarify Presidio's no-authentication-by-design stance with security guidance (#1903) (Thanks @Copilot)

Fixed

Broken documentation links (#1856) (Thanks @andyjessen)

Security

Fixed CVE-2024-47874 and CVE-2025-54121 (Starlette vulnerabilities) (#1860) (Thanks @SharonHart)
Fixed CVE-2025-2953 and CVE-2025-3730 (#1859) (Thanks @SharonHart)

Analyzer

Added

UK Driving Licence Number (UK_DRIVING_LICENCE) recognizer with pattern matching and context support
HuggingFaceNerRecognizer for direct NER model inference using HuggingFace pipelines without requiring spaCy (#1834) (Thanks @ultramancode)
Transformer-based MedicalNERRecognizer as a subclass of HuggingFaceNerRecognizer for clinical entity detection (#1853) (Thanks @stevenelliottjr)
US NPI (National Provider Identifier) recognizer with Luhn checksum validation and context support (#1847) (Thanks @stevenelliottjr)
UK Postcode (UK_POSTCODE) recognizer with pattern matching and context support (#1858) (Thanks @tee-jagz)
UK Passport (UK_PASSPORT) and Vehicle Registration (UK_VEHICLE_REGISTRATION) recognizers (#1862) (Thanks @tee-jagz)
Nigerian National Identification Number (NG_NIN) recognizer with Verhoeff checksum validation and Nigerian Vehicle Registration (NG_VEHICLE_REGISTRATION) recognizer (#1863) (Thanks @tee-jagz)
ONNX Runtime backend support for GLiNERRecognizer via load_onnx_model=True parameter, resolving crashes on CPUs without AVX2 support (#1884) (Thanks @Copilot)
Configurable regex execution timeout (default 60 seconds) via REGEX_TIMEOUT_SECONDS environment variable to prevent catastrophic backtracking (#1904) (Thanks @Copilot)
GPU device control via environment variable for explicit GPU/CPU selection (#1844) (Thanks @RonShakutai)
LLM-as-a-judge evaluation integration for assessing PII detection quality (#1900) (Thanks @RonShakutai)
Sampling support for the evaluation framework (#1894) (Thanks @RonShakutai)
Dataset interface for the evaluation framework (#1893) (Thanks @RonShakutai)
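
For readers unfamiliar with the Verhoeff checksum named in the NG_NIN entry above, here is a generic sketch of the validation step using the standard published tables. The function name is hypothetical, and nothing here reflects how the NG_NIN recognizer lays out or pre-processes its digits.

```python
# Multiplication table of the dihedral group D5.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# Position-dependent permutation table (period 8).
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """Return True if `number` (check digit last) passes Verhoeff."""
    if not number or not (number.isascii() and number.isdigit()):
        return False
    checksum = 0
    # Process digits right to left; the check digit sits at position 0.
    for position, ch in enumerate(reversed(number)):
        checksum = _D[checksum][_P[position % 8][int(ch)]]
    return checksum == 0


print(verhoeff_valid("2363"))  # True: 236 with check digit 3
print(verhoeff_valid("2364"))  # False
```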

Fixed

Erroneous anchor in Italian driver license regex that caused missed matches (#1899) (Thanks @Br1an67)
validation_result type annotation in API docs and type hints (#1869) (Thanks @akios-ai)
Bare except clauses replaced with except Exception for proper exception handling (#1881) (Thanks @haosenwang1018)
Context enhancement substring matching bug where context words were incorrectly matched as substrings (#1827) (Thanks @ravi-jindal)

Image Redactor

Fixed

_process_names unconditionally treating all DICOM metadata as PHI; now correctly filters using both is_patient and is_name checks (#1855) (Thanks @Mr-Neutr0n)

2.2.361 - 2026-02-12

Analyzer

Changed

Fixed context enhancement substring matching bug where context words were incorrectly matched as substrings (e.g., 'lic' matching 'duplicate'). Added configurable context_matching_mode parameter to LemmaContextAwareEnhancer with two options: "substring" (default, maintains backward compatibility for compound words like "creditcard"), and "whole_word" (prevents false positives like 'lic' matching 'duplicate') (#1061)
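
The difference between the two modes can be illustrated with a small standalone sketch. This is not the LemmaContextAwareEnhancer code; `context_word_matches` is a hypothetical function, and only the mode names "substring" and "whole_word" come from the entry above.

```python
def context_word_matches(context_word, surrounding_words, mode="substring"):
    """Illustrative comparison of the two matching modes."""
    needle = context_word.lower()
    words = [word.lower() for word in surrounding_words]
    if mode == "substring":
        # Matches inside compound words, but also inside unrelated words.
        return any(needle in word for word in words)
    if mode == "whole_word":
        # Requires the context word to equal a surrounding word.
        return needle in words
    raise ValueError(f"unknown mode: {mode!r}")


# 'lic' is found inside 'duplicate' only in substring mode.
print(context_word_matches("lic", ["duplicate", "entry"], "substring"))   # True
print(context_word_matches("lic", ["duplicate", "entry"], "whole_word"))  # False
# The compound word 'creditcard' still matches 'credit' in substring mode.
print(context_word_matches("credit", ["creditcard"], "substring"))        # True
```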

Added

US_MBI recognizer for Medicare Beneficiary Identifier with pattern matching and context support (#1821) (Thanks @chrisvoncsefalvay)
MAC address recognizer for detecting MAC addresses in various formats (#1829) (Thanks @kyoungbinkim)
Korean Business Registration Number (KR_BRN) recognizer (#1822) (Thanks @RektPunk)
Korean Foreigner Registration Number (KR_FRN) recognizer (#1825) (Thanks @RektPunk)
Korean Driver License (KR_DRIVER_LICENSE) recognizer (#1820) (Thanks @RektPunk)
Korean Passport (KR_PASSPORT) recognizer (#1814) (Thanks @kyoungbinkim)
Thai National ID Number (TH_TNIN) recognizer with format and checksum validation (#1713) (Thanks @pangchewe)
Configurable LangExtract recognizer supporting any LLM provider with custom YAML configurations (#1815) (Thanks @telackey)
Azure OpenAI support for LangExtract recognizer with managed identity authentication for GPT-4o, GPT-4, etc. (#1801) (Thanks @dorlugasigal)
Batch processing support in REST API - accepts arrays of texts and returns arrays of results with backward compatibility (#1806) (Thanks @telackey)
GPU device control via PRESIDIO_DEVICE environment variable for explicit GPU/CPU selection (#1843) (Thanks @RonShakutai)
Support for multiple recognizer instances from same class via class_name parameter (#1819) (Thanks @RonShakutai)
Pydantic-based YAML configuration validation with ConfigurationValidator class for improved reliability and error reporting (#1780) (Thanks @omri374)
Japanese and Chinese mobile number test cases for PhoneRecognizer (#1808) (Thanks @WenwenHLF)

Changed

GPU optimizations with DeviceDetector singleton providing 4-10x performance improvements for GLiNER, Transformers, and Stanza engines (#1812) (Thanks @RonShakutai)
Configurable extraction parameters for LangExtract recognizers via YAML (max_char_buffer, timeout, num_ctx, fence_output, use_schema_constraints) (#1811) (Thanks @RonShakutai)
Lazy initialization for device detector singleton (#1831) (Thanks @RonShakutai)
Simplified IBAN regex pattern from 8 to 3 capture groups for better performance (#1818) (Thanks @Copilot)
Improved Korean RRN regex pattern with negative lookahead/lookbehind and gender digit validation (#1807) (Thanks @kyoungbinkim)

Fixed

GLiNER GPU inference by properly passing map_location parameter (#1813) (Thanks @eveningcafe)
GLiNER text truncation issue during processing (#1805) (Thanks @jedheaj314)
IBAN regex trailing character handling to prevent false matches (#1818) (Thanks @Copilot)
Python 3.10 build compatibility by pinning onnxruntime <1.24.1 for Python 3.10 (#1848) (Thanks @SharonHart)
TypeError in third-party recognizers by removing invalid **kwargs from init methods (#1800) (Thanks @RonShakutai)
Pattern recognizer example language specification (#1835) (Thanks @andyjessen)

Anonymizer

Changed

BREAKING CHANGE: Hash operator now uses random salt by default to prevent brute-force and dictionary attacks. Same PII values will produce different hashes unless a salt parameter is explicitly provided. Users requiring referential integrity must provide their own salt. Minimum salt length: 16 bytes. See documentation for migration guide. (#1846) (Thanks @Copilot)
Updated cryptography dependency to >=46.0.4 to address CVE-2025-15467 security vulnerability (#1841) (Thanks @Copilot)
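
To see why the new Hash default breaks referential integrity unless a salt is supplied, consider this standalone sketch using only the standard library. It is not the operator's implementation: `salted_hash` is a hypothetical helper, and prepending the salt to the value before SHA-256 is an assumption made for the example. Only the 16-byte minimum comes from the entry above.

```python
import hashlib
import os
from typing import Optional

MIN_SALT_BYTES = 16  # minimum salt length stated in the changelog entry


def salted_hash(value: str, salt: Optional[bytes] = None) -> str:
    """Hash `value` with a caller-supplied salt, or a fresh random one."""
    if salt is None:
        salt = os.urandom(MIN_SALT_BYTES)  # new random salt on every call
    elif len(salt) < MIN_SALT_BYTES:
        raise ValueError("salt must be at least 16 bytes")
    # Assumption: salt is simply prepended to the UTF-8 encoded value.
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


fixed_salt = b"0123456789abcdef"  # example only; use a secret value in practice

# Same value + same salt -> same hash, so records can still be joined.
print(salted_hash("jane@example.com", fixed_salt) == salted_hash("jane@example.com", fixed_salt))
# Same value + random salts -> different hashes.
print(salted_hash("jane@example.com") == salted_hash("jane@example.com"))
```

With a random salt per call, two occurrences of the same PII value can no longer be linked through their hashes, which is the intent of the change; supplying one fixed salt restores that linkage.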

General

Added

GPU acceleration documentation guide with setup and usage instructions (#1826) (Thanks @dilshad-aee)
Telemetry redaction sample demonstrating PII removal from telemetry data (#1824) (Thanks @Jakob-98)

Changed

Migrated CI workflows (lint, dependency review, release) to ubuntu-slim runners for improved efficiency (#1840) (Thanks @Copilot)
Updated actions/cache from v4 to v5 with Node.js 24 runtime support (#1817) (Thanks @dependabot)

Image Redactor

Changed

DICOM: use_metadata will now use both is_patient and is_name to generate the PHI list of words via change to _make_phi_list.
Image Redactor: Added redact_and_return_bbox method to ImageRedactorEngine, which returns both the redacted image and the detected bounding boxes for redacted regions.

2.2.360 - 2025-09-09

Analyzer

Added

Korean Resident Registration Number (RRN) recognizer with checksum validation for numbers issued prior to October 2020 (#1675) (Thanks @siwoo-jung)
Azure Health Data Services (AHDS) de-identification service integration as a remote recognizer with Entra ID authentication (#1624) (Thanks @rishasurana)
Comprehensive input validation methods for NlpEngineProvider to ensure valid arguments for engines, configuration, and file paths (#1653) (Thanks @siwoo-jung)

Changed

Updated Indian Aadhaar recognizer to support contextual delimiters (-, :, space) for improved detection accuracy (#1677) (Thanks @K3y5tr0ke)
Fixed Italian Driver License recognizer regex to include missing characters per government requirements, excluding only A, O, Q, I (#1651) (Thanks @K3y5tr0ke)
Refactored recognizers folder structure for better organization and maintainability (#1670) (Thanks @omri374)

Anonymizer

Added

Azure Health Data Services (AHDS) Surrogate anonymization operator with medical domain expertise for realistic PHI surrogate generation (#1672) (Thanks @rishasurana)

Changed

Fixed code indentation issues in encrypt.py for better code quality (#1660) (Thanks @aliyss)

General

Added

Comprehensive GitHub Copilot instructions with development guidelines, build processes, and e2e testing procedures (#1693) (Thanks @Copilot)
New GitHub Actions CI & release workflows with multi-platform Docker image support for AMD64 and ARM64 architectures (#1697) (Thanks @tamirkamara)
Dual-path CI workflow to fix GitHub Actions failures for external contributors by auto-detecting fork vs. main repository PRs (#1708) (Thanks @Copilot)
OIDC trusted publishing for PyPI releases eliminating manual API token management and enhancing security (#1702) (Thanks @Copilot)
Comprehensive YAML and Python examples for context-aware recognizers documentation (#1710) (Thanks @MRADULTRIPATHI)

Changed

Updated actions/checkout from v4 to v5 to support Node.js 24 runtime (#1699) (Thanks @dependabot)
Fixed PR template to use proper GitHub issue linking syntax for automatic issue association and closing (#1701) (Thanks @Copilot)
Updated LiteLLM documentation with detailed guide links for better integration guidance (#1698) (Thanks @BhargavDT)
Fixed broken links in CONTRIBUTING.md and developing recognizers documentation after recognizers refactoring (#1674) (Thanks @siwoo-jung)
Fixed OpenSSF badge embedding in README.MD for proper display (#1673) (Thanks @SharonHart)
Removed Terrascan from Microsoft Defender for DevOps workflow to eliminate false positives on non-IAC repository (#1691) (Thanks @Copilot)

Security

Updated Streamlit and PyTorch dependency versions to fix CVE vulnerabilities (#1685) (Thanks @SharonHart)
Updated requests library to mitigate security vulnerability GHSA-9hjg-9r4m-mvj7 (#1683) (Thanks @SharonHart)
Locked pandas dependency in Streamlit to prevent version conflicts (#1689) (Thanks @SharonHart)

2.2.359 - 2025-07-06

Analyzer

Allow loading of StanzaRecognizer when StanzaNlpEngine is configured, improving NLP engine flexibility (#1643) (Thanks @omri374)
Excluded recognition_metadata attribute from REST Analyze Response DTO to clean up API responses (#1627) (Thanks @SharonHart)
Added ISO 8601 support to DateRecognizer for improved date parsing (#1621) (Thanks @StefH)
Prevented misidentification of 13-digit timestamps as credit cards (#1609) (Thanks @eagle-p)
Updated analyzer_engine_provider.md for clarity and completeness (#1590) (Thanks @AvinandanBandyopadhyay)
Bumped python from 3.9 to 3.12 in presidio-analyzer Dockerfile (#1583) (Thanks @dependabot)
Bumped phonenumbers version for improved validation and parsing (#1579) (Thanks @omri374)
Refactored InstanceCounterAnonymizer to simplify index retrieval logic (#1577) (Thanks @ShakutaiGit)
Fixed issue #1574 to support as_tuples in relevant functions (#1575) (Thanks @omri374)
Updated initial scores in IN_PAN for better recognition performance (#1565) (Thanks @omri374)
Added accelerate as a missing build dependency to fix build failures (#1564) (Thanks @SharonHart)
Don't set a default for LABELS_TO_IGNORE if not specified, to avoid unintended behavior (#1563) (Thanks @SharonHart)
Updated 08_no_code.md for documentation improvements (#1561) (Thanks @alan-insam)
Added the ability to disable the NLP recognizer via configuration (#1558) (Thanks @omri374)
Removed 'class' from API documentation for clarity (#1554) (Thanks @omri374)
Set country-specific default recognizers to enabled=false for safer defaults (#1586) (Thanks @omri374)
Most country-specific recognizers that expect English text are now optional (disabled by default) to avoid false positives, so they no longer work out of the box (#1586). Specifically:
- SgFinRecognizer
- AuAbnRecognizer
- AuAcnRecognizer
- AuTfnRecognizer
- AuMedicareRecognizer
- InPanRecognizer
- InAadhaarRecognizer
- InVehicleRegistrationRecognizer
- InPassportRecognizer
- EsNifRecognizer
- InVoterRecognizer
To re-enable them, either set enabled: true for them in the default YAML, or add them to the recognizer registry manually in code.
- YAML based: see more here: YAML based configuration.
- Code based:
```
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import AuAbnRecognizer

# Initialize an analyzer engine with the recognizer registry
analyzer = AnalyzerEngine()

# Create an instance of the AuAbnRecognizer
au_abn_recognizer = AuAbnRecognizer()

# Add the recognizer to the registry
analyzer.registry.add_recognizer(au_abn_recognizer)
```

Anonymizer

Update python base image to 3.13 (#1612) (Thanks @dependabot[bot])
Bumped python from 3.12-windowsservercore to 3.13-windowsservercore in presidio-anonymizer Dockerfile (#1612) (Thanks @dependabot)
Ensured anonymizer sorts analyzer results input by start and end for correct whitespace merging (#1588) (Thanks @mkh1991)
Bumped python from 3.9 to 3.12 in presidio-anonymizer Dockerfile (#1582) (Thanks @dependabot)

Image Redactor

Bumped python from 3.12-slim to 3.13-slim in presidio-image-redactor Dockerfile (#1611) (Thanks @dependabot)
Bumped python from 3.10 to 3.12 in presidio-image-redactor Dockerfile (#1581) (Thanks @dependabot)

General

Fixed typographical errors in documentation files for better clarity (#1637) (Thanks @kilavvy)
Corrected spelling mistakes across code comments and documentation for improved readability (#1636) (Thanks @leopardracer)
Fixed typos in documentation and test descriptions, enhancing clarity and consistency in the codebase (#1631) (Thanks @zeevick10)
Corrected typos in docstrings and comments to maintain documentation quality (#1630) (Thanks @kilavvy)
Fixed typos in documentation and test descriptions, ensuring accurate references and descriptions (#1628) (Thanks @leopardracer)
Removed unnecessary run.bat script from the repository (#1626) (Thanks @SharonHart)
Added "/TestResults" to .gitignore file to prevent test result artifacts from being committed (#1622) (Thanks @StefH)
Added links to the discussion board about Docker prebuilt images to documentation (#1614) (Thanks @omri374)
Fixed spelling, grammar, and style issues in Presidio V2 documentation (#1610) (Thanks @Vruddhi18)
Updated .gitignore to include the .vs folder (#1608) (Thanks @StefH)
Fixed typo in api-docs.yml to improve documentation accuracy (#1602) (Thanks @StefH)
Reverted a previous update to codeql-analysis.yml to restore earlier configuration (#1595) (Thanks @SharonHart)
Updated codeql-analysis.yml for improved code scanning configuration (#1594) (Thanks @SharonHart)
Fixed paths-ignore in codeql-analysis.yml to refine scanning scope (#1593) (Thanks @SharonHart)
Ignored docs/ directory in CodeQL analysis to prevent unnecessary scanning (#1592) (Thanks @SharonHart)
Fixed minor typos in code and documentation (#1585) (Thanks @omahs)
Restored dependabot scanning for security and dependency updates (#1580) (Thanks @SharonHart)
Added SUPPORT.md file to provide support information to users (#1568) (Thanks @omri374)

2.2.358 - 2025-03-18

Analyzer

Fixed: Updated URL regex pattern to correctly exclude trailing single (') and double (") quotes from matched URLs.
Dropped the dependency on the spacy_stanza package and added supporting code to stanza_nlp_engine to support recent stanza versions.
Add parameters to allow users to define the number of processes and batch size when running BatchAnalyzerEngine.
Fix InPassportRecognizer regex recognizer

Anonymizer

Changed: Deprecate MD5 hash type option, defaulting to sha256.
Replace crypto package dependency from pycryptodome to cryptography
Remove azure-core dependency from anonymizer

Image Redactor

Changed: Updated the return type annotation of ocr_bboxes in verify_dicom_instance() from dict to list.

Presidio Structured

General

Updated the Evaluating DICOM Redaction documentation to reflect changes in verify_dicom_instance() within the DicomImagePiiVerifyEngine class.

2.2.357 - 2025-01-13

Analyzer

Example GLiNER integration (#1504)

General

Docs revamp and docstring bug fixes (#1500)
Minor updates to the mkdocstrings config (#1503)

2.2.356 - 2024-12-15

Analyzer

Added logic to handle phone numbers with country code (#1426) (Thanks @kauabh)
Added UK National Insurance Number Recognizer (#1446) (Thanks @hhobson)
Fixed regex match_time output (#1488) (Thanks @andrewisplinghoff)
Added fix to ensure configuration files are closed properly when loading them (#1423) (Thanks @saulbein)
Closing handles for YAML file (#1424) (Thanks @roeybc)
Reduce memory usage of Analyzer test suite (#1429) (Thanks @hhobson)
Added batch_size parameter to BatchAnalyzerEngine (#1449) (Thanks @roeybc)
Remove ignored labels from supported entities (#1454) (Thanks @omri374)
Update US_SSN CONTEXT and unit test (#1455) (Thanks @claesmk)
Fixed bug with Azure AI language context (#1458) (Thanks @omri374)
Add support for allow_list, allow_list_match, regex_flags in REST API (#1484) (Thanks @hdw868)
Add a link to model classes to simplify configuration (#1472) (Thanks @omri374)
Restricting spacy.cli for version 3.7.0 (#1495) (Thanks @kshitijcode)

Anonymizer

No changes specified for Anonymizer in this release.

Presidio-Structured

Fix presidio-structured build - lock numpy version (#1465) (Thanks @SharonHart)

Image Redactor

Fix bug with image conversion (#1445) (Thanks @omri374)

General

Removed Python 3.8 support (EOL) and added 3.12 (#1479) (Thanks @omri374)
Update Docker build to use gunicorn for containers (#1497) (Thanks @RKapadia01)
New Dev containers for analyzer, analyzer+transformers, anonymizer (#1459) (Thanks @roeybc)
Added dev containers for: analyzer, analyzer+transformers, anonymizer, and image redaction (#1450) (Thanks @roeybc)
Added support for allow_list, allow_list_match, regex_flags in REST API (#1488) (Thanks @hdw868)
Typo fix in if condition (#1419) (Thanks @omri374)
Minor notebook changes (#1420) (Thanks @omri374)
Do not release presidio-cli as part of the release pipeline (#1422) (Thanks @SharonHart)
(Docs) Use Presidio across Anthropic, Bedrock, VertexAI, Azure OpenAI, etc. with LiteLLM Proxy (#1421) (Thanks @krrishdholakia)
Update CI due to DockerCompose project name issue (#1428) (Thanks @omri374)
Update docker-compose installation docs (#1439) (Thanks @MWest2020)
Fix space typo in docs (#1459) (Thanks @artfuldev)
Unlock numpy after dropping 3.8 (#1480) (Thanks @SharonHart)

2.2.355 - 2024-10-28

Added

Docs

Add a link to HashiCorp vault operator resource (#1468) (Thanks Akshay Karle)

Changed

Analyzer

Updates to the transformers conf docs and yaml file (#1467)

Docs

docs: clarify the docs on deploying presidio to k8s (#1453) (Thanks Roel Fauconnier)

2.2.355 - July 9th 2024

Note: A new YAML based mechanism has been added to support no-code customization and creation of recognizers. The default recognizers are now automatically loaded from file.

Added

Analyzer

Recognizer for Spanish Foreigners Identity Code (NIE Numero de Identificacion de Extranjeros).
Recognizer for Finnish Personal Identity Codes (Henkilötunnus) (#1394) (Thanks honderr).
New Predefined Recognizer for Indian Passport #1350 (#1351) (Thanks Hiten-98)
Add new recognizer for IN_VOTER #1344 (#1345) (Thanks kjdeveloper8)
Spanish NIE (Foreigners ID card) recognizer (#1359) (Thanks areyesfalcon)
Added regex functionality for allow lists in the analyzer (#1357) (Thanks NarekAra)
Loading analyzer engine & recognizer registry from configuration file (#1367)
Align ports with documentation and postman collection. (#1375) (Thanks ungana)
Analyzer documentation (#1384)
Fix the entity filtering of the transformer_recognizer.py analyze function (#1403) (Thanks andreas-eberle)
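
The regex allow-list entry in the list above can be pictured with a generic filter. This is a sketch only, not the analyzer's code: `filter_allowed` is a hypothetical function, and requiring a full match against each candidate is an assumption chosen for the example.

```python
import re


def filter_allowed(candidates, allow_list, match="exact", flags=re.IGNORECASE):
    """Drop detected strings that appear on the allow list."""
    if match == "exact":
        return [c for c in candidates if c not in allow_list]
    if match == "regex":
        patterns = [re.compile(p, flags) for p in allow_list]
        # Assumption: a candidate is allowed only if a pattern matches it fully.
        return [c for c in candidates if not any(p.fullmatch(c) for p in patterns)]
    raise ValueError(f"unknown match mode: {match!r}")


detected = ["bing.com", "docs.example.org", "microsoft.com"]
print(filter_allowed(detected, ["bing.com"]))
print(filter_allowed(detected, [r".*\.example\.org"], match="regex"))
```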

Changed

Analyzer

Update conf files location (#1358)
Fix OverflowError in crypto_recognizer (#1377)
Improve url detector (#1398) (Thanks afogel)
Update Dockerfile.windows (#1413) (Thanks markvantilburg)
Changing predefined recognizers to use the config file (#1393) (Thanks RoeyBC)

Anonymizer

Update Dockerfile.windows (#1414) (Thanks markvantilburg)

General

Add Ruff linter + Apply Ruff fix (#1379)
Auto-formatting, fix D rules (#1381)
Fix N818, E721 (#1382)
Migrate Python Packaging to pyproject.toml (#1383)
From Pipenv to Poetry (#1391)
Fix ports in docs (#1408)

2.2.353 - March 31st 2024

Added

Analyzer

Support 'M' prefix in SG_NRIC_FIN Recognizer and expand tests (#1304) (Thanks @miltonsim)
Add Bech32 and Bech32m Bitcoin Address Validation in Crypto Recognizer and expand tests (#1307) (Thanks @miltonsim)
Predefined pattern recognizer : IN_VEHICLE_REGISTRATION (#1288) (Thanks @devopam)
Addition of leniency parameter in predefined PhoneRecognizer (#1311) (Thanks @VMD7)
Add Singapore UEN Recognizer (#1315) (Thanks @miltonsim)
Update spacy_stanza.md (#1325) (Thanks @AndreasThinks)
Adding Span Marker Recognizer Sample (#1321) (Thanks @VMD7)
Cache compiled regexes in analyzer (#1335) (Thanks @Edward-Upton)
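
The idea behind caching compiled regexes, the last entry above, can be sketched with `functools.lru_cache`. This is a generic illustration rather than the analyzer's actual caching code, and `compiled` is a hypothetical helper name.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def compiled(pattern: str, flags: int = 0) -> re.Pattern:
    """Compile each (pattern, flags) pair once and reuse the result."""
    return re.compile(pattern, flags)


# Both calls return the very same compiled object.
first = compiled(r"\b\d{3}-\d{2}-\d{4}\b")
second = compiled(r"\b\d{3}-\d{2}-\d{4}\b")
print(first is second)  # True
print(bool(first.search("id 123-45-6789 on file")))  # True
```

Python's `re` module keeps its own bounded internal cache, so the gain from an explicit cache depends on how many distinct patterns are in use.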

Anonymizer

Added pseudonymization sample (#1296)

Image redactor

Added tesseract to installation (#1312)

Structured

Analysis builder improvements (#1295) (Thanks @ebotiab)
Implement user-defined entity selection strategies in Presidio Structured (#1319) (Thanks @miltonsim)

Changed

Analyzer

Fix for incorrectly referenced recognizer in analysis_explanation using PhoneRecognizer (#1330) (Thanks @egillv021)
Fix bug where "bank" and "check" wouldn't work (#1333) (Thanks @usr-ein and @Samuel Prevost)
Bugfix in tutorial (#1310)
Changed default aggregation_strategy to max (#1342)

Image Redactor

Fixed wrong condition for dicom metadata (#1347)

2.2.353 - Feb 12th 2024

Added

Analyzer

Add predefined_recognizer: IN_AADHAAR (#1256)

Anonymizer

Added the option to add custom operators + pseudonymization sample (#1284)

Changed

Analyzer

Fix failing test due to optional package (#1258)
Update publish-to-pypi.yml (#1259)
Allow local Spacy Models to be loaded in NLP Engine (#1269)
Upgrade pip in windows containers (#1272)

Image Redactor

Bugfix in ImageAnalyzerEngine #1274

2.2.352 - Jan 22nd 2024

Added

Structured

Added an alpha release of presidio-structured, a library that reuses logic from existing Presidio components to anonymize (semi-)structured data. (#1192)

Analyzer

Add PL PESEL recognizer (#1209)
Azure AI language recognizer (#1228)
Add_conf_to_package_data (#1243)

Anonymizer

Add keep operator as deanonymizer (#1255)
Update anonymize_list type hints and document that sometimes items will be ignored. (#1252)

General

Add Dockerfile for Windows containers (#1194)

Changed

Analyzer

Drop WA driver license number (#1214)
Change ner_model_configuration from list to map (#1222)
Bugfix in SpacyRecognizer (#1221)
Bugfix in NerModelConfiguration (#1230)

Anonymizer

Improved the logic of conflict handling in AnonymizerEngine (#1196)

Image Redactor

Change default score threshold in image redactor (#1210)
Fixed bug #1227 (#1231)
Added missing dependencies for opencv-python and azure forms recognizer (#1257)

General

Remove inclusive-lint step (#1207)
Updates to demo website with new NLP Engine (#1181)

2.2.351 - Nov. 6th 2023

Changed

Analyzer

Hotfix for NerModelConfiguration not created correctly (#1208)

2.2.350 - Nov. 2nd 2023

Changed

Analyzer

Hotfix: default.yaml is not parsed correctly (#1202)

2.2.35 - Nov. 2nd 2023

Changed

Analyzer

Put org in ignore as it has many FPs (#1200)

2.2.34 - Oct. 30th 2023

Added

Analyzer

New Predefined Recognizer: IN_PAN (#1100)

Anonymizer

Anonymizer - Pass bytes key to Encrypt / Decrypt (#1147)

Image redactor

DICOM redactor improvement: Enabling more photometric interpretations (#1103)
DICOM redactor improvement: Adding exceptions for when DICOM file does not have pixel data (#1104)
Small reordering of kwargs as prereq for allow list functionality (#1110)
DICOM redactor improvement: Preventing distortion when multiple sets of pixels are in one instance (#1109)
DICOM redactor improvement: Enabling compatibility with compressed images (#1105)
DICOM redactor improvement: Enable return of redacted bboxes (#1111)
DICOM redactor improvement: Enable selection of redact approach (#1113)
Enable toggle of printing output location after redacting from file (#1144)
Changing test exception type check (#1148)
Enabling allow list approach with all image redaction (#1145)
Improve process names method in DICOM image redactor (#1150)
Adding examples of toggling metadata usage and saving bboxes (#1158)
Updating verification engines to include latest updates to redactor engines (#1162)
Improved bbox processor (#1163)
Updating verification engines and enable plotting of custom bboxes (#1164)
Added image processing class to preprocess the image before running OCR (#1166)
Added support for Microsoft's document intelligence OCR

Changed

Analyzer

Refactored the NlpEngine and Ner recognizers (SpacyRecognizer, TransformersRecognizer, StanzaRecognizer) to allow simpler integration of huggingface and transformers models (#1159). This includes:
- Changes in how NER results flow through Presidio (see docs)
- NER/model definition is now defined using a conf file or a NerModelConfiguration object.
- Integrated spacy-huggingface-pipelines for a more robust integration of huggingface models.
As a result, SpacyRecognizer logic has changed, please see #1159. Some fields within the class are now deprecated.
Updated type checks (#1175)
Enabled regex flags manipulation (#1193)

Anonymizer

Initial logic check for merging 2 entities (#1092)
Fix Sphinx warning in OperatorConfig (#1143)
Fix type mismatch in check_label_groups parameter in spacy_recognizer (#1130)
anonymize_list return type hint fix (#1178)

General

We no longer use Pipenv.lock. Locking happens as part of the CI. (#1152)
Changed the ACR instance (#1089)
Updated to Cred Scan V3 (#1154)

2.2.33 - June 1st 2023

Added

Anonymizer

Added keep, a no-op anonymizer that allows preserving some types of PII while keeping track of its position in anonymized output. (#1062)
Added BatchAnonymizerEngine to complement the BatchAnalyzerEngine for lists, and dicts (#993)

General

Drop support for Python 3.7
Add support for Python 3.11
New demo app for Presidio, based on Streamlit (#1054)
GPT based synthetic data generation (#1051)

2.2.32 - 25.01.2023

Changed

General

Updated dependencies

Analyzer

Fixed exception on whitespace in AU recognizers
Updated API version for Text Analytics in sample

Anonymizer

Fixed merge entity from the same type

Image redactor

Modified ImagePiiVerifyEngine to allow passing of kwargs
Updated template for building image redactor yaml
Updated all image redactor engines and OCR classes to allow passing of an OCR confidence threshold and other OCR parameters
Moved general bounding box operations to new class BboxProcessor
Updated presidio-image-redactor version from 0.0.45 to 0.0.46

Added

Analyzer

Added revised example for transformer recognizer

Image redactor

Added evaluation code for the DICOM image redaction capabilities
REST API to support web applications payload

General

Updated documentation to include instructions on using DICOM evaluation code
Updated documentation to mention OCR thresholding

2.2.31 - 14.12.2022

Changed

Image-Redactor

Added DICOM image redaction capabilities (DicomImageRedactorEngine class and tests)
Updated setup.py to include new required packages for DICOM capabilities
Updated Pipfile and Pipfile.lock
Updated presidio-image-redactor version from 0.0.44 to 0.0.45
Updated the ImagePiiVerifyEngine class to allow use of custom analyzer engines

General

Updated NOTICE to include licenses of added packages
Updated docs with getting started code for new DicomImageRedactorEngine

2.2.30 - 25.10.2022

Added

Analyzer

Added Italian fiscal code recognizer
Added Italian driver license recognizer
Added Italian identity card recognizer
Added Italian passport recognizer
Added TransformersNlpEngine to support transformer based NER models within spaCy pipelines
Added pattern for next gen US passport in presidio-analyzer/presidio_analyzer/predefined_recognizers/us_passport_recognizer.py

Changed

Analyzer

Improved MEDICAL_LICENSE pattern and fixed checksum verification
Bugfix for context handling by aligning results to recognizers using a unique identifier and not recognizer name
Updated Pipfile.lock

Anonymizer

Removed constraint on empty texts

Image-Redactor

Updated Pipfile.lock

General

Updated pipenv version
Updated black and flake8 in pre-commit scripts
Updated docs for NLP engine

2.2.29 - 12.07.2022

Added

General

Added Presidio to OSSF (Open Source Security Foundation)
Added CodeQL scanning

Analyzer

Introduced BatchAnalyzerEngine
Added allow-list functionality to ignore specific strings
Added notebook on anonymizing known values
Added sample for using transformers models in Presidio
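
As a rough illustration of the allow-list entry above, the sketch below drops detections whose matched text appears in a list of permitted strings. It assumes exact string matching and uses a hypothetical filter_allow_listed helper with plain tuples; it is not the analyzer's own code.

```python
from typing import Iterable, List, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end), hypothetical shape


def filter_allow_listed(
    text: str, spans: List[Span], allow_list: Iterable[str]
) -> List[Span]:
    """Drop detections whose matched text is exactly one of the allowed strings."""
    allowed = set(allow_list)
    return [span for span in spans if text[span[1]:span[2]] not in allowed]


text = "Visit bing.com or contoso.com"
detections = [("URL", 6, 14), ("URL", 18, 29)]
remaining = filter_allow_listed(text, detections, ["bing.com"])
```

Only the detection for contoso.com remains, because bing.com is on the allow list.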

Changed

Anonymizer

Bug fix for getting the text before anonymizing (#890)

Image redactor

Deps update

2.2.28 - 04.05.2022

Changed

Analyzer

Improved deny-list regex and customizability
Added documentation for existing spaCy models
Bugfix in analysis explanation scores

Image redactor

PIL version updated to 9.0.1

Added

Analyzer

Recognizers can be loaded from YAML

2.2.27 - 08.03.2022

Changed

Analyzer

Improved context mechanisms to support recognizer-level context enhancement and cross-entity context

2.2.26 - 23.02.2022

Changed

Analyzer

Bug fix in context support

2.2.25 - 21.02.2022

Changed

Analyzer

Added a URL recognizer
Added a new capability for creating new logic for context detection. See ContextAwareEnhancer and LemmaContextAwareEnhancer. Documentation will be added in a future release. Furthermore, it is now possible to pass context words through the analyze method (or via the API), and these are taken into account for context enhancement.

Anonymizer

Bug fix for entities at the end of a sentence.

Docs

Formatted (black/flake8) the Python examples.

Removed

Analyzer

Removed the DOMAIN_NAME recognizer. This change means that the DOMAIN_NAME entity is no longer returned by Presidio. URL is returned instead, and it matches full addresses rather than just domain names (https://www.microsoft.com/a/b.html and not just www.microsoft.com).

2.2.24 - 23.01.2022

Changed

Fixed an issue where an IBAN followed by all-caps text was not recognized
Updated dependencies in Pipfile.lock
Removed official Python 3.6 support and added support for 3.10
Added docs for creating a streamlit app
Added docs for using Flair

Removed

Deprecated

2.2.23 - 16.11.2021

Changed

Analyzer:

Added multi-regional phone number recognizer.
Fixed duplicated entities removal.
Added sample for structured / semi-structured data in batch.
Dependencies version bumps.

Anonymizer:

Added sample for getting an identified entity value using a custom Operator.
Changed packages/imports.
Added repr to classes.
Added encryption and decryption samples.
Removed AnonymizerResult in favor of OperatorResult, for easier anonymization and deanonymization.
Anonymizer and Deanonymizer now return operator_name instead of operator in OperatorResult.

2.2.2 - 09.06.2021

Changed

Analyzer:

Databricks based template in Azure Data Factory docs
Adding ORGANIZATION recognizer docs
Bumped pydantic from 1.7.3 to 1.7.4
Updated call to stanza via spacy-stanza
Added DATE_TIME recognizer
Added Medical Licence recognizer
Bumped spacy from 3.0.5 to 3.0.6

2.2.1 - 10.05.2021

Changed

Analyzer:

Create CODE_OF_CONDUCT
ADF templates docs
Fix spark sample to run presidio in broadcast
Ad-hoc recognizers
Text Analytics Integration Sample
Documentation update and samples validation
Adding tagger to the spaCy model pipeline
Sample notebook for remote recognizer (using Text Analytics)
Add matplotlib to image-redactor
Added custom lambda anonymizer
Added pii_verify_engine to the image-redactor

2.2.0 - 12.04.2021

Changed

Analyzer:

Upgrade Analyzer spacy version to 3.0.5

Anonymizer Engine:

Request entity AnonymizerConfig renamed OperatorConfig
- In OperatorConfig: anonymizer_name -> operator_name
Response entity AnonymizerResult renamed to EngineResult
- In EngineResult: List[AnonymizedEntity] -> List[OperatorResult]
- In OperatorResult:
  - anonymizer -> operator
  - anonymized_text -> text

Anonymize API:

Response entity anonymizer renamed to operator.
Response entity anonymizer_text renamed to text.
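
For callers that stored or parsed responses in the old shape, the renames listed above for OperatorResult can be sketched as a small key-mapping helper. This is a hypothetical migration aid, not part of Presidio. The payload shape is an assumption, and only the two field renames named in this entry are mapped.

```python
from typing import Any, Dict

# Field renames taken from the entry above: old name -> new name.
RENAMED_KEYS = {
    "anonymizer": "operator",
    "anonymized_text": "text",
}


def migrate_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a response item with the old key names replaced."""
    return {RENAMED_KEYS.get(key, key): value for key, value in item.items()}


# Assumed example payload; real responses may carry additional fields.
old_item = {
    "anonymizer": "replace",
    "anonymized_text": "<PERSON>",
    "start": 11,
    "end": 19,
}
new_item = migrate_item(old_item)
```

Keys that were not renamed pass through unchanged, so the helper can be applied to items that already use the new names.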

Deanonymize:

New endpoint for deanonymizing encrypted entities by the anonymizer.

Unreleased

Fixed

Fixed an issue where the CreditCardRecognizer regex could incorrectly identify 13-digit Unix timestamps as credit card numbers. Validated that 13-digit numbers that start with 1 and have no separators (e.g. 1748503543012) are not flagged as credit cards.
Enhanced NlpEngineProvider with validation methods for NLP engines, configuration, and conf file path.
Added Korean Resident Registration Number (RRN) recognizer (KrRrnRecognizer).
Added Thai National ID Number (TNIN) recognizer (ThTninRecognizer).
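
The credit card fix above can be pictured with the following standard-library sketch. It is an assumption-laden illustration rather than the CreditCardRecognizer's actual regex: TIMESTAMP_LIKE, looks_like_credit_card, and the 13 to 19 digit length range are hypothetical choices, and the Luhn check is the generic algorithm.

```python
import re

# Assumption: a bare 13-digit run starting with 1 is treated as a millisecond
# Unix timestamp, mirroring the example given in the entry above.
TIMESTAMP_LIKE = re.compile(r"1\d{12}")


def luhn_valid(digits: str) -> bool:
    """Generic Luhn check: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def looks_like_credit_card(candidate: str) -> bool:
    """Hypothetical filter, not the recognizer's implementation."""
    # Reject separator-free, timestamp-like values before any other check.
    if TIMESTAMP_LIKE.fullmatch(candidate):
        return False
    digits = re.sub(r"[ -]", "", candidate)
    if not digits.isdigit() or not 13 <= len(digits) <= 19:
        return False
    return luhn_valid(digits)
```

With this sketch, the timestamp 1748503543012 from the entry is rejected, while a well-known test number such as 4111111111111111 still passes, with or without separators.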
