All notable changes to this project will be documented in this file.
- Canadian SIN (`CA_SIN`) recognizer for the Canadian Social Insurance Number, using regex pattern matching, context words (English and French), and Luhn checksum validation. Disabled by default.
- Swedish PII recognizers for `SE_PERSONNUMMER` to identify Swedish Personal ID Numbers using pattern matching and checksum validation. The recognizer also supports Swedish coordination numbers (samordningsnummer), which are issued to individuals who are not registered residents in Sweden but require identification. All disabled by default.
- German PII recognizers for `DE_TAX_ID` (Steueridentifikationsnummer, §§ 139a–139e AO, ISO 7064 Mod 11,10 checksum), `DE_TAX_NUMBER` (Steuernummer, § 139a AO, ELSTER and slash formats), `DE_PASSPORT` (Reisepassnummer, PassG § 4, ICAO Doc 9303), `DE_ID_CARD` (Personalausweisnummer, PAuswG), `DE_SOCIAL_SECURITY` (Rentenversicherungsnummer, § 147 SGB VI, DRV checksum), `DE_HEALTH_INSURANCE` (Krankenversicherungsnummer/KVNR, § 290 SGB V, GKV checksum), `DE_KFZ` (KFZ-Kennzeichen, FZV § 8), `DE_HANDELSREGISTER` (Handelsregisternummer HRA/HRB, §§ 9/14 HGB), and `DE_PLZ` (Postleitzahl, very low base confidence, context-only). All disabled by default.
- Added recognizer for Swedish Organisationsnummer, the ID number for all Swedish organisations.
- Added recognizer for Spanish Passport (`ES_PASSPORT`).
- Fixed incorrect Prüfziffer algorithm in `DeHealthInsuranceRecognizer` (KVNR); now uses alternating factors [1,2,…,1,2] per § 290 SGB V Anlage 1 (#1972).
- Fixed incorrect check-digit weights in `DeSocialSecurityRecognizer` (RVNR); now uses VKVV § 4 weights [2,1,2,5,7,1,2,1,2,1,2,1]. Previous weights diverged from the Deutsche Rentenversicherung specification and rejected the canonical DRV example 15070649C103.
- Fixed incorrect check-digit algorithm in `DeLanrRecognizer`; now uses KBV Arztnummern-Richtlinie weights [4,9,4,9,4,9] without the spurious Quersumme step, and the complement-to-10 formula `(10 − sum mod 10) mod 10`. Previous weights and formula were only internally self-consistent.
- Enforced post-2016 BZSt repetition rule in `DeTaxIdRecognizer` (no digit may appear more than three times in positions 1–10).
- Registered `DeLanrRecognizer`, `DeBsnrRecognizer`, `DeVatIdRecognizer` and `DeFuehrerscheinRecognizer` in the default registry (previously imported but missing from `conf/default_recognizers.yaml`, so they were unreachable via the default registry).
- ISO 7064 Mod 11,10 structural checksum in `DeVatIdRecognizer`. The algorithm is identical to the one in `DeTaxIdRecognizer` and is widely used by community validators (python-stdnum, VIES-adjacent).
- ICAO Doc 9303 MRZ checksum validation in `DePassportRecognizer` and `DeIdCardRecognizer` (weights 7, 3, 1 repeating; letters A=10…Z=35; sum mod 10).
- Structural validation improvements in `DeBsnrRecognizer` per KBV Arztnummern-Richtlinie Anlage 1; valid KV regional codes are defined for defense-in-depth/documentation purposes, but unknown prefixes are not currently rejected (no public checksum exists for BSNR).
- Turkish PII recognizer for `TR_NATIONAL_ID` (TCKN) to identify Turkish National Identification Numbers using pattern matching, context, and NVI checksum validation. Disabled by default.
- Turkish PII recognizer for `TR_LICENSE_PLATE` (plaka) to identify Turkish vehicle license plates using pattern matching, context, and province code validation (01-81). Disabled by default.
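The checksum families named in the entries above are published algorithms. As a rough, standalone illustration, the sketch below implements five of them in plain Python: Luhn (as used for `CA_SIN`), ISO 7064 Mod 11,10 (`DE_TAX_ID`, `DE_VAT_ID`), the DRV check digit with the VKVV § 4 weights quoted above (`DE_SOCIAL_SECURITY`), the ICAO Doc 9303 MRZ check digit, and the TCKN rule (`TR_NATIONAL_ID`). All function names are hypothetical; this is not Presidio's code, and the real recognizers also apply regex patterns, context words, and format rules. The DRV sketch assumes the usual convention of replacing the letter with its two-digit alphabet position and summing the digit sums of the weighted products.

```python
def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check; separators such as spaces are ignored."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0


def iso7064_mod11_10_valid(number: str) -> bool:
    """ISO 7064 Mod 11,10; the last digit is the check digit."""
    if len(number) < 2 or not number.isdigit():
        return False
    product = 10
    for ch in number[:-1]:
        total = (int(ch) + product) % 10 or 10  # a remainder of 0 counts as 10
        product = (total * 2) % 11
    return (11 - product) % 10 == int(number[-1])


def rvnr_valid(rvnr: str) -> bool:
    """Hypothetical DRV check: 8 digits, a letter, 2 digits, then the check digit."""
    rvnr = rvnr.replace(" ", "").upper()
    if len(rvnr) != 12 or not (rvnr[:8] + rvnr[9:]).isdigit() or not "A" <= rvnr[8] <= "Z":
        return False
    # Assumption: the letter becomes its alphabet position as two digits (A=01 ... Z=26)
    digits = rvnr[:8] + f"{ord(rvnr[8]) - ord('A') + 1:02d}" + rvnr[9:11]
    weights = [2, 1, 2, 5, 7, 1, 2, 1, 2, 1, 2, 1]  # VKVV § 4 weights
    # Sum the digit sums (Quersumme) of each weighted product; products are < 100
    total = sum(sum(divmod(int(d) * w, 10)) for d, w in zip(digits, weights))
    return total % 10 == int(rvnr[11])


def icao9303_check_digit(field: str) -> int:
    """ICAO Doc 9303 MRZ check digit: weights 7, 3, 1; A=10..Z=35; '<' counts as 0."""
    total = 0
    for i, ch in enumerate(field.upper()):
        if ch.isdigit():
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        else:  # filler character '<'
            value = 0
        total += value * (7, 3, 1)[i % 3]
    return total % 10


def tckn_valid(number: str) -> bool:
    """Turkish national ID: 11 digits, no leading zero, two trailing check digits."""
    if len(number) != 11 or not number.isdigit() or number[0] == "0":
        return False
    d = [int(c) for c in number]
    d10 = (sum(d[0:9:2]) * 7 - sum(d[1:8:2])) % 10  # odd positions x7 minus even positions
    d11 = sum(d[:10]) % 10
    return d[9] == d10 and d[10] == d11
```

For example, `luhn_valid("046 454 286")`, `iso7064_mod11_10_valid("86095742719")` and `rvnr_valid("15070649C103")` (the DRV example cited above) all return `True`.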
2.2.362 - 2026-03-15
- Published `presidio` as a PyPI meta-package that installs `presidio-analyzer` and `presidio-anonymizer`, making `pip install presidio` work as expected. Inspired by and thanks to Sakthi Santhosh Anumand and Harsha Vardhan for the original idea. (#1889) (Thanks @Copilot)
- Pinned all CI/CD GitHub Actions and Docker base images to commit SHAs to mitigate supply chain attacks (#1861) (Thanks @Copilot)
- Pinned `ruff` and `build` pip installs with SHA256 hashes for OSSF scorecard compliance (#1864) (Thanks @Copilot)
- Updated GitHub Actions dependencies (`actions/checkout`, `actions/setup-python`, `actions/setup-dotnet`, `actions/cache`, `actions/github-script`, `actions/dependency-review-action`, `azure/login`, `docker/setup-buildx-action`, `github/codeql-action`, `microsoft/security-devops-action`) and base Python Docker images (#1870, #1871, #1872, #1873, #1874, #1875, #1876, #1877, #1878, #1879, #1885, #1886, #1887, #1895, #1896, #1897, #1898) (Thanks @dependabot)
- Updated README to clarify Presidio's no-authentication-by-design stance with security guidance (#1903) (Thanks @Copilot)
- Broken documentation links (#1856) (Thanks @andyjessen)
- Fixed CVE-2024-47874 and CVE-2025-54121 (Starlette vulnerabilities) (#1860) (Thanks @SharonHart)
- Fixed CVE-2025-2953 and CVE-2025-3730 (#1859) (Thanks @SharonHart)
- UK Driving Licence Number (UK_DRIVING_LICENCE) recognizer with pattern matching and context support
- `HuggingFaceNerRecognizer` for direct NER model inference using HuggingFace pipelines without requiring spaCy (#1834) (Thanks @ultramancode)
- Transformer-based `MedicalNERRecognizer` as a subclass of `HuggingFaceNerRecognizer` for clinical entity detection (#1853) (Thanks @stevenelliottjr)
- US NPI (National Provider Identifier) recognizer with Luhn checksum validation and context support (#1847) (Thanks @stevenelliottjr)
- UK Postcode (UK_POSTCODE) recognizer with pattern matching and context support (#1858) (Thanks @tee-jagz)
- UK Passport (UK_PASSPORT) and Vehicle Registration (UK_VEHICLE_REGISTRATION) recognizers (#1862) (Thanks @tee-jagz)
- Nigerian National Identification Number (NG_NIN) recognizer with Verhoeff checksum validation and Nigerian Vehicle Registration (NG_VEHICLE_REGISTRATION) recognizer (#1863) (Thanks @tee-jagz)
- ONNX Runtime backend support for `GLiNERRecognizer` via the `load_onnx_model=True` parameter, resolving crashes on CPUs without AVX2 support (#1884) (Thanks @Copilot)
- Configurable regex execution timeout (default 60 seconds) via the `REGEX_TIMEOUT_SECONDS` environment variable to prevent catastrophic backtracking (#1904) (Thanks @Copilot)
- GPU device control via environment variable for explicit GPU/CPU selection (#1844) (Thanks @RonShakutai)
- LLM-as-a-judge evaluation integration for assessing PII detection quality (#1900) (Thanks @RonShakutai)
- Sampling support for the evaluation framework (#1894) (Thanks @RonShakutai)
- Dataset interface for the evaluation framework (#1893) (Thanks @RonShakutai)
- Erroneous anchor in Italian driver license regex that caused missed matches (#1899) (Thanks @Br1an67)
- `validation_result` type annotation in API docs and type hints (#1869) (Thanks @akios-ai)
- Bare `except` clauses replaced with `except Exception` for proper exception handling (#1881) (Thanks @haosenwang1018)
- Context enhancement substring matching bug where context words were incorrectly matched as substrings (#1827) (Thanks @ravi-jindal)
- `_process_names` unconditionally treating all DICOM metadata as PHI; now correctly filters using both `is_patient` and `is_name` checks (#1855) (Thanks @Mr-Neutr0n)
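Two of the checksums mentioned in this release fit in a few lines each. `NG_NIN` validation uses the Verhoeff scheme (a dihedral-group multiplication table plus a position-dependent permutation), and NPI check digits are conventionally computed as Luhn over the 10-digit NPI prefixed with the constant `80840`. The sketch below is a standalone illustration with hypothetical helper names, not Presidio's recognizer code.

```python
# Verhoeff multiplication table (dihedral group D5)
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Verhoeff permutation table, applied by position (cycles every 8 digits)
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]  # inverse of each element in D5


def verhoeff_valid(number: str) -> bool:
    """True if the trailing digit is a correct Verhoeff check digit."""
    if not number.isdigit():
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(payload: str) -> int:
    """Compute the Verhoeff check digit to append to `payload`."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return _INV[c]


def npi_valid(npi: str) -> bool:
    """Luhn over '80840' + the 10-digit NPI."""
    if len(npi) != 10 or not npi.isdigit():
        return False
    total = 0
    for i, d in enumerate(reversed([int(c) for c in "80840" + npi])):
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0
```

With these definitions, `verhoeff_check_digit("236")` is 3, so `"2363"` validates, and the commonly cited sample NPI `"1234567893"` passes `npi_valid`.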
2.2.361 - 2026-02-12
- Fixed context enhancement substring matching bug where context words were incorrectly matched as substrings (e.g., 'lic' matching 'duplicate'). Added a configurable `context_matching_mode` parameter to `LemmaContextAwareEnhancer` with two options: "substring" (default, maintains backward compatibility for compound words like "creditcard") and "whole_word" (prevents false positives like 'lic' matching 'duplicate') (#1061)
- US_MBI recognizer for Medicare Beneficiary Identifier with pattern matching and context support (#1821) (Thanks @chrisvoncsefalvay)
- MAC address recognizer for detecting MAC addresses in various formats (#1829) (Thanks @kyoungbinkim)
- Korean Business Registration Number (KR_BRN) recognizer (#1822) (Thanks @RektPunk)
- Korean Foreigner Registration Number (KR_FRN) recognizer (#1825) (Thanks @RektPunk)
- Korean Driver License (KR_DRIVER_LICENSE) recognizer (#1820) (Thanks @RektPunk)
- Korean Passport (KR_PASSPORT) recognizer (#1814) (Thanks @kyoungbinkim)
- Thai National ID Number (TH_TNIN) recognizer with format and checksum validation (#1713) (Thanks @pangchewe)
- Configurable LangExtract recognizer supporting any LLM provider with custom YAML configurations (#1815) (Thanks @telackey)
- Azure OpenAI support for LangExtract recognizer with managed identity authentication for GPT-4o, GPT-4, etc. (#1801) (Thanks @dorlugasigal)
- Batch processing support in REST API - accepts arrays of texts and returns arrays of results with backward compatibility (#1806) (Thanks @telackey)
- GPU device control via the `PRESIDIO_DEVICE` environment variable for explicit GPU/CPU selection (#1843) (Thanks @RonShakutai)
- Support for multiple recognizer instances from the same class via the `class_name` parameter (#1819) (Thanks @RonShakutai)
- Pydantic-based YAML configuration validation with a ConfigurationValidator class for improved reliability and error reporting (#1780) (Thanks @omri374)
- Japanese and Chinese mobile number test cases for PhoneRecognizer (#1808) (Thanks @WenwenHLF)
- GPU optimizations with DeviceDetector singleton providing 4-10x performance improvements for GLiNER, Transformers, and Stanza engines (#1812) (Thanks @RonShakutai)
- Configurable extraction parameters for LangExtract recognizers via YAML (max_char_buffer, timeout, num_ctx, fence_output, use_schema_constraints) (#1811) (Thanks @RonShakutai)
- Lazy initialization for device detector singleton (#1831) (Thanks @RonShakutai)
- Simplified IBAN regex pattern from 8 to 3 capture groups for better performance (#1818) (Thanks @Copilot)
- Improved Korean RRN regex pattern with negative lookahead/lookbehind and gender digit validation (#1807) (Thanks @kyoungbinkim)
- GLiNER GPU inference by properly passing map_location parameter (#1813) (Thanks @eveningcafe)
- GLiNER text truncation issue during processing (#1805) (Thanks @jedheaj314)
- IBAN regex trailing character handling to prevent false matches (#1818) (Thanks @Copilot)
- Python 3.10 build compatibility by pinning onnxruntime <1.24.1 for Python 3.10 (#1848) (Thanks @SharonHart)
- TypeError in third-party recognizers, fixed by removing invalid `**kwargs` from `__init__` methods (#1800) (Thanks @RonShakutai)
- Pattern recognizer example language specification (#1835) (Thanks @andyjessen)
- BREAKING CHANGE: The hash operator now uses a random salt by default to prevent brute-force and dictionary attacks. Identical PII values will produce different hashes unless a `salt` parameter is explicitly provided. Users requiring referential integrity must provide their own salt. Minimum salt length: 16 bytes. See documentation for migration guide. (#1846) (Thanks @Copilot)
- Updated cryptography dependency to >=46.0.4 to address CVE-2025-15467 security vulnerability (#1841) (Thanks @Copilot)
- GPU acceleration documentation guide with setup and usage instructions (#1826) (Thanks @dilshad-aee)
- Telemetry redaction sample demonstrating PII removal from telemetry data (#1824) (Thanks @Jakob-98)
- Migrated CI workflows (lint, dependency review, release) to ubuntu-slim runners for improved efficiency (#1840) (Thanks @Copilot)
- Updated actions/cache from v4 to v5 with Node.js 24 runtime support (#1817) (Thanks @dependabot)
- DICOM: use_metadata will now use both is_patient and is_name to generate the PHI list of words via change to _make_phi_list.
- Image Redactor: Added redact_and_return_bbox method to ImageRedactorEngine, which returns both the redacted image and the detected bounding boxes for redacted regions.
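The effect of the hash operator's new default salt on referential integrity can be seen with a minimal sketch. The helper below is hypothetical and only mimics the behaviour described in the BREAKING CHANGE entry (a random salt unless one is supplied, minimum 16 bytes). It uses SHA-256 from the standard library, and the way the salt is combined with the value is an assumption, not Presidio's exact construction.

```python
import hashlib
import secrets
from typing import Optional

MIN_SALT_BYTES = 16


def hash_pii(value: str, salt: Optional[bytes] = None) -> str:
    """Hypothetical salted hash: a fresh random salt per call unless one is given."""
    if salt is None:
        # New random salt each call, so equal inputs yield different digests
        salt = secrets.token_bytes(MIN_SALT_BYTES)
    elif len(salt) < MIN_SALT_BYTES:
        raise ValueError(f"salt must be at least {MIN_SALT_BYTES} bytes")
    # Assumption: the salt is simply prepended to the UTF-8 encoded value
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


fixed_salt = b"0123456789abcdef"  # 16 bytes; load from a secret store in practice
same = hash_pii("jane@example.com", fixed_salt) == hash_pii("jane@example.com", fixed_salt)
```

Without an explicit salt, hashing the same email twice gives two unrelated digests, which defeats dictionary lookups but also breaks joins across records. Supplying one secret salt (here `fixed_salt`, a placeholder) restores consistent output.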
2.2.360 - 2025-09-09
- Korean Resident Registration Number (RRN) recognizer with checksum validation for numbers issued prior to October 2020 (#1675) (Thanks @siwoo-jung)
- Azure Health Data Services (AHDS) de-identification service integration as a remote recognizer with Entra ID authentication (#1624) (Thanks @rishasurana)
- Comprehensive input validation methods for NlpEngineProvider to ensure valid arguments for engines, configuration, and file paths (#1653) (Thanks @siwoo-jung)
- Updated Indian Aadhaar recognizer to support contextual delimiters (-, :, space) for improved detection accuracy (#1677) (Thanks @K3y5tr0ke)
- Fixed Italian Driver License recognizer regex to include missing characters per government requirements, excluding only A, O, Q, I (#1651) (Thanks @K3y5tr0ke)
- Refactored recognizers folder structure for better organization and maintainability (#1670) (Thanks @omri374)
- Azure Health Data Services (AHDS) Surrogate anonymization operator with medical domain expertise for realistic PHI surrogate generation (#1672) (Thanks @rishasurana)
- Fixed code indentation issues in encrypt.py for better code quality (#1660) (Thanks @aliyss)
- Comprehensive GitHub Copilot instructions with development guidelines, build processes, and e2e testing procedures (#1693) (Thanks @Copilot)
- New GitHub Actions CI & release workflows with multi-platform Docker image support for AMD64 and ARM64 architectures (#1697) (Thanks @tamirkamara)
- Dual-path CI workflow to fix GitHub Actions failures for external contributors by auto-detecting fork vs. main repository PRs (#1708) (Thanks @Copilot)
- OIDC trusted publishing for PyPI releases eliminating manual API token management and enhancing security (#1702) (Thanks @Copilot)
- Comprehensive YAML and Python examples for context-aware recognizers documentation (#1710) (Thanks @MRADULTRIPATHI)
- Updated actions/checkout from v4 to v5 to support Node.js 24 runtime (#1699) (Thanks @dependabot)
- Fixed PR template to use proper GitHub issue linking syntax for automatic issue association and closing (#1701) (Thanks @Copilot)
- Updated LiteLLM documentation with detailed guide links for better integration guidance (#1698) (Thanks @BhargavDT)
- Fixed broken links in CONTRIBUTING.md and developing recognizers documentation after recognizers refactoring (#1674) (Thanks @siwoo-jung)
- Fixed OpenSSF badge embedding in README.MD for proper display (#1673) (Thanks @SharonHart)
- Removed Terrascan from Microsoft Defender for DevOps workflow to eliminate false positives on non-IAC repository (#1691) (Thanks @Copilot)
- Updated Streamlit and PyTorch dependency versions to fix CVE vulnerabilities (#1685) (Thanks @SharonHart)
- Updated requests library to mitigate security vulnerability GHSA-9hjg-9r4m-mvj7 (#1683) (Thanks @SharonHart)
- Locked pandas dependency in Streamlit to prevent version conflicts (#1689) (Thanks @SharonHart)
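For Korean RRNs issued before October 2020, the 13th digit is a check digit over the first 12, computed with weights 2 to 9 followed by 2 to 5 and a mod-11 rule. The sketch below shows that widely documented rule in isolation; the function name is hypothetical, and it is not Presidio's recognizer code, which also applies a regex pattern and context.

```python
def rrn_checksum_valid(rrn: str) -> bool:
    """Check digit rule for pre-October-2020 Korean RRNs (YYMMDD-SBBBBNC)."""
    digits = rrn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    weights = [2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5]
    # zip stops after 12 pairs, so the check digit itself is not weighted
    total = sum(int(d) * w for d, w in zip(digits, weights))
    return (11 - total % 11) % 10 == int(digits[12])
```

For the made-up number `900101-123456X`, the weighted sum is 124, 124 mod 11 is 3, and (11 − 3) mod 10 gives a check digit of 8, so `rrn_checksum_valid("900101-1234568")` returns `True`. Numbers issued from October 2020 onward no longer follow this rule, which is why the entry above limits checksum validation to older numbers.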
2.2.359 - 2025-07-06
- Allow loading of StanzaRecognizer when StanzaNlpEngine is configured, improving NLP engine flexibility (#1643) (Thanks @omri374)
- Excluded recognition_metadata attribute from REST Analyze Response DTO to clean up API responses (#1627) (Thanks @SharonHart)
- Added ISO 8601 support to DateRecognizer for improved date parsing (#1621) (Thanks @StefH)
- Prevented misidentification of 13-digit timestamps as credit cards (#1609) (Thanks @eagle-p)
- Updated analyzer_engine_provider.md for clarity and completeness (#1590) (Thanks @AvinandanBandyopadhyay)
- Bumped python from 3.9 to 3.12 in presidio-analyzer Dockerfile (#1583) (Thanks @dependabot)
- Bumped phonenumbers version for improved validation and parsing (#1579) (Thanks @omri374)
- Refactored InstanceCounterAnonymizer to simplify index retrieval logic (#1577) (Thanks @ShakutaiGit)
- Fixed issue #1574 to support as_tuples in relevant functions (#1575) (Thanks @omri374)
- Updated initial scores in IN_PAN for better recognition performance (#1565) (Thanks @omri374)
- Added accelerate as a missing build dependency to fix build failures (#1564) (Thanks @SharonHart)
- Don't set a default for LABELS_TO_IGNORE if not specified, to avoid unintended behavior (#1563) (Thanks @SharonHart)
- Updated 08_no_code.md for documentation improvements (#1561) (Thanks @alan-insam)
- Added the ability to disable the NLP recognizer via configuration (#1558) (Thanks @omri374)
- Removed 'class' from API documentation for clarity (#1554) (Thanks @omri374)
- Set country-specific default recognizers to enabled=false for safer defaults (#1586) (Thanks @omri374)
- Most country-specific recognizers that expect English are now optional (disabled by default) to avoid false positives, so they no longer work out of the box (#1586). Specifically:
- SgFinRecognizer
- AuAbnRecognizer
- AuAcnRecognizer
- AuTfnRecognizer
- AuMedicareRecognizer
- InPanRecognizer
- InAadhaarRecognizer
- InVehicleRegistrationRecognizer
- InPassportRecognizer
- EsNifRecognizer
- InVoterRecognizer
To re-enable them, either set them to `enabled: true` in the default YAML, or add them to the recognizer registry manually in code.
- YAML based: see the YAML based configuration documentation.
- Code based:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import AuAbnRecognizer

# Initialize an analyzer engine with the default recognizer registry
analyzer = AnalyzerEngine()
# Create an instance of the AuAbnRecognizer
au_abn_recognizer = AuAbnRecognizer()
# Add the recognizer to the registry
analyzer.registry.add_recognizer(au_abn_recognizer)
```
- Update python base image to 3.13 (#1612) (Thanks @dependabot[bot])
- Bumped python from 3.12-windowsservercore to 3.13-windowsservercore in presidio-anonymizer Dockerfile (#1612) (Thanks @dependabot)
- Ensured anonymizer sorts analyzer results input by start and end for correct whitespace merging (#1588) (Thanks @mkh1991)
- Bumped python from 3.9 to 3.12 in presidio-anonymizer Dockerfile (#1582) (Thanks @dependabot)
- Bumped python from 3.12-slim to 3.13-slim in presidio-image-redactor Dockerfile (#1611) (Thanks @dependabot)
- Bumped python from 3.10 to 3.12 in presidio-image-redactor Dockerfile (#1581) (Thanks @dependabot)
- Fixed typographical errors in documentation files for better clarity (#1637) (Thanks @kilavvy)
- Corrected spelling mistakes across code comments and documentation for improved readability (#1636) (Thanks @leopardracer)
- Fixed typos in documentation and test descriptions, enhancing clarity and consistency in the codebase (#1631) (Thanks @zeevick10)
- Corrected typos in docstrings and comments to maintain documentation quality (#1630) (Thanks @kilavvy)
- Fixed typos in documentation and test descriptions, ensuring accurate references and descriptions (#1628) (Thanks @leopardracer)
- Removed unnecessary run.bat script from the repository (#1626) (Thanks @SharonHart)
- Added "/TestResults" to .gitignore file to prevent test result artifacts from being committed (#1622) (Thanks @StefH)
- Added links to the discussion board about Docker prebuilt images to documentation (#1614) (Thanks @omri374)
- Fixed spelling, grammar, and style issues in Presidio V2 documentation (#1610) (Thanks @Vruddhi18)
- Updated .gitignore to include the .vs folder (#1608) (Thanks @StefH)
- Fixed typo in api-docs.yml to improve documentation accuracy (#1602) (Thanks @StefH)
- Reverted a previous update to codeql-analysis.yml to restore earlier configuration (#1595) (Thanks @SharonHart)
- Updated codeql-analysis.yml for improved code scanning configuration (#1594) (Thanks @SharonHart)
- Fixed paths-ignore in codeql-analysis.yml to refine scanning scope (#1593) (Thanks @SharonHart)
- Ignored docs/ directory in CodeQL analysis to prevent unnecessary scanning (#1592) (Thanks @SharonHart)
- Fixed minor typos in code and documentation (#1585) (Thanks @omahs)
- Restored dependabot scanning for security and dependency updates (#1580) (Thanks @SharonHart)
- Added SUPPORT.md file to provide support information to users (#1568) (Thanks @omri374)
2.2.358 - 2025-03-18
- Fixed: Updated URL regex pattern to correctly exclude trailing single (') and double (") quotes from matched URLs.
- Drop dependency on the spacy_stanza package, and add supporting code to stanza_nlp_engine to support recent stanza versions
- Add parameters to allow users to define the number of processes and batch size when running BatchAnalyzerEngine.
- Fix InPassportRecognizer regex recognizer
- Changed: Deprecate `MD5` hash type option, defaulting to `sha256`.
- Replace crypto package dependency from pycryptodome to cryptography
- Remove azure-core dependency from anonymizer
- Changed: Updated the return type annotation of `ocr_bboxes` in `verify_dicom_instance()` from `dict` to `list`.
- Updated the Evaluating DICOM Redaction documentation to reflect changes in `verify_dicom_instance()` within the `DicomImagePiiVerifyEngine` class.
2.2.357 - 2025-01-13
- Example GLiNER integration (#1504)
- Docs revamp and docstring bug fixes (#1500)
- Minor updates to the mkdocstrings config (#1503)
2.2.356 - 2024-12-15
- Added logic to handle phone numbers with country code (#1426) (Thanks @kauabh)
- Added UK National Insurance Number Recognizer (#1446) (Thanks @hhobson)
- Fixed regex match_time output (#1488) (Thanks @andrewisplinghoff)
- Added fix to ensure configuration files are closed properly when loading them (#1423) (Thanks @saulbein)
- Closing handles for YAML file (#1424) (Thanks @roeybc)
- Reduce memory usage of Analyzer test suite (#1429) (Thanks @hhobson)
- Added `batch_size` parameter to `BatchAnalyzerEngine` (#1449) (Thanks @roeybc)
- Remove ignored labels from supported entities (#1454) (Thanks @omri374)
- Update US_SSN CONTEXT and unit test (#1455) (Thanks @claesmk)
- Fixed bug with Azure AI language context (#1458) (Thanks @omri374)
- Add support for allow_list, allow_list_match, regex_flags in REST API (#1484) (Thanks @hdw868)
- Add a link to model classes to simplify configuration (#1472) (Thanks @omri374)
- Restricting spacy.cli for version 3.7.0 (#1495) (Thanks @kshitijcode)
- No changes specified for Anonymizer in this release.
- Fix presidio-structured build - lock numpy version (#1465) (Thanks @SharonHart)
- Fix bug with image conversion (#1445) (Thanks @omri374)
- Removed Python 3.8 support (EOL) and added 3.12 (#1479) (Thanks @omri374)
- Update Docker build to use gunicorn for containers (#1497) (Thanks @RKapadia01)
- New Dev containers for analyzer, analyzer+transformers, anonymizer (#1459) (Thanks @roeybc)
- Added dev containers for: analyzer, analyzer+transformers, anonymizer, and image redaction (#1450) (Thanks @roeybc)
- Added support for allow_list, allow_list_match, regex_flags in REST API (#1488) (Thanks @hdw868)
- Typo fix in if condition (#1419) (Thanks @omri374)
- Minor notebook changes (#1420) (Thanks @omri374)
- Do not release `presidio-cli` as part of the release pipeline (#1422) (Thanks @SharonHart)
- (Docs) Use Presidio across Anthropic, Bedrock, VertexAI, Azure OpenAI, etc. with LiteLLM Proxy (#1421) (Thanks @krrishdholakia)
- Update CI due to DockerCompose project name issue (#1428) (Thanks @omri374)
- Update docker-compose installation docs (#1439) (Thanks @MWest2020)
- Fix space typo in docs (#1459) (Thanks @artfuldev)
- Unlock numpy after dropping 3.8 (#1480) (Thanks @SharonHart)
2.2.355 - 2024-10-28
- Add a link to HashiCorp vault operator resource (#1468) (Thanks Akshay Karle)
- Updates to the transformers conf docs and yaml file (#1467)
- docs: clarify the docs on deploying presidio to k8s (#1453) (Thanks Roel Fauconnier)
2.2.355 - July 9th 2024
Note: A new YAML based mechanism has been added to support no-code customization and creation of recognizers. The default recognizers are now automatically loaded from file.
- Recognizer for Spanish Foreigners Identity Code (NIE Numero de Identificacion de Extranjeros).
- Recognizer for Finnish Personal Identity Codes (Henkilötunnus) (#1394) (Thanks honderr).
- New Predefined Recognizer for Indian Passport #1350 (#1351) (Thanks Hiten-98)
- Add new recognizer for IN_VOTER #1344 (#1345) (Thanks kjdeveloper8)
- Spanish NIE (Foreigners ID card) recognizer (#1359) (Thanks areyesfalcon)
- Added regex functionality for allow lists in the analyzer (#1357) (Thanks NarekAra)
- Loading analyzer engine & recognizer registry from configuration file (#1367)
- Align ports with documentation and postman collection. (#1375) (Thanks ungana)
- Analyzer documentation (#1384)
- Fix the entity filtering of the transformer_recognizer.py analyze function (#1403) (Thanks andreas-eberle)
- Update conf files location (#1358)
- Fix OverflowError in crypto_recognizer (#1377)
- Improve url detector (#1398) (Thanks afogel)
- Update Dockerfile.windows (#1413) (thanks markvantilburg)
- Changing predefined recognizers to use the config file (#1393) (Thanks RoeyBC)
- Update Dockerfile.windows (#1414) (thanks markvantilburg)
- Add Ruff linter + Apply Ruff fix (#1379)
- Auto-formatting, fix D rules (#1381)
- Fix N818, E721 (#1382)
- Migrate Python Packaging to pyproject.toml (#1383)
- From Pipenv to Poetry (#1391)
- Fix ports in docs (#1408)
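Both new national ID recognizers in this release correspond to well-known check-character schemes. The Finnish henkilötunnus ends with a character selected by the nine-digit number formed from the birth date and the individual number, modulo 31. The Spanish NIE replaces its leading X/Y/Z with 0/1/2 and selects a letter by the resulting number modulo 23. The sketch below is a standalone illustration with hypothetical function names. It checks only the check character, not dates, century signs, or other format rules, and it is not necessarily how the Presidio recognizers validate.

```python
HETU_CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"  # 31 characters
NIE_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"  # 23 letters


def hetu_check_valid(hetu: str) -> bool:
    """Finnish DDMMYY + century sign + ZZZ + check character."""
    hetu = hetu.upper()
    if len(hetu) != 11:
        return False
    number = hetu[:6] + hetu[7:10]  # skip the century sign at index 6
    if not number.isdigit():
        return False
    return HETU_CHECK_CHARS[int(number) % 31] == hetu[10]


def nie_check_valid(nie: str) -> bool:
    """Spanish NIE: X/Y/Z + 7 digits + control letter."""
    nie = nie.upper().replace("-", "")
    if len(nie) != 9 or nie[0] not in "XYZ" or not nie[1:8].isdigit():
        return False
    # X, Y, Z are replaced by 0, 1, 2 before taking the number mod 23
    number = int(str("XYZ".index(nie[0])) + nie[1:8])
    return NIE_LETTERS[number % 23] == nie[8]
```

For instance, 131052308 mod 31 is 25, which selects `T`, so `hetu_check_valid("131052-308T")` returns `True`. Likewise, 1234567 mod 23 is 19, which selects `L`, so `nie_check_valid("X1234567L")` returns `True`.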
2.2.353 - March 31st 2024
- Support 'M' prefix in SG_NRIC_FIN Recognizer and expand tests (#1304) (Thanks @miltonsim)
- Add Bech32 and Bech32m Bitcoin Address Validation in Crypto Recognizer and expand tests (#1307) (Thanks @miltonsim)
- Predefined pattern recognizer : IN_VEHICLE_REGISTRATION (#1288) (Thanks @devopam)
- Addition of leniency parameter in predefined PhoneRecognizer (#1311) (Thanks @VMD7)
- Add Singapore UEN Recognizer (#1315) (Thanks @miltonsim)
- Update spacy_stanza.md (#1325) (Thanks @AndreasThinks)
- Adding Span Marker Recognizer Sample (#1321) (Thanks @VMD7)
- Cache compiled regexes in analyzer (#1335) (Thanks @Edward-Upton)
- Added pseudonymization sample (#1296)
- Added tesseract to installation (#1312)
- Analysis builder improvements (#1295) (Thanks @ebotiab)
- Implement user-defined entity selection strategies in Presidio Structured (#1319) (Thanks @miltonsim)
- Fix for incorrectly referenced recognizer in analysis_explanation using PhoneRecognizer (#1330) (Thanks @egillv021)
- Fix bug where "bank" and "check" wouldn't work (#1333) (Thanks @usr-ein and @Samuel Prevost)
- Bugfix in tutorial (#1310)
- Changed default aggregation_strategy to max (#1342)
- Fixed wrong condition for dicom metadata (#1347)
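Bech32 (BIP-173) and Bech32m (BIP-350) addresses share one BCH checksum and differ only in the constant the checksum must equal (1 versus 0x2BC830A3). The compact sketch below follows the BIP reference algorithm. The function name is hypothetical, the sketch verifies only the checksum (not the witness version or program length), and it is not Presidio's crypto recognizer code.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST = 1
BECH32M_CONST = 0x2BC830A3
_GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def _polymod(values):
    """BCH checksum over 5-bit values, as in BIP-173."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= _GEN[i]
    return chk


def _hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_encoding(address):
    """Return "bech32", "bech32m", or None if the checksum does not verify."""
    if address.lower() != address and address.upper() != address:
        return None  # mixed case is not allowed
    address = address.lower()
    pos = address.rfind("1")  # '1' is the separator and never a data character
    if pos < 1 or len(address) - pos - 1 < 6:
        return None  # empty human-readable part, or data shorter than the checksum
    hrp, data_part = address[:pos], address[pos + 1:]
    if any(c not in CHARSET for c in data_part):
        return None
    const = _polymod(_hrp_expand(hrp) + [CHARSET.index(c) for c in data_part])
    if const == BECH32_CONST:
        return "bech32"
    if const == BECH32M_CONST:
        return "bech32m"
    return None
```

The BIP-173 sample address `bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4` is reported as `"bech32"`. Changing any single character makes the Bech32 checksum fail.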
2.2.353 - Feb 12th 2024
- Add predefined_recognizer: IN_AADHAAR (#1256)
- Added the option to add custom operators + pseudonymization sample (#1284)
- Fix failing test due to optional package (#1258)
- Update publish-to-pypi.yml (#1259)
- Allow local Spacy Models to be loaded in NLP Engine (#1269)
- Upgrade pip in windows containers (#1272)
- Bugfix in ImageAnalyzerEngine #1274
2.2.352 - Jan 22nd 2024
- Added alpha of presidio-structured, a library (presidio-structured) which re-uses existing logic from existing presidio components to allow anonymization of (semi-)structured data. (#1192)
- Add PL PESEL recognizer (#1209)
- Azure AI language recognizer (#1228)
- Add_conf_to_package_data (#1243)
- Add keep operator as deanonymizer (#1255)
- Update anonymize_list type hints and document that sometimes items will be ignored. (#1252)
- Add Dockerfile for Windows containers (#1194)
- Drop WA driver license number (#1214)
- Change ner_model_configuration from list to map (#1222)
- Bugfix in SpacyRecognizer (#1221)
- Bugfix in NerModelConfiguration (#1230)
- Add_conf_to_package_data (#1243)
- Improved the logic of conflict handling in AnonymizerEngine (#1196)
- Change default score threshold in image redactor (#1210)
- fixes bug #1227 (#1231)
- Added missing dependencies for opencv-python and azure forms recognizer (#1257)
- Remove inclusive-lint step (#1207)
- Updates to demo website with new NLP Engine (#1181)
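The PL PESEL recognizer targets an 11-digit number whose last digit is a weighted check digit (weights 1, 3, 7, 9 repeating over the first ten digits, then the complement of the sum modulo 10). The sketch below shows that rule in isolation. The function name is hypothetical, and it does not validate the embedded birth date the way a full validator might.

```python
def pesel_valid(pesel: str) -> bool:
    """PESEL check digit: weights 1,3,7,9,1,3,7,9,1,3 over the first ten digits."""
    if len(pesel) != 11 or not pesel.isdigit():
        return False
    weights = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]
    # zip stops after ten pairs, so the check digit itself is not weighted
    total = sum(int(d) * w for d, w in zip(pesel, weights))
    return (10 - total % 10) % 10 == int(pesel[10])
```

For the frequently cited sample `44051401359`, the weighted sum is 101, so the expected check digit is (10 − 1) mod 10 = 9, which matches.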
2.2.351 - Nov. 6th 2023
- Hotfix for NerModelConfiguration not created correctly (#1208)
2.2.350 - Nov. 2nd 2023
- Hotfix: default.yaml is not parsed correctly (#1202)
2.2.35 - Nov. 2nd 2023
- Put org in ignore as it has many FPs (#1200)
2.2.34 - Oct. 30th 2023
- New Predefined Recognizer: IN_PAN (#1100)
- Anonymizer - Pass bytes key to Encrypt / Decrypt (#1147)
- DICOM redactor improvement: Enabling more photometric interpretations (#1103)
- DICOM redactor improvement: Adding exceptions for when DICOM file does not have pixel data (#1104)
- Small reordering of kwargs as prereq for allow list functionality (#1110)
- DICOM redactor improvement: Preventing distortion when multiple sets of pixels are in one instance (#1109)
- DICOM redactor improvement: Enabling compatibility with compressed images (#1105)
- DICOM redactor improvement: Enable return of redacted bboxes (#1111)
- DICOM redactor improvement: Enable selection of redact approach (#1113)
- Enable toggle of printing output location after redacting from file (#1144)
- Changing test exception type check (#1148)
- Enabling allow list approach with all image redaction (#1145)
- Improve process names method in DICOM image redactor (#1150)
- Adding examples of toggling metadata usage and saving bboxes (#1158)
- Updating verification engines to include latest updates to redactor engines (#1162)
- Improved bbox processor (#1163)
- Updating verification engines and enable plotting of custom bboxes (#1164)
- Added image processing class to preprocess the image before running OCR (#1166)
- Added support for Microsoft's document intelligence OCR
- Refactored the `NlpEngine` and NER recognizers (`SpacyRecognizer`, `TransformersRecognizer`, `StanzaRecognizer`) to allow simpler integration of Hugging Face and transformers models (#1159). This includes:
- Changes in how NER results flow through Presidio (see docs)
- The NER model is now defined using a conf file or a `NerModelConfiguration` object.
- Integrated `spacy-huggingface-pipelines` for a more robust integration of Hugging Face models.
- As a result, `SpacyRecognizer` logic has changed; please see #1159. Some fields within the class are now deprecated.
- Updated type checks (#1175)
- Enabled regex flags manipulation (#1193)
- Initial logic check for merging 2 entities (#1092)
- Fix Sphinx warning in OperatorConfig (#1143)
- Fix type mismatch in check_label_groups parameter in spacy_recognizer (#1130)
- anonymize_list return type hint fix (#1178)
- We no longer use Pipenv.lock. Locking happens as part of the CI. (#1152)
- Changed the ACR instance (#1089)
- Updated to Cred Scan V3 (#1154)
2.2.33 - June 1st 2023
- Added `keep`, a no-op anonymizer that allows preserving some types of PII while keeping track of their position in anonymized output (#1062)
- Added `BatchAnonymizerEngine` to complement the `BatchAnalyzerEngine` for lists and dicts (#993)
- Drop support for Python 3.7
- Add support for Python 3.11
- New demo app for Presidio, based on Streamlit (#1054)
- GPT based synthetic data generation (#1051)
2.2.32 - 25.01.2023
- Updated dependencies
- Fixed exception on whitespace in AU recognizers
- Updated API version for Text Analytics in sample
- Fixed merge entity from the same type
- Modified `ImagePiiVerifyEngine` to allow passing of kwargs
- Updated template for building image redactor yaml
- Updated all image redactor engines and OCR classes to allow passing of an OCR confidence threshold and other OCR parameters
- Moved general bounding box operations to new class `BboxProcessor`
- Updated `presidio-image-redactor` version from 0.0.45 to 0.0.46
- Added revised example for transformer recognizer
- Added evaluation code for the DICOM image redaction capabilities
- REST API to support web applications payload
- Updated documentation to include instructions on using DICOM evaluation code
- Updated documentation to mention OCR thresholding
2.2.31 - 14.12.2022
- Added DICOM image redaction capabilities (`DicomImageRedactorEngine` class and tests)
- Updated `setup.py` to include new required packages for DICOM capabilities
- Updated Pipfile and Pipfile.lock
- Updated `presidio-image-redactor` version from 0.0.44 to 0.0.45
- Updated the `ImagePiiVerifyEngine` class to allow use of custom analyzer engines
- Updated `NOTICE` to include licenses of added packages
- Updated docs with getting started code for new `DicomImageRedactorEngine`
2.2.30 - 25.10.2022
- Added Italian fiscal code recognizer
- Added Italian driver license recognizer
- Added Italian identity card recognizer
- Added Italian passport recognizer
- Added `TransformersNlpEngine` to support transformer-based NER models within spaCy pipelines
- Added pattern for next gen US passport in `presidio-analyzer/presidio_analyzer/predefined_recognizers/us_passport_recognizer.py`
- Improved MEDICAL_LICENSE pattern and fixed checksum verification
- Bugfix for context handling by aligning results to recognizers using a unique identifier and not recognizer name
- Updated Pipfile.lock
- Removed constraint on empty texts
- Updated Pipfile.lock
- Updated `pipenv` version
- Updated `black` and `flake8` in pre-commit scripts
- Updated docs for NLP engine
2.2.29 - 12.07.2022
- Added Presidio to OSSF (Open Source Security Foundation)
- Added CodeQL scanning
- Introduced BatchAnalyzerEngine
- Added allow-list functionality to ignore specific strings
- Added notebook on anonymizing known values
- Added sample for using `transformers` models in Presidio
- Bug fix for getting the text before anonymizing (#890)
- Deps update
2.2.28 - 04.05.2022
- Improved deny-list regex and customizability
- Added documentation for existing spaCy models
- Bugfix in analysis explanation scores
- PIL version updated to 9.0.1
- Recognizers can be loaded from YAML
2.2.27 - 08.03.2022
- Improved context mechanisms to support recognizer-level context enhancement and cross-entity context support
2.2.26 - 23.02.2022
- Bug fix in context support
2.2.25 - 21.02.2022
- Added a URL recognizer
- Added a new capability for creating new logic for context detection. See `ContextAwareEnhancer` and `LemmaContextAwareEnhancer`. Documentation will be added in a future release. Furthermore, it is now possible to pass context words through the `analyze` method (or via the API), and those will be taken into account for context enhancement.
- Bug fix for entities at the end of a sentence.
- Formatted (black/flake8) the Python examples.
- Removed the DOMAIN_NAME recognizer. This change means that the `DOMAIN_NAME` entity is no longer returned by Presidio. `URL` is returned instead, and it catches full addresses rather than just domain names (`https://www.microsoft.com/a/b.html` and not just `www.microsoft.com`).
2.2.24 - 23.01.2022
- Fixed an issue where an IBAN followed by all-caps text could not be recognized
- Updated dependencies in Pipfile.lock
- Removed official Python 3.6 support and added support for 3.10
- Added docs for creating a streamlit app
- Added docs for using Flair
2.2.23 - 16.11.2021
- Added multi-regional phone number recognizer.
- Fixed duplicated entities removal.
- Added sample for structured / semi-structured data in batch.
- Dependencies version bumps.
- Added sample for getting an identified entity value using a custom Operator.
- Changed packages/imports.
- Added repr to classes.
- Added encryption and decryption samples.
- Removed `AnonymizerResult` in favor of `OperatorResult`, to simplify anonymization and deanonymization.
- Anonymizer and Deanonymizer now return `operator_name` instead of `operator` in `OperatorResult`.
2.2.2 - 09.06.2021
- Databricks based template in Azure Data Factory docs
- Adding ORGANIZATION recognizer docs
- Bumped pydantic from 1.7.3 to 1.7.4
- Updated call to stanza via spacy-stanza
- Added DATE_TIME recognizer
- Added Medical Licence recognizer
- Bumped spacy from 3.0.5 to 3.0.6
2.2.1 - 10.05.2021
- Create CODE_OF_CONDUCT
- ADF templates docs
- Fix spark sample to run presidio in broadcast
- Ad-hoc recognizers
- Text Analytics Integration Sample
- Documentation update and samples validation
- Adding tagger to the spaCy model pipeline
- Sample notebook for remote recognizer (using Text Analytics)
- Add matplotlib to image-redactor
- Added custom lambda anonymizer
- Added `pii_verify_engine` to the image-redactor
- Upgraded Analyzer spaCy version to 3.0.5
- Request entity `AnonymizerConfig` renamed to `OperatorConfig`
- In `OperatorConfig`: `anonymizer_name` -> `operator_name`
- Response entity `AnonymizerResult` renamed to `EngineResult`
- In `EngineResult`: `List[AnonymizedEntity]` -> `List[OperatorResult]`
- In `OperatorResult`:
  - `anonymizer` -> `operator`
  - `anonymized_text` -> `text`
- Response entity `anonymizer` renamed to `operator`.
- Response entity `anonymizer_text` renamed to `text`.
- New endpoint for deanonymizing entities encrypted by the anonymizer.
- Fixed an issue where the `CreditCardRecognizer` regex could incorrectly identify 13-digit Unix timestamps as credit card numbers. Verified that 13-digit numbers that start with `1` and have no separators (e.g. `1748503543012`) are not flagged as credit cards.
- Enhanced `NlpEngineProvider` with validation methods for NLP engines, configuration, and configuration file path.
- Added Korean Resident Registration Number (RRN) recognizer (KrRrnRecognizer).
- Added Thai National ID Number (TNIN) recognizer (ThTninRecognizer).
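The timestamp exclusion described in the `CreditCardRecognizer` entry above can be illustrated with a small standalone check. This is a hypothetical sketch, not Presidio's actual regex or code. The names `looks_like_ms_timestamp` and `drop_timestamp_candidates` are illustrative only, and the rule simply encodes the changelog's description: exactly 13 digits, a leading `1`, and no separators.

```python
import re
from typing import List

# Hypothetical illustration of the rule in the changelog entry; this is NOT
# the pattern Presidio's CreditCardRecognizer actually uses.
# A bare run of exactly 13 digits starting with "1" is treated as a likely
# Unix timestamp in milliseconds (e.g. 1748503543012).
TIMESTAMP_LIKE = re.compile(r"1\d{12}")


def looks_like_ms_timestamp(candidate: str) -> bool:
    """True if the candidate is exactly 13 digits, starts with 1, and has no separators."""
    return TIMESTAMP_LIKE.fullmatch(candidate) is not None


def drop_timestamp_candidates(candidates: List[str]) -> List[str]:
    """Keep only the credit card candidates that do not look like timestamps."""
    return [c for c in candidates if not looks_like_ms_timestamp(c)]


if __name__ == "__main__":
    candidates = ["1748503543012", "1748-5035-4301-2", "4222222222222"]
    print(drop_timestamp_candidates(candidates))
    # ['1748-5035-4301-2', '4222222222222']
```

Numbers written with separators, and 13-digit numbers with a different leading digit, remain candidates under this sketch. Any further validation, such as a checksum, is outside its scope.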