scrubadub¶
Remove personally identifiable information from free text. Sometimes we have additional metadata about the people we wish to anonymize. Other times we don’t. This package makes it easy to seamlessly scrub personal information from free text, without compromising the privacy of the people we are trying to protect.
scrubadub currently supports removing:
Names
Email addresses
Addresses/Postal codes (US, GB, CA)
Credit card numbers
Dates of birth
URLs
Phone numbers
Username and password combinations
Skype/Twitter usernames
Social security numbers (US) and national insurance numbers (GB)
Tax numbers (GB)
Driving licence numbers (GB)
Quick start¶
Getting started with scrubadub is as easy as running pip install scrubadub and incorporating it into your Python scripts like this:
>>> import scrubadub
# My cat may be more tech-savvy than most, but he doesn't want other people to know it.
>>> text = "My cat can be contacted on example@example.com, or 1800 555-5555"
# Replaces the phone number and email address with anonymous IDs.
>>> scrubadub.clean(text)
'My cat can be contacted on {{EMAIL}}, or {{PHONE}}'
There are many ways to tailor the behavior of scrubadub using different Detectors and PostProcessors.
Scrubadub is highly configurable and supports localisation for different languages and regions.
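To make the detect-then-replace idea concrete, here is a minimal standard-library sketch of the general technique. It does not use or reproduce scrubadub's own classes: the names Match, PATTERNS, detect and replace are hypothetical, and the two regular expressions are deliberately simplistic assumptions chosen only to fit the quick start sentence.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    """One detected span of sensitive text (hypothetical helper type)."""
    start: int
    end: int
    kind: str


# Illustrative patterns only; a real detector needs far more thorough rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{4} \d{3}-\d{4}\b"),
}


def detect(text: str) -> List[Match]:
    """Detection step: run every pattern and collect the matched spans."""
    found = [
        Match(m.start(), m.end(), kind)
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    # NamedTuples sort field by field, so this orders spans by start offset.
    return sorted(found)


def replace(text: str, matches: List[Match]) -> str:
    """Replacement step: swap each span for a {{KIND}} placeholder."""
    pieces = []
    cursor = 0
    for match in matches:
        if match.start < cursor:
            continue  # skip a span that overlaps one already replaced
        pieces.append(text[cursor:match.start])
        pieces.append("{{" + match.kind + "}}")
        cursor = match.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "My cat can be contacted on example@example.com, or 1800 555-5555"
print(replace(text, detect(text)))
```

Keeping detection separate from replacement means either step can be swapped out independently, for example to add a new pattern or to emit a different placeholder format.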
Installation¶
To install scrubadub using pip, simply type:
pip install scrubadub
Several other packages can optionally be installed to enable extra detectors: scrubadub_address, scrubadub_spacy and scrubadub_stanford. Because these require additional dependencies, see the relevant documentation (address detector documentation and name detector documentation) for more information. This package requires Python 3.6 or later. For Python 2.7 or 3.5 support, use v1.2.2, which is the last version to support them.
New maintainers¶
LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.
Contents¶
Documentation
API Reference
- scrubadub
- scrubadub.detectors
  - Base classes
    - scrubadub.detectors.Detector
    - scrubadub.detectors.RegexDetector
    - scrubadub.detectors.RegionLocalisedRegexDetector
  - Detectors enabled by default
    - scrubadub.detectors.CredentialDetector
    - scrubadub.detectors.CreditCardDetector
    - scrubadub.detectors.DriversLicenceDetector
    - scrubadub.detectors.EmailDetector
    - scrubadub.detectors.en_GB.NationalInsuranceNumberDetector
    - scrubadub.detectors.PhoneDetector
    - scrubadub.detectors.PostalCodeDetector
    - scrubadub.detectors.en_US.SocialSecurityNumberDetector
    - scrubadub.detectors.en_GB.TaxReferenceNumberDetector
    - scrubadub.detectors.TwitterDetector
    - scrubadub.detectors.UrlDetector
    - scrubadub.detectors.VehicleLicencePlateDetector
  - Optional detectors
    - scrubadub.detectors.DateOfBirthDetector
    - scrubadub.detectors.SkypeDetector
    - scrubadub.detectors.TaggedEvaluationFilthDetector
    - scrubadub.detectors.TextBlobNameDetector
    - scrubadub.detectors.UserSuppliedFilthDetector
  - External detectors
    - scrubadub_address.detectors.AddressDetector
    - scrubadub_spacy.detectors.SpacyEntityDetector
    - scrubadub_spacy.detectors.SpacyNameDetector
    - scrubadub_stanford.detectors.StanfordEntityDetector
  - Catalogue functions
    - scrubadub.detectors.register_detector
    - scrubadub.detectors.remove_detector
- scrubadub.filth
- scrubadub.post_processors
- scrubadub.comparison