Warning

This package is a work in progress and is not yet available on PyPI. Treat this documentation as a design document for what scrubadub will do someday rather than a specification of what it can do today.

scrubadub

Remove personally identifiable information from free text. Sometimes we have additional metadata about the people we wish to anonymize. Other times we don’t. This package makes it easy to scrub personal information from free text without compromising the privacy of the people we are trying to protect.

scrubadub currently supports removing:

  • Names (proper nouns) via textblob
  • Email addresses
  • URLs
  • Phone numbers via phonenumbers
  • Username / password combinations
  • Skype usernames
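
To give a feel for what scrubbing one of these categories involves, here is a minimal sketch that replaces email addresses and URLs with placeholders using only the standard library. It is not scrubadub's implementation: the patterns, the scrub_sketch name, and the {{EMAIL}} and {{URL}} placeholders are assumptions made for illustration, modeled on the {{NAME}} placeholder used in this documentation. The patterns are deliberately simple and will miss or over-match some real-world inputs.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# They are NOT the detectors scrubadub uses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def scrub_sketch(text):
    """Replace URLs, then email addresses, with placeholders."""
    # URLs go first so an address embedded in a URL is not scrubbed twice.
    text = URL_RE.sub("{{URL}}", text)
    text = EMAIL_RE.sub("{{EMAIL}}", text)
    return text


print(scrub_sketch("Contact john@example.com or see http://example.com/about"))
```

Names are harder than this, since no regular expression reliably separates proper nouns from ordinary words, which is presumably why the list above delegates them to textblob.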

Quick start

Getting started with scrubadub is as easy as running pip install scrubadub and incorporating it into your Python scripts like this:

>>> import scrubadub

# John may be a cat, but he doesn't want other people to know it.
>>> text = u"John is a cat"

# Replace names with {{NAME}} placeholder. This is the scrubadub default
# because it maximally omits any information about people.
>>> placeholder_text = scrubadub.clean_with_placeholders(text)
>>> placeholder_text
u'{{NAME}} is a cat'

As a Python package, scrubadub also has several more advanced features that let users fine-tune how it cleans dirty, dirty text.
