An approach: Solving PII detection in Unstructured Data with AI/ML

Written by Teck Wu | Apr 4, 2023 12:00:00 AM

Challenges in Unstructured Data

A lot of the PII in any organization is contained in unstructured forms of information and communication. A major challenge is that these communications are informal most of the time. As a result, traditional approaches to identifying PII struggle. Standard regular expressions are constrained in their coverage because they depend on rules and corpus mapping. Off-the-shelf machine learning approaches are tuned to formal, grammatical text, so they do not perform well on informal text. Because each document contains only a few instances of PII, it is also a challenge to build solutions that detect PII effectively without producing many false positives.
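To make the coverage problem concrete, here is a minimal sketch of a rule-based detector. The pattern, the function name find_phone_numbers, and the sample messages are all hypothetical and chosen only for illustration; they are not taken from any particular product or from the approach described in this post.

```python
import re
from typing import List

# Hypothetical rule: a phone number written as 3-3-4 digits,
# separated by a dash, a dot, or a whitespace character.
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def find_phone_numbers(text: str) -> List[str]:
    """Return every substring that matches the phone-number rule."""
    return PHONE_PATTERN.findall(text)


# A formally written message: the rule finds the number.
formal = "Please call 555-867-5309 to confirm."

# An informal message: part of the number is spelled out, so the rule misses it.
informal = "hey call me on five five five 867 5309 after lunch"

# A message with no phone number at all: the order reference has the
# same shape as a phone number, so the rule reports a false positive.
order_note = "Order ref 415 220 1987 shipped today"

formal_hits = find_phone_numbers(formal)      # ['555-867-5309']
informal_hits = find_phone_numbers(informal)  # []
false_hits = find_phone_numbers(order_note)   # ['415 220 1987']

print(formal_hits, informal_hits, false_hits)
```

The same rule misses the informally written number and flags an order reference that is not PII. Widening the pattern to catch more informal variants tends to increase the second kind of error, which is the trade-off the paragraph above describes.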
