Shallow scan should recognize phone, credit card, person and location from column names #68

Open
vrajat opened this issue Feb 14, 2020 · 1 comment
Labels
good first issue Good for newcomers

Comments

vrajat commented Feb 14, 2020

It is not surprising that deep and shallow scans show different results. A shallow scan looks only at column names; a deep scan looks at a sample of the data. I've even noticed that two different runs of deep scan show different results because the sampled rows differ. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy, and there is no right answer.
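To illustrate why two sampled runs can disagree, here is a minimal sketch. It is not PIICatcher's sampling code: the `sample_has_phone` helper, the phone regex, and the synthetic column are all assumptions made up for the illustration.

```python
import random
import re
from typing import List

# Hypothetical pattern for illustration only: NNN-NNN-NNNN.
PHONE_RE = re.compile(r"^\d{3}-\d{3}-\d{4}$")


def sample_has_phone(rows: List[str], sample_size: int, seed: int) -> bool:
    """Return True if a random sample of `rows` contains a phone-like value."""
    rng = random.Random(seed)  # a different seed stands in for a different run
    sample = rng.sample(rows, sample_size)
    return any(PHONE_RE.match(value) for value in sample)


# Synthetic column: 1000 rows, only 50 of which hold a phone-like value.
rows = ["n/a"] * 950 + ["555-010-%04d" % i for i in range(50)]

# Scanning 10 sampled rows per run: some runs hit a phone value, some miss.
outcomes = {sample_has_phone(rows, 10, seed) for seed in range(200)}

# Scanning every row always finds the phone values, at a higher cost.
full_scan = sample_has_phone(rows, len(rows), seed=0)
```

With a small sample, `outcomes` contains both `True` and `False`, while the full scan is always `True`. That is the performance/accuracy trade-off in miniature.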

With respect to the output in particular, my observations are:

  1. Shallow scan should recognize phone, credit card, person and location from column names
  2. Deep scan did not recognize PII in a few columns. I need to look at the data to figure out whether that's a bug or the columns simply did not have any relevant data.
  3. Deep scan should also scan column names for candidates
  4. Along with an array, PIICatcher should add confidence numbers.
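As a starting point for items 1 and 3, column-name matching could look something like the sketch below. This is only an assumption about one possible approach, not PIICatcher's existing code: the `KEYWORDS` table, the type labels, and the `tokenize` and `shallow_scan_column` helpers are hypothetical names.

```python
import re
from typing import Dict, FrozenSet, List

# Hypothetical keyword table; the labels and words are illustrative only.
KEYWORDS: Dict[str, FrozenSet[str]] = {
    "PHONE": frozenset({"phone", "mobile", "telephone"}),
    "CREDIT_CARD": frozenset({"creditcard", "cardnumber", "ccnum"}),
    "PERSON": frozenset({"firstname", "lastname", "fullname", "surname"}),
    "LOCATION": frozenset({"address", "city", "state", "country", "zip", "postcode"}),
}


def tokenize(column_name: str) -> List[str]:
    """Split snake_case, camelCase, and punctuation into lowercase tokens."""
    snake = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", column_name)
    return [tok for tok in re.split(r"[^a-z0-9]+", snake.lower()) if tok]


def shallow_scan_column(column_name: str) -> List[str]:
    """Return the sorted PII types suggested by a column name alone."""
    tokens = tokenize(column_name)
    # Match single tokens and adjacent pairs, so "credit_card" -> "creditcard".
    candidates = set(tokens)
    candidates.update(a + b for a, b in zip(tokens, tokens[1:]))
    return sorted(pii for pii, words in KEYWORDS.items() if candidates & words)
```

Matching whole tokens rather than substrings keeps names such as `capacity` or `hotel` from being flagged because they happen to contain `city` or `tel`. A deep scan could call the same helper first to pick candidate columns before sampling their data.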

Originally posted by @vrajat in #67 (comment)

vrajat added the "good first issue" label Feb 14, 2020

vrajat commented Jul 20, 2020

Add birthdate.
