-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User PII dictionary table #529
Conversation
fix: add missing external_id to list of models fix: add username for event sink chore: reformat for quality checks
Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
bcd308f
to
3c50bd7
Compare
tutoraspects/templates/aspects/apps/aspects/migrations/alembic/versions/0028_user_pii.py
Show resolved
Hide resolved
93c5f06
to
7622363
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll approve, but I think @bmtcril would have an opinion on this one
run off an in-memory dictionary joining the PII tables.
password '{{ CLICKHOUSE_ADMIN_PASSWORD }}' | ||
db '{{ ASPECTS_EVENT_SINK_DATABASE }}' | ||
query " | ||
with most_recent_user_profile as ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will fan out, I don't see any limiting that only pulls the most recent profile or most recent external id in either of these queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @bmtcril -- I lost this clause somewhere along the line, but I've re-added it, see this line and this one.
I'm only pulling the most recent user_profile
row, and all the external_id
rows, since a single user may have more than one external ID (depending on the external_id_type
, or if external IDs get regenerated with a new secret key).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I didn't know external_id's could be changed. I'm reasonably sure they are currently immutable (or at least nothing updates them) 1:1 with user id / type. If nothing else we want to limit the external_id_type
to xapi
, but if there's a place where secret key is involved can you point it out? That would complicate several things in unexpected ways. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I didn't know external_id's could be changed. I'm reasonably sure they are currently immutable (or at least nothing updates them) 1:1 with user id / type.
Ah, my apologies.. I was thinking of the old anonymous user ids, which were hashed on the Django app SECRET_KEY
. External IDs are immutable, as you expected.
If nothing else we want to limit the external_id_type to xapi
Ok that makes sense. I wasn't sure if there were any other types of external_ids used in our events. Will update the query that generates the dict/table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see 57a4f56.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I also added a dataset for the new user_pii table with 9ceb57b.
7622363
to
7ced3f7
Compare
so that user retirements can delete user PII records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rebase on main?
FYI @Ian2012 I've rebased , but I want to address this comment before this gets merged. |
so the event sink plugin knows which PII tables are in use.
or NULL if the user is not yet assigned an external_id.
sourced from the event_sink.user_pii table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, looks good to me! 🎉
@pomegranited 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
Adds
event_sink.user_pii
dictionary table created by joiningexternal_id
anduser_profile
tables, using the most recentuser_profile
row. This dictionary is cached in memory, refreshing everyASPECTS_PII_CACHE_LIFETIME
seconds.Also partitions the
user_profile
(andexternal_id
) tables to prepare for user retirement deletions.Note: A user may have more than one
external_user_id
, and soevent_sink.user_pii
may show more than one row per user.Related information
Addresses #449
Depends on openedx-unsupported/openedx-event-sink-clickhouse#63
Based off #521 which adds the
event_sink.external_id
table.Testing instructions
tutor config save -s ASPECTS_ENABLE_PII=True
tutor config save -s ASPECTS_PII_CACHE_LIFETIME=10
tutor dev do alembic -c "upgrade head"
user_pii
table is empty:select * from event_sink.user_pii
tutor dev do createuser any [email protected]
external_id
row).external_user_id
for the user within 10s.tutor dev do alembic -c "downgrade -3"
To do before merge: