Create association between GA and CKAN to group page views by agency #4567

Closed
4 tasks
dlennox24 opened this issue Dec 21, 2023 · 6 comments

dlennox24 commented Dec 21, 2023

User Story

In order to improve transparency to agency partners and the public, Data.gov needs to set up unique tracking groups for agency partners and their datasets. This will allow Data.gov to publish statistics on each agency partner's dataset viewership.

Acceptance Criteria

  • GIVEN the current CKAN version and Google Analytics status for Data.gov
    THEN Data.gov integrates Google Analytics 4 and CKAN to group datasets into separate analytics data groups to determine overall agency viewership

Background

With the completed migration to GA4 and the integration into CKAN, Data.gov can now set up unique tracking groups for each agency.

Sketch

  • Determine how to associate a dataset with a tracking group in CKAN
  • Create an agency tag in GA4
  • Validate the agency data is being populated with the correct tags
@dlennox24 dlennox24 self-assigned this Dec 21, 2023
@dlennox24 dlennox24 converted this from a draft issue Dec 21, 2023
@dlennox24 dlennox24 added the metrics Stats, metrics and data visualizations of catalog label Dec 21, 2023
@btylerburton
Contributor

I believe we would implement this at the harvest source level so we can associate datasets with each unique ID.

@tdlowden
Member

@dlennox24 attempted to do this by adding the values to the datalayer. When testing the solution we found:

  1. The values were not populated by the time the GA code ran, which meant we could not collect them at page load.
  2. The values were possibly susceptible to being translated by browser translation features, which would scatter the values in reporting.

I ended up working with a colleague to identify a spot where publisher and organization appeared in the DOM, but would not be translated. We thought we found this in the breadcrumbs:

[Image: screenshot of the breadcrumbs]

So we created CSS Selector variables to capture that URL, and then parsed it to separate the query params for publisher and organization.

Unfortunately, I then discovered that some pages do not have a publisher, and in that case, the entire li we were capturing does not appear.

So I rewrote several of the variables to look at the last-child li of the breadcrumb and created an IF/ELSE variable that either gets the query param for organization if it exists, or captures the last page path of the URL if not (which is the org when publisher is missing). The final setup in GTM does the following:

  1. Capture the URL found in the last-child li of the breadcrumbs
  2. Check whether it has a query string (?)
  3. If it does, parse the URL to get the organization param
  4. If it doesn't, grab the last page path of the URL, which is the org
  5. Parse the URL for publisher if it's there; otherwise return NO PUBLISHER
  6. Use a Regex lookup to check whether it's a dataset page by looking for /dataset/; if so, output both organization and publisher, and if not, return NOT DATASET
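
For readers who want to reason about the six steps outside GTM, here is a minimal Python sketch of the same decision logic. It is not the GTM configuration itself: the function name `dataset_dimensions`, the example URLs, and the choice to return an empty string when a query string has no organization param are all assumptions made for illustration. It also checks for /dataset/ first rather than last, which produces the same outputs as the order listed above.

```python
import re
from urllib.parse import urlsplit, parse_qs

NO_PUBLISHER = "NO PUBLISHER"
NOT_DATASET = "NOT DATASET"


def dataset_dimensions(page_path: str, breadcrumb_href: str) -> dict:
    """Hypothetical helper mirroring the GTM steps described above.

    page_path: path of the page being viewed.
    breadcrumb_href: URL taken from the last-child li of the breadcrumbs.
    """
    # Step 6: only dataset pages report organization and publisher.
    if not re.search(r"/dataset/", page_path):
        return {"organization": NOT_DATASET, "publisher": NOT_DATASET}

    parts = urlsplit(breadcrumb_href)
    params = parse_qs(parts.query)

    if parts.query:
        # Steps 2-3: a query string is present, so read the organization param.
        # Assumption: fall back to "" if the param is missing.
        organization = params.get("organization", [""])[0]
    else:
        # Step 4: no query string, so the last path segment is the org.
        organization = parts.path.rstrip("/").rsplit("/", 1)[-1]

    # Step 5: publisher is optional.
    publisher = params.get("publisher", [NO_PUBLISHER])[0]

    return {"organization": organization, "publisher": publisher}


# Hypothetical URLs; the real breadcrumb link shapes may differ.
with_publisher = dataset_dimensions(
    "/dataset/example-dataset",
    "/dataset/?organization=example-agency&publisher=Example%20Office",
)
without_publisher = dataset_dimensions(
    "/dataset/example-dataset",
    "/organization/example-agency",
)
not_a_dataset = dataset_dimensions("/organization/", "/organization/")
```

Because the inputs are read from link URLs rather than visible text, this approach keeps the property that motivated using the breadcrumbs: the values are not affected by browser translation.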

I added DATAGOV_dataset_organization and DATAGOV_dataset_publisher as custom dimensions in GA and published the GTM container to prod. Testing looked good in debug mode, but the real QA will be tomorrow, when I can check the data in GA.

@tdlowden tdlowden moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board Apr 25, 2024
@tdlowden tdlowden self-assigned this Apr 25, 2024
@tdlowden
Member

tdlowden commented May 8, 2024

This is working, but not sufficiently. We are getting 60% of pageviews with org and publisher and 40% appearing as (not set).

Going to create a separate ticket to troubleshoot.

@tdlowden tdlowden moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board May 8, 2024
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board May 8, 2024
@btylerburton
Contributor

There's a draft PR attached to this ticket that should be addressed. @robert-bryson can you take a look at that and update the status? Thanks.

GSA/ckanext-datagovtheme#193

@btylerburton btylerburton moved this from 🗄 Closed to 👀 Needs Review [2] in data.gov team board Jun 26, 2024
@btylerburton
Contributor

Moving to in review until the above gets resolved so we don't lose track of it.

@jbrown-xentity
Contributor

As @tdlowden mentioned, the work was complete as designed. However, in practice the data flow was inconsistent. So #4743 was created to troubleshoot.
That draft PR is an in-between mitigation step from after this ticket was done but before #4743 was created. It will be closed in favor of the work being done by Robby.

@github-project-automation github-project-automation bot moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Jun 28, 2024
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Jul 3, 2024