PxWeb and Data Citation Principles #480

Open
pitkant opened this issue Aug 1, 2023 · 0 comments
pitkant commented Aug 1, 2023

Is your feature request related to a problem? Please describe.
When referencing official statistical products, e.g. in research papers, extra steps are needed to make the data as it was at the time of referencing available to readers. One way is to save the dataset and its associated metadata as a separate file and include it as supplementary material to the article. Another way would be possible if PxWeb provided built-in support for the FORCE11 Data Citation Principles.

(A somewhat similar list is provided by the Research Data Alliance Working Group on Data Citations, and I've discussed it in this GitHub issue for the pxweb R package: rOpenGov/pxweb#266)

Describe the solution you'd like
The Data Citation Principles list the following eight principles:

  1. Importance
  2. Credit and Attribution
  3. Evidence
  4. Unique Identification
  5. Access
  6. Persistence
  7. Specificity and Verifiability
  8. Interoperability and Flexibility

Of these, 1-3 relate to practices in academic publishing and are not the responsibility of official statistics authorities or the PxWeb dissemination portal, although technical solutions can make citing data easier and therefore more common. PxWeb provides tools for citing datasets, although they could always be made a bit easier to use. Statistics Finland provides a ready-made text string on each statistical product's homepage that can be used for referencing that product, but SCB does not seem to offer such ready-made citations?

Official Statistics of Finland (OSF):  Subject choices of students [online publication]. ISSN=1799-1056. Helsinki: Statistics Finland [Referenced: 1.8.2023]. Access method: https://stat.fi/en/statistics/ava
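As a rough illustration of how a portal could generate such a string from table metadata, here is a minimal sketch. The function name `format_osf_citation` and its parameters are hypothetical and not part of PxWeb or any Statistics Finland tooling; the output pattern simply mirrors the example above.

```python
from datetime import date


def format_osf_citation(title, issn, place, publisher, referenced, url):
    """Build a citation string following the pattern of the example above.

    Hypothetical helper: the field names are assumptions, not a PxWeb API.
    """
    # Date without zero padding, matching "1.8.2023" in the example.
    ref = f"{referenced.day}.{referenced.month}.{referenced.year}"
    return (
        f"Official Statistics of Finland (OSF): {title} [online publication]. "
        f"ISSN={issn}. {place}: {publisher} [Referenced: {ref}]. "
        f"Access method: {url}"
    )


citation = format_osf_citation(
    title="Subject choices of students",
    issn="1799-1056",
    place="Helsinki",
    publisher="Statistics Finland",
    referenced=date(2023, 8, 1),
    url="https://stat.fi/en/statistics/ava",
)
print(citation)
```

Generating the "Referenced" date at request time would spare users from editing the string by hand.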

Items 4-8 are more related to PxWeb offerings.

4 "Unique identification" could be solved by assigning an identifier, such as a DOI, to each specific dataset.

5 "Access" covers access not only to the data but also to the metadata, documentation, and other materials needed to understand the dataset. PxWeb has an "About table" section that provides some information about the data table (such as units of measurement and when it was last updated), and information can also be stored in Footnotes (only as free text?). Additionally, Statistics Finland has separate landing pages / homepages for different statistics (such as https://stat.fi/en/statistics/ava) that link to the statistical database but also to Documentation. Could this be integrated into PxWeb somehow? Is it even viable / desirable?

6 "Persistence" is about: "Unique identifiers, and metadata describing the data, and its disposition, should persist — even beyond the lifespan of the data they describe". I understand this to mean that even out-of-date persistent identifiers should remain resolvable and perhaps redirect to a newer version of the same dataset.
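To make that reading concrete, here is a minimal sketch of a version registry in which old identifiers keep their metadata and point forward to newer versions. Everything in it is hypothetical: the `example:` identifiers, the `Record` fields and the two helper functions are assumptions for illustration, not anything PxWeb offers today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    available: bool                      # False once the data itself is withdrawn
    superseded_by: Optional[str] = None  # identifier of the next version, if any


# Hypothetical identifiers; "example:" is not a real identifier scheme.
REGISTRY = {
    "example:table/v1": Record("Table, 2021 release", False, "example:table/v2"),
    "example:table/v2": Record("Table, 2022 release", True, "example:table/v3"),
    "example:table/v3": Record("Table, 2023 release", True),
}


def describe(identifier):
    """Return the metadata, which persists even if the data is gone."""
    return REGISTRY[identifier]


def latest(identifier):
    """Follow superseded_by links until the newest version is reached."""
    seen = set()
    while REGISTRY[identifier].superseded_by is not None:
        if identifier in seen:
            raise ValueError("cycle in version chain")
        seen.add(identifier)
        identifier = REGISTRY[identifier].superseded_by
    return identifier


print(describe("example:table/v1"))
print(latest("example:table/v1"))
```

The point of the design is that an identifier is never deleted: a withdrawn version is marked as unavailable and redirected rather than removed.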

7 "Specificity and Verifiability" is about facilitating access to the specific data subset / slice used in research. To my knowledge, if I save a PxWeb API query with certain time and other filters, there is no guarantee that running the same saved query years later would return an identical dataset (or that the dataset would even be available, if it had been archived in the meantime).
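One way a researcher could at least detect such drift today is to store a fingerprint of the query and of the returned data alongside the citation. The sketch below works under stated assumptions: the query is shaped loosely like a PxWeb API request body, the variable code `Year` and both response payloads are made up, and no network call is made.

```python
import hashlib
import json


def fingerprint(obj):
    """SHA-256 of a canonical JSON serialisation (sorted keys, fixed separators)."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative saved query; the variable code "Year" is hypothetical.
saved_query = {
    "query": [
        {"code": "Year", "selection": {"filter": "item", "values": ["2021", "2022"]}}
    ],
    "response": {"format": "json-stat2"},
}

# Made-up payloads standing in for what the API returned at two points in time.
response_at_citation = {"value": [101, 105], "updated": "2023-06-01"}
response_years_later = {"value": [101, 106], "updated": "2025-06-01"}

# Stored together with the citation when the paper is written.
cited = {
    "query": fingerprint(saved_query),
    "data": fingerprint(response_at_citation),
}


def matches_citation(query, response):
    """True only if both the query and the returned data are unchanged."""
    return (
        fingerprint(query) == cited["query"]
        and fingerprint(response) == cited["data"]
    )


print(matches_citation(saved_query, response_at_citation))
print(matches_citation(saved_query, response_years_later))
```

This only detects that the data changed; it cannot recover the cited version, which is why server-side versioning would still be needed. Which response fields to include in the hash (for example, whether to ignore an update timestamp) is a design choice.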

8 "Interoperability and flexibility" is about providing some means of citation that is sufficiently interoperable and flexible to accommodate a wide variety of conventions. I think what many PxWeb implementations are currently doing is good enough.

Additional context
I'm not entirely sure about the legal framework and conventions related to statistical authorities and the publishing of official statistical products, and I realise my suggestion might be incompatible with a modern understanding of open data, FAIR data and citable data. I acknowledge that official statistics may differ from other types of "dynamic data" in the sense that there is only one official version of each statistic, and if changes are made, they are corrections of errors, and the erroneous datasets are not stored anywhere. However, by opening this issue I would like to start a discussion on whether PxWeb, as a technical solution for publishing statistics, could offer some form of data versioning functionality.

It would also seem that PxWeb portal implementations differ from one country or statistics provider to another. There also seem to be many implementations that use slightly older versions of PxWeb. I may not be aware of how many features are not made available in each implementation, or whether some features are custom-made for a specific instance. For example, Statistics Finland's implementation seems to have extra features, in the form of statistics homepages and ready-made citations, that are not part of the core spec.

@pitkant pitkant added the enhancement New feature or request label Aug 1, 2023
@likp likp self-assigned this Aug 28, 2023