Is your feature request related to a problem? Please describe.
When citing official statistical products, for example in research papers, extra steps are needed to make the data as it was at the time of citation available to readers. One way is to save the dataset and its associated metadata as a separate file and include it as supplementary material to the article. Another way would be possible if PxWeb provided built-in support for the FORCE11 Data Citation Principles.
(A somewhat similar list is provided by the Research Data Alliance Working Group on Data Citations, and I've discussed it in this GitHub issue for the pxweb R package: rOpenGov/pxweb#266)
Describe the solution you'd like
The Data Citation Principles list the following eight principles:
1. Importance
2. Credit and Attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and Verifiability
8. Interoperability and Flexibility
Of these, 1-3 concern practices in academic publishing and are not the responsibility of official statistics authorities or of the PxWeb dissemination portal, although technical solutions can make citing data easier and therefore more common. PxWeb provides tools for citing datasets, although citing could always be made a bit easier. Statistics Finland provides a ready-made text string on each statistical product's homepage that can be used for referencing that product, but SCB does not seem to offer such ready-made citations. An example from Statistics Finland:
Official Statistics of Finland (OSF): Subject choices of students [online publication]. ISSN=1799-1056. Helsinki: Statistics Finland [Referenced: 1.8.2023]. Access method: https://stat.fi/en/statistics/ava
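As a rough illustration of how a portal could generate such a string from table metadata, here is a minimal Python sketch. The `StatisticMetadata` record, its field names and the `format_citation` function are hypothetical and not part of PxWeb or of Statistics Finland's site; the sketch only mimics the layout of the example above and assumes the reference date is written as day.month.year.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StatisticMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    series: str
    title: str
    issn: str
    place: str
    publisher: str
    url: str


def format_citation(meta: StatisticMetadata, referenced: date) -> str:
    """Build a citation string in the layout of the OSF example above."""
    # Assumed date layout: day.month.year without zero padding ("1.8.2023").
    ref = f"{referenced.day}.{referenced.month}.{referenced.year}"
    return (
        f"{meta.series}: {meta.title} [online publication]. "
        f"ISSN={meta.issn}. {meta.place}: {meta.publisher} "
        f"[Referenced: {ref}]. Access method: {meta.url}"
    )


ava = StatisticMetadata(
    series="Official Statistics of Finland (OSF)",
    title="Subject choices of students",
    issn="1799-1056",
    place="Helsinki",
    publisher="Statistics Finland",
    url="https://stat.fi/en/statistics/ava",
)
citation = format_citation(ava, date(2023, 8, 1))
print(citation)
```

If the metadata fields were exposed alongside each table, client tools could produce citations in whatever style a journal requires rather than relying on one fixed string.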
Items 4-8 relate more directly to what PxWeb offers.
4 "Unique identification" could be solved by assigning an identifier such as a DOI to each specific dataset.
5 "Access" covers access not only to the data but also to the metadata, documentation, and other materials needed to understand the dataset. PxWeb has an "About table" section that provides some information about the data table (such as units of measurement and when it was last updated), and information can also be stored in Footnotes (only as free text?). Additionally, Statistics Finland has separate landing pages / Homepages for different statistics (such as https://stat.fi/en/statistics/ava) that link to the statistical database and also to Documentation. Could this be integrated into PxWeb somehow? Is it even viable / desirable?
6 "Persistence" is about: "Unique identifiers, and metadata describing the data, and its disposition, should persist — even beyond the lifespan of the data they describe". I take this to mean that even out-of-date persistent identifiers should remain accessible and perhaps redirect to a newer version of the same dataset.
7 "Specificity and Verifiability" is about facilitating access to the specific data subset / slice used in research. To my knowledge, saving a query to the PxWeb API with certain time and other filters does not guarantee that running the same saved query years later returns an identical dataset (or that the dataset is even still available, if it was archived in the meantime).
8 "Interoperability and flexibility" is about providing some means of citation that is sufficiently interoperable and flexible to accommodate a wide variety of conventions. I think what many PxWeb implementations are currently doing is good enough.
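To make the verifiability concern in item 7 concrete, here is a minimal Python sketch of one possible approach: a researcher stores a fingerprint of the query together with the returned data at citation time and compares it later. The `fingerprint` function and the query and response structures are made up for illustration and do not reflect the actual PxWeb API format.

```python
import hashlib
import json


def fingerprint(query: dict, result: dict) -> str:
    """Return a SHA-256 hex digest over a query and its result."""
    # Sorted keys and fixed separators give the same bytes
    # regardless of the order in which keys were inserted.
    canonical = json.dumps(
        {"query": query, "result": result},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up query and responses; not the actual PxWeb API format.
query = {"table": "example_table", "filters": {"year": ["2020", "2021"]}}
result_at_citation = {"values": [100, 105]}
result_years_later = {"values": [100, 106]}  # e.g. a revised figure

cited = fingerprint(query, result_at_citation)
print(cited == fingerprint(query, result_at_citation))  # unchanged data
print(cited == fingerprint(query, result_years_later))  # revised data
```

A fingerprint like this only detects that the data changed; it cannot recover the earlier values, which is why server-side support would still be needed.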
Additional context
I'm not entirely sure about the legal framework and conventions related to statistical authorities and the publishing of official statistical products, and I realise my suggestion might be incompatible with a modern understanding of open data, FAIR data and citable data. I acknowledge that official statistics may differ from other types of "dynamic data" in that there is only one official version of each statistic; if changes were made, they were made to correct errors, and the erroneous datasets are not stored anywhere. However, by opening this issue I would like to start a discussion on whether PxWeb, as a technical solution for publishing statistics, could offer some form of data versioning functionality.
It would also seem that the implementation of the PxWeb portal differs from one country or statistics provider to another, and many implementations seem to use slightly older versions of PxWeb. I may not be aware of how many features are not made available in each implementation, or of features that are custom-made for a specific instance. For example, Statistics Finland's implementation seems to have more features, in the form of Statistics homepages and readily available citations, that are not part of the core spec.
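To give the discussion something concrete, here is a minimal Python sketch of what such versioning could look like in principle. The `VersionedTable` class and the `table_id/vN` identifier scheme are hypothetical and not an existing PxWeb feature: each release gets its own identifier, the bare table id resolves to the current release, and older identifiers stay resolvable while pointing to the latest one.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical in-memory store of successive releases of one table."""
    table_id: str
    versions: list = field(default_factory=list)

    def publish(self, data: dict) -> str:
        """Store a new release and return its version identifier."""
        self.versions.append(data)
        return f"{self.table_id}/v{len(self.versions)}"

    def resolve(self, identifier: str) -> dict:
        """Return the release for an identifier, plus the latest identifier."""
        latest = f"{self.table_id}/v{len(self.versions)}"
        if identifier == self.table_id:
            identifier = latest  # bare table id means "current release"
        prefix = f"{self.table_id}/v"
        if not identifier.startswith(prefix):
            raise KeyError(identifier)
        number = int(identifier[len(prefix):])
        if not 1 <= number <= len(self.versions):
            raise KeyError(identifier)
        return {
            "identifier": identifier,
            "data": self.versions[number - 1],
            "latest": latest,
            "superseded": identifier != latest,
        }


table = VersionedTable("example_table")
v1 = table.publish({"2020": 100})
v2 = table.publish({"2020": 101})  # corrected figure
old = table.resolve(v1)
print(old["superseded"], old["latest"])
```

Whether superseded releases should keep serving their data or only a notice and a pointer to the current release is a policy question for each statistics provider; the sketch keeps the data only to show the mechanics.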