From ce5ffb1522611c478f24ff43081a88933b7fb8ff Mon Sep 17 00:00:00 2001
From: Najko Jahn
Date: Tue, 7 Jan 2020 17:18:56 +0100
Subject: [PATCH] update docs

---
 docs/about.Rmd  |  1 +
 docs/about.html | 66 ++++++++++++++++++++++++-------------------------
 docs/about.md   | 34 ++++++++++++-------------
 3 files changed, 51 insertions(+), 50 deletions(-)

diff --git a/docs/about.Rmd b/docs/about.Rmd
index d096ebf..a7b4386 100644
--- a/docs/about.Rmd
+++ b/docs/about.Rmd
@@ -43,6 +43,7 @@ jn_facets <- jsonlite::stream_in(file("../data/jn_facets_df.json"), verbose = FA
 #' get hybrid journals that have open licensing information in the period 2013-18
 hybrid_cr <- readr::read_csv("../data/hybrid_publications.csv") %>%
   mutate(license = fct_infreq(license)) %>%
+  mutate(publisher = fct_infreq(publisher)) %>%
   mutate(year = factor(issued, levels = c("2013", "2014", "2015","2016", "2017", "2018", "2019"))) %>%
   arrange(desc(yearly_publisher_volume))
 
diff --git a/docs/about.html b/docs/about.html
index c3adf2b..75551c7 100644
--- a/docs/about.html
+++ b/docs/about.html
@@ -2774,14 +2774,14 @@

About the Hybrid OA Dashboard

-

First published in November 2017, updated 2019-10-06

+

First published in November 2017, updated 2020-01-07

Summary

-

This open source dashboard presents the uptake of hybrid open access for 4,635 subscription journals from 64 publishers. Since 2013, these journals have published 6,791,045 articles, of which 226,942 were made openly available without delay, representing a hybrid open access share of 3.3%.

+

This open source dashboard presents the uptake of hybrid open access for 4,970 subscription journals from 68 publishers. Since 2013, these journals have published 7,390,888 articles, of which 250,087 were made openly available without delay, representing a hybrid open access share of 3.4%.

Hybrid open access journals are included when they share the following two characteristics:

  1. Academic institutions sponsored immediate open access publication of individual articles according to the Open APC initiative,
  2. @@ -2793,10 +2793,10 @@

    Summary

    Background and motivation

    Many publishers offer hybrid open access journals (Suber 2012). However, because of non-standardized reporting practices, it is hard to keep track of how many articles these journals provided in open access, and to what extent these figures relate to the overall article volume (Björk 2017; Laakso and Björk 2016). In particular, determining subscription-based journals that already did publish open access articles immediately, as well as obtaining licensing information about access and re-use rights is challenging (Laakso and Björk 2016; Piwowar et al. 2018). Another question around analyzing hybrid open access is how to distinguish it from delayed open access, a publishing model, which a considerable number of journals practise as well (Laakso and Björk 2013). In the delayed model, some or all articles of an issue are made openly available to all readers after an embargo period.

    -

    There are varying attempts to investigate the uptake of hybrid open access. While earlier bibliometric studies examined reports from publishers (Björk 2017) or obtained articles manually from publisher websites (Laakso and Björk 2016), Unpaywall has recently become a widely used service to discover open access articles including hybrid open access (Piwowar et al. 2018, @Else_2018). Integrated in major bibliometric databases like the Web of Science, Unpaywall links DOIs from Crossref, a DOI registration agency for scholarly works, to free full-texts. Several bibliometric studies and monitoring services investigated the prevalence of hybrid open access based on Unpaywall data (e.g. Bosman and Kramer (2018), Robinson-Garcia, Costas, and Leeuwen (2019) or the German Open Access Monitor). Following Piwowar et al. (2018), these approaches defined hybrid open access articles as publisher-provided open access articles not published in fully open access journals listed in the Directory of Open Access Journals (DOAJ). Moreover, articles must be published under an open content license to be characterized as hybrid open access, information that Unpaywall retrieved from Crossref and publisher websites.

    +

    There are varying attempts to investigate the uptake of hybrid open access. While earlier bibliometric studies examined reports from publishers (Björk 2017) or obtained articles manually from publisher websites (Laakso and Björk 2016), Unpaywall has recently become a widely used service to discover open access articles including hybrid open access (Piwowar et al. 2018, @Else_2018). Integrated in major bibliometric databases like the Web of Science, Unpaywall links DOIs from Crossref, a DOI registration agency for scholarly works, to free full-texts. Several bibliometric studies and monitoring services investigated the prevalence of hybrid open access based on Unpaywall data (e.g. Bosman and Kramer (2018), Robinson-Garcia, Costas, and Leeuwen (2019) or the German Open Access Monitor). Following Piwowar et al. (2018), these approaches defined hybrid open access articles as publisher-provided open access articles not published in fully open access journals listed in the Directory of Open Access Journals (DOAJ). Moreover, articles must be published under an open content license to be characterized as hybrid open access, information that Unpaywall retrieved from Crossref and publisher websites.

    Despite the considerable efforts to extend the evidence base around open access, Unpaywall’s approach to determining hybrid open access is limited in two respects. First, there are some fully open access journals that are not indexed in the DOAJ, presumably because these journals did not comply with its comprehensive inclusion criteria. In a large-scale study, Crawford (2017) identified 7,743 non-DOAJ indexed fully open access journals that published at least one article between 2012 and mid 2016. Second, Unpaywall focuses on current open access provision, but does not track when an article was made openly available, which, in turn, makes it hard to distinguish between immediate and delayed open access provision using this data source alone.1 Consequently, Piwowar et al. (2018), in contrast to most definitions and earlier studies (Laakso and Björk 2016), described hybrid open access articles tagged by Unpaywall as being “not necessarily immediately available (i.e., they may only be freely available after an embargo)”. For these reasons, facilitating bibliometric studies about hybrid open access remains challenging.

    This above-described lack of standardized methods and publicly available data about hybrid open access publishing limits not only its quantitative study, but also informed policy-making around open access (Laakso 2019). Particularly, the business model of hybrid open access journals is disputed, because publishers often charge publication fees, also known as article processing charges (APC), to provide immediate open access to individual articles in addition to subscriptions (Suber 2012). Although it was initially envisioned that with growing funding opportunities for publication fees publishers would progressively transition hybrid open access journals to fully open access (Prosser 2003), it remains unclear to which extent the increasing willingness to pay for open access contributed to it, and how cost-effective these spendings were (Björk and Solomon 2014; Pinfield, Salter, and Bath 2017).

    -

    Funders and libraries have responded to the problems of missing evidence around hybrid open access publishing in the last years. To make expenditures more transparent, a growing number of institutions have started to disclose individual articles they supported as open data. The Wellcome Trust, the Austrian Science Fund FWF and British universities were among the first who shared their spendings for hybrid open access articles (Kiley 2014; Reckling and Kenzian 2014; Lawson 2015). The Open APC Initiative collects and standardizes these openly available spending data together with crowd-sourced expenditures. Because Crossref indexes most articles where institutions sponsored publication fees, Open APC can use its metadata services to make open access expenditures comparable at the level of institutions, publishers and journals (Jahn and Tullney 2016; Pieper and Broschinski 2018). So far, the Open APC Initiative disclosed 66,304 hybrid open access articles supported by 298 research performing organisations and funders between 2013 - 2019.

    +

    Funders and libraries have responded to the problems of missing evidence around hybrid open access publishing in the last years. To make expenditures more transparent, a growing number of institutions have started to disclose individual articles they supported as open data. The Wellcome Trust, the Austrian Science Fund FWF and British universities were among the first who shared their spendings for hybrid open access articles (Kiley 2014; Reckling and Kenzian 2014; Lawson 2015). The Open APC Initiative collects and standardizes these openly available spending data together with crowd-sourced expenditures. Because Crossref indexes most articles where institutions sponsored publication fees, Open APC can use its metadata services to make open access expenditures comparable at the level of institutions, publishers and journals (Jahn and Tullney 2016; Pieper and Broschinski 2018). So far, the Open APC Initiative disclosed 73,292 hybrid open access articles supported by 305 research performing organisations and funders between 2013 - 2019.

    Additionally, funders and libraries have developed compliance criteria including machine-readable Creative Commons license statements to improve the discoverability of open access content. The Sponsoring Consortium for Open Access Publishing in Particle Physics – SCOAP\(^3\), for example, requires CC-BY licenses. It also archives the full-text of funded articles in several formats in a dedicated repository. In Europe, the Wellcome Trust refers to the life science repository Europe PubMed Central (Europe PMC) for depositing funded articles along with a comprehensive set of metadata. Moreover, the funder automatically checks if authors and publishers comply with these obligations (Kiley 2015). In its contract with the publisher Wiley, the German DEAL consortium stated comprehensive metadata obligations to be implemented using Crossref and its metadata profile (Sander et al. 2019). Likewise, the German Deutsche Forschungsgemeinschaft (DFG) referred to Crossref as metadata service for hybrid open access articles facilitated through its funding programme “Open Access Transition Agreements”. In the US, Chorus, a non-profit serving more than 50 publishers, analyzes Crossref metadata and assesses open access compliance for dedicated funders using interactive dashboards.

    In the light of a perceived slow and ineffective growth of open access, more and more funders and libraries alter their spending on subscription-based journals and individual open access articles published in these outlets. Notably, the Open Access 2020 Initiative calls for a transparent approach to re-allocate budgets currently spent for subscriptions to open access business models. Likewise, the cOAlition S, an international network of research funders, released its widely discussed Plan S in September 2018. Starting from 2021, members of the cOAlition S intend to discontinue funding of publication fees for individual open access articles in subscription-based journals. However, hybrid open access articles published under a transformative agreement, in which spendings for subscription and open access publication are considered together, will remain eligible for funding. Nonetheless, the Plan S requires a commitment to full open access transition from the publishers. Journals providing delayed open access are not compatible with the Plan S.

    In its Plan S implementation planning the cOAlition S refers to the ESAC Initiative, which aims at standardizing open access workflows between publishers and research institutions, to observe the transition process of journals under transformative agreements. In 2017, ESAC, based at the German Max Planck Digital Library (MPDL), released guidance about how to implement transformative agreements. According to these guidelines, publishers have to ensure that only corresponding authors affiliated with the agreement institution are eligible for open access support. Furthermore, publishers need to provide comprehensive metadata to Crossref including open content license information. They are also obliged to report sponsored articles to the agreement institutions (Geschuhn and Stone 2017). Complementary to spendings for individual hybrid open access articles, some library consortia like the Swedish Bibsam or the British Jisc have begun to make these reports openly available with the Open APC Initiative (Lundén, Smith, and Wideberg 2018; Pieper and Broschinski 2018). To support the monitoring of transformative agreements at the publisher-level, ESAC has started an agreement registry targeting national consortia to disclose their contracts.

    @@ -2818,9 +2818,9 @@

    Background and motivation

    Data and methods

    Methods follow the Wickham-Grolemund approach to practice data science (Wickham and Grolemund 2017). After importing, cleaning (“tidying”) and transforming data from various sources, a process called “data wrangling”, summary statistics were calculated and visualized to understand and communicate the uptake of hybrid open access publishing. For the latter, we created a dashboard, which allows visual interaction with our data. The workflow, illustrated in Figure 1 and described in more detail in this section, was implemented in R using open data and tools, making it transparent and re-usable.

    -

    This project, which is under active development, started in November 2017, and is updated on a regular basis. Data used in the current version were gathered on 2019-10-06. Along with the data, methods are shared in the source code repository of this project, which is hosted on GitHub.

    +

    This project, which is under active development, started in November 2017, and is updated on a regular basis. Data used in the current version were gathered on 2020-01-07. Along with the data, methods are shared in the source code repository of this project, which is hosted on GitHub.

    -Summary of data and methods used, following the Wickham-Grolemund approach to practice data science (Wickham and Grolemund 2017) +

    Summary of data and methods used, following the Wickham-Grolemund approach to practice data science (Wickham and Grolemund 2017)

    @@ -2841,25 +2841,25 @@

    Data Accuracy

    To assess the data accuracy of this dashboard, we first investigated how many hybrid open access journals covered by the Open APC initiative provided license statements to Crossref between 2013 - 2019. Next, we evaluated the accuracy of the automated retrieval using random samples of articles.3

    Coverage accuracy

    -

    In the case of hybrid open access journals represented in the Open APC datasets, 64 publishers provided licensing statements via the Crossref API, representing 33 % of all publishers studied. At the journal-level, 85 % of all hybrid open access journal titles covered by the Open APC initiative are associated with open content license statements valid without delay in Crossref. Figure 2 provides a breakdown of licensing metadata coverage per publisher. It highlights that leading commercial publishing houses provided open content license statements to Crossref for most of their journals listed by the Open APC initiative including Springer Nature, Elsevier BV and Wiley. SAGE Publications and many smaller sized publishers, however, did not share license metadata to Crossref. Consequently, their journals were not included in our study.

    +

    In the case of hybrid open access journals represented in the Open APC datasets, 68 publishers provided licensing statements via the Crossref API, representing 33 % of all publishers studied. At the journal-level, 84 % of all hybrid open access journal titles covered by the Open APC initiative are associated with open content license statements valid without delay in Crossref. Figure 2 provides a breakdown of licensing metadata coverage per publisher. It highlights that leading commercial publishing houses provided open content license statements to Crossref for most of their journals listed by the Open APC initiative including Springer Nature, Elsevier BV and Wiley. SAGE Publications and many smaller sized publishers, however, did not share license metadata to Crossref. Consequently, their journals were not included in our study.

    -Overview of Crossref licensing coverage per publisher. Yellow dots represent the number of hybrid open access journals disclosed by the Open APC initiative with licensing metadata, blue dots the overall number of hybrid open access journals in our sample. +

    Overview of Crossref licensing coverage per publisher. Yellow dots represent the number of hybrid open access journals disclosed by the Open APC initiative with licensing metadata, blue dots the overall number of hybrid open access journals in our sample.

    Retrieval accuracy

    -

    47,340 out of 66,304 hybrid open access articles disclosed by the Open APC initiative provided open content license statements valid without delay, representing a percentage of 71 %. To assess the accuracy of our retrieval, we manually checked a random sample of 100 Open APC articles for which no license was found. We determined 93 articles that did not share license statements with Crossref using the license node. The other 7 articles did report an open content license, but with a delay (delay-in-days metadata field) above 0 days.4

    +

    51,677 out of 73,292 hybrid open access articles disclosed by the Open APC initiative provided open content license statements valid without delay, representing a percentage of 71 %. To assess the accuracy of our retrieval, we manually checked a random sample of 100 Open APC articles for which no license was found. We determined 93 articles that did not share license statements with Crossref using the license node. The other 7 articles did report an open content license, but with a delay (delay-in-days metadata field) above 0 days.4

    Drawing another random sample of 100 articles, we manually validated if we obtained the correct license statements including start date from the Crossref API. We also evaluated if all journal articles were original articles or reviews. 94 articles were characterized as original article or review, confirming previous studies (Piwowar et al. 2018). Other document types were conference abstracts (N = 3), a medical guideline, a comment and a short report.

    -

    We were able to retrieve email addresses for 93 % of all articles in our dataset. Furthermore, recall and precision for obtaining and extracting email addresses from full-texts were investigated using a random sample of 200 articles. Recall asked if all author email addresses were found. Precision asked if the match of the first author email address was correct. While all first author email addresses were correctly parsed (precision = 1), 40 articles contained more than one email address from authors (recall = 0.8). In two cases, the corresponding author named two email addresses.

    +

    We were able to retrieve email addresses for 91 % of all articles in our dataset. Furthermore, recall and precision for obtaining and extracting email addresses from full-texts were investigated using a random sample of 200 articles. Recall asked if all author email addresses were found. Precision asked if the match of the first author email address was correct. While all first author email addresses were correctly parsed (precision = 1), 40 articles contained more than one email address from authors (recall = 0.8). In two cases, the corresponding author named two email addresses.
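    One possible reading of the reported figures, assuming that both measures were computed at the article level over the 200 sampled articles and that the 40 articles with more than one author email address count as incompletely retrieved (this interpretation is an assumption, not stated in the text):

    \[ \text{precision} = \frac{200}{200} = 1, \qquad \text{recall} = \frac{200 - 40}{200} = 0.8 \]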

    Results

    -

    Using data from Open APC and Crossref, we found 226,942 open access articles published without delay in 4,635 subscription-based journals from 64 publishers between 2013 - 2019. Overall, these journals published 6,791,045 articles, resulting in a hybrid open access share of 3.3 %.

    +

    Using data from Open APC and Crossref, we found 250,087 open access articles published without delay in 4,970 subscription-based journals from 68 publishers between 2013 - 2019. Overall, these journals published 7,390,888 articles, resulting in a hybrid open access share of 3.4 %.
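    As a worked check of the reported share, using only the article counts stated above:

    \[ \frac{250{,}087}{7{,}390{,}888} \approx 0.0338 \approx 3.4\,\% \]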

    In the following, this section presents the functionality of the interactive dashboard, and highlights key findings from an exploratory data analysis. The dashboard itself is divided into three webpages, which are accessible through the navigation bar. The first webpage, “Overview”, summarizes the development of hybrid open access publishing. The analysis can be subsetted by a publisher or a journal. The second webpage, “Compare”, lets you analyze the uptake of hybrid open access across publisher and years. The third webpage, “Institutional View”, is similar to the first webpage, but allows browsing by email domains instead of publishers and journals.

    Overview: Longitudinal development of hybrid open access publishing

    @@ -2867,37 +2867,37 @@

    Overview: Longitudinal development of hybrid open access publishing

    Growth of hybrid open access

    -

    The upper part of the dashboard highlights the longitudinal development of hybrid open access publishing between 2013 - 2019. The first tab shows the relative uptake, while the second tab presents the hybrid open access article count on a yearly basis. Similar to the dashboard, Figure 3 illustrates the relative and absolute uptake since 2013. Bar charts are sub-grouped according to normalized CC license types. Overall, results indicate that the number and proportion of hybrid open access journal articles rose steadily from 2013 (9,972 articles, OA share: 1.2 %) to 2018 (53,602 articles, OA share: 4.8 %). CC-BY is the most prevalent open content license found. Around 65 % of open access articles were made available using this licence, followed by the less permissive license CC-BY-NC-ND, representing 25 % of the articles.

    +

    The upper part of the dashboard highlights the longitudinal development of hybrid open access publishing between 2013 - 2019. The first tab shows the relative uptake, while the second tab presents the hybrid open access article count on a yearly basis. Similar to the dashboard, Figure 3 illustrates the relative and absolute uptake since 2013. Bar charts are sub-grouped according to normalized CC license types. Overall, results indicate that the number and proportion of hybrid open access journal articles rose steadily from 2013 (10,153 articles, OA share: 1.1 %) to 2018 (55,146 articles, OA share: 4.8 %). CC-BY is the most prevalent open content license found. Around 65 % of open access articles were made available using this licence, followed by the less permissive license CC-BY-NC-ND, representing 25 % of the articles.

    -Uptake of hybrid open access license statements. Graph A shows the relative growth, B the absolute number of articles found since 2013. +

    Uptake of hybrid open access license statements. Graph A shows the relative growth, B the absolute number of articles found since 2013.

    Comparison with license information from Unpaywall

    -

    The third tab presents the article coverage of the dashboard, and compares it with the number of additional articles that were retrieved from Unpaywall for the same set of hybrid open access journals. Although Unpaywall determined hybrid open access articles using open content licenses from Crossref as well, the service furthermore searched publisher websites. In doing so, Unpaywall did not keep track of when an article was made openly available. The extent of the discrepancies in this comparison indicates how comprehensively publishers reported license metadata to Crossref, and whether or not journals provided delayed open access along with hybrid open access. In total, 142,161 additional open access articles could be retrieved using Unpaywall.

    -

    Similar to the dashboard presentation, Figure 4 compares the development of hybrid open access according to our data (blue area) with additional articles with open access license statements from Unpaywall (gray area). The figure provides a yearly breakdown for the two largest publishers in our sample, Elsevier and Springer Nature. Open access articles from the remaining 62 publishers were reduced to the residual category “Other”. The sharp decline of articles found by Unpaywall for Elsevier journals suggests that delayed open access was offered along with options to make individual articles open access upon publication. Indeed, Elsevier lists 130 journals with embargo periods between six and 48 months. A prominent example is the life-science journal Cell where all articles are made freely available after an embargo period of twelve months. While we found 238 hybrid open access articles published without delay, Unpaywall lists another 2,396 Cell articles with open content license, representing 55 % of the journal’s article volume that had been published since 2013. On the other hand, the number of additional Springer Nature articles obtained using Unpaywall’s license evidence is much lower, suggesting that delayed open access plays a smaller role in Springer Nature’s hybrid open access journal portfolio, and that license statements were shared to a large extent with Crossref.

    +

    The third tab presents the article coverage of the dashboard, and compares it with the number of additional articles that were retrieved from Unpaywall for the same set of hybrid open access journals. Although Unpaywall determined hybrid open access articles using open content licenses from Crossref as well, the service furthermore searched publisher websites. In doing so, Unpaywall did not keep track of when an article was made openly available. The extent of the discrepancies in this comparison indicates how comprehensively publishers reported license metadata to Crossref, and whether or not journals provided delayed open access along with hybrid open access. In total, 144,027 additional open access articles could be retrieved using Unpaywall.

    +

    Similar to the dashboard presentation, Figure 4 compares the development of hybrid open access according to our data (blue area) with additional articles with open access license statements from Unpaywall (gray area). The figure provides a yearly breakdown for the two largest publishers in our sample, Elsevier and Springer Nature. Open access articles from the remaining 66 publishers were reduced to the residual category “Other”. The sharp decline of articles found by Unpaywall for Elsevier journals suggests that delayed open access was offered along with options to make individual articles open access upon publication. Indeed, Elsevier lists 130 journals with embargo periods between six and 48 months. A prominent example is the life-science journal Cell where all articles are made freely available after an embargo period of twelve months. While we found 248 hybrid open access articles published without delay, Unpaywall lists another 2,324 Cell articles with open content license, representing 52 % of the journal’s article volume that had been published since 2013. On the other hand, the number of additional Springer Nature articles obtained using Unpaywall’s license evidence is much lower, suggesting that delayed open access plays a smaller role in Springer Nature’s hybrid open access journal portfolio, and that license statements were shared to a large extent with Crossref.

    -Hybrid Open Access License Statements by publisher and source +

    Hybrid Open Access License Statements by publisher and source

    Transparency of hybrid open access

    In the lower part of the dashboard page “Overview”, the left-hand graph shows the extent to which information about open access funding was publicly available. On the right, an interactive chart presents the top-level and lower-level domain of the first or corresponding author’s email address, respectively.

    -

    Similarly to the left-hand graph, Figure 5 presents the availability of spending information for open access articles, highlighting the two largest publishers Elsevier BV and Springer Nature. Open access articles from the remaining 62 publishers were reduced to the residual category “Other”. Bars show the total number of hybrid open access articles per year and publisher. The bars are stacked according to the sources that disclosed the support of open access publication at the article-level (colored stacks), and where no such spending data was available for hybrid open access articles found (gray stacks labeled with NA).

    -

    The figure reveals large differences in how funders and libraries enabled hybrid open access publications across publishers. While individual payments for publication fees dominated the sponsorship of open access publications in Elsevier journals, and that of many other publishers, supported Springer Nature hybrid open access articles mostly come from transformative agreements (TA). The figure also highlights that these agreements contributed to the growth of hybrid open access articles published in Springer Nature journals. Moreover, Figure 5 suggests that transformative agreements contribute to transparency: Whereas spending for 36 % of hybrid open access articles published in Springer Nature journals was disclosed with the Open APC initiative, the origin of expenditure for hybrid open access articles in Elsevier journals was available to a lesser extent (13 %). The largest match between publication and spending data, however, could be found for hybrid open access journals sponsored by the SCOAP\(^3\) consortium: the SCOAP\(^3\) repository tracked 94 % of hybrid open access articles published in SCOAP\(^3\) sponsored journals. It must be noted, however, that SCOAP\(^3\) supports high-energy physics content in these journals only. The remaining 6.2 % were therefore likely published on related topics. In total, spending information for 23 % of articles was found.

    +

    Similarly to the left-hand graph, Figure 5 presents the availability of spending information for open access articles, highlighting the two largest publishers Elsevier BV and Springer Nature. Open access articles from the remaining 66 publishers were reduced to the residual category “Other”. Bars show the total number of hybrid open access articles per year and publisher. The bars are stacked according to the sources that disclosed the support of open access publication at the article-level (colored stacks), and where no such spending data was available for hybrid open access articles found (gray stacks labeled with NA).

    +

    The figure reveals large differences in how funders and libraries enabled hybrid open access publications across publishers. While individual payments for publication fees dominated the sponsorship of open access publications in Elsevier journals, and that of many other publishers, supported Springer Nature hybrid open access articles mostly come from transformative agreements (TA). The figure also highlights that these agreements contributed to the growth of hybrid open access articles published in Springer Nature journals. Moreover, Figure 5 suggests that transformative agreements contribute to transparency: Whereas spending for 34 % of hybrid open access articles published in Springer Nature journals was disclosed with the Open APC initiative, the origin of expenditure for hybrid open access articles in Elsevier journals was available to a lesser extent (16 %). The largest match between publication and spending data, however, could be found for hybrid open access journals sponsored by the SCOAP\(^3\) consortium: the SCOAP\(^3\) repository tracked 94 % of hybrid open access articles published in SCOAP\(^3\) sponsored journals. It must be noted, however, that SCOAP\(^3\) supports high-energy physics content in these journals only. The remaining 6.3 % were therefore likely published on related topics. In total, spending information for 23 % of articles was found.

    -Development of spending disclosure for hybrid open Access articles across publishers. Sources include expenditures for individual articles (“Open APC (Hybrid)”) and articles from transformative agreements (“Open APC (TA)”) provided by the Open APC Initiative. Overlap between hybrid open access articles found via Crossref and the SCOAP^3 repository is displayed as well. The light gray stack areas represent the number of articles where no information about open access sponsorship was available. Notice that it is very likely that the overall decrease of spending for hybrid open access reported to the Open APC initiative in 2019 is due to a lag between the time payments were made and reporting of payments to the initiative. +

    Development of spending disclosure for hybrid open access articles across publishers. Sources include expenditures for individual articles (“Open APC (Hybrid)”) and articles from transformative agreements (“Open APC (TA)”) provided by the Open APC Initiative. Overlap between hybrid open access articles found via Crossref and the SCOAP\(^3\) repository is displayed as well. The light gray stack areas represent the number of articles where no information about open access sponsorship was available. Note that the overall decrease of spending for hybrid open access reported to the Open APC initiative in 2019 is very likely due to a lag between the time payments were made and the reporting of payments to the initiative.

    Email Domains from First / Corresponding Author

    The interactive chart on the lower right of the first dashboard page presents email domains extracted from open access full-texts. These domains roughly indicate the affiliation of the first or of the corresponding authors, respectively, a data point used to delineate open access funding. In the dashboard, a hierarchical, interactive treemap visualizes the distribution of the email domains. Each top-level domain can be subdivided further into domain names representing academic institutions or companies. The size of each rectangle is proportional to the number of hybrid open access articles corresponding to this domain.

    -

    Figure 6 presents a breakdown by email domain suffix. In total, 211,506 email addresses were retrieved and parsed, representing 93 % of all articles in our dataset. 44,433 articles were associated with academic institutions in the UK (“ac.uk”), followed by domains from commercial organizations, mostly email providers like gmail.com or the Chinese 163.com and 126.com, and US-American institutions of higher education (“edu”). The figure illustrates that European institutions from the Netherlands (“nl”), Germany (“de”), Sweden (“se”), Austria (“ac.at”) and Poland (“edu.pl”) were among the top 10. In total, 445 top-level domains were retrieved.

    +

    Figure 6 presents a breakdown by email domain suffix. In total, 228,433 email addresses were retrieved and parsed, representing 91 % of all articles in our dataset. 46,865 articles were associated with academic institutions in the UK (“ac.uk”), followed by domains from commercial organizations, mostly email providers like gmail.com or the Chinese 163.com and 126.com, and US-American institutions of higher education (“edu”). The figure illustrates that European institutions from the Netherlands (“nl”), Germany (“de”), Sweden (“se”), Austria (“ac.at”) and Poland (“edu.pl”) were among the top 10. In total, 451 top-level domains were retrieved.

    -Text-mined top-level domains of author email addresses from hybrid open access journal articles. Only the first email address per article was considered. +

    Text-mined top-level domains of author email addresses from hybrid open access journal articles. Only the first email address per article was considered.

    @@ -2905,14 +2905,14 @@

    Email Domains from First / Corresponding Author

    Compare: Variation in Hybrid Open Access Publishing

    The second dashboard page, “Compare”, allows users to explore how hybrid open access publishing varies across publishers. It focuses on market shares and uptake levels of hybrid open access across journals and publishers. A table at the bottom of the webpage presents key indicators at the journal-level, which can be searched and downloaded as an Excel spreadsheet.

    -

    Figure 7, which is also shown at the top of the “Overview” dashboard page, presents the top three publishers in terms of the number of hybrid open access articles published since 2013. Grey bars represent the total number of hybrid open access articles, colored bars the number of hybrid open access articles per publisher. These top three publishers – Elsevier BV, Springer Nature and Wiley – accounted for the largest proportion of open access articles (78 %), and that of hybrid open access journals (74 %).

    +

    Figure 7, which is also shown at the top of the “Overview” dashboard page, presents the top three publishers in terms of the number of hybrid open access articles published since 2013. Grey bars represent the total number of hybrid open access articles, colored bars the number of hybrid open access articles per publisher. These top three publishers – Elsevier BV, Springer Nature and Wiley – accounted for the largest proportion of open access articles (78 %), and that of hybrid open access journals (72 %).

    -Hybrid open access articles by publisher, shown as proportion of the total number of hybrid open access articles found. The colored bars show the number of articles per publisher, grey bars the overall distribution. +

    Hybrid open access articles by publisher, shown as proportion of the total number of hybrid open access articles found. The colored bars show the number of articles per publisher, grey bars the overall distribution.

    -

    As shown in Figure 8, and in the mid-left panel in the dashboard’s “Compare” page, numbers and proportion of hybrid open access journal articles vary across publishers and journals. In the two-years period 2017-2018, for example, the mean open access proportion per Springer Nature journal was 12 % (SD = 9.9 %), whereas the mean open access proportion per journal published by Elsevier BV was 4.3 % (SD = 5 %). The publisher Wiley performed between these two: The mean proportion was 6 % (SD = 5.3 %).

    +

    As shown in Figure 8, and in the mid-left panel in the dashboard’s “Compare” page, the number and proportion of hybrid open access journal articles vary across publishers and journals. In the two-year period 2017-2018, for example, the mean open access proportion per Springer Nature journal was 12 % (SD = 10 %), whereas the mean open access proportion per journal published by Elsevier BV was 4.2 % (SD = 4.9 %). The publisher Wiley fell between these two: the mean proportion was 5.9 % (SD = 5.2 %).

    -Box plot characterizing spread and differences of the share of open access articles provided by subscription-based journal per publisher between 2017 and 2018 using five summary statistics (the median, the 25th and 75th percentiles, and 1.5 times the inter-quartile range between the first and third quartiles), and visualizing all outlying points individually. Axis displaying proportions was limited to around 30%. +

    Box plot characterizing spread and differences of the share of open access articles provided by subscription-based journal per publisher between 2017 and 2018 using five summary statistics (the median, the 25th and 75th percentiles, and 1.5 times the inter-quartile range between the first and third quartiles), and visualizing all outlying points individually. Axis displaying proportions was limited to around 30%.

    @@ -2923,7 +2923,7 @@

    Institutional view

    Figure 9 illustrates the uptake of hybrid open access, focusing on the public availability of spending information from academic institutions and consortia, an important measure to assess the transparency of open access funding. The figure highlights the prominent role hybrid open access publishing has among authors affiliated with British academic institutions; more than 20% of all hybrid open access articles were published by UK-based first or corresponding authors. According to spending data from the Open APC initiative, hybrid open access publication of articles from the UK was facilitated both through individual publication fees, also known as article-processing charges (APC), and transformative agreements.

    The large overlap between publication data and spending information from the Open APC collection for transformative agreements can be explained by coordinated national activities to transition subscriptions. Together with the UK, Dutch, Swedish and Austrian library consortia participate in the Springer Compact license scheme. Under this agreement, members of participating institutions have subscription access to around 1,800 journals. Corresponding authors affiliated with these institutions can also publish open access in them. These consortia actively fed data about supported articles into Open APC (Pieper and Broschinski 2018). Spending information from countries without a coordinated approach toward hybrid open access publishing including transformative agreements, however, was not well represented in the spending data provided by the Open APC initiative. The sponsors of these open access publications were therefore unknown to us.

    -Availability of spending information for hybrid open access articles by top-level domain of email address from first respective corresponding authors +

    Availability of spending information for hybrid open access articles by top-level domain of the email address of the first or corresponding author

    @@ -2955,8 +2955,8 @@

    How to contribute?

    -

    Bibliography

    -
    +

    Bibliography

    +

    Akbaritabar, Aliakbar, and Stephan Stahlschmidt. 2019. “Applying Crossref and Unpaywall Information to Identify Gold, Hidden Gold, Hybrid and Delayed Open Access Publications in the KB Publication Corpus.” SocArXiv, May. https://doi.org/10.31235/osf.io/sdzft.

    @@ -3106,17 +3106,17 @@

    Bibliography

    Wickham, Hadley, Jim Hester, and Romain Francois. 2017. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

    -

    Xie, Yihui, J.J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.

    +

    Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.


    -1. See also comments from Mark Patterson and, in a similar vein, Najko Jahn to the preprint version of Piwowar et al. (2018).

    -2. See Jahn and Hobert (2019) for more details about how we use Unpaywall and Big Query in our data analytics work.

    -3. I gratefully acknowledge that manual checking was performed by Alexandra Claases (hybrid open access availablity and parsing of Crossref metadata), and Cäcilia Schröer and Nick Haupka (email address retrieval https://github.com/naustica/praktikum_projekt/tree/master/aufgabe_1_recall_precision).

    -4. For example, the article represented by https://api.crossref.org/works/10.1039/c7dt03848h did report a CC-BY license with a delay of 32.

    +1. See also comments from Mark Patterson and, in a similar vein, Najko Jahn to the preprint version of Piwowar et al. (2018).↩︎

    +2. See Jahn and Hobert (2019) for more details about how we use Unpaywall and Big Query in our data analytics work.↩︎

    +3. I gratefully acknowledge that manual checking was performed by Alexandra Claases (hybrid open access availability and parsing of Crossref metadata), and Cäcilia Schröer and Nick Haupka (email address retrieval https://github.com/naustica/praktikum_projekt/tree/master/aufgabe_1_recall_precision).↩︎

    +4. For example, the article represented by https://api.crossref.org/works/10.1039/c7dt03848h did report a CC-BY license with a delay of 32.↩︎

    diff --git a/docs/about.md b/docs/about.md index 6a639ae..4f6fe84 100644 --- a/docs/about.md +++ b/docs/about.md @@ -1,6 +1,6 @@ --- title: "About the Hybrid OA Dashboard" -date: "First published in November 2017, updated 2019-10-06" +date: "First published in November 2017, updated 2020-01-07" output: html_document: df_print: paged @@ -18,7 +18,7 @@ csl: chicago.csl ## Summary -[This open source dashboard](https://subugoe.shinyapps.io/hybridoa/) presents the uptake of hybrid open access for 4,635 subscription journals from 64 publishers. Since 2013, these journals have published 6,791,045 articles, of which 226,942 were made openly available without delay, representing a hybrid open access share of 3.3%. +[This open source dashboard](https://subugoe.shinyapps.io/hybridoa/) presents the uptake of hybrid open access for 4,970 subscription journals from 68 publishers. Since 2013, these journals have published 7,390,888 articles, of which 250,087 were made openly available without delay, representing a hybrid open access share of 3.4%. Hybrid open access journals are included when they share the following two characteristics: @@ -39,7 +39,7 @@ Despite the considerable efforts to extend the evidence base around open access, This above-described lack of standardized methods and publicly available data about hybrid open access publishing limits not only its quantitative study, but also informed policy-making around open access [@Laakso_2019]. Particularly, the business model of hybrid open access journals is disputed, because publishers often charge publication fees, also known as article processing charges (APC), to provide immediate open access to individual articles in addition to subscriptions [@Suber_2012]. 
Although it was initially envisioned that with growing funding opportunities for publication fees publishers would progressively transition hybrid open access journals to fully open access [@Prosser_2003], it remains unclear to which extent the increasing willingness to pay for open access contributed to it, and how cost-effective these spendings were [@Bj_rk_2014; @Pinfield_2017]. -Funders and libraries have responded to the problems of missing evidence around hybrid open access publishing in the last years. To make expenditures more transparent, a growing number of institutions have started to disclose individual articles they supported as open data. The Wellcome Trust, the Austrian Science Fund FWF and British universities were among the first who shared their spendings for hybrid open access articles [@Kiley_2014; @fwf_apc_13; @jisc_14]. The [Open APC Initiative](https://github.com/openapc/openapc-de) collects and standardizes these openly available spending data together with crowd-sourced expenditures. Because Crossref indexes most articles where institutions sponsored publication fees, Open APC can use its metadata services to make open access expenditures comparable at the level of institutions, publishers and journals [@Jahn_2016; @Pieper_2018]. So far, the Open APC Initiative disclosed 66,304 hybrid open access articles supported by 298 research performing organisations and funders between 2013 - 2019. +Funders and libraries have responded to the problems of missing evidence around hybrid open access publishing in the last years. To make expenditures more transparent, a growing number of institutions have started to disclose individual articles they supported as open data. The Wellcome Trust, the Austrian Science Fund FWF and British universities were among the first who shared their spendings for hybrid open access articles [@Kiley_2014; @fwf_apc_13; @jisc_14]. 
The [Open APC Initiative](https://github.com/openapc/openapc-de) collects and standardizes these openly available spending data together with crowd-sourced expenditures. Because Crossref indexes most articles where institutions sponsored publication fees, Open APC can use its metadata services to make open access expenditures comparable at the level of institutions, publishers and journals [@Jahn_2016; @Pieper_2018]. So far, the Open APC Initiative disclosed 73,292 hybrid open access articles supported by 305 research performing organisations and funders between 2013 - 2019. Additionally, funders and libraries have developed compliance criteria including machine-readable Creative Commons license statements to improve the discoverability of open access content. The Sponsoring Consortium for Open Access Publishing in Particle Physics -- SCOAP$^3$, for example, requires CC-BY licenses. It also archives the full-text of funded articles in several formats in a dedicated repository. In Europe, the Wellcome Trust refers to the life science repository [Europe PubMed Central (Europe PMC)](http://europepmc.org/) for depositing funded articles along with a comprehensive set of metadata. Moreover, the funder [automatically checks](https://compliance.cottagelabs.com/docs) if authors and publishers comply with these obligations [@Kiley_2015]. In its contract with the publisher Wiley, the German DEAL consortium stated comprehensive metadata obligations to be implemented using Crossref and its metadata profile [@Sander_2019]. Likewise, the German Deutsche Forschungsgemeinschaft (DFG) referred to Crossref as metadata service for hybrid open access articles facilitated through its funding programme ["Open Access Transition Agreements"](https://www.dfg.de/en/research_funding/announcements_proposals/2017/info_wissenschaft_17_12/index.html). 
In the US, Chorus, a non-profit serving more than 50 publishers, analyzes Crossref metadata and assesses open access compliance for dedicated funders using [interactive dashboards](https://dashboard.chorusaccess.org/).
@@ -65,7 +65,7 @@ The aim of this work is to address the following questions that are relevant to
Methods follow the Wickham-Grolemund approach to practice data science [@Wickham_2017]. After importing, cleaning ("tidying") and transforming data from various sources, a process called "data wrangling", summary statistics were calculated and visualized to understand and communicate the uptake of hybrid open access publishing. For the latter, we created a dashboard, which allows visual interaction with our data. The workflow, illustrated in Figure 1 and described in more detail in this section, was implemented in R using open data and tools, making it transparent and re-usable.

-This project, which is under active developement, started in November 2017, and is updated on a regular basis. Data used in the current version were gathered on 2019-10-06. Along with the data, methods are shared in the source code repository of this project, which is hosted on [GitHub](https://github.com/subugoe/hybrid_oa_dashboard).
+This project, which is under active development, started in November 2017, and is updated on a regular basis. Data used in the current version were gathered on 2020-01-07. Along with the data, methods are shared in the source code repository of this project, which is hosted on [GitHub](https://github.com/subugoe/hybrid_oa_dashboard).
![Summary of data and methods used, following the Wickham-Grolemund approach to practice data science [@Wickham_2017]\label{flow}](flow.png) @@ -97,7 +97,7 @@ To assess data accuracy of this dashboard, we first investigated how many hybrid #### Coverage accuracy -In the case of hybrid open access journals represented in the Open APC datasets, 64 publishers provided licensing statements via the Crossref API, representing 33 % of all publishers studied. At the journal-level, 85 % of all hybrid open access journal titles covered by the Open APC initiative are associated with open content license statements valid without delay in Crossref. Figure 2 provides a breakdown of licensing metadata coverage per publisher. It highlights that leading commercial publishing houses provided open content license statements to Crossref for most of their journals listed by the Open APC initiative including Springer Nature, Elsevier BV and Wiley. SAGE Publications and many smaller sized publishers, however, did not share license metadata to Crossref. Consequently, their journals were not included in our study. +In the case of hybrid open access journals represented in the Open APC datasets, 68 publishers provided licensing statements via the Crossref API, representing 33 % of all publishers studied. At the journal-level, 84 % of all hybrid open access journal titles covered by the Open APC initiative are associated with open content license statements valid without delay in Crossref. Figure 2 provides a breakdown of licensing metadata coverage per publisher. It highlights that leading commercial publishing houses provided open content license statements to Crossref for most of their journals listed by the Open APC initiative including Springer Nature, Elsevier BV and Wiley. SAGE Publications and many smaller sized publishers, however, did not share license metadata to Crossref. Consequently, their journals were not included in our study. 
@@ -110,19 +110,19 @@ In the case of hybrid open access journals represented in the Open APC datasets, -47,340 out of 66,304 hybrid open access articles disclosed by the Open APC initiative provided open content license statements valid without delay, representing a percentage of 71 %. To assess the accuracy of our retrieval, we manually checked a random sample of 100 Open APC articles for which no license was found. We determined 93 articles that did not share license statements with Crossref using the license node. The other 7 articles did report an open content license, but with a delay (`delay-in-days` metadata field) above 0 days.[^1] +51,677 out of 73,292 hybrid open access articles disclosed by the Open APC initiative provided open content license statements valid without delay, representing a percentage of 71 %. To assess the accuracy of our retrieval, we manually checked a random sample of 100 Open APC articles for which no license was found. We determined 93 articles that did not share license statements with Crossref using the license node. The other 7 articles did report an open content license, but with a delay (`delay-in-days` metadata field) above 0 days.[^1] Drawing another random sample of 100 articles, we manually validated if we obtained the correct license statements including start date from the Crossref API. We also evaluated if all journal articles were original articles or reviews. 94 articles were characterized as original article or review, confirming previous studies [@Piwowar_2017]. Other document types were conference abstracts (N = 3), a medical guideline, a comment and a short report. -We were able to retrieve email addresses for 93 % of all articles in our dataset. Furthermore, recall and precision for obtaining and extracting email addresses from full-texts were investigated using a random sample of 200 articles. Recall asked if all author email addresses were found. 
Precision asked if the match of the first author email address was correct. While all first author email addresses were correctly parsed (precision = 1), 40 articles contained more than one email address from authors (recall = 0.8). In two cases, the corresponding author named two email addresses. +We were able to retrieve email addresses for 91 % of all articles in our dataset. Furthermore, recall and precision for obtaining and extracting email addresses from full-texts were investigated using a random sample of 200 articles. Recall asked if all author email addresses were found. Precision asked if the match of the first author email address was correct. While all first author email addresses were correctly parsed (precision = 1), 40 articles contained more than one email address from authors (recall = 0.8). In two cases, the corresponding author named two email addresses. ## Results -Using data from Open APC and Crossref, we found 226,942 open access articles published without delay in -4,635 subscription-based journals from 64 publishers between 2013 - 2019. Overall, these journals published 6,791,045 articles, resulting in a hybrid open access share of 3.3 %. +Using data from Open APC and Crossref, we found 250,087 open access articles published without delay in +4,970 subscription-based journals from 68 publishers between 2013 - 2019. Overall, these journals published 7,390,888 articles, resulting in a hybrid open access share of 3.4 %. In the following, this section presents the functionality of the [interactive dashboard](https://subugoe.shinyapps.io/hybridoa/), and highlights key findings from an exploratory data analysis. The dashboard itself is divided into three webpages, which are accessible through the navigation bar. The first webpage, ["Overview"](https://subugoe.shinyapps.io/hybridoa/#section-overview), summarizes the development of hybrid open access publishing. The analysis can be subsetted by a publisher or a journal. 
The second webpage, ["Compare"](https://subugoe.shinyapps.io/hybridoa/#section-compare), lets you analyze the uptake of hybrid open access across publisher and years. The third webpage, ["Institutional View"](https://subugoe.shinyapps.io/hybridoa/#section-institutional-view), is similar to the first webpage, but allows browsing by email domains instead of publishers and journals. @@ -134,7 +134,7 @@ Launching the app shows up the key findings, which can be broken down by publish -The upper part of the dashboard highlights the longitudinal development of hybrid open access publishing between 2013 - 2019. The first tab shows the relative uptake, while the second tab presents the hybrid open access article count on a yearly basis. Similar to the dashboard, Figure 3 illustrates the relative and absolute uptake since 2013. Bar charts are sub-grouped according to normalized CC license types. Overall, results indicate that the number and proportion of hybrid open access journal articles rose steadily from 2013 (9,972 articles, OA share: 1.2 %) to 2018 (53,602 articles, OA share: 4.8 %). CC-BY is the most prevalent open content license found. Around 65 % of open access articles were made available using this licence, followed by the less permissive license CC-BY-NC-ND, representing 25 % of the articles. +The upper part of the dashboard highlights the longitudinal development of hybrid open access publishing between 2013 - 2019. The first tab shows the relative uptake, while the second tab presents the hybrid open access article count on a yearly basis. Similar to the dashboard, Figure 3 illustrates the relative and absolute uptake since 2013. Bar charts are sub-grouped according to normalized CC license types. Overall, results indicate that the number and proportion of hybrid open access journal articles rose steadily from 2013 (10,153 articles, OA share: 1.1 %) to 2018 (55,146 articles, OA share: 4.8 %). CC-BY is the most prevalent open content license found. 
Around 65 % of open access articles were made available using this license, followed by the less permissive license CC-BY-NC-ND, representing 25 % of the articles.
@@ -143,9 +143,9 @@ The upper part of the dashboard highlights the longitudinal development of hybri
#### Comparison with license information from Unpaywall

-The third tab presents the article coverage of the dashboard, and compares it with the number of additional articles that were retrieved from Unpaywall for the same set of hybrid open access journals. Although Unpaywall determined hybrid open access articles using open content licenses from Crossref as well, the service furtermore searched publisher websites. In doing so, Unpaywall did not keep track when an article was made openly avaible. The extent of the discrepancies in this comparison indicates how comprehensively publishers reported license metadata to Crossref, and whether or not journals provided delayed open access along with hybrid open access. In total, 142,161 additional open access articles could be retrieved using Unpaywall.
+The third tab presents the article coverage of the dashboard, and compares it with the number of additional articles that were retrieved from Unpaywall for the same set of hybrid open access journals. Although Unpaywall determined hybrid open access articles using open content licenses from Crossref as well, the service furthermore searched publisher websites. In doing so, Unpaywall did not keep track of when an article was made openly available. The extent of the discrepancies in this comparison indicates how comprehensively publishers reported license metadata to Crossref, and whether or not journals provided delayed open access along with hybrid open access. In total, 144,027 additional open access articles could be retrieved using Unpaywall.
-Similar to the dashboard presentation, Figure 4 compares the development of hybrid open access according to our data (blue area) with additional articles with open access license statements from Unpaywall (gray area). The figure provides a yearly breakdown for the two largest publishers in our sample, Elsevier and Springer Nature. Open access articles from the remaining 62 publishers were reduced to the residual category "Other". The sharp decline of articles found by Unpaywall for Elsevier journals suggests that delayed open access was offered along with options to make individual articles open access upon publication. Indeed, [Elsevier lists 130 journals](https://www.elsevier.com/about/open-science/open-access/open-archive) with embargo periods between six and 48 months. A prominent example is the life-science journal [Cell](https://www.cell.com/cell/archive) where all articles are made freely available after an embargo period of twelve months. While we found 238 hybrid open access articles published without delay, Unpaywall lists another 2,396 Cell articles with open content license, representing 55 % of the journal's article volume that had been published since 2013. On the other hand, the number of additional Springer Nature articles obtained using Unpaywall's license evidence is much lower, suggesting that delayed open access plays a smaller role in Springer Nature's hybrid open access journal portfolio, and that license statements were shared to a large extent with Crossref.
+Similar to the dashboard presentation, Figure 4 compares the development of hybrid open access according to our data (blue area) with additional articles with open access license statements from Unpaywall (gray area). The figure provides a yearly breakdown for the two largest publishers in our sample, Elsevier and Springer Nature. Open access articles from the remaining 66 publishers were reduced to the residual category "Other". The sharp decline of articles found by Unpaywall for Elsevier journals suggests that delayed open access was offered along with options to make individual articles open access upon publication. Indeed, [Elsevier lists 130 journals](https://www.elsevier.com/about/open-science/open-access/open-archive) with embargo periods between six and 48 months. A prominent example is the life-science journal [Cell](https://www.cell.com/cell/archive) where all articles are made freely available after an embargo period of twelve months. While we found 248 hybrid open access articles published without delay, Unpaywall lists another 2,324 Cell articles with open content license, representing 52 % of the journal's article volume that had been published since 2013. On the other hand, the number of additional Springer Nature articles obtained using Unpaywall's license evidence is much lower, suggesting that delayed open access plays a smaller role in Springer Nature's hybrid open access journal portfolio, and that license statements were shared to a large extent with Crossref.
@@ -158,9 +158,9 @@ Similar to the dashboard presentation, Figure 4 compares the development of hybr
 In the lower part of the dashboard page "Overview", the left-hand graph shows the extent to which information about open access funding was publicly available. On the right, an interactive chart presents the top-level and lower-level domain of the first or corresponding author's email address, respectively.
-Similarly to the left hand graph, Figure 5 presents the availability of spending information for open access articles, highlighting the two largest publishers Elsevier BV and Springer Nature. Open access articles from the remaining 62 publishers were reduced to the the residual category "Other". Bars show the total number of hybrid open access articles per year and publisher. The bars are stacked according to the sources that disclosed the support of open access publication at the article-level (colored stacks), and where no such spending data was available for hybrid open access articles found (gray stacks labeled with NA).
+Similarly to the left hand graph, Figure 5 presents the availability of spending information for open access articles, highlighting the two largest publishers Elsevier BV and Springer Nature. Open access articles from the remaining 66 publishers were reduced to the the residual category "Other". Bars show the total number of hybrid open access articles per year and publisher. The bars are stacked according to the sources that disclosed the support of open access publication at the article-level (colored stacks), and where no such spending data was available for hybrid open access articles found (gray stacks labeled with NA).
-The figure reveals large differences how funders and libraries enabled hybrid open access publications across publishers. While individual payments for publication fees dominated the sponsorship of open access publications in Elsevier journals, and that of many other publishers, supported Springer Nature hybrid open access articles mostly come from transformative agreements (TA). The figure also highlights that these agreements contributed to the growth of hybrid open access articles published in Springer Nature journals. Moreover, Figure 5 suggests that transformative agreements contribute to transparency: Whereas spending for 36 % of hybrid open access articles published in Springer Nature journals was disclosed with the Open APC initiative, the origin of expenditure for hybrid open access articles in Elsevier journals was available to a lesser extent (13 %). The largest match between publication and spending data, however, could be found for hybrid open access journals sponsored by the SCOAP$^3$ consortium: the SCOAP$^3$ repository tracked 94 % of hybrid open access articles published in SCOAP$^3$ sponsored journals. It must be noted, however, that SCOAP$^3$ supports high-energy physics content in these journals only. The remaining 6.2 % were therefore likely published on related topics. In total, spending information for 23 % of articles was found.
+The figure reveals large differences how funders and libraries enabled hybrid open access publications across publishers. While individual payments for publication fees dominated the sponsorship of open access publications in Elsevier journals, and that of many other publishers, supported Springer Nature hybrid open access articles mostly come from transformative agreements (TA). The figure also highlights that these agreements contributed to the growth of hybrid open access articles published in Springer Nature journals. Moreover, Figure 5 suggests that transformative agreements contribute to transparency: Whereas spending for 34 % of hybrid open access articles published in Springer Nature journals was disclosed with the Open APC initiative, the origin of expenditure for hybrid open access articles in Elsevier journals was available to a lesser extent (16 %). The largest match between publication and spending data, however, could be found for hybrid open access journals sponsored by the SCOAP$^3$ consortium: the SCOAP$^3$ repository tracked 94 % of hybrid open access articles published in SCOAP$^3$ sponsored journals. It must be noted, however, that SCOAP$^3$ supports high-energy physics content in these journals only. The remaining 6.3 % were therefore likely published on related topics. In total, spending information for 23 % of articles was found.
@@ -170,7 +170,7 @@ The figure reveals large differences how funders and libraries enabled hybrid op
 The interactive chart on the lower right of the first dashboard page presents email domains extracted from open access full-texts. These domains roughly indicate the affiliation of the first or of the corresponding authors, respectively, a data point used to delineate open access funding. In the dashboard, a hierarchical, interactive treemap visualizes the distribution of the email domains. Each top-level domain can be subdivided further into domain names representing academic institutions or companies. The size of each rectangle is proportional to the number of hybrid open access articles corresponding to this domain.
-Figure 6 presents a breakdown by email domain suffix. In total, 211,506 email addresses were retrieved and parsed, representing 93 % of all articles in our dataset. 44,433 articles were associated with academic institutions in the UK ("ac.uk"), followed by domains from commercial organizations, mostly email providers like gmail.com or the Chinese 163.com and 126.com, and US-American institutions of higher education ("edu"). The figure illustrates that European institutions from the Netherlands ("nl"), Germany ("de"), Sweden ("se"), Austria ("ac.at") and Poland ("edu.pl") were among the top 10. In total, 445 top-level domains were retrieved.
+Figure 6 presents a breakdown by email domain suffix. In total, 228,433 email addresses were retrieved and parsed, representing 91 % of all articles in our dataset. 46,865 articles were associated with academic institutions in the UK ("ac.uk"), followed by domains from commercial organizations, mostly email providers like gmail.com or the Chinese 163.com and 126.com, and US-American institutions of higher education ("edu"). The figure illustrates that European institutions from the Netherlands ("nl"), Germany ("de"), Sweden ("se"), Austria ("ac.at") and Poland ("edu.pl") were among the top 10. In total, 451 top-level domains were retrieved.
@@ -181,7 +181,7 @@ Figure 6 presents a breakdown by email domain suffix. In total, 211,506 email ad
 The second dashboard page, "Compare", allows to explore how hybrid open access publishing varies across publishers. It focuses on market shares, and uptake levels of hybrid open access across journals and publishers. A table at the bottom of the webpage presents key indicators at the journal-level, which can be searched and downloaded as Excel spreadsheet.
-Figure 7, which is also shown at the top of the "Overview" dashboard page, presents the top three publishers in terms of the number of hybrid open access articles published since 2013. Grey bars represent the total number of hybrid open access articles, colored bars the number of hybrid open access articles per publisher. These top three publishers -- Elsevier BV, Springer Nature and Wiley -- accounted for the largest proportion of open access articles (78 %), and that of hybrid open access journals (74 %).
+Figure 7, which is also shown at the top of the "Overview" dashboard page, presents the top three publishers in terms of the number of hybrid open access articles published since 2013. Grey bars represent the total number of hybrid open access articles, colored bars the number of hybrid open access articles per publisher. These top three publishers -- Elsevier BV, Springer Nature and Wiley -- accounted for the largest proportion of open access articles (78 %), and that of hybrid open access journals (72 %).
@@ -191,7 +191,7 @@ Figure 7, which is also shown at the top of the "Overview" dashboard page, prese
-As shown in Figure 8, and in the mid-left panel in the dashboard's "Compare" page, numbers and proportion of hybrid open access journal articles vary across publishers and journals. In the two-years period 2017-2018, for example, the mean open access proportion per Springer Nature journal was 12 % (SD = 9.9 %), whereas the mean open access proportion per journal published by Elsevier BV was 4.3 % (SD = 5 %). The publisher Wiley performed between these two: The mean proportion was 6 % (SD = 5.3 %).
+As shown in Figure 8, and in the mid-left panel in the dashboard's "Compare" page, numbers and proportion of hybrid open access journal articles vary across publishers and journals. In the two-years period 2017-2018, for example, the mean open access proportion per Springer Nature journal was 12 % (SD = 10 %), whereas the mean open access proportion per journal published by Elsevier BV was 4.2 % (SD = 4.9 %). The publisher Wiley performed between these two: The mean proportion was 5.9 % (SD = 5.2 %).