Credit: Hacking the credit and citation ecosystem (making it work, or work better, for software) #51
Comments
These are just some brainstorming thoughts (where, by definition, no ideas are bad...), but one way to hack the existing citation ecosystem would be to introduce a hierarchy (or at least more than one level) of citations. Right now, all citations in a paper are treated equally, but clearly certain references (such as those for software) contribute more. For example, if a study relied on a particular CFD code to obtain all of its results, arguably the study wouldn't exist without that software and thus its creators should get more credit than a standard citation. As it is, that sort of "vital" citation isn't recognized any differently than any other. So, perhaps one way to make the current system work better for software would be to introduce a new "substantial" or "significant" citation category to indicate the greater importance of, or dependence on, products such as software and data. Then, looking at the counts of these sorts of citations would help others see the associated contribution to the field/science better than existing citation metrics. Also, this could tie in to the proposed transitive credit schemes. Of course, this may not make much of a difference for folks on the extremes of the spectrum... software or data that others aren't using won't see any changes, and highly cited software packages already get a good deal of credit with large numbers of citations.
There's some overlap between this idea and both my transitive credit idea (http://arxiv.org/abs/1407.5117) and the Project Credit work (http://projectcredit.net), though it's not the same as either. In transitive credit, contriponents are given a weight, and in project credit, there has been discussion of contributions being at one of two or three levels, though this is just for contributors, not citations.
I wasn't aware of Project Credit; that's interesting, thanks. Unless I'm mistaken, it looks like that is primarily (if not entirely) focused on assigning credit to people serving in various roles on a publication. However, now that I take another look at your transitive credit ideas, I do see that both people and products (software, data, etc.) are assigned credit percentages. (This may be slightly off-topic for this particular discussion, but I do wonder about how someone writing a paper would divide the credit between their efforts and a software package they used.) Certainly, transitive credit would give a more quantifiable measure of credit compared to my idea.
None of these ideas are isolated, and none are completely satisfactory, at least to me. But I would like to push transitive credit further, and I think it could be merged with (or overlaid on) project credit. On the other hand, as you say, there are some questions about the details.
Running with transitive credit for now, it seems like it should be possible to take existing citation relationships (directed graphs, really) and the associated ecosystem, and apply the credit map for each paper. However, I suppose that would require applying some credit percentage to each citation in a paper... unless the existing citation system remained complementary to the credit system.
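To make the "apply the credit map for each paper" idea concrete, here is a minimal sketch of how credit might flow through a citation graph. It assumes that credit multiplies along each edge, that every credit map's weights sum to 1, and that the graph has no cycles. The function name `transitive_credit`, the `credit_maps` layout, and the example IDs are all hypothetical, not taken from the transitive credit paper.

```python
def transitive_credit(product, credit_maps, scale=1.0, totals=None):
    """Accumulate the credit that flows from `product` to everything it credits.

    credit_maps maps a product ID to {contributor_or_product_id: weight}.
    Assumption: each map's weights sum to 1 and the graph is acyclic.
    """
    if totals is None:
        totals = {}
    for target, weight in credit_maps.get(product, {}).items():
        share = scale * weight
        totals[target] = totals.get(target, 0.0) + share
        # If the target is itself a product with a credit map, pass its share on.
        transitive_credit(target, credit_maps, share, totals)
    return totals


# Hypothetical example: a paper that relied on a CFD code.
credit_maps = {
    "paper": {"alice": 0.5, "cfd_code": 0.5},
    "cfd_code": {"bob": 0.75, "solver_lib": 0.25},
}
totals = transitive_credit("paper", credit_maps)
```

In this sketch an intermediate product such as `cfd_code` keeps its own entry and also passes its share on to its contributors, so the totals describe how much credit reaches each node rather than a partition of 1.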
Right, and weights could just be autocalculated as even to start (10 citations -> each gets a 0.1 weight)... OK, authors also get even weights... Maybe authors get 0.5 divided evenly, and citations get 0.5 divided evenly. There probably would have to be defaults like this to start with in any case, even if the submitter was making changes later.
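The default described here (half of the credit split evenly among authors, half split evenly among citations) can be sketched as below. The function name `default_credit_map` and the fallback for an empty group (the other group takes everything) are assumptions for illustration only; `Fraction` is used so the weights sum to exactly 1.

```python
from fractions import Fraction


def default_credit_map(authors, citations, author_share=Fraction(1, 2)):
    """Split `author_share` evenly among authors and the rest evenly among citations."""
    if not authors and not citations:
        raise ValueError("need at least one author or citation")
    if not citations:
        author_share = Fraction(1)  # assumption: nothing to cite, authors take all
    elif not authors:
        author_share = Fraction(0)  # assumption: no authors listed, citations take all
    else:
        author_share = Fraction(author_share)
    weights = {}
    for name in authors:
        weights[name] = author_share / len(authors)
    for ref in citations:
        weights[ref] = (1 - author_share) / len(citations)
    return weights


# Two authors and ten citations: each author gets 1/4, each citation 1/20.
weights = default_credit_map(["author1", "author2"], [f"ref{i}" for i in range(10)])
```

A submitter could start from this map and then adjust individual weights, as long as they still sum to 1.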
Yeah, that essentially overlaps with my original suggestion, but in a more quantifiable way and allowing as many hierarchies in citations as you want. I agree that this could be overlaid on project credit, where the different role classifications could relate to default (different) credit weights. Defaults are a good idea, although I imagine there would be some pushback on splitting the credit 50-50 between authors and citations, but of course an author could change the weights as needed.
I don't know what the right default is - 50/50 is just the default default :)
Credit for authors and credit for citations are really two different things. The first is credit for that particular paper, the latter for previous work, and everybody understands it that way, I think. I would leave it at that. Comparing the two isn't really possible, and trying to do so is not a good idea, imho. Otherwise it is too easy to give the authors 100%, just to boost their credit.
I don't really agree. If you think about what should be given credit for a new product (whether a paper, software, data, etc.), it's both the people who directly contributed to it as well as all the other products that were needed to make it work. In my version of what we should try, the person who registers the new product should weight the credit for both the people and products. Of course, all contributors should agree.
I agree that it could be difficult to compare the contributions of, for example, the authors or the software they used to perform a study. I suppose that could lead to bigger philosophical questions about the capabilities of a tool vs. the novelty of what is done with it. However, as it is, for papers the current citation system severely underrepresents the importance of software in particular by only giving it a citation at the same level/weighting as any other citation in the paper. I think many would agree that isn't fair for studies that relied heavily on software that someone else developed. Going back to my CFD example, the study wouldn't be possible without the software developed by others, so they should get some credit for their contribution (which arguably could be on the same level as an individual author).
I'm not seeing how software is different in this regard from prior science without which a study wouldn't have been possible either. Or experimental protocols. Are we suggesting to ask authors to draw some kind of line between the science and materials without which a study wouldn't have been possible, and those it could have done without? Isn't that pretending that scientific advance can be viewed as following a tree of derivation, rather than being the result of an interwoven network?
Well, I think there is a difference between building on past work, and directly using someone else's software. The analogy on the experimental side of things would be to use someone else's experimental equipment to do your study, not to build your own setup based on what they described. In such a case, at least in the papers I've seen, typically the owners of the equipment being used show up as authors of the paper! So, they are clearly getting more credit than a simple citation, which is what creators of software being directly utilized are getting right now. I agree that if you develop your own software (or experiment) based on past work then a citation is appropriate; the issue is when you are using something directly created by someone else.
Re: experimental equipment use implies co-authorship: I do not think this is universal but rather varies by field and often within a field. In a wider discussion of credit (transitive or otherwise) one might hope that in addition to plowing new ground we could also help standardize existing practice. As an equipment provider should I expect acknowledgement, citation, or co-authorship? As an instrument user, I should clearly know what is expected of me. If we can communicate clearly then it is probably a win.
Yeah, that is certainly based only on my experience (and might not even be consistent in my field). I definitely agree that what we're discussing could equally apply to equipment use as software use, if it's some unique experimental setup... I imagine you'd have to draw the line somewhere, for example, at standard, easily available equipment or software tools (e.g., compilers).
@kyleniemeyer, I agree it's unique capability that is important. To me, it can be refined by saying that a unique capability needed to reproduce the results -- not just for convenience, or personal preference -- should get a gold star. In the future when more publications are living recomputable records, perhaps the act of swapping certain components and not others in order to reproduce the results will make this less of a subjective call and more a directly measurable attribute.
The main thing from my point of view is that scientific software is not being sufficiently cited. Can we create a software citation website, similar to, say, ResearchGate but focused on scientific software, that is capable of automatically parsing existing papers and software files for mentions of software and generating the bibtex, JSON-LD entries, and corresponding statistics of citation or credits (e.g., software-specific citations, h-index, i10-index)? In such a process we can build an expanding archive of scientific software entries over time and perhaps foster a culture of crediting important software in scientific/engineering communities.
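As a small illustration of the statistics mentioned here, the sketch below computes an h-index and an i10-index from a list of per-package citation counts, using the standard definitions (h is the largest number such that h items each have at least h citations; i10 counts items with at least 10 citations). The citation counts are made up, and how such counts would be harvested from papers is left open.

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citation_counts):
    """Number of items with at least 10 citations."""
    return sum(1 for count in citation_counts if count >= 10)


# Hypothetical citation counts for six software packages by one group.
counts = [25, 12, 10, 4, 3, 0]
software_h = h_index(counts)
software_i10 = i10_index(counts)
```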
@sctchoi I would start with quantifying that assertion based on some data. How much is software cited now? How does that vary by community? What is a working definition of "sufficient"? This kind of study could also look at the "dark matter" of publications that don't cite or mention software they use (which I hope would be corrected in the review process, but I would like to see proof).
I agree this would need some supporting data. The data may also show that citation behavior is highly uneven across fields. For example, in biology, by far the most cited papers in almost all of its fields are about scientific software. (This still doesn't have to mean that all scientific software in biology is sufficiently cited. But where software is under-cited is far from as obvious as your assertion makes it sound.)
Hi all,
Two things to add here. First, you'll find some empirical data on what
Howison, J., & Bullard, J. (in press). Software in the scientific
Second, the data set is available from the JASIST paper (the first paper
https://github.com/jameshowison/softcite/blob/master/data/SoftwareCitationDataset.ttl
--James
btw, I'd love to extend this and find out what software was used but not
Howison, J., & Herbsleb, J. D. (2011). Scientific software production:
On Mon, Sep 28, 2015 at 8:17 AM, Hilmar Lapp
Google Doc for notes at WSSSPE3 - https://docs.google.com/document/d/1oN0ZYqIoWtOE1LBMIlWY9N8nn5LHTncj8GjUKPh62pA/edit?usp=sharing
Note that the plan is to merge this group with the FORCE11 Software Citation Working Group.