Amin Alam edited this page Feb 27, 2023 · 2 revisions

Welcome to the NueroDUI wiki!

Overview

The publication of the FAIR principles (Wilkinson et al, 2016) shed necessary light on how good data management can lead to a dataset being reused. Science has long had a subtle ‘throw-away’ culture around data: a scientist generates data to answer their own question, publishes it in a respectable journal, and in many cases promptly forgets about it. When scientists receive requests for this data, they do not always provide it (Tedersoo et al, 2021; Gabelica et al, 2022; Watson, 2022). As a result, large swaths of data are not Findable, Accessible, Interoperable, or Reusable (FAIR).

There is a reason for this. While many publications have determined that scientific data is not FAIR, very little actionable guidance has been given to researchers on how to actually make their data FAIR. This lack of guidance is compounded by the fact that science is composed of highly specific fields producing vastly different types of data, so there is no one-size-fits-all means of making data FAIR. To take steps toward FAIR data, field-specific audits must be conducted to determine how the produced data will be managed.

Presented here is software developed for scientists working in the field of neuroscience. It was designed to solve local data management problems in a neuroscience laboratory, and it was built on the input of neuroscientists, so it captures components of data management specific to this field. Varying methodologies converge in neuroscience, and the data produced is correspondingly varied. We wanted to create a programme that could efficiently and systematically manage the diverse datasets produced at the interface of these methods. Ultimately, our vision was software that could efficiently store and search data while also incorporating annotation features: a database that accommodates data management.

What is data management in neuroscience

The first step in actualising our vision was to clearly conceptualise what data management means in a neuroscience context. To this end, we audited the lab from a data perspective in order to determine the typical data lifecycle from various viewpoints. This work resulted in a neuroscience-specific data lifecycle (see below): a general map that charts the course of data from its inception to its end point. We then broke the map into stages and characterised each stage by the components deemed necessary for adequate data management. These components formed the basis of what our software does; in particular, the programme targets stages 1 and 2. The following is a breakdown of each stage.

Stage 1: contextualise raw datasets

Data management throughout stage 1 is characterised by the need to contextualise raw data. When a raw dataset is produced, it is very important to capture information about how the dataset was generated. This information is called metadata; in a nutshell, metadata is data about data (Musen et al, 2015). With metadata in mind, it became important for our software to provide an efficient means of generating contextual information for raw datasets. Two main components were identified as important for contextualising raw data in our field: protocols and conditions. Protocols broadly capture the steps consistently followed in the laboratory to generate a dataset, while conditions refer to aspects of the protocol that may vary from experiment to experiment. The goal of stage 1 data management is to establish a link between the raw dataset, the protocol used to generate it, and the specific conditions altered in order to generate novel data. Our software aids researchers in achieving this. For example, it has a built-in hub where all of the lab’s protocols are stored and ready to download for editing. These protocols can be downloaded ad lib and attached to raw datasets, providing context, and they can be updated regularly as they evolve in real time. When a researcher uses the software to input a dataset, they can decide whether to attach a protocol as an appendix. This appending function can be used for any ancillary information, making it possible to link various forms of contextual data to the raw dataset. This allows future interpretation of the dataset without necessarily having to consult the researcher who generated the data, ultimately saving time.
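The stage 1 linkage described above can be sketched as a small data structure. This is a minimal illustration only: the names (`DatasetRecord`, `attach_protocol`, the file paths) are assumptions for the example and are not the actual NueroDUI API.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

# Hypothetical sketch of a stage 1 record linking a raw dataset to its
# protocol and any ancillary appendices. All names here are illustrative.
@dataclass
class DatasetRecord:
    path: str                              # location of the raw dataset
    protocol: Optional[str] = None         # protocol attached from the hub
    appendices: List[str] = field(default_factory=list)  # ancillary files

    def attach_protocol(self, protocol_path: str) -> None:
        """Link a protocol document from the hub to this raw dataset."""
        self.protocol = protocol_path

    def append(self, ancillary_path: str) -> None:
        """Attach any other contextual file as an appendix."""
        self.appendices.append(ancillary_path)

record = DatasetRecord(path="raw/ephys_session_01.abf")
record.attach_protocol("protocols/patch_clamp_v3.docx")
record.append("notes/rig_settings.txt")
print(json.dumps(asdict(record), indent=2))
```

The point of the sketch is simply that protocol and appendix links travel with the raw dataset, so the context can be read back later without consulting the original researcher.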

Our software also makes it easy to add conditional information, further contextualising raw datasets. Within the programme is a condition template where researchers inputting raw data can easily highlight which conditions were applied to the dataset. The condition information a researcher can supply through the manager is systematic and specific to the particular experiment being conducted. This neuroscience-specific information was added based on direct consultation with neuroscience researchers implementing varied methodologies; as a result, the condition templates reflect the range of conditions a neuroscience researcher may need to apply in order to contextualise raw datasets. As with the protocol hub, the condition information can be easily updated as the conditions used in the laboratory evolve.
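A condition template of this kind might be represented as below. The sections and field names are invented for illustration; the software's actual schema is derived from consultation with the lab and will differ.

```python
# Illustrative condition template; sections and field names are assumptions,
# not the software's actual neuroscience-specific schema.
condition_template = {
    "subject": {"species": None, "strain": None, "age_weeks": None},
    "treatment": {"drug": None, "dose_mg_per_kg": None, "route": None},
    "recording": {"region": None, "temperature_c": None},
}

def fill_conditions(template, **updates):
    """Return a copy of the template with the supplied conditions filled in,
    leaving the template itself untouched so it can be reused."""
    filled = {section: dict(fields) for section, fields in template.items()}
    for key, value in updates.items():
        for section in filled:
            if key in filled[section]:
                filled[section][key] = value
    return filled

conditions = fill_conditions(
    condition_template, species="mouse", drug="ketamine", dose_mg_per_kg=10
)
```

Copying the template before filling it mirrors the wiki's point that templates evolve over time: the shared template stays clean while each dataset keeps its own filled-in snapshot.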

Stage 2: concatenate raw datasets

A researcher conducts various experiments over time to generate raw datasets that can be leveraged to answer a research question. Analysis of these datasets follows, but preceding that analysis is a process of joining many raw datasets together to form a more comprehensive dataset capable of providing reliable and valid information. Our software tracks this intermediary process of raw data concatenation. Any raw dataset input into the software is assigned a randomly generated identifier (#ID) that is completely unique to that dataset. Researchers will generate numerous #IDs, each representing a different raw dataset. Within the software, the researcher can then ‘concatenate’ these raw datasets together, allowing a laboratory to keep track of which raw datasets have been joined for subsequent analysis. Keeping tabs on this information matters from a publication standpoint: it is becoming standard operating procedure for journals to request the raw datasets that underpin a manuscript’s analysis output (Miyakawa, 2020). With all of this information tracked and categorised, researchers can easily work backwards and quickly obtain the information (such as location or metadata) needed to progress with publication. Our software provides this ability to researchers in a neuroscience context.
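The #ID and concatenation tracking described above can be sketched as follows. This is a sketch under assumptions: the wiki only states that each raw dataset receives a unique random identifier, so `uuid4` is one plausible way to generate it, not necessarily what NueroDUI does internally.

```python
import uuid

registry = {}        # #ID -> location of the raw dataset (illustrative)
concatenations = {}  # concatenated-set #ID -> list of member raw #IDs

def register_dataset(path):
    """Assign a randomly generated, unique identifier (#ID) to a raw dataset."""
    dataset_id = uuid.uuid4().hex
    registry[dataset_id] = path
    return dataset_id

def concatenate(raw_ids):
    """Record which raw datasets were joined for subsequent analysis."""
    combined_id = uuid.uuid4().hex
    concatenations[combined_id] = list(raw_ids)
    return combined_id

a = register_dataset("raw/session_01.csv")
b = register_dataset("raw/session_02.csv")
joined = concatenate([a, b])

# Working backwards for publication: recover the raw files behind an analysis.
sources = [registry[i] for i in concatenations[joined]]
```

The final lookup is the payoff the wiki describes: when a journal requests the raw data behind an analysis, the lab can trace a concatenated dataset back to its member raw datasets and their locations.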