FAQ & Glossary

Learn more about various topics in the field of research data in our FAQs. You can find explanations for key terms in our glossary.

FAQs

General questions:

Who or what are RDS?

Research Data Services (RDS) are a collaborative working group made up of staff from the University Library and IT.SERVICES at RUB. We are responsible for the evaluation and establishment of both a sustainable infrastructure and service for research data management.

We provide support in research data management and data handling for researchers at all career stages and in every stage of their research projects. We offer various services and tools for this purpose, including training sessions, consultations, and our research data repository.

On this page, you can find our contact information and get to know our team:

What is RDM?

RDM stands for research data management. Research data management encompasses measures for organizing, documenting, and preserving all data used or generated in a research process. RDM is applicable to all disciplines and involves various types of research data defined by the disciplinary context. It ensures access, reuse, reproducibility, and the quality of all research data that forms the basis of scientific results.

For more information on research data management, you can refer to our glossary.

What is research data?

Research data encompasses all data generated, developed, or analysed in scientific work. What counts as research data is defined by the disciplinary context and varies from one field to another. Examples of research data include measurement data, laboratory values, audiovisual information, texts, survey data, objects from collections, or samples. The task of research data management is to handle the collected data systematically.

For more information on research data, please refer to the entry for research data in our glossary and to the following sources:

What is FAIR Data?

The FAIR principles are designed to ensure sustainable research data management by preparing and storing data and associated metadata in a way that allows others to reuse them. FAIR data is, therefore, findable, accessible, interoperable, and reusable.

For more information on the FAIR principles and FAIR Data, refer to the glossary under the entry "FAIR Data".

What does DMP mean?

DMP stands for data management plan. A data management plan accompanies the entire research process. It not only supports the planning and proposal stages but also governs processes such as archiving and data publication.

A detailed explanation of the DMP can be found in the glossary under the entry "DMP".

Practical Questions

What do I need for a project proposal?

The specific requirements for the content and structure of a proposal vary significantly depending on the funding agency, format, and discipline. Many funding agencies expect information on the handling of research data to be included in the proposal. For instance, a data management plan may be required.
A useful reference for the information to be included in proposals regarding research data management is the DFG Checklist. We provide a completion guide for this checklist.

On our page "Projects and Proposals" you can find additional information on grant applications. If you have any questions on this topic, please feel free to contact us. We are happy to assist you.

I need to provide information on handling of research data in a project proposal. What do I need for this?

Is it a DFG proposal? Check the DFG Checklist for Handling of Research Data and our associated completion guide.

Many other funding agencies also provide similar guidelines. Information on the handling of research data includes management of data during the project stage (storage, documentation, responsibilities, etc.) as well as the publication, reuse, and archiving of the data. A data management plan includes details on these aspects of research data management and should be created during the planning process before the start of any research project.

To create a data management plan, you can use the RDMO tool. If you have further questions regarding data management plans and funding applications for research data, we are here to provide guidance.

Where can I store my data?

Long-term preservation of research data is an important part of research data management. The RUB offers various services for storing research data, such as our repository, Sciebo, the Fileservice/Network Drive, and the Backup Service. In a backup, all files are secured for emergencies, but only for a relatively short period of time. Archiving is suitable for the longer-term preservation of data.
If you have further questions regarding the storage and archiving of research data, we are happy to provide guidance.

How can I organize my data effectively?

Consistent naming and structuring of research data make it easier to keep track of data collections and experiments, and help prevent the accidental use of different versions of a dataset.

Various methods can be used for structuring data:

You can employ flat folder hierarchies (up to 3-4 levels), use descriptive names, and ensure clarity in the terms used. Maintaining file extensions (e.g., .csv, .tiff) is important. Additionally, the naming system and structure should be documented in a readme.txt file. If you regularly edit or supplement your data, versioning is recommended. Suitable methods include consistent file naming or the use of version control software like Git. An example of versioning through naming is three-level versioning: Major.Minor.Revision (e.g., 1.0.0).
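
As an illustration of versioning through file naming, the following minimal Python sketch builds a file name from a descriptive stem, a Major.Minor.Revision version, and the file extension. The function name versioned_name and the pattern "stem_vMajor.Minor.Revision.extension" are assumptions chosen for this example, not an RUB convention; whatever pattern you choose should be documented in your readme.txt.

```python
import re


def versioned_name(stem: str, version: tuple[int, int, int], extension: str) -> str:
    """Build a file name such as 'survey-results_v1.0.0.csv'.

    The naming pattern is a hypothetical example, not a prescribed standard.
    """
    # Lowercase the stem and replace every run of non-alphanumeric
    # characters with a single hyphen.
    clean_stem = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
    major, minor, revision = version
    # Keep the file extension so that the format stays recognisable.
    return f"{clean_stem}_v{major}.{minor}.{revision}.{extension.lstrip('.')}"


print(versioned_name("Survey Results", (1, 0, 0), ".csv"))
# prints: survey-results_v1.0.0.csv
```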

Research Data Services offer training on using Git in the research process. If you have further questions, feel free to contact us.

Which metadata should I use?

Metadata is data used to describe research data. This form of documentation has the advantage of making data discoverable and comprehensible. The choice of metadata to be assigned during the research process depends on the subject area, research project, and funding source.

However, your metadata should at least answer the following questions:

  • Where does the data come from? What data collection method was used?
  • Which measuring instruments were used, which sources, which literature?
  • Who conducted the surveys?
  • Which data formats and units were used?
  • When and where was the data collected?

It is recommended to use clear and meaningful names for the description and variables. A good starting point is fairsharing.org. If there are directories with metadata for your discipline, use them. If you cannot find a suitable schema, are unsure, or have further questions, we are happy to provide guidance.
Source: https://www.fdm.uni-hamburg.de/en/fdm/metadaten.html
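
The minimum questions above can be turned into a simple completeness check. The following Python sketch is only an illustration: the field names (origin, collection_method, and so on) and the example values are hypothetical and do not correspond to any particular metadata schema. For real projects, a discipline-specific schema is preferable.

```python
# Hypothetical field names, one per minimum question listed above.
REQUIRED_FIELDS = [
    "origin",                   # Where does the data come from?
    "collection_method",        # What data collection method was used?
    "instruments_and_sources",  # Which instruments, sources, literature?
    "collected_by",             # Who conducted the surveys?
    "formats_and_units",        # Which data formats and units were used?
    "collected_when",           # When was the data collected?
    "collected_where",          # Where was the data collected?
]


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]


# Invented example record; all values are placeholders.
example_record = {
    "origin": "Field measurements",
    "collection_method": "Manual sampling",
    "instruments_and_sources": "Handheld thermometer",
    "collected_by": "Project team",
    "formats_and_units": "CSV, degrees Celsius",
    "collected_when": "2023-05-01",
}

print(missing_fields(example_record))
# prints: ['collected_where']
```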

Our services

How can I get a consultation?

Feel free to contact us! Together, we can find a suitable consultation appointment for you.
Each research community and project is unique, with its own specific needs. Funders also have individual expectations and may require information on the handling of research data as early as the application stage. For this reason, personalized advice in the field of research data management (RDM) is particularly important.

We can assist you with various topics, including:

  • Data management plans
  • Proposal submission/external funding applications
  • Use of tools and methods
  • Data documentation/metadata
  • Publication and archiving (also in non-RUB infrastructures)
  • RDM policies and responsibilities
  • Guidance on legal issues (no legal advice!)

How can I get training?

Here you can find the schedule for our upcoming events and training sessions on research data management. If you wish to have a tailored training session on one or more RDM topics for your research project, please feel free to contact us. Additionally, we offer an introduction to various data management topics through a Moodle course (German).

Glossary

Archiving

Long-term archiving (LTA) is intended to ensure the long-term usability of data over an undefined period. However, in many academic disciplines, a ten-year retention period for research data has become established as the standard. Because this period is marked by constant technical and sociocultural change, the data must be reviewed regularly to ensure that they remain usable.

LTA aims to preserve the following properties of data:

  • authenticity
  • integrity
  • accessibility
  • comprehensibility

This includes both the provision of technical infrastructure and organizational measures, as well as the establishment of workflows and standards (legal issues, quality assurance).

Purely physical storage (bitstream preservation) is a strategy for preserving data in the state at the time of delivery (ingest). However, due to technological changes, data carriers, file formats, software, and storage locations quickly become inaccessible and unusable.

Backup

To prevent data loss, it is advisable to regularly create backups, preferably at a predetermined time.

  • Backups should be stored on a different medium and kept separate from the original data. This is especially important when the original data is stored in an external cloud environment.
  • A common rule of thumb to follow is the 3-2-1 rule. It states that all data should be backed up in triplicate on at least two different storage media. One of these copies should be stored offsite.
  • Additionally, it is recommended to include not only the data but also necessary software applications in the backup strategy.
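
The 3-2-1 rule can be expressed as a short check. The following Python sketch is a simplified illustration; the BackupCopy class, its fields, and the media names are assumptions made for this example and do not describe the RUB backup service.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    """One copy of the data (hypothetical structure for illustration)."""
    medium: str    # e.g. "external disk", "tape", "network drive"
    offsite: bool  # True if the copy is kept at a different location


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check three copies, two different media, and one offsite copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    BackupCopy(medium="network drive", offsite=False),
    BackupCopy(medium="external disk", offsite=False),
    BackupCopy(medium="tape", offsite=True),
]
print(satisfies_3_2_1(copies))
# prints: True
```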

RUB, in collaboration with IT.SERVICES, provides a backup service. Additional offerings, such as storage infrastructure and organizational tools for research data, are currently under development.
For more information on data security, please visit:


Creative Commons licences

In order to ensure maximum reusability of scientific research data, which may in principle be subject to copyright, the granting of additional rights of use may be considered, e.g. by licensing the data accordingly. The use of liberal licensing models, in particular the globally recognised Creative Commons (CC) licences, is one way of defining conditions for the subsequent use of published research data in a comprehensible manner.

Source:  https://forschungsdaten.info/praxis-kompakt/english-pages/glossary/#c403985

File formats

The file format (sometimes also referred to as file type) is generated during the storage of a file and includes information about the structure of the data within the file, its purpose, and affiliation. Using the information available in the file format, application programs can interpret the data and make the contents accessible. Typically, the format of a file can be identified by its corresponding extension appended to the actual file name, consisting of a dot and two to four letters.

Most file formats are designed for specific purposes and can be grouped based on certain criteria:

  • Executable files
  • System files
  • Library files
  • User files: image files (vector graphics [SVG, ...], raster graphics [JPG, PNG, ...]), text files, video files, etc.

With file formats, a further distinction is made between proprietary and open formats. Proprietary formats allow files to be opened, edited, and saved only with the corresponding application, utility, or system programs (e.g., .doc/.docx, .xls/.xlsx). Open formats (e.g., .html, .jpg, .mp3, .gif), on the other hand, allow files to be opened and edited with software from various manufacturers.
File formats can actively be changed through conversion during the saving process, but this may result in data loss. In the scientific domain, attention should be paid to compatibility, suitability for long-term archiving, and lossless conversion to alternative formats.
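
To get a quick overview of the formats in a data collection, file extensions can be evaluated automatically. The following Python sketch is an illustration only: it uses just the example extensions named above, and the function name classify_format is an assumption. A real check would need a much more complete list and should not rely on the extension alone.

```python
from pathlib import Path

# Example extensions taken from the text above; not a complete list.
PROPRIETARY_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx"}
OPEN_EXTENSIONS = {".html", ".jpg", ".mp3", ".gif"}


def classify_format(file_name: str) -> str:
    """Classify a file as 'proprietary', 'open', or 'unknown' by extension."""
    # Path.suffix returns the last extension including the dot, e.g. ".xlsx".
    extension = Path(file_name).suffix.lower()
    if extension in PROPRIETARY_EXTENSIONS:
        return "proprietary"
    if extension in OPEN_EXTENSIONS:
        return "open"
    return "unknown"


for name in ["results.XLSX", "figure.jpg", "README"]:
    print(name, "->", classify_format(name))
# prints:
# results.XLSX -> proprietary
# figure.jpg -> open
# README -> unknown
```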

For more information, please visit: forschungsdaten.info (German).

Digital Object Identifier (DOI)

A digital object identifier (DOI) is a permanently valid identifier that uniquely identifies digital objects, allowing them to be referenced. DOIs are particularly useful for citing, for example, articles or datasets published in a repository. They remain constant over the entire lifetime of the designated object.

  • The DOI system is managed by the International DOI Foundation. Another well-known system for persistent identification is the Uniform Resource Name (URN).

  • In general, a DOI consists of a prefix indicating the institution that assigned the DOI and a suffix separated by a "/" indicating the object itself (e.g. DOI 10.24352/UB.OVGU-2019-098).
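
The prefix/suffix structure can be illustrated with a few lines of Python. This is a minimal sketch, not a full DOI validator: the function name split_doi is an assumption, and the check that the prefix starts with "10." reflects how DOIs are generally formed rather than anything stated above. The example DOI from the text is written without whitespace.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first '/'."""
    # Only the first slash separates prefix and suffix;
    # the suffix itself may contain further slashes.
    prefix, separator, suffix = doi.strip().partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"Not a well-formed DOI: {doi!r}")
    return prefix, suffix


print(split_doi("10.24352/UB.OVGU-2019-098"))
# prints: ('10.24352', 'UB.OVGU-2019-098')
```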

Data management plan (DMP)

A data management plan (DMP) structures the handling of research data in a scientific project. It describes how the data are handled during the project and after it ends. Many third-party funding institutions (DFG, FWF, SNF, Horizon Europe, Volkswagen Foundation) expect information on the handling of research data to be included in funding proposals for certain funding lines. A formal DMP is required only in rare cases, most notably by the EU. Nevertheless, a DMP is useful for working on a research project. In particular, the current status and special features can be recorded in a DMP throughout the entire research data life cycle. It is therefore helpful for administration and for maintaining an overview.

Source: https://forschungsdaten.info/themen/informieren-und-planen/datenmanagementplan/

The Research Data Management Organizer (RDMO) is a tool for research data management and supports you in the creation of data management plans.

FAIR data

FAIR stands for Findable, Accessible, Interoperable, Reusable. The FAIR principles are designed to ensure sustainable research data management by preparing and storing data and associated metadata in a way that allows third parties to reuse them. The principles apply both to data storage itself and to infrastructures and services, aiming to make research more transparent and efficient.

Key to implementation is the provision of comprehensive metadata, persistent identifiers, and clear usage licenses, ensuring that the data is well-prepared for both human and machine use. For additional information and tips on implementation, see here (in German).


Research data

  • Research data encompasses all data generated, developed, or analysed in the course of scientific work.
  • This includes, but is not limited to, measurement data, laboratory values, audiovisual information, texts, survey data, objects from collections, or samples. Research data encompasses raw data as well as data in various stages of processing, up to publishable final products.
  • Documentation of data collection and processing in a research project is also part of research data.

This results in a discipline- and project-specific understanding of research data, with varying requirements for data preparation, processing, and management, which together are referred to as research data management.
For more detailed information, please refer to Forschungsdaten.info and Digitale Zukunft.


RDM

Research data management involves a range of measures for organizing, documenting, and storing all data used or generated in a research process. Structured measures can be taken at all stages of the data lifecycle to maintain the scientific validity of research data, preserve their accessibility for third-party analysis, and secure the chain of evidence. Research data management is applicable to all disciplines and encompasses various types of research data defined by the disciplinary context.

In addition to increasing the visibility of one's own data and associated research, research data management enables:

  • Improved data quality and preparation
  • Easier reuse by researchers or others
  • Contextualization – through linking different datasets in new, previously unknown contexts

Furthermore, sustainable research data management ensures compliance with requirements and standards from different disciplines, research funding, publishing bodies, and research ethics guidelines.

Licenses

It is important to find a license that is appropriate for the type of material being published. The requirement to properly attribute authors when reusing an article, poem, or essay is deeply embedded in the norms of scholarly practice and serves as a means for users to appreciate and understand, in context, which parts of a work are original.
However, with data, there are often very good reasons to waive the obligation of attribution. Several prominent data portals for cultural heritage, such as Europeana, only accept data that is made available under the Creative Commons Zero license (CC0). The better a work's metadata can be combined with other data (linked open data), the more useful it is. Therefore, it is advisable to use the CC0 license for metadata, as otherwise, among other reasons, the chain of attributions can become very long.
It is crucial to consider licensing early in each step of the scholarly process.

The following points should be taken into account:

  • Integrate the licensing of your research data into the publication processes or guidelines of your institution.
  • If research data are generated in a collaborative project, the project proposal should specify the license under which the data will be published. The German Research Foundation (DFG) explicitly recommends using CC-BY-SA for published texts in open access and CC0 for metadata.
  • For complete licensing, always provide the following information: the name of the rights holder, year of publication, and the license type.
  • Preference should be given to the use of open standard licenses.
  • Ensure that you have the rights to all the data you intend to publish.
  • Decide whether you want to allow commercial use of your data.
  • With Creative Commons licenses, it should be noted that they are non-exclusive, i.e. content can also be licensed under other licenses in addition to the CC license. However, it is advisable to avoid this to prevent legal conflicts.
  • Consider that different parts of your data collection may be subject to different licenses. Therefore, choose a separate license for metadata, controlled vocabularies, or digital objects/contents (images, full texts, audio contributions, videos, etc.) as well as databases and data from third parties.
  • Check current legal handouts for the humanities (German) and discuss your questions with interested colleagues if possible.

Source: https://forschungslizenzen.de/#lizenzen

Metadata

Metadata refers to all additional information that is necessary or useful for interpreting the actual data (such as research data) and that enables the (automatic) processing of research data by technical systems. Metadata is often referred to as 'data about data'.

It serves to categorize and characterize different information about digital objects:

  • Technical metadata, for example, includes details about data volume and data format and is crucial for long-term data storage.
  • Descriptive metadata (also known as content metadata) provides information about (e.g. scientific) data within digital objects, influencing their findability, referencing, and reusability.

Descriptive explanations, such as through an abstract, stored alongside (research) data, are also valuable. This includes indications of usage rights, equipment used, applied standards, especially when no associated publication is available.
Source: https://www.forschungsdaten.org/index.php/Metadaten

NFDI

The aim of the national research data infrastructure (NFDI) is to systematically manage scientific and research data, provide long-term data storage, backup and accessibility, and network the data both nationally and internationally. The NFDI will bring multiple stakeholders together in a coordinated network of consortia tasked with providing science-driven data services to research communities.
Source: https://www.dfg.de/en/research-funding/funding-initiative/nfdi

Open Data

Open Data refers to data that can be used, disseminated, and reused by third parties for any purpose, including information, analysis, or even further economic use. Restrictions of use are only allowed to preserve the origin and openness of knowledge; for example, CC-BY may be used to ensure attribution to the creator. The concept of open data is based on the idea that allowing free reuse promotes greater transparency and encourages collaboration.
More information about Open Data can be found here.

Open Science

Open Science is a frequently used but rarely specified term. It describes the approach of making not only the results of scientific work transparent but the entire process. Whenever possible, the results of publicly funded research should be made freely available worldwide on the internet, without legal or technical barriers, and made reusable. The more easily research results can be found and accessed, the better they can serve as a foundation for further research activities. The umbrella term Open Science covers strategies and practices that describe the transformation of research methodology, of organizational and content-related aspects of teaching, of publishing, of information and literature supply, and of the preservation of research data.
This includes the following areas:

  • Open Access, the free access to scientific publications
  • Open Data, the open and sustainable access to all research data
  • Citizen Science, the involvement of individuals from the public in the entire process of knowledge acquisition.

The open access to knowledge and science, including publications, data, and software, aims to achieve greater transparency, efficiency, visibility, and, consequently, an improvement in quality and an increase in trust in science and research. The establishment of Citizen Science projects at the university promotes a participatory research approach.

Source: TU Berlin

ORCID

The Open Researcher and Contributor ID (ORCID iD) is an internationally recognised persistent identifier that can be used to uniquely identify researchers. The ID is publisher-independent and can be used permanently by researchers for their scientific output, regardless of their affiliated institution. It consists of 16 digits, which are represented in four blocks of four (e.g. 0000-0002-2792-2625). The ORCID iD is established as an identifier at numerous publishers, universities and science-related institutions and is integrated into workflows, e.g. when reviewing journal articles.
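
The block structure described above can be checked programmatically. The following Python sketch is an illustration under two assumptions that go beyond this glossary entry: according to ORCID's own documentation, the last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, and it can be the letter X instead of a digit. The function name is_valid_orcid is hypothetical.

```python
import re


def is_valid_orcid(orcid: str) -> bool:
    """Check the four-blocks-of-four format and the final check character."""
    # Four hyphen-separated blocks; the last character may be a digit or X.
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    characters = orcid.replace("-", "")
    # ISO 7064 MOD 11-2 over the first 15 digits.
    total = 0
    for digit in characters[:-1]:
        total = (total + int(digit)) * 2
    check_value = (12 - total % 11) % 11
    expected = "X" if check_value == 10 else str(check_value)
    return characters[-1] == expected


print(is_valid_orcid("0000-0002-2792-2625"))
# prints: True
```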

Source: https://forschungsdaten.info/praxis-kompakt/english-pages/glossary/#c403990

Persistent Identifier (PID)

To be able to find, identify, and cite electronically published data of any kind reliably and permanently, you need a persistent identifier. A persistent identifier is a unique label for a specific digital object and remains constant, even if the name or location of a publication changes.
Examples of persistent identifiers include URN (Uniform Resource Name), DOI (Digital Object Identifier), ORCID, Researcher ID (Thomson Reuters), Scopus Author ID, GND number (Gemeinsame Normdatei), Google Scholar Citations profile, ISNI (International Standard Name Identifier), ISBN and ISSN.
Digital Object Identifiers (DOIs) make online publications citable.
In contrast to the more transient URL, the DOI serves as a persistent identifier. The university library assigns DOIs to digital objects (e.g., research materials, articles, digitized content, images) belonging to university members.
Source: Uni Bamberg

RDMO

RDMO is a tool for creating data management plans. The aim of RDMO is to plan, control and document the handling of data in your scientific project in a structured way. In addition, the collected information can be provided in the form of a report or data management plan. RDMO thus simplifies the submission of applications to research funding organisations such as the EU, DFG and BMBF.
If you have any questions or problems with RDMO UA Ruhr, please refer to the instructions or contact us directly.

Repository

A repository is a storage location for digital objects. In addition to repositories for software and text documents, there are also repositories for research data specifically. These repositories are used for publishing and usually for archiving data as well. Most data repositories collect metadata in a searchable database and offer the option of generating a permanent identifier (e.g. a DOI) and issuing a license when uploading a file. Repositories are either public or accessible to a restricted group of users only.


Versioning

Versioning allows tracking changes and, if necessary, reverting them. This is particularly advantageous when files are regularly edited or updated. Recommended methods for versioning include proper file naming conventions or the use of version control software such as Git. An example of versioning through naming is three-tier versioning: Major.Minor.Revision (e.g., 1.0.0). Versioning should be applied both during the research process itself, for marking different working versions of data, and after subsequent modifications to already published research datasets. This allows users to cite the correct version of a research dataset.
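
How a Major.Minor.Revision number changes over time can be sketched in Python. This is only an illustration: the function name bump is an assumption, and resetting the lower levels to zero when a higher level is increased is a common convention rather than something prescribed here.

```python
def bump(version: str, level: str) -> str:
    """Increase one level of a 'Major.Minor.Revision' version string."""
    major, minor, revision = (int(part) for part in version.split("."))
    if level == "major":
        # A major change resets minor and revision (assumed convention).
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "revision":
        return f"{major}.{minor}.{revision + 1}"
    raise ValueError(f"Unknown level: {level!r}")


print(bump("1.0.0", "revision"))  # prints: 1.0.1
print(bump("1.0.1", "minor"))     # prints: 1.1.0
print(bump("1.1.0", "major"))     # prints: 2.0.0
```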

