RDM Glossary

NOTE: For a generalized and comprehensive research data management glossary, visit the Research Data Management Glossary, a newly developed project between CODATA (the Committee on Data of the International Council for Science) and CASRAI (the Consortia Advancing Standards in Research Administration Information).

Cloud Services

A method of storing and sharing data by keeping it on remote servers accessed over the Internet. Cloud services are maintained, operated, and managed by a cloud service provider on storage servers. Cloud services can be public or private. Public cloud services include Google Drive, Dropbox, iCloud, and OneDrive. UWinnipeg endorses private cloud services including Nextcloud and ownCloud. While any use of cloud services comes with some inherent risk, the risks for public and private services are different. The main differences include server location, server control, and attack surface. With public cloud storage, data is stored on servers that could be anywhere in the world and is thus subject to the laws of the country where those servers are located. With private cloud services, your data is stored on local servers. Private companies control public cloud services and the data that is stored there. Finally, public cloud services have sprawling infrastructure with many different points where an unauthorized user could attempt to extract data; in some cases, private services are less open to such attacks.

Data Lifecycle

The data lifecycle refers to all of the stages in the existence of data from collection to destruction. A lifecycle view is used to enable active management of the data over time, thus maintaining security, accessibility, and utility.

Data Management Plan (DMP)

A DMP is a formal statement describing how research data will be managed and documented throughout a research project. Almost all DMPs contain the following core elements: metadata, policies for access and sharing, policies for re-use and redistribution, and plans for archiving, preservation, and destruction. UWinnipeg encourages the use of DMP Assistant by Portage, a bilingual tool for preparing DMPs that follows best practices in data stewardship and walks researchers step-by-step through key questions about data management.

Data Security

Ways of keeping data safe so that the researcher can access the data when needed (data availability), the data are not altered (data integrity), the confidentiality of the data is preserved (data confidentiality), and the data are carefully preserved and disposed of appropriately (retention/destruction). Learn more about how to keep your data secure through the Research Data Security FAQ.
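
One common way to check data integrity is to record a checksum when a file is stored and to compare it again later. The following is a minimal sketch using Python's standard hashlib module; the function names are illustrative and are not part of any UWinnipeg tool or policy.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path, recorded_digest):
    """Compare the file's current digest with one recorded earlier."""
    return sha256_of_file(path) == recorded_digest
```

A matching digest indicates that the file contents have not changed since the digest was recorded. It says nothing about availability or confidentiality, which need separate measures.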

Data Sharing

The practice of making data available for reuse. This may be done, for example, by depositing the data in a repository or through data publication.

De-Identification

The act of minimally changing individual-level data to decrease the probability of discovering an individual’s identity. It involves masking direct identifiers (e.g., name, phone number, address) as well as transforming indirect identifiers that could be used alone or in combination to identify an individual (e.g., birth dates, geographic details, dates of key events). If done correctly, de-identification provides assurance that there is a very small risk of re-identification for any data that are released.
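
The two steps described above can be sketched in Python as follows. This is only an illustration: the field names (name, phone, address, birth_date, postal_code) are hypothetical, direct identifiers are removed outright as a strict form of masking, and the chosen transformations (birth year only, first three characters of the postal code) are assumptions rather than a recommended standard.

```python
# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def de_identify(record):
    """Return a copy of a record with direct identifiers removed
    and indirect identifiers made less specific."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace a full birth date (assumed YYYY-MM-DD) with the birth year only.
    if "birth_date" in cleaned:
        cleaned["birth_year"] = int(cleaned.pop("birth_date")[:4])
    # Keep only the first three characters of the postal code.
    if "postal_code" in cleaned:
        cleaned["postal_code"] = cleaned["postal_code"][:3]
    return cleaned
```

Running a function like this does not by itself establish that re-identification risk is very small; that depends on the whole dataset and on what other information a recipient may hold.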

Deletion

The process of destroying data stored on hard disks, mobile devices and other forms of electronic media so that it is completely unreadable and cannot be accessed or used.

Encryption

Encryption is a method of encoding your data so that only you, or someone you authorize, can access it. The Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans states that “in general, identifiable data obtained through research that is kept on a computer and connected to the Internet should be encrypted.” There are a couple of different methods for encrypting your data, including encrypting individual files and encrypting entire devices. Both options have pros and cons.

Metadata

Literally, "data about data"; data that defines and describes the characteristics of other data, used to improve both understanding of data and data-related processes. Business metadata includes the names and business definitions of subject areas, entities and attributes, attribute data types and other attribute properties, range descriptions, valid domain values and their definitions. Technical metadata includes physical database table and column names, column properties, and the properties of other database objects, including how data is stored. Process metadata is data that defines and describes the characteristics of other system elements (processes, business rules, programs, jobs, tools, etc.). Data stewardship metadata is data about data stewards, stewardship processes and responsibility assignments.

Anonymized Data

Data that could not lead to the identification of a specific individual, to distinguishing one person from another, or to personally identifiable information. These may be data that have been de-identified, or that could not lead to personally identifiable information in the first place.

Online Survey Software

Online survey software provides questionnaires that research subjects can complete over the Internet. It usually consists of Web forms along with a database to store the answers and statistical software to provide analytics.

Personal Health Information (PHI)

Personal Health Information (PHI) is personally identifiable information related to an individual’s health care. This may include identifying information about an individual that relates to their physical or mental health; that consists of the health history of their family; that relates to payments or eligibility for health care; or that includes the individual’s health number.

Personally Identifiable Information (PII)

Personally identifiable information (PII) is information that can be used to uniquely identify, contact, or locate a single person or can be used with other sources to uniquely identify a single individual. Data are identifiable if the information contains the name of an individual, or other identifying items such as birth date, address or geocoding. Data will be identifiable if the information contains a unique personal identifier and the holder of the information also has the master list linking the identifiers to individuals. Data may also be identifiable because of the number of different pieces of information known about a particular individual. It may also be possible to ascertain the identity of individuals from aggregated data where there are very few individuals in a particular category. Identifiability is dependent on the amount of information held and also on the skills and technology of the holder. 
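
The point about aggregated data with very few individuals in a category can be checked mechanically. The sketch below counts how many records share each combination of values and reports the combinations that fall under a threshold. The function name, the example field names, and the default threshold of 5 are assumptions for illustration, not a UWinnipeg requirement.

```python
from collections import Counter


def small_cells(records, keys, threshold=5):
    """Return the combinations of key values shared by fewer than
    `threshold` records, with the number of records in each."""
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Hypothetical example: six people in one category, one person in another.
records = [{"dept": "A", "age_band": "30-39"}] * 6 + [
    {"dept": "B", "age_band": "60-69"}
]
flagged = small_cells(records, ["dept", "age_band"])
```

Here flagged contains only the ("B", "60-69") combination, because a single individual falls into that category and could be singled out from a published table.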

Portable Storage Device

A portable storage device is any device or medium that is easily transportable and on which information can be stored. This definition is not restricted to purpose-built storage devices such as CDs/DVDs, removable hard drives, and USB flash drives; it may also include laptop computers, tablets, smartphones, PDAs, and any other portable computing device. Portable storage devices may or may not be connected to the Internet, and different data security measures will apply in each case.

Public Facing

A public-facing resource accepts anonymous connection requests from any public Internet Protocol (IP) address. In other words, it is an externally accessible resource that the public can access.

Raw Data

Data that have not been processed for meaningful use. Although raw data have the potential to become "information," they require selective extraction, organization, and sometimes analysis and formatting for presentation. Raw data have yet to be de-identified; if there is any stage of the data lifecycle at which your data will contain PII, it is this stage.

Research Data Management (RDM)

The active organization and maintenance of data throughout the research cycle to increase efficiency and enable reusability of the data products.

Adapted with permission from Chandra Kavanagh.