RDM Data Documentation & Storage


It is important to document and describe your datasets thoroughly for any possible future users. When collecting data, it is also important to think about which file formats you will store the data in, and how you will keep the data secure.

Which file formats are preferred?

File formats can become obsolete, making it difficult for anyone to retrieve the data in the future (think of laserdiscs!). It is important to plan for future readability as much as possible. In most cases, this means using the simplest file format that is not proprietary, i.e., not tied to a particular piece of software. Open file formats usually fall into this category. In all cases, the simpler the file format, the more likely it is to remain readable in the future.
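As one illustration of choosing a simple, open format, the sketch below saves tabular data as a plain UTF-8 CSV file using only Python's standard library. The function names, column names, and values are hypothetical and chosen only for this example; CSV is one possible open format, not the only suitable one.

```python
import csv
from pathlib import Path


def write_open_csv(rows, fieldnames, path):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_open_csv(path):
    """Read the CSV back as a list of dicts (all values come back as text)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Hypothetical measurements, used only to show the round trip.
    rows = [{"site": "A", "temp_c": "4.5"}, {"site": "B", "temp_c": "5.1"}]
    saved = write_open_csv(rows, ["site", "temp_c"], "example_data.csv")
    print(read_open_csv(saved))
```

Because the result is plain text, it can be opened in any text editor or spreadsheet program without the software that created it.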

What kind of description is useful?

Consider adding a plain-text README file to your dataset that describes the dataset itself. You can include definitions of special terms, descriptions of folders and file formats, citations, the steps taken to clean the data, and more.
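A README can be written by hand, but a small script can help keep its layout consistent across datasets. The following is a minimal sketch: the function name, the section headings, and the example values are all assumptions made for illustration, not a required template.

```python
def build_readme(title, description, folders=None, file_formats=None,
                 terms=None, cleaning_steps=None, citation=""):
    """Return README text for a dataset as one plain-text string."""
    sections = [
        ("DATASET", [title]),
        ("DESCRIPTION", [description]),
        ("FOLDERS", [f"{name}: {what}" for name, what in (folders or {}).items()]),
        ("FILE FORMATS", list(file_formats or [])),
        ("DEFINITIONS", [f"{term}: {meaning}" for term, meaning in (terms or {}).items()]),
        ("DATA CLEANING", list(cleaning_steps or [])),
        ("CITATION", [citation] if citation else []),
    ]
    lines = []
    for heading, entries in sections:
        if not entries:
            continue  # skip sections with nothing to report
        lines.append(heading)
        lines.extend(f"  {entry}" for entry in entries)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset details, used only to show the output layout.
    text = build_readme(
        title="Example river temperature survey",
        description="Hourly water temperatures from two sites.",
        folders={"raw": "unedited sensor exports", "clean": "processed CSV files"},
        file_formats=["CSV (UTF-8)"],
        terms={"temp_c": "water temperature in degrees Celsius"},
        cleaning_steps=["Removed duplicate rows"],
    )
    with open("README.txt", "w", encoding="utf-8") as handle:
        handle.write(text)
```

Sections with no entries are left out, so the same function works for small and large datasets.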

In addition to a README file, you will need to add metadata.

How do I add metadata?

Metadata is the information that describes your data set. Properly describing your data helps make them accessible and usable over time.

Common metadata elements include:

- Title - name of the dataset

- Creator - author/ researcher

- Subject - using standard language from your discipline

- Description - including the how & why

- Abstract, Type, Date, License, etc.

It is useful to be as descriptive as possible in your metadata, and to provide keywords that give context on how your data may be useful. Even if your data will not be publicly available, good metadata is important for your own use during the research process, and for any collaborators you may have.
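The common elements listed above can be recorded in a simple machine-readable file stored alongside the data. The sketch below is illustrative only: the element names mirror the list above, but the choice of which elements to treat as required, the JSON layout, and the example values are assumptions, and a repository may expect different fields.

```python
import json

# Elements treated as required in this sketch (an assumption, not a standard).
REQUIRED_ELEMENTS = ("title", "creator", "subject", "description",
                     "type", "date", "license")


def missing_elements(record):
    """Return the required elements that are absent or empty in the record."""
    return [name for name in REQUIRED_ELEMENTS if not record.get(name)]


# Hypothetical metadata record for an example dataset.
record = {
    "title": "Example river temperature survey",
    "creator": "A. Researcher",
    "subject": ["hydrology", "water temperature"],
    "description": "How and why the data were collected.",
    "type": "Dataset",
    "date": "2020-06-30",
    "license": "CC BY 4.0",
}

# Serialize the record so it can be saved next to the data files.
metadata_json = json.dumps(record, indent=2)

if __name__ == "__main__":
    print(missing_elements(record))
    print(metadata_json)
```

Running a check like `missing_elements` before deposit is one way to catch empty fields early.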

Consider how you will organize and label your data with metadata as part of your Data Management Plan. If you plan to deposit your data in a repository like WinnSpace, look at the metadata fields the repository uses. Doing this early in the research process will make it easier to deposit your data.

Data Storage and Security

When storing data, always consider how losing the data would affect the study and anyone involved in it. Plan ahead to minimize the effects of data loss or destruction.

To prevent the accidental destruction of data, we recommend the 3-2-1 backup strategy:

  • 3 total copies of your data on
  • 2 different devices
  • with at least 1 copy offsite
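The rule above can be expressed as a short check. This is a sketch under stated assumptions: the `Copy` class and `satisfies_3_2_1` function are hypothetical names, the numbers are read as minimums (at least 3 copies, at least 2 devices, at least 1 offsite), and the device labels are examples only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of the dataset."""
    device: str            # label for the device holding this copy
    offsite: bool = False  # True if the copy is kept at another location


def satisfies_3_2_1(copies):
    """Return True if the copies meet the 3-2-1 rule, read as minimums."""
    total = len(copies)
    devices = {copy.device for copy in copies}
    offsite = sum(1 for copy in copies if copy.offsite)
    return total >= 3 and len(devices) >= 2 and offsite >= 1


if __name__ == "__main__":
    # Hypothetical backup plan: laptop, external drive, and an offsite server.
    plan = [
        Copy("laptop"),
        Copy("external drive"),
        Copy("remote server", offsite=True),
    ]
    print(satisfies_3_2_1(plan))
```

A plan with three copies on one device, or with no offsite copy, fails the check.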

The University of Winnipeg has Data Protection Classifications and Requirements that you should follow in order to safeguard your data.

Your network space is available from anywhere with WebFiles. Please contact TSC to discuss additional storage needs and costs.

Safe and Secure Collaboration

Sharing sensitive data on commercial platforms (including free solutions such as Dropbox) can create security risks.

Compute Canada has two preferred solutions for secure and private data sharing, available to any researcher with Tri-Agency funding, as well as more Advanced Research Computing solutions. You will first need to sign up for an account with Compute Canada (approval may take up to 2 business days, and the Principal Investigator should submit the application), and then sign up for the solution better suited to your needs.

  • OwnCloud - 50 GB of secure collaboration space, provided by WestGrid via Compute Canada.
  • Globus - larger data sharing service (terabytes of data).

 
