Best Practices for Managing Your Data - Brandeis University

The Many Lives of Research Data: A Discussion on Organizing, Preserving & Sharing

Gina Bastone and Melanie Radik

Based on material created for the New England Collaborative Data Management Curriculum by:
o Andrew Creamer, UMass Medical School
o Donna Kafel, UMass Medical School
o Elaine Martin, UMass Medical School
o Regina Raboin, Tufts University

Generously funded by NLM grant HHS-N-276-2011-00010-C

CC BY-NC

Why Manage Data?

And yet, data is the currency of science, even if publications are still the currency of tenure. To be able to exchange data, communicate it, mine it, reuse it, and review it is essential to scientific productivity, collaboration, and to discovery itself (Gold 2007).

Module 1: Overview of Research Data Management

Types of Research Data
o Observational
o Qualitative
o Experimental
o Simulation data
o Derived or compiled data

3 Good Reasons for Managing Your Data

Personal research advantages
o Avoid duplication of efforts
o Find older data quickly
o Increase citation impact and discoverability of your research
o Avoid errors that get you mocked by Colbert

Transparency & integrity
o Defend against publication challenges
o Patent & copyright security

Compliance
o IRB
o Funding agencies' requirements
o Publishers' requirements

Emerging Federal Requirements

The Administration is committed to ensuring that the direct results of federally funded scientific research are made available to and useful for the public, industry, and the scientific community. Such results include peer-reviewed publications and digital data (Holdren 2013).

Data Management Issues

Issue #1: Responsibility
o Challenges of Team Science and collaborating across institutions
o Challenges managing research notes: laboratory notebooks, interview and survey products, etc.
o Challenges with rotating research personnel

Best Practices
o Define roles and assign responsibilities for data management
o For each task identified in your data management plan, identify the skills needed to perform the task
o Match skills needed to available staff and identify gaps
o Develop training plans for continuity
o Assign responsible parties and monitor results

Applying Best Practices
o There may be policies or laws that affect who should be responsible for different data management tasks within your research team: consult your subject librarian for help determining whether any apply to your research.
o Contact your subject librarian for assistance, resources, and tools to better manage the information in your paper and/or electronic laboratory notebooks.
o Librarians can also help you to catalog, organize, preserve and archive your laboratory notebooks and/or research notes.

Issue #2: Data Management Plans (DMPs)
o What types of data will be created?
o Who will own, have access to, and be responsible for managing these data?
o What equipment and methods will be used to capture and process data?
o Where will data be stored during and after the project?

Funder DMP vs the Life Cycle of a Project

Issue #3: File Management

Inconsistently labeled files
o in multiple versions
o inside poorly structured folders
o stored on multiple media
o in multiple locations
o and in various formats

Slide Credit: Jen Ferguson 2013

Best Practices
o Avoid special characters in a file name
o Use capitals or underscores instead of periods or spaces
o Use 25 or fewer characters
o Use documented & standardized descriptive information about the project/experiment
o Use date format ISO 8601: YYYYMMDD
o Include a version number

Applying Best Practices
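As one way of applying the file-naming practices listed above, the short Python sketch below builds a name from its parts and checks an existing name against the rules. The part order (project, description, date, version), the function names, the choice to allow hyphens, and the choice to count the 25 characters without the extension are all assumptions made for illustration; the slides do not prescribe them.

```python
import re
from datetime import date

MAX_LENGTH = 25  # "Use 25 or fewer characters" (assumed to exclude the extension)
# Assumption: letters, digits, underscores, and hyphens are the only safe characters.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+$")


def build_name(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Return project_description_YYYYMMDD_vNN.ext (a hypothetical convention)."""
    stem = "_".join([project, description, day.strftime("%Y%m%d"), f"v{version:02d}"])
    return f"{stem}.{ext}"


def check_name(filename: str) -> list[str]:
    """List the naming rules that a file name appears to break."""
    stem, dot, _ext = filename.rpartition(".")
    if not dot:  # no extension present, so the whole name is the stem
        stem = filename
    problems = []
    if not ALLOWED.match(stem):
        problems.append("special characters, spaces, or periods in name")
    if len(stem) > MAX_LENGTH:
        problems.append("name longer than 25 characters")
    # Only the shape of the date (eight digits) is checked, not its validity.
    if not re.search(r"(?<!\d)\d{8}(?!\d)", stem):
        problems.append("no YYYYMMDD date")
    if not re.search(r"v\d+", stem, re.IGNORECASE):
        problems.append("no version number")
    return problems
```

For example, `build_name("SoilPH", "Run3", date(2013, 5, 1), 2, "csv")` gives `SoilPH_Run3_20130501_v02.csv`, which passes `check_name`, while a name such as `final data (new).xlsx` is flagged for special characters, a missing date, and a missing version number.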

Librarians can help you with resources and tools for:
o Creating file naming conventions
o Creating directory structure naming conventions
o Versioning your files
o Choosing appropriate file formats for preserving and sharing your data files

Issue #4: Metadata
o How will someone make sense of your data, e.g. the cells and values of your spreadsheet?
o What best practices or disciplinary standards could be used to label your data?
o How can you describe a data set to make it discoverable?

Metadata for Data

Best Practices
o Describe the contents of data files
o Define the parameters and the units of each parameter
o Explain the formats for dates, time, geographic coordinates, and other parameters
o Define any coded values
o Describe quality flags or qualifying values
o Define missing values

Best Practices: Fields
o Title
o Creator
o Identifier
o Subject
o Funders
o Rights
o Access information
o Language
o Dates
o Location
o Methodology
o Data processing
o Sources
o List of file names
o File formats
o File structure
o Variables
o Code lists
o Version
o Checksums

Applying Best Practices

Librarians can help you with locating metadata standards for creating a data dictionary, such as the Clinical Trials Protocol Data Elements Definitions used by the FDA or the important metadata elements identified by the ICPSR. We can also help you to locate disciplinary and general metadata standards and resources for annotating and describing your data and data files, such as DDI, used in population research, or Dublin Core, which is a general standard that is widely used.
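The metadata practices above (define units, coded values, quality flags, and missing values) can be captured in a small data dictionary kept alongside the data. The sketch below is a minimal illustration in Python; the column names, codes, and the `-999` missing-value marker are invented for the example and do not come from any of the standards named above.

```python
# Hypothetical data dictionary for a small spreadsheet of field samples.
DATA_DICTIONARY = {
    "sample_id": {"description": "Unique sample identifier"},
    "collected": {"description": "Collection date", "format": "YYYYMMDD (ISO 8601)"},
    "temp": {
        "description": "Water temperature",
        "units": "degrees Celsius",
        "missing": "-999",  # value used when no reading was taken
    },
    "site": {"description": "Sampling site", "codes": {"N": "north bank", "S": "south bank"}},
    "qc": {"description": "Quality flag", "codes": {"0": "good", "1": "suspect"}},
}


def describe(row: dict[str, str]) -> dict[str, str]:
    """Translate one raw row into readable values using the data dictionary."""
    readable = {}
    for column, value in row.items():
        entry = DATA_DICTIONARY[column]
        if value == entry.get("missing"):
            readable[column] = "missing"
        elif "codes" in entry:
            # Coded values are looked up; unknown codes are reported, not guessed.
            readable[column] = entry["codes"].get(value, f"undefined code {value!r}")
        elif entry.get("units"):
            readable[column] = f"{value} {entry['units']}"
        else:
            readable[column] = value
    return readable
```

A row such as `{"sample_id": "A1", "collected": "20130501", "temp": "-999", "site": "N", "qc": "1"}` would then read as a missing temperature at the north bank with a suspect quality flag, which is the kind of interpretation a future reader cannot make from the raw cells alone.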

Issue #5: Backing Up and Securing Data
o How often should data be backed up?
o How many copies of data should you have?
o Where can you store your data?
o How much server space can I get?

Slide Credit: Moore 2013

Best Practices
o Make 3 copies (original + external/local + external/remote)
o Have them geographically distributed (local vs. remote)
o Use a hard drive (e.g. Vista backup, Mac Time Machine, UNIX rsync) or tape backup system
o Cloud storage: some examples of private sector storage resources include Amazon S3, Elephant Drive, Jungle Disk, Mozy, and Carbonite
o Unencrypted is ideal for storing your data because it will make it most easily read by you and others in the future, but if you do need to encrypt your data because of human subjects, then keep passwords and keys on paper (2 copies) and in a PGP (Pretty Good Privacy) encrypted digital file
o Uncompressed is also ideal for storage, but if you need to compress to conserve space, limit compression to your 3rd backup copy

Applying Best Practices

Consult Research Technology: [email protected]
o On-campus Research Omega storage provisioning up to multi-TB
o Personal server hardware and set-up advice
o Cloud storage and backup solutions
o Other backup advice: e.g., external hard drive recommendations

Consult Information Security: [email protected]
o IRB & other compliance implementation
o Setting encryption and permissions
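Keeping three copies only helps if the copies are actually identical to the original. One common way to spot-check this, sketched below in Python, is to compare SHA-256 checksums. The slides do not prescribe a verification method, so the function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: list[Path]) -> dict[str, bool]:
    """Report, for each backup copy, whether it exists and matches the original."""
    expected = sha256_of(original)
    return {
        str(copy): copy.exists() and sha256_of(copy) == expected
        for copy in copies
    }
```

Calling `copies_match` with the original file and the paths of the local and remote copies returns a mapping from each copy's path to `True` or `False`, so a missing or silently corrupted copy shows up as `False`.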

Issue #6: Ownership and Retention
o Intellectual Property Policy
o IRB data retention policy
o Funders' data retention policy
o Publishers' data retention policy
o Federal and State laws

Retention Best Practices: It Depends
o IRB OHRP Requirements: 45 CFR 46 = 3 years from completion
o HIPAA Requirements: minimum of 6 years from date of signed authorization
o FDA Requirements: 21 CFR 312.62(c) = 2 years following the date a marketing application is approved for the drug, or the investigation is discontinued and FDA is notified
o VA Requirements: At present, records for any research that involves the VA must be retained indefinitely per VA federal regulatory requirements.
o Intellectual Property Requirements: Any research data used to support a patent must be retained for the life of the patent in accordance with Intellectual Property Policy.
o Questions of data validity: If there are questions or allegations about the validity of the data or appropriate conduct of the research, you must retain all of the original research data until such questions or allegations have been completely resolved.
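When more than one of the fixed-term requirements above applies, the data must be kept until the latest deadline. The Python sketch below illustrates that arithmetic for the three fixed-term rules only (OHRP, HIPAA, FDA); the rule table, event names, and function names are hypothetical, and the open-ended cases (VA research, patents, unresolved validity questions) are deliberately not modeled.

```python
from datetime import date

# Fixed-term rules from the list above: (triggering event, years to retain).
# The event names are assumptions made for this example.
RULES = {
    "OHRP": ("study_completed", 3),
    "HIPAA": ("authorization_signed", 6),
    "FDA": ("marketing_approved_or_discontinued", 2),
}


def add_years(day: date, years: int) -> date:
    """Add whole years to a date, mapping Feb 29 to Feb 28 when needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:  # Feb 29 and the target year is not a leap year
        return day.replace(year=day.year + years, day=28)


def earliest_disposal(events: dict[str, date], applicable: list[str]) -> date:
    """Return the latest retention deadline among the applicable rules."""
    deadlines = []
    for rule in applicable:
        event, years = RULES[rule]
        deadlines.append(add_years(events[event], years))
    return max(deadlines)
```

For a study completed on 2013-06-30 whose authorization was signed on 2011-02-01, the OHRP deadline would be 2016-06-30 and the HIPAA deadline 2017-02-01, so the later date governs.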

Applying Best Practices
o Check your funder or publisher requirements
o If these are unclear, contradictory, or absent, consult the University Records Manager, George Despres: [email protected]
o Or consult our Records Management Guide: http://brandeis.libguides.com/URM

Issue #7: Long-Term Planning
o What will happen to my data after my project ends?
o How can I appraise the value of my data?
o What are my options for archiving and preserving my data?
o What are my options for publishing and sharing data?

Importance of Formats

Slide Credit: Jen Ferguson 2013

Best Practices
o Is the file format open (i.e. open source) or closed (i.e. proprietary)?
o Is a particular software package required to read and work with the data file? If so, the software package, version, and operating system platform should be cited in the metadata.
o Do multiple files comprise the data file structure? If so, that should be specified in the metadata.
o When choosing a file format, select a consistent format that can be read well into the future and is independent of changes in applications.
o Non-proprietary: open, documented standard, unencrypted, uncompressed, ASCII-formatted files will be readable into the future.

Applying Best Practices

Librarians can help you to appraise your data and plan for the long-term preservation of your research data. This includes:
o Creating a DOI and persistent ID for maximizing discoverability of your data and measuring its citation impact
o Locating file formats suitable for long-term preservation
o Locating and submitting data to a suitable data repository
o Choosing metadata standards for increased discoverability
o Help with publishing and sharing your data

For More Information:
o For storage and backup solutions, please send a request to Ian Roy's research technology support team at [email protected]
o For security concerns, particularly for HIPAA and other personal data, please consult Mike Corn's campus security group at [email protected]
o For retention requirements and file destruction best practices, please consult the University Records Manager, George Despres: [email protected]
o For help drawing up a DMP, setting up file management workflows, identifying more stable file formats, more complete metadata descriptions, appropriate data repositories and more, please contact your subject librarian.

Works Cited
o DataONE. 2013. Best Practices for Data Management. http://www.dataone.org/best-practices.
o MIT Libraries. 2013. Data Management and Publishing. MIT. http://libraries.mit.edu/guides/subjects/data-management/index.html.
o Office of Research Integrity. 2013. Data Management. United States Department of Health and Human Services. United States Federal Government. http://ori.hhs.gov/education/products/rcradmin/topics/data/open.shtml.

Special thanks to Jen Ferguson, Richard Moore and Glenn Gaudette for permission to use their slides.
