As part of planning their research design, researchers must develop a data protection plan that adequately protects their data from unintended disclosure or possible theft. Researchers are required to submit a data protection plan as part of their IRB protocol. To support the development of data protection plans, the guidelines define a three-level categorization system for research data and describe the minimum data protections recommended for each level. These storage technologies and data handling practices are designed to be consistent with Princeton University’s standards for protecting student records and other types of administrative data. A researcher can always elect to use more secure methods of data management than the minimum recommended (for example, one could manage level 1 data as if they were level 3 data). However, the benefits of additional layers of security beyond the minimum recommended need to be weighed against the additional burdens they can impose on members of the research team.
Level 1 - Benign information about individually identifiable people
Level 1 data contain personally identifiable information (PII) on human subjects who have been given an assurance of confidentiality. Level 1 data files do not contain sensitive information but need some protection because of that assurance. Accidental or unintended disclosure is unlikely to result in harm to the study subjects. The risks to the research subject may be considered no greater than those associated with everyday life.
Level 2 - Sensitive information about individually identifiable people
Level 2 data include individually identifiable information that, if disclosed, could reasonably be expected to present a non-minimal risk of civil liability, moderate psychological harm, or material social harm to individuals or groups. The risks to the research subject may be considered greater than those associated with everyday life.
Level 3 - Very sensitive information about individually identifiable people
Level 3 data include individually identifiable information that could cause significant harm to an individual if exposed, including, but not limited to, serious risk of criminal liability, serious psychological harm or other significant injury, loss of insurability or employability, or significant social harm to an individual or group. The risks to the research subject may be considered greater than those associated with everyday life.
As a general practice, PII that are needed for project management but not needed for analysis should be separated from the data to be used for analysis at the earliest possible phase of the project. In practice, this usually means splitting the data into two files: one containing all of the PII not needed for the analysis and a unique ID variable, the other containing the same unique ID variable and all of the data collected for the analysis. The common identifier in both data sets enables the researcher to re-link the PII and non-PII data at a future date, as the needs of the project may require.
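The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function names `split_pii` and `relink`, the `study_id` field name, and the sample field names are all hypothetical, and records are assumed to be simple dictionaries already loaded into memory.

```python
import secrets


def split_pii(records, pii_fields, id_field="study_id"):
    """Split each record into a PII row and an analysis row that share a random ID.

    `id_field` is a hypothetical name for the unique ID variable.
    """
    pii_rows, analysis_rows = [], []
    used_ids = set()
    for record in records:
        if id_field in record:
            raise ValueError(f"record already has a field named {id_field!r}")
        # Use a random ID so the identifier itself reveals nothing about the person.
        new_id = secrets.token_hex(8)
        while new_id in used_ids:
            new_id = secrets.token_hex(8)
        used_ids.add(new_id)
        pii_part = {k: v for k, v in record.items() if k in pii_fields}
        analysis_part = {k: v for k, v in record.items() if k not in pii_fields}
        pii_rows.append({id_field: new_id, **pii_part})
        analysis_rows.append({id_field: new_id, **analysis_part})
    return pii_rows, analysis_rows


def relink(pii_rows, analysis_rows, id_field="study_id"):
    """Re-join the two data sets on the common identifier."""
    pii_by_id = {row[id_field]: row for row in pii_rows}
    return [{**pii_by_id[row[id_field]], **row} for row in analysis_rows]


# Illustrative records; the field names are assumptions, not from the guidelines.
records = [
    {"name": "A. Example", "email": "a@example.org", "score": 3},
    {"name": "B. Example", "email": "b@example.org", "score": 5},
]
pii, analysis = split_pii(records, pii_fields={"name", "email"})
relinked = relink(pii, analysis)
```

In this sketch, `pii` and `analysis` would then be written to two separate files, so that day-to-day analysis touches only the file without PII, while `relink` shows how the common identifier allows the two to be rejoined if the project later requires it.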
Removal of all PII (temporarily or permanently) significantly reduces the risk of harm to study participants, but does not entirely eliminate the potential for harm that can result from loss, theft or unintended disclosure. Therefore, even when working with de-identified data, researchers should continue to use the minimum data protection standards outlined below.