There are several factors to consider when designing a plan for storing and protecting research data. In all cases, researchers should develop a data protection plan that addresses these four priorities:
- Protecting research subjects from harm that might result from unintended disclosure or inappropriate use of confidential data
- Upholding the researcher’s assurance of confidentiality
- Adhering to requirements specified in any restricted use agreements
- Using optimal storage and use technologies that protect data securely without imposing unwarranted or excessive burdens on researchers
As part of planning their research design, researchers must develop a data protection plan that adequately protects their data from unintended disclosure or possible theft. Researchers are required to submit a data protection plan as part of their IRB protocol. To support the development of data protection plans, the guidelines define a three-level categorization system for research data and describe the minimum data protections recommended for each level. These storage technologies and data handling practices are designed to be consistent with Princeton University’s standards for protecting student records and other types of administrative data. A researcher can always elect to use more secure methods of data management than the minimum recommended (for example, one could manage Level 1 data as if they were Level 3 data). However, the benefits of additional layers of security need to be weighed against the additional burdens they can impose on members of the research team.
Level 1 - Benign information about individually identifiable people
Level 1 data contain personally identifiable information (PII) on human subjects who have been given an assurance of confidentiality. Level 1 data files do not contain sensitive information but need some protection due to the assurance of confidentiality. Accidental or unintended disclosure is unlikely to result in harm to the study subjects. The risks to the research subject may be considered no greater than those associated with everyday life.
Level 2 - Sensitive information about individually identifiable people
Level 2 data include individually identifiable information that, if disclosed, could reasonably be expected to present a non-minimal risk of civil liability, moderate psychological harm, or material social harm to individuals or groups. The risks to the research subject may be considered greater than those associated with everyday life.
Level 3 - Very sensitive information about individually identifiable people
Level 3 data include individually identifiable information that could cause significant harm to an individual if exposed, including, but not limited to, serious risk of criminal liability, serious psychological harm or other significant injury, loss of insurability or employability, or significant social harm to an individual or group. The risks to the research subject may be considered greater than those associated with everyday life.
As a general practice, PII that are needed for project management but not needed for analysis should be separated from the data to be used for analysis at the earliest possible phase of the project. In practice, this usually means splitting the data into two files: one containing all of the PII not needed for the analysis and a unique ID variable, the other containing the same unique ID variable and all of the data collected for the analysis. The common identifier in both data sets enables the researcher to re-link the PII and non-PII data at a future date, as the needs of the project may require.
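The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the field names in `PII_FIELDS`, the `study_id` column name, and the helper functions `split_pii` and `relink` are all hypothetical, and a random UUID is only one possible choice of unique ID.

```python
import uuid

# Hypothetical PII column names; a real project would list its own fields.
PII_FIELDS = frozenset({"name", "email", "phone"})


def split_pii(records, pii_fields=PII_FIELDS, id_field="study_id"):
    """Split each record into a PII row and an analysis row that share one ID."""
    pii_rows, analysis_rows = [], []
    for record in records:
        # Random ID, so the identifier itself reveals nothing about the subject.
        study_id = uuid.uuid4().hex
        pii_row = {id_field: study_id}
        analysis_row = {id_field: study_id}
        for key, value in record.items():
            target = pii_row if key in pii_fields else analysis_row
            target[key] = value
        pii_rows.append(pii_row)
        analysis_rows.append(analysis_row)
    return pii_rows, analysis_rows


def relink(pii_rows, analysis_rows, id_field="study_id"):
    """Re-join the two data sets on the shared ID when the project requires it."""
    pii_by_id = {row[id_field]: row for row in pii_rows}
    return [{**pii_by_id[row[id_field]], **row} for row in analysis_rows]
```

In this sketch the two returned lists would be written to separate files, with the PII file stored according to the standards for its data level and the analysis file used for day-to-day work.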
Removal of all PII (temporarily or permanently) significantly reduces the risk of harm to study participants, but does not entirely eliminate the potential for harm that can result from loss, theft or unintended disclosure. Therefore, even when working with de-identified data, researchers should continue to use the minimum data protection standards outlined below.
How should I store and protect research data that do not contain PII?
While de-identification of research data is an important step in protecting confidentiality, there is still some possibility that the identities of research subjects can be inferred from non-PII information in the data, especially if combined with other forms of data outside of the researcher’s control. Therefore, researchers working with data from which PII have been removed (temporarily or permanently) are advised to follow these minimum data security guidelines:
- Workstations should be configured and used in a manner that is consistent with University security practices
- Storage of data on password protected PCs, laptops, file servers, or cloud storage services
- Secure storage of removable storage media such as CDs or flash memory drives
- Use of encryption when files are transferred from one user to another
How should I store and protect research data that contain PII?
Level 1 - Benign information about individually identifiable people
Level 1 data that contain PII may be stored on any of the following storage media, as long as they are configured, at a minimum, to require users to authenticate with a login ID and password and to allow access only to authorized project team members and system administrators:
- The hard drive of a server or workstation as long as it is configured in a manner that is consistent with the University security practices
- Centrally-managed network file storage
- A secure cloud storage system approved by the Office of Information Technology (OIT). For information regarding approved storage systems, please contact OIT’s Research Data Management Team ([email protected]).
- Any of the following devices that are managed using an audited, check-out/check-in system:
- An external drive
- A piece of removable storage media (e.g., a USB drive)
When not in use, storage media must be kept in a locked drawer or cabinet in a secured space (e.g., a central storage area or an authorized project team member’s office), with key access required for both the office and the storage location. Similarly, data collected or stored on paper forms that have PII (such as signed consent forms or questionnaires) should be stored in a locked file cabinet in a secure office space or building.
Devices used to access Level 1 data may include any workstation or server that:
- Houses the hard drive on which the data are stored,
- Is physically connected to the external hard drive or removable medium on which the data are stored,
- Is authorized to access a shared network drive on which the data are stored, or
- Is authorized to connect to a physically-secured and firewall-protected server (e.g., terminal server, Linux server accessed via SSH) in the University’s data center that has access to the data.
If Level 1 data are stored on a shared network-accessible drive:
- Level 1 data should not be copied to and/or stored on a personal workstation’s hard drive.
- The file storage system must be located in a physically secured space that preferably requires the presentation of a valid, authorized keycard to enter that space. The use of a physical key is permitted as an alternative to using keycards as long as there is a mechanism in place to monitor entry to the secured space.
- The storage device must be protected through a network access control mechanism, such as a hardware or software-based firewall or router-based access control lists.
- Only system administrators responsible for the maintenance and management of the network-based storage media may have direct physical access to the storage system’s hardware, and that access must be granted via a keycard system.
- Anyone attempting to access Level 1 data, either directly from the file server or remotely via a network-connected client device, must provide an authorized University computer account (NetID) and its associated password. Any external collaborator who has an account with an institution that is a member of a federation to which Princeton University subscribes may use that account in cases where federation is supported.
- Network-based data may be accessed via wired or wireless network connections.
- All passwords must be encrypted wherever they are stored and when they are transmitted over the network.
- It is recommended that Level 1 data transmitted across the network be encrypted in transit.
- All encryption protocols and key lengths used must be approved by the Office of Information Technology (OIT). For information regarding approved encryption methods, please contact OIT’s IT Security team at [email protected].
The practices for managing Level 1 data should be reviewed by the researcher at the start of the data set’s lifecycle, annually during the lifecycle of the data set, and at the end of the data set’s lifecycle. Note that this information is required by the IRB as part of the annual review process.
Level 2 - Sensitive information about individually identifiable people
Level 2 data are subject to all of the standard practices listed for Level 1 data with the following adjustments:
- Level 2 data should not be copied to and/or stored on a personal workstation’s hard drive unless the data are stored there in encrypted form using encryption technology approved by the Office of Information Technology (OIT). For information regarding approved hard drive encryption technology, please contact OIT’s IT Security team ([email protected]).
- If Level 2 data are stored on an external hard drive or piece of removable media, the media must be managed using a check-in/check-out mechanism.
- Level 2 data transmitted across the network must be encrypted utilizing an approved encryption protocol and key length.
- When maintaining network-based storage systems containing Level 2 data, system administrators are required to authenticate themselves using an authentication mechanism, approved by the Office of Information Technology (OIT), that requires the administrator to provide a second factor to validate his or her identity in addition to a password, such as a code sent to a specified mobile device, the number displayed on an assigned physical token, or some form of biometric verification (e.g., fingerprint). For information regarding approved authentication mechanisms, please contact OIT’s IT Security team ([email protected]).
- It is recommended that users of Level 2 data not access the data directly from their personal device through network file services, but instead through a server (e.g., terminal server, Linux server accessed via SSH) in the University’s data center that requires an authentication mechanism, approved by the Office of Information Technology (OIT), that requires the user to provide a second factor to validate his or her identity in addition to a password, such as a code sent to a specified mobile device, the number displayed on an assigned physical token, or some form of biometric verification (e.g., fingerprint). For information regarding approved authentication mechanisms, please contact OIT’s IT Security team ([email protected]).
- Only project team members may be given access to materials related to Level 2 data (derivative results, output, etc.). These individuals should be identified on the IRB protocol associated with the proposed research. For electronic data, access to the related data must be actively managed via system access controls. When not in use, any physical or removable media containing these materials must be stored in a locked drawer or cabinet in a project team member’s office, with key access for both the office and the storage location.
- It is recommended that Level 2 data be stored in encrypted form using an approved encryption protocol and key length.
The practices for managing Level 2 data should be reviewed by the researcher at the start of the data set’s lifecycle, four times a year during the lifecycle of the data set, and at the end of the data set’s lifecycle. Note that this information is required by the IRB as part of the annual review process.
Level 3 - Very sensitive information about individually identifiable people
Level 3 data are subject to all of the standard practices described for Level 2 data with the following adjustments:
- All authorized users of Level 3 data, including project team members and system administrators, are strongly encouraged to access the data through a server (e.g., terminal server, Linux server accessed via SSH) in the University’s data center that requires an authentication mechanism, approved by the Office of Information Technology (OIT), that requires the user to provide a second factor to validate his or her identity in addition to a password, such as a code sent to a specified mobile device, the number displayed on an assigned physical token, or some form of biometric verification (e.g., fingerprint). For information regarding approved authentication mechanisms, please contact OIT’s IT Security team ([email protected]).
- Client access to the data set should require a file encryption key to decrypt the data.
- Level 3 data must be stored in an encrypted manner using an encryption protocol and key length approved by the Office of Information Technology (OIT). For information regarding approved encryption methods, please contact OIT’s IT Security team ([email protected]).
- Level 3 data must not be copied to and/or stored on a personal workstation’s hard drive.
The practices for managing Level 3 data should be reviewed by the researcher at the start of the data set’s lifecycle, four times a year during the lifecycle of the data set, and at the end of the data set’s lifecycle. Note that this information is required by the IRB as part of the annual review process.
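The review frequencies stated for the three levels (annually for Level 1, four times a year for Levels 2 and 3, plus a review at the start and end of the data set’s lifecycle) can be turned into a simple calendar. The sketch below is illustrative only: the function name `review_dates` and the table `REVIEWS_PER_YEAR` are hypothetical, and the even spacing of reviews is an assumption, since the guidelines state only how often reviews occur, not when.

```python
from datetime import date

# Periodic reviews per year for each data level, as stated in the guidelines.
REVIEWS_PER_YEAR = {1: 1, 2: 4, 3: 4}


def review_dates(level, start, end):
    """List review dates: lifecycle start, evenly spaced periodic reviews, lifecycle end.

    Even spacing is an assumption of this sketch, not a stated requirement.
    """
    step_months = 12 // REVIEWS_PER_YEAR[level]
    dates = [start]
    offset = step_months
    while True:
        total = start.month - 1 + offset
        year, month = start.year + total // 12, total % 12 + 1
        # Clamp the day to 28 so every month yields a valid date.
        candidate = date(year, month, min(start.day, 28))
        if candidate >= end:
            break
        dates.append(candidate)
        offset += step_months
    dates.append(end)
    return dates
```

For example, `review_dates(2, date(2024, 1, 15), date(2024, 12, 31))` would list the start date, three intermediate quarterly reviews, and the end date.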
Data Protection Standards in Restricted Use Agreements
In cases where a Restricted Use Agreement requires additional or wholly different data storage and security measures, the researcher should default to whichever standards are more secure with respect to the sensitivity level of the data. In cases where the requirements conflict with the recommended standards outlined above, the researcher should consult with the Princeton University IRB and with specialists in the Research Computing group at OIT.