Data Submission Overview

Data Submission Introduction provides a general overview for submitting data and metadata to HTAN. This page provides details regarding those steps.

Data Submission Steps

  1. Complete Pre-submission Tasks;
  2. Submit Data; and
  3. Submit Metadata.

An HTAN Center uploads data to its HTAN Synapse Project. The project is organized into versioned folders which are tied to the HTAN Data Model. Please see the other tabs on this page for more information about Synapse Project organization and each of the data submission steps.

Figure 1. HTAN Data Submission Process

Pre-submission Tasks

  • Have at least one user with Certified User status on Synapse.

To upload files to the Synapse Platform, you need to be a Synapse Certified User. You can complete your certification by taking a short certification quiz. Please see the Synapse Certified User Documentation for more information. Please also be aware that use of Synapse requires agreement to the latest Synapse Terms of Service. Synapse users must also enable two-factor authentication (2FA). You can find information about the latest TOS and guidance on enabling 2FA on your account here.

  • Contact your Data Liaison.

When you are ready to upload data, please contact your Data Liaison. Users should obtain Certified User status before doing so. If you have not submitted data previously, your Data Liaison may also need to ensure that you have the proper access rights for your Synapse Project.

  • Ensure the dataset conforms to the HTAN Data Model and uses HTAN Identifiers.

The HTAN Data Model is built upon data standards described on the Data Model page. All HTAN Centers are required to encode their clinical, biospecimen and assay data and metadata using the HTAN Data Model. If you have a new data type that is not currently represented in the HTAN Data Model, please contact your Data Liaison.

All data should be identified using HTAN identifiers. Please see the Identifiers and Creating HTAN Identifiers sections of this manual for more information regarding HTAN identifiers.

  • Ensure that your data does not contain PHI.

Please review your data to ensure that it does not contain protected health information (PHI). The HTAN Data Coordinating Center (DCC) cannot accept data with PHI, including dates more specific than a year. For example, dates in metadata must be converted to days from birth, and all image files must have PHI removed from file headers.
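The date conversion mentioned above can be sketched as follows. This is a minimal illustration only: the function name and the example dates are hypothetical, and the HTAN Data Model defines which attributes actually require this conversion.

```python
from datetime import date


def days_from_birth(birth_date: date, event_date: date) -> int:
    """Return the number of days elapsed between birth and an event.

    Only the resulting integer would go into the metadata; the calendar
    dates themselves stay out of anything that is submitted.
    """
    delta = (event_date - birth_date).days
    if delta < 0:
        # An event before birth almost certainly indicates a data-entry error.
        raise ValueError("event_date precedes birth_date")
    return delta


# Hypothetical example values, not taken from any HTAN dataset.
birth = date(2000, 1, 1)
collection = date(2001, 1, 1)
print(days_from_birth(birth, collection))
```

Raising an error on negative intervals is a design choice for this sketch; a Center's own pipeline may prefer to log and skip such records instead.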

HTAN Center Folder Structure

For each data release, the HTAN Data Coordinating Center (DCC) will create a set of 3 folders (Figure 2):

  • v#_ingest;
  • v#_stage; and
  • v#_release.

For example, the folders v8_ingest, v8_stage and v8_release were added to each Center's Synapse Project for the first Phase 2 HTAN data release (v8). When the DCC is ready to accept data for release v9, new folders (v9_ingest, v9_stage and v9_release) will be added to each HTAN Center's Synapse Project. The DCC will notify and guide HTAN Centers to the correct folder when data upload is open for a particular release.
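The naming pattern described above can be expressed in a few lines. This is an illustrative sketch only: the helper name is hypothetical, and the folders themselves are created by the DCC, not by Centers.

```python
from typing import List


def release_folders(version: int) -> List[str]:
    """Return the three folder names expected for a given release number."""
    if version < 1:
        raise ValueError("release version must be a positive integer")
    # The three suffixes follow the v#_ingest, v#_stage, v#_release pattern.
    return [f"v{version}_{suffix}" for suffix in ("ingest", "stage", "release")]


print(release_folders(8))
```

A helper like this can be useful in a Center's own scripts for checking that files are being directed to the folder set for the currently open release.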

Figure 2. HTAN Synapse Folder Structure

Within the ingest folder, there are subfolders which are tied to the HTAN Data Model. For example, at the time of the v8 release, Clinical and Biospecimen records were supported as well as the following assays: Digital Pathology, Multiplex Microscopy, sc/snRNA-sequencing, SpatialOmics and Whole Exome Sequencing (WES). As a result, subfolders for each data type exist within the ingest folder as shown in Figure 3.

Figure 3. v8_ingest Subfolders

Upload Data Files

Data files can be transferred to the appropriate Synapse folder either by using the Synapse User Interface (Synapse UI) or programmatically.

For large uploads, Synapse also provides a tutorial on uploading data in bulk.
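A programmatic upload might look like the sketch below, which assumes the Synapse Python client (`synapseclient`) is installed and that you authenticate with a personal access token. The function names, the local directory, and the Synapse ID shown are placeholders rather than real HTAN values; please consult the Synapse documentation for the currently recommended approach.

```python
from pathlib import Path
from typing import List


def list_upload_candidates(local_dir: str) -> List[Path]:
    """Return the regular, non-hidden files directly inside local_dir, sorted by name."""
    root = Path(local_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


def upload_files(local_dir: str, parent_id: str, auth_token: str) -> None:
    """Upload each candidate file to the Synapse folder identified by parent_id."""
    # Imported here so the listing helper works without synapseclient installed.
    import synapseclient

    syn = synapseclient.Synapse()
    syn.login(authToken=auth_token)
    for path in list_upload_candidates(local_dir):
        # parent_id is the Synapse ID of the target subfolder in the ingest folder.
        entity = synapseclient.File(path=str(path), parent=parent_id)
        syn.store(entity)


# Hypothetical usage; "syn00000000" is a placeholder, not a real folder ID.
# upload_files("./local_data", "syn00000000", auth_token="<your token>")
```

Listing the files in a separate step makes it possible to review exactly what will be sent, for example to confirm that nothing containing PHI is included, before any upload takes place.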

Submit Metadata

Synapse provides a curator within its platform for creating and managing metadata. Please see the Synapse documentation for information about using the curator, and the Metadata Standards page of this manual for more information about Phase 2 HTAN metadata requirements.

In the Synapse system, metadata can be File-based (associated with a data file) or Record-based. Please see Synapse's Documentation for more information about these terms.

For HTAN:

  • Record-based metadata includes the Clinical and Biospecimen modules of the HTAN2 Data Model.
  • File-based metadata includes all other modules in the HTAN2 Data Model.
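The distinction above can be captured in a small lookup, shown here as an illustrative sketch. The function and constant names are hypothetical, and the module names passed in are assumed to match those used in the HTAN2 Data Model.

```python
# Modules described in this manual as Record-based.
RECORD_BASED_MODULES = {"Clinical", "Biospecimen"}


def metadata_kind(module: str) -> str:
    """Classify an HTAN2 Data Model module as record-based or file-based.

    Any module not listed as Record-based is treated as File-based, so this
    function does not check whether the module name is valid.
    """
    if module.strip() in RECORD_BASED_MODULES:
        return "record-based"
    return "file-based"


for name in ("Clinical", "Biospecimen", "Multiplex Microscopy"):
    print(name, "->", metadata_kind(name))
```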

Synapse

Understanding the HTAN Data Model

  • To understand the general structure of the HTAN Data Model and HTAN Identifiers, please see the HTAN Data Model section of this manual.