Submitting Assay Data and Metadata
As stated in Data Submission Overview, data submission involves two key steps: 1. Uploading assay data files to Synapse; and 2. Completing and validating metadata using the Data Curator App (DCA).
!!! Once assay data files are submitted to Synapse, the files will have entityIDs (e.g. syn12345670) assigned to them. These can then be prepopulated into the manifests on the DCA. For this reason, assay files should be submitted before generating the associated manifests. !!!
This page provides details regarding those steps.
To submit data, you will also need to understand the HTAN data model and specific requirements for your particular data type. For a general overview of the HTAN data model, please see HTAN Data Model. To understand specific requirements for your data type, please see Data Standards.
HTAN uses the Synapse Portal and DCA, developed and maintained by Sage Bionetworks, to manage clinical, biospecimen and assay data submissions (dataset ingress). In order to submit data, your center should:
- Have at least one user with Certified User status on Synapse.
- Contact your Data Liaison to set up your project and cloud bucket.
- Ensure the assay dataset conforms to the HTAN Data Model, uses HTAN Identifiers and does not contain Protected Health Information (PHI).
- Organize and upload your dataset to the Synapse Project.
- Validate and submit metadata using the DCA.
Please read the rest of this page for more information about each of these steps.
Have at least one user with Certified User status on Synapse.
To upload files to the Synapse Platform, you need to be a Synapse Certified User. Because Synapse stores data from human subjects research, Sage Bionetworks requires that you demonstrate understanding of and compliance with privacy and security issues. You can complete your certification by taking a short certification quiz. Please see the Synapse Certified User Documentation for more information.
Contact your Data Liaison to set up your project and cloud bucket.
When you are ready to upload data, please contact your data liaison. Your data liaison will need to know: 1. Your center. 2. Who on your team will be uploading the data. 3. The Synapse usernames of the team members identified in item 2.
Please have users obtain certified user status prior to contacting your data liaison.
With the above information, the DCC will initialize your Synapse project for metadata submission and a cloud storage location for dataset uploads. If the data submission is for a new atlas, the DCC will also create an HTAN atlas ID. Once your Synapse project has been initialized, your data liaison will reach out to you with the location of your Synapse project and you can begin uploading your data.
Ensure the dataset conforms to the HTAN Data Model, uses HTAN Identifiers and does not contain PHI.
The HTAN Data Model is built upon data standards described on the Data Standards page. All HTAN Centers are required to encode their clinical, biospecimen and assay data and metadata using the HTAN Data Model. If you have a new data type which is not currently represented in the HTAN Data Model, please contact your data liaison.
A concrete way to understand the expectations for data submissions is to view the metadata templates (manifests) for clinical, biospecimen and assay data available in the DCA. For any given dataset, you may be submitting:
- clinical manifest(s), e.g. Demographics, Diagnosis
- biospecimen manifest(s)
- assay manifest(s), e.g. Bulk RNA-seq level 1
- assay data files
The first three items will be validated and submitted using the DCA. The last item, assay data files, only needs to be uploaded to the Synapse project itself.
All data should be identified using HTAN identifiers. Please see the HTAN Identifier section of this manual for more information regarding HTAN identifiers.
!!! Please review your data to ensure that it does not contain PHI. !!!
Organize and upload your dataset to the Synapse Project.
Please organize your data using the flattened data layout described in Synapse’s Data Ingress Docs.
Data files can be transferred using the Synapse User Interface (Synapse UI) or programmatically. Please see Synapse’s Data Ingress Docs for more information on how to upload files.
!!! If you upload files to Synapse programmatically, please use synapseclient version 3.0.0 or higher. !!!
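Before running an upload script, it can help to confirm that the installed client meets this minimum. The following is a minimal sketch, not an official HTAN or Sage Bionetworks tool: the helper names `parse_version`, `meets_minimum` and `installed_synapseclient_ok` are hypothetical, and the comparison looks only at the numeric release components, so a pre-release such as `3.0.0rc1` is treated the same as `3.0.0`.

```python
from importlib import metadata

MINIMUM = (3, 0, 0)  # minimum synapseclient version stated in the note above


def parse_version(text):
    """Return the leading numeric release components of a version string."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as "rc1"
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version_text, minimum=MINIMUM):
    """True if version_text is at least `minimum` (release numbers only)."""
    found = parse_version(version_text)
    # Pad short versions so that "3.0" compares equal to (3, 0, 0).
    padded = found + (0,) * (len(minimum) - len(found))
    return padded >= minimum


def installed_synapseclient_ok():
    """Check the installed synapseclient; False if it is not installed."""
    try:
        return meets_minimum(metadata.version("synapseclient"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("synapseclient >= 3.0.0:", installed_synapseclient_ok())
```

If the check reports `False`, upgrading the client (for example with `pip install --upgrade synapseclient`) before uploading should resolve it.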
Validate and submit metadata using Synapse’s Data Curator App (DCA).
The DCA contains HTAN-specific manifests (metadata templates) which can be 1) completed on the app, or 2) downloaded, completed and uploaded back to the DCA. Manifests for assay data will be pre-populated with assay file entityIDs once they are associated with a particular dataset folder. Once the metadata templates are completed by your center, they should then be validated and submitted via the DCA.
!!! Please note: If you have added files to a dataset that already has a manifest, please add the new data to the existing manifest in the DCA, or download the existing manifest from the DCA to make updates. Do not use a local copy of the manifest at your center. !!!
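As a quick local sanity check before re-uploading a downloaded manifest, you could confirm that every row carries a well-formed Synapse entity ID. This sketch is illustrative only and does not replace validation in the DCA. It assumes the manifest is a CSV file with a column named `entityId` (the `Filename` column in the example is also an assumption), and that entity IDs follow the `syn` plus digits pattern shown earlier (e.g. syn12345670). The function name `rows_missing_entity_id` is hypothetical.

```python
import csv
import io
import re

# Synapse entity IDs look like "syn" followed by digits, e.g. syn12345670.
SYNAPSE_ID = re.compile(r"^syn\d+$")


def rows_missing_entity_id(manifest_text, column="entityId"):
    """Return 1-based data row numbers whose entity ID is blank or malformed.

    `column` is an assumed column name; adjust it to match your manifest.
    """
    reader = csv.DictReader(io.StringIO(manifest_text))
    bad = []
    for number, row in enumerate(reader, start=1):
        value = (row.get(column) or "").strip()
        if not SYNAPSE_ID.match(value):
            bad.append(number)
    return bad


# Made-up example manifest: row 2 has no ID, row 3 lacks the "syn" prefix.
example = (
    "Filename,entityId\n"
    "a.fastq.gz,syn12345670\n"
    "b.fastq.gz,\n"
    "c.fastq.gz,12345\n"
)
print(rows_missing_entity_id(example))  # prints [2, 3]
```

Rows flagged this way usually indicate files that were added after the manifest was generated, in which case the existing manifest in the DCA should be updated as described above.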
Please see Synapse’s Data Ingress Docs for more details regarding the web app.