# File Standards
The HTAN Data Model includes requirements for Assay Files. These include:
- How Data Files are organized (Assay Data Levels);
- What File Formats are accepted (Accepted File Formats); and
- Specific Assay or Assay Module Requirements (Specific Requirements).
# Assay Data Levels
In both the HTAN Phase 1 and Phase 2 Data Models, assay data is divided into levels. Each assay type has levels progressing from raw data to more processed data (see Figure 1). For sequencing data, the HTAN Data Model levels also help distinguish open-access data from access-controlled data. For example, level 1 and level 2 sequencing data are access-controlled.
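The access rule for sequencing data can be sketched as a small lookup. This is a minimal illustration, not HTAN code: the function name and the labels are hypothetical, and treating levels 3 and 4 as open access is an inference from the example above rather than something this page states outright.

```python
# Only the rule for levels 1 and 2 comes from the text above.
# Assumption: higher sequencing levels (3 and 4) are treated as open access.
ACCESS_CONTROLLED_SEQUENCING_LEVELS = {1, 2}


def sequencing_access_tier(level: int) -> str:
    """Return a hypothetical access label for a sequencing data level."""
    if level not in {1, 2, 3, 4}:
        raise ValueError(f"unexpected data level: {level}")
    if level in ACCESS_CONTROLLED_SEQUENCING_LEVELS:
        return "access-controlled"
    return "open access"
```

For instance, `sequencing_access_tier(2)` returns `"access-controlled"` under these assumptions.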
# Phase 2 Assay Data Levels
The Phase 2 Data Model is still being developed, and this page will be updated as it changes. The following are the currently supported assays, organized alphabetically.
# Phase 1 Assay Data Levels
Phase 1 data was divided into 4 levels for most assay types. Exceptions include:
- some assays did not have a level 1 (RPPA), or level 1 data was typically not collected (Imaging), although higher levels exist;
- some assays did not have higher levels: scDNA-seq and scmC-seq have only levels 1 and 2, while Bulk DNA, Bulk RNA, HiC and Bulk Methylation Sequencing have only levels 1-3; and
- spatial transcriptomics assays had platform-specific data levels which deviated from the traditional 4 levels. Specifically,
  - some platforms have only one experiment level instead of multiple file levels (10X Xenium ISS and Nanostring CosMx SMI); and
  - some platforms have additional files (auxiliary or annotation metadata) which were not a part of the typical 4 level system (e.g. 10X Visium, Nanostring GeoMx DSP).
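The first two exceptions above can be expressed as a lookup table, which may be convenient when checking which levels to expect for a Phase 1 assay. This is only a sketch: the names `DEFAULT_LEVELS`, `PHASE1_LEVEL_EXCEPTIONS`, and `phase1_levels` are hypothetical, the RPPA entry assumes that levels 2 through 4 all exist, and Imaging and the spatial transcriptomics platforms are left out because their levels do not reduce to a simple list.

```python
# Default for Phase 1 assays that are not listed as exceptions.
DEFAULT_LEVELS = (1, 2, 3, 4)

# Transcribed from the exceptions listed above.
PHASE1_LEVEL_EXCEPTIONS = {
    "RPPA": (2, 3, 4),  # no level 1; assumes levels 2-4 all exist
    "scDNA-seq": (1, 2),
    "scmC-seq": (1, 2),
    "Bulk DNA": (1, 2, 3),
    "Bulk RNA": (1, 2, 3),
    "HiC": (1, 2, 3),
    "Bulk Methylation Sequencing": (1, 2, 3),
}


def phase1_levels(assay: str) -> tuple:
    """Return the expected Phase 1 data levels for an assay name."""
    return PHASE1_LEVEL_EXCEPTIONS.get(assay, DEFAULT_LEVELS)
```

Any assay name not present in the table falls back to the four default levels, so the table should not be read as a list of supported assays.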
Please see the Table below for more information.
# Phase 2 vs Phase 1 Assay Data Levels
As outlined in the Data Model Introduction, the Phase 1 and Phase 2 HTAN Data Models differ. In terms of assay data levels, the following differences currently exist.
# Accepted File Formats
In HTAN Phase 2, each assay type has specific file format requirements, and submitted files are checked against them. Please expand the panel below for more information.
# Specific Assay/Assay Module Requirements
# Genomic References and Annotation Versions
HTAN data contributors must provide the genomic reference and annotation version used for any aligned sequencing files. This information is recorded in the level 2 metadata. HTAN strongly recommends using GENCODE/Ensembl genome annotations for level 2 sequencing data.
HTAN does not restrict data contributors to a specific genomic reference or annotation version for most sequencing data, with the exception of level 3/4 scRNA-seq data. Please see the Phase 2 Single Cell RNA-seq page for more information regarding the scRNA-seq requirements.
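A contributor could check a level 2 metadata record for these two items before submission. The sketch below is illustrative only: the field names `genomic_reference` and `annotation_version` are hypothetical placeholders, not the attribute names used by the HTAN Data Model.

```python
# Hypothetical field names; substitute the attribute names from the data model.
REQUIRED_ALIGNMENT_FIELDS = ("genomic_reference", "annotation_version")


def missing_reference_fields(record: dict) -> list:
    """List required fields that are absent or blank in a metadata record."""
    return [
        field
        for field in REQUIRED_ALIGNMENT_FIELDS
        if not str(record.get(field) or "").strip()
    ]
```

An empty list means both values are present; the function does not judge whether the values themselves are valid references or annotation versions.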
# Digital Pathology and Multiplex Microscopy
HTAN can only accept de-identified information. Imaging equipment often captures dates and other information and stores them in the header of the resulting image file. HTAN cannot accept image files with any dates in the header, so dates and other identifying information must be removed from every image file header before submission.
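Before submitting, it can help to scan the header metadata for values that look like dates. The following is a rough heuristic sketch, not an HTAN validation tool: it assumes the header has already been read into a plain dictionary of tag names and values by whatever image library is in use, and the function name and regular expression are illustrative. It can miss unusual date layouts and may flag version strings that resemble dates.

```python
import re

# Matches common date layouts such as 2021-03-04, 2021:03:04, or 03/04/2021.
DATE_PATTERN = re.compile(
    r"\b(?:\d{4}[-:/.]\d{1,2}[-:/.]\d{1,2}|\d{1,2}[-/.]\d{1,2}[-/.]\d{4})\b"
)


def find_date_like_fields(header: dict) -> dict:
    """Return header entries whose string value contains a date-like pattern."""
    return {
        key: value
        for key, value in header.items()
        if isinstance(value, str) and DATE_PATTERN.search(value)
    }
```

Any entry returned by `find_date_like_fields` is a candidate for removal or review; an empty result does not guarantee that the header is fully de-identified.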
# scRNA-seq
Specific file requirements for single cell RNA-seq h5ad files are modeled after CELLxGENE's requirements. Please see the Phase 2 Single Cell RNA-seq page for more information.