#
Phase 2 sc/snRNA-seq Data Submission
#
Overview
In HTAN Phase 2, the following files are submitted for single cell/single nuclei RNA-sequencing (sc/snRNA-seq) data:
Metadata requirements are documented in the HTAN Data Model readthedocs pages. This part of the manual describes file requirements for level 3_4 h5ad files.
The HTAN h5ad (AnnData 0.10) requirements are modeled after CELLxGENE's requirements. They also include three attributes developed by the Human Cell Atlas (HCA). Please see the
#
Required File Attributes
Similar to CELLxGENE's Dataset Requirements, level 3_4 sc/snRNA-seq h5ad files must contain the following attributes. Please see rHTAN_h5ad_exemplar_2025_03_03.h5ad for an example file which meets these requirements.
#
HTAN h5ad File Validation
The HTAN Data Coordinating Center (DCC) has released a PyPI package called HTAN-h5ad-validator, which Centers can use to validate their sc/snRNA-seq h5ad files. Sage Bionetworks will run the validator on sc/snRNA-seq h5ad files submitted to Synapse.
#
Background: h5ad files, CELLxGENE, Human Cell Atlas
#
h5ad (AnnData 0.10) brief overview
Please see AnnData’s documentation for a more detailed description of the AnnData object.
For HTAN’s purposes, the following parts of the AnnData object are of interest:
- .X - a matrix of counts in which rows are cells and columns are genes.
- .var - a pandas DataFrame with gene-level information (e.g. gene name, gene_is_filtered).
- .obs - a pandas DataFrame with cell-level information.
- .obsm - a mapping that holds one or more numpy ndarrays with cell embeddings.
CELLxGENE requires that raw data are submitted. Normalized data may also be submitted.
#
CELLxGENE
The HTAN DCC submits sc/snRNA-seq data to CELLxGENE, a tool developed by the Chan Zuckerberg Initiative (CZI) to visualize and explore single cell and spatial data. The DCC submits data to CELLxGENE in h5ad (AnnData 0.10) format. CELLxGENE’s schema requires:
- use of Ensembl gene IDs.
- a specific genome reference and annotation version.
- specific h5ad (AnnData 0.10) attributes.
- use of specific ontologies for many of the required attributes (e.g. the Cell Ontology for cell types).
The HTAN requirements for h5ad files are modeled after CELLxGENE's Dataset Requirements.
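As a rough illustration of the Ensembl gene ID requirement, the sketch below flags gene identifiers that do not look like human Ensembl gene IDs ("ENSG" followed by 11 digits). The helper name non_ensembl_ids is hypothetical, and treating IDs with a version suffix (e.g. ".5") as invalid is an assumption. This is a quick local sanity check, not a substitute for the HTAN-h5ad-validator.

```python
import re

# Human Ensembl gene IDs: "ENSG" followed by exactly 11 digits.
# Assumption: IDs carrying a version suffix such as ".5" are flagged too.
ENSEMBL_HUMAN_GENE = re.compile(r"ENSG\d{11}")


def non_ensembl_ids(gene_ids):
    """Return the identifiers that do not match the human Ensembl gene ID pattern."""
    return [g for g in gene_ids if not ENSEMBL_HUMAN_GENE.fullmatch(g)]


bad = non_ensembl_ids(["ENSG00000141510", "TP53", "ENSG00000146648.5"])
print(bad)  # ['TP53', 'ENSG00000146648.5']
```

In practice the input would be the gene index of the h5ad file (for an AnnData object, its var_names), and an empty result suggests that the identifiers are at least well formed.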
#
Human Cell Atlas (HCA)
The Human Cell Atlas (HCA) is a large repository of single cell data from healthy subjects. It provides standards for single-cell data submission that adopt most of the CELLxGENE schema but also include additional fields. Aligning HTAN data with the CELLxGENE schema may therefore facilitate data integration with other consortia such as the HCA. The HTAN requirements include three HCA attributes in addition to the CELLxGENE required attributes.