# Phase 2 SpatialOmics

# Overview

SpatialOmics is defined as spatially resolved sequencing- and sequence-hybridization-based omics technologies that enable molecular profiling of tissues while preserving spatial context. See "Covered Assays" and "Out of Scope/Covered in a separate standard" for the specific assays covered by the SpatialOmics standard.

Metadata requirements are documented in the HTAN Data Model readthedocs pages. This part of the manual describes file requirements for SpatialOmics data.

# Covered Assays

Spot-based capture assays
These methods capture transcripts via physical arrayed spots or beads with spatial barcodes:

  • 10x Genomics Visium and Visium HD
  • Slide-seq and related variants

In situ sequencing (ISS) and hybridization-based assays
These assays directly read transcript identity and location in fixed tissue using fluorescent in situ sequencing or barcoded probes:

  • seqFISH and seqFISH+
  • MERFISH
  • Academic ISS methods

Barcoded capture with molecular decoding (molecular barcoding + NGS)
These platforms use transcript capture followed by NGS-based decoding, often with multiplexed probe panels and image-based segmentation:

  • 10x Genomics Xenium
  • NanoString CosMx SMI
  • BGI/MGI STOmics platforms:
    • Stereo-seq (RNA)
    • Stereo-CITE (RNA + protein)
    • Stereo-seq OMNI (multiome)

Other sequencing-based spatial molecular profiling
This includes assays that capture spatially resolved:

  • Epigenetic features (e.g., spatial ATAC-seq)
  • Multi-omic profiles (e.g., transcriptome + proteome)
  • Spot-based capture assays

# Out of Scope/Covered in a separate standard

Non-spatial transcriptomics

  • Bulk RNA-seq
  • Single-cell RNA-seq (scRNA-seq)

Mass spectrometry-based imaging and spatial proteomics
Covered in a separate spatial proteomics RFC

Multiplexed protein-based spatial imaging
Including CyCIF, CODEX, and MIBI
Covered in the Imaging RFC

Non-multiplexed digital pathology
Including diagnostic H&E imaging and conventional histopathology slides
Covered in the Digital Pathology RFC

Please see the "Required Files" and "Optional Files" tabs for more information.

| Level | Requirement | Data Type | Example Files |
|---|---|---|---|
| 1 | optional | Bundle of raw data (FASTQs) | .tar, .tar.gz, .zip |
| 3 | required | Processed assay output bundle with accompanying panel information table, if applicable. ⚠️ NO CONTROLLED ACCESS DATA. | .tar, .tar.gz, .zip |
| 4 | optional | Harmonized output file (e.g., AnnData or Seurat-compatible RDS) included to support downstream analysis | .h5ad, .rds |

H&E or multiplex immunofluorescence (MIF) image metadata is captured in separate digital pathology or multiplex tissue imaging templates. These may be connected via attributes in the SpatialOmics metadata or they can be linked through shared biospecimen IDs.

HTAN Phase 2 SpatialOmics Data Levels

# Required Components

For each spatial assay, contributors must provide the following:

Processed assay output bundle (Level 3)

  • A compressed archive (e.g., .tar.gz, .zip) containing platform-specific output files. Please see Example Level 3 bundles.
  • Must follow the expected directory and file structure defined in HTAN platform guidance.
  • ⚠️ May NOT contain controlled-access data such as FASTQ or BAM files.
  • Should include segmentation outputs, raw or normalized matrices, and any relevant vendor JSONs, manifest files, or images.
  • For assays that include registration transforms or same-section imaging, these must be bundled or clearly referenced.
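The controlled-access restriction above can be checked before upload. The following is a minimal sketch, not an official HTAN tool: the function names and the suffix list are assumptions chosen for illustration (the suffixes extend the FASTQ/BAM examples with common variants), and the authoritative definition of controlled-access data remains with HTAN.

```python
import tarfile
import zipfile

# Suffixes treated as controlled-access in this sketch. This list is an
# assumption based on the FASTQ/BAM examples in the manual, not an official list.
PROHIBITED_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz", ".bam", ".bai")


def list_members(archive_path):
    """Return the member names of a .zip, .tar, or .tar.gz archive."""
    archive_path = str(archive_path)
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return zf.namelist()
    if tarfile.is_tarfile(archive_path):
        # Default mode "r" detects gzip compression automatically.
        with tarfile.open(archive_path) as tf:
            return tf.getnames()
    raise ValueError(f"Unsupported archive format: {archive_path}")


def find_prohibited(names):
    """Return the names that end with a prohibited suffix (case-insensitive)."""
    return [n for n in names if n.lower().endswith(PROHIBITED_SUFFIXES)]
```

For example, `find_prohibited(list_members("my_bundle.tar.gz"))` returns the offending member names, or an empty list if none are found; `my_bundle.tar.gz` is a placeholder file name.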

Bundle-level metadata

  • One row per assay bundle in the spatial metadata table (submitted as .tsv or via Synapse Table).
  • Captures key information such as assay platform, kit version, QC metrics, and registration to upstream biospecimen.
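As an illustration of the one-row-per-bundle rule, the sketch below writes a tab-separated metadata table and rejects duplicate bundle entries. The column names (`bundle_file`, `assay_platform`, `kit_version`, `biospecimen_id`) are hypothetical placeholders, not HTAN Data Model attributes; the actual attribute names are defined in the HTAN Data Model documentation.

```python
import csv

# Hypothetical column names for illustration only; the real attributes are
# defined in the HTAN Data Model.
COLUMNS = ["bundle_file", "assay_platform", "kit_version", "biospecimen_id"]


def write_bundle_metadata(rows, out_path):
    """Write one TSV row per bundle, raising if a bundle appears twice."""
    seen = set()
    for row in rows:
        bundle = row["bundle_file"]
        if bundle in seen:
            raise ValueError(f"Duplicate row for bundle: {bundle}")
        seen.add(bundle)

    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

Checking for duplicates before opening the output file means a rejected table never leaves a partially written file behind.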

Please see the HTAN Data Model readthedocs pages for specific metadata attributes and requirements.

Panel information (if applicable)

  • If the assay uses a targeted sequencing or protein panel, a reference to a Synapse-hosted panel information table must be included.
  • Sequencing and protein panels should be submitted as separate tables or filtered by Target_Type.
  • Applies to assays like Xenium, CosMx, and Stereo-CITE.
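Where a single combined panel table exists, separating it by `Target_Type` might look like the sketch below. Only the `Target_Type` column name comes from this manual; the `Target_Name` column and the values `RNA` and `Protein` in the usage example are assumptions for illustration.

```python
from collections import defaultdict


def split_by_target_type(rows):
    """Group panel rows (dicts) by the value of their Target_Type column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Target_Type"]].append(row)
    return dict(groups)


# Usage with hypothetical panel rows; names and type values are illustrative.
panel = [
    {"Target_Name": "CD3E", "Target_Type": "RNA"},
    {"Target_Name": "CD3", "Target_Type": "Protein"},
    {"Target_Name": "MS4A1", "Target_Type": "RNA"},
]
tables = split_by_target_type(panel)
```

Each value in `tables` can then be submitted as its own table, or the combined table can be kept and filtered on the same column.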

# Example Level 3 bundles

If you do not see your platform in the examples below and need more information, please contact your data liaison.

10x Genomics Visium / Visium HD
Typical output:

  • filtered_feature_bc_matrix.h5
  • spatial/tissue_positions_list.csv
  • spatial/scalefactors_json.json
  • spatial/tissue_lowres_image.png (or highres, if used)
  • analysis/clustering/gene_expression_graphclust/clusters.csv
  • metrics_summary.csv
  • Optional: molecule_info.h5
  • Optional: Binned outputs for Visium HD (e.g., bin_2um/, bin_8um/ folders)
  • Prohibited: .fastq, .bam and .bai files. If these are in your output folder, you must remove them.
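Before archiving, a contributor could compare an output folder against the typical Visium files listed above. This is a minimal sketch: the function name is hypothetical, the list simply mirrors the non-optional entries in this section, and a missing file is a prompt to review the folder rather than proof that the bundle is invalid, since outputs can differ between pipeline versions.

```python
from pathlib import Path

# Typical (non-optional) files copied from the list in this section.
TYPICAL_VISIUM_FILES = [
    "filtered_feature_bc_matrix.h5",
    "spatial/tissue_positions_list.csv",
    "spatial/scalefactors_json.json",
    "spatial/tissue_lowres_image.png",
    "analysis/clustering/gene_expression_graphclust/clusters.csv",
    "metrics_summary.csv",
]


def missing_typical_files(output_dir, expected=TYPICAL_VISIUM_FILES):
    """Return the expected relative paths that are absent from output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Passing a different `expected` list lets the same helper be reused for other platforms.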

NanoString CosMx SMI
Typical output:

  • cell_by_gene.csv (expression matrix)
  • cell_metadata.csv
  • gene_metadata.csv
  • segmentation_mask.ome.tiff or equivalent
  • full_resolution_image.ome.tiff
  • cell_boundaries.json or .csv
  • pipeline_metadata.json
  • Optional: FOV summary statistics

10x Genomics Xenium
Typical output:

  • analysis/transcripts.parquet
  • analysis/cells.parquet
  • analysis/images/ (HE or DAPI TIFFs)
  • analysis/segmentation.zarr
  • analysis/analysis.json
  • metadata.json
  • Optional: panel.json
  • Optional: spatial/ subfolder for image registration assets

seqFISH
Typical output:

  • Counts_raw.csv
  • coordinates.csv
  • seqFISH_metadata.csv
  • DAPI experiment folder (ome.tiff files)
  • ROI experiment folder (ome.tiff files)
  • point locations.mat
  • all_gene_Names.mat

# Optional or Situational Components

Level 1 data – bundle of raw data (FASTQs)

  • Not required, and contents are not standardized.
  • Accompanied by minimal metadata attributes that describe the contents to enable reuse.

Level 4 data – Interoperable h5ad or rds files

  • If available, a harmonized output file (e.g., AnnData or Seurat-compatible RDS) may be included to support downstream analysis.
  • Should follow standard and structure conventions (to be separately defined).
  • Includes spatial coordinates, feature metadata, and expression matrices in a reusable format.

High-resolution imaging metadata

  • H&E or multiplex immunofluorescence (MIF) image metadata is captured in separate digital pathology or multiplex tissue imaging templates.
  • The SpatialOmics Level 3 metadata provides a pointer mechanism to connect those templates, although the connection can also be made through shared biospecimen IDs.