
# Phase 2 Whole Exome Sequencing

# Overview

Phase 2 HTAN data standards for Whole Exome Sequencing (WES) are similar to those implemented in Phase 1 under the assay type "Bulk DNA-seq".

Metadata requirements are documented in the HTAN Data Model readthedocs pages. This part of the manual describes file requirements for Whole Exome Sequencing data.

# Data Levels

WES data is organized into three levels.

Table 1. HTAN Data Levels for WES Data

| Level | Definition | Example Data |
| --- | --- | --- |
| 1 | Raw data | FASTQs, unaligned BAMs |
| 2 | Aligned primary data | Aligned BAMs |
| 3 | Derived biomolecular data | VCFs (MAFs optional) |
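
The level definitions in Table 1 can be expressed as a small lookup. The following is a minimal sketch, not part of the HTAN tooling: the function name `data_level`, the format strings, and the `aligned` flag are all assumptions made for illustration. A BAM file is treated as Level 1 or Level 2 depending on whether it has been aligned, which has to be supplied by the caller because it cannot be inferred from the file type alone.

```python
def data_level(file_format, aligned=False):
    """Return the WES data level (1, 2, or 3) for a file format.

    file_format: hypothetical short format name such as "fastq", "bam", "vcf", "maf".
    aligned: only meaningful for BAM files; True if reads have been aligned.
    """
    fmt = file_format.lower()
    if fmt == "fastq":
        return 1  # raw data
    if fmt == "bam":
        # Unaligned BAMs are raw data; aligned BAMs are aligned primary data.
        return 2 if aligned else 1
    if fmt in {"vcf", "maf"}:
        return 3  # derived biomolecular data
    raise ValueError(f"format not covered by Table 1: {file_format!r}")
```

For example, `data_level("bam", aligned=True)` returns `2`, while `data_level("bam")` returns `1`.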

# Genomic Reference and Annotation Version

HTAN data contributors must provide the genomic reference and annotation version used for any aligned sequencing files. This information is recorded in the Level 2 metadata. HTAN strongly recommends using GENCODE/Ensembl genome annotations for Level 2 sequencing data.

# File Requirements

Level 1
FASTQ files or Unaligned BAM files MUST be submitted for all sequencing data.
Each FASTQ or Unaligned BAM file MUST have a single record (row) in the manifest.
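
The one-record-per-file rule can be checked before submission. The sketch below assumes the manifest is available as CSV text with a column holding file names; the column name `Filename` and the function name are hypothetical, and the actual manifest attributes are defined by the HTAN Data Model rather than by this example.

```python
import csv
import io
from collections import Counter


def check_one_record_per_file(manifest_text, files, filename_column="Filename"):
    """Return (missing, duplicated) file names for a Level 1 manifest.

    manifest_text: CSV text of the manifest (the column name is an assumption).
    files: names of the FASTQ or unaligned BAM files being submitted.
    """
    reader = csv.DictReader(io.StringIO(manifest_text))
    # Count how many manifest rows reference each file name.
    counts = Counter(row[filename_column] for row in reader)
    missing = sorted(f for f in files if counts[f] == 0)
    duplicated = sorted(f for f in files if counts[f] > 1)
    return missing, duplicated
```

A submission passes this check only when both returned lists are empty, meaning every file appears in exactly one manifest row.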

Level 2
Level 2 data MUST be submitted if alignment was performed.

Level 3
Level 3 DNA-seq files MUST include a VCF file containing called variants.
🌟 Level 3 DNA-seq files SHOULD include a SEG file if copy number variation was assessed.
🤷‍♀️ Submission of MAF files is OPTIONAL.
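
The Level 3 rules map naturally onto errors (MUST) and warnings (SHOULD). The following is a minimal sketch under stated assumptions: file types are recognized by the extensions `.vcf`, `.vcf.gz`, and `.seg`, which are illustrative choices rather than HTAN requirements, and the function name and `cnv_assessed` flag are hypothetical.

```python
def check_level3_files(filenames, cnv_assessed=False):
    """Return (errors, warnings) for a set of Level 3 DNA-seq file names.

    errors: violated MUST requirements.
    warnings: unmet SHOULD recommendations.
    """
    names = [n.lower() for n in filenames]
    # Extension matching is an assumption made for this sketch.
    has_vcf = any(n.endswith((".vcf", ".vcf.gz")) for n in names)
    has_seg = any(n.endswith(".seg") for n in names)

    errors, warnings = [], []
    if not has_vcf:
        errors.append("missing VCF file (MUST)")
    if cnv_assessed and not has_seg:
        warnings.append("missing SEG file (SHOULD, copy number variation was assessed)")
    # MAF files are OPTIONAL, so their absence is never reported.
    return errors, warnings
```

Keeping MUST and SHOULD violations in separate lists lets a submission pipeline block on the former while only reporting the latter.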