Phase 2 Whole Exome Sequencing
Overview
Phase 2 HTAN data standards for Whole Exome Sequencing (WES) are similar to those implemented in Phase 1 under the assay type "Bulk DNA-seq".
Metadata requirements are documented in the HTAN Data Model readthedocs pages. This part of the manual describes file requirements for Whole Exome Sequencing data.
Data Levels
WES data has 3 levels.
Table 1 HTAN Data Levels for WES Data
Genomic Reference and Annotation Version
HTAN data contributors must provide the genomics reference and annotation version used for any aligned sequencing files. This information can be found in the level 2 metadata. HTAN strongly recommends using GENCODE/Ensembl genome annotations for level 2 sequencing data.
File Requirements
Terminology:
This document uses the following terms from IEFT RFC 2119
MUST / REQUIRED / SHALL: ✅ (denotes absolute requirement)
MUST NOT / SHALL NOT: ❌ (denotes absolute prohibition)
SHOULD / RECOMMENDED: 🌟 (denotes recommendation)
SHOULD NOT / NOT RECOMMENDED: 🙅 (denotes not recommended)
MAY / OPTIONAL: 🤷♀️ (denotes optional)
Level 1
✅ FASTQ files or Unaligned BAM files MUST be submitted for all sequencing data.
✅ Each FASTQ or Unaligned BAM file MUST have a single record (row) in the manifest.
Level 2
✅ Level 2 data MUST be submitted if alignment was performed.
Level 3
✅ Level 3 DNA-seq files MUST include a vcf file containing called variants.
🌟 Level 3 DNA-seq files SHOULD include a seg file if copy number variation was assessed.
🤷♀️ Submission of maf files is OPTIONAL.