# Using the HTAN Data Portal

The HTAN Data Portal provides an interactive interface for exploring HTAN data. It lets you:

  • download metadata;
  • explore available data; and
  • download assay data.

To get started, go to: https://data.humantumoratlas.org/explore.

To orient you to the HTAN Data Portal, consider the example of accessing precancerous polyp data from Vanderbilt University, as described in their recent Cell publication.

By default, HTAN data is organized by research center:

HTAN Portal: Home Page

If you scroll down on the page, you will see Vanderbilt University:

HTAN Portal: Vanderbilt Atlas

As of this writing, you can see that the Vanderbilt Colon Atlas project has 90 cases and 193 biospecimens.

# Downloading Metadata

Once you have identified the project of interest, you can click the download metadata button:

HTAN Portal: Download Metadata

You will then be prompted with a dialog box of all metadata associated with the specified project. For example:

HTAN Portal: Metadata Table

Behind the scenes, HTAN leverages the Synapse Platform created and maintained by Sage Bionetworks. Each piece of HTAN data is automatically assigned a unique Synapse identifier, such as syn25010909. In the screenshot above, you can see that the Vanderbilt project has multiple metadata files, each associated with a unique Synapse identifier.

If you click on any of the Synapse links above, you can immediately download a comma-separated values (CSV) file for that metadata category. There is no need to create a Synapse account or log into Synapse. For example, here we have downloaded the Vanderbilt biospecimen file and loaded it into Excel:

HTAN Tabular Data within Excel

Once you have downloaded metadata files, you can parse them in your favorite programming language, such as R or Python. To understand the individual columns within each metadata file, please refer to the HTAN Data Model.
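As a minimal sketch of that parsing step, the Python below reads a metadata CSV with the standard library and counts the values in one column. The column names (`HTAN Biospecimen ID`, `Preservation Method`) and the sample rows are assumptions for illustration only; check your downloaded file and the HTAN Data Model for the actual headers.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read a metadata CSV from an open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


def count_values(rows, column):
    """Count how often each non-empty value appears in one column."""
    return Counter(row[column] for row in rows if row.get(column))


# Tiny in-memory stand-in for a downloaded file; the column names and
# values here are hypothetical, not taken from a real HTAN export.
SAMPLE_CSV = """HTAN Biospecimen ID,Preservation Method
EXAMPLE_1,Fresh
EXAMPLE_2,Formalin fixed
EXAMPLE_3,Fresh
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = count_values(rows, "Preservation Method")
print(len(rows), dict(counts))
```

To read a real download instead, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `load_rows`, where `path` is wherever you saved the CSV.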

# Exploring Available Data

The HTAN Data Portal provides a unified interface for filtering and exploring HTAN data sets. Each filter is available at the top of the page:

HTAN Portal: Filters

To get started, you can click the Atlas pull-down menu, and select the Vanderbilt HTAN center:

HTAN Portal: Filter by HTAN Center

Your selection will now be reflected in the user interface:

HTAN Portal: Filter by Vanderbilt University

If you click the Cases or Biospecimens tabs, you can browse available metadata. Clicking the Files tab will take you to an interactive table listing all files available for download.

HTAN Portal: Files Tab

At this point, the Files tab is likely to contain hundreds of files, and may be difficult to navigate. You can further refine the files table by clicking on the Assay Type or File Type filters. This will trigger pop-up windows that describe the assay and file type categories available within the Vanderbilt project. For example, if you click Assay Type you will see:

HTAN Portal: Filter by Assay Type

Assay types with data available in the Vanderbilt project are shown in bold. You can therefore see that the Vanderbilt project has Bulk DNA, H&E images, Multiplex ImmunoFluorescence images, and Single Cell RNA Seq data.

If you click scRNA-seq, the files table will automatically update. You can then select the File Type filter to drill down even further:

HTAN Portal: Filter by File Type

Clicking Level 4 here will filter the files table to include only Level 4 data, which in this case consists of Single Cell RNA Seq files in h5ad format:

HTAN Portal: Multiple Filters Enabled

Note that you can remove any existing filters by clicking on any of the “chips” in the page header. For example, if you want to remove the Level 4 filter, just click the Level 4 chip:

HTAN Portal: Removing Filters

Clicking View Details on any of these files will pop open a metadata table. For example:

HTAN Portal: Metadata Details

# Downloading Assay Data

Once you have specified your filter criteria, the Files tab will display all matching files. At this point, you may see two types of files:

  • Open Access Files (Data Access = Synapse or CDS/SB-CGC (open access)); or
  • Access-Controlled Files (Data Access = dbGaP)

HTAN Data Access Types
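If you keep your own list of the files you plan to download, a small sketch like the following can separate open access files from access-controlled ones using the Data Access values described above. The record layout (dicts with `name` and `data_access` keys) and the file names are assumptions for illustration, and the exact strings shown by the portal may differ.

```python
def split_by_access(files):
    """Split file records into (open_access, access_controlled) lists.

    A record counts as access-controlled when its Data Access value
    mentions dbGaP (case-insensitive); everything else is treated as
    open access.
    """
    open_access, controlled = [], []
    for record in files:
        if "dbgap" in record["data_access"].lower():
            controlled.append(record)
        else:
            open_access.append(record)
    return open_access, controlled


# Hypothetical records; the names are placeholders, not real HTAN files.
selected = [
    {"name": "example_a.h5ad", "data_access": "Synapse"},
    {"name": "example_b.fastq.gz", "data_access": "dbGaP"},
    {"name": "example_c.tif", "data_access": "CDS/SB-CGC (open access)"},
]

open_files, controlled_files = split_by_access(selected)
print([f["name"] for f in open_files])
print([f["name"] for f in controlled_files])
```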

To download files, select the files you would like, then click on the "Download selected files" button.

Select Files and Download

The pop-up window that appears provides the information you need to access the files, including directions for:

  1. accessing Cancer Data Service (CDS) data on SB-CGC;
  2. directly downloading CDS data using the Gen3 Client; and
  3. accessing data available on Synapse via either the Synapse web interface or the Synapse CLI.

Download pop-up window
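For the Synapse CLI route, the sketch below builds one `synapse get` command per Synapse identifier without running anything. It assumes the Synapse command-line client is installed and that your credentials are already configured; `syn00000000` is a placeholder identifier, and the instructions in the pop-up window take precedence over this sketch.

```python
import re
import shlex

# Synapse identifiers look like "syn" followed by digits, e.g. syn25010909.
SYNAPSE_ID = re.compile(r"syn\d+")


def build_get_commands(synapse_ids):
    """Return one `synapse get <id>` argument list per valid identifier."""
    commands = []
    for syn_id in synapse_ids:
        if not SYNAPSE_ID.fullmatch(syn_id):
            raise ValueError(f"not a Synapse identifier: {syn_id!r}")
        commands.append(["synapse", "get", syn_id])
    return commands


# syn25010909 is the example identifier used in this guide;
# syn00000000 is a placeholder, not a real entity.
commands = build_get_commands(["syn25010909", "syn00000000"])
for argv in commands:
    print(shlex.join(argv))
```

Each argument list can be passed to `subprocess.run(argv, check=True)` when you are ready to download. Validating identifiers first keeps a malformed value from reaching the command line.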

In addition to the modes of data access described in the Download pop-up window, open access data can be:

  • transferred from Synapse to SB-CGC for cloud processing and analysis; or
  • downloaded from CELLxGENE (single cell/single nuclei RNA-seq data in h5ad (AnnData) format).