# Accessing CDS Data in SB-CGC

Seven Bridges Cancer Genomics Cloud (SB-CGC) is a cloud platform for the analysis and storage of large cancer datasets. There are two mechanisms for transferring HTAN data from the NCI's Cancer Data Service (CDS) to SB-CGC:

  1. Direct export from the CDS Portal.
  2. Export via a Data Repository Service (DRS) Manifest.

HTAN Data in CDS includes:

  1. Open Access Level 2 Imaging Files; and
  2. Access-controlled Sequencing Data (e.g., FASTQ and BAM files).

# Imaging Files

# Direct Export

To access HTAN imaging data in the CDS Portal, open the portal in a web browser and click the Explore CDS Data button on the landing page.

On the Data Explorer page, expand the STUDY section on the left sidebar, scroll down, and check the box next to Human Tumor Atlas (HTAN) imaging data.

CDS Portal: Accessing HTAN Imaging Data

The summary panel will update to reflect only the selected HTAN data.

To see the tabulated view of all HTAN participants, samples, or files, scroll down or click the Collapse View tab in the upper right, just below the query summary line.

CDS Portal: Accessing HTAN Imaging Data

To add every file, click the Add All Files button. To add a subset, check the boxes next to the Participants, Samples, or Files of interest and then click the Add Selected button. Either action updates the cart icon in the upper right corner.

CDS Portal: Accessing HTAN Imaging Data

Clicking the cart icon brings up a list of the selected files. Expand the Available Export Options drop-down menu and select Export to Cancer Genomics Cloud.

CDS Portal: Adding Data to Cart

Follow the prompts to log in to CGC. Then select a Destination project, check the box to agree to CGC terms and import the data.

# DRS Manifest Files

DRS manifests are CSV files which list the files you would like to obtain. They require at minimum the name and drs_uri of each file of interest. For data transfer using a DRS Manifest, there are two main steps:

  1. Generate the DRS Manifest
  2. Import the data to CGC
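
Because a manifest must carry at least the `name` and `drs_uri` of each file, it can be worth checking a manifest before importing it. The following is a minimal sketch, not part of any CDS or SB-CGC tooling: the function name `check_manifest` is hypothetical, and the check that each URI begins with `drs://` is an assumption about how the URIs are written in your manifest.

```python
import csv
import io

# Columns the text above names as the minimum for a DRS manifest.
REQUIRED_COLUMNS = {"name", "drs_uri"}


def check_manifest(csv_text):
    """Return a list of problems found in DRS manifest text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not (row["name"] or "").strip():
            problems.append(f"line {line_no}: empty name")
        # Assumption: every URI in the manifest uses the drs:// scheme.
        if not (row["drs_uri"] or "").strip().startswith("drs://"):
            problems.append(f"line {line_no}: drs_uri does not start with drs://")
    return problems


# Placeholder values for illustration only; these are not real HTAN files.
example = "name,drs_uri\nimage_1.ome.tiff,drs://example.org/abc\n"
print(check_manifest(example))
```

An empty list means the manifest passed both checks; extra columns are ignored.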

# 1. Generate the DRS Manifest

For HTAN data, DRS Manifests can be generated from three different locations:

  • CDS Portal
  • HTAN Data Portal
  • Google BigQuery

# Generating a DRS Manifest from the CDS Portal

Follow the directions for Direct Export of files from CDS. In the cart, click the Download Manifest button in the upper right to download a CSV-formatted (Excel-compatible) copy of your file list.

# Generating a DRS Manifest from the HTAN Data Portal

From the HTAN Data Portal, click CDS/SB-CGC (Open Access) under the Data Access filter.

HTAN Portal: Accessing Imaging Data in CDS

Navigate to the Files tab, check the box next to Filename in the upper left, and then click Download selected files.

HTAN Portal: Selecting Imaging Files

Click Download Manifest, which will download a local file called cds_manifest.csv.
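
If only some of the downloaded files are needed, the manifest can be trimmed before import. The sketch below is one possible approach rather than an HTAN-provided utility: the function name `subset_manifest`, the output file name, and the `.ome.tiff` suffix are illustrative assumptions, and it assumes the manifest has a `name` column.

```python
import csv


def subset_manifest(src_path, dst_path, suffixes):
    """Copy rows whose `name` ends with one of `suffixes` into a new manifest.

    `suffixes` should be lowercase; file names are lowercased before matching.
    Returns the number of rows kept.
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Preserve every column of the source manifest, in the same order.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            if row["name"].lower().endswith(tuple(suffixes)):
                writer.writerow(row)
                kept += 1
    return kept


# Example call (hypothetical output name and suffix):
# subset_manifest("cds_manifest.csv", "cds_manifest_subset.csv", [".ome.tiff"])
```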

HTAN Portal: Download DRS Manifest

# Generating a DRS Manifest from Google BigQuery

HTAN metadata and a mapping of HTAN Data File IDs to CDS DRS URIs are available as Google BigQuery tables via the Institute for Systems Biology Cancer Gateway in the Cloud (ISB-CGC) (see Google BigQuery). These tables can be used to subset data to a cohort of interest and to obtain the DRS URIs of the files to access.

For a step-by-step guide on how to generate a DRS manifest file using Google BigQuery, please see the Python notebook Creating_CDS_Data_Import_Manifests_Using_BQ.ipynb.
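
The notebook is the authoritative guide. As a rough illustration of the final step only, the sketch below turns rows already retrieved from a query into manifest CSV text with the `name` and `drs_uri` columns. The function name `rows_to_manifest` and the input keys `filename` and `drs_uri` are hypothetical; the real BigQuery column names may differ, so adjust `name_key` and `uri_key` to match your query.

```python
import csv
import io


def rows_to_manifest(rows, name_key="filename", uri_key="drs_uri"):
    """Render query-result rows as DRS manifest CSV text, skipping duplicate URIs."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "drs_uri"])
    seen = set()
    for row in rows:
        uri = row[uri_key]
        if uri in seen:
            # The same file can appear more than once after a join; keep the first.
            continue
        seen.add(uri)
        writer.writerow([row[name_key], uri])
    return out.getvalue()


# Placeholder rows for illustration only; these are not real HTAN records.
rows = [
    {"filename": "sample_1.bam", "drs_uri": "drs://example.org/1"},
    {"filename": "sample_1.bam", "drs_uri": "drs://example.org/1"},
    {"filename": "sample_2.bam", "drs_uri": "drs://example.org/2"},
]
print(rows_to_manifest(rows))
```

Writing the returned text to a `.csv` file yields a manifest in the minimal two-column form described above.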

# 2. Import the data into CGC

Once you have your manifest, follow the instructions on SB-CGC's Import from a DRS server documentation page to import data from a manifest file.

# Sequencing Data

The CDS Portal, within NCI's Cancer Research Data Commons (CRDC), provides an interface to filter and select data from a variety of NCI programs, including controlled-access, primary sequence data from the Human Tumor Atlas Network (HTAN).

The directions for accessing sequencing data on CDS are similar to those for Level 2 Imaging Data Access, including Direct Export from CDS to CGC and importing data using a Data Repository Service (DRS) Manifest. Please follow the Level 2 Imaging Data Access directions to access sequencing data, noting the following changes:

  1. For Direct Export or Generating a DRS Manifest from CDS, choose Human Tumor Atlas (HTAN) primary sequence data in the STUDY section of the left-hand sidebar instead of Human Tumor Atlas (HTAN) imaging data.

Figure 3

  2. To generate a DRS Manifest from the HTAN Data Portal, click CDS/SB-CGC (dbGaP) under the Data Access filter instead of CDS/SB-CGC (Open Access).

HTAN Portal: Accessing Genomic Data in CDS