# Accessing HTAN open access data stored in Synapse via SB-CGC
HTAN’s open-access data, which includes most datasets, is hosted on Synapse and can be loaded into the Seven Bridges Cancer Genomics Cloud (SB-CGC) by following the instructions below. Note that open-access Level 2 images are available separately through the Cancer Data Service (CDS).
# Video walkthrough
Watch video walkthrough on Zoom
# Step-by-Step Guide
# 1. Start a Data Studio Instance
- Go to CancerGenomicsCloud.org.
- Log in and launch a Data Studio instance with JupyterLab.
# 2. Install and Configure Synapse Client
- Open the Terminal in JupyterLab.
- Install the Synapse client (if not already installed) by typing:
pip install synapseclient
- Obtain a Personal Access Token:
  - Log in to Synapse.org.
  - Go to Account Settings > Personal Access Tokens.
  - Create a new token with “View and Download” permissions and copy the token.
- Configure the Synapse client with the token:
  - In the JupyterLab Terminal, type:
    synapse config
  - When prompted, enter your Synapse username (optional) and paste the token.
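
Before moving on, it can help to confirm that both parts of this step worked. The following is a minimal sketch, not part of the official instructions: the function name check_synapse_setup is hypothetical, and the default config path ~/.synapseConfig is an assumption about where the client stored the token on your instance.

```shell
# Sanity check for step 2: is the client installed, and has it been configured?
# The default config path (~/.synapseConfig) is an assumption about where
# "synapse config" stored the token; pass another path as the first argument.
check_synapse_setup() {
  config_file="${1:-$HOME/.synapseConfig}"

  # The pip package provides a "synapse" command-line tool.
  if ! command -v synapse >/dev/null 2>&1; then
    echo "synapse command not found; run: pip install synapseclient"
    return 1
  fi

  # After "synapse config", a config file holding the token should exist.
  if [ ! -f "$config_file" ]; then
    echo "no config file at $config_file; run: synapse config"
    return 1
  fi

  echo "synapse client installed and config file present"
}

# Usage:
#   check_synapse_setup
```

The check only looks for the command and the file; it does not validate the token itself, which is tested the first time you download a file.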
# 3. Select Files from the HTAN Data Portal
- Visit the HTAN Data Portal and find the data files you need (e.g., single-cell RNA-seq data in H5AD format).
- Click the purple “Download selected files” button for the chosen files.
- Copy the download commands (e.g., synapse get syn1234).
# 4. Download Files into the Data Studio Instance
- Paste the copied download commands into the JupyterLab Terminal.
- Verify that the files are in the workspace directory. You can now use these files within notebooks in the Data Studio instance.
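
If you selected many files, pasting the commands one at a time can be tedious. As a rough sketch of an alternative, assuming you have saved the Synapse IDs one per line in a text file, a small loop can run synapse get for each ID and count failures. The function name download_ids, the file name ids.txt, and the SYNAPSE_CMD override are all hypothetical and not part of the HTAN or Synapse tooling.

```shell
# Download every Synapse ID listed in a text file, one ID per line.
# The file name and the SYNAPSE_CMD override are illustrative assumptions.
download_ids() {
  ids_file="$1"
  synapse_cmd="${SYNAPSE_CMD:-synapse}"
  failed=0

  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r syn_id || [ -n "$syn_id" ]; do
    # Skip blank lines and comment lines.
    case "$syn_id" in
      '' | '#'*) continue ;;
    esac

    echo "Downloading $syn_id"
    # </dev/null keeps the client from consuming the rest of the ID list.
    "$synapse_cmd" get "$syn_id" </dev/null || failed=$((failed + 1))
  done < "$ids_file"

  echo "Failed downloads: $failed"
  [ "$failed" -eq 0 ]
}

# Usage (ids.txt is a hypothetical file of IDs copied from the portal):
#   download_ids ids.txt
#   ls -lh    # confirm the files landed in the current directory
```

The function returns a non-zero status if any download fails, so a failed ID is reported rather than silently skipped.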
# 5. Move Files to Output Folder
- Copy the files to the output-files directory so that they sync with your CGC project:
cp <filename> ../output-files
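
When several files need to be synced, the copy can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name copy_to_output is hypothetical, and it assumes the output-files directory already exists one level above the workspace, as in the command above.

```shell
# Copy one or more files into the synced output directory and report each copy.
copy_to_output() {
  dest_dir="$1"
  shift

  if [ ! -d "$dest_dir" ]; then
    echo "destination directory not found: $dest_dir" >&2
    return 1
  fi

  for src in "$@"; do
    # Stop at the first failure so a missing file is not silently skipped.
    cp -- "$src" "$dest_dir"/ || return 1
    echo "copied $src -> $dest_dir"
  done
}

# Usage from the workspace directory (the *.h5ad pattern is only an example):
#   copy_to_output ../output-files *.h5ad
```

The helper deliberately does not create the destination directory, so a mistyped path produces an error instead of a new folder that never syncs.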
# 6. Stop Data Studio Instance
- Stop the Data Studio instance to trigger synchronization.
- Check your project files on CGC; the downloaded files should now be available for further use.