This page outlines the raw data and the subsequent processing applied for use within the Cancer Dependency Map at Sanger.

Raw Data

Dataset | Origin | Data Type | Model Type | Details                  | Link
RNA-Seq | Sanger | BAM       | Cell Line  | Illumina HiSeq 2000      | EGAD00001001357
RNA-Seq | CGHub  | N/A       | Cell Line  | Illumina HiSeq 2000/2500 | N/A*

*Data published on CGHub has since been removed.

Processed Data

This section describes how the raw data were processed, including the algorithms and filtering applied. Processed datasets can be downloaded here; the active dataset can also be accessed using the DepMap web resources and API.

Fusion Data

Summary: Gene fusions were systematically identified by analysing RNA-seq data to define fusion transcripts in the cancer cell lines. RNA-seq data for 587 cell lines were obtained from the Cancer Genome Hub (CGHub), and 447 cell lines were sequenced at the Sanger Institute. To improve the accuracy of fusion transcript calling, three different algorithms (deFuse, TopHat-Fusion, and STAR-Fusion) were run across all samples.

For 23 cell lines, sequence data were obtained from both CGHub and the Sanger Institute, allowing the output from the two sources to be compared. Where replicate datasets were available, only fusions called from Sanger Institute sequencing data have been incorporated.
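The source-preference rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the `(cell_line, source)` keys, and the placeholder cell line and fusion names are assumptions made for the example and are not taken from the actual pipeline.

```python
# Hypothetical sketch: when a cell line has fusion calls from both sources,
# keep only the calls derived from Sanger Institute sequencing.
SOURCE_PRIORITY = ("Sanger", "CGHub")  # earlier entry wins


def select_preferred_source(calls):
    """calls maps (cell_line, source) -> list of fusion names.

    Returns a dict mapping cell_line -> (source, fusions), keeping the
    highest-priority source available for each cell line.
    """
    selected = {}
    for (cell_line, source), fusions in calls.items():
        current = selected.get(cell_line)
        if current is None or (
            SOURCE_PRIORITY.index(source) < SOURCE_PRIORITY.index(current[0])
        ):
            selected[cell_line] = (source, fusions)
    return selected


# Placeholder data, not real results.
example_calls = {
    ("LINE_A", "CGHub"): ["G1--G2"],
    ("LINE_A", "Sanger"): ["G1--G2", "G3--G4"],
    ("LINE_B", "CGHub"): ["G5--G6"],
}
preferred = select_preferred_source(example_calls)
```

Here `LINE_A` keeps its Sanger calls, while `LINE_B`, which has CGHub data only, keeps its CGHub calls.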

Generation of RNA-Seq Data (Sanger): For sequencing performed at the Sanger Institute, cell line pellets were collected during exponential growth in RPMI or Dulbecco’s Modified Eagle’s Medium/F12, lysed with TRIzol (Life Technologies), and stored at −70 °C. Following chloroform extraction, total RNA was isolated using the RNeasy Mini Kit (Qiagen). DNase digestion was followed by clean-up with the RNAClean Kit (Agencourt Bioscience). RNA integrity was confirmed on a Bioanalyzer 2100 (Agilent Technologies) prior to labeling using 3′ IVT Express (Affymetrix). Sequence libraries were prepared in an automated fashion on the Agilent Bravo platform using the stranded mRNA Library Prep Kit from KAPA Biosystems. Processing steps were unchanged from those specified in the KAPA manual, except for use of an in-house indexing set.

Processing & Filtering: Three publicly available gene fusion detection algorithms were used: TopHat-Fusion (v2.1.0), STAR-Fusion (v2.5.0), and deFuse (v0.7.0), as described on GitHub (https://github.com/cancerit/cgpRna/blob/dev/README.md). From the output of the three algorithms, only fusions supported by four or more reads aligning directly across the breakpoint were taken forward. Fusions were also required to be called by at least two different algorithms. Fusions identified in the analysis of 245 non-neoplastic samples downloaded from GTEx were then removed from the dataset.
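The three filtering steps can be illustrated with a minimal Python sketch. It is not the cgpRna implementation: the record fields (`sample`, `gene5`, `gene3`, `caller`, `spanning_reads`) and the function name are hypothetical, and the sketch assumes that the read threshold is applied to each caller's output before counting how many callers support a fusion, which the text above does not state explicitly.

```python
from collections import defaultdict

MIN_BREAKPOINT_READS = 4  # reads aligning directly across the breakpoint
MIN_CALLERS = 2           # distinct algorithms that must report the fusion


def filter_fusions(calls, normal_fusions):
    """Apply the read-support, multi-caller, and normal-sample filters.

    calls: iterable of dicts with hypothetical keys
           sample, gene5, gene3, caller, spanning_reads.
    normal_fusions: set of (gene5, gene3) pairs seen in non-neoplastic samples.
    Returns a sorted list of (sample, gene5, gene3) tuples that pass.
    """
    callers = defaultdict(set)
    for call in calls:
        # Step 1: keep only calls with enough breakpoint-spanning reads.
        if call["spanning_reads"] >= MIN_BREAKPOINT_READS:
            key = (call["sample"], call["gene5"], call["gene3"])
            callers[key].add(call["caller"])
    return sorted(
        key
        for key, names in callers.items()
        # Step 2: at least two different algorithms agree.
        if len(names) >= MIN_CALLERS
        # Step 3: drop fusions also observed in normal samples.
        and (key[1], key[2]) not in normal_fusions
    )


# Placeholder records, not real results.
example = [
    {"sample": "S1", "gene5": "A", "gene3": "B", "caller": "deFuse", "spanning_reads": 6},
    {"sample": "S1", "gene5": "A", "gene3": "B", "caller": "STAR-Fusion", "spanning_reads": 5},
    {"sample": "S1", "gene5": "C", "gene3": "D", "caller": "deFuse", "spanning_reads": 9},
    {"sample": "S1", "gene5": "C", "gene3": "D", "caller": "TopHat-Fusion", "spanning_reads": 3},
    {"sample": "S2", "gene5": "E", "gene3": "F", "caller": "deFuse", "spanning_reads": 8},
    {"sample": "S2", "gene5": "E", "gene3": "F", "caller": "STAR-Fusion", "spanning_reads": 7},
]
kept = filter_fusions(example, normal_fusions={("E", "F")})
```

In this example only the A/B fusion in `S1` survives: C/D has a single caller with sufficient read support, and E/F is removed because it appears in the normal-sample set.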

The frame of fusion transcripts was predicted using the GRASS algorithm (https://github.com/cancerit/grass), which is built into the fusion-calling pipeline (https://github.com/cancerit/cgpRna/blob/dev/README.md).

In Patients & In COSMIC: Fusions curated by COSMIC and those identified in patients (Gao, Q. et al.) have been flagged.
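As a rough illustration of how such flags could be attached, the sketch below annotates each fusion with two booleans. The function name, field names, and the representation of a fusion as an ordered `(gene5, gene3)` pair are assumptions made for the example; the reference sets would come from COSMIC and from the patient study, and are shown here as placeholders.

```python
def flag_fusions(fusions, cosmic_pairs, patient_pairs):
    """Annotate each (gene5, gene3) pair with membership flags.

    cosmic_pairs and patient_pairs are sets of (gene5, gene3) tuples
    (hypothetical inputs for this sketch).
    """
    return [
        {
            "fusion": pair,
            "in_cosmic": pair in cosmic_pairs,
            "in_patients": pair in patient_pairs,
        }
        for pair in fusions
    ]


# Placeholder gene names, not real results.
flagged = flag_fusions(
    fusions=[("A", "B"), ("C", "D")],
    cosmic_pairs={("A", "B")},
    patient_pairs={("A", "B"), ("C", "D")},
)
```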

Publication reference: Picco, G., Chen, E.D., Alonso, L.G. et al. Functional linkage of gene fusions to cancer cell fitness assessed by pharmacological and CRISPR-Cas9 screening. Nature Communications (2019).