This page describes the raw data and the subsequent processing used within the Cancer Dependency Map at Sanger.
|Dataset||Origin||Data Type||Model Type||Details||Link|
|RNA-Seq||Sanger||BAM||Cell Line||Illumina HiSeq 2000||EGAS00001000828|
|RNA-Seq||Broad||BAM||Cell Line||Illumina HiSeq 2000 or HiSeq 2500||PRJNA169425|
|RNA-Seq||Sanger||BAM||Organoid||Illumina HiSeq 4000||To be published.|
The sections below describe how the raw data were processed, including the algorithms and filtering applied. Processed datasets can be downloaded here; the active dataset can also be accessed using the DepMap web resources and API.
RNA-Seq (Cell Lines)
RNA-seq data collated from the Wellcome Sanger Institute and the Broad Institute (Garcia-Alonso et al., 2018) were processed using the iRAP pipeline (Fonseca et al., 2014). The original datasets for each institute are available for download with read counts and FPKM (fragments per kilobase of transcript per million mapped reads) values.
Data presented through the API and the Cell Model Passports website combine the Sanger and Broad datasets. Where cell models have been screened at both institutes, the Sanger data have been selected for the merged dataset. Separate files for the Sanger and Broad data are also available for download. The data contain read counts, FPKM and TPM (transcripts per million) values.
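The selection rule described above (use the Sanger record when a model is present in both datasets) can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the model and gene identifiers are hypothetical and chosen for illustration only; they do not reflect the actual file formats or the code used to build the merged dataset.

```python
def merge_expression(sanger, broad):
    """Combine per-model expression records, preferring Sanger on overlap.

    Both inputs map a model identifier to that model's expression values.
    The result maps each model identifier to a (source, values) tuple.
    """
    # Start from the Broad records...
    merged = {model: ("Broad", values) for model, values in broad.items()}
    # ...then overwrite with Sanger records, so Sanger wins where both exist.
    merged.update({model: ("Sanger", values) for model, values in sanger.items()})
    return merged


# Hypothetical example data: MODEL_B is present in both datasets.
sanger = {"MODEL_A": {"GENE1": 10.0}, "MODEL_B": {"GENE1": 3.5}}
broad = {"MODEL_B": {"GENE1": 4.1}, "MODEL_C": {"GENE1": 7.2}}

merged = merge_expression(sanger, broad)
for model in sorted(merged):
    source, values = merged[model]
    print(model, source, values)
```

Keeping the source label alongside the values makes it possible to tell, for any model in the merged set, which institute's data were retained.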
Paired-end transcriptome reads were quality filtered and mapped to GRCh38 (Ensembl build 98) using STAR v2.5.0c (Dobin et al., 2013) with a standard set of parameters (https://github.com/cancerit/cgpRna). The resulting .bam files were processed to obtain per-gene read counts using HTSeq 0.7.2 (Anders et al., 2015). In addition, Transcripts Per Million (TPM) values were calculated from the count and transcript length data.
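As an illustration of how TPM values can be derived from read counts and transcript lengths, the following Python sketch applies the standard TPM definition: counts are first divided by length in kilobases, then scaled so that the values for a sample sum to one million. The function name and the example gene identifiers, counts and lengths are hypothetical, and the sketch assumes one length per gene in base pairs; the exact length definition used by the pipeline is not stated here.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-gene read counts to TPM for a single sample.

    counts     -- dict mapping gene id to read count
    lengths_bp -- dict mapping gene id to length in base pairs (assumed input)
    """
    # Reads per kilobase: normalise each count by gene length in kb.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Per-million scaling factor taken over the whole sample.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        # No reads at all: return zeros rather than dividing by zero.
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical example values.
counts = {"GENE1": 100, "GENE2": 200, "GENE3": 0}
lengths_bp = {"GENE1": 1000, "GENE2": 4000, "GENE3": 2000}

tpm = counts_to_tpm(counts, lengths_bp)
for gene in sorted(tpm):
    print(gene, round(tpm[gene], 2))
```

Because length normalisation happens before the per-million scaling, TPM values sum to the same total in every sample, which is the main practical difference from FPKM.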
Experimental Method (Cell Lines - Sanger Data)
For sequencing performed at the Sanger Institute, cell line pellets were collected during exponential growth in RPMI or Dulbecco’s Modified Eagle’s Medium/F12, lysed with TRIzol (Life Technologies) and stored at −70 °C. Following chloroform extraction, total RNA was isolated using the RNeasy Mini Kit (Qiagen). DNase digestion was followed by the RNAClean Kit (Agencourt Bioscience). RNA integrity was confirmed on a Bioanalyzer 2100 (Agilent Technologies) prior to labeling using 3′ IVT Express (Affymetrix). Sequence libraries were prepared in an automated fashion on the Agilent Bravo platform using the stranded mRNA Library Prep Kit from KAPA Biosystems. Processing steps were unchanged from those specified in the KAPA manual, except for the use of an in-house indexing set.
Publication reference: Picco, G., Chen, E.D., Alonso, L.G. et al. Functional linkage of gene fusions to cancer cell fitness assessed by pharmacological and CRISPR-Cas9 screening. Nature Communications (2019).