Proteomics

This page outlines the raw data and the subsequent processing for the proteomics dataset (ProCan-DepMapSanger).


Raw Data

Dataset               Origin          Data Type         Link
ProCan-DepMapSanger   CMRI & Sanger   Raw .wiff files   PXD030304

Processed Data

This section describes how the raw data was processed, including the algorithms and filtering applied. Processed datasets can be downloaded here; the active dataset can also be accessed using the Cell Model Passports web resources and API.

The analytical steps of data acquisition and processing are briefly summarised below; detailed information can be found in the corresponding publication.

Protein Intensity User Guidance:

The mass spectrometry data provides relative rather than absolute protein quantitation. This enables comparison of the intensity of a particular protein across different cell lines, but it is not valid to compare the intensities of different proteins within a single cell line.
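As a minimal sketch of this guidance, assume a small table of log2 protein intensities. The protein names, cell line names, values and the helper log2_fold_change below are all invented for illustration and are not part of the dataset or its API. A valid comparison takes one protein and contrasts it between cell lines:

```python
# Hypothetical log2 protein intensities: protein -> {cell line -> value}.
# All names and numbers are invented for illustration only.
log2_intensity = {
    "PROTEIN_A": {"CELL_LINE_1": 4.0, "CELL_LINE_2": 6.0},
    "PROTEIN_B": {"CELL_LINE_1": 9.0, "CELL_LINE_2": 9.5},
}


def log2_fold_change(protein, line_a, line_b):
    """Difference in log2 intensity of ONE protein between two cell lines."""
    values = log2_intensity[protein]
    return values[line_b] - values[line_a]


# Valid: the same protein compared across two different cell lines.
fc = log2_fold_change("PROTEIN_A", "CELL_LINE_1", "CELL_LINE_2")
print(f"PROTEIN_A log2 fold change (CELL_LINE_2 vs CELL_LINE_1): {fc}")
```

The helper deliberately accepts a single protein, so it cannot be used to contrast PROTEIN_A with PROTEIN_B inside CELL_LINE_1, which is the kind of comparison the guidance above advises against.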


Spectral library and DIA-MS data processing

An in silico spectral library was created using DIA-NN (version 1.8) (Demichev et al. 2020) for the canonical human proteome (UniProt Release 2021_03; 20,612 sequences), along with retention time peptides and commonly occurring microbial and viral sequences. The final spectral library contained a total of 12,487 proteins and 144,578 precursors. DIA-NN (version 1.8) was used to process the MS data against this spectral library, implemented using RT-dependent normalisation. DIA-NN output data were filtered to retain only precursors from proteotypic peptides with Global.Q.Value ≤ 0.01. These precursors were then used for protein quantification by maxLFQ (Cox et al. 2014), implemented using the DiaNN R package with default parameters.
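The precursor filter described above can be sketched as follows. This is an illustration only, not the pipeline used for the dataset (which relied on the DiaNN R package). The column names Precursor.Id, Protein.Group and Proteotypic are assumed to match the DIA-NN report, the rows are invented, and the helper filter_precursors is hypothetical. The maxLFQ step itself is not reproduced.

```python
import csv
import io

# Tiny stand-in for a DIA-NN precursor report (tab-separated).
# Column names are assumed to match the DIA-NN report; rows are invented.
REPORT_TSV = (
    "Precursor.Id\tProtein.Group\tProteotypic\tGlobal.Q.Value\n"
    "PEPTIDEA2\tP00001\t1\t0.001\n"
    "PEPTIDEB2\tP00001\t0\t0.002\n"  # not proteotypic: dropped
    "PEPTIDEC3\tP00002\t1\t0.05\n"   # fails the q-value cut-off: dropped
    "PEPTIDED2\tP00002\t1\t0.01\n"   # exactly at the cut-off: kept
)

Q_VALUE_CUTOFF = 0.01  # Global.Q.Value threshold stated in the text


def filter_precursors(tsv_text, cutoff=Q_VALUE_CUTOFF):
    """Keep proteotypic precursors with Global.Q.Value <= cutoff."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in rows
        if row["Proteotypic"] == "1" and float(row["Global.Q.Value"]) <= cutoff
    ]


kept = filter_precursors(REPORT_TSV)
print([row["Precursor.Id"] for row in kept])
```

The comparison is inclusive (≤), matching the threshold as written, so a precursor sitting exactly at 0.01 is retained.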

Data was then log2-transformed. MS runs across replicates of each cell line were combined by calculating the geometric mean. The final dataset, termed ProCan-DepMapSanger, was derived from 6,864 mass spectrometry runs covering 949 cell lines and quantifying a total of 8,498 proteins.
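The text does not state on which scale the geometric mean was taken, so the following sketch rests on an assumption: if the geometric mean is applied to raw-scale intensities, its log2 equals the arithmetic mean of the log2-transformed replicate values. The intensities are invented, and the handling of missing values is not covered.

```python
import math
import statistics

# Hypothetical raw-scale intensities of one protein in three replicate runs.
replicate_intensities = [1000.0, 2000.0, 4000.0]

# Geometric mean on the raw scale, then moved to the log2 scale.
geo_mean = statistics.geometric_mean(replicate_intensities)
log2_of_geo_mean = math.log2(geo_mean)

# Arithmetic mean of the log2-transformed replicate values.
log2_values = [math.log2(x) for x in replicate_intensities]
mean_of_logs = statistics.fmean(log2_values)

# The two routes agree up to floating-point rounding.
print(log2_of_geo_mean, mean_of_logs)
```

Under this assumption, averaging log2 values across replicates and taking the geometric mean of the untransformed intensities are two descriptions of the same summary.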

Protein quantifications at the replicate level, and the number of peptides identified per protein in each MS run, are available on figshare at https://doi.org/10.6084/m9.figshare.19345397.

Z-scores are calculated separately for each protein, relative to that protein's measurements across the entire cell line panel.
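A per-protein Z-score of this kind can be sketched as below. The conventional definition (value minus mean, divided by standard deviation) is assumed. Whether the resource uses the sample or the population standard deviation is not stated here, so the sample form is an assumption, as are the helper name protein_z_scores and the example values.

```python
import statistics


def protein_z_scores(values_by_cell_line):
    """Z-score one protein's values across cell lines; None marks not quantified."""
    observed = [v for v in values_by_cell_line.values() if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)  # sample SD: an assumption, not from the source
    return {
        cell_line: None if v is None else (v - mean) / sd
        for cell_line, v in values_by_cell_line.items()
    }


# Hypothetical log2 intensities of a single protein across four cell lines.
protein_a = {"LINE_1": 4.0, "LINE_2": 6.0, "LINE_3": 8.0, "LINE_4": None}
z = protein_z_scores(protein_a)
print(z)
```

Because the mean and standard deviation come from one protein at a time, the resulting scores describe how a cell line ranks for that protein within the panel, in line with the guidance on relative quantitation.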


Experimental Method

Data Independent Acquisition (DIA)-Mass Spectrometry (MS)

Three cell pellets were analysed for each of the 949 cell lines. The cell pellets were processed using the Accelerated Barocycler Lysis and Extraction (ABLE) protocol with minor modifications (Lucas et al. 2019). Each of the three replicates was then injected on two of six different SCIEX™ 6600 TripleTOF® mass spectrometers (Sciex) coupled to Eksigent nanoLC 425 high-performance liquid chromatography (HPLC) systems, all housed in a single laboratory, ProCan® in Westmead, Australia. The instruments were run in sequential windowed acquisition of all theoretical fragment ion spectra (SWATH™) mode using 100 variable isolation windows.

Publication reference: Gonçalves, E., Poulos, R.C., Cai, Z., et al., Pan-cancer proteomic map of 949 human cell lines, Cancer Cell, (2022)