
LINCS dataset

Michael Bornholdt edited this page Jun 22, 2021 · 2 revisions

GitHub

https://github.com/broadinstitute/lincs-cell-painting

Level 1-3 data (different normalizations): https://github.com/broadinstitute/lincs-cell-painting/tree/master/profiles/2016_04_01_a549_48hr_batch1

Level 5 consensus data: https://github.com/broadinstitute/lincs-cell-painting/tree/master/consensus/2016_04_01_a549_48hr_batch1

Metadata

Broad sample ID, perturbation, and MOA mapping: https://github.com/broadinstitute/lincs-cell-painting/blob/master/metadata/moa/repurposing_info_external_moa_map_resolved.tsv

Plate map log (i.e., which plate map has which perturbation in which well): https://github.com/broadinstitute/lincs-cell-painting/tree/master/metadata/platemaps/2016_04_01_a549_48hr_batch1/platemap

Plate code, Plate map, Batch mapping: https://github.com/broadinstitute/lincs-cell-painting/blob/master/metadata/platemaps/2016_04_01_a549_48hr_batch1/barcode_platemap.csv
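
The two plate map files above can be chained to look up which sample sits in which well of a given plate. The following is a minimal sketch of that lookup using only the Python standard library. The column names (Assay_Plate_Barcode, Plate_Map_Name, plate_map_name, well_position, broad_sample), the tab-separated layout of the plate map file, and the inline sample rows are all assumptions made for illustration, so check them against the actual files before relying on this.

```python
import csv
import io

# Hypothetical excerpts standing in for the real files in the repo.
# Column names and values are assumptions for illustration only.
BARCODE_PLATEMAP_CSV = """Assay_Plate_Barcode,Plate_Map_Name
SQ00015208,EXAMPLE_PLATEMAP_A
"""

PLATEMAP_TSV = """plate_map_name\twell_position\tbroad_sample
EXAMPLE_PLATEMAP_A\tB22\tBRD-EXAMPLE-1
EXAMPLE_PLATEMAP_A\tB23\tBRD-EXAMPLE-2
"""


def read_rows(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def well_to_sample(plate, barcode_rows, platemap_rows):
    """Return {well: broad_sample} for one plate barcode."""
    # Step 1: plate barcode -> plate map name.
    plate_to_map = {
        row["Assay_Plate_Barcode"]: row["Plate_Map_Name"] for row in barcode_rows
    }
    map_name = plate_to_map[plate]
    # Step 2: plate map name -> wells and the sample in each well.
    return {
        row["well_position"]: row["broad_sample"]
        for row in platemap_rows
        if row["plate_map_name"] == map_name
    }


barcode_rows = read_rows(BARCODE_PLATEMAP_CSV, ",")
platemap_rows = read_rows(PLATEMAP_TSV, "\t")
print(well_to_sample("SQ00015208", barcode_rows, platemap_rows))
```

The resulting sample IDs could then be matched against the MOA mapping file in the same way, again after confirming its column names.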

Cell centers

Cell centers for the LINCS data can be found on the DGX. However, those come from the U-Net and may differ from what CellProfiler/Cytominer uses.

The extracted locations can be found on S3. backup_locations holds the large CSV files that are extracted directly from the SQLite files (the CellProfiler output), and locations holds the CSV location files that DeepProfiler needs. The relevant paths are:

https://imaging-platform.s3.us-east-1.amazonaws.com/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/backend/2016_04_01_a549_48hr_batch1

s3://imaging-platform/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/locations/

These can be extracted with the script from Juan in this DeepProfiler commit: https://github.com/cytomining/DeepProfiler/commit/accd19c3765bbc44d9307bd6d10522b469b3aab7

The extraction scripts used to generate these files can be found in this repo under pre-trained/data

BEWARE:

  1. The input location files are named Plate/Well-Site-Nuclei.csv (e.g. SQ0014812/B02-4-Nuclei.csv).
  2. The output files (the profiles per site) are named Plate/Well_Site.npz (e.g. SQ0014812/B02_4.npz).
  3. The image size in my current experiment is 1080x1080. This may change, so always check the size of your images and the values in your location files. Plot them to be sure that they are correct!
  4. The filenames of the location files on the DGX are incorrect: they follow the same naming convention as the images.
  5. Sometimes location files or images are simply missing or empty. This is the case for: SQ00015208/B22-5, ...
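
As a rough illustration of points 1 to 3, the sketch below converts an input location filename into the matching output filename and flags coordinates that fall outside the image. The helper names (location_to_output_name, out_of_bounds), the regular expression, and the assumption that each location is an (x, y) pixel pair are hypothetical and not part of DeepProfiler. The 1080x1080 default is only the image size mentioned above.

```python
import re

# Matches Plate/Well-Site-Nuclei.csv, e.g. SQ0014812/B02-4-Nuclei.csv.
# Assumes wells are one letter followed by two digits.
LOCATION_RE = re.compile(
    r"^(?P<plate>[^/]+)/(?P<well>[A-Za-z]\d{2})-(?P<site>\d+)-Nuclei\.csv$"
)


def location_to_output_name(location_path):
    """Map Plate/Well-Site-Nuclei.csv to Plate/Well_Site.npz."""
    match = LOCATION_RE.match(location_path)
    if match is None:
        raise ValueError(f"Unexpected location filename: {location_path}")
    return f"{match['plate']}/{match['well']}_{match['site']}.npz"


def out_of_bounds(points, width=1080, height=1080):
    """Return the (x, y) points that lie outside a width x height image."""
    return [
        (x, y) for x, y in points
        if not (0 <= x < width and 0 <= y < height)
    ]


print(location_to_output_name("SQ0014812/B02-4-Nuclei.csv"))
print(out_of_bounds([(10, 20), (1080, 5), (-1, 3)]))
```

An empty list from out_of_bounds does not prove the locations are right, so plotting them over the image remains the safer check.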