Example Pipeline Tutorial#
The goal of this example pipeline is to get the user familiar with using `napari-ndev` for batch processing and reproducibility (see the Image Utilities and Workflow Widget). In addition, this example pipeline thoroughly explains the Measure Widget, since it is shared across many pipelines.
This Example Pipeline does not cover how `napari-ndev` is used for high-throughput annotations, the machine learning tools (APOC Widget), or designing your own workflows. This information is covered instead in the interactive tutorials that follow.
Image Utilities#
We are going to start with the Image Utilities widget in order to concatenate the CellPainting images. This shows a common use of the Image Utilities widget, wherein various file formats can be managed and saved into a common OME-TIFF format, including channel names and physical pixel scaling.
- `Choose Directory`: selects where images will be saved.
- `Select files`: individual or multiple files can be selected. Select the first 5 images (representing the 5 channels of 1 image).
- `Metadata` dropdown: We will add names to save the channels with, according to information that is useful. This could be the fluorophore (e.g. Hoechst 33342) or other identifying information (e.g. nuclei).
    - `Channel Name(s)`: copy and paste `['H33342', 'conA', 'SYTO14', 'WGA_Phall', 'MitoTDR']`. The format you want to use is a list `[]` of strings `'a','b','etc.'`
    - `Scale, ZYX`: Set Y and X to `0.656`. Z will be ignored since images are 2D.
- `Batch Concat.`: Pressing this button will iterate through all files in the folder, selecting them in groups of 5 (i.e. the number of original files selected), and then save them with the above parameters.
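The grouping behaviour described above can be pictured with a small sketch. It is not the widget's actual implementation: the file names are hypothetical, and it assumes the files are taken in sorted order and split into consecutive groups whose size equals the number of channel names.

```python
import ast

# The Channel Name(s) field takes text shaped like a Python list of strings.
channel_text = "['H33342', 'conA', 'SYTO14', 'WGA_Phall', 'MitoTDR']"
channel_names = ast.literal_eval(channel_text)

# Hypothetical file names: 5 channel files per image, for 2 images.
files = sorted(
    f"plate1_A14_site{site}_Ch{ch}.tif"
    for site in (1, 2)
    for ch in range(1, 6)
)


def group_files(paths, group_size):
    """Split a list of paths into consecutive groups of `group_size`."""
    return [paths[i:i + group_size] for i in range(0, len(paths), group_size)]


# Assumption: one group per output image, one file per channel.
groups = group_files(files, len(channel_names))

for group in groups:
    # Pair each channel name with the file it would label in this group.
    print(dict(zip(channel_names, group)))
```

Parsing the field with `ast.literal_eval` also shows why the list-of-strings format matters: text that is not a valid Python literal fails to parse.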
Investigate the images#
If you want to investigate the raw images, press `Open File(s)`; this will open the original images with their known scale `(1,1,1)`. Each image will open as grayscale and will not be layered.
Now, investigate your concatenated images. Go to `Select Files` and find the folder `ConcatenatedImages` inside the directory previously chosen with `Choose Directory`. Select the first image and press `Open File(s)`. This time, the images will open at the scale we set `(0,0.656,0.656)` and with a default layering and pseudo-coloring. This is how all images get passed down throughout the plugin.
Example workflow#
Once images are in a format that is helpful for analysis, we can proceed with other widgets. This means that some images do not need to be processed with the Image Utilities Widget; for example, some microscopes properly incorporate scale and channel names into the image metadata. For this tutorial, we are going to use the Workflow Widget to pre-process, segment, and label features of the image with a pre-made custom workflow file (see `cellpainting\scripting_workflow.ipynb` for how it was created). The intent of the Workflow Widget is to easily reproduce a workflow across a batch of images. This custom workflow was initially designed with the `napari-assistant`, which will be explored further in the following tutorial sections.
The goal of this workflow is to segment the nucleus, cell area (based on a Voronoi tessellation of the nuclei), cytoplasm (cell area - nucleus), and the nucleoli. We will later measure the properties of these objects using the Measure Widget.
Using the Workflow Widget for Batch Processing#
- `Image Directory`: choose the `ConcatenatedImages` folder found in the previous parent folder.
- `Result Directory`: create a folder to save the output images into.
- `Workflow File`: navigate to `scripted_cellpainting_workflow.yaml`
You will now see the UI automatically update to show the `roots` (input images) of the Workflow file. Furthermore, these `roots` will be populated with the channel names of the images in the chosen directory. In this workflow, three root images are required: (1) `Root 0: cyto_membrane` is `WGA_Phall`, (2) `Root 1: nuclei` is `H33342`, and (3) `Root 2: nucleoli` is `SYTO14`.
Next, switch to the `Tasks` tab. In this tab, the `leaves`, the workflow tasks that sit at the terminals of the task tree, are automatically selected. However, we are also interested in visualizing the nuclei. So, hold Control (or Command) on your keyboard and also click `nuclei-labels` to add this task to the batch workflow. If all the workflow tasks you are interested in are represented as `leaves`, then you can even skip this tab!
Finally, press `Batch Workflow`. The workflow will be run on every image in the `Image Directory`. The Progress Bar will show updates, and a log file will be saved that records the input parameters and the progress of the batch processing, including any errors.
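To make the `roots` and `leaves` terminology concrete, here is a minimal sketch of how they fall out of a task graph. The graph below is hypothetical: only the root names and the `*-labels` names come from this tutorial, while the `voronoi-cells` task and all of the wiring are invented for illustration and do not reflect the contents of the real workflow file.

```python
# Hypothetical task graph: each task maps to the inputs it is computed from.
tasks = {
    "nuclei-labels": ["nuclei"],
    "voronoi-cells": ["nuclei-labels", "cyto_membrane"],  # invented task
    "cell-labels": ["voronoi-cells"],
    "cyto-labels": ["voronoi-cells", "nuclei-labels"],
    "nucleoli-labels": ["nucleoli", "nuclei-labels"],
}


def find_roots(graph):
    """Roots are inputs that no task produces (the raw input images)."""
    used = {name for inputs in graph.values() for name in inputs}
    return sorted(used - set(graph))


def find_leaves(graph):
    """Leaves are tasks whose output no other task consumes."""
    used = {name for inputs in graph.values() for name in inputs}
    return sorted(name for name in graph if name not in used)


roots = find_roots(tasks)
leaves = find_leaves(tasks)
print("roots:", roots)
print("leaves:", leaves)
```

In this made-up graph `nuclei-labels` feeds other tasks, so it is not a leaf, which mirrors why it has to be added by hand in the `Tasks` tab.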
Workflow notes#
Just as we selected an additional task for the workflow, any number of tasks can be selected from the workflow, and if `Keep Original Images` is checked, the original images will also be saved in the resulting batch-processed files. As such, the Workflow Widget can also be used to easily visualize intermediate steps of the workflow, to investigate how something was achieved and to share that information. Below, napari shows every original channel and every task in this workflow as a grid; all of this is saved into a single file.
Coming Soon: the ability to use layers in the workflow as roots to do single image Workflows and adding them into napari immediately!
Measure Widget#
The Measure Widget provides the ability to measure images in batch, group important information, and even use metadata to map sample treatments and conditions. This widget is the newest addition to `napari-ndev`, in part because it has taken me a long time to conceptualize how to make image measurements accessible in batch, so I am particularly looking for usage feedback. For detailed usage instructions, see the Measure Widget Example.
How measuring in Python generally works#
It is often most helpful to represent a segmented image as 'labels'. Labels (including the `Labels Layer` in napari) have a pseudocolor scheme where each label (i.e. object) has a specific value, and that value is represented by a color. When these labels are measured, each label object is measured independently and represented in one row. With few objects of interest in low-throughput processes this can make sense, but a label image with 100 objects will result in a spreadsheet with 100 rows. Accordingly, even measuring 10 images with 100 objects each leads to 1000 rows. To many scientists, these are both small object numbers and small image numbers, so you can imagine how quickly datasets can grow to hundreds of thousands or millions of rows.
Furthermore, many properties of labels can be measured, from area (which is scaled properly throughout this plugin to real units), to perimeter, to solidity, to sphericity. Thus, measuring label properties generally requires knowledge of Python to make sense of this long, multivariate data, especially when it comes to grouping data by treatments, doing counts, or applying other aggregating functions to any measurement of the labels.
The Measure Widget seeks to address the most common use cases for high-throughput analyses by providing human-readable outputs. Furthermore, treatment metadata mapping can easily be shared from a more advanced researcher to a novice, for reproducibility of more involved analyses.
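As a rough illustration of the "one row per label" idea, the sketch below measures a tiny made-up label image using only the standard library. It is not how the widget is implemented; in practice a library function such as scikit-image's `skimage.measure.regionprops_table` is the usual tool. The images and the `measure_labels` helper are hypothetical, and the pixel size is the `0.656` scale set earlier.

```python
from collections import Counter

PIXEL_SIZE_YX = (0.656, 0.656)  # Y and X scale set in Image Utilities

# Made-up label image: 0 is background, 1 and 2 are two objects.
label_image = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
# Made-up intensity image with the same shape.
intensity_image = [
    [5, 10, 20, 5, 5],
    [5, 30, 40, 5, 60],
    [5, 5, 5, 80, 100],
]


def measure_labels(labels, intensity, pixel_size_yx):
    """Return one row (dict) per label with scaled area and mean intensity."""
    pixel_area = pixel_size_yx[0] * pixel_size_yx[1]
    counts = Counter()
    sums = Counter()
    for label_row, intensity_row in zip(labels, intensity):
        for label, value in zip(label_row, intensity_row):
            if label != 0:  # skip background
                counts[label] += 1
                sums[label] += value
    return [
        {
            "label": label,
            "area": counts[label] * pixel_area,  # pixel count in real units
            "intensity_mean": sums[label] / counts[label],
        }
        for label in sorted(counts)
    ]


rows = measure_labels(label_image, intensity_image, PIXEL_SIZE_YX)
for row in rows:
    print(row)
```

The area scaling matches what appears later in this tutorial's output: an area such as 505.6448 corresponds to 1175 pixels multiplied by 0.656 × 0.656.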
Initial Batch Measurement with the Widget#
- `Label Directory`: Select the directory containing the labels you want to measure; in this case, choose the result directory from the Workflow Widget. These image files can contain any number of labels (or non-labels, but those should not be measured). Channels will populate both the `Label Image` and `Intensity Images` selections.
- `Image Directory`: An optional directory. Choose the `ConcatenatedImages` directory to add the original channel images to the `Intensity Images` select box.
- `Region Directory`: Another optional directory, intended for 'ROI' (Region of Interest) labels. It is not used for this pipeline.
- `Label image`: Using multi-selection, select `cell-labels`, `cyto-labels`, and `nuclei-labels`. We will measure each object in each image.
- `Intensity images`: Using multi-selection, select `nucleoli-labels` (to measure the number of nucleoli inside the label), `conA` and `MitoTDR` (to measure the underlying intensity of the channel on the label).
- `Region Props`: This is a list of the measurements for each label. For this example, select at least `label`, `area`, `intensity_mean` and `solidity`. `label` is the identity, and it is recommended to always keep it checked. Beyond that, you can measure shape features like area, eccentricity, and solidity, or intensity features like the mean, max, min, etc. Note that measuring something like the `intensity max` of an intensity image that represents an ROI serves as a means to identify whether an object is inside (i.e. the value of the ROI) or outside (i.e. 0) the region.
- At this point, you could hit the Measure button and it will measure all label channels in each image in batch. However, for this example we also want to add some identification and treatment data to the output. This example data comes from wells with no treatment, so we will generate some ourselves to explain the concept, but this should be straightforward enough to apply to your own data. To use the `ID Regex` and `Tx Map` tabs we use dictionaries of key: value pairs, where the key becomes the column name and the value contains the regular expression to search for.
- `ID Regex` tab: This dictionary extracts information from the filename with regular expression patterns. These data all come from `plate1`, but if we had multiple plates we could extract the plate number with the following regex: `r'(plate\d{1,2})-'`. Whatever is inside the `()` is considered the 'group' that gets returned. In this case, we can provide a dictionary that returns the identifying number of the plate and the well position. We specifically need the well position in order to map it to the treatment map. Copy and paste this into `ID Regex`:
{
'plate': r'plate(\d{1,3})_',
'well': r'_(\w+?)_site',
'site': r'_site(\d{1,3})_',
}
- `Tx Map` tab: This dictionary maps well positions to an overall platemap. This time, the key remains the column name, but its value is another dictionary that maps each treatment to its wells; see the example below. The platemap is expected to be of standard configuration, but can include wells that are not imaged. First press 'Update Treatment ID Choices' to use the previous regex for Well ID. Select `well` for `Treatment ID` and `384` for `Number of Wells`. We are going to pretend the platemap has the following treatments:
{
'media': {
'HBSS': ['A1:C24'],
'DMEM': ['D1:F24'],
},
'treatment': {
'control': ['A12:P14'],
'drug': ['A15:P18'],
}
}
- Press the `Measure` button! We have all the options set to richly annotate our data with identifying info and treatments... in batch!
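The two dictionaries above can be tried outside the widget with a short sketch. This is an illustration under stated assumptions, not the plugin's code: the helper names (`extract_ids`, `expand_wells`, `lookup_treatments`) are hypothetical, and a range such as `'A12:P14'` is assumed to mean the rectangular block of wells from row A to row P and column 12 to column 14.

```python
import re
import string

id_regex = {
    'plate': r'plate(\d{1,3})_',
    'well': r'_(\w+?)_site',
    'site': r'_site(\d{1,3})_',
}
tx_map = {
    'media': {'HBSS': ['A1:C24'], 'DMEM': ['D1:F24']},
    'treatment': {'control': ['A12:P14'], 'drug': ['A15:P18']},
}


def extract_ids(name, patterns):
    """Return {column: first captured group}, or None where nothing matches."""
    ids = {}
    for column, pattern in patterns.items():
        match = re.search(pattern, name)
        ids[column] = match.group(1) if match else None
    return ids


def expand_wells(well_range):
    """Expand 'A12:P14' into all wells of that block (assumed meaning)."""
    start, _, end = well_range.partition(':')
    end = end or start  # a single well such as 'B7' is its own range
    rows = string.ascii_uppercase
    first_row, last_row = rows.index(start[0]), rows.index(end[0])
    first_col, last_col = int(start[1:]), int(end[1:])
    return {
        f"{rows[r]}{c}"
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    }


def lookup_treatments(well, platemap):
    """Return {column: treatment name} for one well, None if unmapped."""
    result = {}
    for column, treatments in platemap.items():
        result[column] = None
        for name, ranges in treatments.items():
            if any(well in expand_wells(r) for r in ranges):
                result[column] = name
    return result


# An image id of the form that appears in this tutorial's output tables.
image_id = "plate1_D16_site1_Ch1__0__plate1_D16_site1_Ch1"
ids = extract_ids(image_id, id_regex)
treatments = lookup_treatments(ids['well'], tx_map)
print(ids)
print(treatments)
```

Under these assumptions well D16 resolves to `DMEM` and `drug`, which agrees with the D16 rows in the grouped output table later in this tutorial.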
Grouping the data#
Navigate to the `Output Directory` and find the `measure_props...csv` for your data! You can see each measurement for each label, but it is hard to interpret this way.
label_name | id | site | well | plate | label | area | intensity_mean-nucleoli-labels | intensity_mean-conA | intensity_mean-MitoTDR | solidity | row | column | media | treatment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cell-labels | plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | 1 | 1469.167104 | 0.7454598711189221 | 256.04100761570004 | 295.11511423550087 | 0.7832071576049552 | A | 14 | HBSS | control |
cell-labels | plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | 2 | 505.6448000000001 | 0.089361702 | 407.0757446808511 | 389.1506382978723 | 0.9767248545303407 | A | 14 | HBSS | control |
cell-labels | plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | 3 | 336.092416 | 1.2586427656850192 | 233.87580025608196 | 326.2509603072983 | 0.9455205811138013 | A | 14 | HBSS | control |
Instead, we want to group the data by useful metrics. Navigate to the `Grouping` tab. Select the output `measure_props...csv` for `Measured Data Path`; the selected file is read to fill in the remaining options in the tab. If we include the `id` column in our grouping columns, then the output will summarize each individual image. If you also select other identifying information, like site, well, plate, etc., then this information will be kept in the summarized file. Ultimately, data will be grouped by the most fine-grained group (in this case, each image, i.e. the `id`). So, if you only wanted to know the differences between treatments, you could group only by `treatment`; be careful, because this hides your raw data and reduces the information to the aggregate function.
For this pipeline, we are going to group by: id, label_name (which label channel it is), site, well, plate, media, and treatment. This will summarize the data by `id` at the finest level (each file), but preserve all that metadata. Then, we keep `Count Column` set to `label` so that it counts the number of each object in the image. Finally, we are going to aggregate other measured features. For `Aggregation Columns`, select `intensity_mean-conA` (to measure the intensity of ER), `intensity_mean-MitoTDR` (mitochondria), and `area` (to compare the size of each object). Note that several `Aggregation Functions` are available; the default is `mean`.
Next, check `Pivot Wider`. This will place each individual label channel in its own columns, rather than repeating it in rows. This is generally more human-readable and familiar for non-coding statistical work.
Finally, press the `Group Measurements` button! You now have the output dataset.
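The grouping, counting, aggregating, and pivoting steps can be sketched with the standard library. This is only an illustration of the concept: the rows are made up, the helper names are hypothetical, and the naming of the pivoted columns is an arbitrary choice. With pandas, a `groupby` followed by a pivot would be the more usual route.

```python
from collections import defaultdict
from statistics import mean

# Made-up per-object rows, shaped like the measured output (one row per label).
rows = [
    {"id": "img1", "label_name": "cell-labels", "treatment": "control", "label": 1, "area": 100.0},
    {"id": "img1", "label_name": "cell-labels", "treatment": "control", "label": 2, "area": 140.0},
    {"id": "img1", "label_name": "nuclei-labels", "treatment": "control", "label": 1, "area": 30.0},
    {"id": "img1", "label_name": "nuclei-labels", "treatment": "control", "label": 2, "area": 50.0},
    {"id": "img2", "label_name": "cell-labels", "treatment": "drug", "label": 1, "area": 90.0},
    {"id": "img2", "label_name": "nuclei-labels", "treatment": "drug", "label": 1, "area": 20.0},
]


def group_measurements(rows, group_cols, count_col, agg_cols):
    """Count rows and average `agg_cols` within each combination of group_cols."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_cols)].append(row)
    summary = []
    for key, members in groups.items():
        out = dict(zip(group_cols, key))
        out[f"{count_col}_count"] = len(members)  # one row per object
        for col in agg_cols:
            out[f"{col}_mean"] = mean(m[col] for m in members)
        summary.append(out)
    return summary


def pivot_wider(summary, index_cols, names_from):
    """Move each value of `names_from` into its own set of columns."""
    wide = {}
    for row in summary:
        key = tuple(row[c] for c in index_cols)
        target = wide.setdefault(key, dict(zip(index_cols, key)))
        for col, value in row.items():
            if col not in index_cols and col != names_from:
                target[f"{col} ({row[names_from]})"] = value
    return list(wide.values())


grouped = group_measurements(rows, ["id", "label_name", "treatment"], "label", ["area"])
wide = pivot_wider(grouped, ["id", "treatment"], "label_name")
for row in wide:
    print(row)
```

Before pivoting there is one row per image and label channel; after pivoting there is one row per image, which is the shape that is easiest to scan for quality control.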
Make observations#
One of the best parts of summarizing your data is quickly checking for quality control. Investigate measure_props...grouped.csv
- Do we get the same number of rows that we would expect? (hint, it should be the number of images, with how we grouped)
- Are there the same number of nuclei as cytoplasms in each image? Should there be?
- Is the intensity of a certain marker localized more to the cytoplasm or the nucleus?
- Is the area of the whole cell larger than the cytoplasm and nucleus alone? Does nucleus + cytoplasm = cell?
id | site | well | plate | media | treatment | label_count (cell-labels) | label_count (cyto-labels) | label_count (nuclei-labels) | area_mean (cell-labels) | area_mean (cyto-labels) | area_mean (nuclei-labels) | intensity_mean-MitoTDR_mean (cell-labels) | intensity_mean-MitoTDR_mean (cyto-labels) | intensity_mean-MitoTDR_mean (nuclei-labels) | intensity_mean-conA_mean (cell-labels) | intensity_mean-conA_mean (cyto-labels) | intensity_mean-conA_mean (nuclei-labels) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | HBSS | control | 79.0 | 79.0 | 79.0 | 1458.3978094177216 | 1214.6478728101267 | 243.74993660759498 | 325.2266947462635 | 307.7767026467916 | 412.45047953549573 | 283.3011895008876 | 264.05000585134076 | 377.55476585283094 |
plate1_A14_site2_Ch1__0__plate1_A14_site2_Ch1 | 2 | A14 | 1 | HBSS | control | 92.0 | 92.0 | 92.0 | 1270.5623624347827 | 1015.8361933913045 | 254.74487930434788 | 335.0570967957602 | 319.5738507170695 | 405.50202331367353 | 273.7174073017615 | 256.90458122232934 | 349.3471722687742 |
plate1_B13_site1_Ch1__0__plate1_B13_site1_Ch1 | 1 | B13 | 1 | HBSS | control | 59.0 | 59.0 | 59.0 | 1805.06988040678 | 1502.4123834576274 | 302.6574969491526 | 308.182553594441 | 295.40835758750995 | 377.9935857420644 | 292.5031657524073 | 274.4186976700165 | 391.6966321932782 |
plate1_B13_site2_Ch1__0__plate1_B13_site2_Ch1 | 2 | B13 | 1 | HBSS | control | 71.0 | 71.0 | 71.0 | 1558.5194041690143 | 1291.1595267605635 | 267.3598774084507 | 299.17016394582697 | 285.23325714705294 | 363.05445180651634 | 305.7340276405519 | 284.01603790001866 | 406.23380996542903 |
plate1_C12_site1_Ch1__0__plate1_C12_site1_Ch1 | 1 | C12 | 1 | HBSS | control | 127.0 | 127.0 | 127.0 | 1203.229621417323 | 944.5570237480316 | 258.67259766929135 | 343.52778951543183 | 329.7049438466621 | 398.5256715321834 | 283.1283666399538 | 270.7264526431119 | 333.90274822634825 |
plate1_C12_site2_Ch1__0__plate1_C12_site2_Ch1 | 2 | C12 | 1 | HBSS | control | 124.0 | 124.0 | 124.0 | 1166.5923096774195 | 921.3840805161292 | 245.20822916129035 | 348.5899736336976 | 331.9816108022042 | 415.9173276097387 | 287.77845139732943 | 273.9409912708669 | 343.3100275440203 |
plate1_D16_site1_Ch1__0__plate1_D16_site1_Ch1 | 1 | D16 | 1 | DMEM | drug | 137.0 | 137.0 | 137.0 | 1136.5299405547446 | 848.7231084379563 | 287.80683211678837 | 324.09437073563026 | 314.2939434267824 | 361.2811199284404 | 341.1003129872079 | 326.6836244548794 | 397.2258103183944 |
plate1_D16_site2_Ch1__0__plate1_D16_site2_Ch1 | 2 | D16 | 1 | DMEM | drug | 126.0 | 126.0 | 126.0 | 1220.6002488888892 | 932.7191263492065 | 287.88112253968256 | 305.43130665340715 | 298.09709415581307 | 335.4623238823955 | 334.3002097571851 | 323.0418712295005 | 381.9689568813829 |
plate1_E18_site1_Ch1__0__plate1_E18_site1_Ch1 | 1 | E18 | 1 | DMEM | drug | 147.0 | 147.0 | 147.0 | 1054.613018122449 | 800.8962803809525 | 253.71673774149664 | 345.9989260804927 | 335.50803523526207 | 383.15517282809606 | 296.09051609563323 | 283.99028924506246 | 339.21053128779465 |
plate1_E18_site2_Ch1__0__plate1_E18_site2_Ch1 | 2 | E18 | 1 | DMEM | drug | 174.0 | 173.0 | 174.0 | 894.9381222988507 | 649.1033999537573 | 249.56520165517242 | 351.5135374204032 | 347.28641992275476 | 370.0639647086021 | 282.8140508927973 | 276.74482673528496 | 304.67453165821416 |
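Some of the quality-control questions above can be checked programmatically. The sketch below copies the counts and mean areas of two rows from the table above and flags rows where the object counts differ between label channels or where the mean cytoplasm and nucleus areas do not add up to the mean cell area. The `qc_flags` helper and the tolerance are hypothetical choices, and the area check only makes sense if every cell has exactly one nucleus object and one cytoplasm object.

```python
import math

# Counts and mean areas copied from two rows of the grouped table above.
rows = [
    {
        "well_site": "A14 site 1",
        "count_cell": 79, "count_cyto": 79, "count_nuclei": 79,
        "area_cell": 1458.3978094177216,
        "area_cyto": 1214.6478728101267,
        "area_nuclei": 243.74993660759498,
    },
    {
        "well_site": "E18 site 2",
        "count_cell": 174, "count_cyto": 173, "count_nuclei": 174,
        "area_cell": 894.9381222988507,
        "area_cyto": 649.1033999537573,
        "area_nuclei": 249.56520165517242,
    },
]


def qc_flags(row, rel_tol=1e-6):
    """Return a list of human-readable QC warnings for one grouped row."""
    flags = []
    counts = {row["count_cell"], row["count_cyto"], row["count_nuclei"]}
    if len(counts) != 1:
        flags.append("object counts differ between label channels")
    parts = row["area_cyto"] + row["area_nuclei"]
    if not math.isclose(parts, row["area_cell"], rel_tol=rel_tol):
        flags.append("mean cyto + nuclei area does not equal mean cell area")
    return flags


report = {row["well_site"]: qc_flags(row) for row in rows}
for well_site, flags in report.items():
    print(well_site, flags or "ok")
```

The E18 site 2 row has 173 cytoplasm objects against 174 cells and nuclei, so both checks flag it, while the A14 site 1 row passes.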