
Example Pipeline Tutorial#

The goal of this example pipeline is to familiarize the user with napari-ndev for batch processing and reproducibility (see Image Utilities and Workflow Widget). In addition, this example pipeline thoroughly explains the Measure Widget, since measurement is a step shared across many pipelines.

This Example Pipeline does not cover high-throughput annotation with napari-ndev, the machine learning tools (APOC Widget), or designing your own workflows. These topics are instead covered in the interactive tutorials that follow.

Image Utilities#

We are going to start with the Image Utilities widget in order to concatenate the CellPainting images. This shows a common use of the Image Utilities widget, wherein various file formats can be managed and saved into a common OME-TIFF format, including channel names and physical pixel scaling.

Batch Concatenation

  1. Choose Directory: selects where images will be saved.
  2. Select files: individual or multiple files can be selected. Select the first 5 images (representing the 5 channels of 1 image).
  3. Metadata dropdown: add the names to save the channels with, according to whatever information is useful. This could be the fluorophore (e.g. Hoechst 33342) or other identifying information (e.g. nuclei).

    1. Channel Name(s): copy and paste ['H33342', 'conA', 'SYTO14', 'WGA_Phall', 'MitoTDR']. The expected format is a list [] of quoted strings, e.g. ['a', 'b'].
    2. Scale, ZYX: set Y and X to 0.656. Z will be ignored since the images are 2D.
  4. Batch Concat: pressing this button iterates through all files in the folder, taking them in groups of 5 (i.e. the number of files originally selected), and saves each group with the above parameters.
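
To make the grouping step concrete, here is a minimal sketch of how a channel-name list could be parsed and how files could be split into groups of five. This is not napari-ndev's implementation: the helper names `parse_channel_names` and `group_files`, the example file names, and the assumption that alphabetical order keeps the channels of one image adjacent are all hypothetical.

```python
import ast


def parse_channel_names(text):
    """Parse text such as "['a', 'b']" into a list of strings."""
    names = ast.literal_eval(text)
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("expected a list of strings, e.g. ['a', 'b']")
    return names


def group_files(files, group_size):
    """Split a sorted file list into consecutive groups of `group_size`."""
    files = sorted(files)
    if len(files) % group_size != 0:
        raise ValueError("file count is not a multiple of the group size")
    return [files[i:i + group_size] for i in range(0, len(files), group_size)]


channel_names = parse_channel_names(
    "['H33342', 'conA', 'SYTO14', 'WGA_Phall', 'MitoTDR']"
)

# Hypothetical file names: two images, each stored as five single-channel files.
files = [f"plate1_A14_site{s}_Ch{c}.tif" for s in (1, 2) for c in range(1, 6)]

# One group per output image, with one file per channel name.
groups = group_files(files, group_size=len(channel_names))
for group in groups:
    print(dict(zip(channel_names, group)))
```

If the number of files in the folder is not a multiple of the group size, this sketch raises an error instead of guessing how to pair the leftover files.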

Investigate the images#

Image Utilities

If you want to investigate the raw images, press Open File(s). This will open the original images with their known scale (1,1,1). Each image will open as grayscale and will not be layered.

Now, investigate your concatenated images. Go to Select Files and find the folder ConcatenatedImages inside the directory previously chosen with Choose Directory. Select the first image and press Open File(s). This time, the images will open with the scale we set (0,0.656,0.656) and with default layering and pseudo-coloring. This is how all images get passed down throughout the plugin.

Example workflow#

Once images are in a format that is helpful for analysis, we can proceed with other widgets. This does mean that some images do not need to be processed with the Image Utilities Widget; for example, some microscopes properly incorporate scale and channel names into the image metadata. For this tutorial, we are going to use the Workflow Widget to pre-process, segment, and label features of the image with a pre-made custom workflow file (see cellpainting\scripting_workflow.ipynb for how it was made). The intent of the Workflow Widget is to easily reproduce a workflow across many images. This custom workflow was initially designed with the napari-assistant, which will be explored further in the following tutorial sections.

The goal of this workflow is to segment the nucleus, the cell area (based on a Voronoi tessellation of the nuclei), the cytoplasm (cell area minus nucleus), and the nucleoli. We will later measure the properties of these objects using the Measure Widget.
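
As a rough illustration of the "cytoplasm = cell area minus nucleus" idea, the sketch below removes nucleus pixels from a cell label image using plain nested lists. The helper `subtract_nuclei` and the toy images are hypothetical. The real workflow operates on image arrays and may compute this differently.

```python
def subtract_nuclei(cell_labels, nuclei_labels):
    """Keep each cell label only where no nucleus label is present."""
    return [
        [cell if nucleus == 0 else 0 for cell, nucleus in zip(cell_row, nuc_row)]
        for cell_row, nuc_row in zip(cell_labels, nuclei_labels)
    ]


# Toy 3x6 label images: two cells (labels 1 and 2), each with a 1-pixel nucleus.
cell_labels = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
]
nuclei_labels = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
]

cyto_labels = subtract_nuclei(cell_labels, nuclei_labels)
for row in cyto_labels:
    print(row)
```

Because the cytoplasm keeps the label value of its parent cell, the cell, cytoplasm, and nucleus of one object can later be matched by label.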

Workflow Example

Using the Workflow Widget for Batch Processing#

  1. Image Directory: choose the ConcatenatedImages folder found in the previous parent folder.
  2. Result Directory: create a folder to save the output images into.
  3. Workflow File: navigate to scripted_cellpainting_workflow.yaml

You will now see the UI automatically update to show the roots (input images) of the Workflow file. Furthermore, these roots will be populated by the channel names of the images in the chosen directory. In this workflow there are three root images required: (1) Root 0: cyto_membrane is WGA_Phall, (2) Root 1: nuclei is H33342, and (3) Root 2: nucleoli is SYTO14.

Workflow-roots

Next, switch to the Tasks tab. In this tab, the leaves (the workflow tasks that sit at the terminals of the task tree) are automatically selected. However, we are also interested in visualizing the nuclei. So, hold Control or Command on your keyboard and also click nuclei-labels to add this task to the batch workflow. If all workflow tasks you are interested in are represented as leaves, then you can even skip this tab!
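
The distinction between roots, intermediate tasks, and leaves can be sketched with a small dependency graph. The wiring below is a simplified, hypothetical stand-in for the real workflow file (only the task and root names come from this tutorial), and `find_roots` and `find_leaves` are illustrative helpers, not part of napari-ndev.

```python
def find_roots(workflow):
    """Inputs that no task produces, i.e. the images the user must supply."""
    all_inputs = {src for inputs in workflow.values() for src in inputs}
    return sorted(all_inputs - set(workflow))


def find_leaves(workflow):
    """Tasks whose output is not used as an input by any other task."""
    all_inputs = {src for inputs in workflow.values() for src in inputs}
    return sorted(task for task in workflow if task not in all_inputs)


# Hypothetical wiring: task name -> names of the inputs it consumes.
workflow = {
    "nuclei-labels": ["nuclei"],
    "cell-labels": ["nuclei-labels", "cyto_membrane"],
    "cyto-labels": ["nuclei-labels", "cyto_membrane"],
    "nucleoli-labels": ["nuclei-labels", "nucleoli"],
}

roots = find_roots(workflow)
leaves = find_leaves(workflow)
print("roots:", roots)
print("leaves:", leaves)
```

In this sketch nuclei-labels feeds other tasks, so it is not a leaf. This mirrors why it has to be selected manually in the Tasks tab.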

Workflow-tasks

Finally, press Batch Workflow. The Image Directory will be iterated through with the workflow. The Progress Bar will show updates and a log file will be saved to show the input parameters and progress of the batch processing, including any possible errors.

Workflow notes#

Just as we selected an additional task for the workflow, any number of tasks can be acquired from the workflow, and if Keep Original Images is checked, the original images will also be saved in the resulting batch-processed images. As such, the Workflow Widget can also be used to easily visualize intermediate steps of the workflow, to investigate how something was achieved and to share that information. Below, napari is showing every original channel and every task in this workflow as a grid; all of this is saved into a single file.

Workflow-all

Coming Soon: the ability to use layers in the workflow as roots to do single image Workflows and adding them into napari immediately!

Measure Widget#

The Measure Widget provides the ability to measure images in batch, group important information, and even utilize metadata to map sample treatments and conditions. This widget is the newest addition to napari-ndev, in part because it has taken me a long time to conceptualize how to make image measurements accessible in batch, so I am particularly looking for usage feedback. For detailed usage instructions see the Measure Widget Example.

How measuring in Python generally works#

It is often most helpful to represent a segmented image as 'labels'. Labels (including the Labels Layer in napari) have a pseudocolor scheme where each label (i.e. object) has a specific value, and that value is represented by a color. When these labels are then measured, each label object is measured independently and represented in one row. With few objects of interest in low-throughput processes this can make sense, but a label image with 100 objects will result in a spreadsheet with 100 rows. Accordingly, even measuring 10 images with 100 objects each leads to 1000 rows. To many scientists, these are both small object numbers and small image numbers, so you can imagine how quickly and easily datasets can reach hundreds of thousands or millions of rows.

Furthermore, many properties of labels can be measured, from area (which is scaled properly throughout this plugin to real units), to perimeter, to solidity, to sphericity. Thus, measuring label properties generally requires knowledge of Python to make sense of this long multivariate data, especially when it comes to grouping data by treatments or applying counts or other aggregating functions to any measurement of the labels.

The Measure Widget seeks to address the most common use cases for high-throughput analyses by providing human-readable outputs. Furthermore, treatment metadata mapping can easily be shared from a more advanced researcher to a novice, for reproducibility of more involved analyses.
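
To illustrate the "one row per label" idea and how a pixel count becomes an area in real units, here is a minimal pure-Python sketch. The helper `measure_labels` is hypothetical. In practice a library function such as scikit-image's `regionprops_table` is commonly used for this kind of measurement, and this sketch does not claim to reproduce how the widget computes its values.

```python
from collections import Counter


def measure_labels(label_image, pixel_size_yx):
    """Return one row (dict) per label with its area in scaled units."""
    pixel_area = pixel_size_yx[0] * pixel_size_yx[1]
    # Count pixels per label value, ignoring background (0).
    counts = Counter(value for row in label_image for value in row if value != 0)
    return [
        {"label": label, "area": n_pixels * pixel_area}
        for label, n_pixels in sorted(counts.items())
    ]


# Toy label image with two objects: label 1 (4 pixels) and label 2 (3 pixels).
label_image = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]

rows = measure_labels(label_image, pixel_size_yx=(0.656, 0.656))
for row in rows:
    print(row)
```

With the Y and X scale of 0.656 used in this tutorial, each pixel covers 0.656 × 0.656 = 0.430336 square units. The first area in the example output shown later in this tutorial (1469.167104) is consistent with 3414 pixels at that scale.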

Initial Batch Measurement with the Widget#

Measure-batch

  1. Label Directory: Select the directory containing the labels you want to measure -- in this case choose the output directory from the Workflow Widget. Each image file can contain any number of label channels (non-label channels may also be present, but those should not be measured). Channels will populate both the Label Image and Intensity Images select boxes.
  2. Image Directory: An optional directory -- choose the ConcatenatedImages directory to add the original channel images to the Intensity Images select box.
  3. Region Directory: Another optional directory intended for 'ROI'/Region of Interest labels -- not used for this pipeline.
  4. Label image: Using multi-selection, select cell-labels, cyto-labels, and nuclei-labels. We will measure each object in each image.
  5. Intensity images: Using multi-selection, select nucleoli-labels (to measure the number of nucleoli inside the label), and conA and MitoTDR (to measure the underlying intensity of the channel on the label).
  6. Region Props: This is a list of the measurements for each label. For this example, select at least label, area, intensity_mean and solidity. label is the identity of each object, and it is recommended to always keep it checked. Otherwise, you can measure shape features like area, eccentricity, and solidity, or intensity features like the mean, max, min, etc. Note that measuring something like the intensity max of an intensity image that represents an ROI serves as a means to identify whether the object is inside (i.e. the value of the ROI) or outside (i.e. 0) the region.
  7. At this point, you could hit the Measure button and it would measure all label channels in each image in batch. However, for this example we also want to add some identification and treatment data to the output. This example data comes from wells with no treatment, so we will generate some ourselves to explain the concept, but this should be straightforward enough to apply to your own data. The ID Regex and Tx Map tabs both use dictionaries of key: value pairs. In ID Regex, the key becomes the column name and the value contains the regular expression to search for.
  8. ID Regex tab: This dictionary extracts information from the filename with regular expression patterns. These data all come from plate1, but if we had multiple plates we could extract the plate number with a regex such as r'(plate\d{1,2})-'; whatever is inside the () is considered the 'group' that gets returned. In this case we can provide the dictionary to return the identifying number of the plate and the well position. We specifically need the well position in order to map it to the treatment map. Copy and paste this into ID Regex:
{
    'plate': r'plate(\d{1,3})_',
    'well': r'_(\w+?)_site',
    'site': r'_site(\d{1,3})_',
}

Batch-ID-regex
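
To see what this dictionary extracts, the patterns can be tried with Python's `re` module on a name of the form used in this dataset. The helper `extract_ids` is a hypothetical illustration that returns the first capture group of each pattern. It is a sketch of the concept and not the widget's own code.

```python
import re

id_regex = {
    'plate': r'plate(\d{1,3})_',
    'well': r'_(\w+?)_site',
    'site': r'_site(\d{1,3})_',
}


def extract_ids(filename, patterns):
    """Return {column: first captured group, or None when there is no match}."""
    result = {}
    for column, pattern in patterns.items():
        match = re.search(pattern, filename)
        result[column] = match.group(1) if match else None
    return result


ids = extract_ids('plate1_A14_site1_Ch1', id_regex)
print(ids)
```

Each key becomes a column name and each captured group becomes the value for that file, so this name yields plate 1, well A14, and site 1.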

  1. Tx Map tab: This dictionary maps well positions to an overall platemap. This time, the key remains the column name, but the value is another dictionary that maps each treatment to the wells that received it (see the example below). The platemap is expected to be of standard configuration, but it can include wells that are not imaged. First press 'Update Treatment ID Choices' to use the previous regex for the well ID. Select well for Treatment ID and 384 for Number of Wells. We are going to pretend the platemap has the following treatments:
{
    'media': {
        'HBSS': ['A1:C24'],
        'DMEM': ['D1:F24'],
    },
    'treatment': {
        'control': ['A12:P14'],
        'drug': ['A15:P18'],
    }
}

Batch-tx-map
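
The following sketch shows one way such a treatment map could be resolved for a single well. It assumes that a range like 'A12:P14' means the rectangular block of rows A to P and columns 12 to 14, which agrees with the example output in this tutorial but is an interpretation, not a statement of how the widget is implemented. The helpers `expand_wells` and `lookup_treatments` are hypothetical.

```python
import string

tx_map = {
    'media': {
        'HBSS': ['A1:C24'],
        'DMEM': ['D1:F24'],
    },
    'treatment': {
        'control': ['A12:P14'],
        'drug': ['A15:P18'],
    },
}


def expand_wells(well_range):
    """Expand 'A12:P14' into the set of wells in that rectangular block."""
    if ':' not in well_range:
        return {well_range}
    start, end = well_range.split(':')
    letters = string.ascii_uppercase
    rows = letters[letters.index(start[0]):letters.index(end[0]) + 1]
    columns = range(int(start[1:]), int(end[1:]) + 1)
    return {f"{row}{column}" for row in rows for column in columns}


def lookup_treatments(well, mapping):
    """Return {column name: treatment name, or None if the well is unmapped}."""
    found = {}
    for column, groups in mapping.items():
        found[column] = None
        for name, ranges in groups.items():
            if any(well in expand_wells(r) for r in ranges):
                found[column] = name
    return found


print(lookup_treatments('A14', tx_map))
print(lookup_treatments('D16', tx_map))
```

Wells that fall outside every range simply get no value for that column in this sketch.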

  1. Press the Measure button! We have all the options set to richly annotate our data with identifying info and treatments... in batch!

Grouping the data#

Navigate to the Output Directory and find the measure_props...csv for your data! You can see each measurement for each label, but it's hard to read and interpret this way.

| label_name | id | site | well | plate | label | area | intensity_mean-nucleoli-labels | intensity_mean-conA | intensity_mean-MitoTDR | solidity | row | column | media | treatment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cell-labels | plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | 1 | 1469.167104 | 0.7454598711189221 | 256.04100761570004 | 295.11511423550087 | 0.7832071576049552 | A | 14 | HBSS | control |
| cell-labels | plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | 2 | 505.6448000000001 | 0.089361702 | 407.0757446808511 | 389.1506382978723 | 0.9767248545303407 | A | 14 | HBSS | control |
| cell-labels | plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | 3 | 336.092416 | 1.2586427656850192 | 233.87580025608196 | 326.2509603072983 | 0.9455205811138013 | A | 14 | HBSS | control |

Instead, we want to group the data by useful metrics. Navigate to the Grouping tab. Select the output measure_props...csv for Measured Data Path; the selected file is read to fill in the remaining options in the tab. If we include the id column in our grouping columns, then the output will summarize each individual image. If you also select other identifying information, like site, well, plate, etc., then this information will be kept in the summarized file. Ultimately, data will be grouped by the most fine-grained group (in this case, each image, i.e. the id). So, if you just wanted to know the differences between treatments, you could group only by treatment. Be cautious: this hides your raw data and reduces the information to the aggregate function.

For this pipeline, we are going to group by: id, label_name (which label channel it is), site, well, plate, media, and treatment. This will summarize the data by id at the finest level (each file), but preserve all of that metadata. Then, we keep Count Column set to label so that it counts the number of each object in the image. Finally, we are going to aggregate other measured features. For Aggregation Columns, select intensity_mean-conA (to measure the intensity of ER), intensity_mean-MitoTDR (mitochondria), and area (to compare the size of each object). Note that multiple Aggregation Functions are available; the default is mean.

Next, check Pivot Wider. This will place each individual label channel in the columns, rather than replicating in rows. This is generally more human-readable and familiar for non-coding statistical work.
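
For readers curious about what grouping and pivoting do to the long table, here is a small standard-library sketch. The input rows are made-up toy values, the helpers `group_measurements` and `pivot_wider` are hypothetical, and the way column names are suffixed with the label channel is an illustrative choice. The widget's actual output format is the one shown in the next section.

```python
from collections import defaultdict
from statistics import mean

# Toy long-format measurements: one row per label object (values are made up).
rows = [
    {"id": "img1", "label_name": "cell-labels", "label": 1, "area": 100.0},
    {"id": "img1", "label_name": "cell-labels", "label": 2, "area": 200.0},
    {"id": "img1", "label_name": "nuclei-labels", "label": 1, "area": 20.0},
    {"id": "img1", "label_name": "nuclei-labels", "label": 2, "area": 40.0},
    {"id": "img2", "label_name": "cell-labels", "label": 1, "area": 300.0},
    {"id": "img2", "label_name": "nuclei-labels", "label": 1, "area": 50.0},
]


def group_measurements(rows, group_cols, count_col, agg_cols):
    """Count objects and average the chosen columns within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_cols)].append(row)
    summaries = []
    for key, members in groups.items():
        summary = dict(zip(group_cols, key))
        summary[f"{count_col}_count"] = len(members)
        for col in agg_cols:
            summary[f"{col}_mean"] = mean(m[col] for m in members)
        summaries.append(summary)
    return summaries


def pivot_wider(grouped, index_col, names_col):
    """Move each value of `names_col` into its own set of columns."""
    wide = defaultdict(dict)
    for row in grouped:
        for col, value in row.items():
            if col not in (index_col, names_col):
                wide[row[index_col]][f"{col}_{row[names_col]}"] = value
    return dict(wide)


grouped = group_measurements(rows, ["id", "label_name"], "label", ["area"])
wide = pivot_wider(grouped, index_col="id", names_col="label_name")
for image_id, values in wide.items():
    print(image_id, values)
```

After pivoting there is one entry per image, with the counts and means of each label channel side by side instead of stacked in separate rows.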

Finally, press the Group Measurements button! You now have the output dataset.

batch-group

Make observations#

One of the best parts of summarizing your data is that it lets you quickly check for quality control. Investigate measure_props...grouped.csv:

  1. Do we get the number of rows that we would expect? (Hint: with how we grouped, it should be the number of images.)
  2. Are there the same number of nuclei as cytoplasms in each image? Should there be?
  3. Is the intensity of a certain marker localized more to the cytoplasm or the nucleus?
  4. Is the area of the whole cell larger than the cytoplasm and nucleus alone? Does nucleus + cytoplasm = cell?

| id | site | well | plate | media | treatment | label_count | label_count.1 | label_count.2 | area_mean | area_mean.1 | area_mean.2 | intensity_mean-MitoTDR_mean | intensity_mean-MitoTDR_mean.1 | intensity_mean-MitoTDR_mean.2 | intensity_mean-conA_mean | intensity_mean-conA_mean.1 | intensity_mean-conA_mean.2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nan | nan | nan | nan | nan | nan | cell-labels | cyto-labels | nuclei-labels | cell-labels | cyto-labels | nuclei-labels | cell-labels | cyto-labels | nuclei-labels | cell-labels | cyto-labels | nuclei-labels |
| plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1 | 1 | A14 | 1 | HBSS | control | 79.0 | 79.0 | 79.0 | 1458.3978094177216 | 1214.6478728101267 | 243.74993660759498 | 325.2266947462635 | 307.7767026467916 | 412.45047953549573 | 283.3011895008876 | 264.05000585134076 | 377.55476585283094 |
| plate1_A14_site2_Ch1__0__plate1_A14_site2_Ch1 | 2 | A14 | 1 | HBSS | control | 92.0 | 92.0 | 92.0 | 1270.5623624347827 | 1015.8361933913045 | 254.74487930434788 | 335.0570967957602 | 319.5738507170695 | 405.50202331367353 | 273.7174073017615 | 256.90458122232934 | 349.3471722687742 |
| plate1_B13_site1_Ch1__0__plate1_B13_site1_Ch1 | 1 | B13 | 1 | HBSS | control | 59.0 | 59.0 | 59.0 | 1805.06988040678 | 1502.4123834576274 | 302.6574969491526 | 308.182553594441 | 295.40835758750995 | 377.9935857420644 | 292.5031657524073 | 274.4186976700165 | 391.6966321932782 |
| plate1_B13_site2_Ch1__0__plate1_B13_site2_Ch1 | 2 | B13 | 1 | HBSS | control | 71.0 | 71.0 | 71.0 | 1558.5194041690143 | 1291.1595267605635 | 267.3598774084507 | 299.17016394582697 | 285.23325714705294 | 363.05445180651634 | 305.7340276405519 | 284.01603790001866 | 406.23380996542903 |
| plate1_C12_site1_Ch1__0__plate1_C12_site1_Ch1 | 1 | C12 | 1 | HBSS | control | 127.0 | 127.0 | 127.0 | 1203.229621417323 | 944.5570237480316 | 258.67259766929135 | 343.52778951543183 | 329.7049438466621 | 398.5256715321834 | 283.1283666399538 | 270.7264526431119 | 333.90274822634825 |
| plate1_C12_site2_Ch1__0__plate1_C12_site2_Ch1 | 2 | C12 | 1 | HBSS | control | 124.0 | 124.0 | 124.0 | 1166.5923096774195 | 921.3840805161292 | 245.20822916129035 | 348.5899736336976 | 331.9816108022042 | 415.9173276097387 | 287.77845139732943 | 273.9409912708669 | 343.3100275440203 |
| plate1_D16_site1_Ch1__0__plate1_D16_site1_Ch1 | 1 | D16 | 1 | DMEM | drug | 137.0 | 137.0 | 137.0 | 1136.5299405547446 | 848.7231084379563 | 287.80683211678837 | 324.09437073563026 | 314.2939434267824 | 361.2811199284404 | 341.1003129872079 | 326.6836244548794 | 397.2258103183944 |
| plate1_D16_site2_Ch1__0__plate1_D16_site2_Ch1 | 2 | D16 | 1 | DMEM | drug | 126.0 | 126.0 | 126.0 | 1220.6002488888892 | 932.7191263492065 | 287.88112253968256 | 305.43130665340715 | 298.09709415581307 | 335.4623238823955 | 334.3002097571851 | 323.0418712295005 | 381.9689568813829 |
| plate1_E18_site1_Ch1__0__plate1_E18_site1_Ch1 | 1 | E18 | 1 | DMEM | drug | 147.0 | 147.0 | 147.0 | 1054.613018122449 | 800.8962803809525 | 253.71673774149664 | 345.9989260804927 | 335.50803523526207 | 383.15517282809606 | 296.09051609563323 | 283.99028924506246 | 339.21053128779465 |
| plate1_E18_site2_Ch1__0__plate1_E18_site2_Ch1 | 2 | E18 | 1 | DMEM | drug | 174.0 | 173.0 | 174.0 | 894.9381222988507 | 649.1033999537573 | 249.56520165517242 | 351.5135374204032 | 347.28641992275476 | 370.0639647086021 | 282.8140508927973 | 276.74482673528496 | 304.67453165821416 |
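
Some of the quality-control questions above can also be checked programmatically. The sketch below copies two rows from the grouped table (the first and the last) and flags images where the object counts differ between label channels or where the mean nucleus and cytoplasm areas do not add up to the mean cell area. The helper `qc_flags` and the tolerance are hypothetical choices for illustration.

```python
from math import isclose

# Counts and mean areas copied from two rows of the grouped table above.
summary = [
    {
        "id": "plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1",
        "count": {"cell": 79, "cyto": 79, "nuclei": 79},
        "area": {
            "cell": 1458.3978094177216,
            "cyto": 1214.6478728101267,
            "nuclei": 243.74993660759498,
        },
    },
    {
        "id": "plate1_E18_site2_Ch1__0__plate1_E18_site2_Ch1",
        "count": {"cell": 174, "cyto": 173, "nuclei": 174},
        "area": {
            "cell": 894.9381222988507,
            "cyto": 649.1033999537573,
            "nuclei": 249.56520165517242,
        },
    },
]


def qc_flags(row, rel_tol=1e-6):
    """Return a list of quality-control warnings for one summarized image."""
    flags = []
    if len(set(row["count"].values())) != 1:
        flags.append("count mismatch")
    area = row["area"]
    if not isclose(area["cyto"] + area["nuclei"], area["cell"], rel_tol=rel_tol):
        flags.append("area mismatch")
    return flags


for row in summary:
    print(row["id"], qc_flags(row))
```

In these two rows, only the E18 site 2 image is flagged: it has 173 cytoplasm objects against 174 cells and nuclei, and its mean areas no longer add up as a result.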