Example Pipeline Tutorial#

The goal of this example pipeline is to familiarize the user with napari-ndev for batch processing and reproducibility (see Image Utilities and Workflow Widget). In addition, this example pipeline thoroughly explains the Measure Widget, since measurement is a step shared across many pipelines.

This Example Pipeline does not cover how napari-ndev is used for high-throughput annotations, the machine learning tools (APOC Widget), or designing your own workflows. This information will instead be covered in the interactive tutorials that follow.

Image Utilities#

We are going to start with the Image Utilities widget in order to concatenate the CellPainting images. This shows a common use of the Image Utilities plugin, wherein various file formats can be managed and saved into a common OME-TIFF format, including channel names and physical pixel scaling.

Batch Concatenation

Choose Directory: selects where images will be saved.
Select files: individual or multiple files can be selected. Select the first 5 images (representing the 5 channels of 1 image).
Metadata dropdown: We will add names to save the channels with, according to whatever information is useful. This could be the fluorophore (e.g. Hoechst 33342) or other identifying information (e.g. nuclei).
1. Channel Name(s): copy and paste ['H33342', 'conA', 'SYTO14', 'WGA_Phall', 'MitoTDR']. The expected format is a list [] of strings, e.g. ['a', 'b', 'etc.'].
2. Scale, ZYX: Set Y and X to 0.656. Z will be ignored since the images are 2D.
Batch Concat: Pressing this button will iterate through all files in the folder, selecting them in groups of 5 (i.e. the number of original files selected), and then save them with the above parameters.
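The grouping behind Batch Concat can be pictured with a small sketch. This is not the plugin's implementation: it assumes files are sorted by name and split into consecutive groups, and the filenames and the `group_files` helper are hypothetical, patterned after the image ids that appear later in this tutorial.

```python
def group_files(filenames, group_size):
    """Split a sorted file list into consecutive groups of `group_size`."""
    ordered = sorted(filenames)
    if len(ordered) % group_size != 0:
        raise ValueError("number of files is not a multiple of the group size")
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


# Hypothetical filenames: 2 sites with 5 channels each (assumption).
files = [f"plate1_A14_site{s}_Ch{c}.tif" for s in (1, 2) for c in range(1, 6)]

groups = group_files(files, group_size=5)
for group in groups:
    print(group)
```

Each printed group would correspond to one concatenated multi-channel image. Note that plain name sorting only keeps channels together when the filenames sort consistently, which is worth checking in your own folder.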

Investigate the images#

Image Utilities

If you want to investigate the raw images, press Open File(s). This will open the original images with their known scale (1,1,1). Each image will open as grayscale and will not be layered.

Now, investigate your concatenated images. Go to Select Files and find the folder ConcatenatedImages inside the Choose Directory previously chosen. Select the first image and press Open File(s). This time, the images will open with the scale we set (0,0.656,0.656) and with default layering and pseudo-coloring. This is how all images get passed down throughout the plugin.

Example workflow#

Once images are in a format that is helpful for analysis, we can proceed with other widgets. This does mean that some images do not need to be processed with the Image Utilities Widget; for example, some microscopes properly incorporate scale and channel names into the image metadata. For this tutorial, we are going to use the Workflow Widget to pre-process, segment, and label features of the image with a pre-made custom workflow file (see cellpainting\scripting_workflow.ipynb to see how). The intent of the Workflow Widget is to make such workflows easy to reproduce in batch. This custom workflow was designed initially with the napari-assistant, which will be explored further in the following tutorial sections.

The goal for this workflow is to segment the nucleus, cell area (based on a voronoi tessellation of the nuclei), cytoplasm (cell area - nucleus), and the nucleoli. We will later measure the properties of these objects using the Measure Widget.
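To make "cytoplasm (cell area - nucleus)" concrete, here is a minimal NumPy sketch on toy arrays. It only illustrates the idea of removing nucleus pixels from the cell labels; the actual workflow file may compute this differently, and the arrays are made up for illustration.

```python
import numpy as np

# Toy label images (assumption): two cells, each containing one nucleus pixel.
cells = np.array([
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
])
nuclei = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 0, 0],
])

# Cytoplasm keeps the cell label wherever there is no nucleus, else background (0).
cyto = np.where(nuclei > 0, 0, cells)
print(cyto)
```

Because the cytoplasm inherits the cell's label value, the same object number refers to the cell, its nucleus, and its cytoplasm in this toy example.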

Workflow Example

Image Directory: choose the ConcatenatedImages found in the previous parent folder.
Result Directory: create a folder to save the output images into.
Workflow File: navigate to scripted_cellpainting_workflow.yaml

Now, you will see the UI automatically update to show the roots (input images) of the Workflow file. Furthermore, these roots will be populated by the channel names of the images in the chosen directory. In this workflow there are three root images required: (1) Root 0: cyto_membrane is WGA_Phall, (2) Root 1: nuclei is H33342, and (3) Root 2: nucleoli is SYTO14.

Workflow-roots

Next, switch to the Tasks tab. In this tab, the leaves (the workflow tasks that sit at the terminals of the task tree) are automatically selected. However, we are also interested in visualizing the nuclei. So, hold control or command on your keyboard and also click nuclei-labels to add this task to the batch workflow. If all workflow tasks you are interested in are represented as leaves, then you can even skip this tab!
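The difference between roots, intermediate tasks, and leaves can be sketched with a plain dictionary. The task graph below is a simplified, hypothetical stand-in for the real workflow file (the dependencies are assumptions), and `find_leaves` and `find_roots` are illustrative helpers, not part of napari-ndev.

```python
# Hypothetical, simplified task graph: each task maps to the inputs it uses.
tasks = {
    "nuclei-labels": ["nuclei"],
    "cell-labels": ["nuclei-labels"],
    "nucleoli-labels": ["nucleoli"],
}


def find_leaves(tasks):
    """Tasks whose output is not consumed by any other task."""
    used_as_input = {dep for deps in tasks.values() for dep in deps}
    return sorted(name for name in tasks if name not in used_as_input)


def find_roots(tasks):
    """Inputs that no task produces, i.e. the images the user must supply."""
    used_as_input = {dep for deps in tasks.values() for dep in deps}
    return sorted(used_as_input - set(tasks))


print(find_leaves(tasks))
print(find_roots(tasks))
```

In this toy graph nuclei-labels feeds another task, so it is not a leaf and would have to be selected by hand, which mirrors the extra click described above.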

Workflow-tasks

Finally, press Batch Workflow. The Image Directory will be iterated through with the workflow. The Progress Bar will show updates and a log file will be saved to show the input parameters and progress of the batch processing, including any possible errors.

Workflow notes#

Just as we selected an additional task for the workflow, any number of tasks can be selected from the workflow, and if Keep Original Images is checked, the original images will also be saved in the resulting batch processed images. As such, the Workflow Widget can also be used to easily visualize intermediate steps of the Workflow, to investigate how something was achieved and to share that information. Below, napari is showing every original channel and every task in this workflow as a grid; all of this is saved into one single file.

Workflow-all

Coming Soon: the ability to use layers in the workflow as roots to do single image Workflows and adding them into napari immediately!

The Measure Widget provides the ability to measure images in batch, group important information, and even utilize metadata to map sample treatments and conditions. This widget is the newest addition to napari-ndev, in part because it has taken me a long time to conceptualize how to make image measurements accessible in batch, so I am particularly looking for usage feedback. For detailed usage instructions see the Measure Widget Example.

How measuring in Python generally works#

It is often most helpful to represent a segmented image as 'labels'. Labels (including the Labels Layer in napari) have a pseudocolor scheme where each label (i.e. object) has a specific value, and that value is represented by a color. When these labels are then measured, each label object is measured independently and represented in one row. With few objects of interest in low-throughput processes, this can make sense, but a label image with 100 objects will result in a spreadsheet with 100 rows. Accordingly, even measuring 10 images with 100 objects each leads to 1000 rows. To many scientists, these are both small object numbers and small image numbers, so you can imagine how quickly and easily datasets can reach hundreds of thousands or millions of rows.

Furthermore, many properties of labels can be measured, from area (which is scaled properly throughout this plugin to real units), to perimeter, to solidity, to sphericity. Thus, measuring label properties in Python generally requires knowledge of Python to make sense of this long multivariate data, especially when it comes to grouping data by treatments or applying counts or other aggregating functions to any measurement of the labels.
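As a rough illustration of "one row per label", the sketch below measures a toy label image with plain NumPy. It is not how the Measure Widget is implemented; it assumes that scaled area is the pixel count multiplied by the Y and X pixel size (0.656 in this tutorial), and the arrays are made up.

```python
import numpy as np

# Toy data (assumption): label 1 covers 4 pixels, label 2 covers 3 pixels.
labels = np.array([
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
])
intensity = np.array([
    [10.0, 20.0, 0.0, 5.0],
    [30.0, 40.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 5.0],
])
pixel_size = 0.656  # Y and X scale set in Image Utilities

ids = np.unique(labels)
ids = ids[ids != 0]  # 0 is background, not an object

# Per-label pixel counts and intensity sums, indexed by label value.
pixel_counts = np.bincount(labels.ravel())[ids]
intensity_sums = np.bincount(labels.ravel(), weights=intensity.ravel())[ids]

rows = [
    {
        "label": int(i),
        "area": float(n) * pixel_size**2,
        "intensity_mean": float(s / n),
    }
    for i, n, s in zip(ids, pixel_counts, intensity_sums)
]
for row in rows:
    print(row)
```

Under this assumption, an area of 505.6448 in the example output further down would correspond to an object of 1175 pixels, since 1175 × 0.656² = 505.6448.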

The Measure Widget seeks to address the most common usability cases for high-throughput analyses by providing human readable outputs. Furthermore, treatment metadata mapping can easily be shared from a more advanced researcher to a novice, for reproducibility of more involved analyses.

Measure-batch

Label Directory: Select the directory containing the Labels you desire to measure -- in this case choose the directory from the Workflow Widget. This image file can contain any number of labels (or non-labels, but those should not be measured). Channels will populate both Label Image select and Intensity Images.
Image Directory: An Optional directory -- choose the ConcatenatedImages directory to populate the original channel images to the Intensity Images select box.
Region Directory: Another Optional directory intended for 'ROI'/Region of Interest labels -- not used for this pipeline.
Label image: Using multi-selection, select cell-labels, cyto-labels, and nuclei-labels. We will measure each object in each image.
Intensity images: Using multi-selection, select nucleoli-labels (to measure the number of nucleoli inside the label), conA and MitoTDR (to measure the underlying intensity of the channel on the label).
Region Props: This is a list of the measurements for each label. For this example, select at least label, area, intensity_mean and solidity. label is the identity of each object, and it is recommended to always keep it checked. Beyond that, you can measure shape features like area, eccentricity, and solidity, or intensity features like the mean, max, min, etc. Note that measuring something like the intensity max of an intensity image that represents an ROI serves as a means to identify whether a label is inside (i.e. the value of the ROI) or outside (i.e. 0) the region.
At this point, you could hit the Measure button and it will measure all label channels in each image in batch. However, for this example we also want to add some identification and treatment data to the output. This example data comes from wells with no treatment, so we will generate some ourselves to explain the concept, but this should be straightforward enough to apply to your own data. To use the ID Regex and Tx Map tabs we use dictionaries of key: value pairs where the key becomes the column name, and the value contains the regular expression to search for.
ID Regex tab: This dictionary extracts information from the filename with regular expression patterns. These data all come from plate1, but if we had multiple plates we could extract the plate number with the following regex: r'(plate\d{1,2})-'. Whatever is inside the () is considered the 'group' that gets returned. In this case we can provide a dictionary that returns the identifying number of the plate, the well position, and the site. We specifically need the well position in order to map it to the treatment map. Copy and paste this into ID Regex:

{
    'plate': r'plate(\d{1,3})_',
    'well': r'_(\w+?)_site',
    'site': r'_site(\d{1,3})_',
}

Batch-ID-regex
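To check what the ID Regex dictionary would capture before running a batch, the patterns can be tried on a single name with Python's `re` module. This is a sketch only: `extract_ids` is a hypothetical helper, and the filename stem is taken from the ids shown in the example output of this tutorial.

```python
import re

id_regex = {
    'plate': r'plate(\d{1,3})_',
    'well': r'_(\w+?)_site',
    'site': r'_site(\d{1,3})_',
}


def extract_ids(filename, patterns):
    """Return {column: captured group} for each pattern, or None if no match."""
    extracted = {}
    for column, pattern in patterns.items():
        match = re.search(pattern, filename)
        extracted[column] = match.group(1) if match else None
    return extracted


print(extract_ids("plate1_A14_site1_Ch1", id_regex))
# {'plate': '1', 'well': 'A14', 'site': '1'}
```

Only the text inside the parentheses is returned, so the plate column holds `1` rather than `plate1`. A pattern that does not match yields `None` in this sketch, which makes mistakes easy to spot.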

Tx Map tab. This dictionary maps well positions to an overall platemap. This time, the key remains the column identification, but then another dictionary is used to map the treatments inside, see below for example. The platemap is expected to be of standard configuration, but can include wells that are not imaged. First press 'Update Treatment ID Choices' to use the previous regex for Well ID. Select well for Treatment ID and 384 for Number of Wells. We are going to pretend the platemap has the following treatments:

{
    'media': {
        'HBSS': ['A1:C24'],
        'DMEM': ['D1:F24'],
    },
    'treatment': {
        'control': ['A12:P14'],
        'drug': ['A15:P18'],
    }
}

Batch-tx-map
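One way to read the treatment map is to expand every range into individual wells and look up each well's values. The sketch below assumes a range such as 'A12:P14' means the rectangular block of rows A to P and columns 12 to 14 on a 384-well plate (16 rows by 24 columns); `expand_range` and `build_lookup` are hypothetical helpers, not the plugin's API.

```python
import string


def expand_range(well_range, n_rows=16, n_cols=24):
    """Expand 'A12:P14' into every well of that rectangular block."""
    rows = string.ascii_uppercase[:n_rows]
    if ":" not in well_range:
        return [well_range]  # a single well such as 'B7'
    start, end = well_range.split(":")
    r0, c0 = rows.index(start[0]), int(start[1:])
    r1, c1 = rows.index(end[0]), int(end[1:])
    if c1 > n_cols:
        raise ValueError("column outside the plate")
    return [f"{rows[r]}{c}" for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def build_lookup(tx_map):
    """Return {well: {column: value}} for every well named in the map."""
    lookup = {}
    for column, groups in tx_map.items():
        for value, ranges in groups.items():
            for well_range in ranges:
                for well in expand_range(well_range):
                    lookup.setdefault(well, {})[column] = value
    return lookup


tx_map = {
    'media': {'HBSS': ['A1:C24'], 'DMEM': ['D1:F24']},
    'treatment': {'control': ['A12:P14'], 'drug': ['A15:P18']},
}
lookup = build_lookup(tx_map)
print(lookup['A14'])
print(lookup['D16'])
```

Under this reading, well A14 maps to HBSS and control, and well D16 maps to DMEM and drug, which agrees with the media and treatment columns in the example output.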

Press the Measure button! We have all the options set to richly annotate our data with identifying info and treatments... in batch!

Grouping the data#

Navigate to the Output Directory and find the measure_props...csv for your data! You can see each measurement for each label, but it's hard to read and interpret this way.

label_name	id	site	well	plate	label	area	intensity_mean-nucleoli-labels	intensity_mean-conA	intensity_mean-MitoTDR	solidity	row	column	media	treatment
cell-labels	plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1	1	A14	1	1	1469.167104	0.7454598711189221	256.04100761570004	295.11511423550087	0.7832071576049552	A	14	HBSS	control
cell-labels	plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1	1	A14	1	2	505.6448000000001	0.089361702	407.0757446808511	389.1506382978723	0.9767248545303407	A	14	HBSS	control
cell-labels	plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1	1	A14	1	3	336.092416	1.2586427656850192	233.87580025608196	326.2509603072983	0.9455205811138013	A	14	HBSS	control

Instead, we want to group the data by useful metrics. Navigate to the Grouping tab. Select the output measure_props...csv for Measured Data Path; the selection is interpreted to fill the remaining information in the tab. If we include the id name in our grouping columns, then it will summarize each individual image. If you then also select other identifying information, like site, well, plate, etc., then this information will be kept in the summarized file. Ultimately, data will be grouped by the most fine-grained group (in this case, each image, aka the id). So, if you just wanted to know the differences between treatments, you could group only by treatment. Be cautious: this hides your raw data and reduces the information to the aggregate function.

For this pipeline, we are going to group by: id, label_name (which label channel it is), site, well, plate, media, and treatment. This will summarize the data by id at the finest (each file), but preserve all that metadata. Then, we keep Count Column set to label so that it counts the number of each object in the image. Finally, we are going to aggregate other measured features. Select Aggregation Columns: intensity_mean-conA (to measure the intensity of ER), intensity_mean-MitoTDR (mitochondria), and area (to compare the size of each object). Note that there are multiple Aggregation Functions available; by default, mean is selected.

Next, check Pivot Wider. This will place each individual label channel in the columns, rather than replicating in rows. This is generally more human-readable and familiar for non-coding statistical work.
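For readers who know pandas, the grouping and Pivot Wider steps resemble a `groupby` with named aggregations followed by `unstack`. This is a sketch on made-up numbers, not the widget's actual code, and it uses only a subset of the grouping columns to stay short.

```python
import pandas as pd

# Made-up long-format measurements: one row per label object (assumption).
measured = pd.DataFrame({
    "id": ["img1"] * 4 + ["img2"] * 3,
    "label_name": [
        "cell-labels", "cell-labels", "nuclei-labels", "nuclei-labels",
        "cell-labels", "nuclei-labels", "nuclei-labels",
    ],
    "treatment": ["control"] * 4 + ["drug"] * 3,
    "label": [1, 2, 1, 2, 1, 1, 2],
    "area": [100.0, 200.0, 10.0, 30.0, 150.0, 20.0, 40.0],
})

# Group by image and label channel, counting objects and averaging area.
grouped = measured.groupby(["id", "treatment", "label_name"]).agg(
    label_count=("label", "count"),
    area_mean=("area", "mean"),
)

# "Pivot wider": move each label channel from the rows into the columns.
wide = grouped.unstack("label_name")
print(wide)
```

The result has one row per image and a two-level column header (measurement, then label channel), which is the same shape as the grouped output shown in the next section.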

Finally, press the Group Measurements button! You now have the output dataset.

batch-group

Make observations#

One of the best parts of summarizing your data is quickly checking for quality control. Investigate measure_props...grouped.csv

Do we get the same number of rows that we would expect? (hint, it should be the number of images, with how we grouped)
Are there the same number of nuclei as cytoplasms in each image? Should there be?
Is the intensity of a certain marker localized more to the cytoplasm or the nucleus?
Is the area of the whole cell larger than the cytoplasm and nucleus alone? Does nucleus + cytoplasm = cell?

id	site	well	plate	media	treatment	label_count	label_count.1	label_count.2	area_mean	area_mean.1	area_mean.2	intensity_mean-MitoTDR_mean	intensity_mean-MitoTDR_mean.1	intensity_mean-MitoTDR_mean.2	intensity_mean-conA_mean	intensity_mean-conA_mean.1	intensity_mean-conA_mean.2
nan	nan	nan	nan	nan	nan	cell-labels	cyto-labels	nuclei-labels	cell-labels	cyto-labels	nuclei-labels	cell-labels	cyto-labels	nuclei-labels	cell-labels	cyto-labels	nuclei-labels
plate1_A14_site1_Ch1__0__plate1_A14_site1_Ch1	1	A14	1	HBSS	control	79.0	79.0	79.0	1458.3978094177216	1214.6478728101267	243.74993660759498	325.2266947462635	307.7767026467916	412.45047953549573	283.3011895008876	264.05000585134076	377.55476585283094
plate1_A14_site2_Ch1__0__plate1_A14_site2_Ch1	2	A14	1	HBSS	control	92.0	92.0	92.0	1270.5623624347827	1015.8361933913045	254.74487930434788	335.0570967957602	319.5738507170695	405.50202331367353	273.7174073017615	256.90458122232934	349.3471722687742
plate1_B13_site1_Ch1__0__plate1_B13_site1_Ch1	1	B13	1	HBSS	control	59.0	59.0	59.0	1805.06988040678	1502.4123834576274	302.6574969491526	308.182553594441	295.40835758750995	377.9935857420644	292.5031657524073	274.4186976700165	391.6966321932782
plate1_B13_site2_Ch1__0__plate1_B13_site2_Ch1	2	B13	1	HBSS	control	71.0	71.0	71.0	1558.5194041690143	1291.1595267605635	267.3598774084507	299.17016394582697	285.23325714705294	363.05445180651634	305.7340276405519	284.01603790001866	406.23380996542903
plate1_C12_site1_Ch1__0__plate1_C12_site1_Ch1	1	C12	1	HBSS	control	127.0	127.0	127.0	1203.229621417323	944.5570237480316	258.67259766929135	343.52778951543183	329.7049438466621	398.5256715321834	283.1283666399538	270.7264526431119	333.90274822634825
plate1_C12_site2_Ch1__0__plate1_C12_site2_Ch1	2	C12	1	HBSS	control	124.0	124.0	124.0	1166.5923096774195	921.3840805161292	245.20822916129035	348.5899736336976	331.9816108022042	415.9173276097387	287.77845139732943	273.9409912708669	343.3100275440203
plate1_D16_site1_Ch1__0__plate1_D16_site1_Ch1	1	D16	1	DMEM	drug	137.0	137.0	137.0	1136.5299405547446	848.7231084379563	287.80683211678837	324.09437073563026	314.2939434267824	361.2811199284404	341.1003129872079	326.6836244548794	397.2258103183944
plate1_D16_site2_Ch1__0__plate1_D16_site2_Ch1	2	D16	1	DMEM	drug	126.0	126.0	126.0	1220.6002488888892	932.7191263492065	287.88112253968256	305.43130665340715	298.09709415581307	335.4623238823955	334.3002097571851	323.0418712295005	381.9689568813829
plate1_E18_site1_Ch1__0__plate1_E18_site1_Ch1	1	E18	1	DMEM	drug	147.0	147.0	147.0	1054.613018122449	800.8962803809525	253.71673774149664	345.9989260804927	335.50803523526207	383.15517282809606	296.09051609563323	283.99028924506246	339.21053128779465
plate1_E18_site2_Ch1__0__plate1_E18_site2_Ch1	2	E18	1	DMEM	drug	174.0	173.0	174.0	894.9381222988507	649.1033999537573	249.56520165517242	351.5135374204032	347.28641992275476	370.0639647086021	282.8140508927973	276.74482673528496	304.67453165821416
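The count question can also be checked programmatically. The sketch below copies the label counts for a few rows of the table above, keyed by well and site, and flags any image where the cell, cytoplasm, and nuclei counts disagree; `find_count_mismatches` is a hypothetical helper.

```python
# (well, site) -> (cell-labels, cyto-labels, nuclei-labels) counts from the table.
label_counts = {
    ("A14", 1): (79, 79, 79),
    ("C12", 2): (124, 124, 124),
    ("E18", 1): (147, 147, 147),
    ("E18", 2): (174, 173, 174),
}


def find_count_mismatches(counts):
    """Return the keys whose label counts are not all identical."""
    return [key for key, values in counts.items() if len(set(values)) > 1]


mismatched = find_count_mismatches(label_counts)
print(mismatched)
# [('E18', 2)]
```

A mismatch like this is a prompt to open that image and look at the labels, not necessarily an error in the measurement.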