Identifying Cells | Part 1

kathleen wang
4 min read · Feb 3, 2021

A couple of weeks ago, I found a competition that piqued my interest. I'm keeping the details vague on purpose because I haven't made my submission yet, but I will definitely write a follow-up post with my code and results.

The goal of the competition is to use a machine learning algorithm to detect a specific type of cell in a set of image slices. We were given 8 training images and 5 testing images.

I started with some EDA to familiarize myself with the dataset. Each image was gigantic: many were over 1 GB, and the largest was a whopping 4 GB, so my computer couldn't open them in a traditional image viewer. Instead, I downsized each image and plotted the smaller versions.
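Loading and shrinking oversized images can be sketched roughly like this, assuming the files are in a format Pillow can read (the exact file format and the `load_downsized` helper name here are my own assumptions, not from the original post):

```python
from PIL import Image

# Pillow refuses to open very large images by default (its decompression-bomb
# guard); lifting the limit is necessary for gigapixel slides.
Image.MAX_IMAGE_PIXELS = None

def load_downsized(path, max_side=2048):
    """Open an image and shrink it so its longest side is at most max_side."""
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # resizes in place, keeps aspect ratio
    return img
```

The downsized image can then be passed straight to `matplotlib.pyplot.imshow` for a quick look.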

Next, I visualized the masks. The masks were provided as RLE (run-length) encodings, and someone was kind enough to post a function they wrote to turn an RLE string into a NumPy array, which I then converted into an image.
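A typical RLE decoder looks something like the sketch below. This is my own reconstruction, not the exact function from the competition forum, and it assumes the common Kaggle convention of 1-indexed, column-major run starts; the exact convention varies by dataset.

```python
import numpy as np

def rle_decode(rle, shape):
    """Decode a space-separated 'start length start length ...' RLE string
    into a binary mask of the given (height, width) shape.
    Assumes 1-indexed starts laid out in column-major (Fortran) order."""
    s = np.asarray(rle.split(), dtype=int)
    starts, lengths = s[0::2] - 1, s[1::2]
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    return mask.reshape(shape, order="F")  # column-major, per the convention
```

Multiplying the decoded array by 255 gives a greyscale image you can save or display directly.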

I overlaid these masks on the images to see my target cells.

Here is a close up of the targets:

To prepare my images for training, I resized them to about 50% and converted them to greyscale to shrink their file sizes a bit. I then drew a bounding box around each mask and recorded each box's x and y coordinates in a CSV.
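Deriving a bounding box from a binary mask and writing the coordinates out can be sketched as follows; the helper names and the CSV column layout here are my own assumptions:

```python
import csv
import numpy as np

def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all nonzero pixels."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def write_bboxes(masks, csv_path):
    """Write one bounding-box row per (image_id, mask) pair to a CSV."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image_id", "x_min", "y_min", "x_max", "y_max"])
        for image_id, mask in masks:
            writer.writerow([image_id, *mask_to_bbox(mask)])
```

This per-object (x_min, y_min, x_max, y_max) format is also what Faster R-CNN implementations such as torchvision's expect as training targets.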

Visualization of the bounding boxes

The model I used was Faster R-CNN. I first tried to train on the whole resized images, but at 100 MB each the files were still too large for my computer to handle. I ended up writing a function to slice my images and masks into tiny 500 kB tiles, which added up to 1,500 training images for my model.
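A tiling function along these lines would do the job; this is a minimal sketch of the idea rather than the author's actual code, and the tile size and padding strategy are my own assumptions:

```python
import numpy as np

def tile_image(img, tile_size=512):
    """Split a (H, W, ...) array into non-overlapping tiles, padding the
    right and bottom edges so every tile is exactly tile_size x tile_size.
    Returns a list of (tile, (row_offset, col_offset)) pairs; the offsets
    are what you need later to map tile coordinates back to the full image."""
    h, w = img.shape[:2]
    pad_h = (-h) % tile_size
    pad_w = (-w) % tile_size
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    img = np.pad(img, pad)
    tiles = []
    for r in range(0, h + pad_h, tile_size):
        for c in range(0, w + pad_w, tile_size):
            tiles.append((img[r:r + tile_size, c:c + tile_size], (r, c)))
    return tiles
```

Running the same function over each mask with the same tile size keeps image tiles and mask tiles aligned.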

Here are my results:

The white squares are my predictions and the black squares are ground truth.

And here is a slice from one of their test images to be used for submission.

My next step is to figure out how to stitch my tiles back together, or at least how to convert each tile's bounding-box coordinates into the coordinates they would have in the original image. Then I need to figure out a way to circle the cell so that I don't include any non-target cells.
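The coordinate conversion itself is just a shift by each tile's offset in the original image. A minimal sketch, assuming each tile's top-left corner (its row and column offset) was recorded when the tiles were cut:

```python
def tile_box_to_global(box, tile_origin):
    """Shift a (x_min, y_min, x_max, y_max) box predicted on a tile by the
    tile's (row_offset, col_offset) to get full-image coordinates."""
    row_off, col_off = tile_origin
    x_min, y_min, x_max, y_max = box
    return (x_min + col_off, y_min + row_off,
            x_max + col_off, y_max + row_off)
```

One subtlety worth noting: cells that straddle a tile boundary may be detected twice, once in each tile, so some deduplication (e.g. non-maximum suppression on the shifted boxes) is usually needed after the shift.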

I’m so excited about how well my model performed, even though I trained for only one epoch. I will probably try training for a few more epochs just to see what happens.

Getting this far was definitely a journey. I would never have been able to do any of this just 3 months ago. The data science bootcamp I am currently enrolled in has taught me so much in so little time. I found myself constantly referencing the lessons and labs to help me through this project. Even though the topics are different, the underlying principles were the same.

I would love to continue to work on projects like this, and hopefully pursue this as a career.
