
A One-Stop Guide to Computer Vision — part 1

Head over here to check out the documentation.

Semantic Segmentation

Semantic segmentation aims to classify every pixel in an image. The model looks at the full context of the image before predicting a class for each pixel. The pixels are then overlaid with one colour per class, which separates the image into segments. In this section, background classes are labelled as -1.

1. Download the image

This step is exactly the same as in Object Detection. You may skip it if you wish.

image_url = ""
image_filepath = 'dog.jpg'
gcv.utils.download(url=image_url, path=image_filepath)

2. Transform the data

There isn’t a one-line preset for this step. You have to compose the transforms yourself:

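from mxnet.gluon.data.vision import transforms
transform_fn = transforms.Compose([
    transforms.ToTensor(),  # HWC uint8 in [0, 255] -> CHW float32 in [0, 1]
    transforms.Normalize([.485, .456, .406], [.229, .224, .225])
])
image = transform_fn(image)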
print("data type:",image.dtype)
print("min value:",image.min().asscalar())
print("max value:",image.max().asscalar())
image = image.expand_dims(0)
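
The printed minimum and maximum make more sense once the arithmetic is spelled out. The sketch below redoes it in plain Python for single pixels. It assumes the image is first scaled from 0-255 to 0-1 (which is what MXNet's transforms.ToTensor does), then the per-channel mean is subtracted and the result is divided by the per-channel standard deviation. The helper normalize_pixel is a hypothetical name used only for illustration.

```python
# Channel statistics passed to transforms.Normalize above (RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return [(value / 255.0 - m) / s for value, m, s in zip(rgb, MEAN, STD)]


black = normalize_pixel([0, 0, 0])        # darkest possible pixel
white = normalize_pixel([255, 255, 255])  # brightest possible pixel

print("min value:", min(black))
print("max value:", max(white))
```

A pure black pixel maps to about -2.12 and a pure white pixel to 2.64, so the values printed for a real image should fall inside that range.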

3. Prepare the model

network = gcv.model_zoo.get_model('fcn_resnet50_ade',pretrained=True)

4. Unpack the results

To speed things up, we will apply .demo to our network. The longer version can be found under instance segmentation.

output = network.demo(image)
output = output[0] # since there is only 1 image in this batch
prediction = mx.nd.argmax(output,0).asnumpy() # to get index of largest probability
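
To see what the argmax over axis 0 does, here is a minimal sketch in plain Python. It assumes the network output is laid out as [class][row][column], with one score per class for every pixel. The tiny 2x2 score map and the helper argmax_over_classes are made up for illustration.

```python
# Hypothetical scores for 3 classes on a 2x2 image: scores[class][row][col].
scores = [
    [[0.1, 0.7], [0.2, 0.3]],  # class 0
    [[0.8, 0.2], [0.3, 0.3]],  # class 1
    [[0.1, 0.1], [0.5, 0.4]],  # class 2
]


def argmax_over_classes(scores):
    """For every pixel, return the index of the class with the highest score."""
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: scores[c][row][col])
            for col in range(width)
        ]
        for row in range(height)
    ]


prediction = argmax_over_classes(scores)
print(prediction)  # [[1, 0], [2, 2]]
```

The result has the same height and width as the image, but each entry is now a class index rather than a vector of scores.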

5. Colour our image

Now, let’s colour each pixel according to its predicted class:

from gluoncv.utils.viz import get_color_pallete
prediction_image = get_color_pallete(prediction, 'ade20k')
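
Conceptually, colouring is a lookup from class index to RGB colour. The sketch below shows the idea in plain Python with a made-up three-colour palette. The real 'ade20k' palette has one colour per ADE20K class, and both PALETTE and colourise are hypothetical names, not part of GluonCV.

```python
# Hypothetical palette mapping class index -> RGB colour.
PALETTE = {
    0: (120, 120, 120),
    1: (180, 120, 120),
    2: (6, 230, 230),
}


def colourise(prediction, palette):
    """Replace every class index in a 2D prediction with its RGB colour."""
    return [[palette[class_id] for class_id in row] for row in prediction]


prediction = [[1, 0], [2, 2]]  # per-pixel class indices
coloured = colourise(prediction, PALETTE)
print(coloured[0][0])  # (180, 120, 120)
```

Pixels that share a class end up with the same colour, which is what makes the segments visible.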

Instance Segmentation

Instance segmentation can tell one person apart from another. Instead of a bounding box, we predict the exact outline of each object and colour the pixels inside it.

1. Download the image

image ='' +

2. Transform the data

x, orig_img =

3. Prepare the model

network = gcv.model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)

4. Unpack the results

ids, scores, bboxes, masks = [xx[0].asnumpy() for xx in network(x)]
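
Each of the four arrays has one row per candidate detection. The sketch below shows, in plain Python, one way the rows could be filtered by confidence. It assumes that unused rows are padded with -1 and that a score threshold of 0.5 is acceptable. The sample values and the helper keep_confident are hypothetical.

```python
# Hypothetical detections: one class id, one score and one box per row.
ids = [[0], [1], [-1]]
scores = [[0.98], [0.35], [-1.0]]
bboxes = [[10, 20, 110, 220], [30, 40, 60, 90], [-1, -1, -1, -1]]


def keep_confident(ids, scores, bboxes, thresh=0.5):
    """Keep rows with a valid class id and a score of at least thresh."""
    kept = []
    for (class_id,), (score,), box in zip(ids, scores, bboxes):
        if class_id >= 0 and score >= thresh:
            kept.append((class_id, score, box))
    return kept


confident = keep_confident(ids, scores, bboxes)
print(confident)  # [(0, 0.98, [10, 20, 110, 220])]
```

Only the first detection survives: the second is below the threshold and the third is padding.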

5. Paint over image

# paint segmentation mask on images directly
width, height = orig_img.shape[1], orig_img.shape[0]
masks, _ = gcv.utils.viz.expand_mask(masks, bboxes, (width, height), scores)
orig_img = gcv.utils.viz.plot_mask(orig_img, masks)
# identical to Faster RCNN object detection
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(1, 1, 1)
ax = gcv.utils.viz.plot_bbox(orig_img, bboxes, scores, ids,
                             class_names=network.classes, ax=ax)
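
Painting a mask over an image comes down to blending a colour into the pixels the mask covers. The sketch below illustrates this in plain Python with a simple 50/50 alpha blend. This is an assumption about the general technique, not a copy of what gcv.utils.viz.plot_mask does internally, and blend_mask is a hypothetical helper.

```python
def blend_mask(image, mask, colour, alpha=0.5):
    """Return a copy of image with colour blended in wherever mask is 1."""
    blended = []
    for img_row, mask_row in zip(image, mask):
        new_row = []
        for pixel, inside in zip(img_row, mask_row):
            if inside:
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c)
                    for p, c in zip(pixel, colour)
                )
            new_row.append(pixel)
        blended.append(new_row)
    return blended


# A 2x2 grey image and a mask covering only the top-left pixel.
image = [[(100, 100, 100), (100, 100, 100)],
         [(100, 100, 100), (100, 100, 100)]]
mask = [[1, 0],
        [0, 0]]

painted = blend_mask(image, mask, colour=(200, 0, 0))
print(painted[0][0])  # (150, 50, 50)
```

Pixels outside the mask keep their original colour, so the object stands out while the rest of the photo stays untouched.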

Congratulations! You managed to complete the 4 main computer vision tasks using only a single framework!

Source: xkcd