Introduction

The detection and identification of objects in images is a task that people perform very well. Computers have traditionally found it much harder. How do you tell a computer what constitutes the edge of an object? How can you define the difference between a dog and a cat, or a motorbike and a bicycle, using the sort of rules that a computer can apply? The answer is that you don’t. The contemporary machine-learning solution to the problem of object identification is to train a neural network by providing it with thousands of classified images and then allow the network to classify new images based on its past experience. The term “deep learning” has become commonly used to describe the process of training multi-layered neural networks. There are many potential uses for this technology, and some major advances in automated object detection have been made recently. Google has been funding work that can be applied to produce driverless cars. Some of the details are shown in this TED talk.

https://www.ted.com/talks/joseph_redmon_how_computers_learn_to_recognize_objects_instantly?language=en#t-142736

The actual source code for YOLO is open source, so it can be used by anyone. For the technology to be widely applied, however, it needs to be simple to set up.

It has been possible to carry out object recognition using neural networks from both R and Python for many years. However, the number of steps involved in producing results has always looked daunting. So when I saw a post that offered object detection using YOLO with just three lines of R code I had to try it.

https://heartbeat.fritz.ai/object-detection-in-just-3-lines-of-r-code-using-tiny-yolo-b5a16e50e8a0

Setting up the package

The package is a wrapper around the C-coded darknet engine.

https://pjreddie.com/darknet/yolo/

#devtools::install_github("bnosac/image", subdir = "image.darknet", build_vignettes = TRUE)

As an alternative to using the R package, it is also quite simple to download the C code, make it, and run a classifier in the shell.

# com<-"git clone https://github.com/pjreddie/darknet
# cd darknet
# make"
# system(com)
# setwd("darknet")
# con<-"wget https://pjreddie.com/media/files/yolov3.weights"
# system(con)
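Once the repository has been cloned and built and the weights downloaded, a detection can be run from R by shelling out to the compiled binary. The command below follows the example on the darknet website and uses a test image bundled with the repository; it assumes the commented steps above have actually been run.

# Run the compiled binary on a test image shipped with the darknet repository.
# The predictions are drawn onto predictions.png in the darknet directory.
# com<-"./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg"
# system(com)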

Setting up the detection model

The first step from R is to set up the model that will be used for object detection. There is a range of options. The fastest is probably the tiny YOLO model.

library(image.darknet)
detect <- image_darknet_model(type = 'detect',
  model = 'tiny-yolo-voc.cfg',
  weights = system.file(package = 'image.darknet', 'models', 'tiny-yolo-voc.weights'),
  labels = system.file(package = 'image.darknet', 'include', 'darknet', 'data', 'voc.names'))

The next step is to list the images in a directory containing the objects to be detected.

path <- "/home/rstudio/webpages/examples/yolo/images"
## Get all pngs and jpgs
a <- c(dir(path = path, pattern = "png"), dir(path = path, pattern = "jpg"))
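As an aside, the two dir() calls could be collapsed into one by using a regular expression for the pattern, which also picks up upper-case extensions. This is just an optional alternative to the line above.

## Optional: a single case-insensitive pattern matching .png, .jpg and .jpeg extensions
a <- dir(path = path, pattern = "\\.(png|jpe?g)$", ignore.case = TRUE)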

Now form a function to apply to all the images.

f <- function(x) {
  fl <- paste(path, x, sep = "/")
  pred <- image_darknet_detect(file = fl,
                               object = detect,
                               threshold = 0.19)
  ## Move the annotated image that darknet writes to the working directory into pred/
  system(sprintf("mv predictions.png pred/%s", x))
  pred
}
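One thing to note is that darknet writes its annotated output to predictions.png in the working directory, and the function above moves that file into a pred sub-directory. That directory needs to exist before the function is applied, so it is worth creating it up front.

## Create the directory that the annotated images are moved into (no warning if it already exists)
dir.create("pred", showWarnings = FALSE)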

Applying the function produces a new image with the predictions drawn on it. However, the output is not captured in R itself; the results are invisible. This is fine for visual inspection, but it would not be useful for carrying out statistical analysis of the types and numbers of objects detected in a series of images. Fixing this required a minor “hack”: the output of the C-coded model is redirected from stdout to a file that can then be read back into R.

## Redirecting stdout
## A trick is needed in order to capture the output when the model is run. The output
## is sent to stdout (the console). A small C++ function redirects this to a text file
## which can then be read back into R. Without this, the only output from the code is
## the classified image, which is fine for visual inspection but not for analysis.

library(Rcpp)
cppFunction('void redir(){FILE* F=freopen("capture.txt","w+",stdout);}')
redir(); 

d<-lapply(a, f) ## Nothing is captured directly when the function is applied.
d<-data.frame(txt=unlist(readLines("capture.txt"))) # This line reads in the output
system("rm capture.txt")
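Once the capture file has been read, stdout can optionally be pointed back at the terminal. This is a minimal sketch that assumes a Unix-like system (such as the Linux server used here) where /dev/tty refers to the controlling terminal; on other platforms the simplest option is just to restart the R session.

## Optional: send stdout back to the terminal (Unix-like systems only)
cppFunction('void resetredir(){FILE* F=freopen("/dev/tty","w",stdout);}')
resetredir()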

Reformatting the output

The output that the classifier sends to stdout consists of lines of text. To be useful these lines need to be formatted as a data table. This involves some grepping, gsubbing and separating. It might be more elegant to use sed in the shell, but the code works.

library(dplyr)
library(tidyr)

## Take out all the lines that we don't need.
d %>% filter(!grepl("Boxes",txt))->d
d %>% filter(!grepl("pandoc",txt))->d
d %>% filter(!grepl("unnamed",txt))->d

## Find the lines that contain the file names. Make a logical vector called "isfile"
d$isfile<-grepl("/home",d$txt)

## Take out the path for the file names
d$txt<-gsub(path,"",d$txt)


## Make a column called file that contains either file names or NA
d$file<-ifelse(d$isfile,d$txt,NA)
## All the other lines of text refer to the objects detected
d$object<-ifelse(!d$isfile,d$txt,NA)
## Fill down
tidyr::fill(d, "file") ->d
## Take out NAs and select the last two columns
d<-na.omit(d)[,3:4]
# Separate the text that is held in two parts 
d %>% separate(file, into=c("file","time"),sep=":")->d
d %>% separate(object, into=c("object","prob"),sep=":")->d
d %>% filter(!is.na(prob))->d
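At this point the prob column still contains text such as “ 74%” and the time column holds the whole “Predicted in … seconds.” phrase, so a final step is to turn these into numbers for analysis. The exact wording of the timing line is an assumption based on what darknet prints, so the regular expression may need adjusting.

## Convert the probability from "74%" text to a number between 0 and 1
d$prob <- as.numeric(gsub("%", "", d$prob)) / 100
## Pull the number of seconds out of the "Predicted in x seconds." text
d$time <- as.numeric(sub(".*in ([0-9.]+) seconds.*", "\\1", d$time))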

Results

The results form a data frame that can be analysed further in R. Notice that the classification time per image is just under three seconds. This is acceptable for many purposes, but real-time object tracking on video streams needs much faster processing. That is generally achieved by using a graphics processing unit (GPU) on a high-powered PC of the type used for gaming.

aqm::dt(d)
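As a simple example of the sort of analysis that becomes possible, the detections can be tallied by image and by object class using dplyr; the column names match the data frame built above.

## Count the number of each type of object detected in each image
d %>% count(file, object) %>% arrange(file, desc(n))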

Images showing predictions

There are quite a few errors. Squirrels are mistaken for dogs or cats, as are polar bears and a rather cute-looking rat. This comes down to the way the model was trained. Images of dogs and cats were used, but no bears or squirrels, so the model does its best to identify them as either cats or dogs. In order to distinguish between different mammal or bird species the model would need to be trained specifically for the purpose, which is a very time-consuming process. Some of the objects are missed completely, including snakes, which would be very difficult to pick up even if images of them had been used to train the network. If objects belonging to classes that were in the training set are being missed, the issue can be addressed by lowering the detection threshold, although this will lead to more “noise” being incorrectly identified as objects.

However, despite the errors, if the algorithm were being used in the context of monitoring activity at a bird feeder it could distinguish between squirrels and birds, even while classifying the squirrels as dogs or cats. Although an object may be missed in any single image, it would be picked up if present in a series of images. Also, although the absolute number of objects may be erroneous, the number of objects classified will be closely related to the true number. So the automated classifier would be capable of providing a useful index of activity.
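To illustrate the bird feeder scenario, detections could be pooled into broader groups before counting, so that anything labelled as a cat or dog is treated as a likely squirrel. The label names used here come from the VOC class list that the tiny YOLO model was trained on; the grouping itself is purely illustrative.

## Pool the VOC labels into coarser groups relevant to a bird feeder
d %>%
  mutate(group = case_when(
    object %in% c("cat", "dog") ~ "probable squirrel",
    object == "bird" ~ "bird",
    TRUE ~ "other"
  )) %>%
  count(file, group)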

The annotated prediction images for the test set: Avocets500.jpg, blue-tit.jpg, clouds.jpg, feeder.jpg, Polarbears.jpg, rat-driving-car.jpg, snake-smooth.jpg, sparrows.jpg, squirrel.jpg, squirrel2.jpg, traffic.jpg and test_image.png.