Introduction

The detection and identification of objects in images is a task that people perform very well. Computers have traditionally found it much harder. How do you tell a computer what constitutes the edge of an object? How can you define the difference between a dog and a cat, or a motorbike and a bicycle, using the sort of rules that a computer can apply? The answer is that you don't. The contemporary machine-learning solution to object identification is to train a neural network on thousands of classified images and then let the network classify new images based on that experience. The term "deep learning" is commonly used to describe the process of training multi-layered neural networks. There are many potential uses for this technology. However, for it to be applied widely it needs to be simple to set up.

It has been possible to carry out object recognition using neural networks from both R and Python for many years. However, the number of steps involved in producing results has always looked daunting. So when I saw a post that offered object detection in just three lines of R code I had to try it.

https://heartbeat.fritz.ai/object-detection-in-just-3-lines-of-r-code-using-tiny-yolo-b5a16e50e8a0

Setting up the package

The image.darknet package is a wrapper for running the C-coded darknet engine.

https://pjreddie.com/darknet/yolo/

## Install once from GitHub:
# devtools::install_github("bnosac/image", subdir = "image.darknet", build_vignettes = TRUE)

As an alternative to using the R package, it is also quite simple to download the C code, make it, and run a classifier in the shell.

# com<-"git clone https://github.com/pjreddie/darknet
# cd darknet
# make"
# system(com)
# setwd("darknet")
# con<-"wget https://pjreddie.com/media/files/yolov3.weights"
# system(con)

Setting up the detection model

The first step from R is to set up the model that will be used for object detection. There are a range of options; the fastest is probably the tiny YOLO model trained on the VOC data set.

library(image.darknet)

detect <- image_darknet_model(type = 'detect', 
 model = 'tiny-yolo-voc.cfg', 
 weights = system.file(package='image.darknet', 'models', 'tiny-yolo-voc.weights'), 
 labels = system.file(package='image.darknet', 'include', 'darknet', 'data', 'voc.names'))
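Swapping in a different network is just a matter of pointing the same call at another cfg/weights/labels triple. As a hypothetical sketch, the cfg and weights names below are placeholders that would need to be obtained separately, and I am assuming that coco.names is among the label files shipped in the package's darknet data directory:

## Sketch only: 'yolo.cfg' and 'yolo.weights' are hypothetical placeholders,
## not files installed with image.darknet
detect_coco <- image_darknet_model(type = 'detect',
  model = 'yolo.cfg',
  weights = 'yolo.weights',
  labels = system.file(package='image.darknet', 'include', 'darknet', 'data', 'coco.names'))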

The next step is to read in a directory of images with objects to detect.

path<-"/home/rstudio/webpages/examples/yolo/images"
## Get all pngs and jpgs
a<-c(dir(path=path,pattern="png"),dir(path=path,pattern="jpg"))

Now form a function to apply to all the images. The detector writes an annotated copy of each image to predictions.png in the working directory, so the function moves this file into a pred directory, which must already exist.

f <- function(x) {
  fl <- paste(path, x, sep = "/")
  pred <- image_darknet_detect(file = fl,
                               object = detect,
                               threshold = 0.19)
  ## Move the annotated image written by darknet into the pred directory
  system(sprintf("mv predictions.png pred/%s", x))
  pred
}

Applying the function produces a new image with the predictions drawn on it. However, the detections themselves are not captured in R: the results are invisible. This is fine for visual inspection, but it is no use for statistical analysis of the types and numbers of objects detected in a series of images. Fixing this required a minor "hack": the output of the C-coded model is redirected from stdout to a file that can be read back into R.

## Redirecting stdout
## The detector prints its results to stdout (the console). A small C++
## function redirects stdout to a text file that can then be read into R.

library(Rcpp)
cppFunction('void redir(){FILE* F=freopen("capture.txt","w+",stdout);}')
redir()

d <- lapply(a, f)  ## Nothing is captured directly when the function is applied
d <- data.frame(txt = readLines("capture.txt"))  ## Read the redirected output back in
system("rm capture.txt")

Reformatting the output

The output that the classifier sends to stdout is composed of lines of text. To be useful these lines need to be formatted as a data table. This involves some grepping, gsubbing and separating. It might be more elegant to use sed in the shell, but the code works.
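The parsing below assumes that the raw capture alternates between a header line for each file and one line per detection, roughly along these lines (an illustrative mock-up, not real output):

/home/rstudio/webpages/examples/yolo/images/feeder.jpg: Predicted in 2.8 seconds.
bird: 52%
bird: 34%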

library(dplyr)
library(tidyr)

## Take out all the lines that we don't need
d %>% filter(!grepl("Boxes|pandoc|unnamed", txt)) -> d

## Find the lines that contain the file names. Make a logical vector called "isfile"
d$isfile<-grepl("/home",d$txt)

## Take out the path for the file names
d$txt<-gsub(path,"",d$txt)


## Make a column called file that contains either file names or NA
d$file<-ifelse(d$isfile,d$txt,NA)
## All the other lines of text refer to the objects detected
d$object<-ifelse(!d$isfile,d$txt,NA)
## Fill down
tidyr::fill(d, "file") ->d
## Take out NAs and select the last two columns
d<-na.omit(d)[,3:4]
## Split the two-part fields at the colon
d %>% separate(file, into=c("file","time"),sep=":")->d
d %>% separate(object, into=c("object","prob"),sep=":")->d
d %>% filter(!is.na(prob))->d
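At this point prob and time are still text, e.g. " 52%" and " Predicted in 2.8 seconds.", assuming the format sketched above. A minimal sketch to make them numeric:

## Strip the % sign and rescale to a proportion
d$prob <- as.numeric(gsub("%", "", d$prob)) / 100
## Pull the elapsed seconds out of the time field
d$time <- as.numeric(sub(".*in +([0-9.]+) +seconds.*", "\\1", d$time))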

Results

The results form a data frame that can be analysed further in R. Notice that the classification time per image is just under three seconds. This is acceptable for many purposes; real-time object tracking on video streams needs much faster processing, which is generally achieved by using a graphics processing unit (GPU) on a high-powered PC of the type used for gaming.

aqm::dt(d)
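If the time field has been converted to seconds as sketched above, the per-image timing claim can be checked directly (taking one row per file, since the time is repeated for every detection):

## Mean classification time per image, in seconds
mean(d$time[!duplicated(d$file)], na.rm = TRUE)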

Images showing predictions

There are quite a few errors. Squirrels are mistaken for dogs or cats, as are polar bears and a rather cute-looking rat. This comes down to the way the model was trained: to distinguish between different mammal or bird species the model would need to be trained specifically for the purpose. Some of the objects are missed altogether.

However, if the algorithm were being used to monitor activity at a bird feeder, it could distinguish between squirrels and birds even if it classified the squirrels as dogs or cats. Although an object may be missed in any single image, it would be picked up if present in a series of images. And although the absolute number of objects may be erroneous, the number of objects classified will be related to the true number. So the automated classifier should be capable of providing a useful index of activity.
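As a minimal sketch of such an index, the detections can be counted per image and per class with the dplyr tools already loaded:

## Count the detections of each class in each image
d %>% count(file, object, sort = TRUE) -> activity
activity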

(Predicted images: Avocets500.jpg, blue-tit.jpg, clouds.jpg, feeder.jpg, Polarbears.jpg, rat-driving-car.jpg, snake-smooth.jpg, sparrows.jpg, test_image.png)