An Advanced Motion Detection Algorithm with Video Quality Analysis for Video Surveillance Systems
Abstract
Motion detection is the first essential process in the
extraction of information regarding moving objects and makes
use of stabilization in functional areas, such as tracking, classification,
recognition, and so on. In this paper, we propose a novel and
accurate approach to motion detection for the automatic video
surveillance system. Our method achieves complete detection of
moving objects by involving three significant proposed modules:
a background modeling (BM) module, an alarm trigger (AT)
module, and an object extraction (OE) module. For our proposed
BM module, a unique two-phase background matching procedure
is performed using rapid matching followed by accurate
matching in order to produce optimum background pixels for
the background model. Next, our proposed AT module eliminates
the unnecessary examination of the entire background region,
allowing the subsequent OE module to only process blocks
containing moving objects. Finally, the OE module forms the
binary object detection mask in order to achieve highly complete
detection of moving objects. The detection results produced by
our proposed (PRO) method were analyzed both qualitatively,
through visual inspection, and quantitatively, for accuracy,
and were compared with the results produced by other
state-of-the-art methods. The analyses show that our PRO method
has a substantially higher degree of efficacy, outperforming other
methods by an F1 metric accuracy rate of up to 53.43%.
Index Terms—Background model, entropy, morphology, motion
detection, video surveillance.
I. Introduction
In the last decade, video surveillance systems have
become an extremely active research area due to increasing
levels of terrorist activity and general social problems. This
has led to motivation for the development of a strong and
precise automatic processing system, an essential tool for
safety and security in both public and private sectors. The
need for advanced video surveillance systems has inspired
progress in many important areas of science and technology
including traffic monitoring [1], [2], transport networks, traffic
flow analysis, understanding of human activity [3], [4], home
nursing, monitoring of endangered species, observation of
people and vehicles within a busy environment [5]–[12], along
with many others.
Manuscript received October 22, 2009; revised February 8, 2010; accepted
June 16, 2010. Date of publication October 18, 2010; date of current version
February 24, 2011. This work was supported by the National Science Council,
under Grant NSC 982218E027008. This paper was recommended by
Associate Editor I. Ahmad.
The author is with the Department of Electronic Engineering, National
Taipei University of Technology, Taipei 106, Taiwan (email:
schuang[at]ntut.edu.tw).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2010.2087812
The design of an advanced automatic video surveillance
system requires the application of many important functions
including, but not limited to, motion detection [13]–[25],
classification [26], tracking [27], [28], behavior [29], activity
analysis, and identification [30], [31]. Motion detection is one
of the greatest problem areas in video surveillance as it is not
only responsible for the extraction of moving objects but also
critical to many computer vision applications including object-based
video encoding, human motion analysis, and human-machine
interactions [32]. Therefore, our focus here is the
further development of the motion detection phase for an
advanced video surveillance system.
The three major classes of methods for motion detection
are background subtraction, temporal differencing, and optical
flow [13]. Background subtraction [14]–[23], [33] is
the most popular motion detection method and consists of
the differentiation of moving objects from a maintained and
updated background model, which can be further grouped
into parametric and nonparametric types [33]. The parametric
model may achieve perfect performance on real data that match
its implicit assumptions and its choice of parameters
[22]. In contrast, the nonparametric model has no parameters
and depends heavily on the data [22], [33]. Apart from background
subtraction, two other motion detection methods—
optical flow and temporal differencing—are discussed in [25].
While the optical flow method shows the projected motion on
the image plane and approximates the handling of complex
backgrounds successfully, its computational complexity is often
very high, which makes it difficult to implement [34]. The
temporal differencing method adapts effectively to environmental
changes, but it often detects the shapes of moving objects
incompletely, because it is sensitive to the threshold applied
to noisy regions and to the local consistency properties of
the change mask [35].
The currently implemented method for background subtraction
accomplishes its objective by subtracting each pixel of
the incoming video frame from the background model, thus
generating an absolute difference. It then applies a threshold
to obtain the binary object detection mask [20]. Threshold
selection is a critical operation and can be conducted by a
variety of previously researched methods [36]–[40]. Although
the currently implemented background subtraction method is
convenient to implement, its noise tolerance depends on the
chosen threshold. Functionalities such
as object classification, tracking, behavior, and identification
are then performed on the regions where moving objects have
been detected.
1051-8215/$26.00 © 2010 IEEE
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2011
Traditional foreground analysis methods are usually
computationally expensive for video surveillance systems
based on the traditional optical flow implementation [34].
For a more accurate motion detection design, the popular
background subtraction method always needs foreground analysis
in order to analyze the motion information [34].
With respect to background maintenance, the pixel-level
processes and region-level processes should be clearly designed
into the background subtraction approach [41]. This
is because pixel-level processes can handle the adaptation
to a changing background at each pixel independently, without
observing groups of pixels, and region-level processes can refine
the raw pixel-level classification with regard to inter-pixel
relationships [41].
This paper presents a novel background subtraction method
which generates a background model using the selected suitable
background candidates. Then, through the use of an alarm
trigger (AT) module, it detects the pixels of moving objects
within the regions determined to significantly feature objects.
The organization of the proposed (PRO) method is as follows.
1) A two-phase background matching procedure is used to
select suitable background candidates for generation of
an updated background model.
2) A block-based entropy evaluation with morphological
operations is conducted through a triggered block-based
alarm module.
3) Production of motion detection is completed through the
automatic threshold selection algorithm.
When compared to other state-of-the-art methods included
in the performance study, our method proved to be of higher
efficacy. This was indicated by both qualitative and quantitative
results through analysis using a wide range of natural
video sequences. The remainder of our paper is organized as
follows. Section II presents a condensed overview of the various
background subtraction approaches used for comparison.
Section III contains our proposed motion detection method.
Section IV presents the experimental results achieved by our
PRO method compared to those of other methods. Section V
contains our concluding remarks.
II. Related Work
The major purpose of background subtraction is to generate
a reliable background model and thus significantly improve
the detection of moving objects [35], [42]. Some state-of-the-art
background subtraction methods include simple background
subtraction (SBS), running average (RA) [14], Σ-Δ
estimation (SDE) [16], multiple Σ-Δ estimation (MSDE)
[19], simple statistical difference (SSD) [20], RA in the discrete
cosine transform (DCT) domain [21], and temporal median
filter (TMF) [23]. These methods are briefly reviewed in the
following sections.
A. Simple Background Subtraction
Both the reference image B(x, y) and the incoming video
frame It(x, y) are obtained from the video sequence. A binary
motion detection mask D(x, y) is calculated as follows:
$$D(x, y) = \begin{cases} 1, & \text{if } |I_t(x, y) - B(x, y)| > \tau \\ 0, & \text{if } |I_t(x, y) - B(x, y)| \le \tau \end{cases} \tag{1}$$
where τ is the predefined threshold which designates pixels as
either the background or the moving objects in a video frame.
If the absolute difference between a reference image and
an incoming video frame does not exceed τ, the corresponding
pixel of the detection mask is labeled “0,” which designates it
as background; otherwise, it is labeled “1,” which designates
it as belonging to a moving object. A significant problem
experienced by the SBS method in most real video sequences
is that it fails to respond precisely when noise occurs in
the incoming video frame It(x, y) and static objects occur
in the reference image B(x, y) [20]. Note that the reference
image B(x, y) represents the fixed background model, which
is selected from the test frames [20].
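As an illustration of Eq. (1), the following is a minimal sketch, not the implementation evaluated in the paper. It assumes grayscale frames stored as nested Python lists of intensities. The function name `simple_background_subtraction` and the sample values are hypothetical, and a practical system would likely use an array library instead of list comprehensions.

```python
def simple_background_subtraction(frame, background, tau):
    """Binary mask per Eq. (1): 1 where |I_t - B| > tau, else 0.

    `frame` and `background` are assumed to be same-sized 2-D lists
    of grayscale intensities (an assumption made for this sketch).
    """
    return [
        [1 if abs(i - b) > tau else 0 for i, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Hypothetical 2x3 example: a fixed reference image and one incoming frame.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 80, 10], [10, 9, 200]]
mask = simple_background_subtraction(frame, background, tau=20)
```

Only the two pixels whose intensity differs from the reference image by more than τ are marked as moving. Because B(x, y) is fixed, any noise above τ or any static object present in the reference image is misclassified, which is the weakness described above.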
B. Running Average
This problem can be countered by using the RA method [14] to
generate an adaptive background model that adapts to
temporal changes in the video sequence. RA differs from SBS
in that it updates each background image frame Bt(x, y) of the
adaptive background model frequently in order to ensure the
reliability of motion detection.
The previous background frame Bt−1(x, y) and the new
incoming video frame It(x, y) are then integrated into the
current background image Bt(x, y). The adaptive background model
is attained using the simple adaptive filter as follows:
$$B_t(x, y) = (1 - \beta) B_{t-1}(x, y) + \beta I_t(x, y) \tag{2}$$
where β is an empirically adjustable parameter. While a
large coefficient β leads to a faster background updating
speed, it also causes the creation of artificial trails behind
moving objects in the background model. In other words, if
objects remain stationary long enough, they become part of
the background model.
The binary motion detection mask D(x, y) is based on the
SBS method and is defined as follows:
$$D(x, y) = \begin{cases} 1, & \text{if } |I_t(x, y) - B_t(x, y)| > \tau \\ 0, & \text{if } |I_t(x, y) - B_t(x, y)| \le \tau \end{cases} \tag{3}$$
where It(x, y) is the current incoming video frame, Bt(x, y)
is the current background model, and τ is an experimentally
predefined threshold to generate the binary motion detection
mask.
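The update in Eq. (2) and the mask in Eq. (3) can be sketched as follows. This is an illustrative sketch under the same nested-list assumption as before. The function names and the values β = 0.5 and τ = 20 are hypothetical choices made only to keep the arithmetic easy to follow.

```python
def update_running_average(prev_background, frame, beta):
    """Eq. (2): B_t = (1 - beta) * B_{t-1} + beta * I_t, per pixel."""
    return [
        [(1.0 - beta) * b + beta * i for b, i in zip(bg_row, frame_row)]
        for bg_row, frame_row in zip(prev_background, frame)
    ]


def motion_mask(frame, background, tau):
    """Eq. (3): 1 where |I_t - B_t| > tau, else 0."""
    return [
        [1 if abs(i - b) > tau else 0 for i, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Hypothetical single-pixel scene: an object of intensity 100 appears on a
# background of intensity 0 and then stays still for three frames.
background = [[0.0]]
frame = [[100]]
history = []
for _ in range(3):
    background = update_running_average(background, frame, beta=0.5)
    history.append(motion_mask(frame, background, tau=20)[0][0])
```

With this large β the background estimate moves to 50, 75, and 87.5, so the stationary object is detected in the first two frames and absorbed into the background model by the third. This is the behavior that the text attributes to a large coefficient β.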
HUANG: AN ADVANCED MOTION DETECTION ALGORITHM WITH VIDEO QUALITY ANALYSIS FOR VIDEO SURVEILLANCE SYSTEMS 3
C. Σ-Δ Estimation
In accordance with the pixel-based decision framework, a
background subtraction method called the Σ-Δ estimation (SDE)
method [16] calculates the temporal statistics of the pixels
of the original video sequence. In the first background estimate, the calculation
makes use of the sgn function in order to estimate the
background intensity. The sgn function is expressed as follows:
$$\operatorname{sgn}(a) = \begin{cases} 1, & \text{if } a > 0 \\ 0, & \text{if } a = 0 \\ -1, & \text{if } a < 0 \end{cases} \tag{4}$$
where a is the input real value. Then the background estimation
formula is expressed as follows:
$$B_t(x, y) = B_{t-1}(x, y) + \operatorname{sgn}(I_t(x, y) - B_{t-1}(x, y)) \tag{5}$$
where Bt(x, y) is the current background model, Bt−1(x, y)
is the previous background model, and It(x, y) is the current
incoming video frame. The intensity of the background model
increases or decreases by a value of one through the evaluation
of the sgn function at every frame. The image of absolute difference
Δt(x, y) is then calculated as the estimated difference
between It(x, y) and Bt(x, y) as follows:
$$\Delta_t(x, y) = |I_t(x, y) - B_t(x, y)| \tag{6}$$
In a similar fashion, the time-variance Vt(x, y) is calculated
by utilizing the sgn function. It measures motion activity
in order to determine whether each pixel should be designated
as “background” or “moving object.”
$$V_t(x, y) = V_{t-1}(x, y) + \operatorname{sgn}(N \times \Delta_t(x, y) - V_{t-1}(x, y)) \tag{7}$$
where Vt(x, y) is the current time-variance, Vt−1(x, y) is the
previous time-variance, and N is a predefined parameter
that ranges from 1 to 4.
Based on the generated current time-variance Vt(x, y), the
binary motion detection mask Dt(x, y) is obtained as follows:
$$D_t(x, y) = \begin{cases} 1, & \text{if } \Delta_t(x, y) > V_t(x, y) \\ 0, & \text{if } \Delta_t(x, y) \le V_t(x, y) \end{cases} \tag{8}$$
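One frame of the SDE update, Eqs. (4) through (8), can be sketched as below. This follows the equations exactly as written in this section and is not the reference implementation of [16]. The function names, the nested-list image representation, the initial values, and the choice N = 2 are assumptions made for illustration.

```python
def sgn(a):
    """Sign function of Eq. (4): returns 1, 0, or -1."""
    return (a > 0) - (a < 0)


def sigma_delta_step(frame, background, variance, n=2):
    """Apply Eqs. (5)-(8) to every pixel of one frame.

    Returns the updated background B_t, the updated time-variance V_t,
    and the binary mask D_t. Inputs are same-sized 2-D lists of integers.
    """
    new_background, new_variance, mask = [], [], []
    for i_row, b_row, v_row in zip(frame, background, variance):
        b_new = [b + sgn(i - b) for i, b in zip(i_row, b_row)]          # Eq. (5)
        delta = [abs(i - b) for i, b in zip(i_row, b_new)]              # Eq. (6)
        v_new = [v + sgn(n * d - v) for d, v in zip(delta, v_row)]      # Eq. (7)
        mask.append([1 if d > v else 0 for d, v in zip(delta, v_new)])  # Eq. (8)
        new_background.append(b_new)
        new_variance.append(v_new)
    return new_background, new_variance, mask


# Hypothetical 1x2 example: the first pixel changes sharply, the second
# matches the background.
frame = [[50, 10]]
background = [[10, 10]]
variance = [[2, 2]]
background, variance, mask = sigma_delta_step(frame, background, variance, n=2)
```

In this example the first pixel's background estimate rises by one intensity level and the pixel is flagged as moving. The second pixel is left unchanged, its time-variance decays by one, and it is labeled background.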
D. Multiple Σ-Δ Estimation
The SDE method generates its background model with a
constant updating period. This becomes a constraint in
certain complex scenes, such as scenes with many moving objects
or with moving objects exhibiting variable motion [19].
For such scenes, the MSDE method was proposed in
order to build the adaptive background model. The background
model formula is expressed as follows:
$$b^i_t(x, y) = b^i_{t-1}(x, y) + \operatorname{sgn}(b^{i-1}_t(x, y) - b^i_{t-1}(x, y)) \tag{9}$$
where $b^i_t(x, y)$ is the current ith reference background,
$b^i_{t-1}(x, y)$ is the previous ith reference background, and
$b^{i-1}_t(x, y)$ is the current (i−1)th reference background. Additionally,
the reference difference $\Delta^i_t(x, y)$ and reference time-variance
$v^i_t(x, y)$ are also computed as follows:
$$v^i_t(x, y) = v^i_{t-1}(x, y) + \operatorname{sgn}(N \times \Delta^i_t(x, y) - v^i_{t-1}(x, y)) \tag{10}$$
where $\Delta^i_t(x, y) = |I_t(x, y) - b^i_t(x, y)|$.
The confidence adaptive background model Bt(x, y) can be
calculated after $b^i_t(x, y)$ and $v^i_t(x, y)$ are determined, yielding
the formula as follows:
$$B_t(x, y) = \frac{\sum_{i \in [1, R]} \dfrac{\alpha_i \, b^i_t(x, y)}{v^i_t(x, y)}}{\sum_{i \in [1, R]} \dfrac{\alpha_i}{v^i_t(x, y)}} \tag{11}$$
where each αi is the predefined confidence value, i is the
reference number, R is the total number of reference
backgrounds, and Bt(x, y)
is the confidence adaptive background model. According to
[19], R is experimentally set to 3 and confidence values α1,
α2, and α3 are set to 1, 8, and 16, respectively. Notice that the
binary moving objects mask D(x, y) is generated by the same
approach as in SDE, based on the confidence adaptive background
model Bt(x, y).
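The weighted combination in Eq. (11) can be sketched as follows, using R = 3 and the confidence values 1, 8, and 16 reported from [19]. The function name, the nested-list layout, and the sample numbers are hypothetical. The sketch also assumes that every time-variance value is strictly positive, since Eq. (11) divides by it and the text does not say how a zero variance is handled.

```python
def confidence_background(refs, variances, alphas=(1, 8, 16)):
    """Eq. (11): combine R reference backgrounds into B_t, per pixel.

    `refs[i]` and `variances[i]` are the 2-D lists b^i_t and v^i_t.
    Assumption of this sketch: all variance values are > 0.
    """
    rows, cols = len(refs[0]), len(refs[0][0])
    result = []
    for y in range(rows):
        row = []
        for x in range(cols):
            numerator = sum(
                a * b[y][x] / v[y][x] for a, b, v in zip(alphas, refs, variances)
            )
            denominator = sum(a / v[y][x] for a, v in zip(alphas, variances))
            row.append(numerator / denominator)
        result.append(row)
    return result


# Hypothetical single-pixel example with three reference backgrounds.
refs = [[[100]], [[104]], [[120]]]
variances = [[[2]], [[4]], [[8]]]
combined = confidence_background(refs, variances)
```

For these values the numerator is 50 + 208 + 240 = 498 and the denominator is 0.5 + 2 + 2 = 4.5, giving a combined background of about 110.67. Reference backgrounds with a larger confidence value or a smaller time-variance contribute more to the result.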
For such complex scenes, the MSDE method can
detect multiple moving objects with higher degrees of accuracy
than the SDE method. This is because the MSDE method
generates the binary moving objects mask D(x, y) based on
the multimodal background model Bt(x, y), a procedure which
requires greater computational complexity.

