|
The
Particle Filter was developed to address the problem of tracking
contour outlines through heavy image clutter (Isard and Blake, 1996;
1998).
The filter's output at a given time-step, rather than being a single
estimate of position and covariance as in a Kalman filter, is an
approximation of an entire probability distribution of likely joint
angles. This allows the filter to maintain multiple hypotheses and
thus be robust to distracting clutter.
With about 32 DOFs for joint angles to be determined for each frame,
there is the potential for exponential complexity when evaluating
such a high-dimensional search space. MacCormick (2000)
proposed Partitioned Sampling and Sullivan (1999)
proposed Layered Sampling to reduce the search space by partitioning
it for more efficient particle filtering. Although Annealed Particle
Filtering (Deutscher et al., 2000)
is an even more general and robust solution, it struggles with efficiency,
which Deutscher (2001)
improves with Partitioned Annealed Particle Filtering.
The Particle Filter is a considerably simpler algorithm than the
Kalman Filter. Moreover, despite its use of random sampling, which
is often thought to be computationally inefficient, the Particle
Filter can run in real-time. This is because tracking over time
maintains relatively tight distributions for shape at successive
time steps and particularly so given the availability of accurate
learned models of shape and motion from the human-movement-recognition
(CHMR) system. Here, the particle filter has:
Three probability distributions in the problem specification:
1. Prior density p(x) for the state x: the joint angles x in the previous frame.
2. Process density p(x_t | x_{t-1}): the kinematic model and clone-body-model (x_{t-1}: previous frame, x_t: next frame).
3. Observation density p(z | x): the image z in the previous frame.
One probability distribution in the solution specification:
1. State density p(x_t | Z_t), where x_t is the joint angles in the next frame Z_t.
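For reference, in the standard Condensation formulation of Isard and Blake these four densities are linked by one recursive update. The block below is a sketch under two assumptions that the text does not state explicitly: Z_t is taken to collect the image observations z_1, ..., z_t, and the observations are taken to be conditionally independent given the state, with Markov state dynamics.

```latex
p(x_t \mid Z_t) \;\propto\; p(z_t \mid x_t)
  \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Z_{t-1})\, dx_{t-1}
```

Read right to left: the previous state density acts as the prior, the process density propagates it to the next frame, and the observation density re-weights the result.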
1. Prior density: Sample s'_t from the prior density p(x_{t-1} | z_{t-1}), where x_{t-1} is the joint angles in the previous frame, z_{t-1}.
The sample set holds possible alternative values for the joint angles. When
tracking through background clutter or occlusion, a joint angle
may have N alternative possible values (samples) s with respective weights w,
so that the prior density is approximated by a sample set,
p(x) ≈ S_{t-1} = {(s^(n), w^(n)), n = 1..N}
(S_{t-1} is the sample set for the previous frame, and w^(n) is the weight of the nth sample s^(n)).
For the next frame, a new sample s'_t = s_{t-1}^(i) is selected by finding the smallest i for which c^(i) ≥ r, where c^(i) = ∑_{j=1}^{i} w^(j) is the cumulative weight and r is a random number drawn uniformly from [0, 1].
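The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `resample` and its arguments are hypothetical, and r is scaled by the total weight so that the sketch also works when the weights have not yet been normalized.

```python
import bisect
import itertools
import random


def resample(samples, weights, rng=random):
    """Draw len(samples) new samples, each chosen with probability
    proportional to its weight (hypothetical helper, for illustration)."""
    # cumulative[i] = weights[0] + ... + weights[i], i.e. c(i) in the text.
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    chosen = []
    for _ in samples:
        # r is uniform on [0, total); with normalized weights total == 1.
        r = rng.random() * total
        # bisect_left returns the smallest i with cumulative[i] >= r.
        i = bisect.bisect_left(cumulative, r)
        chosen.append(samples[i])
    return chosen
```

Samples with large weights are picked several times and samples with negligible weights tend to disappear, which is what concentrates the particles on the likely joint angles.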
2. Process density: Predict s_t from the process density p(x_t | x_{t-1} = s'_t^(n)). A joint angle s in the next
frame is predicted by sampling from this process density, which encompasses the
kinematic model, the clone-body-model and cost function minimisation.
In this prediction step both edge and region information are used.
The edge information is used to directly match the image gradients
with the expected model edge gradients. The region information is
used to directly match the values of pixels in the image with
those of the clone-body-model's 3D colour texture map. The prediction
step involves minimizing two cost functions (measurement likelihood density):
edge error E_e, using edge information (see Equation 2 in Appendix);
region error E_r, using region information (see Equation 3 in Appendix).
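As a rough illustration of sampling from a process density, the sketch below replaces the kinematic model, clone-body-model and cost minimisation described above with a much simpler stand-in: a Gaussian random walk on each joint angle, clamped to joint limits. The function name `predict`, the noise level `sigma` and the limits `lo` and `hi` are all assumptions made for this example and are not taken from the system described here.

```python
import math
import random


def predict(samples, sigma=0.05, lo=-math.pi, hi=math.pi, rng=random):
    """Propagate each sample (a list of joint angles in radians) to the
    next frame. Stand-in dynamics: add Gaussian noise, then clamp."""
    predicted = []
    for angles in samples:
        moved = []
        for a in angles:
            # Diffuse the angle with zero-mean Gaussian noise (assumed model).
            a_next = a + rng.gauss(0.0, sigma)
            # Keep the angle inside the assumed physical limits of the joint.
            moved.append(min(hi, max(lo, a_next)))
        predicted.append(moved)
    return predicted
```

A learned motion model would replace the noise term with a prediction that depends on the previous frames, which is why far fewer particles are needed than with blind diffusion.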
3. Observation density: Measure and weigh the new position
in terms of the observation density, p(z_t | x_t). Weights w_t
= p(z_t | x_t = s_t) are estimated and then normalized so that
∑_n w^(n) = 1. The weights are then refined with forward smoothing:
Smooth weights w_t over 1..t, for n trajectories
Replace each sample set with its n trajectories {(s_t, w_t)} for 1..t
Re-weight all w^(n) over 1..t
Trajectories tend to merge within 10 frames
O(Nt) storage prunes down to O(N)
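The weighting and normalization part of this step can be sketched as follows. The sketch covers only the per-frame weights; forward smoothing over trajectories is not shown. The function name `weigh` and the `likelihood` callable are hypothetical: in the system described here the likelihood would come from the edge and region errors, whereas the usage line below simply assumes a likelihood of the form exp(-error).

```python
import math


def weigh(samples, likelihood):
    """Return normalized weights w(n) = p(z | x = s(n)) / sum of all weights.

    `likelihood` is a caller-supplied function (an assumption of this
    sketch) that scores how well one sample explains the current image."""
    raw = [likelihood(s) for s in samples]
    total = sum(raw)
    if total <= 0.0:
        # No sample explains the image at all: fall back to uniform weights.
        return [1.0 / len(samples)] * len(samples)
    return [w / total for w in raw]


# Usage: three samples whose (hypothetical) matching errors are 0, 1 and 2.
weights = weigh([0.0, 1.0, 2.0], lambda err: math.exp(-err))
```

After this call the weights sum to one and the sample with the smallest error carries the largest weight, so it is the one most likely to survive the next resampling step.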
In
this research, feedback from the CHMR system utilizes the large
training set of skills to achieve an even larger reduction of the
search space. In practice, human movement is found to be highly
efficient, with minimal DOFs rotating at any one time. The equilibrium
positions and physical limits of each DOF further stabilize and
minimize the dimensional space. With so few DOFs to track at any
one time, a minimal number of particles are required, significantly
raising the efficiency of the tracking process. Such highly constrained
movement results in a sparse domain of motion projected by each
motion vector.
Because the temporal variation of related joints and other parameters
also contains information that helps the recognition process infer
skill boundaries, the system computes and appends the temporal derivatives
and second derivatives of these features to form the final motion
vector. Hence the motion vector includes joint angles (32 DOF),
body location and orientation (6 DOF), centre of mass (3 DOF) and principal
axis (2 DOF), all with first and second derivatives.
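To make the size of the motion vector concrete: 32 + 6 + 3 + 2 = 43 features per frame, and appending first and second derivatives gives 3 × 43 = 129 values. The sketch below assembles such a vector using backward finite differences between frames. The differencing scheme, the zero padding of the first frame, and the names `with_derivatives` and `dt` are assumptions of this example; the text does not say how the derivatives are computed.

```python
N_FEATURES = 32 + 6 + 3 + 2  # joint angles, location/orientation, centre of mass, principal axis


def with_derivatives(frames, dt=1.0):
    """frames: list of per-frame feature lists, all the same length.
    Returns one vector per frame: features + first + second derivatives."""

    def backward_diff(seq):
        # The first frame has no predecessor, so its derivative is set to zero.
        out = [[0.0] * len(seq[0])]
        for prev, cur in zip(seq, seq[1:]):
            out.append([(c - p) / dt for p, c in zip(prev, cur)])
        return out

    first = backward_diff(frames)
    second = backward_diff(first)
    return [list(f) + d1 + d2 for f, d1, d2 in zip(frames, first, second)]


# Usage: four frames in which every feature follows t squared.
frames = [[float(t * t)] * N_FEATURES for t in range(4)]
motion_vectors = with_derivatives(frames)
```

Each entry of `motion_vectors` has 129 components: the first 43 are the raw features, the next 43 their first derivatives, and the last 43 their second derivatives.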
|