ABSTRACT
Studies to optimise take-off angles for height or distance have usually involved either a time-consuming, invasive approach of placing markers on the body in a laboratory setting, or even less efficient manual frame-by-frame joint angle calculations with one of the many sport science video analysis software tools available. This research introduces a computer-vision based, marker-free, real-time biomechanical analysis approach to optimise take-off angles based on speed, base of support, and dynamically calculated joint angles and masses of body segments. The goal of a jump is usually height, distance or rotation, with consequent dependencies on speed, the phase of joint angles, centre of mass (COM) and base of support. First and second derivatives of joint angles and body-part COMs are derived from a Continuous Human Movement Recognition (CHMR) system for kinematical and what-if calculations. Motion is automatically segmented using hierarchical Hidden Markov Models, and 3D tracking is further stabilized by estimating the joint angles for the next frame using a forward smoothing Particle filter. The results from a study of jumps, leaps and somersaults, supporting regular knowledge-of-results feedback during training sessions, indicate that this approach is useful for optimising the height, distance or rotation of skills.
Key words:
Gymnastics, jumping, three-dimensional kinematics, computer vision

Key Points
- Computer-vision based marker-free tracking.
- Real-time biomechanical analysis.
- Improve tracking using a forward smoothing Particle filter.
- Automatically segment using hierarchical Hidden Markov Models.
- Recognize skills using segmented motion.
- Optimize take-off angles using speed, base of support, joint angles and mass of body segments.
- Optimize height, distance or rotation of skills.

Sport skills are tracked and biomechanically analysed either by requiring athletes to wear joint markers/identifiers (an approach which has the disadvantage of significant set-up time) or by manually marking up video frame-by-frame. Such complex and time-consuming approaches to tracking and analysis are an impediment to daily use by coaches and have barely changed since they were developed in the 1970s. Using a less invasive, marker-free approach, computer vision research into tracking and recognizing full-body human motion has so far been mainly limited to gait or frontal posing (Moeslund and Granum, 2001). Various approaches for tracking the whole body have been proposed in the image processing literature using a variety of 2D and 3D body models. However, the cylindrical, quadratic and ellipsoidal body models of previous studies (Drummond and Cipolla, 2001; Kakadiaris and Metaxas, 1996; Pentland and Horowitz, 1991; Wren et al., 1997) do not contour accurately to the body, thus decreasing tracking stability. To overcome this problem, in this research 3D clone-body-model regions are sized and texture mapped from each body part by extracting features during the initialisation phase (Cham and Rehg, 1999). This clone-body-model has a number of advantages over previous body models. It allows for a larger variation of somatotype (from ectomorph to endomorph), gender (cylindrical trunks do not allow for breasts or pregnancy) and age (from baby to adult). Exact sizing of clone-body-parts enables greater accuracy in tracking edges than the nearest best fit of a cylinder. Texture mapping of clone-body-parts increases region tracking and orientation accuracy over the many other models which assume a uniform colour for each body part. Region patterns, such as those of the ear, elbow and knee, assist in accurately fixing the orientation of clone-body-parts.
Neither joint markers nor manual frame-by-frame mark-up provides volume and 3D centre-of-mass (COM) estimates of a 3D body model, both of which are invaluable for 3D biomechanical analysis. In this study, joint angle velocities, together with the size and mass of body segments, enabled more accurate optimisation of take-off angles for the goal of a jump, whether height, distance or rotation, each with consequent dependencies on the phase of joint angles and on the base of support.
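The COM and what-if calculations mentioned above can be illustrated with a minimal sketch. It assumes that per-segment masses and COM positions are already available from the body model, that y is the vertical axis, and that flight is a point-mass ballistic trajectory without air resistance that lands at take-off height. The names `Segment`, `whole_body_com`, `com_velocity` and `flight_what_if` are hypothetical, and this is not presented as the CHMR system's own implementation.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration (m/s^2)


@dataclass
class Segment:
    """One body segment; mass and COM are assumed to come from the body model."""
    mass_kg: float
    com: tuple  # (x, y, z) in metres, y vertical (assumed axis convention)


def whole_body_com(segments):
    """Whole-body COM as the mass-weighted average of the segment COMs."""
    total = sum(s.mass_kg for s in segments)
    if total <= 0:
        raise ValueError("total mass must be positive")
    return tuple(sum(s.mass_kg * s.com[k] for s in segments) / total for k in range(3))


def com_velocity(com_before, com_after, dt):
    """Central-difference COM velocity from the frames either side of take-off."""
    return tuple((b - a) / (2 * dt) for a, b in zip(com_before, com_after))


def flight_what_if(v_horizontal, v_vertical):
    """Point-mass flight with no air drag, landing at take-off height.

    Returns (peak COM rise in m, horizontal distance in m, take-off angle in degrees).
    """
    angle = math.degrees(math.atan2(v_vertical, v_horizontal))
    if v_vertical <= 0:
        return 0.0, 0.0, angle
    peak_rise = v_vertical ** 2 / (2 * G)
    distance = v_horizontal * (2 * v_vertical / G)
    return peak_rise, distance, angle
```

A what-if comparison then amounts to calling `flight_what_if` with the same take-off speed split into horizontal and vertical components at different angles, and comparing the predicted peak rise or distance.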
Clone-Body-Model
The clone-body-model proposed in this paper consists of a set of clone-body-parts connected by joints, similar to the representations proposed by Badler et al. (1993). Clone-body-parts include the head, clavicle, trunk, upper arms, forearms, hands, thighs, calves and feet. Degrees of freedom (DOF) are modelled for gross full-body motion. DOF supporting finer resolution movements are not yet modelled, including the radioulnar (forearm rotation), interphalangeal (toe), metacarpophalangeal (finger) and carpometacarpal (thumb) joint motions. Each clone-body-part consists of a rigid spine with pixels radiating out (Figure 1). Each pixel represents a point on the surface of a clone-body-part. Associated with each pixel are: the radius or thickness of the clone-body-part at that point; its colour as hue, saturation and intensity; the accuracy of the colour and radius; and the elasticity inherent in the body part at that point. Although each point on a clone-body-part is defined by cylindrical coordinates, the radius varies around a cross section to exactly follow the contour of the body, as shown in Figure 2. Automated initialisation assumes that only one person is initially walking upright in front of a static background, with gait being a known movement model. Anthropometric data (Pheasant, 1996) are used as a Gaussian prior for initialising the clone-body-part proportions, with left-right symmetry of the body used as a stabilizing guide from 50th percentile proportions. Such constraints on the relative size of clone-body-parts, and on the limits and neutral positions of joints, help to stabilize initialisation. Initially a low accuracy is set for each clone-body-part, with the accuracy increasing as structure from motion resolves the relative proportions.
For example, low colour accuracy and high radius accuracy are initially set for pixels near the edge of a clone-body-part, high colour and low radius accuracy for other near-side pixels, and low colour and low radius accuracy for far-side pixels. Ongoing temporal resolution following self-occlusions allows radius and colour accuracy to increase. Breathing, muscle flexion and other normal variations of body-part radius are accounted for by the radius elasticity parameter.
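A clone-body-part surface point, with the attributes and initial accuracies described above, can be represented as a small record. This is an illustrative sketch under stated assumptions: the numeric values standing in for "low" and "high" accuracy, the fractional elasticity tolerance, the accuracy update rule and all names (`SurfacePixel`, `PixelZone`, `new_pixel`, `refine`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PixelZone(Enum):
    EDGE = "edge"            # pixel near the clone-body-part's edge
    NEAR_SIDE = "near_side"  # other pixels facing the camera
    FAR_SIDE = "far_side"    # pixels on the far (self-occluded) side


LOW, HIGH = 0.1, 0.9  # hypothetical numeric stand-ins for "low" and "high" accuracy

# zone -> (initial colour accuracy, initial radius accuracy), as described in the text
INITIAL_ACCURACY = {
    PixelZone.EDGE: (LOW, HIGH),
    PixelZone.NEAR_SIDE: (HIGH, LOW),
    PixelZone.FAR_SIDE: (LOW, LOW),
}


@dataclass
class SurfacePixel:
    """One surface point of a clone-body-part, in cylindrical coordinates about its spine."""
    height: float          # position along the rigid spine
    angle: float           # angle around the spine (radians)
    radius: float          # spine-to-surface distance (body-part thickness) at this point
    hsi: tuple             # colour as (hue, saturation, intensity)
    colour_accuracy: float
    radius_accuracy: float
    elasticity: float      # hypothetical: allowed fractional radius change (breathing, flexion)

    def radius_consistent(self, observed_radius):
        """True if an observed radius lies within this pixel's elastic envelope."""
        return abs(observed_radius - self.radius) <= self.elasticity * self.radius

    def refine(self, rate=0.2):
        """Hypothetical update: move both accuracies towards 1 after a new observation."""
        self.colour_accuracy += rate * (1.0 - self.colour_accuracy)
        self.radius_accuracy += rate * (1.0 - self.radius_accuracy)


def new_pixel(zone, height, angle, radius, hsi, elasticity=0.05):
    """Create a pixel with the initial accuracies for its zone."""
    colour_acc, radius_acc = INITIAL_ACCURACY[zone]
    return SurfacePixel(height, angle, radius, hsi, colour_acc, radius_acc, elasticity)
```

Keeping accuracy per pixel, rather than per body part, lets edge pixels constrain shape early while far-side pixels wait for later views.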
Kinematic Model
The kinematical model tracking the position and orientation of a person relative to the camera entails projecting 3D clone-body-model parts onto a 2D image using three chained homogeneous transformation matrices, as illustrated in Figure 3 (see Equation 1 in Appendix). Joint angles are used to track the location and orientation of each body part, with the range of joint angles constrained by limiting the DOF associated with each joint. A simple motion model of constant angular velocity for joint angles is used in the kinematical model. Each DOF is constrained by anatomical joint-angle limits, by avoidance of body-part inter-penetration, and by joint-angle equilibrium positions modelled with Gaussian stabilizers around their equilibria. To stabilize tracking, the joint angles are predicted for the next frame. This prediction is cast as an estimation problem which is solved using a Particle filter (the Condensation algorithm).
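The chained-transform projection and the constant angular velocity motion model can be sketched with NumPy. Because Equation 1 is not reproduced here, the split into camera-from-world, world-from-parent and parent-from-part matrices, the pinhole camera with a focal length in pixels, and the Gaussian stabilizer width are all assumptions; the function names are hypothetical.

```python
import numpy as np


def rot_z(theta):
    """4x4 homogeneous rotation about the local z axis (one rotational DOF)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0, 0.0],
                     [s, c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])


def translate(x, y, z):
    """4x4 homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def project_point(p_part, T_cam_world, T_world_parent, T_parent_part, focal_px, centre_px):
    """Project a point given in body-part coordinates into pixel coordinates.

    Three chained homogeneous transforms take the point into camera coordinates;
    a pinhole model (assumed) then maps it onto the image plane.
    """
    p = np.append(np.asarray(p_part, dtype=float), 1.0)
    x, y, z, _ = T_cam_world @ T_world_parent @ T_parent_part @ p
    if z <= 0:
        raise ValueError("point is behind the camera")
    return focal_px * x / z + centre_px[0], focal_px * y / z + centre_px[1]


def predict_joint_angle(theta, omega, dt, lower, upper):
    """Constant angular velocity prediction, clamped to anatomical joint limits."""
    return float(np.clip(theta + omega * dt, lower, upper))


def equilibrium_prior(theta, theta_eq, sigma):
    """Unnormalised Gaussian stabilizer around a joint's equilibrium angle."""
    return float(np.exp(-0.5 * ((theta - theta_eq) / sigma) ** 2))
```

For example, a knee predicted to pass its limit within one 1/30 s frame is clamped back to that limit, and `equilibrium_prior` can multiply a hypothesis weight to favour angles near the neutral position.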
Particle Filter
The Particle Filter was developed to address the problem of tracking contour outlines through heavy image clutter (Isard and Blake, 1996; 1998). Rather than producing a single estimate of position and covariance, as a Kalman filter does, the filter's output at a given time-step is an approximation of an entire probability distribution of likely joint angles. This allows the filter to maintain multiple hypotheses and thus be robust to distracting clutter. With about 32 joint-angle DOF to be determined for each frame, there is the potential for exponential complexity when evaluating such a high-dimensional search space. MacCormick and Isard (2000) proposed Partitioned Sampling, and Sullivan et al. (1999) proposed Layered Sampling, to reduce the search space by partitioning it for more efficient particle filtering. Although Annealed Particle Filtering (Deutscher et al., 2000) is an even more general and robust solution, it struggles with efficiency, which Deutscher et al. (2001) improve with Partitioned Annealed Particle Filtering. The Particle Filter is a considerably simpler algorithm than the Kalman Filter. Moreover, despite its use of random sampling, which is often thought to be computationally inefficient, the Particle Filter can run in real time. This is because tracking over time maintains relatively tight distributions for shape at successive time steps, particularly given the accurate learned models of shape and motion available from the CHMR system. Here, the particle filter has three probability distributions in the problem specification and one probability distribution in the solution specification:
1. Prior density: sample ${s'}^{(n)}_t$ from the prior density $p(x_{t-1} \mid z_{t-1})$, where $x_{t-1}$ is the set of joint angles in the previous frame and $z_{t-1}$ is the corresponding image observation. The sample set contains possible alternate values for the joint angles.
When tracking through background clutter or occlusion, a joint angle may have $N$ alternate possible values (samples) $s^{(n)}$ with respective weights $w^{(n)}$, so that the prior density is approximated by the sample set of the previous frame, $p(x) \approx S_{t-1} = \{(s^{(n)}_{t-1}, w^{(n)}_{t-1}),\ n = 1, \dots, N\}$, where $w^{(n)}$ is the weight of the $n$th sample $s^{(n)}$. For the next frame, each new sample is selected as ${s'}^{(n)}_t = s^{(i)}_{t-1}$ by finding the smallest $i$ for which $c^{(i)} \ge r$, where $c^{(i)} = \sum_{j=1}^{i} w^{(j)}_{t-1}$ is the cumulative weight and $r$ is a random number drawn uniformly from $[0, 1]$.
2. Process density: predict $s^{(n)}_t$ by sampling from the process density $p(x_t \mid x_{t-1} = {s'}^{(n)}_t)$, which encompasses the kinematic model, the clone-body-model and cost-function minimisation. In this prediction step both edge and region information are used. The edge information is used to directly match the image gradients with the expected model edge gradients. The region information is used to directly match the values of pixels in the image with those of the clone-body-model's 3D colour texture map. The prediction step involves minimising two cost functions (measurement likelihood density): the edge error $E_e$ using edge information (see Equation 2 in Appendix) and the region error $E_r$ using region information (see Equation 3 in Appendix).
3. Observation density: measure and weigh the new position in terms of the observation density $p(z_t \mid x_t)$. Weights $w^{(n)}_t = p(z_t \mid x_t = s^{(n)}_t)$ are estimated and then normalised so that $\sum_n w^{(n)}_t = 1$.
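The three Condensation steps above can be sketched as one time step in plain Python. The process model and likelihood are passed in as callables because the paper's kinematic prediction and edge/region costs (Equations 2 and 3) are not reproduced here; `gaussian_diffusion` and `gaussian_likelihood` are hypothetical stand-ins for a single joint angle, not the CHMR system's models.

```python
import bisect
import math
import random


def select_index(cumulative, r):
    """Smallest i with cumulative[i] >= r (factored sampling from the prior)."""
    return bisect.bisect_left(cumulative, r)


def condensation_step(samples, weights, predict, likelihood, rng):
    """One Condensation time step over N joint-angle hypotheses.

    samples:    hypotheses s_{t-1}^(n) from the previous frame
    weights:    their weights w_{t-1}^(n)
    predict:    callable(s, rng) drawing s_t from p(x_t | x_{t-1} = s'_t)
    likelihood: callable(s) returning p(z_t | x_t = s)
    Returns the new (samples, normalised weights).
    """
    n = len(samples)
    cumulative, running = [], 0.0
    for w in weights:
        running += w
        cumulative.append(running)

    # 1. Prior density: resample s'_t using cumulative weights c(i) >= r
    chosen = []
    for _ in range(n):
        r = rng.random() * running  # uniform over the total weight
        chosen.append(samples[min(select_index(cumulative, r), n - 1)])

    # 2. Process density: predict each hypothesis into the next frame
    predicted = [predict(s, rng) for s in chosen]

    # 3. Observation density: weight by likelihood, then normalise to sum to 1
    raw = [likelihood(s) for s in predicted]
    total = sum(raw)
    if total <= 0:
        raise ValueError("all hypotheses have zero likelihood")
    return predicted, [w / total for w in raw]


def gaussian_diffusion(sigma):
    """Hypothetical process model: keep the angle and add Gaussian diffusion noise."""
    def predict(theta, rng):
        return theta + rng.gauss(0.0, sigma)
    return predict


def gaussian_likelihood(observed, sigma):
    """Hypothetical stand-in for the edge/region cost likelihood of one joint angle."""
    def likelihood(theta):
        return math.exp(-0.5 * ((theta - observed) / sigma) ** 2)
    return likelihood
```

Called once per frame with a seeded `random.Random`, the weighted samples concentrate around well-supported angles while still keeping alternative hypotheses alive through clutter.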
The new position in terms of the observation density $p(z_t \mid x_t)$ is then measured and weighed with forward smoothing:
- Smooth the weights $w_t$ over frames $1..t$ for the $n$ trajectories.
- Replace each sample set with its $n$ trajectories $\{(s_t, w_t)\}$ for frames $1..t$.
- Re-weight all $w^{(n)}$ over frames $1..t$.
- Trajectories tend to merge within 10 frames, so $O(Nt)$ storage prunes down to $O(N)$.
In this research, feedback from the CHMR system utilizes the large training set of skills to achieve an even larger reduction of the search space. In practice, human movement is found to be highly efficient, with minimal DOF rotating at any one time. The equilibrium positions and physical limits of each DOF further stabilize and minimize the dimensional space. With so few DOF to track at any one time, a minimal number of particles is required, significantly raising the efficiency of the tracking process. Such highly constrained movement results in a sparse domain of motion projected by each motion vector. Because the temporal variation of related joints and other parameters also contains information that helps the recognition process infer skill boundaries, the system computes and appends the first and second temporal derivatives of these features to form the final motion vector. Hence the motion vector includes joint angles (32 DOF), body location and orientation (6 DOF), centre of mass (3 DOF) and principal axis (2 DOF), all with first and second derivatives.
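Assembling the final motion vector can be sketched as follows. The sketch assumes per-frame features are stacked into a NumPy array with 43 columns in the order listed above, and that derivatives are estimated by finite differences with `numpy.gradient` at the 30 fps capture rate; the column order, the finite-difference scheme and the function name are assumptions, not the CHMR system's documented method.

```python
import numpy as np

# Per-frame feature layout listed in the text (43 values per frame).
FEATURE_DOFS = {
    "joint_angles": 32,
    "body_location_orientation": 6,
    "centre_of_mass": 3,
    "principal_axis": 2,
}
BASE_DIM = sum(FEATURE_DOFS.values())  # 43


def motion_vectors(features, fps=30.0):
    """Append first and second temporal derivatives to per-frame features.

    features: array of shape (T, 43), one row per video frame.
    Returns an array of shape (T, 129): [features, d/dt, d2/dt2].
    """
    x = np.asarray(features, dtype=float)
    if x.ndim != 2 or x.shape[1] != BASE_DIM:
        raise ValueError(f"expected shape (T, {BASE_DIM}), got {x.shape}")
    if x.shape[0] < 3:
        raise ValueError("need at least 3 frames for second derivatives")
    dt = 1.0 / fps
    velocity = np.gradient(x, dt, axis=0)             # central differences inside, one-sided at ends
    acceleration = np.gradient(velocity, dt, axis=0)
    return np.concatenate([x, velocity, acceleration], axis=1)
```

Each 43-value frame thus becomes a 129-value motion vector, which is the kind of input a segmenting recogniser such as the hierarchical Hidden Markov Models would consume.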
Performance
Hundreds of jumps and leaps were tracked and classified using a 2 GHz Pentium IV platform with 640 MB RAM, processing 24-bit colour within the Microsoft DirectX 9 and Intel OpenCV environment under Windows XP. The video sequences were captured with a Logitech USB 2.0 camera at 30 fps and 320 by 240 pixel resolution. Each person jumped in front of a stationary camera with a static background and static lighting conditions with minimal shadows. Only one person was in frame at any one time. Tracking began when the whole body was visible, which enabled initialisation of the clone-body-model. The skill error rate quantifies CHMR system performance by expressing, as a percentage, the ratio of the number and magnitude of joint angle tracking errors to the number of joint angles in the reference set. Depending on the skill, CHMR system skill error rates can vary by an order of magnitude. The CHMR system results are based on a total of 240 jump patterns, from straight jumps and split leaps (Figure 5) to jumping backward into flic-flacs (Figure 4). These were successfully tracked and evaluated, with their respective biomechanical components quantified, achieving a skill error rate of only 3.8%. Motion blurring lasted about 10 frames on average, perturbing joint angles within the blur envelope. Given a reasonably accurate angular velocity, it was possible to sufficiently de-blur the image. There was minimal motion blur arising from rotation about the longitudinal axis during a twisting salto: with the limbs held close to a straight body shape, the radius about this axis is small, so the surface velocity tangential to it is low. This can be seen in Figure 6, where the arms exhibit no blurring from the twisting rotation, in contrast with the motion-blurred legs, which have a higher tangential velocity from the salto rotation. The CHMR system also failed for loose clothing. Even with smoothing, joint angles surrounded by baggy clothes fluctuated through unexpected angles within an envelope large enough to invalidate the tracking and evaluation.
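The blur argument follows from the tangential speed of a surface point, $v = \omega r$, at a given angular velocity $\omega$. The sketch below compares arms close to the longitudinal (twist) axis with feet far from the salto axis; the radii, angular velocity, exposure time and image scale are hypothetical illustrative values, not measurements from this study.

```python
import math


def tangential_speed(omega_rad_s, radius_m):
    """Surface speed of a point at distance radius_m from the rotation axis: v = omega * r."""
    return omega_rad_s * radius_m


def blur_length_px(speed_m_s, exposure_s, pixels_per_metre):
    """Approximate blur streak length, assuming motion parallel to the image plane."""
    return speed_m_s * exposure_s * pixels_per_metre


# Hypothetical values for illustration only
OMEGA = 2 * math.pi   # one revolution per second, used for both twist and salto
EXPOSURE_S = 1 / 60   # assumed shutter time
PX_PER_M = 100.0      # assumed image scale (about 3.2 m across 320 pixels)

arm_blur = blur_length_px(tangential_speed(OMEGA, 0.15), EXPOSURE_S, PX_PER_M)  # arms near twist axis
leg_blur = blur_length_px(tangential_speed(OMEGA, 0.90), EXPOSURE_S, PX_PER_M)  # feet far from salto axis
print(f"arm blur ~{arm_blur:.1f} px, leg blur ~{leg_blur:.1f} px")
```

At equal angular velocity the blur scales directly with radius, so under these assumed radii the legs blur six times more than the arms.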
Conclusions
The 3.8% error rate attained in this research was not obtained in a natural-world environment, nor is this yet a real-time system, taking up to seconds to process each frame. The CHMR system did achieve 96.2% accuracy for the reference test set of skills. Although this 96.2% recognition rate was not as high as the 99.2% accuracy achieved, a larger test sample of skills was evaluated in this paper. To progress towards the goal of lower error rates, the following improvements seem most important:
The results suggest that this approach has the potential to assist coaches and athletes to optimise jump-based skills during regular sessions by automatically displaying and logging the biomechanical parameters of specific skills involving jumping and leaping.

AUTHOR BIOGRAPHY

Richard Green
Employment: Senior Lecturer.
Degree: BSc, ME, PhD
Research interests: Computer vision, biomechanics, biomedical.
E-mail: richard.green@canterbury.ac.nz

REFERENCES
Badler N.I., Phillips C.B., Webber B.L. (1993) Simulating humans. New York: Oxford University Press.
|
Cham T.J., Rehg J.M. (1999) Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
|
Deutscher J., Blake A., Reid I. (2000) Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
|
Deutscher J., Davison A., Reid I. (2001) Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
|
Drummond T., Cipolla R. (2001) Real-time tracking of highly articulated structures in the presence of noisy measurements. Proceedings of IEEE International Conference on Computer Vision.
|
Isard M.A., Blake A. (1996) Visual tracking by stochastic propagation of conditional density. Proceedings of 4th European Conference on Computer Vision, Cambridge, England.
|
Isard M.A., Blake A. (1998) Proceedings of 6th International Conference on Computer Vision.
|
Kakadiaris I., Metaxas D. (1996) Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
|
MacCormick J., Isard M.A. (2000) Partitioned sampling, articulated objects and interface-quality hand tracking. Proceedings of European Conference on Computer Vision.
|
Moeslund T.B., Granum E. (2001) A survey of computer vision-based human motion capture. Computer Vision and Image Understanding 81, 231-268.
|
Pentland A., Horowitz B. (1991) Recovery of nonrigid motion and structure. IEEE Transactions on PAMI 13, 730-742.
|
Pheasant S. (1996) Bodyspace: Anthropometry, ergonomics and the design of work. Taylor & Francis.
|
Rehg J.M., Kanade T. (1995) Proceedings of Fifth International Conference on Computer Vision.
|
Sullivan J., Blake A., Isard M., MacCormick J. (1999) Proceedings of 7th International Conference on Computer Vision.
|
Wren C., Azarbayejani A., Darrell T., Pentland A. (1997) Pfinder: Real-time tracking of the human body. IEEE Transactions on PAMI 19, 780-785.
|