MODELLING A TENNIS MATCH
FORWARD RECURSION
The state of a tennis match between two players is represented by
a scoreboard. The scoreboard shows the points, games and sets won
by each player, and is updated after each point has been played.
It is assumed that the conditional probability of the server winning
the point depends only on the data shown on the scoreboard. This
enables the progress of the match to be modelled using forward recursion.
An additional assumption is that the probabilities of each player
winning a point on his own service remain constant throughout the
match.
DEVELOPMENT OF GENERATING FUNCTIONS OF DISTRIBUTIONS
The forward recursion enables the probabilities of various possible
scoreboards to be calculated. These probabilities can be collected
in the form of probability generating functions, or moment generating
functions (using the transformation $v = e^{u}$).
Lemma: If $X$ and $Y$ are independent random variables and $Z = X + Y$,
then $m_Z(t) = m_X(t)\, m_Y(t)$.
It becomes convenient at times to take logarithms and work in terms
of cumulant generating functions, since $K_Z(t) = K_X(t) + K_Y(t)$.
The higher order cumulants depend on powers of the scale for the
random variable, and for the purposes of communication it is useful
to transform them into non-dimensional statistics (i.e. numbers)
such as the coefficients of variation, skewness and kurtosis.
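As an illustration of the lemma and of the non-dimensional statistics, the following Python sketch computes the first four cumulants of a discrete distribution, checks numerically that they add under convolution, and converts them to coefficients. It assumes the usual definitions (coefficient of variation $\sqrt{\kappa_2}/\kappa_1$, skewness $\kappa_3/\kappa_2^{3/2}$, kurtosis $\kappa_4/\kappa_2^2$, i.e. excess kurtosis); the function names and the two toy distributions are hypothetical and not taken from the source.

```python
def cumulants(dist):
    """First four cumulants of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    mu2, mu3, mu4 = (sum((x - mean) ** k * p for x, p in dist.items()) for k in (2, 3, 4))
    # kappa_2 and kappa_3 equal the central moments; kappa_4 = mu_4 - 3 * mu_2^2
    return mean, mu2, mu3, mu4 - 3 * mu2 ** 2


def convolve(d1, d2):
    """Distribution of X + Y for independent X ~ d1 and Y ~ d2."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


def coefficients(k1, k2, k3, k4):
    """Coefficient of variation, skewness and (excess) kurtosis from cumulants."""
    return k2 ** 0.5 / k1, k3 / k2 ** 1.5, k4 / k2 ** 2


X = {0: 0.5, 1: 0.3, 4: 0.2}   # toy distributions, for illustration only
Y = {1: 0.6, 2: 0.4}
kX, kY, kZ = cumulants(X), cumulants(Y), cumulants(convolve(X, Y))
gaps = [abs(a + b - c) for a, b, c in zip(kX, kY, kZ)]
print(max(gaps))               # rounding error only: the cumulants of Z are the sums
print(coefficients(*kZ))
```

The coefficients are scale-free, which is why they are convenient for comparing distributions measured in points, games or sets.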
THE INVERSION OF THE CUMULANTS USING NORMAL POWER APPROXIMATION
The normal power approximation gives a continuous approximation to a
discrete distribution (Pesonen, 1975).
The formula is asymptotic and works reasonably well for unimodal
distributions with a coefficient of skewness less than 2 and a
coefficient of kurtosis less than 6, i.e. distributions whose tails
die off at least as fast as those of the exponential distribution.
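The source does not reproduce the formula. As a hedged sketch, the code below assumes the Cornish-Fisher form in which the normal power approximation is commonly written: a standard normal quantile $y$ is mapped to $\mu + \sigma[y + \gamma_1(y^2-1)/6 + \gamma_2(y^3-3y)/24 - \gamma_1^2(2y^3-5y)/36]$, with $\gamma_1$ the skewness and $\gamma_2$ the excess kurtosis. The function name and the example inputs are hypothetical.

```python
from statistics import NormalDist


def np_quantile(prob, mean, sd, skew=0.0, kurt=0.0):
    """Approximate quantile from the first four cumulant-based statistics.

    Assumed form (not quoted from the source): Cornish-Fisher expansion with
    skewness `skew` and excess kurtosis `kurt`.
    """
    y = NormalDist().inv_cdf(prob)          # standard normal quantile
    shift = (y
             + skew * (y ** 2 - 1) / 6
             + kurt * (y ** 3 - 3 * y) / 24
             - skew ** 2 * (2 * y ** 3 - 5 * y) / 36)
    return mean + sd * shift


# Illustrative inputs only: a distribution with mean 10, SD 2 and skewness 0.6.
for prob in (0.05, 0.5, 0.95):
    print(prob, np_quantile(prob, 10.0, 2.0, skew=0.6))
```

With zero skewness and kurtosis the function reduces to the ordinary normal quantile, which gives a quick sanity check.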
THE NUMBER OF POINTS IN A GAME
Let $X$ be a random variable denoting the number of points played in
a game. Let $f_{pgA}(x)$ represent the distribution of the number of
points played in a game for player A serving, where
$f_{pgA}(x) = P(X = x)$. This gives the following:
$$f_{pgA}(4) = N_{pgA}(4,0) + N_{pgA}(0,4)$$
$$f_{pgA}(5) = N_{pgA}(4,1) + N_{pgA}(1,4)$$
$$f_{pgA}(6) = N_{pgA}(4,2) + N_{pgA}(2,4)$$
$$f_{pgA}(x) = N_{pgA}(3,3)\,[p_A^2 + (1-p_A)^2]\,[2p_A(1-p_A)]^{(x-8)/2}, \quad x = 8, 10, 12, \ldots$$
where:
$N_{pgA}(a,b)$ represents the probability of reaching point score
$(a,b)$ in a game for player A serving.
$p_A$ represents the probability of player A winning a point on serve.
Croucher (1986) gives algebraic expressions for calculating $N_{pgA}(a,b)$.
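The probabilities of reaching each point score can also be obtained numerically by forward recursion rather than from closed-form expressions. The sketch below is one minimal way to do this under the constant-probability assumption; the function name and the truncation parameter `max_points` are hypothetical.

```python
def game_point_distribution(p, max_points=60):
    """Return {x: P(X = x)} for the number of points in a game.

    p is the probability that the server wins a point. N[a][b] is the
    probability of reaching point score (a, b) while the game is still live.
    """
    q = 1.0 - p
    N = [[0.0] * 4 for _ in range(4)]
    N[0][0] = 1.0
    for a in range(4):
        for b in range(4):
            if a > 0:
                N[a][b] += p * N[a - 1][b]   # server won the last point
            if b > 0:
                N[a][b] += q * N[a][b - 1]   # receiver won the last point
    f = {}
    for k in range(3):
        # game ends 4-k or k-4 after 4 + k points
        f[4 + k] = p * N[3][k] + q * N[k][3]
    stay = 2 * p * q                         # back to deuce after two more points
    x = 8
    while x <= max_points:
        f[x] = N[3][3] * (1 - stay) * stay ** ((x - 8) // 2)
        x += 2
    return f


f = game_point_distribution(0.5)
print(f[4], f[5], f[6], f[8])
```

The tail beyond `max_points` is dropped, so the returned probabilities sum to slightly less than one.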
Let $m(t)$ denote the moment generating function of $X$.
Generating functions can be used to describe a distribution, such
as $f_{pgA}(x)$ for all $x$.
It is well established (Stuart and Ord, 1987) that the mean, variance,
coefficient of skewness and coefficient of kurtosis of $X$ can
be obtained from generating functions.
The moment generating function for the number of points in a game for
player A serving, $m_{pgA}(t)$, becomes:
$$m_{pgA}(t) = \sum_x e^{tx} f_{pgA}(x) = e^{4t}f_{pgA}(4) + e^{5t}f_{pgA}(5) + e^{6t}f_{pgA}(6) + \frac{N_{pgA}(3,3)\,[1 - N_{pgA}(1,1)]\,e^{8t}}{1 - N_{pgA}(1,1)\,e^{2t}}$$
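The final term is the sum of a geometric series over the games that reach deuce. A short worked step, using $N_{pgA}(1,1) = 2p_A(1-p_A)$ and hence $1 - N_{pgA}(1,1) = p_A^2 + (1-p_A)^2$, and assuming $N_{pgA}(1,1)\,e^{2t} < 1$ so that the series converges:

```latex
\begin{aligned}
\sum_{x = 8, 10, 12, \ldots} e^{tx} f_{pgA}(x)
  &= N_{pgA}(3,3)\,[1 - N_{pgA}(1,1)]\, e^{8t}
     \sum_{k=0}^{\infty} \bigl[ N_{pgA}(1,1)\, e^{2t} \bigr]^{k} \\
  &= \frac{N_{pgA}(3,3)\,[1 - N_{pgA}(1,1)]\, e^{8t}}{1 - N_{pgA}(1,1)\, e^{2t}} .
\end{aligned}
```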
The mean number of points in a game, $M_{pgA}$, and the associated
variance, $V_{pgA}$, are calculated from the moment generating function
using Mathematica and are given by:
$$M_{pgA} = \frac{4\{p_A(1-p_A)[6p_A^2(1-p_A)^2 - 1] + 1\}}{1 - 2p_A(1-p_A)}$$
$$V_{pgA} = \frac{4p_A(1-p_A)\,[1 - p_A(1-p_A)(1 - 12p_A(1-p_A)(3 - p_A(1-p_A)(5 + 12p_A^2(1-p_A)^2)))]}{[1 - 2p_A(1-p_A)]^2}$$
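The closed forms can be checked numerically. The sketch below, written in terms of $s = p_A(1-p_A)$, evaluates the mean and variance with the constant term of the mean numerator taken as $+1$ (the form that reproduces the value 6.75 at $p_A = 0.50$ quoted later in the text) and compares them with moments computed directly from the game-length distribution. The binomial expressions for $f(4)$, $f(5)$, $f(6)$ and the geometric moments of the deuce phase are standard results assumed here, not taken from the source, and the function names are hypothetical.

```python
def mean_var_closed_form(p):
    """Mean and variance of the number of points in a game, closed form in s = p(1 - p)."""
    s = p * (1 - p)
    mean = 4 * (s * (6 * s ** 2 - 1) + 1) / (1 - 2 * s)
    var = 4 * s * (1 - s * (1 - 12 * s * (3 - s * (5 + 12 * s ** 2)))) / (1 - 2 * s) ** 2
    return mean, var


def mean_var_direct(p):
    """Same quantities from the distribution itself, summing the deuce tail exactly."""
    q = 1 - p
    s = p * q
    r = 2 * s                                   # probability of returning to deuce
    f4 = p ** 4 + q ** 4
    f5 = 4 * s * (p ** 3 + q ** 3)
    f6 = 10 * s ** 2 * (p ** 2 + q ** 2)
    deuce = 20 * s ** 3                         # probability of reaching (3, 3)
    eg = 1 / (1 - r)                            # E[G], G = number of two-point rounds
    eg2 = (1 + r) / (1 - r) ** 2                # E[G^2]
    m1 = 4 * f4 + 5 * f5 + 6 * f6 + deuce * (6 + 2 * eg)
    m2 = 16 * f4 + 25 * f5 + 36 * f6 + deuce * (36 + 24 * eg + 4 * eg2)
    return m1, m2 - m1 ** 2


for p in (0.5, 0.6, 0.7, 0.8):
    print(p, mean_var_closed_form(p), mean_var_direct(p))
```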
Similar expressions can be obtained for the coefficient of skewness,
$S_{pgA}$, and the coefficient of kurtosis, $K_{pgA}$.
Let $U_{pgA}$ represent the standard deviation and $C_{pgA}$ the
coefficient of variation of the number of points in a game for
player A serving. It follows that $U_{pgA} = \sqrt{V_{pgA}}$
and $C_{pgA} = U_{pgA} / M_{pgA}$.
Table 1 gives $M_{pgA}$, $U_{pgA}$, $C_{pgA}$, $S_{pgA}$ and $K_{pgA}$
for different values of $p_A$. Notice that the mean and standard
deviation are greatest when $p_A = 0.50$, but the coefficients
of skewness and kurtosis are greatest when $p_A$ approaches
1 or 0. The generating functions that follow are for player A serving
first in the tiebreaker game or set.
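Statistics of this kind can be produced by summing over the game-length distribution. The following sketch is an assumption-laden illustration rather than the authors' computation: it truncates the deuce tail after `terms` rounds, takes the coefficient of kurtosis to be the excess kurtosis, and uses a hypothetical function name.

```python
def game_statistics(p, terms=400):
    """Mean, SD, coefficient of variation, skewness and excess kurtosis of game length."""
    q = 1.0 - p
    s = p * q
    f = {4: p ** 4 + q ** 4,
         5: 4 * s * (p ** 3 + q ** 3),
         6: 10 * s ** 2 * (p ** 2 + q ** 2)}
    for k in range(terms):                       # games decided k rounds after first deuce
        f[8 + 2 * k] = 20 * s ** 3 * (1 - 2 * s) * (2 * s) ** k
    mean = sum(x * fx for x, fx in f.items())
    mu2, mu3, mu4 = (sum((x - mean) ** j * fx for x, fx in f.items()) for j in (2, 3, 4))
    sd = mu2 ** 0.5
    return mean, sd, sd / mean, mu3 / sd ** 3, mu4 / mu2 ** 2 - 3.0


for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(p, [round(v, 3) for v in game_statistics(p)])
```

Running the loop over a grid of $p_A$ values shows the pattern described above: the mean and standard deviation fall away from $p_A = 0.50$, while the skewness grows towards the extremes.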
The moment generating function for the number of points in a tiebreaker
game, $m_{pgTA}(t)$, becomes:
$$m_{pgTA}(t) = e^{7t}f_{pgTA}(7) + e^{8t}f_{pgTA}(8) + e^{9t}f_{pgTA}(9) + e^{10t}f_{pgTA}(10) + e^{11t}f_{pgTA}(11) + e^{12t}f_{pgTA}(12) + \frac{N_{pgTA}(6,6)\,[1 - N_{pgTA}(1,1)]\,e^{14t}}{1 - N_{pgTA}(1,1)\,e^{2t}}$$
where:
$f_{pgTA}(x)$ represents the distribution of the number of points
played in a tiebreaker game.
$N_{pgTA}(a,b)$ represents the probability of reaching point score
$(a,b)$ in a tiebreaker game.
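The tiebreaker distribution can be computed by the same forward recursion, with the server changing from point to point. The sketch below assumes the standard rotation (A serves the first point, then the players alternate every two points) and truncates the tail at `max_points`; the function name is hypothetical.

```python
def tiebreak_point_distribution(pA, pB, max_points=80):
    """Return {x: P(X = x)} for points in a tiebreaker game, A serving the first point."""
    live = {(0, 0): 1.0}          # probabilities of live scores after n points
    f = {}
    for n in range(12):
        # A serves points 0, 3, 4, 7, 8, ... (0-indexed); B serves the others
        a_serves = ((n + 1) // 2) % 2 == 0
        w = pA if a_serves else 1 - pB           # probability that A wins point n
        nxt = {}
        for (a, b), prob in live.items():
            for a2, b2, pr in ((a + 1, b, prob * w), (a, b + 1, prob * (1 - w))):
                if max(a2, b2) == 7:             # 7 points with the opponent on 5 or fewer
                    f[n + 1] = f.get(n + 1, 0.0) + pr
                else:
                    nxt[(a2, b2)] = nxt.get((a2, b2), 0.0) + pr
        live = nxt
    stay = pA * pB + (1 - pA) * (1 - pB)         # each player wins one of the next two points
    x = 14
    while x <= max_points:
        f[x] = live[(6, 6)] * (1 - stay) * stay ** ((x - 14) // 2)
        x += 2
    return f


f = tiebreak_point_distribution(0.5, 0.5)
print(sum(x * fx for x, fx in f.items()))        # mean number of points
```

From 6-6 onwards each pair of points contains one service point from each player, which is why the continuation probability has the same form as $N_{pgTA}(1,1)$.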
The moment generating functions for the number of games in a tiebreaker
set, $m_{gsTA}(t)$, and in an advantage set, $m_{gsA}(t)$, become:
$$m_{gsTA}(t) = e^{6t}f_{gsTA}(6) + e^{7t}f_{gsTA}(7) + e^{8t}f_{gsTA}(8) + e^{9t}f_{gsTA}(9) + e^{10t}f_{gsTA}(10) + e^{12t}f_{gsTA}(12) + e^{13t}f_{gsTA}(13)$$
$$m_{gsA}(t) = e^{6t}f_{gsA}(6) + e^{7t}f_{gsA}(7) + e^{8t}f_{gsA}(8) + e^{9t}f_{gsA}(9) + e^{10t}f_{gsA}(10) + \frac{N_{gsA}(5,5)\,[1 - N_{gsA}(1,1)]\,e^{12t}}{1 - N_{gsA}(1,1)\,e^{2t}}$$
where:
$f_{gsTA}(x)$ represents the distribution of the number of games
played in a tiebreaker set.
$f_{gsA}(x)$ represents the distribution of the number of games
played in an advantage set.
$N_{gsA}(c,d)$ represents the probability of reaching game score
$(c,d)$ in an advantage set.
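A forward recursion over game scores gives both distributions. The sketch below takes as inputs the probabilities `hA` and `hB` that each player wins a game on serve (an assumption of this example: they would normally be derived from $p_A$ and $p_B$), with A serving the first game; the function name and the truncation `max_games` are hypothetical.

```python
def games_in_set_distribution(hA, hB, tiebreak=True, max_games=200):
    """Return {x: P(x games)} for a set in which A serves first."""
    live = {(0, 0): 1.0}
    f = {}
    for g in range(10):                          # game g is served by A when g is even
        a_wins = hA if g % 2 == 0 else 1 - hB
        nxt = {}
        for (c, d), prob in live.items():
            for c2, d2, pr in ((c + 1, d, prob * a_wins), (c, d + 1, prob * (1 - a_wins))):
                if max(c2, d2) == 6:             # 6-0 ... 6-4 (or the reverse) ends the set
                    f[g + 1] = f.get(g + 1, 0.0) + pr
                else:
                    nxt[(c2, d2)] = nxt.get((c2, d2), 0.0) + pr
        live = nxt
    five_all = live[(5, 5)]
    stay = hA * hB + (1 - hA) * (1 - hB)         # the next two games are shared
    if tiebreak:
        f[12] = five_all * (1 - stay)            # 7-5 or 5-7
        f[13] = five_all * stay                  # 6-6, decided by the tiebreaker game
    else:
        for k in range((max_games - 12) // 2 + 1):
            f[12 + 2 * k] = five_all * (1 - stay) * stay ** k
    return f


tb = games_in_set_distribution(0.5, 0.5)
print(sum(x * fx for x, fx in tb.items()))       # mean games in a tiebreaker set
```

With both hold probabilities equal to 0.5 the mean is about 9.66 games, in line with the tiebreaker-set value reported in the text for $p_A = p_B = 0.50$.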
THE NUMBER OF POINTS IN A SET
THE PARAMETERS OF DISTRIBUTIONS OF THE NUMBER OF POINTS IN A SET
Let $m_{pgA}^{+}(t)$ and $m_{pgA}^{-}(t)$ be the moment generating
functions of the number of points in a game when player A wins and
loses a game on serve respectively. Let $m_{pgB}^{+}(t)$ and
$m_{pgB}^{-}(t)$ be the moment generating functions of the number of
points in a game when player B wins and loses a game on serve
respectively. Let $s(c,d)$ be the moment generating function of the
number of points in a set conditioned on reaching game score $(c,d)$.
It can be shown that
$$s(6,1) = 3\,[m_{pgA}^{+}(t)]^3\,[m_{pgB}^{-}(t)]^2\,[m_{pgA}^{+}(t)\,m_{pgB}^{+}(t) + m_{pgA}^{-}(t)\,m_{pgB}^{-}(t)]$$
and
$$s(1,6) = 3\,[m_{pgA}^{-}(t)]^3\,[m_{pgB}^{+}(t)]^2\,[m_{pgA}^{+}(t)\,m_{pgB}^{+}(t) + m_{pgA}^{-}(t)\,m_{pgB}^{-}(t)].$$
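The factor 3 and the exponents can be recovered by enumerating the orders in which the games can be won. The sketch below does this for score lines in which the winner has 6 games and the loser at most 4, assuming A serves the first game and the players alternate service games; the labels `A+`, `A-`, `B+`, `B-` mirror the four generating functions above, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def score_line_terms(c, d):
    """Count game patterns ending a set at (c, d); valid for max(c, d) == 6, min(c, d) <= 4.

    Each key lists how many service games were held (+) or lost (-) by A and B.
    """
    terms = Counter()
    for outcome in product("WL", repeat=c + d):   # W: A wins the game, L: A loses it
        if outcome.count("W") != c:
            continue
        if (outcome[-1] == "W") != (c > d):       # the set winner takes the final game
            continue
        key = Counter()
        for g, result in enumerate(outcome):
            if g % 2 == 0:                        # A serves games 1, 3, 5, ...
                key["A+" if result == "W" else "A-"] += 1
            else:                                 # B serves games 2, 4, 6, ...
                key["B-" if result == "W" else "B+"] += 1
        terms[tuple(sorted(key.items()))] += 1
    return terms


for pattern, count in score_line_terms(6, 1).items():
    print(count, dict(pattern))
```

For $(6,1)$ the enumeration yields two patterns with multiplicity 3 each, namely three holds and one loss of serve by A with three losses by B, or four holds by A with one hold and two losses by B, which is the expansion of the expression for $s(6,1)$.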
Similar conditional moment generating functions can be obtained for
reaching all score lines $(c,d)$ in a set. The moment generating
function for the number of points in a tiebreaker set becomes:
$$m_{psTA}(t) = N_{gsTA}(6,0)s(6,0) + N_{gsTA}(6,1)s(6,1) + N_{gsTA}(6,2)s(6,2) + N_{gsTA}(6,3)s(6,3) + N_{gsTA}(6,4)s(6,4) + N_{gsTA}(7,5)s(7,5)$$
$$\qquad + N_{gsTA}(0,6)s(0,6) + N_{gsTA}(1,6)s(1,6) + N_{gsTA}(2,6)s(2,6) + N_{gsTA}(3,6)s(3,6) + N_{gsTA}(4,6)s(4,6) + N_{gsTA}(5,7)s(5,7) + N_{gsTA}(6,6)s(6,6)\,m_{pgTA}(t)$$
A similar moment generating function can be obtained for the number
of points in an advantage set.
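The mean of this distribution can be obtained without symbolic algebra by carrying, for every live game score, both the probability of reaching it and the expected number of points played on the way there (the numerical counterpart of differentiating the generating function at $t = 0$). The sketch below is one such implementation under the constant-probability assumption, with A serving first in the set and in the tiebreaker game; all function names are hypothetical.

```python
def game_parts(p):
    """(P(hold), E[X; hold], E[X; break]) for a regular game, X = points played."""
    q = 1.0 - p
    N = [[0.0] * 4 for _ in range(4)]
    N[0][0] = 1.0
    for a in range(4):
        for b in range(4):
            if a:
                N[a][b] += p * N[a - 1][b]
            if b:
                N[a][b] += q * N[a][b - 1]
    hold = sum(p * N[3][k] for k in range(3))
    e_hold = sum((4 + k) * p * N[3][k] for k in range(3))
    e_break = sum((4 + k) * q * N[k][3] for k in range(3))
    r = 2 * p * q                                  # back to deuce after two points
    tail = 6 / (1 - r) + 2 / (1 - r) ** 2          # sum over k >= 1 of (6 + 2k) r^(k-1)
    hold += N[3][3] * p * p / (1 - r)
    e_hold += N[3][3] * p * p * tail
    e_break += N[3][3] * q * q * tail
    return hold, e_hold, e_break


def tiebreak_mean_points(pA, pB):
    """Mean number of points in a tiebreaker game, A serving the first point."""
    live, mean = {(0, 0): 1.0}, 0.0
    for n in range(12):
        w = pA if ((n + 1) // 2) % 2 == 0 else 1 - pB
        nxt = {}
        for (a, b), prob in live.items():
            for a2, b2, pr in ((a + 1, b, prob * w), (a, b + 1, prob * (1 - w))):
                if max(a2, b2) == 7:
                    mean += (n + 1) * pr
                else:
                    nxt[(a2, b2)] = nxt.get((a2, b2), 0.0) + pr
        live = nxt
    r = pA * pB + (1 - pA) * (1 - pB)
    return mean + live[(6, 6)] * (12 + 2 / (1 - r))


def tiebreak_set_mean_points(pA, pB):
    """Mean number of points in a tiebreaker set, A serving the first game."""
    parts = {True: game_parts(pA), False: game_parts(pB)}
    live = {(0, 0): (1.0, 0.0)}        # score -> (P(reach), E[points so far; reach])
    mean = 0.0
    for g in range(12):
        a_serves = g % 2 == 0
        hold, e_hold, e_break = parts[a_serves]
        a_win = (hold, e_hold) if a_serves else (1 - hold, e_break)
        a_lose = (1 - hold, e_break) if a_serves else (hold, e_hold)
        nxt = {}
        for (c, d), (P, S) in live.items():
            for c2, d2, (w, e) in ((c + 1, d, a_win), (c, d + 1, a_lose)):
                P2, S2 = P * w, S * w + P * e
                hi, lo = max(c2, d2), min(c2, d2)
                if (hi == 6 and lo <= 4) or hi == 7:
                    mean += S2                     # set finished at this score
                else:
                    oldP, oldS = nxt.get((c2, d2), (0.0, 0.0))
                    nxt[(c2, d2)] = (oldP + P2, oldS + S2)
        live = nxt
    P66, S66 = live[(6, 6)]
    return mean + S66 + P66 * tiebreak_mean_points(pA, pB)


print(round(tiebreak_set_mean_points(0.5, 0.5), 2))
```

For $p_A = p_B = 0.50$ the sketch gives a mean of about 65.83 points; other values of $p_A$ and $p_B$ can be compared with Table 2.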
Let $M_{psA}$, $U_{psA}$, $C_{psA}$, $S_{psA}$ and $K_{psA}$ represent
the mean, standard deviation, and coefficients of variation, skewness
and kurtosis for the number of points in an advantage set. Let
$M_{psTA}$, $U_{psTA}$, $C_{psTA}$, $S_{psTA}$ and $K_{psTA}$ represent
the mean, standard deviation, and coefficients of variation, skewness
and kurtosis for the number of points in a tiebreaker set. Table 2
gives $M_{psA}$, $U_{psA}$, $C_{psA}$, $S_{psA}$, $K_{psA}$, $M_{psTA}$,
$U_{psTA}$, $C_{psTA}$, $S_{psTA}$ and $K_{psTA}$ for different values
of $p_A$ and $p_B$. The table covers values in the interval
$0.50 \le p_A \le p_B \le 0.75$, as this is the main area of interest
for men’s tennis. It can be observed that $M_{psA} > M_{psTA}$,
$U_{psA} > U_{psTA}$, $C_{psA} > C_{psTA}$, $S_{psA} > S_{psTA}$
and $K_{psA} > K_{psTA}$.
The mean number of points in a set is affected by the mean number of
points in a game and the mean number of games in a set. The mean
number of points in a game is greatest when $p_A$ or $p_B$ = 0.50.
For a tiebreaker set, when $p_A = p_B = 0.50$,
$M_{pgA} = M_{pgB} = 6.75$, $M_{gsTA} = 9.66$ and $M_{psTA} = 65.83$.
When $p_A = p_B = 0.70$, $M_{pgA} = M_{pgB} = 5.83$,
$M_{gsTA} = 10.94$ and $M_{psTA} = 66.22$. In this latter case, even
though the mean length of games is shorter, the mean number of points
in a tiebreaker set is greater overall since more games are expected
to be played: both players have a 0.90 probability of holding serve,
which means that very few breaks of serve will occur, and there is a
0.38 probability of reaching a tiebreaker. This is further exemplified
in an advantage set, where for $p_A = p_B = 0.70$, $M_{psA} = 86.43$.
It is also highlighted by the coefficients of variation, skewness and
kurtosis being much greater for an advantage set than for a
tiebreaker set when $p_A$ and $p_B$ are both “large”.
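The two probabilities quoted for $p_A = p_B = 0.70$ can be reproduced with a few lines of arithmetic. The sketch below uses the standard closed form for the probability of holding serve and a binomial count of the ways of reaching 5-5 when the players alternate service games; both formulas are assumptions of this example rather than expressions given in the source, and the function names are hypothetical.

```python
from math import comb


def hold_probability(p):
    """Probability that the server wins a game when winning each point with probability p."""
    q = 1 - p
    before_deuce = p ** 4 * (1 + 4 * q + 10 * q * q)        # wins to 0, 15 or 30
    via_deuce = 20 * (p * q) ** 3 * p * p / (1 - 2 * p * q)
    return before_deuce + via_deuce


def tiebreak_probability(hA, hB):
    """Probability of reaching 6-6 given hold probabilities hA and hB."""
    # 5-5: A holds k of 5 service games and breaks B in 5 - k of B's 5 service games
    five_all = sum(comb(5, k) ** 2 * (hA * hB) ** k * ((1 - hA) * (1 - hB)) ** (5 - k)
                   for k in range(6))
    # 6-6: games 11 and 12 are shared (two holds or two breaks)
    return five_all * (hA * hB + (1 - hA) * (1 - hB))


h = hold_probability(0.70)
print(round(h, 2), round(tiebreak_probability(h, h), 2))
```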
APPROXIMATING THE PARAMETERS OF DISTRIBUTIONS OF THE NUMBER OF POINTS IN A SET
The moment generating function for the number of points in an advantage
set, $m_{psA}(t)$, when $p_A = 1 - p_B$, becomes:
$$m_{psA}(t) = f_{gsA}(6)\,[m_{pgAB}(t)]^6 + f_{gsA}(7)\,[m_{pgAB}(t)]^7 + f_{gsA}(8)\,[m_{pgAB}(t)]^8 + f_{gsA}(9)\,[m_{pgAB}(t)]^9 + f_{gsA}(10)\,[m_{pgAB}(t)]^{10} + \frac{N_{gsA}(5,5)\,[1 - N_{gsA}(1,1)]\,[m_{pgAB}(t)]^{12}}{1 - N_{gsA}(1,1)\,[m_{pgAB}(t)]^2}$$
where:
$m_{pgAB}(t) = [m_{pgA}(t) + m_{pgB}(t)]/2$ is the average of the two
(in this case equal) moment generating functions.
Taking the natural logarithm of the moment generating function gives an
alternative generating function known as the cumulant generating
function. Let $\kappa_{pgA}(t) = \ln[m_{pgA}(t)]$ represent the
cumulant generating function for the number of points in a game.
This relationship can be inverted to give
$m_{pgA}(t) = \exp(\kappa_{pgA}(t))$.
The moment generating function $m_{psA}(t)$ can be written as:
$$m_{psA}(t) = f_{gsA}(6)\exp(6\kappa_{pgAB}(t)) + f_{gsA}(7)\exp(7\kappa_{pgAB}(t)) + f_{gsA}(8)\exp(8\kappa_{pgAB}(t)) + f_{gsA}(9)\exp(9\kappa_{pgAB}(t)) + f_{gsA}(10)\exp(10\kappa_{pgAB}(t)) + \frac{N_{gsA}(5,5)\,[1 - N_{gsA}(1,1)]\exp(12\kappa_{pgAB}(t))}{1 - N_{gsA}(1,1)\exp(2\kappa_{pgAB}(t))}$$
when $p_A = 1 - p_B$,
where:
$\kappa_{pgAB}(t) = [\kappa_{pgA}(t) + \kappa_{pgB}(t)]/2$ is the
average of the two (in this case equal) cumulant generating functions.
This can be expressed as:
$$m_{psA}(t) = m_{gsA}(\kappa_{pgAB}(t)) \qquad (1)$$
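Equation (1) has the form of a compound (random sum) distribution, so the first two moments follow by the chain rule. As a worked step, write $K_{psA}(t) = \ln m_{psA}(t)$ and $K_{gsA}(u) = \ln m_{gsA}(u)$, and let $M_{gsA}$, $V_{gsA}$ denote the mean and variance of the number of games in an advantage set and $M_{pgAB}$, $V_{pgAB}$ those of the number of points in a game. These symbols are introduced here for illustration and are not all defined in the source.

```latex
\begin{aligned}
K_{psA}(t)   &= K_{gsA}\bigl(\kappa_{pgAB}(t)\bigr), \qquad \kappa_{pgAB}(0) = 0, \\
K_{psA}'(t)  &= K_{gsA}'\bigl(\kappa_{pgAB}(t)\bigr)\,\kappa_{pgAB}'(t)
  \;\Longrightarrow\; M_{psA} = M_{gsA}\,M_{pgAB}, \\
K_{psA}''(t) &= K_{gsA}''\bigl(\kappa_{pgAB}(t)\bigr)\,\bigl[\kappa_{pgAB}'(t)\bigr]^2
               + K_{gsA}'\bigl(\kappa_{pgAB}(t)\bigr)\,\kappa_{pgAB}''(t)
  \;\Longrightarrow\; V_{psA} = V_{gsA}\,M_{pgAB}^2 + M_{gsA}\,V_{pgAB}.
\end{aligned}
```

The implications are obtained by evaluating at $t = 0$, and hold under the same condition $p_A = 1 - p_B$ as equation (1).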
Similarly, the following result is established for $m_{psTA}(t)$,
when $p_A = 1 - p_B$:
$$m_{psTA}(t) = m_{gsTA}(\kappa_{pgAB}(t)) + N_{gsTA}(6,6)\exp(12\kappa_{pgAB}(t))\,[\exp(\kappa_{pgTAB}(t)) - \exp(\kappa_{pgAB}(t))] \qquad (2)$$
Notice that the last term does not vanish, due to the difference in the
scoring system for a tiebreaker game compared with a regular game.
Equations (1) and (2) can be used to obtain approximate results for the
parameters of distributions for the number of points in a set when
$p_A$ is not equal to $1 - p_B$.
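Differentiating equation (2) at $t = 0$ gives the mean number of points in a tiebreaker set as the mean number of games times the mean game length, plus a correction for the tiebreaker game. The sketch below encodes that first-moment consequence; the function and parameter names are hypothetical, and the inputs used in the example are values for $p_A = p_B = 0.50$ obtained by direct enumeration for this illustration (the text quotes them only to two decimals, as 6.75 and 9.66).

```python
def tiebreak_set_mean_from_eq2(mean_games, p_six_all, mean_game_points, mean_tiebreak_points):
    """First derivative of equation (2) at t = 0.

    mean_games            mean number of games in a tiebreaker set
    p_six_all             probability of reaching 6-6
    mean_game_points      mean number of points in a regular game
    mean_tiebreak_points  mean number of points in a tiebreaker game
    """
    compound_part = mean_games * mean_game_points
    tiebreak_correction = p_six_all * (mean_tiebreak_points - mean_game_points)
    return compound_part + tiebreak_correction


# Assumed inputs for p_A = p_B = 0.50 (enumerated for this example, not quoted in full).
mean_points = tiebreak_set_mean_from_eq2(
    mean_games=9.662109375,
    p_six_all=0.123046875,
    mean_game_points=6.75,
    mean_tiebreak_points=11.744140625,
)
print(round(mean_points, 2))
```

The result agrees with the value of 65.83 reported for this case, as expected since $p_A = 1 - p_B$ holds exactly at 0.50.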
THE NUMBER OF POINTS IN A MATCH
From this point an advantage match is considered as a match where
the first four sets played are tiebreaker sets and the fifth set
is an advantage set.
The moment generating functions for the number of points in an advantage
and tiebreaker match, $m_{pm}(t)$ and $m_{pmT}(t)$, when
$p_A = 1 - p_B$, become:
$$m_{pmT}(t) = m_{sm}(\kappa_{psTAB}(t))$$
$$m_{pm}(t) = m_{sm}(\kappa_{psTAB}(t)) + N_{sm}(2,2)\exp(4\kappa_{psTAB}(t))\,[\exp(\kappa_{psAB}(t)) - \exp(\kappa_{psTAB}(t))]$$
where: $\kappa_{psTAB}(t) = [\kappa_{psTA}(t) + \kappa_{psTB}(t)]/2$
and $\kappa_{psAB}(t) = [\kappa_{psA}(t) + \kappa_{psB}(t)]/2$.
The following approximation results can be established for the number
of points in a match, similar to the approximation results established
for the number of points in a set:
$$m_{pmT}(t) \approx m_{sm}(\kappa_{psTAB}(t)) \quad \text{for all values of } p_A \text{ and } p_B,$$
$$m_{pm}(t) \approx m_{sm}(\kappa_{psTAB}(t)) + N_{sm}(2,2)\exp(4\kappa_{psTAB}(t))\,[\exp(\kappa_{psAB}(t)) - \exp(\kappa_{psTAB}(t))] \quad \text{for all values of } p_A \text{ and } p_B.$$
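At the level of means, these results say that the expected number of points in a match is approximately the expected number of sets times the mean number of points in a tiebreaker set, with a correction when the fifth set is an advantage set. The sketch below illustrates this for a best-of-five match, assuming that sets are won independently with a constant probability `q` for player A (an assumption of this example); the function names are hypothetical, and the set means in the example are the values quoted in the text for $p_A = p_B = 0.70$, for which `q = 0.5` is assumed by symmetry.

```python
from math import comb


def sets_in_match_distribution(q):
    """{n: P(match lasts n sets)} for a best-of-five match; q = P(A wins a set)."""
    # the match winner takes the last set and two of the previous n - 1
    return {n: comb(n - 1, 2) * (q ** 3 * (1 - q) ** (n - 3) + (1 - q) ** 3 * q ** (n - 3))
            for n in (3, 4, 5)}


def match_mean_points(q, mean_tb_set, mean_adv_set=None):
    """Approximate mean points in a match from the first derivative of the results above."""
    f = sets_in_match_distribution(q)
    mean = sum(n * fn for n, fn in f.items()) * mean_tb_set
    if mean_adv_set is not None:
        # f[5] is the probability of reaching two sets all, i.e. N_sm(2,2)
        mean += f[5] * (mean_adv_set - mean_tb_set)
    return mean


print(sets_in_match_distribution(0.5))
print(match_mean_points(0.5, 66.22))            # tiebreaker match
print(match_mean_points(0.5, 66.22, 86.43))     # advantage fifth set
```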
Approximation results for distributions of points in a match could also
be established for tennis doubles by using the above results established
for singles. The probability of a team winning a point on serve is
estimated by the average of the probabilities of the two players in
the team.
When $p_A = 1 - p_B$, the distribution of the number of points played
in a set when player A serves first in the set is equal to the
distribution of the number of points played in a set when player B
serves first in the set. This leads to the following result:
The numbers of points played in each set of a match are independent if
$p_A = 1 - p_B$.
Suppose $Z = X + Y$, where $X$ and $Y$ are independent; then it is well
known that $m_Z(t) = E[e^{Zt}] = E[e^{Xt}]\,E[e^{Yt}] = m_X(t)\,m_Y(t)$.
By taking logarithms it follows that
$\kappa_Z(t) = \kappa_X(t) + \kappa_Y(t)$.
An extension of this property of cumulants is given by the following
theory (Brown, 1977) and can be applied to points in a tiebreaker match
when the numbers of points played in each set of a match are
independent. When the independence assumption fails to hold, the theory
remains approximately correct according to the approximation result
established for points in a tiebreaker match.