Several authors have carried out probabilistic analyses of tennis (Carter and Crews, 1974; Miles, 1984). A common assumption is that player A has a constant probability PA of winning a point on his/her service and that player B also has a constant probability PB of winning a point on service. Under this assumption and the assumption that points are independent, it can be shown that the better player does not always win and that each player has a constant probability of winning each set, no matter who serves first in the set (Pollard, 1983). Player A is the better player if PA is greater than PB.
There is little published research testing whether players do have constant probabilities of winning points on service, or whether points (and hence games and sets) are independent and identically distributed (iid). A ‘first game effect’ in a match, namely that fewer breaks occur in the first game of the match, has been identified (Magnus and Klaassen, 1999). However, it would appear that any non-iid effects such as the ‘hot-hand effect’ (in which winning a point, game or set increases one’s chances of winning the next point, game or set) and the opposite effect, the ‘back-to-the-wall effect’, are small when analyzing large data sets (Klaassen and Magnus, 2001).
Many players believe, and commentators often state, that the winner of a set of tennis is not infrequently determined by merely a couple of points within that set. Given that a set lasts about (say) 60 points on average, and the couple of critical points can occur almost anywhere in the set, it would appear to be difficult to use statistical methods to identify a couple of non-iid points amongst approximately 60 other iid points. It would be like ‘searching for a needle in a haystack’.
In this paper we focus on sets rather than points. If sets are not iid, it follows that points and games cannot be strictly iid, even if only a very small percentage of points contributes to the non-iid nature of the data. The data consists of ten years (1995 to 2004) of results from the four major annual men’s singles tournaments: the Australian Open, the French Open, Wimbledon and the US Open. These are known as the Grand Slam tournaments and are played on different types of surface. Using W to represent a set won by the eventual winner of the match and L to represent a set lost by the eventual winner, there are ten possible match categories, from WWW to LLWWW. Each of the 4883 singles matches for this period was classified into the relevant category, and the frequencies of the categories were analysed to check for lack of independence of set outcomes.
Assuming without loss of generality that player A is the better player, the results of a best-of-five sets singles match can be recorded as WWW, WWLW, WLWW, LWWW, WWLLW, WLWLW, WLLWW, LWWLW, LWLWW, LLWWW, and LLL, LLWL, LWLL, WLLL, LLWWL, LWLWL, LWWLL, WLLWL, WLWLL and WWLLL where W represents a set won by player A, and L represents a set lost by player A. When we do not know who the better player is, a win in three sets for example (WWW or LLL above) is simply a win WWW to the winner of the match (not necessarily player A). Thus, when we do not know who the better player is, the above twenty outcomes reduce to the ten mutually exclusive outcomes WWW, WWLW, WLWW, LWWW, WWLLW, WLWLW, WLLWW, LWWLW, LWLWW and LLWWW where W represents a set won by the eventual winner of the match and L represents a set lost by the eventual winner.
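This reduction can be illustrated mechanically: the ten categories are exactly the W/L sequences in which the eventual winner takes three sets, the last of which ends the match. The short Python sketch below is purely illustrative (it is not part of the original analysis, and the function name is ours):

```python
from itertools import product

def match_categories():
    """All best-of-five set sequences from the winner's perspective
    (W = set won by the eventual winner, L = set lost by him)."""
    cats = []
    for n_sets in (3, 4, 5):
        for seq in product("WL", repeat=n_sets):
            s = "".join(seq)
            # The winner takes exactly three sets, and the final set listed
            # is the one that ends the match.
            if s.count("W") == 3 and s.endswith("W"):
                cats.append(s)
    return cats

print(match_categories())
# ['WWW', 'WWLW', 'WLWW', 'LWWW', 'WWLLW', 'WLWLW', 'WLLWW',
#  'LWWLW', 'LWLWW', 'LLWWW']
```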
The data consisted of ten years of men’s singles Grand Slam results. There were 4883 completed matches in total; matches in which one player ‘retired’ (presumably through injury) before the match was finished were omitted. The number of matches in each of the above categories was:
WWW 2330; WWLW 503; WLWW 487; LWWW 609; WWLLW 151; WLWLW 135; WLLWW 186; LWWLW 138; LWLWW 156; LLWWW 188
The first model fitted involved a constant probability, p, of player A (the notionally or theoretically better player) winning each set. A short and simple search using a spreadsheet showed that the value of p which minimized Chi-Squared was 0.769, and the results are given in Table 1. For example, the expected value for the row WWW in Table 1 is 4883*(0.769*0.769*0.769 + 0.231*0.231*0.231) = 2280.77, allowing for both a win and a loss by the theoretically better player.
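The original search was carried out in a spreadsheet; the Python sketch below is an illustrative reconstruction of the same calculation (the names OBSERVED, seq_prob and so on are ours, not part of the original analysis). For each candidate p, the expected count of a category is the number of matches times the probability of the winner’s sequence plus that of its mirror image (the better player losing), and Chi-Squared is then minimized over a grid of p values:

```python
# Observed category counts for the 4883 matches, 1995-2004 (from the text).
OBSERVED = {"WWW": 2330, "WWLW": 503, "WLWW": 487, "LWWW": 609,
            "WWLLW": 151, "WLWLW": 135, "WLLWW": 186, "LWWLW": 138,
            "LWLWW": 156, "LLWWW": 188}
N = sum(OBSERVED.values())  # 4883

def seq_prob(seq, p):
    """Probability that the better player produces this exact set sequence
    when his probability of winning every set is a constant p."""
    prob = 1.0
    for s in seq:
        prob *= p if s == "W" else 1 - p
    return prob

def expected(p):
    """Expected count per category, allowing for the better player either
    winning (the sequence as given) or losing (the W/L-swapped sequence)."""
    swap = str.maketrans("WL", "LW")
    return {c: N * (seq_prob(c, p) + seq_prob(c.translate(swap), p))
            for c in OBSERVED}

def chi_squared(p):
    exp = expected(p)
    return sum((OBSERVED[c] - exp[c]) ** 2 / exp[c] for c in OBSERVED)

best_p = min((i / 1000 for i in range(501, 1000)), key=chi_squared)
print(best_p, round(chi_squared(best_p), 2))   # ~0.769, ~38.7
print(round(expected(best_p)["WWW"], 2))       # ~2280.8
```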
The value of Chi-Squared was 38.68 with 8 degrees of freedom, so the fit is a poor one. This is not surprising, as a constant p-value for all matches is clearly unrealistic. It can be seen from the Obs-Exp column in Table 1 that more three-set and five-set results were observed than expected under this model. Also, among the four-set matches, the model underestimated the number of LWWW matches and overestimated the other two categories. Similarly, among the five-set matches, the model underestimated the number of WLLWW and LLWWW matches.
In order to overcome the shortfall in the expected numbers of three-set and five-set matches under the above model, it was decided to model the data using two p values, one greater than 0.769 and the other less than it, and to combine the results. The value greater than 0.769 would increase the proportion of three-set matches, and the value less than 0.769 would increase the proportion of five-set matches. Thus, for simplicity, the data was modeled as consisting of two types of matches: ‘close’ matches (with p less than 0.769) and ‘not-so-close’ matches (with p greater than 0.769).
Half the matches were assumed to be ‘close’ and half ‘not-so-close’. Values p1 and p2, symmetric about 0.769, were considered, and the two values which minimized Chi-Squared were identified: p1 = 0.705 and p2 = 0.833. The results for this model are given in Table 2. For example, the expected value for row 4 (LWWW) of Table 2 is given by 4883*(0.5*((1-p1)*p1*p1*p1 + p1*(1-p1)*(1-p1)*(1-p1)) + 0.5*((1-p2)*p2*p2*p2 + p2*(1-p2)*(1-p2)*(1-p2))) = 541.71, allowing for both a win and a loss by the theoretically better player.
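Under this mixture model the expected counts are simply the equally weighted average of two single-p calculations. Continuing the earlier sketch (and reusing its illustrative OBSERVED, N and seq_prob), the calculation might look as follows:

```python
# Reuses OBSERVED, N and seq_prob from the constant-p sketch above.
def expected_mixture(p1, p2, w1=0.5):
    """Expected counts when a proportion w1 of matches has set-win
    probability p1 and the remainder has p2."""
    swap = str.maketrans("WL", "LW")
    return {c: N * (w1 * (seq_prob(c, p1) + seq_prob(c.translate(swap), p1))
                    + (1 - w1) * (seq_prob(c, p2) + seq_prob(c.translate(swap), p2)))
            for c in OBSERVED}

exp = expected_mixture(0.705, 0.833)
chi2 = sum((OBSERVED[c] - exp[c]) ** 2 / exp[c] for c in OBSERVED)
print(round(exp["LWWW"], 2), round(chi2, 2))   # ~541.7, ~35.6
```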
The value of Chi-Squared for this model was 35.59 with 7 degrees of freedom, so the fit is again a poor one. Whilst this is a better fit with respect to the proportions of three-set and five-set matches, the number of LWWW matches is still underestimated under this model, as are the numbers of WLLWW and LLWWW matches.
It is noted here as an aside that if we remove the restriction that exactly half of the matches have the value p1 and half the value p2, whilst keeping p1 = 0.705 and p2 = 0.833, a slightly smaller value of Chi-Squared can be obtained. The lowest Chi-Squared value obtained was 34.35 with 6 degrees of freedom, when the proportion of matches with p1 = 0.705 was 0.53 and the proportion with p2 = 0.833 was 0.47. Thus, for this model (and indeed for the others considered in this paper), modifying the proportions of ‘close’ and ‘not-so-close’ matches had a negligible effect on the Chi-Squared values. For this reason, no further reports on this modification are given in this paper.
It can be seen from Table 2 that, under this model, the expected numbers of matches in the three four-set categories are equal, and, correspondingly, the expected numbers of matches in the six five-set categories are also equal. This is because, when p is constant within a match, the probability of any particular four-set (or five-set) winner’s sequence depends only on how many sets the winner lost, not on where in the match those losses occurred. It is clear that this characteristic would remain true even if more (indeed many more) than two p values were fitted to the data. Further, it follows that if the p-value is constant for each set within each match (but possibly different for each of the 4883 matches), the expected numbers of matches in the three four-set categories would be equal, and the expected numbers in the six five-set categories would also be equal. It is possible to fit a best-fitting model to these data in which the three four-set categories have equal expected values and the six five-set categories also have equal expected values. Note that this is simply a data-fitting exercise; there is no assumed underlying p-value(s) as in the above analyses. When this is done, the expected values for the three-, four- and five-set categories are 2322.2, 534.0 and 159.8 respectively, and the Chi-Squared value is 33.17 with 7 degrees of freedom. Again the fit is not a good one, and we conclude that the p-values for each set (within each match) are not constant.
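Both observations can be checked numerically. The sketch below (again reusing the illustrative OBSERVED, N and seq_prob from the constant-p sketch) first confirms that, for an arbitrary constant p, the three four-set categories have identical probabilities, and then evaluates Chi-Squared for the data-fitting model with one common expected value per match length, at the values quoted above:

```python
# Reuses OBSERVED, N and seq_prob from the constant-p sketch above.
swap = str.maketrans("WL", "LW")

# Equality of the four-set categories under any constant within-match p.
p = 0.72  # arbitrary illustrative value
print([round(seq_prob(c, p) + seq_prob(c.translate(swap), p), 6)
       for c in ("WWLW", "WLWW", "LWWW")])      # three identical numbers

# Chi-Squared for the equal-expected-value fit (e3 determined by the total).
def chi2_equal(e4, e5):
    e3 = N - 3 * e4 - 6 * e5
    exp = {c: {3: e3, 4: e4, 5: e5}[len(c)] for c in OBSERVED}
    return sum((OBSERVED[c] - exp[c]) ** 2 / exp[c] for c in OBSERVED)

print(round(chi2_equal(534.0, 159.8), 2))       # ~33.17
```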
It can be seen from Table 2 that the (Obs-Exp) value was positive for the categories LWWW, WLLWW, LWLWW and LLWWW. These categories represent situations in which the winner (typically, but not always, the better player, player A) was behind in sets at some stage of the match. Thus, the data suggests that the better player might ‘try harder’ or ‘lift his game’ in situations in which he is behind. In order to address this ‘trying harder when behind’ effect, it was assumed that player A lifted his probability of winning a set by D1 when he was behind in the set score. A closer look at the data also suggests that player A might be ‘on-a-roll’ when he has just won a set and, as a consequence, lifts his probability of winning the next set. In order to address this ‘on-a-roll’ effect, it was assumed that player A lifted his probability of winning a set by D1 when he had won the previous set. The categories WLLWW and LLWWW noted above represent situations in which the winner (probably more often player A) lost two sets in a row; these are situations in which player A has a real need to make an extra special effort to lift his game. Thus, it was further assumed that player A lifted his probability of winning a set by an amount D2 (anticipated to be somewhat larger than D1) for the remainder of the match immediately after having lost two sets in a row (there are three such match categories). For reasons of simplicity, only this parsimonious model with two levels of lift was tested.
Given that p1 and p2 are increased by D1 or D2 in certain situations, it seemed appropriate, in order to obtain a reasonable overall fit, to lower both of their ‘starting’ values (i.e., those for set 1) from those in Table 2. Given this, the notion of p-values symmetric about 0.769 also seemed irrelevant. The values of p1, p2, D1 and D2 which minimized Chi-Squared were p1 = 0.704, p2 = 0.798, D1 = 0.035 and D2 = 0.110, and the results are given in Table 3. For example, the expected value for the number of LLWWW matches is 4883*(0.5*((1-p1)*(1-p1-D1)*(p1+D2)*(p1+D2)*(p1+D2) + p1*(p1+D1)*(1-p1-D1)*(1-p1)*(1-p1-D2)) + 0.5*((1-p2)*(1-p2-D1)*(p2+D2)*(p2+D2)*(p2+D2) + p2*(p2+D1)*(1-p2-D1)*(1-p2)*(1-p2-D2))) = 186.68.
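The expected counts under this lifted-probability model can be reproduced by tracking, set by set, whether player A is behind, whether he won the previous set, and whether he has already lost two sets in a row (after which D2 applies for the remainder of the match and supersedes D1; the lifts apply to player A only). The Python sketch below reuses the illustrative OBSERVED and N from the earlier constant-p sketch and, as before, is our own reconstruction rather than the original calculation:

```python
# Reuses OBSERVED and N from the constant-p sketch above.
def seq_prob_lift(seq, p, d1, d2):
    """Probability that player A (the better player) produces the set
    sequence seq ('W' = A wins the set), with his set-win probability
    lifted by d1 when behind or after winning the previous set, and by d2
    for the rest of the match once he has lost two sets in a row."""
    prob = 1.0
    wins = losses = 0
    prev = None
    lost_two_in_a_row = False
    for s in seq:
        if lost_two_in_a_row:
            p_set = p + d2
        elif losses > wins or prev == "W":
            p_set = p + d1
        else:
            p_set = p
        prob *= p_set if s == "W" else 1 - p_set
        if s == "W":
            wins += 1
        else:
            if prev == "L":
                lost_two_in_a_row = True
            losses += 1
        prev = s
    return prob

def expected_lift(p1, p2, d1, d2):
    """Expected counts: half the matches use p1, half p2, and each winner's
    sequence arises from player A either winning it or losing its mirror."""
    swap = str.maketrans("WL", "LW")
    return {c: N * 0.5 * (seq_prob_lift(c, p1, d1, d2)
                          + seq_prob_lift(c.translate(swap), p1, d1, d2)
                          + seq_prob_lift(c, p2, d1, d2)
                          + seq_prob_lift(c.translate(swap), p2, d1, d2))
            for c in OBSERVED}

exp = expected_lift(0.704, 0.798, 0.035, 0.110)
chi2 = sum((OBSERVED[c] - exp[c]) ** 2 / exp[c] for c in OBSERVED)
print(round(exp["LLWWW"], 2), round(chi2, 2))   # ~186.7, ~1.83
```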
The value of Chi-Squared was 1.83 with 5 degrees of freedom, so the model fits the data well.
In order to carry out a simple check on the model, it was decided to break the data into two time periods (1995-1999 and 2000-2004) and check for consistency across the periods (Table 4). The parameter estimates for p1, p2, D1 and D2 obtained from the full ten-year period 1995 to 2004 were applied, without refitting, to the period 1995 to 1999 (2448 matches) and to the period 2000 to 2004 (2435 matches). The fits were surprisingly good, with Chi-Squared values of 1.81 and 3.00 respectively. (Lower values of Chi-Squared could no doubt be obtained by fitting p1, p2, D1 and D2 values specific to each period, but there is little point in doing so.)
There appeared to be no evidence in the data that the weaker player could lift his game in situations where it would have been useful for him to do so.
The 4883 completed men’s singles matches at grand slam tournaments for the period 1995-2004 have been analysed to test the hypothesis that the probability of winning a set within a match is constant. This hypothesis was rejected.
A model which fits the data well has been found. It is a model in which the better player lifts his probability of winning a set in certain situations. These situations are
(i) when he is behind in the set score, needs to lift his game, and lifts his probability of winning the next set by (on average) 0.035,
(ii) when he has just won a set, is ‘on-a-roll’, and lifts his probability of winning the next set by (on average) 0.035, and
(iii) when he has just lost two sets in a row, desperately needs to lift his game, and lifts his probability of winning each remaining set by (on average) 0.110.
The results of this study are quite encouraging for the better player, but perhaps somewhat discouraging for the weaker player. The findings indicate that the weaker player needs to be ‘on his guard’ for a change in fortunes when the match is ‘going well’ for him.
The results of the analysis in this paper show that often the better player can increase his probability of winning a set by quite a substantial amount when it is really necessary to do so in order to reduce his probability of losing the match. A set can often be won rather than lost by winning just one, two, or a few particular important points (Morris, 1977). Thus, it would appear from the analysis in this paper that the better player is more able to lift his play on particularly important points than is the weaker player.
Further studies might include whether women’s matches (although only best-of-three sets) have comparable characteristics or whether there are gender differences in this regard.
It would appear that the methodology used in this paper has a range of sporting applications, particularly for the common situation in which the better player or team does not always win a match, or the ‘best’ player or team does not always win a series of matches. Another area of application might be assessment, in which the ‘best’ student (or person being assessed) does not always come first.