Analysis of Pushing Forces During the Water Polo Eggbeater: Reliability and Validity of a Novel Approach

Water polo players benefit from greater odds of success when maintaining their tactical position against their opponents. This study evaluated the reliability and validity of a water-based resistance test to replicate this skill. Thirty-three water polo players participated in this study (19 males and 14 females, 14 from senior and 19 from junior national teams). Data were collected during two regular training sessions, separated by one week, using a load cell to instrument a weight stack resistance setup on the pool deck. Performance parameters such as mean force, maximum force, mean peak force and total impulse were computed with custom Python scripts. Test-retest reliability was assessed using the intra-class correlation coefficient ICC(3,1). Group comparisons were explored between male and female players. The level of significance was set at p < 0.05. Reliability was high to very high for mean force, maximum force, mean peak force, inter-stroke range, and total impulse (ICC = 0.85-0.93, p < 0.01). Group comparisons showed significantly greater values in male players for these variables, with large to very large effect sizes (p < 0.01, ES = 1.05-9.36). However, there was no significant difference in endurance between sexes (p = 0.88, ES = 0.04). This study presents a methodology with satisfactory metrological qualities for field applications using simple and affordable equipment. The testing apparatus presented in this study can readily be replicated in a variety of training environments by practitioners working with water polo teams. Coaches can use this approach to evaluate individual player progress or to compare performance across a group of water polo players.


Introduction
Water polo player positions include goalkeepers and various field positions, such as drivers, centers, and defenders (Smith, 1998). The last two provide a key tactical contribution to the team by occupying the space immediately in front of the goalkeeper cage, which yields a much higher scoring chance than shooting from the periphery (Perazzetti et al., 2023a). A recent study of over 5000 goals from NCAA matches concluded that for every meter further from the center position, the odds of scoring decreased by 29% (Gullikson et al., 2020). Therefore, the ability to maintain this position against the push-back from the defensive team separates successful center forwards from the others (Botonis et al., 2019). From a defensive perspective, the ability to push these invading players away from the goalkeeper cage is likewise a key tactical advantage (Lupo et al., 2016; Perazzetti et al., 2023b). Unfortunately, land-based measures of lower extremity dynamic strength show very low correlation with water-based abilities in water polo players (Platanou, 2005). Indeed, water polo players produce this type of movement with a different action than dryland jumping: the eggbeater kick, performed by "alternating the circular and continuous movements of the legs, producing an upward force and maintaining players afloat in a vertical position" (Uljevic et al., 2013). Therefore, sport-specific evaluations are required.
While the ability to push opponents is closely associated with players' strength, there are limited sport-specific approaches to assess these qualities in the daily training environment (Sanders, 1999a). Often, the description of leg strength relies on the ability to jump high out of the water (Gobbi et al., 2013; Platanou and Varamenti, 2011) or to swim across the pool in an upright position with the arms out of the water as quickly as possible (McCluskey et al., 2010). Alternatively, players have been tested by measuring time to exhaustion while swimming upright with gradually increasing weighted vests (Melchiorri et al., 2015). This method allowed more rigor in the execution of the task and showed that expert players were able to sustain these loads for significantly more time than non-expert water polo players (p < 0.01, effect size 3.5-6.0). However, the setup for this experiment requires the players to be upright and is not specific to the ability to push an opponent in the water in a horizontal plane. Instead, other authors have described stationary swimming tests where athletes swim away from a load cell while being attached with elastic tethers to the edge of the pool (Muniz-Pardos et al., 2019). The amount of resistance provided with elastic tethers in such a setup is not constant, as it increases proportionally to the distance away from the starting point. Therefore, analyses of the forces expressed by participants against this type of resistance cannot inform coaches about the cyclical patterns employed to accomplish the push (Bratuša and Dopsaj, 2016; Dopsaj et al., 2003). Nevertheless, it has been used successfully in swimmers to evaluate mean force and maximum force in 10-, 30- or 60-second challenges (Dopsaj, 2010; Dopsaj et al., 2003; Joaquim Baratto de Azevedo et al., 2021). In water polo, a comparable approach was used with 28 youth players who were attached at the end of a non-elastic rope to a fixed point and instructed to perform 10-second maximal swim tests using
the lower body only (Stirn et al., 2014). This method provides a resulting force curve to analyze for every individual, with which the authors further proposed that alternating eggbeater motion yielded greater average force compared with simultaneous motion (128 ± 26 N vs 111 ± 22 N). However, the greatest maximal forces were recorded with the simultaneous motion (244 ± 37 N vs 189 ± 36 N), which has implications for choosing the optimal movement based on the duration of the task. This approach still cannot directly discriminate between participants' performance at pushing the same amount of resistance.
In training, coaches also use the resistance provided by a stationary weight stack or containers filled with water tied with a long rope to provide a constant resistance against the players in the water (Muniz-Pardos et al., 2019). In theory, this type of resistance would simulate more closely the sustained task requirements of pushing an opponent in the water. Consequently, we propose a method to measure the forces generated while performing an eggbeater kick by instrumenting this type of apparatus. By embedding the instrument within training equipment, coaches can readily repeat these measurements and evaluate the changes obtained from each training cycle (Abernethy et al., 1995).
The main goal of this study was to develop and validate a method to assess sport-specific pushing forces in water polo players. The hypotheses related to these metrological properties were that: (1) reliability should be sufficient for the appropriate implementation of this procedure in the daily training environment (intra-class correlation > 0.80) and (2) male players would demonstrate greater strength compared to the female players.

Subjects
A total of 33 water polo players participated in this study. The male group included 19 players (3 international level and 16 national level), whereas the female group included 14 players (11 international and 3 national level) (McKay et al., 2022). The mean age, height and weight were 19.5 ± 2.8 years, 187.1 ± 5.1 cm and 89.1 ± 13.1 kg for the male group, and 23.4 ± 3.6 years, 174.0 ± 5.5 cm and 75.2 ± 12.7 kg for the female group. Anthropometric measurements were obtained prior to participation in the study in accordance with the standards from the International Society for Advancement of Kinanthropometry (International Society for Advancement of Kinanthropometry, 2001). All participants trained more than five days per week, reported no training restrictions or injury, and were part of either the junior or senior Canadian national water polo teams. Data were collected as part of regular training activities, and players provided their written consent for anonymized data to be included in research. The protocol was approved by the ethics board of École de Technologie Supérieure de Montréal in accordance with the principles of the Declaration of Helsinki (case number H20230401).

Procedures
One weight stack (Adjustable dual pulley system, Atlantis Strength Inc., Canada) with a four-pulley system was instrumented with a waterproof load cell (STS-1000, Chatillon Ametek, Florida, USA) with a sampling rate of 100 Hz and a maximum capacity of 1000 lbf. This portable and cost-efficient measurement device could be attached to any other resistance equipment used by a water polo club. A non-elastic rope of 10 m was attached to the load cell and connected to a thick rubber loop at its extremity (Figure 1). The alignment of the cord was such that the slope was near 0°, as previous authors have shown that varying the slope angle can change the resulting performance measures (Joaquim Baratto de Azevedo et al., 2021). In each instance, the tests were filmed from an underwater view with a Canon VIXIA camera (model HFG20) at a 60 Hz frame rate, fixed to a metal pole on a rolling cart. The camera was positioned 1.5 m perpendicularly away from the participants at a depth of 1.0 m underwater. The video footage was added to provide experimental context, but not analyzed for further kinematic insights. Data were collected during regular training sessions after the players performed a typical water polo warm-up (freestyle swimming first, then throwing activities, for approximately 10 minutes and 15 minutes respectively). A subset of 27 players was tested a second time with a target of seven days between measurements (six players were unavailable for re-testing due to injury or travel commitments, four male and two female). The reliability subset of 27 players included 12 international players and 15 national level players. On each occasion, participants began the test in the water with the strap worn across the chest and the rope near its taut position, with the arms facing forward and their hands out of the water. The players' bodies were in a horizontal plane, parallel with the surface of the water, facing directly away from the weight stack (Figure 1). Coaches provided the mean weight resistance used by both the male and female teams in training, which was used as a standard resistance for each sex. A sex-specific weight resistance was chosen to reduce the bias from known differences in strength between male and female water polo players (Croteau et al., 2021).
Once in position, players were instructed to "push as far as they can without backing down" for ten seconds against a standard weight stack resistance (54.4 kg for women and 81.6 kg for men) using an alternating eggbeater kick with their hands outside of the water. The examiner timed the test with a smartphone stopwatch and shouted the countdown out loud for the participants to hear. This duration was chosen based on coaches' experience of the typical time during which players are required to maintain position in front of the goalkeeper's cage, as well as the physiological parameters for strength evaluation in water polo (Keiner et al., 2020). This was reflected in time-motion analyses of water polo as well, where the mean contact time between male players was 9.8 ± 3.4 s (Platanou, 2004; Platanou and Geladas, 2006). It is also the same duration reported in similar studies of water polo players, therefore allowing comparison with previous authors (Dopsaj, 2010; Joaquim Baratto de Azevedo et al., 2021; Yanagi, 1995).

Data processing
Raw data were extracted from the load cell with the ForceTest 3.1 software (Ametek, USA) and analyzed using a custom Python script. Data were filtered using the Savitzky-Golay algorithm (window = 21 and polynomial = 2). Force peaks were identified with the function find_peaks (prominence = 0, distance = 20, height = Fsetpoint/2, threshold = None, width = 10) included in the SciPy library. The start point was defined as the moment when the force became greater than half of the force set point. This removed the portion of the task where the elastic components of the setup were deformed (i.e., cord and rubber loop). The end of the measure was defined at the last force peak before the cessation of the experiment. Sometimes players performed further kicks after the end of the test or instead stopped too soon. To prevent these extra kicks from biasing the results, we arbitrarily chose, among the last three force peaks, the last one that did not differ by more than ± 20% from the previous force peak. Thus, we eliminated force peaks with an excessive variation (e.g., a double kick at the end of the task) and included only relevant strokes in the test measures. The force set point corresponds to the actual resistance from the weight stack that the athletes must overcome after considering the pulley system. It is converted to Newtons from the resistance in pounds and then divided by four (number of pulleys in this weight stack):
Fsetpoint (N) = resistance (lb) × 4.448 / 4
The filtered signal was used to calculate overall mean force, maximum force, mean peak force, inter-stroke force range, mean stroke duration and variability, number of strokes, total impulse, endurance index, and time to first peak. The endurance index represents a ratio between the total impulse performed during the task and an ideal scenario in which a player would have been able to maintain the set point force for the entire test duration.
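The processing steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' script: the function names (`force_set_point`, `detect_strokes`, `last_valid_peak`), the exact conversion constant, and the fallback used when none of the last three peaks passes the ± 20% rule are hypothetical. The filter and peak-detection parameters are those reported in the text.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter

LBF_TO_N = 4.44822   # newtons per pound-force
N_PULLEYS = 4        # pulleys in the weight stack described in Procedures


def force_set_point(stack_lb):
    """Resistance the player must overcome: stack load in newtons divided by four."""
    return stack_lb * LBF_TO_N / N_PULLEYS


def detect_strokes(raw_force_n, set_point_n):
    """Smooth the signal, then return (filtered force, start index, peak indices)."""
    force = savgol_filter(raw_force_n, window_length=21, polyorder=2)
    peaks, _ = find_peaks(force, prominence=0, distance=20,
                          height=set_point_n / 2, threshold=None, width=10)
    # Start point: first sample where force exceeds half of the set point.
    above = np.flatnonzero(force > set_point_n / 2)
    start = int(above[0]) if above.size else None
    return force, start, peaks


def last_valid_peak(force, peaks, tolerance=0.20, n_candidates=3):
    """Among the last three peaks, pick the latest one within ±20% of its predecessor."""
    if len(peaks) == 0:
        return None
    first_candidate = max(len(peaks) - n_candidates, 1)
    for i in range(len(peaks) - 1, first_candidate - 1, -1):
        previous = force[peaks[i - 1]]
        if abs(force[peaks[i]] - previous) <= tolerance * previous:
            return int(peaks[i])
    # Assumption: if no candidate passes, fall back to the peak just before them.
    return int(peaks[first_candidate - 1])
```

With the 81.6 kg (about 180 lb) stack used for the male players, `force_set_point(180)` gives roughly 200 N, which is in line with the force magnitudes reported in the Results. The samples between the start index and the index returned by `last_valid_peak` would then form the window over which the performance variables are computed.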

Statistical analysis
Reliability was assessed through intra-class correlation, ICC(3,1), with a two-way mixed effects model and absolute agreement (Currell and Jeukendrup, 2008). Reliability was described as very high (ICC > 0.90), high (0.70 < ICC < 0.89), or moderate (0.50 < ICC < 0.69) (Plichta and Kelvin, 2012). Bland-Altman plots were drawn to illustrate bias and limits of agreement between the two sessions (Hays and Reeve, 2008). The same comparisons were also made with results relative to body weight to reduce systematic bias from participant physical size (Croteau et al., 2021). These calculations were only performed on variables that showed at least high intra-class correlations, to limit spurious findings. Effect sizes were interpreted as trivial (< 0.2), small (< 0.5), moderate (< 0.8) or large (≥ 0.8) (Cohen, 2013). Normality of the data distribution was assessed for each variable using kernel density plots and the Shapiro-Wilk test (Evans, 1996). The complete statistical analysis was conducted in R (version 4.1.0) (Team, 2022). All results are expressed with 95% CI, and the significance level was set at p < 0.05.
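For readers who want to reproduce the reliability statistics outside R, the following is a minimal sketch of a single-measure, two-way ICC and of the Bland-Altman bias and limits of agreement for two sessions. It is an illustration, not the authors' analysis code: the function names are hypothetical, and because the text names ICC(3,1) together with absolute agreement, the sketch returns both the consistency form (the classical ICC(3,1)) and the absolute-agreement form so that either reading can be checked.

```python
from statistics import mean, stdev


def icc_two_way(session_1, session_2):
    """Single-measure ICCs from two-way ANOVA mean squares.

    Returns (consistency, absolute_agreement) for paired test-retest scores.
    """
    rows = list(zip(session_1, session_2))
    n, k = len(rows), 2                      # n players, k sessions
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ms_rows = ss_rows / (n - 1)              # between-player mean square
    ms_cols = ss_cols / (k - 1)              # between-session mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    return consistency, agreement


def bland_altman(session_1, session_2):
    """Bias (mean difference) and 95% limits of agreement between sessions."""
    diffs = [b - a for a, b in zip(session_1, session_2)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

The two ICC forms differ only when there is a systematic shift between sessions: a constant offset for every player leaves the consistency form unchanged but lowers the absolute-agreement form, which is the same shift the Bland-Altman bias makes visible.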

Results
On average, the second test was performed 8.0 ± 3.1 days after the initial session. Overall mean force, maximum force, mean peak force, inter-stroke force range, total impulse, and endurance index showed a high level of reliability (ICC > 0.74, p < 0.01) (Table 1). Bland-Altman analyses showed small bias between repetitions for these variables, with an average relative bias of 4.05% (Figure 2). SEM values are further described in Table 1 to provide absolute measures of error for each variable. Compared to the female players, the male group was composed of a significantly greater number of national level players (χ²(1) = 10.56, p < 0.01), and therefore was significantly younger (W = 50, p < 0.01) but also both taller (W = 275, p < 0.01) and heavier (W = 216, p < 0.01). Group comparisons identified significantly greater values, with large to very large effect sizes, in male players for mean force, maximum force, mean peak force, inter-stroke range and total impulse (p < 0.01, ES = 1.05-9.36). However, the endurance index showed no difference between the two groups (p = 0.88, ES = 0.04). All comparisons remained significant when evaluating relative strength measurements (p < 0.01 to 0.04); however, effect sizes between groups were smaller (ES = 0.71-1.22). See Table 2 for the full list of comparisons.
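As a reading aid for the absolute error measures, the sketch below shows one common way to derive a standard error of measurement from a between-player standard deviation and an ICC, and one way to express a Bland-Altman bias relative to the mean score. The text does not state which formulas were used for Table 1, so both definitions and both function names are assumptions, and the numbers in the usage note are purely illustrative.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), in the units of the measured variable (assumed formula)."""
    return sd * math.sqrt(1.0 - icc)


def relative_bias_percent(bias, grand_mean):
    """Between-session bias expressed as a percentage of the mean score (assumed definition)."""
    return 100.0 * bias / grand_mean
```

For example, a hypothetical between-player standard deviation of 20 N combined with an ICC of 0.91 would give an SEM of about 6 N under this formula.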

Discussion
The main objective of this study was to evaluate reliability and validity of a standardized testing approach to strength in water polo players. These objectives were achieved with the development of novel parameters specific to the performance of a ten-second resisted eggbeater task. However, factors describing the onset of the task were not sufficiently reliable to help evaluate player performances. The outcome variables of mean force, maximum force, mean peak force, inter-stroke range and total impulse were useful to distinguish between male and female players.
In the current study, the male players (170 ± 12 N mean force and 204 ± 7 N maximum force) demonstrated greater force compared with the female players (122 ± 10 N mean force and 144 ± 6 N maximum force) (Figure 3). These female values are greater than the 60-112 N reported by Yanagi et al. (1995) in a group of 15 female Japanese water polo players (Yanagi, 1995). The testing regimen in their study evaluated vertical eggbeater performance, which may explain the lower values. However, the male values resemble those reported previously by Dopsaj et al. (2010) (140 ± 21 N mean force and 191 ± 36 N maximum force in 14 senior level male water polo players attached to a PVC rope) (Dopsaj, 2010) as well as Stirn et al. (2014) (128 ± 26 N mean force and 189 ± 36 N maximum force in 28 youth male water polo players aged 14-16 years old with elastic tether) (Stirn et al., 2014). In contrast, Abad et al. (2022) have recently reported higher values for mean (389 ± 70 N) and maximum forces (672 ± 135 N) in a group of 32 professional Brazilian water polo players aged 22.2 ± 4.4 years in a non-elastic rope tethered 10-second test (Abad et al., 2022). The range of values found here may be explained partly by age group differences, but it suggests that there may also still be persistent differences in testing methods (Taylor, 2001).
The current study differed from previous methods by imposing a fixed resistance on the water polo players executing the task, as opposed to tethering them with elastic tubing or non-elastic ropes (Dopsaj, 2010; Dopsaj et al., 2003; Stirn et al., 2014). The reliability of these previous methods has been good to very good across a variety of swimming protocols and populations (Kjendlie and Thorsvald, 2005; Nagle Zera et al., 2021). Most often, authors found that mean force estimates were more reliable than maximum force estimates, with lower coefficients of variation (5.4-8.9% vs 11.6-14.7%, n = 13) (Taylor, 2001), greater internal consistency (Cronbach's α = 0.869-0.995) (Dopsaj et al., 2003; Kjendlie and Thorsvald, 2005), and greater intra-class correlations (ICC = 0.975 vs 0.861, p < 0.001, n = 19) (Nagle Zera et al., 2021). In the current study, the reliability coefficients for overall mean force and maximum force production showed comparable results (ICC = 0.91 [0.85-0.95] and 0.93 [0.88-0.96] respectively), suggesting that within-subject variations are small across repeated sessions for these performance variables (Weir, 2005). However, time-based variables such as mean stroke duration, the standard deviation of mean stroke duration (Joaquim Baratto de Azevedo et al., 2021), the total number of strokes and the time to first peak showed poor correlation across the two testing sessions (ICC = 0.38-0.46).
The results suggest that the early phase of the test is prone to measurement error because of the hysteresis of the system (Psycharakis et al., 2011). This was observed on the video footage, where the elastic components of the system yield a certain amount of deformation at the onset of the test. Consequently, time to first peak was not dependable between testing sessions (ICC = 0.41, 95% CI = 0.14-0.62) in this study, which in turn made the estimation of a rate of force development entirely unreliable (ICC = 0.25, relative bias = 32.8%) (Koo and Li, 2016). The difficulty of obtaining reliable values for these brief time windows has also been reported by previous authors (Dopsaj, 2010).
Additionally, the current study developed variables that characterize the ability of water polo players to maintain a high level of force output throughout the entire 10 seconds of the test. The first is an endurance index similar to what has been explored previously (Figure 4) (Morouço et al., 2012), whereas the second is a measure of total impulse (force × time). Of these, the most reproducible characteristic was impulse (ICC = 0.88, 95% CI = 0.79-0.93), which can readily compare overall performance across participants using different techniques to achieve the task goal. This information is most useful to coaches, as it can identify players whose technique relies on sudden bursts of force as opposed to a sustained force. Technical recommendations can directly result from these observations and improve performance.
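The two sustained-output variables can be illustrated with a short sketch. It assumes a force trace sampled at the 100 Hz rate reported in Procedures and integrates it with the trapezoidal rule. The integration scheme and the function names are assumptions; the endurance index follows the definition given in Data processing (measured impulse divided by the impulse of holding the set point force for the whole test).

```python
def total_impulse(force_n, fs_hz=100):
    """Trapezoidal integral of force over time, in newton-seconds."""
    dt = 1.0 / fs_hz
    return sum((a + b) / 2.0 * dt for a, b in zip(force_n, force_n[1:]))


def endurance_index(force_n, set_point_n, fs_hz=100):
    """Measured impulse divided by the impulse of holding the set point throughout."""
    duration_s = (len(force_n) - 1) / fs_hz
    return total_impulse(force_n, fs_hz) / (set_point_n * duration_s)


# Two hypothetical 10 s traces with nearly the same impulse but different profiles.
sustained = [200.0] * 1001                   # steady force at the set point
burst = [300.0] * 501 + [100.0] * 500        # strong start, then fading
```

In this toy example both traces produce close to 2000 N·s, so impulse alone rates them similarly, while the shape of the force curve (as in Figure 4) is what separates a sustained effort from a burst that fades.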
The next objective of this study was to evaluate the validity of the test by comparing male and female players. Indeed, the male group showed significantly greater values for mean force, maximum force, mean peak force, inter-stroke range and total impulse (p < 0.01). Furthermore, these results showed a large to very large effect size across all five variables (ES_g = 1.05-9.36). Conversely, the endurance indices were not significantly different between the two groups (p = 0.88). This is consistent with expected strength differences between sexes (Bartolomei et al., 2021). Unfortunately, no other comparison of similar water-based strength tests is available between groups of water polo players. Instead, previous authors have explored construct validity for similar tests by measuring the correlation with swimming performances (Currell and Jeukendrup, 2008; Morouço et al., 2014). Overall, mean and maximum force production showed moderate to strong correlation with swimming performance across shorter distances (r = 0.57-0.82, p < 0.01) (Joaquim Baratto de Azevedo et al., 2021; Nagle Zera et al., 2021). Altogether, these findings consistently demonstrate that pushing tasks are a valid method to assess strength performance in water polo players, with a reasonable relationship to ecological game situations (Melchiorri et al., 2020).
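The known-groups comparison relies on a standardized effect size. The sketch below computes Hedges' g for two independent groups and labels it with the thresholds given in the Statistical analysis section. Reading the reported effect size as Hedges' g is an assumption, as are the function names and the sample values in the usage note, which are not study data.

```python
import math
from statistics import mean, stdev


def hedges_g(group_a, group_b):
    """Standardized mean difference with the small-sample bias correction."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2)
        / (n_a + n_b - 2))
    cohen_d = (mean(group_a) - mean(group_b)) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0)
    return cohen_d * correction


def interpret(effect_size):
    """Label an effect size with the thresholds stated in the Statistical analysis."""
    size = abs(effect_size)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"
```

For instance, `hedges_g([170, 180, 190], [120, 130, 140])` returns a value of about 4, which `interpret` labels as large, whereas the endurance index difference reported here (ES = 0.04) falls in the trivial band.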

Limitations
The data for this study were collected as part of regular training sessions. This was done to use the same weight stack resistances that coaches use to train the participants. However, the consequence is that not all players began the testing process after having done the same warm-up. Given that this warm-up was self-paced, some began the test after choosing to perform more energetic routines. This may have biased the results, with some players having greater energetic reserves at the outset of the test.
The choice to position the hands forward and out of the water was made by the research team to standardize the test and compare results across athletes. However, match duels are also highly influenced by using the upper body, which was difficult to regulate in the current study. Additionally, the initial discussions with the coaching experts revealed that many alternative testing positions would be accessible with this technology, but the research team chose a protocol that could show adequate reliability as opposed to perfect ecological validity.
The target of testing all players one week apart was maintained for most participants. This period was preferred to a shorter comparison time because the weekly training sessions differ between days but repeat on a weekly cycle. However, six players were unavailable to repeat the testing because of academic commitments, and others were not present at the training session seven days later. Therefore, some testing was done on the closest available days to the retest date. This may have led to greater bias between sessions for those participants, but the sample is too small to ascertain. Future research may control for potential differences in training load between testing sessions using player-reported measures of rating of perceived exertion (RPE) (Croteau et al., 2023; Lupo et al., 2014).
The sample comprises players from both international and national levels; however, the sex distribution between the groups is not balanced. The international players included significantly more females (p < 0.01) and were significantly older than the national level players (p < 0.01). Nevertheless, a standard resistance that differed between males and females was used to control for sex and decrease its effect on the outcomes. The participants volunteered for this study, and at the time of data collection, a greater portion of senior national team female players were present. A more balanced distribution of player levels would be optimal to evaluate the effect of player experience on these strength variables. Moreover, a larger sample could explore whether player positions show consistent differences in performance.
Finally, future research could focus on evaluating the kinematic parameters collected with video analysis to investigate the relationship with kinetic variables described in this study. This would allow comparison of these findings with studies based primarily on analysis of eggbeater movement characteristics (Platanou, 2006; Sanders, 1999b).

Conclusion
The study successfully evaluated the reliability and validity of a water polo strength test against a fixed resistance. This study is original compared to previous work thanks to the addition of more kinetic parameters derived from the analysis of the force outputs. This fixed-resistance setup provides a more ecological condition for the players to demonstrate how they would succeed in resisting the same opponent. The instruments required are portable and cost-effective. The strength variables developed to assess the outcomes showed high to very high reliability, whereas time-based variables only showed small to moderate reliability. Validity was explored through known groups, where male players demonstrated higher strength, but not superior endurance, compared to the female players. The practical applications of this testing apparatus are readily available for coaches and sport scientists working with water polo teams. Similar load cells can be attached to a weight stack, as done here, or to a resistance provided by water jugs on a pulley. The data are simple to obtain, and the parameters that were explored in this study can inform both individual progress and comparisons across players. The specific analysis of these biomechanical markers can also serve to evaluate the changes made by modifying technical execution of the eggbeater motion: changing the speed of execution, the range of motion at the hips, knees or ankles, as well as using an alternating or simultaneous kick to push the resistance.

Figure 1. Experimental setup with weight stack resistance attached to the load cell to record force from participants.
ICC = intra-class correlation, CI = confidence interval, Sig = significance level, SEM = standard error of measurement, LOA = limit of agreement, N = Newtons, s = seconds

Figure 2. Bland-Altman plot for mean peak force (values indicate the bias, upper limit of agreement, and lower limit of agreement).

Figure 3. Comparison of (A) maximum strength relative to body weight between males and females, and (B) endurance index between males and females.

Figure 4. Force recordings during the pushing task, with examples of high (A) and low (B) endurance indices.