College Policy Debate Forums
Author Topic: Possible Wake Forest experiment - please read and comment  (Read 8039 times)
glarson
Sr. Member
****
Posts: 477


« on: October 21, 2011, 10:01:07 PM »

Over the years, Wake has uniquely served the community by participating in a number of experiments designed to improve debate practice and the management of our tournaments.  I have been in conversation with Jarrod and we would like to get reactions to a potential experimental change in the way that pairings get constructed in HL rounds.

In brief, the goal of doing power-pairing historically is to accomplish the following:

1)  Teams with the same record and/or the same opportunity to clear into elimination rounds should have met essentially the same strength of opposition.  In earlier days with random preset scheduling, teams routinely complained about their brutal draw while some presumably "weaker" team benefited from a cakewalk.  Having presets based on ABCD designations helps balance the "random" rounds.  Having teams with the same record meet each other during power pairing takes a step towards equalizing strength of opposition and helps everyone "find their level."  The move to use opposition records to determine pull-ups to balance brackets took another step toward this goal by "punishing" the team(s) that, up until that point in time, had the weakest opposition record(s).  But there are still several anomalies.  First, "opposition records" are a relatively blunt tool for distinguishing strength of opposition since they have limited range and discrimination.  Second, balancing brackets with opposition records is too "time-bound."  The team that is pulled up based on three or four rounds' worth of records might not have actually been disadvantaged based on the subsequent performance of the team they meet, while many debates that are "straight up" might, in fact, be significant mismatches based on subsequent performance.  This anomaly is particularly problematic in tournaments (like the NDT) that stipulate that a team can only be pulled up once until all other eligible teams in the bracket have also been pulled up.  A pull-up in an early round (which actually might not even create a mismatch) can prove to be the most advantageous thing that could happen since it vaccinates against future pull-ups.

2)  Power-pairing is also designed to help provide a rational basis for seeding in elimination rounds.  But we worry both about whether points are sufficiently reliable to achieve that task and about having the "powers" hit too early or too late in prelim rounds.  We typically have 1 HH round to achieve the former but worry about whether it influences the latter by determining elim sides in critical debates and by impacting the bracket.

3)  Finally, as we’ve moved to an increased reliance on HL pairings, we believe that power-pairing should “protect” the best teams by rewarding them for getting more points.  Of course, this gives perhaps too much weight to early rounds in a tournament and also might give too much weight to points, particularly idiosyncratic points.

While we’ve agonized about the anomalies noted above, the basic pairing strategies in policy debate have not changed much in 25 years.  As noted above, the only real substantive changes have been the use of opposition records to determine who gets pulled up and the gradual elimination of HH rounds so that most tournaments now have 0 or more likely 1.

There are two solutions that I am proposing for Wake HL powered rounds.  The first is sufficiently mundane that I anticipate that we will adopt it unless someone identifies an issue that hasn’t yet been considered.  The second represents a bigger change.  As a result, Jarrod would like a vigorous discussion in the next two weeks that will result in a recommendation.

PROPOSAL #1:  The simplest and by far least controversial change is to modify the way we calculate “strength of opposition.”  At present we almost invariably use opposition record, which as noted above is a fairly blunt tool with too little range or discrimination.
I would propose that we calculate the "strength of opposition" as the average seed of the opponents in a team's prior rounds (the current seed as opposed to the seed at the time they met).  As an example, at Kentucky going into round 5 there were 11 undefeated teams, meaning that one team had to be advanced into that bracket.  There were 37 3-1 teams, with opp wins ranging from 6 to 11.  Fortunately, the math appeared simple since in an odd round only one team has to be pulled up and since there was exactly 1 team with 6 opp wins.  But when calculating each team's average current opponent seed across its 4 rounds to date, that team with 6 opp wins did NOT appear to have faced the weakest opposition – by quite some margin (an average seed of 85 as opposed to a 7 opp win team with an average seed of 100).  In situations where multiple teams need to be pulled up in even rounds, the anomalies can be even greater.  So I propose that we pull up teams based on the more complete strength of opposition calculation as opposed to simple opp record; a rough sketch of the calculation appears below.
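For concreteness, here is a minimal sketch of what that calculation might look like.  The data structures (current_seed, prior_opponents) are hypothetical placeholders rather than fields from any existing tab program.

```python
from statistics import mean

# current_seed: dict mapping each team to its seed after the just-completed round
# prior_opponents: dict mapping each team to the list of opponents it has already met
# (both names are illustrative placeholders, not taken from any tab software)

def strength_of_opposition(team, prior_opponents, current_seed):
    """Average *current* seed of a team's prior opponents (lower = tougher schedule)."""
    return mean(current_seed[opp] for opp in prior_opponents[team])

def choose_pull_up(lower_bracket, prior_opponents, current_seed):
    """Pull up the team whose schedule has been weakest so far, i.e., the team
    whose opponents' average current seed is numerically highest."""
    return max(lower_bracket,
               key=lambda t: strength_of_opposition(t, prior_opponents, current_seed))
```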

PROPOSAL #2:   Changing the procedure for pulling up teams doesn’t address all of the issues that we confront, however, particularly in the construction of a HL bracket.  As noted, points are a valuable but imperfect indicator that can either reward or punish in HL rounds.  If I get unusually high points for whatever reason, I am rewarded by consistently facing weaker competition.  More critically, if I get unusually low points, I am punished by facing consistently stronger competition.  And while the HL mechanism arguably enhances the third goal of power pairing above, it doesn’t really do anything to achieve the arguably more important goal of trying to ensure an equal strength of opposition within the cohort of teams that have the same record or who have the same possibilities of breaking.

At Kentucky, we had significant evidence that we are not doing a good job of equalizing strength of opposition.  There are at least a couple of ways of measuring this.  First, we can measure the range between the highest average strength of opposition and the lowest average strength of opposition within each WL bracket at the end of the tournament.  Just considering the 4-, 5-, and 6-win brackets, arguably the most important in determining seed order and who clears, we had the following results at Kentucky:

6 wins – toughest schedule: average opponent seed 29 (1st overall); weakest: 78.5 (83rd overall)
5 wins – toughest: 36.8 (6th overall); weakest: 86.1 (103rd overall)
4 wins – toughest: 33 (3rd overall); weakest: 107.6 (139th overall)

To explain the numbers, the above means that within the 4-win bracket, the least lucky team met 8 opponents with an average seed of 33 (the third strongest schedule overall) while the luckiest team met 8 opponents with an average seed of 107.6 (the 139th strongest schedule overall).  Needless to say, we can't say that power pairing produced an outcome where teams with the same record had even remotely the same strength of opposition.  To make the point more poignant, the 4-4 team that had the 3rd strongest schedule would have cleared had they had 5 wins.  Their fate can be contrasted with the 5-win team that, in fact, cleared having had the 103rd strongest schedule.  (A rough sketch of how this spread can be computed appears below.)
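The spread metric is easy to compute for any tournament; the sketch below uses hypothetical inputs (a wins map and a precomputed strength-of-opposition map), not data structures from an actual tab program.  Smaller spreads within a bracket would indicate better equalization.

```python
from collections import defaultdict

def sop_spread_by_bracket(teams, wins, sop):
    """wins: dict team -> number of wins; sop: dict team -> average current seed
    of that team's opponents.  Returns, per win bracket, the toughest schedule,
    the weakest schedule, and the spread between them."""
    by_bracket = defaultdict(list)
    for team in teams:
        by_bracket[wins[team]].append(sop[team])
    return {w: (min(v), max(v), max(v) - min(v)) for w, v in by_bracket.items()}
```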

Given this anomaly, several times over the last 15 years folks have proposed to me that we should do power pairing based on opposition records rather than based on speaker points.  But as usually conceived, this doesn't achieve the stated objective.  The problem is immediately apparent.  If the top of the bracket is defined as the team with the highest prior "strength of opposition," that team should face a weaker team, not simply the team with the lowest "strength of opposition."  The team with the lowest strength of opposition may not be a WEAK team at all and as a result might not do anything to equalize the first team's overall strength of opposition.  We have an apples vs. oranges problem.  But there IS a hybrid statistical solution that does an admirable job of taking into consideration both differences in strength of opposition and differences in strength, preserving the goals of HL in #3 while enhancing the balancing of strength of opposition in #1.

Our problem has always been that we consider the various components of seeding to be an ordered process as opposed to a summative process.  For instance we define the bracket as:  WL, then HL points, then 2HL points, then opp record, etc.  What we should do is create an aggregate measure that includes the quality statistics on one hand (seed order based on whatever system we choose) and quality of opposition records on the other.  We get remarkably good outcomes if we simply sum (or average) a team’s SEED as computed at the end of a round and their opponents’ average SEED at the end of that round.  We then construct a HL bracket based on that aggregate statistic.  As noted, I don’t think the first change above is at all controversial.  But this one might be.  We tend to be a pretty conservative bunch.  But that said, I’m becoming convinced that this method provides a more defensible and testable result.

So my proposal is this.  I will construct the sort order within high-low brackets based on the SUM of a team's current seed (determined by our standard procedures for sorting teams – wins, drop HL, total, etc.) and the average of their prior opponents' seeds.  This creates a statistic that is commensurable and takes into consideration BOTH the need to sort from strong to weak AND the need to sort from strong prior opposition to weak prior opposition.  (A minimal sketch follows.)
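Here is a minimal sketch of that sort, using the same hypothetical data structures as the Proposal #1 sketch.  The pairing step itself (top of the bracket vs. bottom, and so on) is unchanged; only the key used to order the bracket differs.

```python
from statistics import mean

def hybrid_key(team, current_seed, prior_opponents):
    """Sum of a team's own current seed and the average current seed of its prior
    opponents; lower values sort toward the top of the high-low bracket."""
    avg_opp_seed = mean(current_seed[opp] for opp in prior_opponents[team])
    return current_seed[team] + avg_opp_seed

def order_within_bracket(bracket, current_seed, prior_opponents):
    """Order a single win/loss bracket on the hybrid statistic; high-low pairing
    then proceeds as usual (first vs. last, second vs. second-to-last, ...)."""
    return sorted(bracket, key=lambda t: hybrid_key(t, current_seed, prior_opponents))
```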

Fortunately, the outcomes are directly testable against Kentucky, a tournament with the same distribution of pairing strategies (2 presets ABCD, HH in round five, HL in 3,4,5,6,7,8), essentially the same number of teams and comparable fields.  So while we can never have a controlled experiment, there will be a number of statistical ways to determine whether the outcome was better, worse, or no different.  Otherwise, we'll be tempted to rely solely on the most poignant anecdotes.

I’d be happy to discuss the details of implementation and any issues that folks might have with this proposal.

Thanks,

GARY
hansonjb
Full Member
***
Posts: 223



« Reply #1 on: October 22, 2011, 12:36:23 AM »

gary

as i think you know, i 100% agree with what you are talking about. for anyone who reviews the stats or runs tab, this is a serious problem with the way we have been doing power matching. some teams get a decidedly easier draw toward their final seeding versus others who can face a much, much more difficult draw.

two questions:

1. does your first proposal possibly over account for speaks itself? if a 3-1 with high points is seed 14 and 3-1 with low points is seed 48, that seems like a pretty wide disparity when those teams might not be so different (although perhaps they are).

2. have you tested your second proposal? i did some tests about 3 years ago of something quasi-similar--it was more just making the brackets looser so that teams within one loss/one win difference could be matched against each other and then paired teams based on opp records--so definitely not the same but in the ball park of what you are suggesting. i tested it out and found a 10 to 25% improvement in the final opp difficulty for the teams (as in they were more equal in opposition difficulty for teams with the same record). your approach is more nuanced as it uses seeding and so may very well and probably will do better.

oh wait, maybe your second proposal keeps the same brackets as exist now? (eg 3-1's hit 3-1's, 2-2's hit 2-2's, etc.). when i tried that, i got almost no improvement in equality of opposition difficulty--as i remember it was about 5%.
« Last Edit: October 22, 2011, 12:39:26 AM by hansonjb »

jim hanson :)
seattle u debate forensics speech rhetoric
glarson
Sr. Member
****
Posts: 477


« Reply #2 on: October 22, 2011, 07:32:41 AM »

Thanks Jim,

Let me clarify a couple of things

(snip) 1. does your first proposal possibly over account for speaks itself? if a 3-1 with high points is seed 14 and 3-1 with low points is seed 48, that seems like a pretty wide disparity when those teams might not be so different (although perhaps they are). (snip)

A couple of thoughts - first, I tested a variety of weightings between my seed and the average of my opponents' seeds.  In attempting to balance all three goals of power-pairing, I got the best results balancing them 1 to 1 (half is my performance, half is the strength of my opposition); a parameterized sketch of that weighting appears below.  This is also the easiest to describe conceptually (i.e., it makes a certain sort of sense).  You are right that the difference between seed 14 and 48 in the same bracket is pretty large, but it should be noted that for each bracket tested, the difference between the top and the bottom in seed (the point difference is smaller) is typically a little bit smaller than the disparity in the average seed of opponents (strength of opposition) between the team that has the strongest SOP and the weakest.  So when the sort occurs, differences in SOP will always be slightly more salient than differences in seeding (probably a good thing).  It should be noted that there are even more arcane ways to do the stats.  But I'm not sure they would do any better, and at the end of the day we don't want to lose inspectability altogether.  While it would take some time, the current proposal can still be reverse engineered.
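To make the weighting discussion concrete, here is a tiny sketch with a made-up parameter name (alpha).  alpha = 0.5 is the 1-to-1 balance described above; since that simply halves the sum, it produces the same sort order as summing the two seeds.

```python
def weighted_key(team, current_seed, prior_opponents, alpha=0.5):
    """alpha weights the team's own seed and (1 - alpha) weights the average seed
    of its prior opponents; alpha = 0.5 is the 1-to-1 balance discussed above."""
    opps = prior_opponents[team]
    avg_opp_seed = sum(current_seed[o] for o in opps) / len(opps)
    return alpha * current_seed[team] + (1 - alpha) * avg_opp_seed
```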

 
2. have you tested your second proposal? i did some tests about 3 years ago of something quasi-similar--it was more just making the brackets looser so that teams within one loss/one win difference could be matched against each other and then paired teams based on opp records--so definitely not the same but in the ball park of what you are suggesting. i tested it out and found a 10 to 25% improvement in the final opp difficulty for the teams (as in they were more equal in opposition difficulty for teams with the same record). your approach is more nuanced as it uses seeding and so may very well and probably will do better.

oh wait, maybe your second proposal keeps the same brackets as exist now? (eg 3-1's hit 3-1's, 2-2's hit 2-2's, etc.). when i tried that, i got almost no improvement in equality of opposition difficulty--as i remember it was about 5%.

First, let me make a quick observation about testing.  While I can do a lot of evaluation of the outcome of a completed tournament (e.g. Kentucky) and can look at what happens with alternative pairings of a single round, it is never possible to fully test an alternative in any lab other than a tournament.  Each pairing creates a set of winners and losers that creates different subsequent pairings.  There is no "controlled experiment."  But that doesn't mean that we're throwing dice.  The conceptual basis for the experiment can be well understood (the goal of this dialogue).  And there will be a number of metrics that will permit the evaluation of the outcomes.  And for what it's worth, there is very little downside risk in the experiment.  The range of anomalous things that already happen at a tournament (massive pull-ups in round 8 due to side skews, etc.) means that nobody could say that the experiment prevented the outcome that SHOULD have happened (namely, me clearing).

Regarding the last paragraph.  Yes, the proposal will still continue to pair within WL brackets.  I've actually tested an even more radical alternative that loosens the requirement that teams have the same record (we already have a lot of pull-ups).  But those experiments created worse rather than better results.  And our goal is not that everyone in the tournament have more equal strength of opposition but rather that teams with the same record or opportunity to clear should have similar strength of opposition.  In fact, at the end of the day, an absolutely ideal scenario would be a very high correlation between a team's own seed and the average seed of its opponents.  The top teams progressively face the toughest opposition.  (That correlation is itself measurable; a sketch appears below.)
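One way to test that claim after the tournament is a plain correlation between each team's final seed and the average final seed of its opponents.  The sketch below is illustrative only; the input maps are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def seed_vs_sop_correlation(teams, final_seed, sop):
    """Correlation between each team's final seed and the average final seed of
    its opponents; values near 1 mean the best teams faced the toughest slates."""
    return pearson([final_seed[t] for t in teams], [sop[t] for t in teams])
```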

jkwarden
Newbie
*
Posts: 4


« Reply #3 on: October 24, 2011, 11:40:40 AM »

I love this idea
jgonzo
Jr. Member
**
Posts: 81


« Reply #4 on: October 27, 2011, 02:22:09 PM »

Gary,

as I understand it, your proposal #2 creates a new measure of seeding within a given bracket. Brackets would still be based on total wins, but instead of relying on high-low points, a summative sort would be conducted based on traditional seeding (h/l points, etc) and the average opponent's seed.

For those of us with a more sports-based statistical background, I get the sense that you are proposing something akin to the OPS stat in baseball, which suggests that rather than measuring a player's effectiveness by batting average, then RBIs, then home runs, it is better to use a stat that combines on-base percentage AND slugging percentage. Similarly, in the case at hand, you are creating a measure of a team's success AND the relative strength of opposition that they have faced....

Assuming I'm not totally misunderstanding this, it seems to make perfect sense. My prediction is that most people will find the idea very enticing and great, until they decide that they've drawn an opponent who is "too good," at which point both you and your system will be the devil incarnate.

Josh.
glarson
Sr. Member
****
Posts: 477


« Reply #5 on: October 27, 2011, 03:09:56 PM »

Your understanding is correct and the comparison to hybrid statistics like OPS in baseball is apt.

The issue we've had is that we have traditionally considered the various statistics we use to be an ordered set of tie-breakers.  We set one statistic as primary (e.g. HL points) and then only go to a second statistic (e.g. total points) when the first one is tied.  We don't ever get to strength of opposition until we've determined that there is still a tie after three or four tie-breakers.  As long as it is an ordered set we have to make either/or decisions as to which is most important, never getting to subsequent variables unless everything is still tied.  But we should be making our pairing decisions based on a both/and rather than an either/or methodology.

This is particularly true since the use of strength of opposition fails even more than speaker points if it is defined as the principal or first variable in a set of ordered tie-breakers.  The goal is not to have the team with the highest SOP face the team with the lowest SOP.  To address differences in SOP the team with a higher SOP needs to face a weaker team and a team with a lower SOP needs to face a stronger team.  This balance can only be achieved if we incorporate the strength variable and the strength of opposition variable into a hybrid measure.

The final reason for making the proposed change is to provide a more commensurable measure between quality variables on the one hand and strength of opposition variables on the other.  There is no good way to add a number like HL speaker points and a number like opposition wins together since they have a different magnitude, range, and statistical distribution.  There are some complicated statistical transformations that can be done on each to make them strictly commensurable, but it's much easier to convert the former into my seed going into the round and the latter into the average seed of my prior opponents going into the round.  (A toy sketch of the two approaches follows.)
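A toy illustration of the difference between an ordered set of tie-breakers and the summative approach.  The attribute names are invented for this sketch and do not correspond to any tab program's fields.

```python
def ordered_tiebreak_key(team):
    """Current practice: a lexicographic tuple.  A later statistic matters only
    when every earlier one is exactly tied."""
    return (-team.wins, -team.hl_points, -team.total_points, -team.opp_wins)

def summative_key(team):
    """Proposed practice: convert both sides to the same unit (ordinal seeds) and
    add them, so strength and strength of opposition can offset each other."""
    return team.current_seed + team.avg_opponent_seed
```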
mschnall
Newbie
*
Posts: 4


« Reply #6 on: November 02, 2011, 02:16:17 PM »

Gary --

Respectfully, the analogy to OPS is off-base.  Your proposed exercise does not blend statistics that are understood to measure different aspects of a team's performance (a la reaching base and hitting for power).  Instead, it blends metrics each of which is intended to reflect on the team's performance as a whole.  The more apt baseball statistical analogy for this exercise is "range factor."  Like that statistic, your exercise would attempt to blend existing metrics to create a measure more precise than its component parts.  The term used for such an exercise in the scientific community is "false precision" -- and there is a reason that, once the baseball statistical movement gained traction, range factor was quickly supplanted by defensive statistics that, while still imprecise, are at least based on better data.

You begin with the premise that pairing teams based on their wins and points at a given tournament is imprecise because it fails to take into account the strength of their opposition.  This is a premise that deserves some examination on its merits, since there still is rather broad consensus that ranking teams based on wins and points is the most appropriate metric for deciding elimination round qualification and seeding.  If it is the most precise method we have for deciding elimination seeding, why is it less appropriate for use in preliminary round seeding?

Granting the premise for the sake of discussion, however, your proposal to incorporate opponents' wins-points seeding in determining a new form of seeding for purposes of pairing within brackets does not seem likely to do much other than complicate the process of pairing so much that a human observer will have great difficulty in identifying pairing errors.

1) Fundamentally, it makes no sense to look at opponents' seeding based on wins and points if you are trying to correct an imprecision inherent in seeding teams based on wins and points.  If you believe that the team being paired may have more or fewer wins (or points) than they "should" based on their strength of opposition, there is no reason to suppose that the wins and points of the team's opponents are any more reflective of their actual strength.  It might be more effective to look at performance over the course of the season, or reputational strength.  In college sports, some computer metrics look at opponents' opponents' record -- I don't know whether people think this is effective or not, but it might be worth looking into.

2) Using seed numbers rather than opposing wins introduces a bunch of noise that arguably is not meaningful and may even run counter to the ultimate goal.  (This, by the way, is an argument against both of your proposals.)  Yes, there may be a significant difference in strength between the strongest team in a bracket and the least strong.  But your basic premise, if true, suggests that wins-points seeding may not be the best way to evaluate that difference even within a bracket.  More importantly, if uneven strength of opposition distorts win numbers, then there may be a significant degree of *inversion* within the seed numbers.  If, for instance, the top 3-3 teams after 6 rounds are much better than many of the 4-2 teams, then you may have a number of teams seeded in the low 50s (top 3-3s in a 150-team tournament) that should be in the mid 20s (mid-high 4-2s), while other teams in the high 40s/low 50s (bottom 4-2s) should really be in the mid 70s (middle 3-3s).  Because of the way binomial distributions work, it seems to me that those discrepancies would tend to be larger than discrepancies in the number of wins, particularly in the case of teams in the middle of the bracket where a late-round pairing decision could affect qualification for elims. 

3) The negative effects of introducing inaccurate strength of opposition data are likely to be magnified if the procedure is employed early in a tournament, when differences in number of wins result in larger skews in seeding.  This is particularly true in the case of pullups, where the strength of opposition data is skewed by differing numbers of opposition wins against the teams in the same bracket. 

4) Your proposal also assumes that seed numbers are additive in a meaningful sense -- i.e., that the *average* of opposing seeds faced by two teams is an apt comparison.  This probably is not true for most teams, although the most compelling counterexamples will vary from team to team.  A team on the border of clearing would much rather face three other teams on the border of clearing (e.g. around the 66th percentile) than face the worst team in the tournament followed by the top two teams, even though the average of the ordinal seeds would be the same.  A team around the 33rd percentile might prefer to face the top team and the worst two teams, rather than facing three teams of similar strength (and might learn the most by facing three teams skewed somewhat around its own level).  Average strength of opposition often will not correlate closely with competitive equity.

5) The goal of correcting imbalances in strength of opposition is fundamentally at odds with pairing within brackets.  If you really wanted teams to be paired with a balanced slate of other teams across the course of the tournament, you would not have your undefeated (or winless) teams keep debating each other.  There is little reason to expect that pairing teams within brackets based on strength of opposition will end up equalizing the strength of opposition by any objective measure.

The last two practical observations suggest that we might also want to question your underlying assumption that equalizing strength of opposition is a primary goal of power pairing.  Perhaps the goal is instead to ensure that teams in contention for elimination rounds have an adequate opportunity to debate one another and determine their own destiny.  Perhaps power pairing also serves the educational goal of giving teams a critical mass of debates against opponents of comparable skill levels.  Alternatively, if the community thinks that equalizing strength of opposition -- or giving teams of all levels an equal opportunity to debate one another -- is an important goal, the best approach may not be to tweak a power-matching system that is fundamentally at odds with that goal.  A better approach is probably to pair a greater number of rounds on a preset basis using some form of strength ranking. 

Finally, one factual error that might point to a bug in your strength-of-opposition computation: The 4-4 team at Kentucky that I am pretty sure you are referring to as your "poignant" example did not in fact have a strength-of-opposition of 33 -- it was closer to 50.  The computational error, assuming I am looking at the correct example, was in disregarding the second round pairing against a team that withdrew before the tournament was completed, rather than assigning a seeding to that opponent based on its performance prior to withdrawing.

-- Matt
glarson
Sr. Member
****
Posts: 477


« Reply #7 on: November 02, 2011, 05:55:13 PM »

Matt,  I appreciate the respectful critique.  I will do my best to answer in kind.  My objective all along has been to generate a meaningful discussion regarding our tournament practice.  I’ll answer your objections in context.

<<Respectfully, the analogy to OPS is off-base.  Your proposed exercise does not blend statistics that are understood to measure different aspects of a team's performance (a la reaching base and hitting for power).  Instead, it blends metrics each of which is intended to reflect on the team's performance as a whole.  The more apt baseball statistical analogy for this exercise is "range factor."  Like that statistic, your exercise would attempt to blend existing metrics to create a measure more precise than its component parts.  The term used for such an exercise in the scientific community is "false precision" -- and there is a reason that, once the baseball statistical movement gained traction, range factor was quickly supplanted by defensive statistics that, while still imprecise, are at least based on better data.>>

I'll begin by noting that the analogy to OPS is Josh's, not mine.  I just agreed it was apt.  But I don't want to push that too far.  My analogy is primarily based on the fact that current practice treats all of our various statistics as an ordered set of tie-breakers.  The derived statistic that I'm proposing is not an attempt to gain false precision but rather to merge two formally dissimilar metrics to resolve one singular issue.  We could say that baseball players should be ranked by OBP and that ties should be broken by slugging pct.  But for good or ill, that's not what OPS does.  It attempts to add them together to create a metric where strength in one dimension can offset weakness in another.  But I'm not saying my statistic is like OPS as an attempt to better rank debate teams.  I'm just saying that it is a derived statistic.

<<You begin with the premise that pairing teams based on their wins and points at a given tournament is imprecise because it fails to take into account the strength of their opposition.  This is a premise that deserves some examination on its merits, since there still is rather broad consensus that ranking teams based on wins and points is the most appropriate metric for deciding elimination round qualification and seeding.  If it is the most precise method we have for deciding elimination seeding, why is it less appropriate for use in preliminary round seeding?>>

This is the nub of the argument and perhaps a place where I haven't been clear.  I'm NOT saying that my hybrid seed + average seed of opposition is a better statistic for rank ordering the teams in a tournament (at least not in this experiment).  As you note, both my seed and the average seed of my prior opponents are based on the traditional sort orders we use to determine seed now.  I'm just saying that IF (and this is the IF that you challenge below) the principal (or one of the principal) goals of power pairing is to "equalize quality of opposition for teams with the same W/L record and/or the same opportunity to clear," then pairing H/L based on seed order within brackets fails to meet that objective – indeed it isn't designed to attempt to maximize that objective.  As I noted in my original post, there are a few practices that do help to equalize strength of opposition.  It is the reason that we pre-rank teams to create balanced presets.  We have teams face opponents with the same numbers of wins and losses (except for pull-ups).  This "tends" to equalize strength of opposition but not entirely.  We pull up teams based on opp wins, a move that was explicitly adopted to balance strength of opposition but which has been a relatively blunt tool for accomplishing that.

<<Granting the premise for the sake of discussion, however, your proposal to incorporate opponents' wins-points seeding in determining a new form of seeding for purposes of pairing within brackets does not seem likely to do much other than complicate the process of pairing so much that a human observer will have great difficulty in identifying pairing errors.>>

I'm NOT saying that it is a better way to seed or a better metric to ensure that the best team in a bracket faces the worst in the bracket.  I'm saying that there might not be a good reason to presume that our ultimate goal should be to have the best team in the bracket face the worst, BUT rather that one of our primary objectives should be to use pairing within the bracket to help us equalize strength of opposition, particularly since who I face within the bracket will impact my ability to clear.  So why not simply propose that we sort based on strength of opposition?  Unfortunately, this doesn't help at all.  IF the goal is to equalize strength of opposition, then the team with a stronger prior strength of opposition should face a "weaker" team as opposed to a team with a "weaker" prior strength of opposition.  The hybrid statistic allows the system to balance both considerations that might otherwise be in tension.

<<1) Fundamentally, it makes no sense to look at opponents' seeding based on wins and points if you are trying to correct an imprecision inherent in seeding teams based on wins and points.  If you believe that the team being paired may have more or fewer wins (or points) than they "should" based on their strength of opposition, there is no reason to suppose that the wins and points of the team's opponents are any more reflective of their actual strength.  It might be more effective to look at performance over the course of the season, or reputational strength.  In college sports, some computer metrics look at opponents' opponents' record -- I don't know whether people think this is effective or not, but it might be worth looking into.>>

Once again, I’m not trying to correct an imprecision.  I’m trying to introduce the equalization of strength of opposition into the pairing calculus.  If that’s not an objective, then we don’t need to do this.

<<2) Using seed numbers rather than opposing wins introduces a bunch of noise that arguably is not meaningful and may even run counter to the ultimate goal.  (This, by the way, is an argument against both of your proposals.)  Yes, there may be a significant difference in strength between the strongest team in a bracket and the least strong.  But your basic premise, if true, suggests that wins-points seeding may not be the best way to evaluate that difference even within a bracket.  More importantly, if uneven strength of opposition distorts win numbers, then there may be a significant degree of *inversion* within the seed numbers.  If, for instance, the top 3-3 teams after 6 rounds are much better than many of the 4-2 teams, then you may have a number of teams seeded in the low 50s (top 3-3s in a 150-team tournament) that should be in the mid 20s (mid-high 4-2s), while other teams in the high 40s/low 50s (bottom 4-2s) should really be in the mid 70s (middle 3-3s).  Because of the way binomial distributions work, it seems to me that those discrepancies would tend to be larger than discrepancies in the number of wins, particularly in the case of teams in the middle of the bracket where a late-round pairing decision could affect qualification for elims.>>

This is a very important critique and one that I've attempted to model in a variety of ways.  Actually, if, as you say, points are a better predictor than wins, then opp records based solely on wins are even worse than I suggested they might be.  But as you say, we have traditionally committed ourselves to a rank ordering that places wins above points in all cases.  I'm not critiquing that assumption in any case.  This isn't about whether our current sort order for determining seeding is the best.  I'm essentially assuming that it is.  So how do I account for my opponent's strength?  I use the same metric (admitting its potential flaws).  Converting my seed to an ordinal rank and doing the same for my opponents creates a commensurability for the statistics that is impossible in any of our sort-order systems.  Since the sample of data increases over time, the noise actually appears to be reduced.  But this is an empirical question that can be evaluated.  I will supply all of the statistics after the tournament for open comparison between Kentucky and Wake.

<<3) The negative effects of introducing inaccurate strength of opposition data are likely to be magnified if the procedure is employed early in a tournament, when differences in number of wins result in larger skews in seeding.  This is particularly true in the case of pullups, where the strength of opposition data is skewed by differing numbers of opposition wins against the teams in the same bracket. >>

This will be one of my tests.  Truth be told, all current systems produce “skews” in early rounds where one team in a 2-0 bracket may face a team that will ultimately go 2-6 while another will face a team that will go undefeated.  

<<4) Your proposal also assumes that seed numbers are additive in a meaningful sense -- i.e., that the *average* of opposing seeds faced by two teams is an apt comparison.  This probably is not true for most teams, although the most compelling counterexamples will vary from team to team.  A team on the border of clearing would much rather face three other teams on the border of clearing (e.g. around the 66th percentile) than face the worst team in the tournament followed by the top two teams, even though the average of the ordinal seeds would be the same.  A team around the 33rd percentile might prefer to face the top team and the worst two teams, rather than facing three teams of similar strength (and might learn the most by facing three teams skewed somewhat around its own level).  Average strength of opposition often will not correlate closely with competitive equity.>>

Given that it is an iterative process, I’m not sure that I would agree with the last sentence, though I’ll be open to discuss outcomes after the tournament is completed.

<<5) The goal of correcting imbalances in strength of opposition is fundamentally at odds with pairing within brackets.  If you really wanted teams to be paired with a balanced slate of other teams across the course of the tournament, you would not have your undefeated (or winless) teams keep debating each other.  There is little reason to expect that pairing teams within brackets based on strength of opposition will end up equalizing the strength of opposition by any objective measure.>>

This is important.  We are not saying that we need to balance strength of opposition equally for all participants in the tournament.  We do have some additional objectives.  We do want to create the best possible final seed order by having the better teams battle it out.  But how we balance this is an open question.  I've played with a more radical proposal that eliminates the notion that teams should debate in W/L brackets.  I've imagined two brackets, one for teams that mathematically can clear and one for teams that mathematically can't.  While intriguing, I don't think we're prepared to go that far.  What I can say is that teams at a tournament like Wake who are in the 4- or 5-win bracket should, as much as possible, be able to say: I got to this point by facing roughly the same strength of opposition as the other teams that have gotten here, AND if I've had a tougher road to my break round, I should be rewarded by facing a somewhat weaker team than if I'd had a cakewalk (even if my points don't entirely confirm it).

<<The last two practical observations suggest that we might also want to question your underlying assumption that equalizing strength of opposition is a primary goal of power pairing.  Perhaps the goal is instead to ensure that teams in contention for elimination rounds have an adequate opportunity to debate one another and determine their own destiny.  Perhaps power pairing also serves the educational goal of giving teams a critical mass of debates against opponents of comparable skill levels.  Alternatively, if the community thinks that equalizing strength of opposition -- or giving teams of all levels an equal opportunity to debate one another -- is an important goal, the best approach may not be to tweak a power-matching system that is fundamentally at odds with that goal.  A better approach is probably to pair a greater number of rounds on a preset basis using some form of strength ranking.>>

This IS the core discussion that I want us to have.  Having been involved in competitive debate for 45 years, I think that equalizing strength of opposition has been and should be one of our principal objectives.  But I'm not discounting other objectives.  And for what it's worth, having more presets creates problems of its own, particularly at tournaments where teams share relatively few common opponents or tournaments on which to base comprehensive rankings.  And as I said, I don't think that we are saying that we want to equalize strength of opposition across the board but rather that we should consider it within the matrix of how we do pairings within win/loss brackets.

<<Finally, one factual error that might point to a bug in your strength-of-opposition computation: The 4-4 team at Kentucky that I am pretty sure you are referring to as your "poignant" example did not in fact have a strength-of-opposition of 33 -- it was closer to 50.  The computational error, assuming I am looking at the correct example, was in disregarding the second round pairing against a team that withdrew before the tournament was completed, rather than assigning a seeding to that opponent based on its performance prior to withdrawing.>>

Good observation.  In point of fact, within hours of the original posting, I double-checked all of the math and changed the computation as a result.  Just as we have anomalies when teams have had early byes, we do have cases where a team might withdraw.  There are two possible solutions and I'm prepared to do either.  The first option is to treat it just like a bye and to average the strength of opposition across all of the other rounds.  In that case, the strength of opposition I believe would have been about 37 (47 if we assume the seed the team held when it withdrew), not quite as poignant but still a huge difference.  The other option would be to freeze a withdrawn team at the seed it held when it withdrew.  Since most withdrawals are due to illness, it isn't entirely predictable where a team will be when it withdraws or where it would have ended up.  So either solution creates a potential anomaly, but not really worse than the ones that we already get with byes and severe side skews that cause multiple pull-ups.  (Both options are sketched below.)  One of the things we need to avoid is a belief that this or any current pairing algorithm has some sort of scientific validity.  Rather than adopt a false precision, I'm always reminding folks that there isn't a "right" alternative that all of our approximations are either coming closer to or farther away from.  And since any pairing produces a set of outcomes that impacts all subsequent pairings, the results can't be measured in a computer simulation.  They have to be done in an actual tournament.  For that reason, I applaud Wake for considering this option and welcome critique from all participants when it is done.
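For clarity, both withdrawal options can be expressed in one small helper.  The function and argument names are hypothetical, and the choice between the two options is exactly the judgment call described above.

```python
def sop_with_withdrawal(opponent_seeds, frozen_seed=None):
    """opponent_seeds: one entry per prior round, the opponent's current seed,
    with None marking an opponent that withdrew.  If frozen_seed is None, the
    round is treated like a bye and dropped from the average (option 1);
    otherwise the withdrawn opponent counts at the frozen seed it held when it
    withdrew (option 2)."""
    seeds = [frozen_seed if s is None else s for s in opponent_seeds]
    seeds = [s for s in seeds if s is not None]
    return sum(seeds) / len(seeds)
```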

GARY
« Last Edit: November 02, 2011, 06:21:39 PM by glarson »
mschnall
Newbie
*
Posts: 4


« Reply #8 on: November 03, 2011, 12:06:44 PM »

Gary --

This sounds like a fair response on a point by point basis, but I am having trouble seeing how your experiment makes sense in light of the responses.  Mightn’t you be better off, prior to any experiment, having the community discussion that you say you want over the subject of whether, to what extent and in what form there ought to be an objective of equalizing strength of opposition?  You could also discuss how the “strength” of a given team ought to be measured.

1) Your approach continues to assume that wins and points within a tournament or a portion of a tournament is an appropriate measure of “strength” for purposes of calculating strength of opposition.  I question that assumption.  I thought that you did as well, based on your comments about anomalous speaker points, particularly early in a tournament.  Given that you now say you are not questioning the traditional seeding order as a measure of strength (at least for purposes of this exercise), let me make my critique more clear:

a) Wins and points at most measure performance within a given tournament and not overall strength.  Strong teams have bad debates; weak teams have good debates; some teams debate better against certain opponents or arguments or on a particular side of the topic.  The strongest team does not always win and most people would agree that is a good thing.

b) Wins and points do not necessarily measure quality of performance even within a given tournament, because, as you observe, teams have not necessarily faced opponents of the same strength.  Trying to ensure that teams face opponents of the same strength based on a measure that does not accurately reflect opponents’ strength, and then trying to assess the success of the experiment by looking at the same flawed measure sounds like a bad case of garbage in, gospel out. 

c) As further described below, the pairing system is designed to cause different teams to face opponents of differing strength.  Tinkering with power pairing therefore will not (and should not) allow you to accurately measure strength based on records.

This is why I say that you are creating false precision, and the main reason why I would suggest any “improvement” or “regression” that you identify from the type of data you are gathering will be illusory.

2) An independent flaw in your metric arises from the attempt to average opponent strengths that are not additive in a meaningful sense. 

Balancing strength of opposition is not an end in itself -- it is a goal because we think it promotes multiple values: competitive equity, equal access (in the sense that teams have a comparable opportunity to debate teams across the competitive spectrum) and education (because there are things to be learned from debating opponents at all levels).  As I pointed out, however, the distribution of opponent strengths can be just as important to achieving those ultimate goals as is the “average.”  Facing three borderline elim teams is not equivalent to facing the top two teams in the tournament plus the worst one.  That is why the portion of the pairing process that focuses on balancing opponent strength -- i.e., the assignment of preset opponents -- typically divides teams into bands.  This helps to level out distribution as well as average strength. 

As I understand your position, you do not quarrel with any of this in principle, although you express some skepticism as to the frequency with which distributional effects would have a material effect on competitive equity.  I think, however, that you underestimate the degree to which this flaw undermines not only the mechanics of measuring and trying to equalize opponent strength, but even the basic premise that there is a problem in the first place.  You may not be looking at the most relevant data.

3) You say that you think equalizing strength of opposition should be one of the principal objectives of a pairing system.  But you do not seriously answer the point that it is not and cannot be an objective of power pairing because power pairing is fundamentally at odds with equalizing strength of opposition.  That is particularly so if we equate strength of opposition (as your proposed algorithm does) with performance within the tournament.

a) You have things precisely backwards when you say that power pairing “‘tends’ to equalize strength of opposition.”  The basic tenet of power pairing is that teams with better records (i.e., by your measure, stronger teams) should debate one another and teams with worse records (weaker teams) should debate one another.  Structurally, power pairing creates differences in strength of opposition, not equality. 

b) A couple of times you have come close to saying that the goal should be balancing strength of opposition within brackets or within some other undefined subgroups within the tournament, not balancing strength overall.  This is problematic on its own terms but even if the community were to accept such a goal, power pairing still would push in the opposite direction.

(i) To the extent there is a goal of balancing strength of opposition, altering that goal to create separate “classes of citizens” within a tournament ought to be controversial. 

* It seems vulnerable to critique on grounds of elitism. 
* It focuses single-mindedly on one of the three purposes of balanced opposition (competitive equity) while undermining the other two (equal access and education). 
* The definition of the appropriate subgroups is problematic both from a principled point of view and a practical one.  I’ll let you tangle with the principles and just observe practically that when you try to break teams down into groups statistically you will be stymied by the fact that a team’s statistically predicted performance can change dramatically over the course of a tournament. 

(ii) As a structural matter, moreover, power pairing is still at odds even with such a modified conception of balancing strength of opposition.  Let me illustrate with a simple example, assuming four preset debates and four power-matched debates. 

Since you at one point referred to “an equal strength of opposition within the cohort of teams that have the same record or who have the same possibilities of breaking,” I assume you will agree that any reasonable set of “classes” of teams would place two 5-3 teams that are near the borderline for elimination rounds in the same class.  If, however, one of those teams began the tournament 0-3, winning its next five debates, while the other team began 5-0 and lost the last three debates, power pairing however defined will by design cause the first team to debate a weaker slate of opponents than the second team. 

We can presume that the opponents faced by the two teams in the four preset rounds are balanced, since that is actually a principal goal of a preset schedule.  Focusing then on the latter four rounds, power pairing will have required the first team to debate opponents in the 1-3, 2-3, 3-3 and 4-3 brackets, while the second team will have debated opponents in the 4-0, 5-0, 5-1 and 5-2 brackets.  By design, power pairing creates a wide discrepancy in the strength of opposition.

c) Trying to incorporate into the power pairing system a strength-equalization goal that is at odds with the purpose and design of the power pairing process does not “balance” strength-equalization with the actual goals of power pairing.  Instead, it produces a system that will no longer predictably accomplish any of those goals. 

In your proposed system, a team with high points and low strength of opposition is equally likely to debate (1) another similarly situated team (accomplishing the goal of equalizing strength of opposition but defeating the goal of high-low pairing), (2) a team with low points and high strength of opposition (accomplishing the goal of high-low pairing but aggravating the differences in strength of opposition), or (3) a team with medium points and medium strength of opposition (accomplishing neither goal).  And the disparities become even worse if you bring into the mix a team with high points and low strength of opposition that is pulled up from a lower bracket and therefore has the lowest ordinal seeding.

4) Trying to force the power pairing process to serve a function for which it is not, and cannot effectively be, designed strikes me as an unwise use of resources.  It seems to me far better to acknowledge that there are different and competing goals in the pairing process -- (A) equalizing strength of opposition for all teams, (B) providing teams with a critical mass of debates against teams of a comparable skill level to their own, (C) allowing teams in contention for elimination rounds the opportunity to debate one another and determine their own destiny -- and that you have tools available to accomplish each of those goals.  Each tournament director could then focus on how best to balance those competing goals and which tools to employ to that end.

-- Matt
glarson
Sr. Member
****
Posts: 477


« Reply #9 on: November 03, 2011, 10:53:58 PM »

I knew when I decided to start this thread that it would be inevitable that I would find an interlocutor who would write a thoughtful critique with even longer e-mails than I can typically muster, and that the discussion would happen at a time when I would simply be unable to answer in a complete or timely fashion due to my extremely busy administrative schedule.  While I do believe that Matt's critique can be answered and deserves to be answered, I'm representing Wheaton at a conference with a very busy agenda.  As a result, I won't be able to get back to this discussion until Sunday at the earliest.

But one charge does need to be addressed more immediately.  Matt asks why I would do this research at Wake BEFORE having a discussion with the community as to whether it is an appropriate objective to balance the strength of opposition among teams that have the same win-loss record and/or who are in a similar position with respect to their opportunity to clear.  It is precisely for that reason that I started this thread two weeks before the decision was made to go ahead and conduct the test.

I apologize for not being able to give a more complete defense at this time.  I'm still not sure that Matt and I aren't arguing past each other on several of the key arguments.

GARY
mschnall
Newbie
*
Posts: 4


« Reply #10 on: November 04, 2011, 09:58:29 AM »

Gary --

You certainly will not offend me by replying at your convenience.

Also, to be clear, I was not leveling any "charge" about the timing of your experiment, just making a suggestion that some more discussion might be worthwhile under the circumstances. 

Your original post invited discussion of two specific proposed changes.  It did not explicitly invite discussion of why we value the objective of equalizing strength of opposition, what we think that objective means (e.g., overall or within subgroups), or how it is properly balanced against competing goals such as pairing similarly-skilled teams together for debates and having the teams in contention for elimination rounds debate one another.  And, at least in this forum, it has not resulted in any such discussion with the exception of my most recent post. 

It is obviously up to you to determine whether such a discussion would be helpful in deciding how to define the "problem," how best to tinker with the pairing process to address that problem and how to define success or failure. 

-- Matt
Hester
Full Member
***
Posts: 156


« Reply #11 on: November 06, 2011, 07:39:34 AM »

"our goal is not that everyone in the tournament have more equal strength of opposition but rather that teams with the same record or opportunity to clear should have similar strength of opposition"

Gary,

when you get a chance, can you explain the inherency for the above goal? i.e., how skewed or disparate have recent tournaments been in terms of teams with the same record or opportunity to clear having similar strength of opposition?

i'm fine with you trying your formula at Wake - experimentation has always been one of the cool things about the Shirley.

i'm just ignorant as to whether this is a problem that needs solving or just a calculative itch that wants some scratching.
Agalol
Newbie
*
Posts: 22


« Reply #12 on: November 06, 2011, 03:53:42 PM »

As a debater who typically does not do well on speaks, but who has been on the short list of teams not breaking for the last three years (to a varying degree depending on the year), I support the experimentation at least. It is frustrating to hit a round robin team that obliterates you in round 1 or 2, hurting already mediocre speaks, and have it matter no more to how rounds are paired than if we had debated the lowest seed at the tournament. This is anecdotal, naturally, and while this may not make a large difference (we would need to test run it to see), it seems to add a bit more fairness to a system that sometimes can be unforgiving if you are only in the 28.3 average range of speaks with a partner a bit below.
glarson
Sr. Member
****
Posts: 477


« Reply #13 on: November 06, 2011, 05:56:09 PM »

Matt,
I accept the challenge to discuss whether equalizing strength of opposition should or shouldn't be an objective of tournament pairings and, if so, which procedures enhance and which detract from that objective.
I've actually accepted, for essentially all of my career in the activity, that equalizing strength of opposition is one of our key objectives, but with one caveat that I trust I've made clear from the beginning – it is beneficial to attempt to equalize strength of opposition "among teams with the same record and/or with a similar opportunity to clear at a tournament."  The latter is most important when tournaments like Kentucky, Wake, Northwestern and the like are big enough that not all 5-win teams are able to clear.  I trust I will make clear the importance of this version of equalization "within subgroups," as you describe it, as we continue.
I'm open to being convinced that this version of equalization should not be an objective, but I've never really heard many counterarguments over the years.  The real question has always been whether we could hope to accomplish it.

<<<a) Wins and points at most measure performance within a given tournament and not overall strength.  Strong teams have bad debates; weak teams have good debates; some teams debate better against certain opponents or arguments or on a particular side of the topic.  The strongest team does not always win and most people would agree that is a good thing.>>> 
I agree with a portion of this argument.  But current pairing strategies do attempt to look at performance prior to the tournament as well as performance within the tournament.  Preset rounds are based on prior performance with the goal of equalizing strength of opposition as best we can.  This prepares us for, but is quickly supplanted by, performance within the tournament as the basis for seeding, determining opponents, and ultimately who will clear.  Truth be told, as many questions as we might have about whether wins and points accurately measure relative performance within a tournament, there are even bigger problems in comparing the relative strength of teams based on prior tournaments, particularly in fall semester tournaments.  While tournament hosts often spend a lot of time on pre-ranking and now have excellent resources with debateresults, pre-rankings often don't correlate well with actual tournament performance.  Teams have different numbers of rounds, attend different tournaments from different regions, and have very few head-to-head matchups.  It's not terribly unusual that a C- team ends up clearing or that an A+ team doesn't.  Sometimes the issue is that the pre-rankings aren't sufficiently accurate given that none of us are omniscient.  Sometimes the issue is that the ranking bands at large tournaments are still too large to genuinely equalize competition.  And sometimes (most of the time), teams improve and do better than we predict OR fail to improve and have bad tournaments (not just bad rounds).  So I think the best of both worlds is to make presets as accurate as possible based on good pre-rankings and then to use the best procedures available to meet our pairing objectives in powered rounds.
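To make the banding idea concrete, here is a minimal Python sketch (purely illustrative, not the actual tab software) of sorting a pre-ranked field into four bands, labeled A-D here only for illustration, so that a team's preset opponents can be drawn from across the bands:

[code]
# Hypothetical sketch: assign bands by pre-ranking quartile.
def band_teams(preranked_teams):
    """preranked_teams: list of team names ordered best to worst."""
    n = len(preranked_teams)
    bands = {}
    for i, team in enumerate(preranked_teams):
        bands[team] = "ABCD"[min(3, i * 4 // n)]  # quartile 0-3 -> band A-D
    return bands

# With 8 teams, the top two land in band A, the next two in band B, and so on.
print(band_teams(["Team %d" % k for k in range(1, 9)]))
[/code]

The point is only that presets equalize opposition as well as the pre-ranking allows; the powered rounds then take over.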

<<<b) Wins and points do not necessarily measure quality of performance even within a given tournament, because, as you observe, teams have not necessarily faced opponents of the same strength.  Trying to ensure that teams face opponents of the same strength based on a measure that does not accurately reflect opponents’ strength, and then trying to assess the success of the experiment by looking at the same flawed measure sounds like a bad case of garbage in, gospel out.>>>
I disagree, based on the iterative character of the process.  In fact, this is one of the biggest reasons that we need to look at the entire performance of each team's prior opponents throughout the whole tournament rather than just in the rounds prior to the two teams meeting.  The biggest "problem" is not that points or wins are inadequate, but rather that we ignore the critical data that accumulates AFTER two teams meet in early rounds.  When I said that power-pairing "tends" to equalize strength of opposition, it is because, feeding forward, it ensures (absent pull-ups) that all teams in a bracket are meeting teams with the same number of prior wins.  So an undefeated team going into round 8 will have met a team with at least seven wins, a team with at least six wins, a team with at least five, and so on.  In gross terms, the strength of opposition of the teams at the top of the tournament is much higher than that of teams at the bottom.  But here's what happens to cause anomalies (whether we think points are good or not).  I face a team in round 3 at a point in time when we are both undefeated.  Power-pairing is "equalizing" the strength of our opposition.  But my round two opponent might win all of the rest of their debates while your opponent loses the rest of theirs.  This may well mean that the round really WASN'T balanced and that it doesn't really meet the objectives of having powered the round.  But we compound the problem (or at least fail to address it) when we ignore, for the rest of the tournament, how my opponents and your opponents did AFTER the round in which we met.  This is the self-correcting process that using strength of opposition attempts to create, and one that can be empirically tested.
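A minimal Python sketch of that re-evaluation idea, with hypothetical data structures rather than the actual pairing code: a team's strength of opposition is recomputed each round from its opponents' records through the current round, not from the records those opponents happened to have when the debate occurred.

[code]
# Hypothetical sketch: strength of opposition from opponents' *current* records.
def strength_of_opposition(team, results_so_far):
    """
    results_so_far: dict mapping team name -> list of (opponent, won) tuples
                    for all rounds completed so far.
    Returns the average current win total of the team's past opponents.
    """
    opponents = [opp for opp, _ in results_so_far[team]]
    if not opponents:
        return 0.0
    current_wins = [sum(1 for _, won in results_so_far[opp] if won)
                    for opp in opponents]
    return sum(current_wins) / len(current_wins)
[/code]

Raw win totals stand in for opponent strength here only to keep the example short; as discussed below, the same calculation works with ordinal seeds.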

<<<2) An independent flaw in your metric arises from the attempt to average opponent strengths that are not additive in a meaningful sense. >>>
Although this critique is repeated, I'm not sure I'm following.  I assume that you are questioning whether ordinal-level data can be treated like interval- or ratio-level data with respect to addition, averaging, and so on.  If performance data follows something like a normal distribution (as debate performance might), then the "real" difference between ordinal ranks 3 and 4 might well be of a different magnitude than the difference between 20 and 21 or 89 and 90.  But we need to note several things.  First, our current measures of performance don't represent a single continuous variable but rather an ordered set of tie-breakers.  As a result, only an ordinal transformation provides commensurability, even for the seeding itself.  Second, the process of pairing is by nature an ordinal process.  When we pair HL based on records and points, we don't, and indeed can't, ask whether the top two teams in the bracket are the same distance apart in quality as the bottom two teams that they will debate.  And we don't (and currently can't) ask whether the top 3-2 team might actually be better than several of the 4-1 teams.  So we inherently operate with an ordinal transformation of our data.  Using exactly the same transformation to evaluate strength of opposition keeps the data commensurable.  You do have a point that the seedings on which the strength-of-opposition calculations are based could themselves be viewed as inaccurate because they don't take into consideration differences in strength of opposition.  Beyond representing a concession that strength of opposition DOES matter, it should be noted that a future and more radical procedure could incorporate strength of opposition into the seeding calculation itself (with a recursive use of the formula to calculate opponents' strength of opposition).  But this is a simpler experiment.
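To illustrate the ordinal transformation, a sketch along these lines (hypothetical field names, a deliberately abbreviated tie-breaker list) would seed teams by wins and then point tie-breakers, and would compute strength of opposition as the average ordinal seed of a team's opponents under that same transformation:

[code]
# Hypothetical sketch: ordinal seeding from wins plus point tie-breakers,
# and strength of opposition as the average opponent seed (lower = harder draw).
def ordinal_seeds(records):
    """
    records: dict mapping team -> (wins, total_points, adjusted_points);
    the tie-breaker list here is illustrative, not a tournament's full set.
    Returns dict mapping team -> ordinal seed (1 = top seed).
    """
    ordered = sorted(records, key=lambda t: records[t], reverse=True)
    return {team: seed for seed, team in enumerate(ordered, start=1)}

def avg_opponent_seed(team, opponents_faced, seeds):
    """Average ordinal seed of the opponents `team` has faced so far."""
    faced = opponents_faced[team]
    return sum(seeds[opp] for opp in faced) / len(faced)
[/code]

Because the seeding and the opposition measure pass through the same ordinal transformation, the two remain commensurable, which is the point above.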

<<<3) You say that you think equalizing strength of opposition should be one of the principal objectives of a pairing system.  But you do not seriously answer the point that it is not and cannot be an objective of power pairing because power pairing is fundamentally at odds with equalizing strength of opposition.  That is particularly so if we equate strength of opposition (as your proposed algorithm does) with performance within the tournament.>>>
I disagree that power pairing is at odds with equalizing strength of opposition, presuming that the latter is defined as we are defining it: within brackets.  Of the goals you identify, I think power pairing AND equalizing strength of opposition are both aimed at providing competitive equity.

<<<a) You have things precisely backwards when you say that power pairing “‘tends’ to equalize strength of opposition.”  The basic tenet of power pairing is that teams with better records (i.e., by your measure, stronger teams) should debate one another and teams with worse records (weaker teams) should debate one another.  Structurally, power pairing creates differences in strength of opposition, not equality.>>>
You are right IF we say that everyone at the tournament should have had the same strength of opposition.  We don't.  We believe that everyone should have the opportunity after the presets to find their level, and that this is both the most educational arrangement and the one that best ensures competitive equity in the quest to clear into elimination rounds.

<<<We can presume that the opponents faced by the two teams in the four preset rounds are balanced, since that is actually a principal goal of a preset schedule.  Focusing then on the latter four rounds, power pairing will have required the first team to debate opponents in the 1-3, 2-3, 3-3 and 4-3 brackets, while the second team will have debated opponents in the 4-0, 5-0, 5-1 and 5-2 brackets.  By design, power pairing creates a wide discrepancy in the strength of opposition.>>>
The principal thing that is missed, and the factor that causes the greatest anomaly, is that we match based on the current record of the opponent but then never look at how that opponent did after the round in which we debated them.  Two teams can debate opponents with the same record in a given round, yet the subsequent performances of those two opponents can be radically different.  For instance, in round 5 we might observe a round where a 3-1 team gets pulled up to meet a 4-0 team.  We call it a mismatch and commiserate with the 3-1 team.  But is that round still the same mismatch if we discover that the 4-0 team loses its next three debates and the 3-1 team wins its next three?  If those two teams are candidates to be pulled up in round 8, who actually had the mismatch?

<<<And the disparities become even worse if you bring into the mix a team with high points and low strength of opposition that is pulled up from a lower bracket and therefore has the lowest ordinal seeding.>>>
That actually isn’t how things would work.  A team that is pulled up based on having a low average strength of opposition is not automatically placed at the bottom of the new bracket.  But within the new bracket, the component of the formula determined by their seed is not based on their seed within the original bracket but rather what their seed would be if they simply had the additional win but the same distribution of points.  This part of the process is exactly the way it is done at present.  Now the question will be, is it going to be more likely than at present that the pulled-up team will end up at the bottom of the new bracket.  That’s actually a fairly frequent artifact of the current practice (it’s an open question as to whether we think this is more or less fair).  But assuming it does happen that way, the self-correcting part of the process will give the purported “victim” of the pull-up even more credit in future rounds for having faced the team at the top of the bracket.

<<<4) Trying to force the power pairing process to serve a function for which it is not, and cannot effectively be, designed strikes me as an unwise use of resources.>>>
I actually don’t see this as forcing power pairing to serve a function that it isn’t designed to do.  I’m actually quite convinced it will prove to be the opposite but in any case I am open to empirical disconfirmation.  I’ll also listen to poignant anecdotes.
As a final observation, I should mention the current alternative.  We define most powered rounds as high points vs. low points.  Other than protecting teams with high points (which presumably are the best teams but not necessarily), we haven’t ever spent much time defending what competitive objective that pairing system provides.  I think that it does provide some benefits PROVIDING that we take into consideration opportunities to equalize competitive equity so that a non-clearing team doesn’t have the experience of losing to a much more competitive slate of opponents than an “arguably” weaker team that clears having faced weaker opponents.
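For comparison, here is a minimal sketch of the conventional high-low pairing within a bracket (ignoring side constraints and rematch avoidance, which real tab software must also honor):

[code]
# Hypothetical sketch: high-low pairing within a single win-loss bracket.
def high_low_pair(bracket):
    """bracket: list of (team, points) tuples; assumes an even number of teams."""
    ordered = sorted(bracket, key=lambda tp: tp[1], reverse=True)
    return [(ordered[i][0], ordered[-(i + 1)][0])
            for i in range(len(ordered) // 2)]

# An eight-team bracket pairs 1 vs 8, 2 vs 7, 3 vs 6 and 4 vs 5 on points.
[/code]

The experiment keeps this bracketed structure; the question throughout this thread is only whether strength of opposition should inform who gets pulled up and how teams are matched within the bracket.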
Logged
glarson
Sr. Member
****
Posts: 477


« Reply #14 on: November 06, 2011, 06:12:45 PM »

Reply to Hester.

Mike,

The original post included data from Kentucky, and that pattern is very commonly reproduced at all of the large tournaments that I run.  There are typically very large differences between the average strength of opposition of the "hardest" and the "easiest" schedule within each win-loss bracket.  It is also the case that close but non-clearing teams will have had significantly more difficult schedules than several of the clearing teams.  If you are interested, I can supply data from a broad range of tournaments.  As Matt observed, I very slightly overstated the worst-case schedule discrepancy at Kentucky because one of the teams met a team that subsequently withdrew.  But the difference was still huge.

To put this in a different context, I don't think it is particularly controversial that strength of opposition can affect competitive equity.  Practically every computer ranking, as well as every observer of college football, is trying to figure out how to assess Boise State's position within the BCS.  All of the discussion focuses on strength of opposition.  Now we have two options.  We "could" try to figure out how to incorporate strength of opposition into the formulas for determining final seeding.  At some point we may try to take that step as a community.  But short of that, we have numerous ways to help equalize strength of opposition before we ever get to determining final seedings.  In a crude sort of way, that's what advocates of a playoff system are advocating.  They want to make sure that a Boise State would have to play a team with a worse record but presumably a much higher strength of opposition before it could qualify for a national championship game.  But of course, that just pushes the problem 2-3 games back.  In determining who qualifies for the playoffs, the selection committee would have to make the same judgments that invariably involve trade-offs between performance and strength of opposition (as in the basketball selection process).  We will never be able to entirely erase the question.  But in an 8-round tournament with 2 to 4 presets, we can adopt procedures that will get us much closer than we do now.
 
Logged