# RE: BREAKING BRACKETS

```
>      For that reason I am entering this debate as a way to make
> myself open to persuasion.  I will play devil's advocate against
> Berube's critique of NOT breaking brackets.

I too have started out as devil's advocate in this discussion but
I'll stay in the role a bit longer because I like the statistical
metaphor you take on below.  (Perhaps this illustrates some kind of
value of debate empirically if we are both arguing counter-attitudinally on
the issue.)

> -------------------------------------------------------------------
> >THE DIFFERENCE BETWEEN A 4-2 WITH 247 pts. and a 4-2 WITH 245 pts.
> >is  not significant.
>      Statistical significance is a measure of sampling error.
> Period.  Although it is used in common discourse and in Berube's
> post as a measure of how 'big' or 'important' a difference is. . .

Yes, this is common discourse.  Surely the metaphor of statistical
testing is permissible here, although technically it may not be applicable.
===============================

> There is no
> sampling going on.  The eight critics that an individual receives
> are not meant to sample the population of critics (if it were we
> would have to have a MUCH greater number of critics, and we would
> never advocate as Berube does the NON-RANDOM (mutual preference)
> selection of critics).  We view those eight critics as a
> population, not a sample.  Their marks do not represent (or attempt
> to approximate) a more general evaluation of a speaker -- they
> directly constitute that evaluation.

Not so, depending on your perspective.  All judges at the tournament
compose the relevant population from which non-exclusive samples of 8 are
drawn for each team.  [Remember we are only dealing with a metaphor of
sampling here!]  Questions of whether it is a random sample or a skewed
sample are relevant but not dispositive because many statistical tests
are fairly robust to moderate violations of their assumptions.  In this
case, we are attempting the impossible anyway in seeking the null
hypothesis as our predicted outcome (i.e. that all panels are equal) so
there are skews, but that feeds Berube's metaphor.
A better metaphorical model would be nested replications.  That
is, we create a set of results with N teams where each team has 8 different
opponents cross factored with 8 different judges.  We could compare the
composite mean scores of each major factor (that is, we could examine
strength of opposition for each team, or judge variance in points awarded
to each team) but we usually settle for a distribution of results based
on wins, choosing in a (metaphorical) one-tailed test to select the
highest 8 or 16 or 32 records as "qualifiers."  [We do some violence to
statistical significance metaphors in that we often choose not the
clearly best teams (the top 5% or 1%) because our N population is fairly
small,  but something approximating 1/2 N for the bracket, to avoid
omissions due to the uncontrolled factors of judge and opponent.  We are
much more willing to be selective for speaker awards, giving only 10 to
the top participants of the 2N debaters present because we have much more
confidence that the top performers with our random sample of 8X8 critics
and opponents are clearly above average.]  When there are ties on wins, we
choose to rank within records on points (or perhaps
adjusted points, which tries to reduce the excess influence of extreme
scores and hence create more homogeneity of variance between teams to be
compared) but what we are really doing is shorthand for comparing sample
means (whether we divide total points by 8 for mean or just compare total
points won't change the order).
==============================

>      Variance is a measure of dispersion around a central point.
> Period.  It is not synonymous with "variation" or "difference."
> The term carries no meaning when applied to a simple difference
> between two numbers.  There is no central point around which
> speaker points are grouped, so there is no variance to measure.

Again, think conceptually [not literally] of post hoc tests for ANOVA
(analysis of variance).  We have a series of random samples, each with a
mean.  If those group means had been drawn randomly from the same
population then (Central Limit Theorem), whatever the shape of the
population distribution, the sample means would be approximately normally
distributed in the sampling distribution.  We ratio the variance between groups
around the grand mean (average for all individuals within the population)
to variance within groups, where the total variation (sum of squares) of all
individuals around the grand mean is the sum of variation between groups and
variation within groups (analogous to error).  If that ratio is large
enough not to be a product of chance due to sample size and
number of groups we declare it to be statistically significant
[arbitrarily .05 or less] and then we consider which individual means differ
from which others by ordering them and establishing critical values for
differences to test whether any two means are individually different from
any others [had we a justification, the LSD (least significant
difference) test would be equivalent to a t-test for the difference between
means, but statisticians are usually more conservative post hoc].
Having narrated the metaphor, let's apply it discursively.  Once
we identify the relevant population as those in elims with similar
records, we can ask how big a difference between two teams on speaker points
(or mean speaker points) it would take before we say that those results would
not have happened by chance.  Traditionally we've said a difference of 1
is significant enough to leave teams as they are, which is hard to
justify (or less than 1 if we went to ranks or adjusted points to break
ties).
Berube's metaphor posits that, given the
imprecise method of arriving at the initial grouping by records (see
Ross' recent post on back-door records), the failure to control for the
factors of judge variance and opposition difficulty, and the arbitrary
decisions on tiebreaking factors within similar records (adjusted points),
rankings arrived at by points within records ARE ESSENTIALLY RANDOM
(emphasis on essentially).  Therefore, instead of saying that the order
arrived at by current methods is meaningful, a tournament director might
be justified to simply redeal all the same-record cards again to avoid the
undesired outcome (school meeting self).  Carried farther that adjustment
would be made across records, too.  Carried less far, as Berube and Ross
advocated in different posts and as Larson described procedurally, one would
only make the minimal adjustment of switching two proximate teams.  The
key point is that NO REAL ADVANTAGE OR HARM OCCURS by the switch, given
that so many other factors are involved [side constraints, particular
strengths and weaknesses of teams, etc.].  The seeding done by the prelims
makes sense by record, at best, but is otherwise relatively random.
Or you could go even less far and say that swaps won't be made
outside of the same record, or if the opponent record would change as a
result.  Or you could say that a switch of more than a certain number of
points would be a "significant" switch and would not be made.  All these
changes assume a fairly large tournament with few chances of same-squad
teams re-meeting each other.  Just as mutual preference judging fails in
small tournaments, this logic fails if too many teams from one school
clear or if changing one team creates a conflict for another school.
=============================

>  The difference between teams with 247 and 245 points, then, is an
> important one:  it is a two point difference.  Any application of
> STATISTICAL terms to this difference is only for effect:  they are
> not statistics, because there is no sample.

And they therefore have [in this analysis] no more status than two
random numbers, which could be swapped.
=====================

> If points are good enough to determine breaks
> why are they not good enough to establish seedings?  To say that
> point differences mean nothing is to deny the possibility of
> breaking on points (which would force more teams out of elimination
> rounds in the first place, which is something Berube doesn't want).

Indeed.  Many larger tournaments take all 5-3s or no 4-4s, creating
partial elims.  Personally I'd be happy with smaller elims of only 6-2s,
even given the so-called unreliability of judging.  That would give shorter
but better elim days, better judging (more folks in the pool at 10 am
than at 10 pm) and a statistically more valid selection of
"significantly above average teams" to showcase, given that any team
with 6 wins in a powermatched tournament is empirically successful that
weekend, while good but unlucky 5-3s could wait a week.  We don't do that
because of the pressure to let more people debate as much as possible [a
reason to break brackets and have 1 more debate] or some notion of
sweepstakes points [ditto].  Most damaging, if we take more teams to be sure
the good teams get out, that feeds Berube's claim of unreliability, which
in turn feeds the claim that ranking within records by points is arbitrary.
====================

On a personal note [Devil's advocate hat put back into closet] I
think the perception of fairness in the current process is still high,
and that presumption calls for retention until the case for breaking
brackets is sold.  I know many tournaments do it fairly and legitimately,
and yet many others don't break brackets.  You can make whatever call and
probably not have spikes driven through your heart.  I think, however,
that before bracket breaking becomes universal we want to consider the
potential for abuse (if not cheating) or the practical implications for
participation in the activity.  Dismissing Berube's claims that programs
and/or debaters are discouraged by current practices without
investigation is probably an error, as would be making changes without
investigating whether that action would deter others who perceive
breaking brackets as unfair. I'd strongly differentiate between your
individual right to choose for a tournament you run and a general endorsement
of the practice as normative.

****************************************
Glen W. Clatterbuck
Illinois College
****************************************

```
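
The ANOVA metaphor in the post can be made concrete with a small sketch. The per-round scores below are invented for illustration (they are not from any tournament); they are chosen only so that two same-record teams total 247 and 245 points, as in the quoted example, with a hypothetical third team at 244. The function name `one_way_anova` is likewise an assumption of this sketch. It splits the total sum of squares into between-team and within-team parts and forms the F ratio the post describes, and it also checks the remark that ranking by total points and ranking by mean points give the same order.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, f_ratio) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: each score around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


# Hypothetical per-round team points for three teams with the same record.
# Totals: 247, 245, 244 (invented numbers, six prelim rounds each).
points = {
    "Team A": [44, 39, 42, 40, 43, 39],
    "Team B": [41, 43, 38, 42, 40, 41],
    "Team C": [45, 40, 41, 39, 42, 37],
}

ss_b, ss_w, f = one_way_anova(list(points.values()))

# Ranking by total points and by mean points gives the same order,
# because every team has the same number of rounds.
by_total = sorted(points, key=lambda t: sum(points[t]), reverse=True)
by_mean = sorted(points, key=lambda t: mean(points[t]), reverse=True)

# Tabled critical value for F(2, 15) at the .05 level, roughly 3.68.
F_CRITICAL_05 = 3.68

print(f"SS between = {ss_b:.3f}, SS within = {ss_w:.3f}, F = {f:.3f}")
print("significant at .05:", f > F_CRITICAL_05)
print("same order by total and by mean:", by_total == by_mean)
```

With these invented scores the F ratio falls well below 1, so round-to-round spread within each team swamps the two- and three-point gaps in totals. That is the sense in which the post calls point order within a record "essentially random". Real ballots could of course behave differently, and the sketch ignores the judge and opponent factors the post says are uncontrolled.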

Archive created by Jonathan Stanton (jonathan@cs.jhu.edu)