Return to main CEDA-L Archive Page
judge variation
>"Z-score" is indeed a statistical procedure that attempts to
>normalize for variations in sample mean and sample standard
>deviation. In my software, it is encoded using the following formula.
Let's see... having just discussed z-scores the last few weeks in my
stats class :), I want to ask a few questions:
1) given the susceptibility of the std. dev. to extreme values,
"penalty" points (e.g., the 20 I awarded once at UNI) would seem to have
an inordinate effect on scores... wouldn't they?
2) the sample size for some judges would be 6 scores (3 rounds) or maybe
even fewer. Such a small sample seems unlikely to be particularly
reliable. I realize that Gary is also using the population mean & std. dev.
to mitigate extreme effects, but IMO this only camouflages the problem.
3) And, wouldn't speaker points be negatively skewed (since most people's
mean will be at least 26 & the max is 30)? Z-scores don't have the same
meaning if the distribution is not normal...
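
A rough sketch of points 1) through 3), for anyone who wants to see the size of the effect. This is not Gary's formula (it isn't reproduced here); it is the plain textbook z-score, z = (x - mean) / std. dev., computed within one judge's scores. All of the point values below are made up for illustration, and the names `z_scores` and `skewness` are my own.

```python
from statistics import mean, pstdev, stdev


def z_scores(scores):
    """Textbook z-scores using the sample mean and sample std. dev."""
    m = mean(scores)
    s = stdev(scores)
    return [(x - m) / s for x in scores]


def skewness(scores):
    """Population skewness: the average cubed standardized deviation."""
    m = mean(scores)
    s = pstdev(scores)
    return mean(((x - m) / s) ** 3 for x in scores)


# Hypothetical judge with 6 scores (3 rounds), no penalty points.
typical = [27.0, 27.5, 28.0, 28.0, 28.5, 29.0]

# Same judge, but the lowest score is replaced by a 20-point "penalty".
with_penalty = [20.0, 27.5, 28.0, 28.0, 28.5, 29.0]

sd_typical = stdev(typical)
sd_penalty = stdev(with_penalty)

# z-score of the top speaker (the 29) in each case.
top_z_typical = z_scores(typical)[-1]
top_z_penalty = z_scores(with_penalty)[-1]

# Hypothetical pool of points bunched under the 30-point ceiling.
pool = [24.0, 26.0, 27.0, 27.5, 28.0, 28.0, 28.0, 28.5, 28.5, 29.0, 29.0, 29.5]
pool_skew = skewness(pool)

if __name__ == "__main__":
    print(f"std. dev. without penalty: {sd_typical:.2f}")
    print(f"std. dev. with penalty:    {sd_penalty:.2f}")
    print(f"top speaker z without penalty: {top_z_typical:.2f}")
    print(f"top speaker z with penalty:    {top_z_penalty:.2f}")
    print(f"skewness of the pool:          {pool_skew:.2f}")
```

With only six scores, the single 20 inflates that judge's std. dev. several times over, so the same 29 earns a much smaller z-score than it would have without the penalty. The pool's skewness comes out negative, which is the ceiling effect described in 3).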
I dunno... I think controlling for judge variation is a nice idea, but I
don't think z-scores or any other statistical measure will do it, given
the available sample sizes. Perhaps someone could keep files on judges
across tournaments, which could then be consulted electronically to
provide better samples... but I'm not volunteering.
My answer: ditch the 30-point scale. Go to a 100-point, letter-grade
scale. There would still be a good deal of variation, but at least the
range would be less compressed than the current 27-30, thus giving us
more power to discern differences between good debaters. IMO, this would
make high-low points a better measure of achievement than they currently
are.
-- Glenn
Archive created by Jonathan Stanton (jonathan@cs.jhu.edu)