Newsgroups: rec.games.frp.archives
From: [email protected] (Glen Barnett)
Subject: PAPER: Testing Dice for bias
Message-ID: <al#[email protected]>
Organization: The Australian Graduate School of Management
Date: Wed, 2 Dec 1992 04:41:08 GMT
Approved: [email protected]
Lines: 749

The following information is intended for distribution over Internet,
and outside of that may be copied for personal use only.
(c) Glen L. Barnett, 1992. All rights reserved.

I recently responded to a thread on rec.games.frp.dnd about testing
dice, in order to fix up a few misconceptions and make some
suggestions. On the suggestion of Coyt Watters I'm putting this on
r.g.f.archives, which I think is a good idea. What follows has some
major additions, however.

In this post, I will begin by discussing various posts on the subject
of testing dice, then show how to do the test under discussion in the
thread (the chi-squared goodness-of-fit test) properly, and then talk
about more appropriate tests.

---------------------------------------------------------------------

In article <[email protected]>, [email protected] (Allan Longley) writes:

[..stuff deleted..]

>testing dice for bias. Well, here is a test to use. I haven't actually
>tested this yet, but it should -- in theory -- work. And no, this is not
>a copy out of the old Dragon magazine, but it is the same test -- its a
>pretty standard test. I will use simple terms -- so all you math/stat
>people out there, don't correct the fine points, I know.

[description of chi-squared goodness-of-fit test deleted]

Since Allan asks for no correction of fine points, I will attempt to
limit myself to major problems. This is not intended as a flame on
Allan, but this is fairly important stuff, and should be explained
correctly. If at any stage I get less than pleasant, please accept my
apology in advance.

While the calculations that Allan describes give the correct value of
a chi-square goodness-of-fit statistic (which he calls "Indicator"),
you should be *very* wary of interpreting the results in the way he
describes, as I will explain:

Let us assume you have 40 dice that (unknown to you) are all perfectly
fair, and you wish to test all of them, to see if any are "biased".

The way Allan has set his test up, you'd expect 2 of them to give
results below "Probably Fair", which he says indicates the die is
probably unbiased. That is, you have 40 fair dice, and you will expect
to regard only *two* of them as probably O.K.! Similarly, you will
expect to consider two of your "purely fair" dice as probably unfair.

Of the remaining 36, you will expect 18 scores between "Probably Fair"
and "Maybe" and 18 more between "Maybe" and "Probably biased". For
these 36, you have to do the test again, under Allan's scheme. If you
get both results below "Maybe" (you expect 9 of these) you say
"Probably Fair". Similarly you expect 9 above "Maybe" on both trials.

So we have (after repeating the test for 90% of the dice):

  Number of Fair Dice:                         40
  Expected number "probably fair":             11
  Expected number "probably biased":           11
  Expected number which we don't know about:   18

So over a quarter of perfectly fair dice will be called "probably
biased". If we continue testing those remaining 18 we are still
undecided about, the problem gets worse.

Other problems:

Allan says: "The column titled "Maybe" are the Indicator values where
there is a 50% chance that the die is fair and a 50% chance that the
die is biased."                                                   (A)

This is just plain wrong. The column he refers to is the value that a
test on a *fair* die will exceed 50% of the time.                 (B)

This is very different, and probably explains why Allan misunderstands
the whole interpretation of the results.

If any of you can't see why what I said (B), and what Allan said (A)
are totally different, don't despair. This stuff is not always obvious
from the start.
If you can follow the rules of an average RPG, you are smart enough to
understand a few non-trivial statistical ideas. I'm quite happy to
provide further clarification to the net if the demand is there.

>
>From Table 1, it appears that the d4 tested may be "Fair" but another test
> should be done.

I'd say not. A reasonable interpretation of the result is "There is no
reason to doubt that the die is O.K.". Incidentally, the test statistic
of 2.00 obtained in the example is only 2/3 of what you'd expect with a
fair die. The value of 2.00 will be exceeded almost 60% of the time by
a test on a fair die.

---------------------------------------------------------------------

In article <[email protected]>, [email protected] (Allan Longley), in
response to Michael Wright, says:

|In article <[email protected]|[email protected]
| (Michael G. Wright) writes:
|>[email protected] writes:
|>
|>> This is called a chi-square test, and an article with the
|>>procedure and numbers for it appeared way back in Dragon issue
|>>#74... Thank you, Mr. Longley, for reposting it (or did you come
|>>up with it in isolation? =) ) for the benefit of those who don't
|>>have the issue (probably most readers).
|
|Yes, I seen the issue. The chi-square test is a standard statistical test
|for determinig if a data set matches a particular distribution -- so, no, I
|did not come up with the test in isolation, its been around for a lot longer
|than D&D. I don't actually have the issue, so I didn't copy it for the net.
|
|I've been playing with the chi-square test and you know what I found out --
|ALL DICE ARE BIASED!! Well, that's not true -- all except d4's and d6's are
|biased. Of course, this really shouldn't be a surprise. So, I've been
|looking at modifiying the chi-square test for "real world" dice -- more on
|this in a later post.

Allan is correct that the test has been around a lot longer than D&D.
The test, due to Karl Pearson, is nearly 100 years old.

It's no surprise that Allan finds that all real dice are biased:

i) It's impossible to make a truly fair die (obviously). It's just
   that most are close enough that we don't care too much. The chance
   of getting "close to fair" will decrease with the number of sides.

ii) Allan's testing method will call more than a quarter of fair dice
   (assuming they existed) biased. Even the fairest dice you could buy
   have a good chance of being called biased.

|>Actually, I use a program I made to roll dice for stats. Unfortunately, nobody
|>in the party wants to use it, because dice rolls invariably end up better. I
|>think this must be because of the pseudo-randomness of the program.
|>Anyone out there that knows better?
|
|I wouldn't want to use a computer generated die-value while playing AD&D.
|THe thrill of the "rolling die" is part of the game. Also, with reference
|to the above, most players will have a favourite die/dice due to the
|inherent bias found in real dice. The trick is to find the dice that are
|biased beyond reasonable playability.

In response to Michael G Wright:

Michael's discovery that hand-rolled dice often come out better could
occur for a couple of reasons:

i) Players tend to hang on to "favourite" dice that "roll well" (i.e.
   come up with good results). So biased dice have some chance of
   concentrating into the hands of players. As long as this isn't too
   extreme, it probably doesn't matter too much. (Allan quite
   correctly identifies this reason).

ii) Players don't really roll randomly. I had a fairly long email
   discussion with Sea Wasp on this topic just recently. Even
   unconsciously, you can pick up the dice in a "non-random" fashion,
   so that a good roll will tend to be followed by another good roll
   if you don't roll too vigorously. You may notice that after a bad
   roll players tend to throw harder. This may be more of a problem,
   as some players are *much* better at it than others.
   If it becomes too noticeable, you may wish to invest in a dice cup,
   or mock up a craps-table affair.

Allan's comment that "The trick is to find the dice that are biased
beyond reasonable playability" is spot on. Exactly correct. Remember
it, because I'll come back to it later.

I think the above discussion also answers Paul Kinsler's questions.
[in article <[email protected]>, [email protected] (Paul Kinsler) asked
for clarification of Allan's comment that all dice are biased].

---------------------------------------------------------------------

In article <[email protected]>, [email protected] (David Alexandre Golden)
writes:

[stuff deleted]

>I once did something along those lines to test whether my DM's die was
>fair. (It would roll 20's a lot. Often on command.)
>
>To test the die, I made a histogram (I believe is the term for it) like this:
>
>x x x x x
>x x x x x x x x x x x x x x x x x x x x
>x x x x x x x x x x x x x x x x x x x x
>1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>
>With an "x" being each time that number came up. A totally fair die would
>have a straight line across, assuming it was rolled enough times. The
>question is, what is enough times? I rolled a d20 about 380 times (not bad
>if one person rolls and the other makes an x in the column) and while it
>gave a fairly good idea that the die was biased, the biased numbers had only
>occured a couple times more than the average. (If I remember correctly).
>So I wrote a computer program to do the same thing until the deviation between
>the highest and lowest number of occurances was less than about 15% of the
>average number of occurances. (i.e a reasonably smooth profile). The
>computer required SEVERAL THOUSAND ROLLS to do this.

Your idea of making a histogram is a good one.
In fact all the chi-squared test does is the same as drawing a line
across the histogram where your expected "straight line across" would
go, looking at the deviations from that, squaring (to get all
positives), adding the squared deviations up and dividing by that
expected number. This gives a single overall measure of deviation from
uniformity.

The advantage of looking at the histogram is that you see where the
differences are, but you can't tell how big they "ought" to be for a
fair die. The actual number in each cell will be approximately normally
distributed, with mean equal to the expected number in each cell and
standard deviation approximately the square root of the mean. In the
above example, we'd expect Dave to get 19 in each cell, so the standard
deviation is about 4.36. That is, we'd expect to get about 2/3 of the
cells with counts in the range 15 to 23, and about a 2/3 chance of all
but 1 or 2 of the values inside the range 11 to 27.

[ some of my own discussion deleted - see the section "Tests based on
the histogram" below]

> ... Still, the point is that I'm skeptical
>that the "fairness" of a die can be determined in only a hundred or so rolls.
>(d4 maybe... d20 no way!)

Well, in fact Allan's suggestion was to use 20 rolls per cell, so he'd
use 400 rolls for a d20 and 80 rolls for a d4. But in any case you can
never decide that a die is actually fair. If you do a test and get a
result close to what you expected if the die was fair, you have a lack
of evidence against the hypothesis of fairness (which is the default
assumption for a statistical test of biasedness in a die - the "null
hypothesis"). What you get is either a higher degree of evidence
against the hypothesis of fairness (by getting a result that is very
unlikely with a fair die), or a low degree of evidence against
fairness.
It's like in a court case, (a criminal case), where the defendant is
assumed innocent until proven guilty (innocence is the null
hypothesis), but evidence against the defendant is presented by the
prosecution. The jury then decides either "guilty" if there is strong
enough evidence, or "Not guilty" if there is not. They don't declare
innocence.

So we can't determine "fairness" anyway. The question we need to ask
is: If the die is biased, will a hundred rolls (or whatever number) be
enough for us to have a good chance to pick up that difference, while
at the same time, not "convicting the innocent" too often?

Whether it is enough depends on how big a difference you think it is
important to pick up.

-----------------------------------------------------------------------

In article <[email protected]>, [email protected] (Adam Dray) writes (in
response to Dave Golden):

>In other words, a histogram shows very little. Random doesn't mean
>necessarily that you'll get an even distribution. It just mean the
>probability that you won't get an even distribution is proportional to
>the number of sides on the die, and the number of times you roll it.

I disagree with the first sentence. The final sentence above is wrong.
The more you roll it, the more even the distribution will be, as long
as the die is fair. It doesn't really depend on the number of sides.
(except as far as the negative dependence between cell counts is
reduced for more sides).

>Notes about the fairness of dice:
>
>Sharp-edged dice are better than smooth-edged dice. They're also more
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Not always, but this may be true more often than not.

>expensive, however. Rounded dice are often inked by coating the
>entire die with ink, then tossing the die in a "tumbler" (similar to
>tumblers for smoothing rocks) until all the die on the outside is
>gone. Thus, the ink is left in the crevices where the numbers are.
>
>Theoretically, the grooves for the numbers can make one side a more
>likely outcome. Official casino dice don't have inset pips.

If you do it right any effect will be swamped by other manufacturing
defects anyway.

[some stuff deleted]

>
>GameScience did tests on other manufacturers' dice. They found
>certain numbers to be more likely. I've heard that the real 100-sided
>die tends to roll certain numbers more often.

It is impossible (both effectively and theoretically) to get a fair
100-sided die. The practical problems are more important than the
theoretical ones.

>Filing corners off your dice can make certain outcomes more probable.
>Natural wear can do the same thing.
>
>For most people, none of this matters one damn bit. =)

In general, no, it doesn't matter. It's encouraging to see that so many
people (just about all posters on the topic) realise this.

---------------------------------------------------------------------------

Interpreting the test statistic (Allan's "Indicator")

Carry out the calculations as described by Allan*, but use any number
of throws per cell (possible outcome) you like (I'd suggest 10 as a
minimum, because otherwise the tabled distribution is out a bit). The
more rolls you do, the better chance you have of picking up a
difference of a given size. The value of 20 that Allan suggested may
well be a reasonable choice in most circumstances. Allan gives the
calculations for two different numbers of throws (20 and 10 per cell,
but in different posts), so you ought to be able to generalise.

* The calculations given by Allan may no longer be available to you,
so an indication of how to do the calculations is given here:

Roll the die many times, say 20 times per face. Record each result
(I suggest you make up a tally sheet). Calculate the difference
between the number of times each face came up and the expected number
(20 in this case). Square these values and add them. Divide by the
expected number per face.
This is your chi-squared statistic.

E.g. d4: Roll 20 times per face = 80 rolls

  Face:          1    2    3    4
  No times      23   18   15   24
  expected      20   20   20   20
  difference     3    2    5    4
  diff^2         9    4   25   16

  Sum = 54, chi-squared value = 54/20 = 2.7

If the result is less than the final column of Allan's table 2 (which
are the tabulated values for a 5% significance level), you shouldn't
worry too much, there is not very strong evidence of bias - in fact
1 in 20 tests on a fair die will score worse than this. If the result
is much bigger than the value you have some cause for concern.

A result bigger than the 1% column below is quite unusual if the die is
fair (a result at least this big only occurring 1% of the time), so it
gives us good reason to suspect bias.

A small table of the chi-squared distribution:

           5%      1%     df
  d4      7.81   11.34     3
  d6     11.07   15.09     5
  d8     14.07   18.48     7
  d10    16.92   21.67     9
  d12    19.68   24.72    11
  d20    30.14   36.19    19

(these results came from a computer approximation to the chi-squared
distribution. They should be accurate to the figures given.)

If you want a more "cookbook" approach: if the result exceeds the 1%
value, it's probably biased. If it's between the 1% and 5% values,
there is a moderate degree of evidence that it's biased, but it still
might be OK. If it's less than the 5% value, you don't have any reason
to think it's biased on the basis of the test.

You will find more extensive tables in most elementary statistics
books. (references for the chi-squared and Kolmogorov-Smirnov tests are
at the end of this article). You look up the df (degrees-of-freedom),
which is one less than the number of faces on the die (e.g. d4 -> 3 df).

A note on pronunciation: The Greek letter chi (the capital looks like
an X, and the lower-case has one of the two crossed lines a bit curly)
is pronounced with a hard "ch" like Charisma, and the word rhymes with
pie. Note that mathematical symbols come from *ancient* Greek, so no
arguments from any modern Greeks please.
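
For anyone who would rather let a machine do the tally arithmetic, the
calculation and the "cookbook" reading above can be sketched in Python
roughly as follows. This is only an illustration: the names
chi_squared, verdict and CRITICAL are made up for this sketch, and the
5% and 1% values are simply copied from the small table above.

```python
# Sketch of the chi-squared goodness-of-fit calculation for a die.
# CRITICAL holds the (5%, 1%) values from the small table in the post,
# keyed by number of faces. All names here are illustrative only.
CRITICAL = {4: (7.81, 11.34), 6: (11.07, 15.09), 8: (14.07, 18.48),
            10: (16.92, 21.67), 12: (19.68, 24.72), 20: (30.14, 36.19)}


def chi_squared(counts):
    """Sum of squared (observed - expected), divided by the expected count."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


def verdict(counts):
    """Apply the 'cookbook' interpretation to a list of per-face counts."""
    stat = chi_squared(counts)
    five_percent, one_percent = CRITICAL[len(counts)]
    if stat > one_percent:
        return "probably biased"
    if stat > five_percent:
        return "moderate evidence of bias"
    return "no reason to suspect bias"


d4_counts = [23, 18, 15, 24]   # the d4 example: 80 rolls, 20 expected per face
print(chi_squared(d4_counts), verdict(d4_counts))
```

The counts go in as a plain list, one entry per face, so the same two
functions serve for any die in the table.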

This will provide a reasonable all-round test for bias in a die.

-------------------------------------------------------------------

Why you probably don't want to do the chi-squared test:
(at least for d8 and above)

The chi-squared test will pick up any kind of deviation from a purely
even distribution. However, we are much more worried about some kinds
of deviation than others. For example, I'd be more interested in
knowing that "20" came up too often on a d20 than knowing "10" came up
too often. The first could affect play substantially, the second
probably only a little. We should use a test with a better chance to
pick up the kind of deviations from fairness that are most important
to us (which will trade off with less chance of picking up deviations
we are less concerned with).

Let us consider a more complete example: Imagine we have two d20's
we'd like to test, and that in fact (but unknown to us) they have the
following (percentage) probabilities:

  Face number   % prob 1st die   % prob 2nd die
       1             5                1.5
       2             4.5              1.5
       3             6                2.5
       4             3.5              2.5
       5             7                3.5
       6             2.5              3.5
       7             8                4.5
       8             1.5              4.5
       9             7                5
      10             5                5
      11             5                5
      12             7                5
      13             1.5              6
      14             8                6
      15             2.5              7
      16             7                7
      17             3.5              7
      18             6                7
      19             4.5              8
      20             5                8

A fair die would have 5% right across, of course. These 2 dice can be
obtained from each other by relabelling the faces.

The first die will be reasonable in play, because, of course, we don't
try to roll 'exactly 8' or 'exactly 9', but 'less than 8' or 'greater
than 11'. The first die is never out by more than one twentieth of the
required probability (e.g. probability of a 2 or less is 9.5% instead
of 10%) in either direction. It has the correct average, and almost
the correct standard deviation (the difference is tiny).

The second die would be very unbalancing in play: it has about a 2/3
chance (66%) of rolling 11 or higher, and a 20 is more than 5 times as
likely as a 1. The mean is almost 13. The standard deviation is also
out, but that's relatively unimportant.
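
The figures quoted for these two dice are easy to check by machine.
The following Python sketch does so; the probabilities are copied from
the table above, and the helper names (mean, worst_cumulative_gap,
chi_squared_distance) are invented for this illustration only.

```python
# Percentage probabilities for faces 1..20, copied from the table in the post.
die1 = [5, 4.5, 6, 3.5, 7, 2.5, 8, 1.5, 7, 5,
        5, 7, 1.5, 8, 2.5, 7, 3.5, 6, 4.5, 5]
die2 = [1.5, 1.5, 2.5, 2.5, 3.5, 3.5, 4.5, 4.5, 5, 5,
        5, 5, 6, 6, 7, 7, 7, 7, 8, 8]


def mean(probs):
    """Average roll, given percentage probabilities for faces 1, 2, 3, ..."""
    return sum(face * p for face, p in enumerate(probs, start=1)) / 100


def worst_cumulative_gap(probs):
    """Largest gap (in percentage points) between P(roll <= k) and the fair 5k%."""
    running, worst = 0.0, 0.0
    for k, p in enumerate(probs, start=1):
        running += p
        worst = max(worst, abs(running - 5 * k))
    return worst


def chi_squared_distance(probs):
    """Sum of (p - 5)^2 / 5 over the faces: the departure chi-squared responds to."""
    return sum((p - 5) ** 2 / 5 for p in probs)


for name, die in (("first die", die1), ("second die", die2)):
    print(name, "mean:", mean(die),
          "P(11 or higher) %:", sum(die[10:]),
          "worst cumulative gap:", worst_cumulative_gap(die),
          "chi-squared distance:", round(chi_squared_distance(die), 3))
```

The cumulative gap is the quantity that matters for "roll N or less"
style play, and it is small for the first die and large for the
second, while the chi-squared distance is the same for both because it
ignores which face each probability sits on.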

The chi-squared will rate them as equally bad!

So a good test should be likely to identify the second die, but we
might be prepared to sacrifice some of our ability to pick up the
first, since it will make little practical difference in play. (I said
I'd come back to this point!)

Note that almost any deviation on a d4 will be important (there are
only 4 different values), and to a lesser extent a d6. I'd stick with
the chi-squared test on those.

There are many tests that will do what we want. I will present only
one such test*. (This is not to say that a properly applied chi-squared
is not good, just that a test more closely tailored to our specific
question of interest will be even better.)

* two tests if you count "Tests based on histograms", below.

The Kolmogorov-Smirnov test:

Collect data as for the chi-squared test, up to the point where you
start doing calculations. That is, lay out like this (you could run
down rather than across):

  Roll:          1    2    3    4    5   ......
  Count:        17   19   22   27   24   ......
  Expected:     20   20   20   20   20   ......

Now add up your counts and expected counts, writing the partial totals
as you go:

  Roll:          1    2    3    4    5   ......
  Count:        17   19   22   27   24   ......
  Expected:     20   20   20   20   20   ......
  Sum Count:    17   36   58   85  109   ......
  Sum Exp:      20   40   60   80  100   ......

Now find the differences (without sign):

  Sum Count:    17   36   58   85  109   ......
  Sum Exp:      20   40   60   80  100   ......
  Difference:    3    4    2    5    9   ......

The last difference will be zero, so you don't have to work out the
final column (I still would as a check).

Divide the largest difference (9 is the largest difference above, for
the calculations you can see) by the number of rolls you made
altogether. This is your test statistic. Let's call the value D. You
can look it up in most books on nonparametrics, which will have
tables. However, you would be better to use the table below, for
reasons I'll discuss in a second.
You multiply D by the square root of the number of rolls
(equivalently, divide the largest difference by the square root of the
number of rolls), and compare with:

   5%     1%     d#
  1.08   1.35     4
  1.10   1.37     6
  1.11   1.38     8
  1.12   1.39    10
  1.12   1.40    12
  1.14   1.42    20

These values apply pretty well irrespective of the total number of
rolls, but I would use at least 10 rolls per face. Note also that
these values come from simulation, and are hence not exact. This
doesn't really matter.

Interpret the result as I suggested for the chi-square test.

You may find the following values in tables:

   5%     1%
  1.36   1.63    (irrespective of the number of sides on the die)

The reason these are larger is that they are based on the assumption
that the distribution the data are from is continuous (effectively, a
*very* large number of faces on the die would give these values). If
you use the textbook values, the test will be conservative (a fair die
will be rejected slightly less often than the supposed 5% and 1%), due
to the distribution of values being discrete (a d20 generates only
integers, not anything between).

So, for our above example, assume there are no larger differences than
9, and that we made 400 rolls on a d20 (hence the expected number in
each cell is 20, as above). Then D is 9/400 = .0225, which you would
look up if you can get tables. We made 400 rolls, so we could use the
table above: the square root of 400 is 20, so D x 20 (= 9/20) = .45.
This is much less than the 5% value, so there is little evidence that
the die is unfair.

There are tests which are probably even more appropriate, but these
two (chi-squared and K-S) will be enough for you to get a good idea of
any suspect dice.

Note: If you suspect a die, and decide to test it, don't use the rolls
that made you suspect it in the test. Generate a new set. e.g. if you
are all recording your rolls as you play, and one player's results look
funny, don't then test those recorded values - you have to generate a
new set.
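
The running-total procedure for the Kolmogorov-Smirnov test can be
sketched in Python along the following lines. Treat it as a rough
illustration rather than a polished routine: the names ks_scaled and
KS_CRITICAL are invented here, the 5% and 1% values are copied from
the simulated table above, and the data are the 400-roll d20 counts
used in the histogram example later in this post.

```python
import math

# (5%, 1%) values for (largest difference) / sqrt(number of rolls),
# copied from the simulated table in the post, keyed by number of faces.
KS_CRITICAL = {4: (1.08, 1.35), 6: (1.10, 1.37), 8: (1.11, 1.38),
               10: (1.12, 1.39), 12: (1.12, 1.40), 20: (1.14, 1.42)}


def ks_scaled(counts):
    """Largest gap between running observed and expected totals, over sqrt(rolls)."""
    n = sum(counts)
    expected = n / len(counts)          # expected count per face for a fair die
    running, largest = 0, 0.0
    for k, c in enumerate(counts, start=1):
        running += c                    # "Sum Count" row
        largest = max(largest, abs(running - k * expected))  # vs "Sum Exp" row
    return largest / math.sqrt(n)


d20_counts = [22, 21, 18, 23, 19, 22, 20, 16, 34, 17,
              19, 15, 18, 15, 14, 22, 25, 24, 18, 18]
stat = ks_scaled(d20_counts)
five_percent, one_percent = KS_CRITICAL[len(d20_counts)]
print(stat, "exceeds 5% value" if stat > five_percent else "below 5% value")
```

Dividing the largest difference by the square root of the number of
rolls is the same thing as multiplying D by that square root, so the
result can be compared directly with the table.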

-------------------------------------------------------------------------

Testing a die based on the histogram of rolls:

The histogram approach can be turned into a test of sorts as follows:

After drawing the histogram, 2 lines can be drawn either side of the
expected (mean) result. If all histogram bars lie within the inner
lines, there is no strong evidence of bias. If any of the bars go
outside the outer lines, there is fairly clear evidence of bias. If
one of the bars lies between the inner and outer lines, then there is
some (mild) evidence of bias, but it is not really clear. You may wish
to then perform a further test on the probability for that individual
side, as described in the next section (Testing an individual face).
If several of the bars lie between the inner and outer lines, we have
a stronger indication of bias.

Where do we draw the lines?

I have worked out values for rolling 20 times for each face (as in the
other examples, 80 times for a d4, 400 times for a d20). The bars on
the histogram must actually go past these values. You could think of
these values as giving "Acceptable Ranges" (literally, 95% and 99%
acceptance regions) for the histogram. A fair die will give histograms
with one or more bars outside these ranges 5% and 1% of the time
respectively (actually just under).

Table for 20 throws per face

  d#      5%       1%
   4     8-34     6-37
   6     9-33     7-36
   8     9-33     8-35
  10    10-32     8-35
  20    11-30     9-32

Approximate formula:

  Let N be the total number of rolls.
  Let c be the number of faces on the die.
  Let e be the expected number of times each face will come up
  ( e = N/c ).
  Then the lines go at  e +/- A x sqrt[e (c-1)/c],
  and the value for 'A' comes from the table below.
  All fractions should be rounded up.
  e.g.
  N=160, c=8, e=20 give:
    5%: 20 +/- 2.73 sqrt(20 x 7/8)   or   9-32
    1%: 20 +/- 3.22 sqrt(20 x 7/8)   or   7-34

Table to go with approximate formula:

  d#     5%     1%
   4    2.49   3.02
   6    2.63   3.14
   8    2.73   3.22
  10    2.80   3.29
  12    2.86   3.34
  20    3.02   3.48

Due to being in the extreme tails of the distribution, combined with
slight asymmetry, the ranges we get are sometimes out a bit. This is
not a big deal.

Example:

We throw a d20 400 times, and record the results and from the table
above, we draw the inner lines at 11 & 30, and the outer lines at
9 & 32, as well as a reference line at 20.

(in the histogram below, "." =1 count, ":" =2 counts. The horizontal
lines aren't quite in the correct positions; they ought to be about a
quarter to half a character position lower.)

Counts                         (34)
   35 +                         > <
      |__________________________:__________________________________ (32)
      |__________________________:__________________________________ (30)
      |                          :
      |                          :
   25 +                          :                       .
      |           .              :                       :  :
      |__:__._____:_____:________:____________________:__:__:_______ (20)
      |  :  :     :  .  :  :     :     .              :  :  :
      |  :  :  :  :  :  :  :     :  .  :     :        :  :  :  :  :
   15 +  :  :  :  :  :  :  :  :  :  :  :  .  :  .     :  :  :  :  :
      |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
      |--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:- (11)
      |--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:- ( 9)
      |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
    5 +  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
      |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
      |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
      `--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
  face:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
counts: 22 21 18 23 19 22 20 16 34 17 19 15 18 15 14 22 25 24 18 18
                                ^^

One value (34) goes outside the 1% values (technically, you could say
goes outside a 99% acceptance region), so it seems our die is biased.
More particularly, it rolls too many 9's. Whether this will affect a
game very much is another point. (If it had been "1" or "20", however,
perhaps this would result in a large effect on the game).
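
The approximate formula for where to draw the lines can be sketched in
Python as below. This is a minimal illustration, assuming "rounded up"
means rounding both ends of the range up to the next whole number; the
names band, histogram_lines and A_VALUES are made up for the sketch,
and the 'A' values are copied from the table that goes with the
formula.

```python
import math

# (5%, 1%) values of 'A' from the table accompanying the approximate
# formula in the post, keyed by number of faces.
A_VALUES = {4: (2.49, 3.02), 6: (2.63, 3.14), 8: (2.73, 3.22),
            10: (2.80, 3.29), 12: (2.86, 3.34), 20: (3.02, 3.48)}


def band(total_rolls, faces, a):
    """Return (low, high) from e +/- a * sqrt(e (c-1)/c), both ends rounded up."""
    e = total_rolls / faces                       # expected count per face
    half_width = a * math.sqrt(e * (faces - 1) / faces)
    return math.ceil(e - half_width), math.ceil(e + half_width)


def histogram_lines(total_rolls, faces):
    """Inner (5%) and outer (1%) ranges for a histogram of per-face counts."""
    a5, a1 = A_VALUES[faces]
    return band(total_rolls, faces, a5), band(total_rolls, faces, a1)


inner, outer = histogram_lines(160, 8)   # the worked example: N=160, c=8, e=20
print("5% lines:", inner, " 1% lines:", outer)
```

For N=160 and c=8 this gives 9-32 and 7-34 when both ends are rounded
up. Because 'A' grows with the number of faces, the formula's ranges
widen for dice with more faces; for a d20 rolled 400 times it gives
7-34 at the 5% level, which is noticeably wider than the tabled range
for a d20. The band function takes 'A' as a parameter, so it also
works with the multipliers 1.96, 2.58, 1.65 and 2.33 used for testing
an individual face.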

Testing an individual face:

When you suspect a particular face is coming up with the wrong
frequency, or only wish to test a particular face (e.g. 20 on a d20),
throw as before, but you can compare with a narrower range, as below.
However, you can't use the data that made you suspect the face in this
test, you must generate a new set of rolls. E.g. If you draw a
histogram and find the results for 7 look odd, you can't use those
numbers in this test. In that case the ranges given above (for testing
an entire histogram) are appropriate.

For the tables below, as for those above, you need to exceed the range
given to call the die 'probably biased'.

When you don't know the direction already (two-tailed test):
(If you are in doubt, use this table rather than the next one)

Table for 20 throws per face

  d#      5%       1%
   4    13-28    11-30
   6    12-28    10-31
   8    12-29    10-31
  10    12-29    10-32
  12    12-29    10-32
  20    12-29    10-32

Approximate formula (reasonably accurate):

  Let N be the total number of rolls.
  Let c be the number of faces on the die.
  Let e be the expected number of times each face will come up
  ( e = N/c ).
  Then the lines go at  e +/- 1.96 x sqrt[e (c-1)/c]  (5%)
  and for 1% at         e +/- 2.58 x sqrt[e (c-1)/c].
  All fractions should be rounded up.
  e.g. N=160, c=8, e=20 give:
    5%: 20 +/- 1.96 sqrt(20 x 7/8)   or   12-29 (rounded up)
    1%: 20 +/- 2.58 sqrt(20 x 7/8)   or   10-31 (rounded up)

When you think a particular face is coming up too often, or if you
think a particular face isn't coming up enough (one-tailed test):

Table for 20 throws per face

  d#      5%       1%
   4    14/26    11/30
   6    13/27    11/30
   8    13/27    11/30
  10    13/27    11/30
  12    13/27    11/30
  20    13/27    11/31

Approximate formula (reasonably accurate):

  Let N be the total number of rolls.
  Let c be the number of faces on the die.
  Let e be the expected number of times each face will come up
  ( e = N/c ).
  Then the lines go at  e +/- 1.65 x sqrt[e (c-1)/c]  (5%)
  and for 1% at         e +/- 2.33 x sqrt[e (c-1)/c].
  All fractions should be rounded up.
  e.g.
  N=160, c=8, e=20 give:
    5%: 20 +/- 1.65 sqrt(20 x 7/8)   or   14-27 (rounded up)
    1%: 20 +/- 2.33 sqrt(20 x 7/8)   or   11-30 (rounded up)

These values are given as either/or i.e. since you have already
specified a particular direction, you will compare with only the
higher values or the lower values, not both.

Example 1: You decide to test your new d20 to see if it rolls the
correct number of 20's, but you don't believe it to be biased in a
particular direction. You roll 400 times, and 35 times you get a "20".
From the "two-tailed" table above, you can see that's outside the
outer (1%) range. It seems your d20 rolls too many 20's.

Example 2: Another player seems to be rolling a lot of 1's on her d4.
You decide to test whether 1 comes up too often. You roll it 80 times,
and get the following:

  Face:      1    2    3    4
  Count:    13   26   25   16

Since you decided to test if there were too many 1's, you can only see
if the number of 1's exceeds 26, which it does not. You can't say,
after generating the data "Oh, actually, perhaps it rolls too few
ones", or "perhaps it rolls too many 2's" without generating a new set
of data for the new hypothesis. You must never base what you are
testing for on what you spy in the set of data you use in the test.

Our only conclusion on this test: the d4 doesn't roll too many 1's.
You may like to then generate a new set of rolls to see if it rolls
too few 1's.

---------------------------------------------------------------------

As further examples, here are the chi-squared test and
Kolmogorov-Smirnov (KS) tests performed on the same data.

chi squared test:

(Calculations have been run down the page.)

  face   count   diff from 20   diff^2
    1      22          2            4
    2      21          1            1
    3      18          2            4
    4      23          3            9
    5      19          1            1
    6      22          2            4
    7      20          0            0
    8      16          4           16
    9      34         14          196
   10      17          3            9
   11      19          1            1
   12      15          5           25
   13      18          2            4
   14      15          5           25
   15      14          6           36
   16      22          2            4
   17      25          5           25
   18      24          4           16
   19      18          2            4
   20      18          2            4

  sum diff^2: 388
  chi-squared statistic: 388/20 = 19.4
  (far less than the 5% value of 30.14)

Kolmogorov-Smirnov test:

(Calculations have been run down the page because I can't fit 20
3 digit numbers, with spaces and labels across an 80-column screen).

  counts    sum    expected    diff
    22       22       20         2
    21       43       40         3
    18       61       60         1
    23       84       80         4
    19      103      100         3
    22      125      120         5
    20      145      140         5
    16      161      160         1
    34      195      180        15   <-- max diff
    17      212      200        12
    19      231      220        11
    15      246      240         6
    18      264      260         4
    15      279      280         1
    14      293      300         7
    22      315      320         5
    25      340      340         0
    24      364      360         4
    18      382      380         2
    18      400      400         0

The max diff is 15, so D = 15/400. This is well short of significance
at the 5% level: e.g. calculate (max diff)/sqrt(n) = 15/20 or .75,
where the 5% value from the simulations is 1.14.

----------------------------------------------------------------------------

Conover, W.J. (1980): Practical nonparametric statistics, 2nd Ed.,
   Wiley, New York.

Neave, H.R. and Worthington, P.L.B. (1988): Distribution-free tests,
   Unwin Hyman, London.