To evaluate collation, one needs to be able to predict what a truly random product (box or pack) would look like. So far, I’ve talked about trying to calculate the odds of a group of cards being completely different. I’ve also looked at determining odds by listing all the possible outcomes. These methods work fine, but they are both labor-intensive for complex problems – like what are the odds of getting a complete 300-card base set from a box of 540 cards?

A possible solution is to determine the odds using a Monte Carlo method. A Monte Carlo calculation involves simulating an outcome many times (1,000 or 10,000 or more times). Each simulated outcome can then be checked and tabulated for a certain result. For example, of the 10,000 simulated openings of 540-card boxes, how often did the simulated box contain a complete 300-card base set?

OK, 10,000 simulations may not seem like an easier solution, but the key is to have a computer do all the work. You just need to tell the computer how many boxes to open and key parameters like the number of cards in a box and the size of the base set. Of course, you need to program the computer, but this kind of program is pretty simple.

Let’s go back to our standard question from the previous two collation posts – What is the probability of getting a complete Dugout Dirt set (four cards) from any box of 1994 Stadium Club? (Each box contains four Dugout Dirt insert cards.)

Running a simulation on 1994 Stadium Club involves the computer generating four random cards, each numbered between 1 and 4. The computer then checks to see if the four cards are all different, meaning a complete set has been pulled from the box. If so, the computer notes that a complete set was obtained and then goes to the next simulation.
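Here is a minimal Python sketch of that procedure. It assumes each insert card is drawn independently with equal odds for all four cards. The function name, parameter names, and seed are my own choices for illustration; this is not the program that produced the numbers in this post.

```python
import random

def simulate_complete_sets(num_boxes, cards_per_box=4, set_size=4, seed=None):
    """Return how many simulated boxes contain every card in the set."""
    rng = random.Random(seed)  # seed is optional; it only makes runs repeatable
    complete = 0
    for _ in range(num_boxes):
        # Assumption: each card is an independent, uniform draw from 1..set_size.
        pulled = {rng.randint(1, set_size) for _ in range(cards_per_box)}
        # A Python set drops duplicates, so a complete set has set_size members.
        if len(pulled) == set_size:
            complete += 1
    return complete

if __name__ == "__main__":
    boxes = 100_000
    hits = simulate_complete_sets(boxes, seed=1994)
    print(f"{hits} complete sets in {boxes} boxes ({hits / boxes:.3%})")
```

Under the same uniform-draw assumption, changing `cards_per_box` to 540 and `set_size` to 300 would cover the base set question, though each simulated box takes longer to generate.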

After 100,000 simulations, the program I ran said that a complete set was pulled 9,289 times. That’s a 9.289% chance of pulling a complete set. The previous two posts found the chance to be 9.375%. Why aren’t they the same? A Monte Carlo result is an estimate built from random draws, so it carries some sampling error, and that error tends to shrink as the number of simulations grows. (When I ran my program on 1,000,000 simulations, I got 9.2894%. Not so different.)
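To get a rough feel for how far a simulated percentage can wander from the exact one, here is a small Python sketch. It assumes independent, equally likely cards and a box holding exactly as many cards as the set, and it uses the standard binomial formula for the standard error of a proportion. The function names are hypothetical, chosen just for this example.

```python
import math

def exact_complete_set_probability(set_size):
    """Exact odds that set_size independent, uniform draws are all different."""
    # set_size! orderings have no repeats, out of set_size**set_size possible boxes.
    return math.factorial(set_size) / set_size ** set_size

def standard_error(p, num_simulations):
    """Typical size of the sampling error in a simulated proportion."""
    return math.sqrt(p * (1 - p) / num_simulations)

p = exact_complete_set_probability(4)
print(f"exact probability: {p:.4%}")
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} simulations: standard error about {standard_error(p, n):.4%}")
```

At 100,000 simulations the standard error works out to about 0.09 percentage points, so a result of 9.289% sits within roughly one standard error of 9.375%.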

The great thing about Monte Carlo simulations is that you can check each simulation for all kinds of different situations. For example, you could check each simulation for having just one duplicate or two duplicates. Doing a calculation on one of these outcomes might not be easy, but checking it with a computer program is pretty simple.
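As a sketch of that idea in Python, the function below tallies how many duplicates turn up in each simulated box. Here a "duplicate" is assumed to mean any card beyond the first copy of a given card, so a box with three different cards out of four counts as one duplicate. That definition, the function name, and the uniform independent draws are all assumptions for illustration.

```python
import random
from collections import Counter

def tally_duplicates(num_boxes, cards_per_box=4, set_size=4, seed=None):
    """Count simulated boxes by their number of duplicate cards."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(num_boxes):
        # Assumption: each card is an independent, uniform draw from 1..set_size.
        box = [rng.randint(1, set_size) for _ in range(cards_per_box)]
        # Duplicates = cards in the box minus the number of different cards.
        duplicates = cards_per_box - len(set(box))
        tally[duplicates] += 1
    return tally

if __name__ == "__main__":
    boxes = 100_000
    results = tally_duplicates(boxes, seed=1994)
    for duplicates in sorted(results):
        share = results[duplicates] / boxes
        print(f"{duplicates} duplicate(s): {results[duplicates]} boxes ({share:.2%})")
```

The row for zero duplicates is the complete-set case, so it should land near the 9.375% figure from the earlier posts.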

Next week I will look at some of the boxes I’ve opened and see how they stack up. Are they randomly packed or artificially collated to minimize duplicates?