Need Statistical Help!

Estu · #1 04-18-2013, 09:43 AM

So here's the deal. I've been collecting loot data on monsters in-game and updating the wiki with their info (this has been for low-level monsters so far since they are the fastest to kill). Here's an example of one of the monsters I have the most data for: http://wiki.project1999.org/A_Decaying_Dwarf_Skeleton

That page's loot data are from 408 kills (I was farming bone chips for faction). Usually I just collect 100 kills and move on to other monsters. Even for this page, though, we can see a big (at least relative to the percentages) difference in loot percentages in items that likely have the same actual probability of dropping: the 'common loot' likelihood of dropping varies from 0.2% to 2.2%, meaning that some items appear to drop ten times as often as others, though each item, in my opinion, probably just has a 1% chance of dropping, and I just happened to get more of some items than others.

I was thinking about this, and it occurred to me that the percentages I put up on the wiki might be misleading. Yeah, it's definitely better to have P1999 data than EQEmu data, which is almost always wrong, but people might look at a list of loot data and get the wrong idea. For instance, say you want to farm bone chips. You look at the page for a decaying skeleton and see that they drop bone chips 72.9% of the time. Then you look at the page for a dwarf skeleton and see that they drop bone chips 67.9% of the time. Maybe you conclude that decaying skeletons drop them a little more often, so you should farm them (let's forget for now that decaying skeletons are actually easier to kill in large quantities than dwarf skeletons). But the data is only based on about 100 kills for each monster, so we can't actually say with confidence that the real chance to drop bone chips is very close to the given percentages.

I ran some numbers the other day. If an item has a 30% chance to drop and you kill the monster that drops it 100 times, then with a likelihood of about 5%, you will find the item dropping below 20% of the time, and with a likelihood of about 5%, you will find it dropping over 40% of the time. So about one out of ten such items you see on the wiki will have its drop data off by over 10 percentage points. The way I got these numbers was arduous: I looked at the associated binomial distribution, calculated the probability of getting each number of drops between 20 and 40 (from 100 trials), and added them up.
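For anyone who wants to reproduce (and double-check) those tail figures without adding terms by hand, here is a minimal sketch using only Python's standard library. The function names are made up for illustration, and the split assumes that 20 and 40 drops count as "inside" the range; shifting those endpoints changes the tails a bit.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k drops in n kills when each kill drops with chance p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_between(lo, hi, n, p):
    """Probability that the number of drops lands in lo..hi inclusive."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

n, p = 100, 0.30
below = prob_between(0, 19, n, p)     # fewer than 20 drops
inside = prob_between(20, 40, n, p)   # 20 through 40 drops
above = prob_between(41, n, n, p)     # more than 40 drops

print(f"below 20%: {below:.4f}")
print(f"20% to 40%: {inside:.4f}")
print(f"above 40%: {above:.4f}")
```

Changing `n` and `p` gives the same breakdown for any other kill count or assumed drop rate.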

Here's what I'm looking to do: if I kill a monster 'n' times and it drops some item 'k' times, I want to generate a 95% confidence interval for that item's actual drop rate, i.e. I want to say that there is a 95% chance that it drops between x% of the time and y% of the time. I haven't taken stats in a while so I don't know the best way to do this, and I worry that using smooth approximations of discrete distributions will give me intervals that are inaccurate. However, I also want to be able to do a lot of these computations quickly (hundreds of monsters, each with over 100 kills). My assumptions are that 'n' is at least 100, and items usually drop at least 1% of the time.
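One candidate that avoids smooth approximations is the exact (Clopper-Pearson) binomial interval, which works directly from the binomial distribution. Below is a rough sketch, not a definitive implementation: it uses only the standard library, finds each endpoint by bisection, and the names `binom_cdf` and `clopper_pearson` are just illustrative. It assumes each kill is an independent trial with the same drop chance.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _root(f, steps=60):
    """Bisection on [0, 1] for a function f that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, conf=0.95):
    """Exact two-sided interval for the drop rate after k drops in n kills."""
    alpha = 1 - conf
    # Lower end: the rate at which seeing k or more drops has probability alpha/2.
    lower = 0.0 if k == 0 else _root(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper end: the rate at which seeing k or fewer drops has probability alpha/2.
    upper = 1.0 if k == n else _root(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

low, high = clopper_pearson(30, 100)
print(f"30 drops in 100 kills: {low:.1%} to {high:.1%}")
```

With 'n' in the low hundreds each call sums at most a few hundred terms per bisection step, so running it over hundreds of monsters should still be quick. This interval is known to be conservative, meaning it tends to come out a little wider than strictly necessary.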

There's also a slight complication: most of the time items will drop either not at all or just once, but sometimes (e.g. with bone chips), they may drop twice or more (spider silks may drop five times off of spiders in East Karana). The mechanism by which this happens, I believe, is that there is a set probability 'p' used for each one dropping, and it's tested however many times. So maybe bone chips have a 50% chance to drop in each of the two times they might drop, so 25% of the time we get no bone chips, 50% of the time we get one, and 25% of the time we get two. My parser doesn't give the actual probability 'p' of each one dropping, or of at least one dropping, but rather the expected (average) number of items dropped, since this is easy to compute and in my opinion the most useful piece of information. So in the previous case, each one has a 50% chance to drop and 75% of the time we get at least one bone chip, but the expected number of bone chips is 0*25% + 1*50% + 2*25% = 1, so the wiki would show a "likelihood" of 100%. These cases would, I'm assuming, complicate the confidence intervals, though most pieces of loot that drop can only drop once, so it's more straightforward.
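To make the bone chip arithmetic concrete, here is a small sketch of the mechanism as described: a fixed number of independent rolls per kill, each succeeding with the same chance 'p'. That mechanism is the guess stated above, not something confirmed about the game, and `drop_count_distribution` is a made-up helper name.

```python
from math import comb

def drop_count_distribution(rolls, p):
    """Chance of getting 0, 1, ..., rolls copies of an item from one kill,
    assuming `rolls` independent tests that each succeed with probability p."""
    return [comb(rolls, j) * p**j * (1 - p)**(rolls - j) for j in range(rolls + 1)]

dist = drop_count_distribution(2, 0.5)
expected = sum(count * prob for count, prob in enumerate(dist))
at_least_one = 1 - dist[0]

print("P(0), P(1), P(2):", dist)
print("expected chips per kill:", expected)
print("chance of at least one:", at_least_one)
```

If the rolls really are independent, the total number of copies seen over 'n' kills would itself be binomial with rolls*n trials, so an interval for 'p' could be computed on that larger trial count and then multiplied by the number of rolls to get an interval for the expected count.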

Any stats nerds wanna help me out? Thanks!