#1
I started playing p1999 about six weeks ago, having left EQ in early 2005 after Arch Overseers largely collapsed due to the great OoW burnout, and expected to find a community that had min-maxed the system to the hilt. But it's clear that a lot of the folklore and baseless dogma that characterised player knowledge back then still persists. One critical element of that seems to be the subject of charm duration. In this post I will set out a proper statistical analysis of charm duration and provide some tools that can be used to test it. I'm not formally a statistician, though I use stats in my work and am pretty sure that the approach used here is correct. Still, if you spot any errors, please feel free to let me know.
The first step is to be absolutely clear about what we mean by 'charm duration'. It's generally accepted that the game's global timeline is divided up into 'ticks', each six seconds long, and that any decision about charm breaking is made at the boundary between ticks. So we have the situation depicted in the diagram below:

[Image: diagram of a charm spanning six ticks]

The charm spell will land on a mob at some random time, which is unlikely to be exactly at a tick boundary. Provided it is not resisted, there will be an initial period of less than six seconds before the first charm-break test is applied. Subsequently, every six seconds, another test will be performed. As long as these tests keep being passed, the charm will continue, until finally a test is failed and the charm breaks. In my experience the initial cast of charm is only ever resisted if the mob is of a higher level than the spell is capable of charming.

As shown by the diagram, a charm that lasts six ticks (plus the initial period) reflects a sequence of six consecutive passes followed by one failure. In the rest of this discussion we will ignore the initial period, which is irrelevant to the analysis. In general, a charm that lasts n ticks is the result of n consecutive passes of the charm-break test followed by one failure.

Important assumption: I assume that each test is performed independently. That is, the probability of success on each tick is determined without reference to the number of preceding successes. This is a reasonable assumption and simplifies things considerably.

There may be a limiting factor in terms of maximum charm duration, but this is unknown. The maximum charm duration I've seen so far is 818 seconds, which is over 13 and a half minutes. I expect most enchanters will be used to seeing charms last over ten minutes, such that tash wears off even though you reapply tash on each break.

So to start off, we ask, "What is the probability that a charm will last n ticks?"
This is actually a very simple question, and the answer relies on the underlying probability, p, of success on each individual tick. If we toss a coin, the chance of it coming up heads is 50% (p=0.5). What is the probability of it coming up heads each time if we toss it twice? The answer is to multiply the probabilities, so the result is 0.5*0.5 = (0.5)^2 = 0.25. If you toss it twice, the chance of getting two heads is 1 in 4. Likewise, if you toss it three times, the chance of getting heads three times in a row will be (0.5)^3, 1 in 8, etc. So:

The probability that a charm will last at least n ticks is p^n.

This is an exponential curve, and matches what we see if we just dump a whole load of individual charms together and graph the distribution. I wrote a python program (attached below as CharmParsing.py) that parses a log file, extracts the duration of charms and performs some stats that will be explained later. Usage is simply

py -m CharmParsing <path to log file>

It will create a .csv file in the same location as the log that contains the stats and a list of individual charm durations in seconds and ticks. I took a log covering a bit over 2 weeks and fed it into the program, then graphed the duration of the 427 charms. This is uncontrolled data, so merely indicative, but it does show an exponential distribution.

[Image: distribution of the 427 charm durations]

At this point it's important to note that, given the exponential distribution, any talk of 'average' charm duration is misleading. The distribution is heavily skewed, so most charms are shorter than the average while a few last far longer, and the average tells you little about what to expect from any individual charm. If we're going to analyse charm duration we need a different statistic: an estimate of the underlying probability (p) of success on each tick. This can be done by recognising that each tick during the charm duration represents a separate independent trial. A charm that lasts six ticks represents six successes and one failure.
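To make the "each tick is a separate trial" idea concrete, here is a minimal Python sketch. It is not CharmParsing.py: the function names, the seed and the value p = 0.98 are assumptions chosen purely for illustration. It simulates charms under the independence assumption, pools every tick into one set of trials, and recovers an estimate of p.

```python
import random

TICK_SECONDS = 6  # tick length stated in the post


def simulate_charm_ticks(p, rng):
    """Return the number of consecutive passed tick tests before the first failure."""
    ticks = 0
    while rng.random() < p:
        ticks += 1
    return ticks


def pooled_success_rate(durations_in_ticks):
    """Pool every tick of every charm into one set of independent trials.

    A charm lasting k ticks contributes k successes and 1 failure.
    """
    successes = sum(durations_in_ticks)
    failures = len(durations_in_ticks)
    trials = successes + failures
    return successes / trials, trials


rng = random.Random(1999)  # fixed seed so the sketch is repeatable
true_p = 0.98              # hypothetical per-tick success probability
charms = [simulate_charm_ticks(true_p, rng) for _ in range(2000)]

p_hat, trials = pooled_success_rate(charms)
print(f"trials: {trials}, estimated p: {p_hat:.4f}")

# Chance of a charm surviving at least n ticks is p**n; 818 s covers 136 whole ticks.
n = 818 // TICK_SECONDS
print(f"P(at least {n} ticks) at p={true_p}: {true_p ** n:.3f}")
```

With real data you would replace the simulated list with the tick counts taken from your own log.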
As long as we have enough data and they're properly controlled (i.e. on the same mob, with no changes that might alter the underlying probability) we can group all these trials together and use them to calculate the Wilson Score (see references below), a binomial statistic that provides an estimate of the underlying probability at a specified confidence level:

[Image: Wilson Score formula]

where p0 = upper and lower bounds of the Wilson Score; n = total number of trials; p_hat (p with a circumflex) = number of successes/total number of trials; and c = the critical value corresponding to the level of confidence required (1.96 for 95% confidence). This is used in CharmParsing.py to report a central probability along with its upper and lower bounds.

We're now getting somewhere. We can generate estimates of the underlying probability of charm success on each tick. But what we really want is a way to compare these values between different conditions (i.e. changes in level difference, CHA and magic resist). We could just see if the ranges given by the Wilson Score overlap, but in some cases that might produce a Type II error (false negative). A better method is to compare the difference between central probabilities (p2-p1) to the Newcombe-Wilson difference interval (also covered in the references below). The test is given by:

[Image: Newcombe-Wilson difference test formula]

where p1 = central probability for condition 1, w(1,-) = lower bound of the Wilson Score for condition 1, w(1,+) = upper bound of the Wilson Score for condition 1, etc.

This is performed by CharmDiff.py, also attached. To use this, cut your log file up into separate segments, each corresponding to one controlled value for the condition being tested. Usage is

py -m CharmDiff <95|99> <path to first log file> <path to second log file>

The first argument must be 95 or 99, the level of confidence which you wish to use.
CharmDiff.py prints its results to the terminal, but this can be redirected to a file using the > operator.

We now have the tools needed to investigate charm duration in a meaningful manner, and the following posts will show some results I've gathered. I strongly encourage anyone interested to try this for themselves and post the results they get. Any scientific data are only meaningful to the extent that they can be replicated. If you're looking to compare conditions, just make sure that the logs used are properly controlled. The code used only works for L12 Charm, Beguile and Cajoling Whispers, because I'm level 51 and don't have Allure. If you want to include other charm spells, it's easy to add them to the regexps and cast duration constants at the start of the files.

It's important to note that anecdotal evidence is completely worthless here. We've all had pets that seem unusually unruly ('OMG I just recharmed the bloody thing and it's broken again!'), but that means nothing. The First Rule of Statistics is: Shit Happens. My personal gut feeling is that charm breaks are more likely to happen when I've taken my hand off the keyboard to pick up a drink, but I haven't worked out a way to test that reliably yet...

If you want to generate useful data you need to keep the conditions controlled and record a sufficiently large number of trials - I would recommend at least an hour or so's worth, i.e. 600 or more trials.

Q: You use big wurds and sumz! Y u do sumz? Dey make brane hurt!
A: You've been using Illusion:Troll too much. Here's 10pp, go buy yourself a drink.

The following posts will concern different conditions that are reputed to affect charm duration. If you can't wait, I'm coming to the conclusion that charm success per tick has a fixed probability close to 0.98.
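To see why the number of trials matters, here is a rough Python sketch of the Wilson Score interval described above, in its standard form without continuity correction. It is not the code from CharmParsing.py (which applies continuity correction by default), so it should not be expected to reproduce the figures in this thread exactly. The function name and the sample sizes are assumptions for illustration.

```python
import math


def wilson_interval(successes, trials, c=1.96):
    """Wilson score interval without continuity correction.

    c is the critical value (1.96 for 95% confidence, about 2.576 for 99%).
    """
    p_hat = successes / trials
    denom = 1 + c * c / trials
    centre = (p_hat + c * c / (2 * trials)) / denom
    half_width = (c / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + c * c / (4 * trials * trials)
    )
    return centre - half_width, centre + half_width


# Hypothetical logs in which 98% of tick tests pass, at different sample sizes.
for trials in (100, 600, 1200, 2400):
    successes = round(0.98 * trials)
    low, high = wilson_interval(successes, trials)
    print(f"n={trials:5d}  p_hat={successes / trials:.4f}  "
          f"interval=({low:.4f}, {high:.4f})  width={high - low:.4f}")
```

The interval narrows as the number of trials grows, which is the reason for recommending several hundred trials per condition before trying to compare anything.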
References:
The Wilson Confidence Interval for a Proportion
Binomial Confidence Intervals and Contingency Tests
Interval estimation for the difference between independent proportions
Plotting the Newcombe-Wilson distribution
Sean Wallis, Statistics in Corpus Linguistics Research, 2021
__________________
_____
Green: Feressa
#2
The topic that's generated the most confusion seems to be the importance of the CHA stat.
I'm going to start off here by saying that it is, as a general principle, impossible to prove a negative. I can't prove to you that CHA has no effect on charm duration. The way statistics works is to start with a null hypothesis (that there is no difference between two conditions) and test whether the evidence shows that you can reject that hypothesis with a specified level of confidence. I.e. you are testing whether or not you can demonstrate a positive effect.

TLDR, the result is: Unable to reject the null hypothesis that CHA has no effect at the 95% level.

To test this I grabbed a Greater Spurbone in Emerald Jungle and parked it on the south wall. I was level 50; the mob was blue-con, resisted Beguile twice and hit for an observed max of 100, so it was probably level 38-39. I took off all gear with any CHA on it to take my CHA stat down to its base of 115 (because I followed the folklore and made my character according to the guide …) and proceeded to recharm on each break for around 2 hours. I then put on all the CHA gear I could find, applied the CHA buff to take my CHA up to 226 and repeated the process for another 2 hours or so. This resulted in around 1200 individual tick trials for each condition, and the results are given below:

Input data 1:
File: L50 EJ Gt Spurbone CHA 115.txt
Total trials: 1157
p charm success (per tick): 0.9742
Wilson Score lower bound: 0.9647
Wilson Score upper bound: 0.9836

Input data 2:
File: L50 EJ Gt Spurbone CHA 226.txt
Total trials: 1294
p charm success (per tick): 0.9847
Wilson Score lower bound: 0.9776
Wilson Score upper bound: 0.9915

probability difference: 0.0104
Newcombe-Wilson difference interval: -0.0117, 0.0117
Not significant at 95% level

To illustrate this further, here's a diagram showing the extent to which the two Wilson Scores overlap:

[Image: overlap of the two Wilson Score intervals]
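The significance check above can be reproduced by hand. The Python sketch below is not CharmDiff.py; it uses the zero-centred form of the Newcombe-Wilson interval as I understand it from the Wallis references, with a hypothetical function name, and takes the rounded figures reported above as its input. Because those inputs are rounded to four places, the last digit may differ slightly from the program's output.

```python
import math


def newcombe_wilson_interval(p1, w1_low, w1_high, p2, w2_low, w2_high):
    """Zero-centred Newcombe-Wilson interval for the difference d = p2 - p1."""
    lower = -math.sqrt((p1 - w1_low) ** 2 + (w2_high - p2) ** 2)
    upper = math.sqrt((w1_high - p1) ** 2 + (p2 - w2_low) ** 2)
    return lower, upper


# Figures as reported above for CHA 115 (condition 1) and CHA 226 (condition 2).
p1, w1_low, w1_high = 0.9742, 0.9647, 0.9836
p2, w2_low, w2_high = 0.9847, 0.9776, 0.9915

diff = p2 - p1
lower, upper = newcombe_wilson_interval(p1, w1_low, w1_high, p2, w2_low, w2_high)
# The difference is significant only if it falls outside the interval.
significant = diff < lower or diff > upper

print(f"difference: {diff:.4f}")
print(f"interval: ({lower:.4f}, {upper:.4f})")
print("significant" if significant else "not significant")
```

The observed difference sits just inside the interval, so the null hypothesis cannot be rejected at the 95% level.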
Now I know that some will be tempted to carp that the increased CHA did show a minor increase in the central success probability. Unfortunately this fails to appreciate the actual nature of the Wilson Score. The actual probability for each condition may lie at any point within the upper and lower bounds, and these overlap substantially. Furthermore, 95% confidence (2 sigma) is pretty weak-sauce as far as confidence goes and represents the lowest level at which we can start talking about any real difference.

One factor that might come into play is an adjustment called continuity correction. This compensates for approximating a discrete binomial distribution with a continuous one, and becomes more important as the probabilities approach 0 or 1 (as seen here). The programs use continuity correction in their default state, but this does lead to slightly wider Wilson Score intervals. I turned continuity correction off (you just need to change a boolean constant in the code), but it still failed to produce a significant result.

My personal feeling is that the results for the 115 CHA condition just happen to be on the lower part of the range - all my preliminary results were closer to 0.98. But it doesn't really matter: the statistics involved are capable of taking that into account, and do so here.

As I said above, I encourage you to test this for yourselves. Getting the data is rather boring, but the more data the better.

How does this affect standard guidelines? Newbie enchanters should put their points into STR, so they can carry more fine steel back to town to sell.
__________________
_____
Green: Feressa
#3
What about level? Surely the difference in level affects charm duration?
Result: Unable to reject the null hypothesis that level difference has no effect at the 95% level.

A few days before performing the experiment with CHA I went to GFay and faced off against the mighty level 2 orc pawn (confirmed white-con to a level 2 player). This test was also performed with a CHA of 115. I was 48 at the time, so the difference was 46 levels, as opposed to 11-12 levels against the greater spurbone in EJ.

Input data 1:
File: L50 EJ Gt Spurbone CHA 115.txt
Total trials: 1157
p charm success (per tick): 0.9742
Wilson Score lower bound: 0.9647
Wilson Score upper bound: 0.9836

Input data 2:
File: orc_pawn_CHA115.txt
Total trials: 2053
p charm success (per tick): 0.9840
Wilson Score lower bound: 0.9784
Wilson Score upper bound: 0.9895

probability difference: 0.0098
Newcombe-Wilson difference interval: -0.0110, 0.0109
Not significant at 95% level

There is one possibility that is not tested here. It may be that level difference only has a manifest effect when it is very small, possibly via an exponential factor that has saturated by the time the level difference is 10 or more. Unfortunately I'm unable to test this on my own. If you know of a healer or another high-level enchanter who's willing to spend hours doing nothing other than helping to get a L50 mob under control again on breaks, then drop me a line. But be warned: doing this for several hours in a row is not the most exciting experience.
__________________
_____
Green: Feressa
#4
Finally, what about magic resist? Does that have an effect on charm duration?
I haven't formally tested this yet. This is mainly because, as mentioned earlier, I frequently notice charms that last so long that tash wears off, which obviously will have a confounding effect on the succeeding trials, and I'm not sure how to handle these instances.

But frankly, the question of magic resist is largely moot anyway, as we know for sure that MR is important in handling charm breaks. When charm breaks you go through the stun-L4mez-reTash-reCharm cycle (strung together with the clicky exploit), and landing the stun and mez are essential components in making that happen smoothly. Successful charming means successful handling of charm breaks, and keeping the mob's MR low is a critical factor in that. So those Rusty Spiked Shoulderpads are indeed a useful addition, just not in terms of increasing charm duration.
__________________
_____
Green: Feressa
#5
Quote:
#6
Charms should only last 8 minutes (max), mountains of proof:
https://project1999.com/forums/showp...3&postcount=81
I hope OP's science (that is above my pay grade) and this can put the debate to rest and crush a bunch of people's joy
#7
Very nice work. It'll take me a couple readings to fully grasp, but on a preliminary basis your conclusions look well-founded. A couple initial comments:
You suggest that charisma does not impact charm duration, and therefore is overvalued as a stat. I don't play an enchanter, but my understanding has always been that high charisma is mainly valued for the impact on the lull line of spells, not charms. Quote:
The test comparing durations for the orc pawn versus the EJ skeleton is interesting. I would expect that a 46-level difference would completely saturate any level-dependent effect, but a 10-12 level difference I would have expected to be small enough to show an impact if there was one. I'm happy to volunteer two hours supporting further testing with either a 60 druid or 55 cleric.
#8
the AI is getting pretty good
also thats a lot of words to say (your level vs mob level) > MR > 255 cha...something that has been known for decades now and to suggest enc start STR is absurd...even if its a trivial difference in charm durations ill still take it to decrease lull crit fails
Last edited by kjs86z2; 04-15-2025 at 08:22 AM..
#9
From loraen's ench guide:
Quote:
Anecdotally I think MR makes a huge difference too, because sometimes I go to Lguk or other lesser zones and I'll just pop a charm on a mob without tashing to trash a camp real quick, and those charms seem to be very short more often than not, even though the mobs cap at like level 40.

And yes, cha is vital for lull crit fails, and if you do any sort of dungeon crawls, lulls will often be your most-cast spell. In some tricky places I've used lull crit fails to pull, and stripping down all my +cha only pieces (-50ish) and taking off cha buffs made the process much quicker.
Last edited by Goregasmic; 04-15-2025 at 12:08 PM..
#10
Quote:
Quote:
The folklore that's built up over charm duration is founded on poor data and faulty analysis. Quote:
I'm only looking at charm duration here. Resists on lull spells are obviously a completely different topic, but one that would be worth investigating properly at a later date. As an aside, ironically I did notice a resist on the initial cast of Cajoling Whispers last night, so it can happen. It does seem very rare though, and I have to wonder if this shares the same 98% success rate that's applied to charm break rolls.
__________________
_____
Green: Feressa