Project 1999

Project 1999 > Class Discussions > Casters

  #1  
04-14-2025, 02:38 PM
charleski
Large Rat

Join Date: Feb 2025
Posts: 9
A Statistical Analysis of Charm Duration

I started playing P1999 about six weeks ago, having left EQ in early 2005 after Arch Overseers largely collapsed in the great OoW burnout, and I expected to find a community that had min-maxed the system to the hilt. Instead, a lot of the folklore and baseless dogma that characterised player knowledge back then clearly still persists, and charm duration is a prime example. In this post I will set out a proper statistical analysis of charm duration and provide some tools that can be used to test it. I'm not formally a statistician, though I use stats in my work and am pretty sure the approach used here is correct. Still, if you spot any errors, please let me know.

The first step is to be absolutely clear about what we mean by 'charm duration'.
It's generally accepted that the game's global timeline is divided up into 'ticks', each six seconds long, and that any decision about charm breaking is made at the boundary point between ticks. So we have the situation depicted in the diagram below:
[Diagram (CharmDuration.jpg): the tick timeline, with the charm landing partway through a tick and a charm-break test at each following tick boundary]


The charm spell will land on a mob at some random time, which is unlikely to be exactly at a tick boundary. Provided it is not resisted, there will be an initial period of less than 6 seconds before a charm-break test is applied. After that, another test is performed every six seconds. As long as these tests keep being passed the charm continues, until finally a test fails and the charm breaks. In my experience the initial cast of charm is only ever resisted if the mob is of a higher level than the spell is capable of charming.

As shown by the diagram, a charm that lasts six ticks (plus the initial period) reflects a sequence of six consecutive passes followed by one failure. In the rest of this discussion we will ignore the initial period, which is irrelevant to the analysis. In general, a charm that lasts n ticks is the result of n consecutive charm-break successes followed by one failure.
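As a minimal Python sketch of what ignoring the initial period means in practice: assuming 6-second ticks and an initial period shorter than one tick, a charm measured at s seconds passed floor(s / 6) tick tests. With whole-second log timestamps the count can be off by one at the boundaries. The function name is illustrative and is not taken from CharmParsing.py.

```python
TICK_SECONDS = 6  # one game tick, as assumed in the post


def seconds_to_ticks(duration_s: float) -> int:
    """Number of charm-break tests passed by a charm lasting duration_s seconds.

    Assumes duration_s = initial + 6 * n with the initial period shorter than
    one tick, so n is the number of whole 6-second blocks in the duration.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    return int(duration_s // TICK_SECONDS)


print(seconds_to_ticks(818))  # 136: the 818-second charm mentioned in this post
```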

Important assumption: I assume that each test is performed independently. That is, the probability of success on each tick is determined without reference to the number of preceding successes. This is a reasonable assumption and simplifies things considerably. There may be a limiting factor in terms of maximum charm duration, but this is unknown. The longest charm I've seen so far lasted 818 seconds, which is over 13 and a half minutes. I expect most enchanters are used to seeing charms last over ten minutes, long enough for tash to wear off even though you reapply tash on every break.

So to start off, we ask, "What is the probability that a charm will last n ticks?" This is actually a very simple question, and the answer depends only on the underlying probability, p, of success on each individual tick. If we toss a coin, the chance of it coming up heads is 50% (p = 0.5). What is the probability of it coming up heads both times if we toss it twice? The answer is to multiply the probabilities: 0.5*0.5 = (0.5)^2 = 0.25, so the chance of getting two heads is 1 in 4. Likewise, the chance of getting heads three times in a row is (0.5)^3, or 1 in 8, and so on. So,
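The probability that a charm will last at least n ticks is p^n. (The probability that it lasts exactly n ticks, i.e. n passes followed by one failure, is p^n(1 - p).)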

This is an exponential curve (strictly, a geometric distribution, the discrete counterpart of the exponential), and it matches what we see if we just dump a whole load of individual charms together and graph the distribution. I wrote a Python program (attached below as CharmParsing.py) that parses a log file, extracts the duration of each charm and performs some stats that will be explained later. Usage is simply
py -m CharmParsing <path to log file>
It will create a .csv file in the same location as the log that contains the stats and a list of individual charm durations in seconds and ticks. I took a log covering a bit over 2 weeks and fed it into the program, then graphed the duration of the 427 charms. This is uncontrolled data, so merely indicative, but it does show an exponential distribution.
[Graph (Hist-charmDurations.jpg): histogram of the durations of the 427 charms]
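A quick numerical check of the probability formula, as a minimal Python sketch (function names are illustrative): surviving at least n ticks needs n passes in a row, p^n, while lasting exactly n ticks also needs the final failure, p^n(1 - p). The exact-duration probabilities form a geometric distribution and sum to 1 over all n.

```python
def p_at_least(p: float, n: int) -> float:
    """Probability of n passes in a row, i.e. surviving at least n ticks."""
    return p ** n


def p_exactly(p: float, n: int) -> float:
    """Probability of n passes followed by one failure, i.e. lasting exactly n ticks."""
    return p ** n * (1 - p)


# Coin-toss check from the text
print(p_at_least(0.5, 2))  # 0.25
print(p_at_least(0.5, 3))  # 0.125

# Exact-duration probabilities sum to 1 (truncated far out in the tail)
print(round(sum(p_exactly(0.98, n) for n in range(5000)), 6))  # 1.0
```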

At this point it's important to note that, given the exponential distribution, talk of 'average' charm duration is misleading. The data are heavily skewed rather than normally distributed, so the average of a modest sample is dragged around by a handful of very long charms, and the familiar tests that assume normally-distributed data aren't the right tool. If we're going to analyse charm duration we need a different statistic: an estimate of the underlying probability (p) of success on each tick.

This can be done by recognising that each tick during the charm duration represents a separate independent trial. A charm that lasts six ticks represents six successes and one failure. As long as we have enough data and they're properly controlled (i.e. on the same mob, with no changes that might alter the underlying probability), we can group all these trials together and use them to calculate the Wilson Score (see references below), a binomial confidence interval that gives upper and lower bounds for the underlying probability at a specified confidence level:
$$p_0 = \frac{\hat{p} + \dfrac{c^2}{2n} \pm c\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{c^2}{4n^2}}}{1 + \dfrac{c^2}{n}}$$

where p0 = the upper and lower bounds of the Wilson Score; n = total number of trials; p̂ (p-hat, p with a circumflex) = number of successes / total number of trials; and c = the critical value corresponding to the level of confidence required (1.96 for 95% confidence, 2.576 for 99%).
This is used in CharmParsing.py to report a central probability along with its upper and lower bounds.
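The pooling and the Wilson Score can be sketched in a few lines of Python. This is an illustrative sketch, not the code in CharmParsing.py: the function names are hypothetical, it takes each charm's duration in whole ticks, and it computes the plain interval without continuity correction. The critical value c comes from the standard normal distribution via `statistics.NormalDist`, which gives about 1.96 at 95% and 2.576 at 99%.

```python
from math import sqrt
from statistics import NormalDist


def pool_trials(tick_counts):
    """Pool charms into binomial trials.

    A charm that survived n ticks contributes n successes and one failure,
    i.e. n + 1 trials. Returns (successes, trials).
    """
    successes = sum(tick_counts)
    return successes, successes + len(tick_counts)


def critical_value(confidence_pct: float) -> float:
    """Two-sided critical value c for a confidence level given in percent."""
    return NormalDist().inv_cdf(1 - (1 - confidence_pct / 100) / 2)


def wilson_interval(successes: int, trials: int, confidence_pct: float = 95):
    """Wilson score interval (no continuity correction) for the per-tick probability."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    c = critical_value(confidence_pct)
    p_hat = successes / trials
    centre = p_hat + c * c / (2 * trials)
    spread = c * sqrt(p_hat * (1 - p_hat) / trials + c * c / (4 * trials * trials))
    denom = 1 + c * c / trials
    return (centre - spread) / denom, (centre + spread) / denom


# Hypothetical example: three charms that lasted 6, 40 and 20 ticks
s, n = pool_trials([6, 40, 20])
print(s, n)  # 66 69
print(wilson_interval(s, n))
```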

We're now getting somewhere. We can generate estimates of the underlying probability of charm success on each tick. But what we really want is a way to compare these values between different conditions (e.g. changes in level difference, CHA and magic resist). We could just check whether the ranges given by the Wilson Score overlap, but that check is conservative and in some cases might produce a Type II error (false negative). A better method is to compare the difference between central probabilities (p2 - p1) with the Newcombe-Wilson difference interval (also covered in the references below).
The test is given by:
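$$w_d^- = -\sqrt{(p_1 - w_{1,-})^2 + (w_{2,+} - p_2)^2}, \qquad w_d^+ = \sqrt{(w_{1,+} - p_1)^2 + (p_2 - w_{2,-})^2}$$
and the difference p2 - p1 is significant if it falls outside the interval (w_d^-, w_d^+).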
where p1 = the central probability for condition 1, w(1,-) = the lower bound of the Wilson Score for condition 1, w(1,+) = the upper bound of the Wilson Score for condition 1, and likewise for condition 2.

This is performed by CharmDiff.py, also attached. To use it, cut your log file into separate segments, each corresponding to one controlled value of the condition being tested. Usage is
py -m CharmDiff <95|99> <path to first log file> <path to second log file>
The first argument must be 95 or 99, the confidence level you wish to use. Results are printed to the terminal, but they can be redirected to a file using the > operator.
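The difference test can also be sketched on its own, taking each condition's central probability and Wilson bounds as inputs. This is a hypothetical helper, not CharmDiff.py itself; it implements the zero-centred Newcombe-Wilson interval for d = p2 - p1 as described in the post, and flags d as significant if it falls outside that interval. Here it is fed the rounded figures reported for the CHA 115 vs CHA 226 test in this thread:

```python
from math import sqrt


def newcombe_wilson(p1, w1_lo, w1_hi, p2, w2_lo, w2_hi):
    """Zero-centred Newcombe-Wilson interval for the difference d = p2 - p1.

    Returns (d, wd_lo, wd_hi, significant), where significant is True when
    d falls outside [wd_lo, wd_hi].
    """
    d = p2 - p1
    wd_lo = -sqrt((p1 - w1_lo) ** 2 + (w2_hi - p2) ** 2)
    wd_hi = sqrt((w1_hi - p1) ** 2 + (p2 - w2_lo) ** 2)
    return d, wd_lo, wd_hi, not (wd_lo <= d <= wd_hi)


d, lo, hi, significant = newcombe_wilson(0.9742, 0.9647, 0.9836, 0.9847, 0.9776, 0.9915)
print(f"d = {d:.4f}, interval = ({lo:.4f}, {hi:.4f}), significant = {significant}")
```

Because the inputs are rounded to four places, this gives roughly (-0.0117, 0.0118) rather than the program's ±0.0117, but the conclusion is the same: the 0.0105 difference lies inside the interval.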

We now have the tools needed to investigate charm duration in a meaningful manner, and the following posts will show some results I've gathered. I strongly encourage anyone interested to try this for themselves and post the results they get. Any scientific data are only meaningful to the extent that they can be replicated. If you're looking to compare conditions, just make sure that the logs used are properly controlled. The code only works for L12 Charm, Beguile and Cajoling Whispers, because I'm level 51 and don't have Allure. If you want to include Allure or other charm spells, it's easy to add them to the regexps and cast duration constants at the start of the files.

It's important to note that anecdotal evidence is completely worthless here. We've all had pets that seem unusually unruly ('OMG I just recharmed the bloody thing and it's broken again!'), but that means nothing. The First Rule of Statistics is: Shit Happens. My personal gut feeling is that charm breaks are more likely to happen when I've taken my hand off the keyboard to pick up a drink, but I haven't worked out a way to test that reliably yet... If you want to generate useful data you need to keep the conditions controlled and record a sufficiently large number of trials: I would recommend at least an hour's worth, i.e. 600 or more trials (one per 6-second tick).
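Why an hour or so? A quick simulation gives a feel for it. This sketch assumes a true per-tick probability of 0.98 (the value the post arrives at) and independent ticks, recharms on every break until a target number of tick tests has been logged, and reports the middle 95% of the estimated p across many simulated sessions. The function names, session count and seed are arbitrary choices for illustration.

```python
import random


def simulate_session(p: float, min_trials: int, rng: random.Random):
    """Recharm on every break until at least min_trials tick tests are logged.

    Each tick test passes independently with probability p. Returns
    (successes, trials) for the whole session.
    """
    successes = trials = 0
    while trials < min_trials:
        while rng.random() < p:  # one charm: passes until the first failure
            successes += 1
            trials += 1
        trials += 1  # the failed test that breaks the charm
    return successes, trials


def spread_of_estimates(p=0.98, min_trials=600, sessions=1000, seed=1):
    """Middle 95% of p_hat = successes / trials across simulated sessions."""
    rng = random.Random(seed)
    estimates = sorted(
        s / t for s, t in (simulate_session(p, min_trials, rng) for _ in range(sessions))
    )
    return estimates[int(0.025 * sessions)], estimates[int(0.975 * sessions) - 1]


for target in (600, 2400):
    low, high = spread_of_estimates(min_trials=target)
    print(target, round(low, 4), round(high, 4))
```

With 600 trials the estimates should typically spread over roughly ±0.01 around 0.98, and quadrupling the data should roughly halve that spread, which suggests that differences of about 0.01 between conditions need thousands of trials to resolve.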

Q: You use big wurds and sumz! Y u do sumz? Dey make brane hurt!
A: You've been using Illusion:Troll too much. Here's 10pp, go buy yourself a drink.

The following posts will concern different conditions that are reputed to affect charm duration. If you can't wait, I'm coming to the conclusion that charm success per tick has a fixed probability close to 0.98.
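If the per-tick probability really is close to 0.98, the model above says what to expect from individual charms. A minimal sketch, assuming independent 6-second ticks and ignoring the initial partial tick (function names are illustrative):

```python
from math import ceil, log

TICK_SECONDS = 6


def median_ticks(p: float) -> int:
    """Median number of passed tick tests.

    P(at most m passes) = 1 - p**(m + 1), so the median is the smallest m
    with p**(m + 1) <= 0.5, i.e. ceil(ln 0.5 / ln p) - 1.
    """
    return ceil(log(0.5) / log(p)) - 1


def share_lasting(p: float, minutes: float) -> float:
    """Fraction of charms surviving at least `minutes`: p ** (number of ticks)."""
    return p ** int(minutes * 60 // TICK_SECONDS)


p = 0.98
m = median_ticks(p)
print(m, m * TICK_SECONDS)  # 34 204: about half of all charms break within ~3.5 minutes
for minutes in (5, 10):
    print(minutes, round(share_lasting(p, minutes), 3))  # 5 -> 0.364, 10 -> 0.133
```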

References:
The Wilson Confidence Interval for a Proportion
Binomial Confidence Intervals and Contingency Tests
Interval estimation for the difference between independent proportions
Plotting the Newcombe-Wilson distribution
Sean Wallis, Statistics in Corpus Linguistics Research, 2021
Attached Images
CharmDuration.jpg
Hist-charmDurations.jpg
Wilson Score.png
New-Wil Test.jpg
Attached Files
CharmPrograms.zip
__________________
_____
Green: Feressa
  #2  
04-14-2025, 02:39 PM
charleski
Large Rat

Join Date: Feb 2025
Posts: 9

The topic that's generated the most confusion seems to be the importance of the CHA stat.
I'm going to start off here by saying that it is, as a general principle, impossible to prove a negative. I can't prove to you that CHA has no effect on charm duration. The way statistics works is to start with a null hypothesis (that there is no difference between two conditions) and test whether the evidence shows that you can reject that hypothesis with a specified level of confidence. I.e. you are testing whether or not you can prove a positive effect.

TLDR, the result is:
Unable to reject the null hypothesis that CHA has no effect at the 95% level.

To test this I grabbed a Greater Spurbone in Emerald Jungle and parked it on the south wall. I was level 50, the mob was blue-con, resisted Beguile twice and hit for an observed max of 100, so probably level 38-39. I took off all gear with any CHA on it to take my CHA stat down to its base of 115 (because I followed the folklore and made my character according to the guide …) and proceeded to recharm on each break for around 2 hours. I then put on all the CHA gear I could find, applied the CHA buff to take my CHA up to 226 and repeated the process for another 2 hours or so. This resulted in around 1200 individual tick trials for each condition, and the results are given below:
Input data 1:
File: L50 EJ Gt Spurbone CHA 115.txt
Total trials: 1157
p charm success (per tick): 0.9742
Wilson Score lower bound: 0.9647
Wilson Score upper bound: 0.9836
Input data 2:
File: L50 EJ Gt Spurbone CHA 226.txt
Total trials: 1294
p charm success (per tick): 0.9847
Wilson Score lower bound: 0.9776
Wilson Score upper bound: 0.9915

probability difference: 0.0104

Newcombe-Wilson difference interval: -0.0117, 0.0117
Not significant at 95% level

To illustrate this further, here's a diagram showing the extent to which the two Wilson Scores overlap:
[Diagram (CHA_test.jpg): overlap of the Wilson Score intervals for the CHA 115 and CHA 226 conditions]

Now I know that some will be tempted to carp that the increased CHA did show a minor increase in the central success probability. Unfortunately this fails to appreciate the nature of the Wilson Score. The true probability for each condition could plausibly lie anywhere between its upper and lower bounds, and these ranges overlap substantially. Furthermore, 95% confidence (roughly 2 sigma) is pretty weak sauce as far as confidence goes, and represents the lowest level at which we can start talking about any real difference.

One factor that might come into play is an adjustment called continuity correction. This compensates for approximating the discrete binomial distribution with a continuous normal curve, and it makes more difference when the sample is small or the probabilities are close to 0 or 1 (as seen here). The programs use continuity correction by default, which produces slightly wider Wilson Score intervals. I turned continuity correction off (you just need to change a boolean constant in the code), but the difference was still not significant.
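For comparison, here is a sketch of the continuity-corrected Wilson interval, using a standard closed form for it. It is illustrative rather than a copy of the attached programs: the function name and the default c = 1.96 (95%) are assumptions. For 50 successes in 100 trials it gives roughly 0.399 to 0.601, slightly wider than the uncorrected 0.404 to 0.596.

```python
from math import sqrt


def wilson_cc(successes: int, trials: int, c: float = 1.96):
    """Continuity-corrected Wilson score interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    n = trials
    p = successes / n
    q = 1 - p
    denom = 2 * (n + c * c)
    if successes == 0:
        lower = 0.0
    else:
        lower = (2 * n * p + c * c - 1
                 - c * sqrt(c * c - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    if successes == n:
        upper = 1.0
    else:
        upper = (2 * n * p + c * c + 1
                 + c * sqrt(c * c + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    # Clamp to [0, 1] as a safeguard
    return max(0.0, lower), min(1.0, upper)


print(wilson_cc(50, 100))
```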

My personal feeling is that the result for the 115 CHA condition just happens to fall in the lower part of the range; all my preliminary results were closer to 0.98. But it doesn't really matter: the statistics involved are capable of taking that into account, and do so here.

As I said above, I encourage you to test this for yourselves. Getting the data is rather boring, but the more data the better.

How does this affect standard guidelines? Newbie enchanters should put their points into STR, so they can carry more fine steel back to town to sell.
Attached Images
CHA_test.jpg
  #3  
04-14-2025, 02:40 PM
charleski
Large Rat

Join Date: Feb 2025
Posts: 9

What about level? Surely the difference in level affects charm duration?

Result:
Unable to reject the null hypothesis that level difference has no effect at the 95% level.

A few days before performing the experiment with CHA I went to GFay and faced off against the mighty level 2 orc pawn (confirmed white-con to a level 2 player). This test was also performed with a CHA of 115. I was 48 at the time, so the difference was 46 levels, as opposed to 11-12 levels against the greater spurbone in EJ.

Input data 1:
File: L50 EJ Gt Spurbone CHA 115.txt
Total trials: 1157
p charm success (per tick): 0.9742
Wilson Score lower bound: 0.9647
Wilson Score upper bound: 0.9836
Input data 2:
File: orc_pawn_CHA115.txt
Total trials: 2053
p charm success (per tick): 0.9840
Wilson Score lower bound: 0.9784
Wilson Score upper bound: 0.9895

probability difference: 0.0098

Newcombe-Wilson difference interval: -0.0110, 0.0109
Not significant at 95% level

There is one possibility that is not tested here. It may be that level difference only has a noticeable effect when it is very small, possibly via an exponential factor that has saturated by the time the level difference is 10 or more. Unfortunately I'm unable to test this on my own. If you know of a healer or another high-level enchanter who's willing to spend hours doing nothing but helping get a L50 mob back under control on breaks, then drop me a line. But be warned: doing this for several hours in a row is not the most exciting experience.
  #4  
04-14-2025, 02:40 PM
charleski
Large Rat

Join Date: Feb 2025
Posts: 9

Finally, what about magic resist? Does that have an effect on charm duration?

I haven't formally tested this yet. This is largely because, as mentioned earlier, I frequently notice charms that last so long that tash wears off, which obviously will have a confounding effect on the succeeding trials, and I'm not sure how to handle these instances.

But frankly, the question of magic resist is largely moot anyway as we know for sure that MR is important in handling charm breaks. When charm breaks you go through the stun-L4mez-reTash-reCharm cycle (strung together with the clicky exploit) and landing the stun and mez are essential components in making that happen smoothly. Successful charming means successful handling of charm breaks, and keeping the mob's MR low is a critical factor in that.

So those Rusty Spiked Shoulderpads are indeed a useful addition, just not in terms of increasing charm duration.
  #5  
04-14-2025, 03:00 PM
Jimjam
Planar Protector

Join Date: Jul 2013
Posts: 12,235

Quote:
Originally Posted by charleski
I haven't formally tested this yet. This is largely because, as mentioned earlier, I frequently notice charms that last so long that tash wears off, which obviously will have a confounding effect on the succeeding trials, and I'm not sure how to handle these instances.
If you duel a conspirator they can refresh tash on your pet before it fades, removing that confounding effect.
  #6  
04-14-2025, 03:26 PM
shovelquest
Planar Protector

Join Date: Oct 2019
Posts: 2,998

Charms should only last 8 minutes (max); mountains of proof:

https://project1999.com/forums/showp...3&postcount=81

I hope OP's science (that is above my pay grade) and this can put a rest to the debate and crush a bunch of people's joy
  #7  
04-14-2025, 03:45 PM
bcbrown
Fire Giant

Join Date: Jul 2022
Posts: 523

Very nice work. It'll take me a couple readings to fully grasp, but on a preliminary basis your conclusions look well-founded. A couple initial comments:

You suggest that charisma does not impact charm duration, and therefore is overvalued as a stat. I don't play an enchanter, but my understanding has always been that high charisma is mainly valued for the impact on the lull line of spells, not charms.

Quote:
Originally Posted by charleski
I haven't formally tested this yet. This is largely because, as mentioned earlier, I frequently notice charms that last so long that tash wears off, which obviously will have a confounding effect on the succeeding trials, and I'm not sure how to handle these instances.
Since every tick is an independent trial, when parsing the log couldn't you track the log entry for tash wearing off and group subsequent trials separately from tash-active trials?
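The grouping suggested above amounts to a two-way tally. As a sketch, assuming each tick test has already been tagged with whether tash was still on the mob (working that out from the tash wear-off message is left out, since it depends on the log format), the split is straightforward; the function name and input shape are hypothetical:

```python
from collections import Counter


def split_by_tash(tick_results):
    """Tally tick tests separately for tash-active and tash-faded ticks.

    tick_results: iterable of (passed, tash_active) pairs, one per
    charm-break test. Returns {True: (successes, trials), False: (successes, trials)},
    where the key says whether tash was active.
    """
    successes, trials = Counter(), Counter()
    for passed, tash_active in tick_results:
        trials[tash_active] += 1
        successes[tash_active] += int(passed)
    return {state: (successes[state], trials[state]) for state in (True, False)}


# One hypothetical charm: tash fades after the third tick, and the sixth test fails
ticks = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)]
print(split_by_tash(ticks))  # {True: (3, 3), False: (2, 3)}
```

Each group's counts can then be fed to the same Wilson Score and Newcombe-Wilson difference calculations.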

The test comparing durations for the orc pawn versus the EJ skeleton is interesting. I would expect that a 46-level difference would completely saturate any level-dependent effect, but a 10-12 level difference I would have expected to be small enough to show an impact if there was one. I'm happy to volunteer two hours supporting further testing with either a 60 druid or 55 cleric.
  #8  
04-15-2025, 08:19 AM
kjs86z2
Aviak

Join Date: Jun 2019
Posts: 92

the AI is getting pretty good

also that's a lot of words to say (your level vs mob level) > MR > 255 cha...something that has been known for decades now

and to suggest enc start STR is absurd...even if it's a trivial difference in charm durations I'll still take it to decrease lull crit fails
Last edited by kjs86z2; 04-15-2025 at 08:22 AM..
  #9  
04-15-2025, 12:01 PM
Goregasmic
Kobold

Join Date: Jan 2024
Posts: 198

From loraen's ench guide:

Quote:
Of course, we obviously want charm to last as long as possible, and the main determinants of charm duration are Mob Level, Magic Resistance, and Charisma, in that order. Therefore your first choice is what level of pet to charm. At L60, I would guess that a L46 froglok dar knight will stay charmed about five times longer than a L53 froglok ilis knight. If you are getting too many charm breaks, go with lower level mobs. For example, there are two ways to solo the necrosis scarab camp: the fast way is to charm the krup roamer, but there is no shame in bringing one of the lower level frogs over from across the moat. For reasonable charm durations my recommendation is to charm something 0.75-0.85 times your level; at L60 this works out to L45-50; 51 is possible but risky and 52-53 is almost certainly going to be a short trip without malo.
I think this is based on findings in this thread from 2013 where lvl 53 mobs charm time averaged around 90 seconds. Anyone who charmed anything will know that this is pretty short. Spent 56-58 in frenzy and the few times I tried keeping a pet around for as long as possible I've had some hit the max duration of allure which was about 18m30s (I set a gina countdown timer for 18mins when I start casting charms). So mob level absolutely has an impact post 50 it seems. This thread also tries to demonstrate the usefulness of charisma at higher levels but it is hard to get a good sample solo when ilis frogs double for 400. Also, it is important to keep in mind cha has diminishing returns after 200.

Anecdotally I think MR makes a huge difference too because sometimes I go to Lguk or other lesser zones and I'll just pop a charm on a mob without tashing to trash a camp real quick and those charms seem to be very short more often than not even though the mobs cap at like level 40.

And yes, cha is vital for lull crit fails and if you do any sort of dungeon crawls, lulls will often be your most casted spell. In some tricky places I've used lull crit fails to pull and stripping down all my +cha only pieces (-50ish) and taking off cha buffs made the process much quicker.
Last edited by Goregasmic; 04-15-2025 at 12:08 PM..
  #10  
04-15-2025, 12:45 PM
charleski
Large Rat

Join Date: Feb 2025
Posts: 9

Quote:
Originally Posted by shovelquest
Charms should only last 8 minuets (max), mountains of proof:

https://project1999.com/forums/showp...3&postcount=81

I hope OP's science (that is above my pay grade) and this can put a rest to the debate and crush a bunch of people's joy
The fact is, however, that you can expect to see a charm lasting over ten minutes at least once in the course of a night. I don't know if there's a hard limit in the code, but obviously very long charms are quite rare given the exponential distribution.
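To put a rough number on "at least once in the course of a night", here is a sketch that assumes the per-tick probability of about 0.98 suggested in the first post, independent 6-second ticks, and (purely as a hypothetical figure) 20 charms in an evening:

```python
TICK_SECONDS = 6


def p_long_charm(p: float = 0.98, minutes: float = 10) -> float:
    """Chance that one charm survives at least `minutes` (initial partial tick ignored)."""
    return p ** int(minutes * 60 // TICK_SECONDS)


def p_at_least_one(per_charm: float, charms: int) -> float:
    """Chance that at least one of `charms` independent charms is that long."""
    return 1 - (1 - per_charm) ** charms


single = p_long_charm()
print(round(single, 3))                      # 0.133 per charm
print(round(p_at_least_one(single, 20), 2))  # 0.94 over 20 charms
```

So under this model roughly one charm in seven or eight should pass the ten-minute mark, and seeing at least one over a couple of dozen charms is very likely. None of this rules out a hard cap; it only shows that long charms are expected without one.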

Quote:
Originally Posted by kjs86z2
the AI is getting pretty good

also that's a lot of words to say (your level vs mob level) > MR > 255 cha...something that has been known for decades now

and to suggest enc start STR is absurd...even if it's a trivial difference in charm durations I'll still take it to decrease lull crit fails
I'm saying exactly the opposite, didn't you notice?
The folklore that's built up over charm duration is founded on poor data and faulty analysis.

Quote:
Originally Posted by Goregasmic

I think this is based on findings in this thread from 2013 where lvl 53 mobs charm time averaged around 90 seconds. Anyone who charmed anything will know that this is pretty short. Spent 56-58 in frenzy and the few times I tried keeping a pet around for as long as possible I've had some hit the max duration of allure which was about 18m30s (I set a gina countdown timer for 18mins when I start casting charms). So mob level absolutely has an impact post 50 it seems. This thread also tries to demonstrate the usefulness of charisma at higher levels but it is hard to get a good sample solo when ilis frogs double for 400. Also, it is important to keep in mind cha has diminishing returns after 200.

Anecdotally I think MR makes a huge difference too because sometimes I go to Lguk or other lesser zones and I'll just pop a charm on a mob without tashing to trash a camp real quick and those charms seem to be very short more often than not even though the mobs cap at like level 40.

And yes, cha is vital for lull crit fails and if you do any sort of dungeon crawls, lulls will often be your most casted spell. In some tricky places I've used lull crit fails to pull and stripping down all my +cha only pieces (-50ish) and taking off cha buffs made the process much quicker.
Maybe I wasn't clear enough in my first post. A raw average charm duration tells you very little. Charm durations follow an exponential distribution, not a normal one, so the average of a modest sample is dragged around by a handful of very long charms, and the usual tests that assume normally-distributed data aren't the right tool. For exponential data like this you can use the median instead, but that has limited utility for comparing conditions. You really need to perform a deeper analysis of the data and extract an estimate of the core underlying probability.

I'm only looking at charm duration here. Resists on lull spells are obviously a completely different topic, but one that would be worth investigating properly at a later date.

As an aside, ironically I did notice a resist on the initial cast of Cajoling Whispers last night, so it can happen. It does seem very rare though, and I have to wonder if this shares the same 98% success rate that's applied to charm break rolls.
All times are GMT -4.

Everquest is a registered trademark of Daybreak Game Company LLC.
Project 1999 is not associated or affiliated in any way with Daybreak Game Company LLC.