Statistics for the EQ parsers

 

1. Introduction.

For years now, parsing has been the only means for players to get some grip on the actual mechanics of EQ. The effects of most skills and stats are difficult to see in-game because they are clouded by the RNG, and SOE isn’t forthcoming about them either. Even so, conclusions drawn from these parses must be considered carefully, as is the case with all numbers subject to statistical variation. All too often conclusions are drawn prematurely because the statistical uncertainty of the numbers involved is not considered. The word ‘proven’ is used too often and too hastily.

Parsed numbers without any reference to the statistical variation are particularly dangerous. If someone claims weapon A is better than weapon B because he parsed A at 55 dps and B at 52 dps, you can’t be sure those numbers mean anything until you know how and for how long he parsed them. Popular parse programs like Yalp or EQCompanion do not consider the statistics, and sometimes people just jot down the first dps number these programs give them without thinking twice about what that number really means.

The statistical uncertainty of a parsed number is easy to underestimate. I realize that not many people will actually be able to calculate the uncertainty of their parses, but all parses should at least include an indication of the sample size or the parse time so that others can check the conclusions drawn from them.

For those who want to put a little accuracy in their parses (pun intended), I wrote this document showing how to estimate the uncertainty of commonly parsed skills and stats. This writeup is not intended for the layman; you’ll need some grasp of the mechanics of the game. If the mathematical derivations are too heavy for you, you can skip them and just use the final results. I start off with a short introduction to binomial statistics in Chapter 2. Then I derive equations for the statistical error of commonly parsed quantities such as hitratios, dualwield and double attack rates, proc rates, avoidance and mitigation, and dps in Chapters 3, 4 and 5. In Chapter 6 I deal with some additional subtleties.

 

2. Binomial statistics.

Strictly speaking, almost all random numbers in EQ follow binomial statistics (the one notable exception being the average damage per hit; more about that later). In binomial statistics, if the chance of a single successful trial (e.g. a single successful dualwield check or hit/miss check) is p, then the chance P(n) of getting n successes in N trials is:

$$ P(n) = \binom{N}{n}\, p^{n} (1-p)^{N-n}, \qquad (1) $$


where

$$ \binom{N}{n} = \frac{N!}{n!\,(N-n)!}. \qquad (2) $$

Fortunately, in almost all parses the binomial distribution can be approximated by the normal distribution, i.e. the standard bell curve, with its maximum at n = Np and standard deviation σ = √(Np(1-p)). As long as p - 2σ/N > 0 and p + 2σ/N < 1, there’s little error in this approximation (and any such error is an error in the error of the quantity we’re trying to determine, which matters far less than obtaining a reasonable estimate of the error itself). This means that the apparent or parsed value of p, p* = n/N, is given by

$$ p = p^* \pm \delta p^* \qquad (3) $$


with

$$ \delta p^* = \frac{2\sigma^*}{N}, \qquad \sigma^* = \sqrt{N p^* (1-p^*)} \qquad (4) $$


the half-width of the 95% confidence interval, i.e. twice the standard error. Hence

$$ \delta p^* = 2\sqrt{\frac{p^*(1-p^*)}{N}} \qquad (5) $$
and

$$ \frac{\delta p^*}{p^*} = 2\sqrt{\frac{1-p^*}{N p^*}} = 2\sqrt{\frac{1-p^*}{n}}. \qquad (6) $$


It’s easy to forget that the error is always an estimate, because p has to be replaced by p* in the error term. Also, the superscript asterisks are often left out, and the same symbol is used for both the real and the observed value, which leads to some confusion. It’s an important distinction, however. For example, when someone says "my hitratio is 0.56", what he really means is "my observed hitratio is 0.56, but the real hitratio, with 95% confidence, is somewhere within 0.56 ± 0.03".
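A minimal Python sketch of Eqs.(4)-(6) follows. It assumes the normal approximation holds and also checks the p* ± 2σ/N validity condition. The function names `binomial_interval` and `normal_approx_ok` are hypothetical, and the hit counts are made up.

```python
import math

def binomial_interval(n, N):
    """Observed rate p* = n/N with its 2-sigma (~95%) error.

    p is replaced by the observed p* in the error term, so the error
    is itself an estimate.
    """
    p = n / N
    dp = 2 * math.sqrt(p * (1 - p) / N)      # absolute error, Eq.(5)
    rel = dp / p if p > 0 else math.inf      # relative error, Eq.(6)
    return p, dp, rel

def normal_approx_ok(n, N):
    """True if p* - 2*sigma/N > 0 and p* + 2*sigma/N < 1."""
    p, dp, _ = binomial_interval(n, N)      # 2*sigma/N equals dp here
    return p - dp > 0 and p + dp < 1

# Hypothetical parse: 616 hits in 1100 hit/miss checks.
p, dp, rel = binomial_interval(616, 1100)
print(f"p* = {p:.2f} +/- {dp:.2f}")
```

For these made-up numbers the sketch reproduces the "0.56 ± 0.03" example. A tiny sample with a rare event, such as 1 success in 20 trials, fails the validity check.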

 

3. Case specific examples of commonly parsed skills and stats.

As I said, most random numbers in EQ are binomial in nature, but can almost always be approximated with the normal distribution. It’s really just a matter of picking the right n and N and plugging them into Eq.(5) or (6).

3.1. Hitratio.

There are two ways to define hitratio. The usual way is as the fraction of all swings that are hits, NH/NS, with NH the number of hits and NS the number of swings. Alternatively, you can define hitratio as NH/(NH+NM), with NM the number of misses. The two differ only when you are facing the mob, because some swings will get riposted, dodged, parried or blocked. When you are parsing against a mob from the ‘rogue’ position, such as when you’re parsing against the Katta banker, they are the same. Also, NH/(NH+NM) is the same regardless of whether you’re tanking or in rogue position, because the d/r/p/b checks are made before the hit/miss check. Therefore the hit/miss check is always made against NH+NM swings, and N in Eq.(5) and (6) is equal to that whether you’re tanking or not. In other words, the apparent hitratio is hr* = NH/(NH+NM) and

$$ \frac{\delta hr^*}{hr^*} = 2\sqrt{\frac{1-hr^*}{hr^*\,(N_H+N_M)}}. \qquad (7) $$


Practically, there is little harm in replacing NH+NM with NS, it’ll result in an error in the error of only 5-10%. I have a reason for describing all errors in terms of NS, which will become apparent in Chapter 5.
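To see how little the NS shortcut changes the result, here is a small Python sketch of Eq.(7) using both denominators. The function name `hitratio_rel_error` and the tanking tally are hypothetical.

```python
import math

def hitratio_rel_error(n_hits, n_trials):
    """Relative 2-sigma error of the hit/miss check, Eq.(7)."""
    hr = n_hits / n_trials
    return 2 * math.sqrt((1 - hr) / (hr * n_trials))

# Hypothetical tanking parse: 5000 swings, of which 300 were
# dodged/parried/riposted/blocked, 1900 missed and 2800 hit.
n_swings, n_hits, n_misses = 5000, 2800, 1900
exact = hitratio_rel_error(n_hits, n_hits + n_misses)   # N = NH + NM
shortcut = hitratio_rel_error(n_hits, n_swings)         # N replaced by NS
print(f"{exact:.4f} vs {shortcut:.4f}")
```

Because hr*·N equals NH in both cases, the two estimates differ only through the (1 - hr*) factor. Here they are a few percent apart, with the NS version slightly on the safe side.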

3.2. Double attack rate.

When parsing the double attack rate, n is equal to the number of combat rounds with a double attack, NDA = NS-NC, and N is the total number of combat rounds NC. Both have to be extracted from the total number of swings NS first, which can only be done if the mainhand and offhand weapons are of a different type (or when you’re using a 2H weapon) and the hasted delay is more than 1.0s. We then have

$$ da^* = \frac{N_{DA}}{N_C} = \frac{N_S - N_C}{N_C} \qquad (8) $$
and

$$ \delta da^* = 2\sqrt{\frac{da^*(1-da^*)}{N_C}}. \qquad (9) $$


In this case it’s really just a matter of counting swings and combat rounds. Hit or miss doesn’t matter at all. And since the da check is always made as the combat round starts, things like combat casting or getting out of range don’t affect the parse either, meaning the da rate can be parsed just as easily during xp sessions as against static mobs.
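A Python sketch of Eqs.(8)-(9) follows. The `count_rounds` helper is hypothetical and rests on an assumption that should be checked against your own logs: log timestamps have one-second resolution, and with a hasted delay above 1.0s each combat round of one hand lands in its own log second.

```python
import math

def count_rounds(swing_stamps):
    """Hypothetical helper: number of combat rounds for one hand.

    Assumes one-second log timestamps and hasted delay > 1.0 s, so every
    round (including both swings of a double attack) shares one stamp.
    """
    return len(set(swing_stamps))

def double_attack_rate(n_swings, n_rounds):
    """Eqs.(8)-(9): da* = (NS - NC)/NC with its 2-sigma error."""
    da = (n_swings - n_rounds) / n_rounds
    dda = 2 * math.sqrt(da * (1 - da) / n_rounds)
    return da, dda

stamps = [0, 0, 1, 3, 3, 4]       # toy log: 6 swings in 4 rounds
print(double_attack_rate(len(stamps), count_rounds(stamps)))

# Hypothetical 2H parse: 5800 swings in 4000 combat rounds.
da, dda = double_attack_rate(5800, 4000)
```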

3.3. Dualwield rate.

When parsing dualwield rates, n is the observed number of combat rounds with your offhand weapon and N is the theoretical maximum number of combat rounds with your offhand weapon based on its delay. This is not a handy way to parse dualwield rates, though. It only works if you’re parsing one continuous fight against a mob that doesn’t fight back, like the Katta banker. Combat casting, for instance, resets the timers and throws off the analysis. A better way is to use two weapons with the same delay, so that the dualwield check is always made at the same time as the mainhand combat round. Combat casting, getting stunned or getting out of range will then have no effect on the analysis. The apparent dualwield rate can then be found from

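$$ dw^* = \frac{N_{S,OH}}{N_{S,MH}}, \qquad (10) $$
with NS,OH and NS,MH the numbers of offhand and mainhand swings.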


However, the number of swings in each hand also includes the double attacks of that hand. And while these cancel out in the division, the statistical variation they bring with them does not; in fact, the variations from the two hands add up. So this method gives a bit more variation than you’d expect. The relative errors add in quadrature, that is, by taking the square root of the sum of the squares of the relative errors. There are 3 sources of error (the da checks in each hand and the dw check itself). If you use typical numbers for the da rate and dw rate, you’ll find that they’re all of the same magnitude, so the error can be approximated by

$$ \frac{\delta dw^*}{dw^*} \approx 2\sqrt{\frac{3\,(1-dw^*)}{dw^*\,N_S}}. \qquad (11) $$


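The additional errors can be avoided by actually counting the number of combat rounds. In that case dw* is simply NC,OH/NC, the number of combat rounds with an offhand swing divided by the total number of combat rounds, and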

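$$ \frac{\delta dw^*}{dw^*} = 2\sqrt{\frac{1-dw^*}{dw^*\,N_C}} \approx 2\sqrt{\frac{1-dw^*}{dw^*\,N_S}}, \qquad (12) $$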


since NS = NC × (1+da*) × dw* and dw* × (1+da*) ≈ 1, so that NS ≈ NC.
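Comparing Eq.(11) with Eq.(12) makes the cost of the swing-ratio method concrete. The Python sketch below uses hypothetical function names and a made-up dw* of 0.8 over 10000 rounds (taking NS ≈ NC).

```python
import math

def dw_rel_error_swings(dw, n_swings):
    """Eq.(11): swing-ratio method, three similar error sources in quadrature."""
    return 2 * math.sqrt(3 * (1 - dw) / (dw * n_swings))

def dw_rel_error_rounds(dw, n_rounds):
    """Eq.(12): counting offhand combat rounds directly."""
    return 2 * math.sqrt((1 - dw) / (dw * n_rounds))

by_rounds = dw_rel_error_rounds(0.8, 10000)
by_swings = dw_rel_error_swings(0.8, 10000)
print(f"{by_rounds:.4f} vs {by_swings:.4f}")
```

The ratio is always √3. To match the precision of counting rounds, the swing-ratio method needs about three times as many swings.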

3.4. Proc rates

Proc rates (ppm) are coded in such a way that the number of procs per unit of time is constant. However, procs are still tied to combat rounds: if you turn off attack for 30s every minute, you will proc only half as much. The game engine does this internally by calculating a chance to proc per combat round, ppr (since you can only proc once per round per hand), and making that chance proportional to the hasted delay of the weapon. The total number of procs Np is given by

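$$ N_p = ppr \times N_C = ppr \times t \times \frac{N_C}{t} \;\Rightarrow\; ppm = \frac{60\,N_p}{t} = 60 \times ppr \times \frac{N_C}{t}, \qquad (13) $$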


where t is the fight time in seconds and NC/t is the number of combat rounds per second, which depends on your hasted delay only. Since ppr is proportional to hasted delay and NC/t is inversely proportional to it, they neatly cancel out and ppm becomes independent of hasted delay. The statistics apply to ppr only. Hence the proc rate is

$$ ppr^* = \frac{N_p}{N_C}, \qquad \delta ppr^* = 2\sqrt{\frac{ppr^*(1-ppr^*)}{N_C}}. \qquad (14) $$


And of course we can again substitute NS = NC(1+da) to get an expression in terms of NS.

The relative error in the proc rate is

$$ \frac{\delta ppr^*}{ppr^*} = 2\sqrt{\frac{1-ppr^*}{ppr^*\,N_C}} \qquad (15) $$

and can get very large, since ppr* is usually of the order of 0.05.
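A short Python sketch of Eqs.(13)-(15) shows just how large that error is. It assumes the fight time is known exactly, so ppm inherits the relative error of ppr. The function name and the parse numbers are hypothetical.

```python
import math

def proc_stats(n_procs, n_rounds, fight_time_s):
    """Per-round proc chance, procs per minute and relative 2-sigma error.

    ppm = 60 * Np / t, Eq.(13); the binomial statistics apply to
    ppr = Np / NC only, Eq.(14); relative error from Eq.(15).
    """
    ppr = n_procs / n_rounds
    ppm = 60 * n_procs / fight_time_s
    rel = 2 * math.sqrt((1 - ppr) / (ppr * n_rounds))
    return ppr, ppm, rel

# Hypothetical parse: 100 procs in 2000 combat rounds over 3000 s.
ppr, ppm, rel = proc_stats(100, 2000, 3000)
print(f"ppr = {ppr:.3f}, ppm = {ppm:.2f} +/- {rel:.0%}")
```

Even 100 procs leave a relative error near 20%. For small ppr the error is close to 2/√Np, so the number of procs you observed is what counts.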

3.5. Avoidance

Parsing avoidance is basically the complement of parsing hitratios, with the roles of the PC and NPC reversed (see section 3.1). That is, avoidance = 1 – hitratio = NM/(NM+NH). Because the two are complementary, the statistical error is exactly the same. I usually distinguish between avoidance as defined above and total avoidance = 1 - NH/NS, since it’s the latter that determines your damage reduction when you are tanking.

Total avoidance = miss rate (NM/NS) + dodge rate + block rate + parry rate + riposte rate = 1-NH/NS. For each of the d/b/p/r terms the error has the standard form. For example, the apparent dodge rate is d* = Nd/NS and

$$ \delta d^* = 2\sqrt{\frac{d^*(1-d^*)}{N_S}}. \qquad (16) $$
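The breakdown of total avoidance can be sketched in Python with a made-up defensive tally. The `rate` helper is hypothetical. The error on the total is computed directly from 1 - NH/NS. Adding the individual d/b/p/r/miss errors in quadrature would overstate it, because all of these counts come from the same swings and are therefore negatively correlated.

```python
import math

def rate(count, n_swings):
    """Apparent rate and its 2-sigma error, Eq.(16)."""
    r = count / n_swings
    return r, 2 * math.sqrt(r * (1 - r) / n_swings)

# Hypothetical defensive tally (made-up numbers).
tally = {"miss": 1400, "dodge": 180, "block": 50,
         "parry": 150, "riposte": 120, "hit": 2100}
n_swings = sum(tally.values())                          # NS = 4000

rates = {k: rate(c, n_swings) for k, c in tally.items() if k != "hit"}
total, d_total = rate(n_swings - tally["hit"], n_swings)   # 1 - NH/NS
quad = math.sqrt(sum(e * e for _, e in rates.values()))    # naive sum
for k, (r, e) in rates.items():
    print(f"{k:8s} {r:.4f} +/- {e:.4f}")
print(f"total    {total:.4f} +/- {d_total:.4f}")
```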

 

4. Average damage per hit.

As I said in Chapter 2, the average damage per hit, AH, does not follow binomial statistics. Technically you could determine the distribution of damage for each parse and compute the variance and standard error from that. This is very impractical, of course, but an approximate equation can be found as follows. The damage is not normally distributed because of the high frequency of min and max hits. However, there is little harm in pretending that the distribution is normal, as long as it is not terribly skewed (as a lognormal distribution would be), and in using that as a basis for estimating the error. Again, a small error in the error is not nearly as important as getting a reasonable estimate of the error itself. I’ve studied many damage distributions, from both PCs and NPCs, in detail and found that the standard deviation of the distribution is well described by

$$ \sigma \approx \frac{A_H - DB}{2} \approx \frac{A_H}{2} \qquad (17) $$


where DB is the damage bonus. The DB is a constant addition to each hit and therefore has no statistical variation. The simplification I made is neglecting the DB altogether. This is an approximation on the safe side: it overestimates the standard deviation, but it leads to a very simple final expression, as we’ll see.

With σ established, we can now calculate the error for a sample of NH hits as

$$ \delta A_H^* = \frac{2\sigma}{\sqrt{N_H}} \approx \frac{A_H^*}{\sqrt{N_H}}, \qquad (18) $$


that is, the relative error in AH is simply 1/√NH. If we substitute NH = hitratio × NS, then we get

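$$ \frac{\delta A_H^*}{A_H^*} = \frac{1}{\sqrt{hr \times N_S}} \approx \frac{1}{\sqrt{N_S}}. \qquad (19) $$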


The second approximation I made, √hr ≈ 1 (it’s actually ~0.75-0.8), underestimates the error, but this more or less cancels out with the first approximation. I’ve done a few strict analyses of the error in AH and found δAH/AH = C/√NS, with C ranging from 0.9 to 1.1 depending on DB. So Eq.(19) is indeed a very good and easy-to-use approximation.

When parsing mitigation, Eq.(18) also applies, since mitigation is derived from the average damage per hit you take instead of the damage you dish out yourself: mitigation = 1-AH/Amax, where Amax is the maximum damage per hit.
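A minimal Python sketch of Eqs.(17)-(19) follows, applied to mitigation as well. It assumes σ ≈ AH/2 with DB neglected, as in the text, and treats Amax as an exactly known constant, so the absolute error on mitigation is δAH/Amax. The function name and all numbers are hypothetical.

```python
import math

def avg_hit(total_damage, n_hits, n_swings, max_hit):
    """Average damage per hit and mitigation with 2-sigma errors.

    dAH/AH = 1/sqrt(NH) (Eq.(18) with sigma ~ AH/2); the 1/sqrt(NS) form of
    Eq.(19) is returned for comparison. max_hit (Amax) carries no error.
    """
    ah = total_damage / n_hits
    rel = 1 / math.sqrt(n_hits)
    rel_ns = 1 / math.sqrt(n_swings)
    mitigation = 1 - ah / max_hit
    d_mitigation = ah * rel / max_hit
    return ah, rel, rel_ns, mitigation, d_mitigation

# Hypothetical: 600 hits totalling 60000 damage in 1000 swings, Amax = 200.
ah, rel, rel_ns, mit, dmit = avg_hit(60000, 600, 1000, 200)
print(f"AH = {ah:.1f} +/- {ah * rel:.1f}, mitigation = {mit:.3f} +/- {dmit:.3f}")
```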

 

5. DPS.

5.1. Melee dps.

Parsing melee dps means adding up all the melee damage and dividing by the total fight time. In a continuous parse (Katta banker again) fight time is easy to determine and contains no error to speak of. The total melee damage, dmgM, is

$$ dmg_M = \sum_{\text{MH, OH}} N_C \times dw \times (1+da) \times hr \times A_H. \qquad (20) $$

So we have 4 independent sources of error (3 for the MH where dw=1). Hence

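$$ \frac{\delta dmg_M}{dmg_M} = \sqrt{\left(\frac{\delta dw}{dw}\right)^2 + \left(\frac{\delta da}{1+da}\right)^2 + \left(\frac{\delta hr}{hr}\right)^2 + \left(\frac{\delta A_H}{A_H}\right)^2}. \qquad (21) $$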


(δdw/dw) is found from Eq.(12) (not Eq.(11); that would count the error in da twice). δda/(1+da) is given by Eq.(9). (δhr/hr) is given by Eq.(7), with the note that we can always use NS instead of NM+NH with little error. Finally, (δAH/AH) is given by Eq.(19).

Note: critical hits are an additional source of error, but their variation can safely be neglected compared to the ones above. In fact, if you use typical numbers, you’ll find that you might as well neglect the dw and da terms too.
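The quadrature sum of Eq.(21) can be sketched in Python as follows. The names `quadrature` and `melee_rel_error` are hypothetical. The dw and da terms default to zero, following the note above, but can be passed in from Eqs.(12) and (9). The hr term uses Eq.(7) with NS in place of NH+NM.

```python
import math

def quadrature(*rel_errors):
    """Independent relative errors add in quadrature, Eq.(21)."""
    return math.sqrt(sum(e * e for e in rel_errors))

def melee_rel_error(n_swings, hr, rel_dw=0.0, rel_da=0.0):
    """Relative error in melee damage for one hand, in terms of NS.

    hr term: Eq.(7) with NS; AH term: Eq.(19). rel_dw and rel_da are
    optional and default to 0 (usually negligible).
    """
    rel_hr = 2 * math.sqrt((1 - hr) / (hr * n_swings))
    rel_ah = 1 / math.sqrt(n_swings)
    return quadrature(rel_dw, rel_da, rel_hr, rel_ah)

print(f"{melee_rel_error(40000, 0.6):.2%}")
```

With hr = 0.6 the hr and AH terms combine to about 1.9/√NS, which rounds to the 2/√NS rule of thumb.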

We can now write out Eq.(21) in terms of NS and if we use the typical value of hr=0.6, we get

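$$ \frac{\delta dmg_M}{dmg_M} \approx \sqrt{\frac{4(1-hr)}{hr\,N_S} + \frac{1}{N_S}} \approx \frac{1.9}{\sqrt{N_S}} \quad \text{(dw, da terms neglected)}. \qquad (22) $$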


In a continuous parse (δdps/dps) = (δdmg/dmg), so the following rule of thumb applies:

$$ \frac{\delta dps}{dps} \approx \frac{2}{\sqrt{N_S}}. \qquad (23) $$

In other words, you need a sample size of 40000 swings to get the error in dps down to 1%. If your hasted delay is 1s, that’s roughly 10 hours of continuous parsing. When you’re parsing during xp sessions in normal gameplay, things are quite different and Eq.(23) is no longer valid. You get two additional sources of error. The first is the variation in the mobs you’re fighting. The second is actually a collection of smaller sources, all of which can be traced back to an error in the determination of the fight time, i.e. (δdps/dps) ≠ (δdmg/dmg). You get variation from combat casting, getting stunned, and out-of-range/cannot-see-target. It’s impossible to quantify these two sources exactly. Let’s say the mobs you are fighting give you a 5% standard deviation just from differences in level, mob class, etc. The additional relative error in dps is then 2σ/√Nmobs. This error goes down with the square root of the number of mobs, which is a lot smaller than NS: those 40000 swings may translate to only 200 mobs if you’re dualwielding. So the additional relative error is 2 × 5%/√200 = 0.7%, i.e. of the same order of magnitude as the normal error given by Eq.(23). The same 1/√Nmobs behaviour applies to the other source as well. So when parsing during xp sessions, you have not one but three sources of statistical variation, all of the same magnitude. I therefore estimate the relative error in dps in xp sessions to be at least 3/√NS instead of 2/√NS. We’re then talking ~100k swings to get an error of only 1%.
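The sample-size arithmetic above can be checked with a small Python sketch. The function names are hypothetical. The factor k = 2 is the continuous-parse rule of Eq.(23), and k = 3 is the rough xp-session estimate from the text, not a derived constant.

```python
import math

def swings_needed(target_rel_error, k=2.0):
    """Invert dps error ~ k/sqrt(NS) to get the required NS."""
    return (k / target_rel_error) ** 2

def mob_variation_error(sigma_mob, n_mobs):
    """Extra relative error 2*sigma/sqrt(Nmobs) from mob-to-mob spread."""
    return 2 * sigma_mob / math.sqrt(n_mobs)

print(round(swings_needed(0.01)), round(swings_needed(0.01, k=3)))
print(f"{mob_variation_error(0.05, 200):.2%}")
```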

5.2. Proc DPS

Proc damage is given by

$$ dmg_P = N_C \times ppr \times (1-rr) \times dp, \qquad (24) $$


with dp the damage per proc (which in turn can have additional variation from critical hits) and rr the resist rate. If you work out the math for each factor and then apply typical values, you’ll find that nearly all the variation comes from ppr. The exception is a 2H weapon with a long delay used while slowed, but that’s not a normal circumstance for a beastlord. So

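$$ \frac{\delta dmg_P}{dmg_P} \approx \frac{\delta ppr^*}{ppr^*} = 2\sqrt{\frac{1-ppr^*}{ppr^*\,N_C}}. \qquad (25) $$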


A rule of thumb is not easily made, because ppr can vary a lot. With WA5, a 2H weapon with a hasted delay of 3s gives ppr ≈ 0.15, while a 1s delay weapon in the OH at WA0 yields ppr ≈ 0.017. The corresponding relative errors are 4.8/√NC (~5/√NS) and 15.4/√NC (~16/√NS).
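The coefficients quoted above come straight from Eq.(25). A Python sketch with a hypothetical function name reproduces them. For ppr = 0.017 it gives about 15.2 rather than 15.4. The text’s figure corresponds to a ppr slightly below 0.017 (about 0.0166), which is within the "~" of the estimate. The loop also shows roughly how many combat rounds a 5% proc-damage error would take.

```python
import math

def proc_error_coefficient(ppr):
    """c in dmg_P relative error ~ c / sqrt(NC), from Eq.(25)."""
    return 2 * math.sqrt((1 - ppr) / ppr)

for ppr in (0.15, 0.017):
    c = proc_error_coefficient(ppr)
    rounds_for_5pct = (c / 0.05) ** 2
    print(f"ppr={ppr}: {c:.1f}/sqrt(NC), ~{rounds_for_5pct:.0f} rounds for 5%")
```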

5.3. Total dps

Total dps is the sum of melee and proc dps and therefore

$$ dps = dps_M + dps_P \qquad (26) $$
and

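$$ \frac{\delta dps}{dps} = \sqrt{f_M^2\left(\frac{\delta dps_M}{dps_M}\right)^2 + (1-f_M)^2\left(\frac{\delta dps_P}{dps_P}\right)^2} \qquad (27) $$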
with

$$ f_M = \frac{dps_M}{dps} \qquad (28) $$


the fraction of the total dps that comes from melee. For most weapons the biggest chunk of dps comes from melee damage, so (1-fM)² drops towards 0 rather quickly, in which case we can simply use Eq.(23). But for weapons with very large procs, such as the ED or Dedgerex, the statistical variation in total dps comes mainly from the procs (for the ED, the relative error will typically be ~4/√NS instead of 2/√NS). As a rule of thumb, if proc dps is less than 20% of the total dps, its statistical variation can be neglected compared to the variation in melee dps.
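A minimal Python sketch of Eqs.(26)-(28) follows for a hypothetical weapon with about 17% of its dps from procs. The function name and all inputs, including the assumed 5/√NS proc error, are made up. The combined error stays below the plain 2/√NS of Eq.(23), which is consistent with the 20% rule of thumb.

```python
import math

def total_dps_rel_error(dps_m, rel_m, dps_p, rel_p):
    """Eqs.(26)-(28): combine melee and proc relative errors."""
    f_m = dps_m / (dps_m + dps_p)
    return math.sqrt((f_m * rel_m) ** 2 + ((1 - f_m) * rel_p) ** 2)

# Hypothetical weapon: 40 melee dps and 8 proc dps, NS = 10000,
# melee error 2/sqrt(NS) and proc error 5/sqrt(NS).
ns = 10000
combined = total_dps_rel_error(40, 2 / math.sqrt(ns), 8, 5 / math.sqrt(ns))
print(f"{combined:.2%} vs {2 / math.sqrt(ns):.2%} from Eq.(23)")
```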

 

6. Additional considerations.

Ok, so you’ve parsed out your weapons, now what? What conclusions can you draw? Well, if weapon A parses out at, say, 50 ± 0.5 dps and weapon B at 53 ± 0.5, then the answer is clear: the difference in dps is statistically significant, so weapon B is better than weapon A. If weapon A parses out at 50 ± 10 and weapon B at 53 ± 10, then the answer is also clear: you can’t tell which weapon is better, because the statistical uncertainty is too large. This example shows why having some notion of the statistical error is important; you can’t draw conclusions from the dps numbers alone. Of course, it isn’t always necessary to calculate the errors themselves. If the parser mentions he did his parses overnight against a static mob, you can be sure he has a sufficiently large sample. If he mentions he parsed for 5 minutes, you can readily dismiss the difference as insignificant.

But what if weapon A parses out at 50.0 ± 2.5 and weapon B at 53.0 ± 2.5? You’re in the grey area now, where the dps difference is of the same magnitude as the errors. If A = A* ± δA* and B = B* ± δB*, then

$$ B - A = (B^* - A^*) \pm \sqrt{(\delta A^*)^2 + (\delta B^*)^2}. \qquad (29) $$


So in this example, the difference between the two weapons is 3.0 ± 3.5 dps. So what’s the conclusion? There are methods to calculate the chance that weapon B is better than A, but there really isn’t much point in using them unless you have no choice. In the grey area, the error in the error suddenly becomes important and can make the difference between statistically significant and insignificant. It is far better, then, to parse some more until the statistical uncertainty is so low that there can be no mistaking the difference.
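Eq.(29) in a few lines of Python follows. The function name is hypothetical. "Clear" here only means that the difference exceeds its own 95% interval, following the reasoning above; it is not a formal significance test.

```python
import math

def compare(a, da, b, db):
    """Eq.(29): difference B - A, its 2-sigma error, and whether it exceeds it."""
    diff = b - a
    err = math.sqrt(da ** 2 + db ** 2)
    return diff, err, abs(diff) > err

print(compare(50.0, 0.5, 53.0, 0.5))   # clear-cut case
print(compare(50.0, 2.5, 53.0, 2.5))   # grey area
```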

But sometimes it isn’t possible to go back and obtain a bigger sample. For instance, say you want to parse out the effects of the Combat Stability or Combat Agility AA skills. Once you’ve purchased the skills, you can no longer go back and obtain a bigger sample of your avoidance or mitigation without them. It is therefore a good idea to estimate the sample size you’ll need in advance, and to stay on the safe side when parsing the baseline. With AA skills the effects are usually small, a few percent, so the relative error needs to be even smaller, and that means you’ll need a big sample. Combat Agility actually has one of the larger effects. Let’s say we want to parse out the difference between CA0 (baseline) and CA3. CA3 is advertised as a 10% increase in avoidance. Suppose we want to parse that out with an error of no more than, say, 2% (that is, a 20% error relative to the difference in avoidance itself), using equal sample sizes at CA0 and CA3. Then the error in each parse needs to be 2%/√2 = 1.4%. Using Eq.(7) with a mob hitratio of ~0.5-0.6, we can then estimate that our sample size needs to be 15000-20000 swings (each!). That’s not easy to parse out; you can’t use static mobs for defensive parses. For defensive parses you’ll usually have to settle for less accuracy. But it’s still a good idea to get as big a sample as is possible within reason for the CA0 parse. Better too many swings than too few, because you can’t go back.
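The planning estimate can be reproduced by inverting Eq.(7), with NS in place of NH+NM, as in the Python sketch below. The function name is hypothetical. As the text’s numbers suggest, it treats the 1.4% as a relative error on the mob’s hitratio. It gives roughly 13,000 to 20,000 swings per parse for hr = 0.6 and 0.5, consistent with the rough 15000-20000 quoted above.

```python
import math

def swings_for_rel_error(hr, rel_error):
    """Invert Eq.(7) with NS: NS = 4 (1 - hr) / (hr * rel_error**2)."""
    return 4 * (1 - hr) / (hr * rel_error ** 2)

per_parse = 0.02 / math.sqrt(2)   # 2% overall, split over two equal parses
for hr in (0.5, 0.6):
    print(hr, round(swings_for_rel_error(hr, per_parse)))
```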

Another thing to consider is that not all skills work as advertised. Combat Stability does very little if you’re not at the AC soft cap, and even if you are, you won’t get 10% according to the definition of mitigation I use because damage bonuses can’t be mitigated.

Finally, consider that all the errors I’ve given are based on the 95% confidence interval. That means that in 5% of all cases the real value p falls outside p* ± δp*. There are hundreds of dps parses on the Net, so this will be the case for dozens of them. This is just another reason why a single parse never proves anything, even if the errors are given (likely yes, proven no): it may be one of those 5%. You can of course widen the confidence interval to, e.g., the 99.99% confidence interval (4σ instead of 2σ), but even that has its limits. Multiple parses, preferably from independent sources, make a far better case.

Something to think about in that respect is selective reporting. "Man does not win the lottery" is not a headline you’ll see in the newspaper, but it applies to 99.99999% of us. In EQ it is pretty much tradition that after every patch, someone somewhere posts something like "it seems that after today’s patch I get a lot more resists of my Slow spell" or "has the proc rate on weapon X been nerfed?". Usually these are unparsed observations (with a request for someone to parse it to confirm it) and therefore highly biased to begin with. But suppose for a moment the observation is accurate, for instance that the person really did notice twice as many resists as normal (twice as many should be noticeable). Consider this, though: a few hours after the patch you can’t have cast Slow more than a hundred times. Suppose the real resist rate is 10%. You’d expect to get 10 resists, but with a standard deviation of 3. The chance for any one person to get more than twice the normal number of resists is something like 0.05%. But the person reporting is forgetting that he’s not just any one person. He’s part of a group of tens of thousands of people who play EQ every day, so something like this would happen to a dozen people every day. Of course, the other 99.95% don’t rush to the boards to say ‘situation normal’; only the exceptional cases get reported. The patch just adds a psychological aspect to it. This is just an example, but exceptional things do occur on a regular basis. And if one person reports that his fear spells hold longer with a charisma buff and a second person adds his voice to that, then an EQ myth is born. Selective reporting can even happen with parsed data. I retest previously parsed data all the time. Some people have their logs on by default and parse everything. Most of it never gets reported because there’s no need to: situation normal. But sooner or later someone will come across a statistical fluke that is highly improbable by itself. Re-testing data is therefore essential.
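The resist example can be checked exactly by summing Eq.(1), as in the Python sketch below. The function name is hypothetical, and the daily player count is a made-up stand-in for "tens of thousands". The exact chance of more than 20 resists in 100 casts comes out at roughly 0.08%. That is the same order as the rough 0.05% above (a normal approximation gives about 0.04%), and it still works out to a dozen or more "unlucky" players per day.

```python
from math import comb

def binomial_tail(k_min, n, p):
    """Exact P(X >= k_min) for X ~ Binomial(n, p), by summing Eq.(1)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

# 100 Slow casts at a true 10% resist rate: chance of more than 20 resists.
tail = binomial_tail(21, 100, 0.10)
players = 20000                          # hypothetical daily player count
print(f"{tail:.2%} per player, ~{players * tail:.0f} players per day")
```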

 

/hugs

Coprolith

65 Feral Lord

Tholuxe Paells

Email: coprolith@chello.nl