5

It is necessary for me to write about the 2.5th and 97.5th percentiles of a data set. What is the correct way of writing this?

This post talks about "zeroth", "n-th" and even "epsilonth" as generalisations of the -th suffix, but I haven't found any guidelines for non-integers.

I feel that 2.5th percentile sounds better than 2.5-percentile. Do you agree?

Stanley

4 Answers

6

By definition, there are only one hundred percentiles, ranging from the 1st (top 1%) to the 100th (bottom 1%). You cannot be in the 2.5th percentile.

Carl Smith
  • 25th and 975th permillile? I can only find one English usage of permillile (but several in Latin!) There are quartiles and quintiles and deciles, but I can't find any general term for a k-ile. – Mike Harris Jun 06 '16 at 19:33
  • This is correct. If you have to divide the total into intervals smaller than percentiles, then by definition they are no longer percentiles. Because "permillile" doesn't seem to be a word, you have to find some other way to express your meaning. (This is also consistent with Nick Cox's answer.) – MarcInManhattan May 25 '23 at 20:08
4

Late to this party, but coming with some statistical experience:

From a statistical point of view, neither is good or indeed common wording in technical literature. This is partly a hangover from long usage (the word percentile was introduced in the 19th century) and partly because there are better ways to express the idea.

Historically, the 1st, 2nd, ..., 99th percentiles are numerical values greater than precisely 1, 2, ..., 99% of the data. So, although people don't usually say this, there could be 101 percentiles, because there are also the minimum and maximum. (Indeed, the maximum is greater than or equal to 100% of the values. There is lots of small print about how to calculate such percentiles when the number of values is not a multiple of 100 (almost always) or if there are ties in the data (two or more equal values; very common with some kinds of data: most people have 10 fingers), but let's not go there, as the small print doesn't affect the language used.)

More recently, by extension percentiles have been used for the intervals between these numerical values: the 1st percentile interval (or class, or bin) lies between the minimum and the 1st percentile (value), and so forth.

That said, a wording I think would register with statistical groups is

the 2.5% point or the 97.5% point of a distribution

This particular example is not as bizarre as it may seem, as a pair of such points often defines a so-called confidence interval covering 95% of a particular distribution, which is typically met in a first course in statistics. If that is true, then we would use quite different wording, namely 95% confidence limits. But there are many other kinds of interval in statistics.
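To make "the 2.5% point" and "the 97.5% point" concrete, here is a minimal sketch using Python's standard library. It assumes a standard normal distribution purely for illustration (nothing in the question specifies one), and the variable names are my own.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution, chosen only for illustration.
dist = NormalDist(mu=0.0, sigma=1.0)

lower = dist.inv_cdf(0.025)  # the 2.5% point
upper = dist.inv_cdf(0.975)  # the 97.5% point

# Probability mass of this distribution lying between the two points.
coverage = dist.cdf(upper) - dist.cdf(lower)

print(f"2.5% point:  {lower:.3f}")
print(f"97.5% point: {upper:.3f}")
print(f"coverage:    {coverage:.3f}")
```

For this assumed distribution the two points come out at roughly -1.96 and 1.96, and together they enclose the central 95% of it.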

Nick Cox
  • Tangential nitpick: a 95% confidence interval is not an interval that contains 95% of a particular distribution. – Nuclear Hoagie May 25 '23 at 15:54
  • It contains 95% of a sampling distribution. I didn't want to bring in yet more ideas secondary to the main issue. – Nick Cox May 25 '23 at 15:57
  • It's not even that. Suppose you sample a population and find a mean value with a 95% CI. If you ran the same experiment many times, you'd get some distribution of sample-mean values. The 95% CI from any one particular experiment does not contain 95% of the distribution of means over experiment runs. That of course can't be the case - you get a different 95% CI for every run of the experiment, they can't all contain 95% of the distribution of sample means. – Nuclear Hoagie May 25 '23 at 16:08
  • Fair enough comment, but I suspect you're taking most potential readers much further than they want to go. The intent of a confidence interval is to capture 95% of the possibilities for what is being estimated; in practice that can only be done approximately at best and one sample is a highly fallible guide to what it is taken from. Here it is not easy to be completely correct and highly concise and clear at the same time. Perhaps I should just cut the comments on confidence intervals, although to me it's the most obvious way in which anyone would use 2.5 and 97.5% points. – Nick Cox May 25 '23 at 16:17
2

I feel that 2.5th percentile sounds better than 2.5-percentile. Do you agree?

Yes, I do agree. I would pronounce them 'two point fifth percentile'.

I see that Hot Licks said the same while I was typing.

I'm not sure what I would say for, e.g., 2.25th.

0

Per cent describes a division into 100; the equivalent for 1000 is per mille, and this gives the rare terms permilles or milliles: hence you have the 25th permille/millile, equal to the value that 2.5% of samples are below.

Alternatively, a quantile is an arbitrary division and you can precede it by a number indicating the number of subgroups, so a 200-quantile would have 200 divisions, and the 5th 200-quantile is what you want. Not great, but that's what happens when you use intentionally obscure measures.

See the Wikipedia article on quantiles. Or find a way of rephrasing it, such as "2.5% of samples fall below x".
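For anyone who wants to see the "5th 200-quantile" idea in numbers, here is a rough sketch using `statistics.quantiles` from Python's standard library. The data set (the integers 1 to 1000) is made up for illustration, and the variable names are my own. Other software may interpolate differently and give slightly different cut points.

```python
from statistics import quantiles

# Made-up illustrative data: the integers 1..1000.
data = list(range(1, 1001))

# n=200 splits the data into 200 equal-probability groups,
# which yields 199 cut points (default method: "exclusive").
cut_points = quantiles(data, n=200)

lower = cut_points[4]    # 5th cut point   = 5/200   = the 2.5% point
upper = cut_points[194]  # 195th cut point = 195/200 = the 97.5% point

# Share of the values that fall below the lower cut point.
share_below = sum(x < lower for x in data) / len(data)

print(len(cut_points), lower, upper, share_below)
```

With this data the share below the 5th cut point is 2.5%, which is the plain-language phrasing suggested above.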

Stuart F
  • 5th-200-quantile is not, and is not like, any expression I've ever read on quantiles, and I've published papers (and posted many times on Cross Validated) on the topic. The last sentence is closer to common usage. A researcher might refer to 2.5% of samples if the data were based on samples of material (water, rock, blood, whatever); a statistical person would be more likely to say 2.5% of the sample or of the values or of the data. – Nick Cox May 26 '23 at 00:48