@cremieuxrecueil
Feb 12, 2025
1
What's more convincing?

p = 0.04 in a sample of 10 or p = 0.04 in a sample of 1,000,000?

🧵
2
Pick an answer, then go to the next post.
3
OK, now you've answered, and I hope you answered correctly: p = 0.04 in a sample of 10 will generally be much more convincing than that same p-value in a sample of 1,000,000 people.

The reason has to do with a paradox.
4
This is my favorite illustration of the paradox (from @lakens):

Even when there's no effect, a large sample will turn up plenty of significant estimates, because with high power, minuscule deviations from a point null of 'no effect' will often be significant.
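The plot itself isn't reproduced here. One way to see a closely related point numerically is to compare how likely p = 0.04 is under an exact null versus under an alternative the study has high power to detect. This is a minimal sketch assuming a two-sided z-test at α = 0.05; `noncentrality_for_power` and `p_value_density` are hypothetical helper names. Under the null, p-values are uniform, so the null's density is 1 everywhere.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

# Assumption (illustrative only): a two-sided z-test at alpha = 0.05 where the
# test statistic is Z ~ N(delta, 1); delta = 0 is the exact point null.


def noncentrality_for_power(power, alpha=0.05):
    """Shift delta that gives the requested two-sided power (ignores the
    negligible chance of rejecting in the wrong tail)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z_crit + STD_NORMAL.inv_cdf(power)


def p_value_density(p, delta):
    """Density of the two-sided p-value at p when Z ~ N(delta, 1).
    With delta = 0 this equals 1 everywhere: p-values are uniform under H0."""
    z = STD_NORMAL.inv_cdf(1 - p / 2)
    return (STD_NORMAL.pdf(z - delta) + STD_NORMAL.pdf(z + delta)) / (2 * STD_NORMAL.pdf(z))


if __name__ == "__main__":
    for power in (0.50, 0.80, 0.99):
        delta = noncentrality_for_power(power)
        ratio = p_value_density(0.04, delta) / p_value_density(0.04, 0.0)
        print(f"power {power:.2f}: density of p = 0.04 under H1 / under H0 = {ratio:.2f}")
```

Under these assumptions the ratio is above 1 at 50% power but falls to roughly 0.34 at 99% power, so with very high power a p of 0.04 is about three times likelier under the null than under the alternative.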
5
Because of this fact, the same p-value at different levels of power corresponds to very different levels of evidence.

So p = 0.04 in a sample of 1,000,000? That could be better evidence against an effect than for it.

That's the essence of Lindley's paradox.
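A minimal sketch of the paradox, assuming a one-sample z-test with known σ = 1, a point null μ = 0, and an alternative that spreads μ as Normal(0, τ²) with τ = 1. That prior is an illustrative choice, not something from the thread, and `bf01_z_test` is a hypothetical name. Each hypothesis implies a normal distribution for the sample mean, so the Bayes factor is just the ratio of two normal densities.

```python
from math import sqrt
from statistics import NormalDist

# Assumptions (illustrative, not from the thread): one-sample z-test with known
# sigma = 1; H0: mu = 0 exactly; H1: mu ~ Normal(0, tau^2) with tau = 1.


def bf01_z_test(p_two_sided, n, tau=1.0):
    """BF01 = marginal likelihood of the sample mean under H0 / under H1."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)  # |z| that yields this p-value
    xbar = z / sqrt(n)                              # sample mean implied by z (sigma = 1)
    m0 = NormalDist(0, sqrt(1 / n)).pdf(xbar)           # H0: xbar ~ N(0, 1/n)
    m1 = NormalDist(0, sqrt(tau**2 + 1 / n)).pdf(xbar)  # H1: xbar ~ N(0, tau^2 + 1/n)
    return m0 / m1


if __name__ == "__main__":
    for n in (10, 1_000_000):
        bf01 = bf01_z_test(0.04, n)
        print(f"n = {n:>9,}: BF01 = {bf01:.2f}, BF10 = {1 / bf01:.2f}")
```

Under these assumptions, p = 0.04 with n = 10 gives BF10 of roughly 2 (weak support for an effect), while the same p with n = 1,000,000 gives BF01 of roughly 120 in favor of the null. The exact numbers shift with τ, which is the usual price of Bayes factors: the alternative has to be spelled out.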
6
Lindley's paradox makes it very clear why p-values are not measures of evidence absent context.

If we want reliable measures of the evidence for something, we can use likelihoods or Bayes factors instead.
7
The way Bayes factors work is by dividing the marginal likelihood of the data you observe under one hypothesis by its marginal likelihood under another hypothesis.

Or even simpler, comparing how well one model (M0) predicts an observation relative to another (M1).*
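To make "ratio of marginal likelihoods" concrete, here is a sketch for a hypothetical coin-flip setup: H0 fixes the success rate at θ = 0.5, and H1 lets θ be anything in (0, 1) with a uniform prior. H1's marginal likelihood is the likelihood averaged over that prior, computed with a crude midpoint rule (the exact value for this prior is 1/(n + 1), which makes it easy to check). All function names are hypothetical. The last helper shows that a Bayes factor converts prior odds into posterior odds, so it matches "probability of one model versus another given the data" only when both models start out equally likely.

```python
import math

# Hypothetical setup for illustration: k successes in n trials.
# H0: theta = 0.5 exactly.  H1: theta ~ Uniform(0, 1).


def binom_lik(k, n, theta):
    """Probability of exactly k successes in n trials at success rate theta."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)


def marginal_lik_uniform(k, n, grid=10_000):
    """Likelihood averaged over a Uniform(0, 1) prior on theta, via the midpoint
    rule on `grid` equal cells (exact answer for this prior: 1 / (n + 1))."""
    h = 1.0 / grid
    return h * sum(binom_lik(k, n, (i + 0.5) * h) for i in range(grid))


def bayes_factor_01(k, n):
    """BF01: marginal likelihood under H0 divided by marginal likelihood under H1.
    H0 has no free parameter, so its marginal likelihood is the plain likelihood."""
    return binom_lik(k, n, 0.5) / marginal_lik_uniform(k, n)


def posterior_prob_h0(bf01, prior_odds_01=1.0):
    """Posterior odds = BF01 * prior odds; converted here to P(H0 | data),
    assuming H0 and H1 are the only two models on the table."""
    post_odds = bf01 * prior_odds_01
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    k, n = 60, 100  # hypothetical data
    bf = bayes_factor_01(k, n)
    print(f"BF01 = {bf:.3f}; P(H0 | data) with 50/50 prior odds = {posterior_prob_h0(bf):.3f}")
```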
8
I'm going to get to some results, but first:

The ladder of evidential strength proposed by Sir Harold Jeffreys goes from evidence being worth little more than an anecdote to being extreme.

This goes in reverse when the denominator is larger than the numerator.
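The ladder image isn't reproduced here, so the cutoffs below are an assumption: the widely used relabeling of Jeffreys' scale that runs from "anecdotal" (1 to 3) through "moderate" (3 to 10), "strong" (10 to 30), and "very strong" (30 to 100) to "extreme" (above 100). Reading it in reverse just means inverting a Bayes factor below 1 and crediting the denominator hypothesis. `describe_bf` and `LADDER` are hypothetical names.

```python
# Assumed cutoffs: the common "anecdotal ... extreme" relabeling of Jeffreys' scale.
# Each entry is (exclusive upper bound, label); anything >= 100 is "extreme".
LADDER = [
    (3, "anecdotal"),
    (10, "moderate"),
    (30, "strong"),
    (100, "very strong"),
]


def describe_bf(bf10):
    """Verbal label for BF10 = p(data | H1) / p(data | H0).
    Values below 1 are inverted and read as evidence for the denominator (H0)."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence either way"
    favored = "H1" if bf10 > 1 else "H0"
    strength = bf10 if bf10 > 1 else 1 / bf10  # same ladder, read in reverse
    for upper, label in LADDER:
        if strength < upper:
            return f"{label} evidence for {favored}"
    return f"extreme evidence for {favored}"


if __name__ == "__main__":
    for bf in (2.0, 25.0, 1 / 8, 1 / 150):
        print(f"BF10 = {bf:.4g}: {describe_bf(bf)}")
```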
9
In 2018, Hoekstra et al. evaluated the evidence for null effects in medicine.

Several trials had found nonsignificant results, but it wasn't clear how much evidence in favor of the null those trials provided.

The strength of that evidence had only a modest relationship with the trials' p-values.
10
But p-values are evidently less important than sample size for determining whether an effect isn't real.

Studies with larger sample sizes provided much better evidence in favor of no effect.
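A toy version of the sample-size point, assuming a one-sample z-test with known σ = 1, H0: μ = 0, and H1: μ ~ Normal(0, 1). This is an illustrative model, not the analysis Hoekstra et al. actually ran, and `bf01_from_p` is a hypothetical name. Hold a non-significant p = 0.50 fixed and let n grow: with a point null, the Bayes factor for "no effect" keeps climbing.

```python
from math import exp, sqrt
from statistics import NormalDist

# Assumptions (toy model, not Hoekstra et al.'s analysis): one-sample z-test with
# known sigma = 1; H0: mu = 0 exactly; H1: mu ~ Normal(0, tau^2), tau = 1.


def bf01_from_p(p_two_sided, n, tau=1.0):
    """Closed-form BF01 for this model:
    sqrt(1 + n*tau^2) * exp(-(z^2 / 2) * n*tau^2 / (1 + n*tau^2))."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    shrink = n * tau**2 / (1 + n * tau**2)
    return sqrt(1 + n * tau**2) * exp(-0.5 * z**2 * shrink)


if __name__ == "__main__":
    # The same non-significant p-value from ever larger (hypothetical) studies.
    for n in (20, 200, 2_000, 20_000):
        print(f"n = {n:>6,}: BF01 = {bf01_from_p(0.50, n):.1f}")
```

Under these assumptions BF01 climbs from about 3.7 at n = 20 to about 113 at n = 20,000 while the p-value stays at 0.50; for large n it grows roughly like √n.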
11
Hulme et al. evaluated the efficacy of hydroxychloroquine for treating COVID-19.

They reanalyzed a trial that reported benefits from HCQ and found that the evidence became fairly weak when the patients who deteriorated were included, when the untested were excluded, or when the excluded patients were assumed to be positive.
12
In another HCQ reanalysis, Wagenmakers and Gronau found that the evidence for a treatment effect among COVID-19 patients with pneumonia was only moderate, suggesting the need for more research.
13
In 1959, Festinger and Carlsmith conducted a now-famous study in which they documented Festinger's recently coined concept of "cognitive dissonance".

The study has been cited almost 6,000 times.
14
In the study, participants did tedious tasks for an hour and were then not paid, paid $1, or paid $20 to tell people that the tasks were interesting.

After all was said and done, the $1 group rated the tasks as more interesting than the unpaid and $20 groups did, which was taken as a sign of 'cognitive dissonance.'
15
Reanalysis of this result in 2018 suggested that the evidence for cognitive dissonance was actually no more than anecdotal.

In other words, as a demonstration of the phenomenon it was meant to show, the study was worth no more than a bare mention.
16
Bayes factors might help curb the abuse of p-values by forcing people to put more thought into their analyses.

Realistically, I don't expect them to, but I hope you'll agree they're pretty cool!
18
For clarity, the first post asks which is more convincing evidence that an effect is present.

The post on Jeffreys' classifications mentions going in reverse. To be clear, what I mean is that Bayes factors below 1 are evidence for the denominator hypothesis/model.
19
Clarification on what I meant by the quotation marks around 'no effect' in the p-value plot: