Crémieux

@cremieuxrecueil

What's more convincing? p = 0.04 in a sample of 10 or p = 0.04 in a sample of 1,000,000? 🧵

Pick an answer, then go to the next post.

OK, now you've answered, and I hope you answered correctly: p = 0.04 in a sample of 10 will generally be much more convincing than that same p-value in a sample of 1,000,000 people. The reason has to do with a paradox.

This is my favorite illustration of the paradox (from @lakens): Even when there's no effect, with a large sample, you'll find plenty of significant estimates because minuscule deviations from 'no effect' with a point null will often be significant if you have high power.

Because of this fact, the same p-value at different levels of power corresponds to very different levels of evidence. So p = 0.04 in a sample of 1,000,000? That could be better evidence against an effect than for it. That is the essence of Lindley's paradox.
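
To make this concrete, here is a minimal sketch of the paradox, not a reanalysis of anything in this thread. It assumes a two-sided z-test with known standard deviation `sigma`, a point null of zero, and a Normal(0, `tau`^2) prior on the effect under the alternative. The function name, `sigma = 1`, and `tau = 1` are all illustrative assumptions, and the answer depends on the prior you pick.

```python
from math import exp, sqrt
from statistics import NormalDist


def bf10_from_p(p_value, n, sigma=1.0, tau=1.0):
    """Bayes factor for H1 over H0, given a two-sided p-value from a z-test.

    H0: mean = 0.  H1: mean ~ Normal(0, tau^2).
    sigma and tau are illustrative assumptions, not values from the thread.
    """
    # z-statistic that produces this two-sided p-value
    z = NormalDist().inv_cdf(1 - p_value / 2)
    v0 = sigma**2 / n           # variance of the sample mean under H0
    v1 = tau**2 + sigma**2 / n  # marginal variance of the sample mean under H1
    shrink = v0 / v1
    # Ratio of the two normal densities evaluated at the observed sample mean
    return sqrt(shrink) * exp(0.5 * z**2 * (1 - shrink))


for n in (10, 1_000_000):
    bf = bf10_from_p(0.04, n)
    print(f"n = {n:>9,}: BF10 = {bf:.4f}, BF01 = {1 / bf:.1f}")
```

Under these assumptions, p = 0.04 with n = 10 gives a Bayes factor of roughly 2 in favor of an effect, while the same p-value with n = 1,000,000 gives a Bayes factor of more than 100 in favor of no effect.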

Lindley's paradox makes it very clear why p-values are not measures of evidence absent context. If we want reliable measures of the evidence for something, we can use likelihoods or Bayes factors instead.

The way Bayes factors work is by dividing the marginal likelihood of the data you observe under one hypothesis by its marginal likelihood under another hypothesis. Or, even simpler, comparing the probability of one model (M0) to another (M1) given an observation.*
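
A toy example of that division, as a sketch under stated assumptions: coin flips, where M0 says the coin is fair and M1 puts a uniform prior on the probability of heads. The function name and the choice of prior are hypothetical and only for illustration. Under the uniform prior, the marginal likelihood of any number of heads in n flips works out to 1 / (n + 1).

```python
from math import comb


def bf01_coin(heads, flips):
    """Bayes factor for M0 (fair coin) over M1 (uniform prior on P(heads))."""
    # Marginal likelihood under M0: a plain binomial probability at theta = 0.5
    m0 = comb(flips, heads) * 0.5**flips
    # Marginal likelihood under M1: the binomial likelihood averaged over a
    # Uniform(0, 1) prior on theta, which equals 1 / (flips + 1)
    m1 = 1 / (flips + 1)
    return m0 / m1


print(bf01_coin(60, 100))  # close to 1: the data barely discriminate
print(bf01_coin(50, 100))  # well above 1: the data favor the fair coin
```

Values above 1 favor the numerator model (M0 here) and values below 1 favor the denominator model (M1).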

I'm going to get to some results, but first: The ladder of evidential strength proposed by Sir Harold Jeffreys goes from evidence being worth little more than an anecdote to being extreme. This goes in reverse when the denominator is larger than the numerator.
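
A sketch of that ladder in code. The cutoffs (3, 10, 30, 100) and the labels are an assumption here: they are the commonly used modern adaptation of Jeffreys' scheme rather than values stated in this thread, and the function name is hypothetical. A Bayes factor below 1 is flipped so the same ladder describes evidence for the denominator model.

```python
def jeffreys_label(bf):
    """Describe a Bayes factor (numerator model over denominator model) in words."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf == 1:
        return "no evidence either way"
    # Below 1, the evidence points to the denominator: take the reciprocal
    favored = "numerator" if bf > 1 else "denominator"
    strength = bf if bf > 1 else 1 / bf
    # Assumed cutoffs; each label applies below its cutoff
    ladder = ((3, "anecdotal"), (10, "moderate"), (30, "strong"), (100, "very strong"))
    for cutoff, label in ladder:
        if strength < cutoff:
            return f"{label} evidence for the {favored} model"
    return f"extreme evidence for the {favored} model"


print(jeffreys_label(2.0))    # anecdotal evidence for the numerator model
print(jeffreys_label(0.005))  # extreme evidence for the denominator model
```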

In 2018, Hoekstra et al. evaluated the evidence for null effects in medicine. Several trials had found nonsignificant results, but it wasn't clear how much evidence in favor of the null those trials provided. The strength of that evidence was only modestly related to the trials' p-values.

But p-values are evidently less important than sample size for determining if some effect isn't real. Studies with larger sample sizes provided much better evidence in favor of no effect.

Hulme et al. evaluated the efficacy of hydroxychloroquine for treating COVID-19. They found that when a trial showing benefits from HCQ was reanalyzed with patients who deteriorated, excluding the untested, or assuming the excluded were positive, the evidence became fairly weak.

In another HCQ reanalysis, Wagenmakers and Gronau found that the evidence for a treatment effect among COVID-19 patients with pneumonia was only moderate, suggesting the need for more research.

In 1959, Festinger and Carlsmith conducted a now-famous study where they documented Festinger's recently-coined concept of "cognitive dissonance". The study has been cited almost 6,000 times.

In the study, participants did tedious tasks for an hour and were then not paid, paid $1, or paid $20 to tell people that the tasks were interesting. The paid groups ended up rating the tasks more interesting after all was said and done, showing 'cognitive dissonance.'

Reanalysis of this result in 2018 suggested that the evidence for cognitive dissonance was actually not more than anecdotal. In other words, it was not worth more than a bare mention for demonstrating the phenomenon it was argued to show.

Bayes factors might be able to help with the abuse of p-values. The way they could do that is by forcing people to put more thought into their analyses. So if I'm being realistic, I don't expect them to help, but I hope you'll agree they're pretty cool!

Sources:
https://www.cell.com/trends/ecology-evolution/abstract/S0169-5347(21)00341-4
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0195474
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0245048
https://osf.io/preprints/psyarxiv/7nk8z
https://journals.sagepub.com/doi/full/10.1177/2515245918779348
https://psycnet.apa.org/record/1960-01158-001

Unnoted in the thread, results regarding kidneys:
https://osf.io/preprints/psyarxiv/4pf9j
https://ccforum.biomedcentral.com/articles/10.1186/s13054-022-04120-y

* I know this definition is imprecise, but it's a tweet. I also know Bayes factors aren't perfect and they can be abused.

For clarity, the first post is asking which is more convincing evidence of an effect being present. The post on Jeffreys' classifications mentions going in reverse, but to be clear, what I mean is that fractional Bayes factors are evidence for the denominator hypothesis/model.

Clarification on what I meant with the apostrophes around 'no effect' in the p-value plot: https://twitter.com/cremieuxrecueil/status/1780834586982764545