John M. Mandrola, MD: Hi, everyone. This is John Mandrola from theheart.org, Medscape Cardiology. I'm pleased to be with Dr John Carlisle, an anesthesiologist in the United Kingdom whose research led to the recent PREDIMED retraction. Dr Carlisle, welcome.
John B. Carlisle, MBChB: Thank you, John. I'm very pleased to be here.
Mandrola: I'm excited to meet you. First, just tell us who you are.
Carlisle: I am a hospital doctor in the UK. I work in a smallish [National Health Service] hospital and have been a specialist here in Devon, UK, for 17 years. I'm not an academic; I'm not employed by a university. My day-to-day job is as an anesthesiologist. I also staff a preoperative assessment clinic, where I meet patients before surgery. I'm an intensivist as well.
Mandrola: How did you get interested in this project?
Carlisle: When I was a trainee, I was looking for things to bulk up my CV, and a job came up with the Cochrane Collaboration. At the end of the 1990s, the Cochrane Anaesthesia Review Group, which is based in Copenhagen, Denmark, was just setting up, and they wanted somebody to respond to comments. I held that job for a couple of years. And then they said, "Well, it's about time you did your own systematic review."
So, I did a systematic review about drugs to prevent postoperative nausea and vomiting. During the course of looking at papers for that systematic review, I came across quite a few from a Japanese anesthesiologist. It turned out, about 10 years later, that he had made up most of his data. From that, I really developed an interest in this field.
Mandrola: You noticed the irregularities in the process because of doing the systematic review, is that right?
Carlisle: That's correct.
Mandrola: Do you have any background in statistics or computer science?
Carlisle: No, not really, except for the passing interest you have in order to pass exams, which many medical students will be familiar with. It's one of those last things you look at before the exam and then you forget about it. The other time I learned a bit about statistics was for doing systematic reviews, looking at how to analyze randomized controlled trials (RCTs) when you combine them.
Mandrola: I've read your paper and numerous descriptions of your methods, but can you make it simple? How did you do this?
Carlisle: Fortunately, the core of the method is familiar to all doctors: it is how we calculate the probability that two groups are different. When you're looking at an RCT that asked, "Did this drug work?" you're generally looking for a small P value. If the outcome was something continuous, like patient weight (maybe one group went on a diet, the other did not, and you're looking at weight loss), you do a t-test, which will be familiar to most doctors. For more than two groups, you use an [analysis of variance (ANOVA)]. The names of those tests are familiar even if you don't know the nuts and bolts of them.
My method was to apply those types of tests to the characteristics that are present before you do the trial. So, the heights, the weights, those things that are present in the population before we actually do the experiments.
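As a rough illustration of the tests Carlisle is referring to, here is a minimal Python sketch (the weights are invented for the example) that runs a two-sample t-test on one baseline characteristic and, for three groups, a one-way ANOVA:

```python
# A minimal sketch (illustrative numbers only) of the kind of test being
# described: an ordinary two-sample t-test applied to a baseline
# characteristic such as weight.
from scipy import stats

weights_group_a = [74.2, 81.0, 68.5, 77.3, 90.1, 72.8, 79.4, 85.6]
weights_group_b = [73.9, 80.2, 69.1, 76.8, 88.7, 73.5, 78.9, 84.9]

# Welch's t-test, which does not assume equal variances in the two groups
t_stat, p_value = stats.ttest_ind(weights_group_a, weights_group_b,
                                  equal_var=False)
print(f"t = {t_stat:.3f}, P = {p_value:.3f}")

# For three or more groups, a one-way ANOVA plays the same role
weights_group_c = [75.0, 79.8, 70.2, 77.1, 89.5, 71.9, 80.3, 86.1]
f_stat, p_anova = stats.f_oneway(weights_group_a, weights_group_b,
                                 weights_group_c)
print(f"F = {f_stat:.3f}, P = {p_anova:.3f}")
```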
Mandrola: The baseline characteristics in an RCT, if it is truly randomized, should not be different.
Carlisle: It is true that if you have a big enough sample (you'd need hundreds of thousands), then the means will be pretty much exactly the same. Most studies never get that big. In studies of ordinary size, there is almost always some difference between the groups, and how much difference there is depends on chance: the chance differences among the people allocated to one group or the other. When a study reports the means, it may report them imprecisely, so they may appear to be the same. A mean weight of 74.1 kg in both groups, for example, may turn out to be quite different if you increase the number of decimal places. Most of the time, there are some differences between the groups, and the size of those differences is governed by chance.
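The rounding point is easy to demonstrate. In this small sketch, the two "true" means are hypothetical numbers chosen so that both round to 74.1 kg:

```python
# Hypothetical group means: both print as 74.1 kg at one decimal place,
# yet they are not actually the same.
mean_a = 74.1449
mean_b = 74.0551

print(f"{mean_a:.1f} vs {mean_b:.1f}")  # 74.1 vs 74.1 -- look identical
print(f"{mean_a:.4f} vs {mean_b:.4f}")  # 74.1449 vs 74.0551 -- they differ
```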
Mandrola: How does your method detect irregularities in these baseline variables?
Carlisle: At its simplest, you do a t-test on the heights. A really small P value, indicating a really big difference, would suggest that maybe there is something wrong with that study. At the other end of the spectrum, the groups may be unusually similar, [and you can calculate a P value for that].
My method differs from the normal methods of t-tests and ANOVAs only to the extent that I used simulations to try and work out [the chance that two means were the same] because that is just as unlikely as two means that are very different.
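Carlisle's published analysis is more involved, but the simulation idea can be sketched along these lines: given the group sizes, a plausible standard deviation, and the reported difference in means (all hypothetical values here), estimate by Monte Carlo how often random allocation alone would produce means at least that similar:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_means_this_similar(n1, n2, sd, observed_diff, n_sims=20_000):
    """Monte Carlo estimate of how often genuine random allocation would
    produce two group means at least as CLOSE as the observed difference.
    Assumes both groups are drawn from one normal population."""
    means_1 = rng.normal(0.0, sd, size=(n_sims, n1)).mean(axis=1)
    means_2 = rng.normal(0.0, sd, size=(n_sims, n2)).mean(axis=1)
    return float(np.mean(np.abs(means_1 - means_2) <= abs(observed_diff)))

# Hypothetical trial: 100 patients per arm, weight SD of 12 kg, and
# reported mean weights that differ by only 0.01 kg.
p = p_means_this_similar(n1=100, n2=100, sd=12.0, observed_diff=0.01)
print(f"P(means at least this similar by chance) = {p:.4f}")
```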
Mandrola: In the first table of [a published RCT], the investigators list the baseline characteristics. The P values for each baseline characteristic (eg, height, weight, waist circumference) are generally included. In your method, you look at a sum or an average of those P values?
Carlisle: That is correct. I generated a single P value for the trial as a whole, which means you've got to somehow combine the P values for those different characteristics. Sometimes the authors won't calculate the P values, as was the case in the PREDIMED study.[1] If they had, they might have spotted that something wasn't quite right, but they didn't do that. Sometimes people will calculate P values incorrectly, so you'll see a P value next to a characteristic, but it may be wrong.
Some journals recommend that you don't calculate P values for baseline characteristics because any differences should be the result of the chance process of allocation rather than something important.
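As an illustration of combining baseline P values into one, and not necessarily the exact procedure used in Carlisle's paper, Fisher's method is one standard option; applying it to the complements of the P values tests for the "too similar" pattern he describes (the P values below are hypothetical):

```python
from scipy import stats

# Hypothetical P values, one per baseline characteristic
baseline_p = [0.91, 0.88, 0.95, 0.79, 0.97, 0.86]

# Applied directly, Fisher's method asks whether the P values are
# unusually SMALL as a set (baselines too different)...
_, p_too_different = stats.combine_pvalues(baseline_p, method='fisher')

# ...while applying it to the complements asks whether they are
# unusually LARGE as a set (baselines too similar).
_, p_too_similar = stats.combine_pvalues([1 - p for p in baseline_p],
                                         method='fisher')

print(f"P(baselines this different by chance) = {p_too_different:.3f}")
print(f"P(baselines this similar by chance)   = {p_too_similar:.4f}")
```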
Mandrola: What would you characterize as the weaknesses of your method?
Carlisle: The assumptions of the method [are the weakness]. If people are not aware of those assumptions, they will misinterpret the analysis. The analysis assumes that the sample population is allocated in a very simple way. It assumes that there is no block randomization, no stratification, and no minimization. There are some newer methods of randomly allocating patients that make a study more efficient, and these break that assumption.
The minimization process, for instance, actually changes the probability of being allocated to one group or the other as the trial progresses. That means my method would produce a slightly incorrect P value. The method also assumes that the calculated means are normally distributed. Things like age, height, and weight are distributed in a slightly non-normal way (eg, log-normal). However, the distributions of sample means are usually close to normal, so that is likely okay.
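The central-limit-theorem point can be checked with a quick simulation (the log-normal parameters here are arbitrary): individual values are noticeably skewed, but the means of samples of 100 are nearly normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Individual weights drawn from a skewed, log-normal distribution...
weights = rng.lognormal(mean=4.3, sigma=0.2, size=200_000)
print(f"skewness of individual weights: {stats.skew(weights):.2f}")

# ...but the means of samples of 100 patients are close to normal,
# which is the central-limit-theorem point being relied on here.
sample_means = weights.reshape(2_000, 100).mean(axis=1)
print(f"skewness of the sample means:   {stats.skew(sample_means):.2f}")
```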
Mandrola: What about correlation of variables? For instance, tall people might have heavier weights.
Carlisle: That's right. So, even if the P values for the individual statistics are correct, when you combine them, the method assumes that they are independent of each other. And as you just said, tall people are generally going to be heavier, so any slight imbalance in heights will also be reflected in an imbalance in weights because those two things are connected. Whenever you use this sort of method and get a result, you've then got to pause and think, "Okay, hang on, were the assumptions that we made met?" And if there are reasons to think they were not, then you've got to be fairly cautious in how you approach what the next step might be.
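A small simulation makes the independence problem concrete; the within-patient correlation of 0.7 between height and weight is an assumed value. Across many simulated trials, the baseline P values for the two correlated variables are themselves correlated, so combining them as if they were independent is not quite right:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_trials, n_per_arm = 2_000, 50
cov = [[1.0, 0.7], [0.7, 1.0]]  # assumed height-weight correlation of 0.7

p_height, p_weight = [], []
for _ in range(n_trials):
    arm_a = rng.multivariate_normal([0, 0], cov, size=n_per_arm)
    arm_b = rng.multivariate_normal([0, 0], cov, size=n_per_arm)
    p_height.append(stats.ttest_ind(arm_a[:, 0], arm_b[:, 0]).pvalue)
    p_weight.append(stats.ttest_ind(arm_a[:, 1], arm_b[:, 1]).pvalue)

# If the baseline P values were independent, this would be close to zero.
r, _ = stats.pearsonr(p_height, p_weight)
print(f"correlation between the two baseline P values: r = {r:.2f}")
```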
Selecting Studies for Analysis
Mandrola: How did you decide which studies to look at?
Carlisle: Doing the original systematic reviews with Cochrane, I had already analyzed studies by that Japanese author I mentioned. He is at the top of the leaderboard on Retraction Watch, a website that your viewers may be interested in. That site posts a list of the 30 or so authors who have had the most papers retracted; its purpose is to track retractions in the biomedical literature. There are four anesthetists in the top 20, which is a bit of a worry to anesthetists. Either we are lying more than other specialties, or we're lying the same amount but are really bad at it and get found out.
I had analyzed the studies from this Japanese researcher, and then I wanted to see how many more studies of this type there might be in the journal I work for, Anaesthesia, and in the other anesthetic journals in which he had published. I looked at six anesthetic journals and ended up analyzing 15 years' worth of RCTs. I analyzed any I came across; I was not interested in the particular topics of those RCTs.
Having done that, some of the people I talked to at conferences were a bit alarmed that anesthesiologists might be getting a bad name as liars. They suggested I look at some other journals. I chose two big-hitting journals, the New England Journal of Medicine and [the Journal of the American Medical Association], and again looked at 15 years' worth of RCTs, with the caveat that I didn't always include every single one I came across. A few animal studies I purposely decided not to include, though I did include some others in my analysis.