For scientists, getting research published in the journal Nature is a big deal. It carries weight, prestige, and the promise of career advancement.
But the prestige of these journals does not free them from the problems that have plagued science for decades. In fact, because they publish such exciting and innovative work, there is a risk that they are even more likely to publish exciting but unreliable findings. They can also contribute to a scientific record that shows only the "yes" answers to big questions while omitting the important but boring "no" results.
Colin Camerer, a behavioral economist at the California Institute of Technology, recently led a team of researchers in attempting to replicate 21 social science studies from Science and Nature; 13 of them replicated successfully. The results, published yesterday (in Nature, of course), also hint at how the field's focus on positive results affects the literature. They paint a complicated picture of the replication crisis in the social sciences and illustrate just how complicated a replication project can be.
How reliable is the scientific record?
The replication crisis in psychology broke out in 2011 with a wave of successive shocks: the publication of a paper purporting to show precognition; a fraud scandal; and a growing recognition of p-hacking, in which researchers exercise so much freedom in how they analyze their data that they can make almost any result look real. Scientists began to wonder whether the published record was bloated with unreliable results.
The crisis is by no means limited to psychology; many of the same problems plague fields from economics to biomedical research. But psychology has been a persistent and particularly loud voice in the conversation, with projects like those of the Center for Open Science trying to understand the scope of the problem and to fix it.
In 2015, the Center published the first results from a huge psychology replication project. Of the 100 replications attempted, only about one third were successful. At the time, the replicators were cautious about their conclusions, pointing out that a failed replication could mean the original result was an unreliable false positive, but it could also mean that there were unnoticed differences between the experiments, or that the failed replication was itself a false negative.
Indeed, the tendency to publish positive results makes false negatives a significant risk for replications.
The dangers of false negatives
One challenge of experimental work is deciding how many research subjects you need to get a reliable result. There is no universal answer when it comes to sample size: the right number of subjects (or mice, or countries) to study depends on the question you are asking. If you expect big differences between groups (for example, if you want to find out whether men are on average taller than women), you do not need many people. But if you expect the difference to be small, you will need a much larger sample.
This expected difference between groups, called the effect size, helps researchers figure out how many subjects they need for their study. It is important to get this right, because if you do not have enough subjects for your effect size, you are more likely to miss a result that is actually real: it will not be distinguishable from statistical noise.
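The link between effect size, sample size, and the chance of detecting a real effect (what statisticians call power) can be illustrated with a quick Monte Carlo sketch in Python. This is a generic illustration with made-up numbers, not the analysis used in the replication project:

```python
import numpy as np
from scipy import stats

def estimated_power(effect_size, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Estimate a two-sample t-test's power by simulation: the fraction of
    experiments that detect a difference that is genuinely there."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size, 1.0, n_per_group)  # a real difference
        if stats.ttest_ind(control, treated).pvalue < alpha:
            hits += 1
    return hits / n_sims

# With the same 30 subjects per group, a large effect is usually detected,
# while a small but equally real effect is usually missed.
print(estimated_power(0.8, 30))
print(estimated_power(0.2, 30))
```

With 30 subjects per group, the simulated detection rate is high for a large effect and drops badly for a small one, which is exactly why small expected effects demand bigger samples.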
Researchers often look at previous studies to estimate an effect size. In the case of replication, it makes sense to use an effect size derived from the original work.
The thing is, there is reason to believe that these effect sizes might not be very accurate. Experiments often ask multiple questions at once, and if you ask enough questions, random chance will make it look like the answer to some of them is "yes." Scientists typically report only their "yes" answers, because those are the ones that seem interesting. But if everyone does that, the literature becomes biased over time: the big effect sizes behind the "yes" answers get published, but they may reflect luck as much as what is really going on.
This means that a replication may actually be hunting for an effect that is smaller than the one reported in the original study. The researchers doing the replication may therefore need more subjects to have a good chance of finding the effect. This could be one reason why the replication rate was so low.
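A small simulation (again an illustration with made-up numbers, not the paper's analysis) shows the mechanism: if only positive, statistically significant results get published, the published effect sizes systematically overshoot the true effect.

```python
import numpy as np
from scipy import stats

def mean_published_effect(true_effect=0.2, n_per_group=30, n_sims=5000, seed=1):
    """Run many small studies of the same true effect, 'publish' only the
    positive significant ones, and return their average observed effect."""
    rng = np.random.default_rng(seed)
    published = []
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(true_effect, 1.0, n_per_group)
        diff = b.mean() - a.mean()
        # Only the "yes" answers make it into the literature.
        if diff > 0 and stats.ttest_ind(a, b).pvalue < 0.05:
            published.append(diff)
    return float(np.mean(published))

# The true effect is 0.2, but the published average comes out far larger.
print(mean_published_effect())
```

A replicator who sizes the new study using that inflated published average will be underpowered for the true, smaller effect.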
Replicating a study sounds simple, but it isn't
Camerer and his colleagues wanted to assess the reliability of social science results in Nature and Science. They searched for studies published between 2010 and 2015 that would be straightforward to replicate: those that used easily accessible subjects (such as students) and tested a clear experimental hypothesis. They found 21 papers that met their criteria.
But Camerer and colleagues did not just want to look at each study individually; they wanted to find out whether they could say anything about the reliability of this type of work as a whole. They wanted to do science about science, or meta-science. That meant trying to be consistent across the replications, which is difficult for very different studies, so some decisions were made to ensure that each paper received similar treatment.
The team decided to concentrate on the first experiment in each paper and repeat it. A single experiment can yield several results, so if a replication shows that some are the same and some are different, how do you decide whether it was successful? The researchers chose to focus only on the result that was considered the most important in the original study and compare it to the replication.
They involved the original authors in replicating their work so they could be sure the replications stuck as closely as possible to the original studies and that everyone agreed on how the data would be analyzed. They also made sure that their sample sizes were large enough to detect effects much smaller than those reported in the original papers, making false negatives less likely.
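Finding a sample size big enough to detect a smaller-than-reported effect can be done by brute force. The sketch below is a generic illustration (the replication project used its own power calculations and targets): it searches for the smallest per-group sample that reaches 90 percent simulated power.

```python
import numpy as np
from scipy import stats

def required_n(effect_size, target_power=0.9, alpha=0.05, n_sims=1000, seed=0):
    """Smallest per-group sample size (tried in steps of 10) whose simulated
    two-sample t-test power reaches the target. Brute-force sketch; real
    studies would typically use an analytic power formula instead."""
    for n in range(10, 2001, 10):
        rng = np.random.default_rng(seed)
        hits = sum(
            stats.ttest_ind(rng.normal(0.0, 1.0, n),
                            rng.normal(effect_size, 1.0, n)).pvalue < alpha
            for _ in range(n_sims)
        )
        if hits / n_sims >= target_power:
            return n
    return None

# Halving the assumed effect size roughly quadruples the required sample.
print(required_n(0.6))
print(required_n(0.3))
```

Because required sample size scales inversely with the square of the effect size, powering a replication for an effect half the size of the original demands roughly four times as many subjects.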