In 2016, a survey of hundreds of researchers suggested that many branches of science are facing a reproducibility crisis. While there is debate about the scope of the problem, many scientists agree that it is an issue, and there could be a variety of reasons for it. One may be errors in genetic sequences, including those in the reference databases that researchers use to verify the sequence of some portion of a genome or an RNA molecule. In 2017, for example, public databases of DNA sequences were found to carry higher-than-expected rates of errors. This could have many detrimental effects, particularly if researchers use those erroneous sequences to generate reagents for laboratory experiments.
A study in Nature Biotechnology in 2019 found that long-read sequencing can introduce mistakes into genome sequences, which could cause problems when researchers verify the sequences of the reagents they use, or when sequences are reported or updated.
Outside of problems with sequencing, investigators have found that the supplementary data that often accompanies research studies, and that holds large amounts of information that can't be included in the printed version of each study, includes Excel files in which autocorrect has garbled gene names. The problem has persisted even after researchers were warned about it: in 2016 about 20 percent of papers had this issue, and by 2021 the figure had grown to 30 percent.
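To make the spreadsheet problem concrete, here is a minimal Python sketch of how a column of gene symbols might be screened for entries that have been turned into dates. It assumes the damaged entries take a day-month form such as "2-Sep" (the form a symbol like SEPT2 is known to take after conversion); the function name, the pattern, and the sample column are illustrative assumptions, not the method used by the investigators.

```python
import re

# Assumed pattern: a day number, a hyphen, and a three-letter month
# abbreviation, e.g. "2-Sep". Other date formats would need other patterns.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def flag_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


# Hypothetical column from a supplementary spreadsheet.
column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "MYC"]
print(flag_date_like(column))
```

A check like this only catches one family of conversions; it would not detect symbols that a spreadsheet has turned into numbers or other formats.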
Now researchers have discovered sequence errors in some highly cited genetic studies of cancer, a finding reported in a preprint. The authors assessed the supplementary data of hundreds of papers and found that some of the DNA or RNA sequences used in the experiments don't match reference data. While this might be expected of papers from very low-impact journals, or from those lacking integrity, such as paper-mill journals, it is apparently also happening in high-impact journals.
Two journals were examined in this study: Molecular Cancer and Oncogene. Previous work by this group identified errors in sequences reported in Gene and Oncology Reports. In the latest study, the researchers manually screened reagents that were meant to target unmodified human genes or DNA sequences in 334 Molecular Cancer papers published between 2014 and 2020. Errors were identified in 92 of the 334 papers; there were problems with 253 (3.8%) of 6,647 nucleotide sequences, with a median of two erroneous sequences per problematic paper. Some years were worse than others: in 2016, ten percent of Molecular Cancer papers carried these types of errors, compared with 38 percent in 2020.
For Oncogene, the screening focused on papers that mentioned circular RNA or microRNA; there were errors in 50 of 1,165 screened sequences, reported in 21 of the 42 papers analyzed.
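The counts above can be turned into comparable rates with a few lines of arithmetic. The Python sketch below uses only the figures reported for the two journals; the helper name and the dictionary layout are illustrative assumptions.

```python
def rate(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# (affected, screened) counts as reported for each journal.
counts = {
    "Molecular Cancer": {"papers": (92, 334), "sequences": (253, 6647)},
    "Oncogene": {"papers": (21, 42), "sequences": (50, 1165)},
}

for journal, groups in counts.items():
    for unit, (affected, screened) in groups.items():
        print(f"{journal}: {rate(affected, screened)}% of {unit} affected")
```

This works out to roughly 27.5 percent of papers and 3.8 percent of sequences for Molecular Cancer, and 50 percent of papers and 4.3 percent of sequences for Oncogene. The two sets of figures are not directly comparable, because the Oncogene sample was limited to papers mentioning circular RNA or microRNA.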
It's difficult to know the true impact of all of these errors. Some may have no effect on the conclusion of a manuscript while other small changes in genes can have a massive influence on their function. One also cannot know how many of these errors were simple, unintentional mistakes, or whether there were other motives involved. Certainly, the journals should be taking a hard look at this issue.
Study leader and cancer researcher Professor Jennifer Byrne of the University of Sydney noted that about one-third of the problematic Molecular Cancer papers and roughly one-quarter of the error-containing Oncogene papers have already been flagged on the post-publication peer-review site PubPeer, primarily for other issues such as image integrity.
“The editors-in-chief of both journals, and Springer Nature, agree with Professor Byrne that ensuring the integrity of the publication record is of the utmost importance, and we take concerns raised regarding the papers published in our journals very seriously,” commented Chris Graf, research-integrity director at Springer Nature. “We requested details of these concerns, so that we could investigate them and act where appropriate, over a year ago, but they have only just been made available. Now that we do have them, we are able to start a full investigation.” He adds that, “If concerns prove to be well founded, we will take action.”