Thursday, June 3, 2021

Wikipedia's coverage of the lab leak hypothesis (part II): a proposal to curate the news sources



This blog entry continues my comments on Wikipedia's coverage of the lab leak hypothesis. The central point of discord has been that Wikipedia has to avoid edits that push conspiracy theories without going so far as to exclude legitimate pieces of information. Most information on the lab leak hypothesis is notable and worthy of coverage in Wikipedia, but so far this balance has not been reached, and a feud is still ongoing.

Wikipedia publishes only the opinions of reliable authors. The complexity of the molecular biology, virology and epidemiology behind the origin of SARS-CoV-2 therefore requires that sources be chosen carefully. In this post I'll take a deep dive into the main differences between how news outlets and official sources, like the World Health Organization (WHO), present the lab leak hypothesis. This is the second part of a series of blog posts on this topic; if you are interested in reading the previous part, you can do so by clicking here.

Before I start citing news outlets on the lab leak hypothesis, it is worth warning about their potential to be unreliable. Despite occasional blunders, news reporting from well-established news outlets is generally considered reliable for statements of fact in Wikipedia. However, whether a specific news story is reliable for a given fact or statement should be examined on a case-by-case basis, hence this blog post.

Two important considerations help in this assessment. First, most medical news articles fail to discuss the quality of the evidence, and they tend to overemphasize the certainty of any result. Second, a statement that all or most scientists or scholars hold a certain view requires reliable sourcing that directly says that all or most scientists or scholars hold that view. Otherwise, individual opinions should be attributed to particular, named sources.

High-quality reliable sources (RS, in Wikipedia jargon) that cover the lab leak hypothesis include Reuters, AP, PBS, The NY Times, and CNN. In this post I will point out the main points of contention between the information they have published and the facts published in high-quality secondary sources in top scientific journals or, lacking that point of comparison, the opinions of top expert virologists, like Kristian Andersen and Victor Max Corman.

___________________________________________________

1) Let's start with claims from Reuters:

a. Where, when and how SARS-CoV-2 originated is a mystery.  

True. 

The word "mystery" means "anything kept secret or unexplained or unknown". Since news sources usually err by overestimating the certainty of research, in this case Reuters would be erring in the opposite direction: underestimating how certain mainstream virologists are about whether the origin is a mystery or is well understood. The WHO report says "It remains to be determined where SARS-CoV-2 originated". Andersen introduces a nuance: he said it is not surprising that the intermediate host has not been found, because that kind of work "often takes decades". A more precise wording is that we know little about the origin but, given that the pandemic started less than two years ago, we know roughly as much as we should expect to know.

b. The hypothesis that SARS-CoV-2 escaped from a virology laboratory in Wuhan, China, is one of the two prevailing competing theories.  

False.

The WHO report ranked first the hypothesis of introduction through an intermediate host followed by zoonotic transmission. By definition, the two prevailing theories are the ones that have "superior power or influence" over the rest. This means that, at least according to the WHO report, the two prevailing theories are direct zoonotic transmission and introduction through an intermediate host followed by zoonotic transmission. Andersen said that the WHO report shows there are "much more likely competing hypothesis" than the lab leak. He also finds it unfortunate that others "suggest a false equivalence between the lab escape and natural origin scenarios". Reuters got this wrong.

c. Scientists have failed so far to identify any wildlife infected [prior to the initial outbreak] with the same viral lineage of SARS-CoV-2

True.

The WHO report says: "the presence of SARS-CoV-2 has not been detected through sampling and testing of bats or of wildlife across China. More than 80000 wildlife, livestock and poultry samples were collected from 31 provinces in China and no positive result was identified for SARS-CoV-2 antibody or nucleic acid before and after the SARS-CoV-2 outbreak in China. Through extensive testing of animal products in the Huanan market, no evidence of animal infections was found." Victor Corman said that "it is a pity we still lack such data for SARS-CoV-2", referring to the wildlife sampling of camels he undertook during his past research on MERS. He tweeted this on March 2, 2021.

d. The Chinese government has refused to allow the lab-leak scenario to be fully investigated.

True.

AP makes exactly the same statement. PBS said that "... on Tuesday, China rejected once again a call for further investigation". The NY Times said that China had not cooperated fully with the WHO. China has allowed only a partial investigation, not a full one. Although the WHO said in the "Declaration of interest" section of the report that "All declared interest were assessed and found not to interfere with the independence and transparency of the work", the evidence indicates that the Chinese government had a lot of control over the study. For example, the report states that the study was co-headed by Liang Wannian of the People's Republic of China, and that the Government of China was allowed to raise objections to the list of foreign team members. Despite occasional requests to refine the design of studies, the report says that the international team members started mainly with methods and data provided by the Chinese team. Wu Ken, the Chinese ambassador to Germany, said that his government has an open attitude about being investigated, but that they "reject putting China in the dock without evidence, assuming its guilt and then trying to search for evidence through a so-called international investigation".

e. Scientists and others have developed hypotheses based on general concerns about the risks involved in live virus lab research, clues in the virus’ genome, and information from studies by institute researchers. 

Unusable in Wikipedia.

The Reuters article does not name the scientists behind this "development of hypotheses". We started this blog entry with the caution that news articles can be used in Wikipedia as long as their statements about scientists' opinions are qualified by reference to more reliable sources (e.g. "According to a review published in Science, most scientists believe x", "According to a coronavirology textbook published by Elsevier, many scientists believe y"). Failing to appeal to these higher sources, news articles can still be used if they at least attribute individual opinions to particular, named sources. Even if I personally know the names of the scientists who believe there are clues of manipulation in the virus' genome, I can't edit Wikipedia with that information if it was omitted from the Reuters piece. So this is an example of a news piece that fails to present, with the correct balance, the opinions of scientists on the origin of the virus.

f. A May 5 story by Nicholas Wade in the Bulletin of the Atomic Scientists said lab scientists experimenting on a virus sometimes insert a sequence called a “furin cleavage site” into its genome in a manner that makes the virus more infective. David Baltimore, a Nobel Prize-winning virologist quoted in the article, said when he spotted the sequence in the SARS-CoV-2 genome, he felt he had found the smoking gun for the origin of the virus.

Usable in Wikipedia, but seems to be an irrelevant anecdote.

Well, it is true that Nicholas Wade wrote that Baltimore said that. Reuters did a good job of presenting Andersen's commentary right alongside it: "Kristian G. Andersen, a scientist at Scripps Research who has done extensive work on coronaviruses, Ebola and other pathogens transmissible from animals to humans, said similar genomic sequences occur naturally in coronaviruses and are unlikely to be manipulated in the way Baltimore described for experimentation." I wish someone would ask Baltimore to expand on his point.

The WHO report says that furin cleavage sites have been found in animal viruses as well, and that elements of this one are present in RmYN02 and in a Thai bat SARSr-CoV. Andersen referred to the puzzling nature of this genomic insertion: "Furin cleavage sites are common in CoV, even if this is the first example we have seen in a SARSr-CoV. There are insertions in this very spot in other SARSr-CoVs too, so clearly a highly evolvable site." The verdict: although puzzling, it seems possible for a sarbecovirus to acquire a furin cleavage site at the S1/S2 junction through natural recombination, a feature already common in betacoronaviruses outside the sarbecovirus subgenus.

______________________________________________________________________________

Moving on to AP, we have:

a. Arinjay Banerjee, a virologist at the Vaccine and Infectious Disease Organization in Saskatchewan, Canada, was interviewed by AP. His position was presented as follows:

“The great probability is still that this virus came from a wildlife reservoir,” he said, pointing to the fact that spillover events – when viruses jump from animals to humans – are common in nature, and that scientists already know of two similar beta coronaviruses that evolved in bats and caused epidemics when humans were infected, SARS1 and MERS. “The evidence we so far have suggests that this virus came from wildlife,” he said.

Usable in Wikipedia, but we cannot tell whether this is an isolated opinion or representative of the views of many other scientists.

It is true that Banerjee said that. To check the accuracy of his statements, one should keep in mind that news outlets tend to overestimate the certainty of scientific claims. The WHO report says that "The majority of emerging diseases originate from animal reservoirs and there is strong evidence that most of the current human coronaviruses have originated from animals", which supports Banerjee's opinion. However, to add caution to this factual statement, the "Arguments against" section on the animal origin hypothesis in the WHO report notes the puzzling fact that all animals found to be infected with SARS-CoV-2 were infected through contact with humans rather than through enzootic virus circulation. In my opinion, presenting this evidence balances the information better, so that its certainty is not overstated.

b. The case on the origin of SARS-CoV-2 is not completely closed

True.

Considering that news sources tend to overstate how conclusive scientific investigations are, the fact that they report this one as "not completely closed" can be trusted.

The next claim comes from an earlier AP report, published after AP received a draft of the WHO report:

c. Mark Woolhouse, an epidemiologist at the University of Edinburgh said it was possible the source of COVID-19 might never be identified.

Usable in Wikipedia, but we cannot tell whether this is an isolated opinion or representative of the views of many other scientists.

It is true that Woolhouse said that. Now, on to fact-checking the actual matter. The word "source" in this case probably means the animal source of the initial outbreak in Wuhan. Andersen said that finding the intermediate host "often takes decades", so saying that it may never be identified is not far-fetched. The WHO report set out a Phase 2 investigation to help trace the origin of SARS-CoV-2, including an analysis of the trade in animals and interviews with the farmers who supplied wildlife to the Huanan market. Woolhouse seems pessimistic about the outcome of these continued efforts to find the source, but his is a reliable position, given the propensity of news sources to overstate the certainty of results in scientific research.

_____________________________________________________

Let's move on to analyze PBS reporting.

a. Lab leak theories are held with low to medium confidence within the intelligence community.

True.

In my opinion, when dealing with secret intelligence sources, it is best to cite them in Wikipedia only after their investigations conclude with a final public report or their material is declassified; until then they are of limited use.

b. Republicans are pressuring the government to push for an investigation of the lab leak, threatening to call it weak on China.

True.

We can trust PBS on this; it is neutral when reporting on politics.

_______________________________________________________

Now let's move on to the NY Times:

a. The two most vocal poles of the argument are natural spillover vs. laboratory leak

True.

"Vocal" means "willing to express oneself in words, esp. in many words". We can trust the NY Times in describing lab leak proponents as willing to express their view in many words, even more so than, say, proponents of the frozen-food hypothesis. Nonetheless, this factual claim probably applies only to a Western or US audience.

b. Akiko Iwasaki, an immunologist at Yale University, said that "In the beginning, there was a lot of pressure against speaking up [about the uncertainties surrounding the origins of the virus], because it was tied to conspiracies"

Usable in Wikipedia, but seems to be an irrelevant anecdote.

It is true that Iwasaki said that. However, when trying to fact-check the existence of these pressures, I found nothing about them either in the WHO report or in the experts' tweets (Andersen and Corman). Until it is repeated by other sources, this remains an irrelevant anecdote, in my opinion.

c. There is no direct evidence for the “lab leak” theory that Chinese researchers isolated the virus, which then infected a lab worker.

True.

The WHO report says "There is no record of viruses closely related to SARS-CoV-2 in any laboratory before December 2019, or genomes, that in combination could provide a SARS-CoV-2 genome". It also says that the three laboratories in Wuhan had high-quality biosafety level (BSL-3 or BSL-4) facilities. Andersen replaces the "direct" qualifier with "scientific" and says "To this day, we have yet to see any scientific evidence supporting a lab leak".

d. Researchers have been able to reconstruct some of the evolutionary steps by which SARS-CoV-2 evolved into a potential human pathogen while it was still infecting animals. 

True.

According to the WHO report, one of these reconstructed steps was the finding that RaTG13 has 96.2% genetic similarity with SARS-CoV-2. However, the report qualifies that by saying "Although SARS-CoV-2 is closely related to RaTG13, only one of the six critical amino acids sites [in the RBD of the S protein] is identical between the two viruses." A second step was the finding that pangolin viruses have some of the parts needed to complete the evolution, but the WHO summarizes the results from this line of research as inconclusive: "Although some researchers thought these observations [similar amino acids to the RBDs of pangolins] served as evidence that SARS-CoV-2 may have originated in the recombination of a virus similar to pangolin-CoV with one similar to RaTG13, others argued that the identical functional sites in SARS-CoV-2 and pangolin-CoV-GDC may actually result from coincidental convergent evolution". Andersen summarized the advances in reconstructing the evolutionary history of SARS-CoV-2 in this tweet: "The 'natural' version of this actually has a lot of evidence to it by now - we continue to see more and more of the pieces that make up the puzzle of SARS-CoV-2's evolutionary origin. The problem is - it's a big puzzle." If the puzzle is big and the main reconstructed steps have not been conclusively determined, we should be cautious about saying that a lot of progress has been made on this front, in my opinion.

______________________________________________________

Finally, let's break down the information from CNN:

a. Three huge coincidences foster the lab leak theory (the proximity of the lab, speculation about workers who fell sick around the time of the outbreak, and the Wuhan CDC's move in early December 2019)

True.

The key here is that CNN is editorializing: it is not saying that the coincidences hugely foster the lab leak theory (notice the word order), but that they are huge coincidences and that they foster it. "Huge" is editorializing and can be used in Wikipedia with attribution.

The WHO report mentions the proximity of the lab and the move of the Wuhan CDC in a section called "Arguments in favor" of the lab leak hypothesis. The report is rather sparse about these arguments and does not assess their weight.

b. It is likely that China is hiding something related to hospital samples from 2019 or similar type of evidence

True.

Again, this is CNN editorializing, so this statement can only be used in Wikipedia with attribution. Neither the WHO report nor Andersen has said anything about China hiding anything. CNN expands on this by saying that the WHO team "admits they would like access to more material".

c. The WHO investigators share the conclusion of most specialists in this field: that the disease most likely came from bats, via another species, known as an "intermediary animal," and then infected humans.

True.

The WHO report says "likely to very likely", though, which is not exactly the same wording.

d. The vast preponderance of scientific research on the subject to date points to the virus' transfer, or spillover, in nature.

Unusable in Wikipedia.

When a news source characterizes the extent of agreement in the scientific community, the statement can only be used if a) the article cites a stronger source or b) the statement is attributed to a particular scientist who represents that position. Neither is present here.

In conclusion, how to present the lab leak hypothesis has been a hotly debated topic in Wikipedia. On one hand, some editors believe that only scientific papers vetted by secondary sources are reliable sources of information on the origin of SARS-CoV-2. On the other hand, other editors have pushed to open the door to using news sources in the SARS-CoV-2-related pages of Wikipedia. In this blog post I have rigorously presented a middle ground, in which news sources are curated by examining their main flaws, leaving only a subset of their claims as true and usable in Wikipedia.
