It has long been speculated that the actual mortality due to the COVID pandemic is much higher than official figures, not only in India but in several other countries. New York Governor Andrew Cuomo has been caught repeatedly fudging mortality data. The UK government too was accused of under-reporting COVID deaths during the first wave in 2020. A study in Brazil suggested that COVID deaths there were under-reported by approximately 40%. Several articles have speculated that India is massively under-reporting COVID-related deaths, though none of these are based on any hard data. The latest such study comes from two researchers at Harvard, and the noted economist and former advisor to the Government of India Arvind Subramanian has also lent his name to it. The study claims that India has under-reported its COVID deaths by as much as fifteen times, and that the actual toll may be as high as six million rather than 4.19 lakh. This extraordinary claim has been widely reported across national and international media over the past few days.
It is certainly possible that deaths have been recorded without being attributed to COVID, and there are several valid reasons for this. Many patients died before being tested, or tested negative despite clinical signs and symptoms suggesting COVID. Such deaths are generally recorded as SARI (severe acute respiratory illness) or pneumonia, without a definite mention of COVID, in view of a negative test or the lack of a confirmed diagnosis. Researchers therefore try to calculate the "excess deaths" during the year compared with previous years. Some degree of under-reporting is inevitable for the reasons mentioned above, but a claim of under-reporting to the tune of ten to fifteen times needs to be based on at least some concrete data, and it would require collusion at several levels, from treating doctors to the highest levels of state and central government. As a practising doctor who has worked on the COVID frontlines for over a year, and knowing that health is a state subject with several different parties ruling various states, I can vouch for the impossibility of such a grand conspiracy.
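To make the idea of "excess deaths" concrete, here is a minimal sketch. It assumes the simplest possible baseline, the plain average of recorded deaths in earlier years; the function names and all the numbers are hypothetical and purely illustrative. Real studies use more elaborate baselines that adjust for trends and population growth.

```python
from statistics import mean


def excess_deaths(observed: float, baseline_years: list[float]) -> float:
    """Deaths in the year of interest minus the average of earlier years.

    Assumption: the baseline is a plain mean of prior years, with no
    adjustment for trends, population growth or changes in reporting.
    """
    return observed - mean(baseline_years)


def p_score(observed: float, baseline_years: list[float]) -> float:
    """Excess deaths expressed as a percentage of the baseline."""
    baseline = mean(baseline_years)
    return 100 * (observed - baseline) / baseline


# Hypothetical figures (thousands of deaths), not real data.
previous_years = [100.0, 102.0, 98.0, 100.0]
this_year = 130.0
print(excess_deaths(this_year, previous_years))  # 30.0
print(p_score(this_year, previous_years))        # 30.0
```

The result depends entirely on the baseline: if the recording of deaths improves over the baseline years, the baseline is understated and the "excess" is overstated.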
Any study or research paper that makes extraordinary claims must be examined in detail. I was curious to see how exactly the Harvard team and Arvind Subramanian arrived at their conclusions, and read the paper in its entirety. For any scientific paper to be taken seriously, it must:
- Not make pre-suppositions or special pleadings
- Be based on good-quality data
- Have a robust methodology
Unfortunately, the paper fails on each of these points. It has several pre-suppositions and special pleadings, is based on bad data, and has questionable methodology.
Pre-suppositions and special pleadings
The paper states upfront: "India's official Covid death count as of end-June 2021 is 400,000. The reality is, of course, catastrophically worse." (emphasis added). This is clearly a pre-supposition. A genuine researcher never pre-supposes an outcome, as doing so biases the study. The researchers suspect a higher death toll because India has recorded 0.3 deaths per thousand people, compared with over 3 for Mexico and Peru, and around 2 for Brazil, Italy, the US and the UK. The researchers claim, wrongly, that infection rates in these countries are lower than in India. As a percentage of population, Brazil, the US and the UK have had far more confirmed COVID cases than India.
Also, India's COVID mortality rate is low, but not extraordinarily low. In a list of 200 countries, India ranks 124th by mortality rate, with several first-world and third-world countries recording lower mortality. The assumption that India is under-reporting because Brazil, Italy, the US and the UK have higher mortality rates is deeply flawed, and is a case of "special pleading" not backed by any proof or data. India has a much younger population than many of these countries, and it is perfectly logical that it would have lower mortality than them, as COVID mortality is highest in the 60-plus age group, which makes up only 12% of India's population.
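The age argument can be illustrated with a minimal sketch: a country's overall (crude) death rate is a population-weighted average of its age-specific death rates, so two countries with identical age-specific rates can report very different overall rates. The two-group split and the death rates below are hypothetical placeholders chosen only to show the effect; only the 12% figure comes from the text above.

```python
def crude_death_rate(age_shares: dict[str, float],
                     age_rates: dict[str, float]) -> float:
    """Population-weighted average of age-specific death rates."""
    return sum(age_shares[group] * age_rates[group] for group in age_shares)


# Hypothetical age-specific COVID death rates per 1,000 people (not real data).
rates = {"under_60": 0.1, "60_plus": 2.0}

younger = {"under_60": 0.88, "60_plus": 0.12}  # 12% aged 60+, as cited for India
older = {"under_60": 0.75, "60_plus": 0.25}    # hypothetical older country

print(round(crude_death_rate(younger, rates), 3))  # 0.328
print(round(crude_death_rate(older, rates), 3))    # 0.575
```

With identical age-specific rates, the older population shows a crude rate roughly 1.75 times higher, purely because of its age structure.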
Data Sources
The researchers rely on informal data sources in their paper. Primarily, these are the sero-prevalence studies conducted by AIIMS and WHO, and the Consumer Pyramids Household Survey (CPHS) produced by the Centre for Monitoring Indian Economy (CMIE), in which one of the questions put to respondents is whether any family member has passed away in the previous six months. Neither data source is suitable for making such calculations.
Extrapolation from CPHS and SRS Data Sets
The researchers have relied in large part on the CPHS data set for drawing their conclusions. By their own admission, the CPHS data set is deeply problematic, as it shows an unusually high spike in mortality in 2019 compared with the 2015-18 period, even though the pandemic did not exist in 2019. This does not deter the researchers from continuing to rely on the data set; instead, they call the 2019 findings problematic. Noted economist Ms Shamika Ravi, Senior Fellow, Brookings Institution, explains, partly quoting the researchers themselves, why the CPHS data is not a suitable or reliable source for determining actual mortality. In another article, Ms Shamika Ravi points out that, owing to better information-gathering systems implemented by the Central Government over the past five years, the government's Sample Registration System (SRS) data for 2019 shows a significantly higher number of deaths than the 2015-18 average. As of 2019, an estimated 92% of deaths were being recorded by the SRS, compared with around 70% earlier, and this share is expected to rise further. Any interpretation based on extrapolation of the 2019 data would show over a million excess deaths relative to 2015-18, at a time when there was no COVID pandemic. Hence, comparing SRS data with that of previous years is very unreliable. The researchers have acknowledged the severe problems with the data they used but have still gone ahead and made sensational claims based on it, which is very unfortunate.
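The point about improved recording can be shown with simple arithmetic. The sketch below assumes, hypothetically, that the true number of deaths stays exactly constant while the share of deaths that gets recorded rises from about 70% to about 92%, the two figures cited above. The function names and the one-million figure are illustrative only.

```python
def recorded_deaths(true_deaths: float, completeness: float) -> float:
    """Deaths that appear in the records when only a share of them is registered."""
    return true_deaths * completeness


def spurious_excess(true_deaths: float, old_completeness: float,
                    new_completeness: float) -> float:
    """Apparent rise in recorded deaths caused only by better registration."""
    return (recorded_deaths(true_deaths, new_completeness)
            - recorded_deaths(true_deaths, old_completeness))


# Hypothetical: the true number of deaths is held constant at 1,000,000.
true_deaths = 1_000_000
extra = spurious_excess(true_deaths, 0.70, 0.92)
print(f"apparent excess: {extra:,.0f}")  # apparent excess: 220,000
print(f"apparent rise: {extra / recorded_deaths(true_deaths, 0.70):.1%}")  # apparent rise: 31.4%
```

Under these assumptions, recorded deaths rise by about 31% with no change at all in actual deaths, which is why a naive comparison of 2019 records against the 2015-18 average produces a spurious "excess".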
Extrapolation from antibody data in serosurveys
The researchers have taken the midpoint of the first and second waves, correlated the numbers with the WHO-AIIMS serosurvey of the same period, and extrapolated them to COVID deaths by applying the US CDC's infection fatality rate (IFR) to the serosurvey data. This is a very questionable methodology, to say the least. It is well known that the subclinical spread of SARS-CoV-2 is far wider than the number of diagnosed cases. For example, approximately 12% of the UK population tested positive for COVID antibodies in December 2020. In absolute numbers, this suggests that 79 lakh people had been exposed to SARS-CoV-2 in the UK at a time when the total number of confirmed cases was approximately 17 lakh. A mortality figure calculated on the basis of 79 lakh would be more than four and a half times the actual mortality declared by the UK government at that time. Consider Mumbai, which has a seropositivity of approximately 60%. If we extrapolate this to the population of Mumbai, we would get a mortality figure running into several lakhs, against the actual figure of around 12,000. This shows why it is foolhardy to draw conclusions by applying an IFR to extrapolated serosurvey data. Such a calculation is bound to produce highly inflated figures that are far removed from ground realities.
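The UK comparison above rests on simple proportional arithmetic: whatever fatality rate is used, applying it to the serosurvey-implied number of infections instead of the confirmed cases scales the resulting death count by the ratio of the two. A minimal sketch, in which the UK population of about 66 million is an assumption and the fatality rate is a hypothetical placeholder:

```python
def implied_infections(population: float, seroprevalence: float) -> float:
    """Number of people a serosurvey implies have been infected."""
    return population * seroprevalence


def implied_deaths(infections: float, fatality_rate: float) -> float:
    """Deaths obtained by applying a fixed fatality rate to a head count."""
    return infections * fatality_rate


UK_POPULATION = 66_000_000   # assumption: approximate UK population
CONFIRMED_CASES = 1_700_000  # ~17 lakh confirmed cases, December 2020
FATALITY_RATE = 0.03         # hypothetical placeholder; the ratio does not depend on it

infected = implied_infections(UK_POPULATION, 0.12)  # ~79 lakh
ratio = (implied_deaths(infected, FATALITY_RATE)
         / implied_deaths(CONFIRMED_CASES, FATALITY_RATE))
print(f"implied infections: {infected:,.0f}")  # implied infections: 7,920,000
print(f"inflation factor: {ratio:.2f}")        # inflation factor: 4.66
```

Because the fatality rate appears in both the numerator and the denominator, it cancels out: the inflation factor is simply implied infections divided by confirmed cases.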
Standard Error in calculating excess mortality
For their forecasting, the researchers have used different standard errors for each age group. In a normal (bell-curve) distribution, about 99.7% of values lie within three standard deviations of the mean. The standard error plays the corresponding role for an estimate: it indicates how far the sample estimate is likely to be from the true population value. In the paper's data table, the standard errors for the different population groups are shown in brackets.
There are severe problems with this table. It uses the deeply flawed CPHS data, applies the US CDC's IFR to it, and extrapolates from serosurvey results to calculate outputs that are literally all over the place. The standard errors across age groups range from 0.21 for the urban baseline to as high as 53.97 for the over-80 population during the second wave. A large standard error means the estimate is imprecise and may not reflect the population being studied, and these are astronomically high numbers. In all the higher-risk groups, the standard error ranges from 4 to almost 54, clearly showing that the displayed results do not accurately represent the sample populations. Conveniently, the fact that the bracketed numbers are standard errors is tucked away inconspicuously in the text of the paper. It is very hard to take these results seriously, as they are not representative of the samples being studied, which are themselves based on bad data. One is forced to wonder about the motives for torturing data so brutally to make it conform to the presuppositions of the paper.
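The two statistical ideas in this section can be sketched in a few lines: the 99.7% coverage of three standard deviations under a normal curve, and how the standard error sets the width of an approximate 95% confidence interval (estimate ± 1.96 × SE). The point estimate of 100 is a hypothetical value, not a figure from the paper; only the two standard errors are taken from the table quoted above.

```python
import math


def normal_coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def confidence_interval(estimate: float, standard_error: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval (z = 1.96) around an estimate."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


print(f"{normal_coverage(3):.4f}")  # 0.9973

# Hypothetical point estimate of 100, paired with the smallest and largest
# standard errors quoted from the paper's table.
for se in (0.21, 53.97):
    low, high = confidence_interval(100.0, se)
    print(f"SE {se}: {low:.1f} to {high:.1f}")
```

For that hypothetical estimate, the larger standard error yields an interval running from below zero to more than double the estimate, which is what makes such an estimate hard to rely on.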
Actual Under-reporting of Mortality
Some degree of under-reporting of COVID-related mortality has indeed happened, and several states have added previously undeclared deaths to their mortality data. A prominent media house has claimed credit for forcing state governments, through its ground reportage, to declare their mortality numbers honestly, but several states that updated their mortality data were not "exposed" by this media house. Kerala failed to declare 6,000 confirmed COVID deaths, against an official toll of approximately 15,000, yet Subramanian et al still praise the state in their paper for its low mortality owing to better public healthcare infrastructure. Maharashtra, despite being lauded by many for allegedly reporting numbers honestly, added nearly 13,000 previously undeclared deaths to its tally last month, about 10% of its total death toll of 1.31 lakh. Madhya Pradesh also added 1,478 deaths to its tally, about 14% of its overall toll of 10,512 as of 22nd July 2021. Gujarat has been accused of massively under-reporting COVID deaths because the state appointed an audit team that certified every confirmed death only after conducting an inquiry, but West Bengal, which followed exactly the same procedure, did not face any such accusation. Clearly, the discussion on a very serious topic has been intensely politicised, and it is unfortunate that the researchers are, knowingly or unknowingly, contributing to further politicising what should have been a completely apolitical discussion.
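The proportions quoted above are simple ratios of newly added deaths to each state's total toll. A short sketch, assuming (as the text implies) that the totals already include the added deaths; the function name is illustrative:

```python
def share_of_toll(added: int, total: int) -> float:
    """Fraction of a state's total reported toll made up of newly added deaths."""
    return added / total


# Figures as quoted in the text: (deaths added, total toll after revision).
revisions = {
    "Maharashtra": (13_000, 131_000),
    "Madhya Pradesh": (1_478, 10_512),
}

for state, (added, total) in revisions.items():
    print(f"{state}: {share_of_toll(added, total):.1%}")
# Maharashtra: 9.9%
# Madhya Pradesh: 14.1%
```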
Concluding Remarks
India does have a historic problem with reporting mortality, owing to the myriad data collection agencies in play. However, the Central Government has made substantial efforts to reduce the gap between reported and actual mortality and has brought it down to under 10%. The exact COVID mortality will never be known, for the reasons outlined above; at best we will have an approximation. However, any claim of a difference of six to ten times, of millions instead of lakhs, must be made on the back of reliable data and robust methodology. The Arvind Subramanian – Harvard paper, by using a flawed data set and extremely questionable methodology to draw its conclusions, has neither, and hence it "fails the smell test" of objectivity in scientific research.