Why India needs another statistical revolution
The Indian statistical system is failing to fulfil the needs of 21st century policymaking.
(From left) V.K.R.V. Rao, D.R. Gadgil and P.C. Mahalanobis—members of the first National Income Committee set up in 1949. The edifice built by Mahalanobis was designed for a command-and-control era
The Indian state has often been a source of bewilderment to observers and analysts, and a source of frustration for its citizens who marvel at the state’s ability to perform complex tasks—such as running a successful space mission—while failing to perform many basic tasks such as ensuring the survival of newborns. Perhaps no other organ of the state embodies this paradox as starkly today as the Indian statistical system.
In many ways, the Indian state is perhaps much more data-rich than it has ever been before, with detailed biometric data about an overwhelming majority of its citizens available through the Aadhaar system. It will soon have detailed records of company-level commercial transactions through the Goods and Services Tax Network (GSTN). Yet, in many other ways, its statistical systems are failing to fulfil the needs of 21st century policymaking, with even basic data on industrial output and gross domestic product, or GDP, being routinely questioned by policymakers and analysts.
The state’s ability to amass and collate administrative data seems to grow by the day. Most big schemes of the government, such as the Mahatma Gandhi National Rural Employment Guarantee Scheme (MGNREGS) and the Jan Dhan Yojana, today collect and disseminate detailed disaggregated data. Yet, the once-famed statistical bodies, the Central Statistical Organization (CSO) and the National Sample Survey Office (NSSO), seem to struggle to produce timely and credible data on crucial aspects of the Indian economy and society, a failure that imperils policymaking and has made life difficult for investors.
What explains this paradox?
To understand the paradox, we first need to turn back a few pages of history.
The history of statistics in India is in some ways a history of the Indian nation-building project itself. Although collection of economic statistics in the country is recorded as far back as Kautilya’s Arthashastra itself, modern statistics began life in the country only during the British Raj. India’s colonial rulers were interested in collecting statistics relating to taxes, trade and incomes, not only to govern and control their subjects, but also to provide themselves an account of how beneficial their reign was. But this provided an opportunity to nationalists such as Dadabhai Naoroji to challenge their claims.
The thrust of Naoroji’s argument was that the British Raj was draining India’s resources and impoverishing Indians. As T.N. Srinivasan, professor emeritus of economics at Yale University, pointed out in a 2007 Economic and Political Weekly (EPW) article, Naoroji’s attempt to show that India’s gross output was hardly sufficient to provide subsistence to its population led him to calculate a subsistence-based “poverty line” at 1867-68 prices, though Naoroji did not use that phrase himself.
Based on the subsistence diet prescribed by the British government medical inspector for the “emigrant coolies”, Naoroji came up with a poverty line varying from Rs16 to Rs35 per capita per year in various regions of India. Naoroji’s “drain theory” as well as his estimates of poverty came to be used by nationalists in India to attack the British Raj. A similar argument was developed in a much more comprehensive manner by Bombay University economists K.T. Shah and K.J. Khambata in their 1924 work, Wealth and Taxable Capacity in India. Some years later, a young Cambridge University economist from India, V.K.R.V. Rao, would provide a more rigorous estimate of national income in India.
The first statistical revolution
While these estimates helped stir debate and discussion on India’s economic backwardness, it was clear that even the best of these estimates were tentative and built on a weak statistical edifice.
Indian leaders such as Jawaharlal Nehru and B.R. Ambedkar, as well as corporate giants such as the Tatas and the Birlas, all agreed that the country needed centralized economic planning. But the absence of sound data emerged as a serious impediment to planning.
One of the first tasks of newly independent India was thus to reassess the size and nature of the Indian economy. Rao was an obvious candidate who could help in this task. So was economist D.R. Gadgil, who had begun conducting socio-economic surveys at the Gokhale Institute of Politics and Economics in Pune (then Poona).
But the man chosen to head the mission was statistical genius P.C. Mahalanobis. A physicist by training, Mahalanobis had already won global repute, and garnered attention from the statistical community, thanks to the pioneering large-scale surveys being conducted at the Indian Statistical Institute (ISI), which he had set up at Kolkata’s Presidency College. All three of them joined hands to produce a voluminous report on India’s national income. Throughout, the authors were careful to point out data gaps and limitations of their estimates, as well as the error margins associated with each sectoral estimate.
All three also played pivotal roles in the establishment of a new statistical edifice in the country, which would help circumvent the problems they had encountered, and in a few years, become the envy of the world. Of the three, it was Mahalanobis who played the leading role in creating a new statistical edifice for India. Mahalanobis had a grand vision of statistics in the newly independent country, which he outlined in an oft-quoted speech, Why Statistics?, at the 1950 session of the Indian Science Congress, and which he set about to achieve with the highest level of political support.
Although remembered today largely as the architect of India’s five-year plan model, Mahalanobis, as the honorary statistical adviser to the cabinet, made a greater contribution in building a new statistical architecture for the country. He helped establish CSO, the National Sample Survey (NSS) and the Annual Survey of Industries, all of which were run from ISI in the early years. While he helped set up CSO, the thrust of his activities was directed towards establishing the nascent surveys, which were still being viewed with suspicion by many.
To establish the credibility of these surveys, he invited some of the pioneers of statistics to review the work done at ISI. The first review committee of NSS included such intellectual giants as R.A. Fisher, M.H. Hansen, T. Kitagawa, A. Linder and F. Yates. The committee's opinion was not entirely uncritical, but it noted in its report that in the matter of sample surveys, “those outside India must expect to have more to learn than to teach”.
What perhaps helped Mahalanobis the most in achieving his vision was the trust placed in him by Nehru. In his 1998 book, The Idea of India, political scientist Sunil Khilnani argues that in courting intellectuals such as Mahalanobis, Nehru’s aim was to “subordinate the civil servants to the superior rationality of scientists and economists”.
Given the paucity of administrative data, and the possibility of biases creeping in, the strategy Mahalanobis envisaged in his notes to the Nehru cabinet focused on creating credible data sets based on representative sample surveys, economist Ashok Rudra writes in his biography of Mahalanobis.
In doing so, Mahalanobis might have contributed to the weakening of the Indian administrative system’s statistical capabilities, wrote S.M. Vidwans, in a three-part series on India’s statistical system published in three successive editions of EPW in 2002. Vidwans, a former head of the directorate of economics and statistics of Maharashtra, argued that the process of centralization of statistical systems began in the 1950s itself, and in the years to follow, this emerged as a key weakness of the Indian statistical system.
The edifice begins to crumble
The Mahalanobis model of data collection did create new statistical institutions, which inspired similar institutions all over the world, but it did so at the cost of developing statistical expertise within the administrative system. This problem has worsened since the 1980s, according to Vidwans. Gradually, many of the data collection processes were phased out at the administrative level, and the centralized agencies acquired sole control over vast swathes of the Indian statistical system. Instead of being used sparingly for purposes where there were no alternatives to sampling, sampling became the first choice of technique for collecting data.
As long as statisticians were in charge of the statistical system, changes to the system were at least based on technical requirements. But this changed in the post-Mahalanobis era. Through a process of “creeping change”, administrators took over the responsibility of the Indian statistical system from statisticians, pointed out Vidwans. The composition and structure of the governing council of NSSO were sought to be changed to accommodate more insiders at the cost of having a wide cross-section of data users in the council, before the council was eventually “dissolved” in the mid-2000s. Even CSO was weakened as bureaucrats of the ministry of statistics and programme implementation (Mospi) sought to undermine the role of its director general and neglected institutional mechanisms that had earlier allowed CSO to coordinate with other government bodies and ministries, argued Vidwans. And there was no one of the stature of Mahalanobis who could stem the tide.
Over the past two decades, multilateral agencies, such as the World Bank and the International Monetary Fund, began demanding new statistical inputs from Indian statisticians to fulfil their global mandate, which imposed new responsibilities on the system. Given the gradual decay of statistical systems within administrative departments, the bulk of the responsibility for data collection fell on NSSO. While NSSO was expanding its role and beginning to conduct newer surveys to meet the new data dissemination norms, it faced cuts in budgets that made it difficult to fill regular posts, wrote Sheila Bhalla in a 2014 EPW article. After the liberalization of the economy, most ministries including Mospi found it difficult to expand their workforce as a wave of staff-strength rationalization and contractualization swept through government departments. A new centralized recruitment strategy made it difficult to find enumerators who were proficient in local languages, affecting survey work.
The edifice built by Mahalanobis was designed for a command-and-control era when it was easy for the state to demand and receive information from companies. But the opening up of the economy showed that the edifice was ill-suited to a market economy, in which the state no longer controlled all aspects of production and trade.
An opportunity to reform the statistical system arose when the National Statistical Commission (NSC) was instituted to suggest changes to improve its functioning. NSC, headed by C. Rangarajan, made several important recommendations in its report in 2001 to reform the structure of the statistical system, and to improve data collection methods. But as Srinivasan pointed out in a sharply worded critique, “its failure to offer any methods for judging the adequacy, timeliness and accuracy of statistical data and to undertake cost-benefit analyses of its concomitant recommendations undermines the utility of its work”.
“Without such information, how can the government decide how to apportion its scarce resources among competing priorities?” he asked.
Srinivasan’s concerns turned out to be prescient as successive governments have ignored most of the substantive recommendations of NSC relating to improvements in the statistical system’s data collection capacity. Apart from providing for staff incentives and the setting up of a national statistical regulatory agency, little else has been attempted.
An uncertain future
Even though changes in regulations, new survey initiatives and increasing digitization have provided the Indian state with far more information than ever before, there is no evidence that such data is being collected and processed efficiently. One glaring example is the use of the MCA-21 database for GDP estimation. Statisticians seem to have decided to use the database first and ask questions later, and have failed to release the detailed data or the summary tables till date. A similar problem seems to plague the Socio-Economic Caste Census (SECC). This census was potentially a far more important initiative than Aadhaar, as it was initiated to precisely identify those who could be targeted for welfare schemes. But as several economists have pointed out, the SECC data suffers from serious flaws. While it is a step forward compared to the past, it is also an opportunity lost. Instead of examining the processes that led to the collection and processing of the SECC data, the government first announced that it was going to use the database, and then set up a committee, headed not by a statistician but by an ex-bureaucrat, to examine the data.
The digital age raises newer challenges. The questions of which entity will store data, in what form and with what protections, and of how and when such data will be collected, used, shared or disseminated, have become more important than ever before.
The need for coordinating the activities of different data-collecting entities, and for laying down norms with foresight has never been as great as it is today.
But can we expect our beleaguered statistical system to rise to the challenge? Can we expect it to show foresight, and burnish its credibility by inviting outside experts rather than insiders to review its work?
The answers to these questions will determine the kind of data society and economy we will face in the coming years.