AGU RESEARCH

Columns that reveal the world
- Getting up close and personal with the researchers -

The world we live in is full of problems,
from issues close to home to issues that affect all of humanity.
Our faculty members share their distinctive insights
to reveal the current situation and the truths that remain surprisingly little known.

  • College of Economics, Department of Economics
  • The statistical literacy required as statistics and data science spread rapidly through modern society
  • Associate Professor Tamae Kawasaki

New talent required as usage scenarios expand

My fields of expertise, mathematical statistics and multivariate analysis, are also used in data science, an academic field that currently attracts great public interest. Overseas, data scientists have drawn attention for about ten years, to the point that the profession has been called "the sexiest job of the 21st century" (*). In Japan, too, the number of people who want to use data analysis in business strategy and similar areas seems to have grown rapidly, especially in recent years: many people from companies now attend statistics-related academic conferences. Behind this are probably the growing number of business books, such as "statistics is the most powerful science," and of introductory books that general readers can easily pick up, as well as the fact that languages suited to data analysis, such as Python and R, now have a wide range of applications and run easily on ordinary PCs.

 

There is no doubt that statistics is a very powerful tool for business and the public. However, it can be very dangerous unless the data are handled, and the results interpreted, with the utmost care. Analyzed correctly, data can reveal perspectives that were not visible before; processed or analyzed with an inappropriate method, they can yield incorrect results. Avoiding this requires a correct understanding of each method and the ability to judge whether the data satisfy that method's assumptions. If you handle statistics carelessly, with insufficient understanding, you may stray from the conclusion that should have been drawn and take the wrong path.

This is an academic field that currently attracts a great deal of attention from an economic perspective, but it also carries the risks described above. It is therefore important to develop people who combine an economic perspective with the necessary mathematical knowledge and thinking, and who can carry out appropriate analyses. Data scientists who know both mathematical statistics and economics, and can link the two at a high level, are undoubtedly the kind of people society will need. As university faculty, we have a responsibility to develop such people while also advancing the discipline. I am currently focusing on research into "missing values" in statistics. Through this research, which is essential to the further development of statistics and data science, I hope to establish new analysis methods and to nurture many young people.

* Thomas H. Davenport and DJ Patil, “Data Scientist: The Sexiest Job of the 21st Century,” Harvard Business Review, October 2012.

Techniques for handling missing values have become increasingly important in recent years

Data science also uses "multivariate analysis," a family of statistical methods that analyze and interpret multiple variables simultaneously. For example, a survey on people's health collects not only physical condition but also items such as age, sex, height, weight, and eyesight. Each item is treated as a variable, and if you analyze only "height" without looking at the others, it is difficult to learn what you intended about people's health. Multivariate analysis examines the variables jointly and can reveal, for example, which tendencies are found in which types of people.
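As a minimal sketch of why a single variable can be uninformative on its own, the following Python example simulates a purely hypothetical health survey and compares a model that uses height alone with one that uses age, height, and weight together. The function names, the coefficients, and the "health score" itself are assumptions made for illustration and do not come from any real survey; ordinary least squares stands in here for the broader family of multivariate methods.

```python
import numpy as np


def simulate_health_survey(n=5000, seed=0):
    """Generate a hypothetical survey; every coefficient is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 70, size=n)                          # years
    height = rng.normal(165, 8, size=n)                        # cm
    weight = 0.9 * (height - 100) + rng.normal(0, 8, size=n)   # kg, tied to height
    # Assumed health score: falls with age and weight, rises with height.
    score = 120 - 0.5 * age - 0.6 * weight + 0.3 * height + rng.normal(0, 3, size=n)
    return age, height, weight, score


def r_squared(predictors, y):
    """Fit least squares with an intercept and return the share of variance explained."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return 1.0 - residual.var() / y.var()


age, height, weight, score = simulate_health_survey()
r2_height_only = r_squared(height, score)
r2_all = r_squared(np.column_stack([age, height, weight]), score)
print(f"R^2 using height only:         {r2_height_only:.2f}")
print(f"R^2 using age, height, weight: {r2_all:.2f}")
```

In this synthetic setting, height alone explains only a small share of the variation in the score, while the three variables taken together explain most of it.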

However, while it would be ideal if all the data could be collected smoothly, some data will be lost, for example when a respondent forgets to answer a question. Such gaps are called "missing values." Roderick J. A. Little and Donald B. Rubin proposed the following three main mechanisms by which missing values occur:

  • MCAR (Missing Completely At Random)
  • MAR (Missing At Random)
  • MNAR (Missing Not At Random)

The first, "MCAR," is the case where, for example, a respondent to a questionnaire skips a question through an oversight. Here the probability that a value is missing is completely random: it depends neither on the variable in question nor on any other variable. The second, "MAR," is the case where values are not missing completely at random, but the missingness depends on other variables and can be explained by them. For example, suppose that in income data the probability of not answering the income question rises with age, and that the data include an age variable. Missing income values can then be explained as occurring at random once age is conditioned on. The third, "MNAR," is the case where, in the same income example, the missingness is not explained by age and people with higher incomes are more likely to leave the income question unanswered. In other words, the data contain no variable that can explain why income is missing.
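The three mechanisms can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not a method from the research described here: the relationship between age and income, the missingness probabilities, and the helper names `simulate_missingness` and `observed_mean` are all hypothetical. It deletes income values under each mechanism and compares the mean of the reported incomes with the true mean.

```python
import numpy as np


def simulate_missingness(n=100_000, seed=0):
    """Simulate age and income, then mark income as missing under three assumed mechanisms."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 80, size=n)
    # Assumed relationship: income (arbitrary units) rises with age, plus noise.
    income = 200 + 5 * age + rng.normal(0, 100, size=n)

    # MCAR: every respondent skips the question with the same probability.
    p_mcar = np.full(n, 0.3)
    # MAR: the probability of skipping rises with age, which is always observed.
    p_mar = 0.1 + 0.5 * (age - 20) / 60
    # MNAR: the probability of skipping rises with the income value itself.
    z = (income - income.mean()) / income.std()
    p_mnar = 1.0 / (1.0 + np.exp(-z))

    masks = {}
    for name, p in [("MCAR", p_mcar), ("MAR", p_mar), ("MNAR", p_mnar)]:
        masks[name] = rng.random(n) < p  # True where income is missing
    return age, income, masks


def observed_mean(income, missing):
    """Mean of the income values that were actually reported."""
    return income[~missing].mean()


age, income, masks = simulate_missingness()
print(f"true mean income: {income.mean():.1f}")
for name, missing in masks.items():
    print(f"{name}: observed mean = {observed_mean(income, missing):.1f}, "
          f"missing rate = {missing.mean():.2f}")
```

Under these assumptions, the mean of the reported incomes stays close to the true mean only in the MCAR case. Under MAR and MNAR it is pulled downward, because older or higher-income respondents drop out more often.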

When such missing values exist, even if they are "MNAR," the data cannot be analyzed, and their value cannot be realized, unless the missing values are dealt with in some way. One option, of course, is to discard every record that contains a missing value, but that wastes the time and money spent collecting the data, and the data cannot be collected again and again. How to fill in missing values, or how to handle data that contain them, is therefore a major issue in modern data science. The graph below shows a proposed statistical method for data containing missing values, and shows that adding a mathematical correction to it achieves higher approximation accuracy.

In clinical trials of pharmaceuticals, missing values are inevitable, for example when a participating patient cannot continue the trial because of their disease or for some other reason, so the guidelines also describe how missing data should be handled. In the future, anyone who handles data, not only those in the pharmaceutical field, will need to know how to handle missing values.
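The two options mentioned above, discarding incomplete records and filling in the missing values, can be compared in a small simulation. The following sketch assumes a hypothetical MAR setting in which older respondents skip the income question more often, and it uses simple regression imputation as a stand-in technique. It is not the method proposed in this research, and the function name `compare_strategies` and all numerical settings are illustrative assumptions.

```python
import numpy as np


def compare_strategies(n=100_000, seed=0):
    """Return (true mean, complete-case mean, regression-imputed mean) of income."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 80, size=n)
    income = 200 + 5 * age + rng.normal(0, 100, size=n)  # assumed relationship

    # Assumed MAR mechanism: older respondents skip the income question more often.
    missing = rng.random(n) < 0.1 + 0.5 * (age - 20) / 60
    observed = ~missing

    # Option 1: discard incomplete records (complete-case analysis).
    complete_case_mean = income[observed].mean()

    # Option 2: regress income on age using complete records, then fill the gaps.
    slope, intercept = np.polyfit(age[observed], income[observed], deg=1)
    filled = income.copy()
    filled[missing] = intercept + slope * age[missing]
    imputed_mean = filled.mean()

    return income.mean(), complete_case_mean, imputed_mean


true_mean, cc_mean, imp_mean = compare_strategies()
print(f"true mean:          {true_mean:.1f}")
print(f"complete-case mean: {cc_mean:.1f}")
print(f"imputed mean:       {imp_mean:.1f}")
```

In this synthetic example the imputed mean lands much closer to the true mean than the complete-case mean does, because age explains the missingness. A single regression imputation like this understates the uncertainty of the estimate, so it should be read only as an illustration of the idea of filling in values.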

 

Simulation results for data with missing values

As the sample size grows, the curves approach the red curve. Adding a mathematical correction (pink) to the proposed method (blue) improves the accuracy of the approximation even when the sample size is not large.

The Importance of "Statistical Literacy" Built on Mathematics, Economics, and Ethics

Today, as the need for data science expands, the issue of missing values is also becoming more important. When I served as a member of the Ministry of Land, Infrastructure, Transport and Tourism's review committee on the processing of statistical surveys, I felt that interest in the handling of missing values had grown even further. A major factor may be that society has begun to realize the importance and usefulness of statistics. This is of course because the analytical results and interpretations derived with statistics are valuable, but also because people have begun to recognize that data must be engaged with seriously before it can be put to use.

Statistics is an application of mathematics, so there is no room for emotion or subjectivity. In data science, it is important to maintain this attitude from data collection through to interpretation of the results. You may work hard to collect data and carry out a complex analysis, only to conclude that "nothing can be said." That may simply be because the data you used could not say anything. If you have the knowledge of statistics and mathematics needed to handle data, you can go back and review your research plan, your sampling, and your analysis. Failing to get the results you expected is no reason to resort to subjective interpretation.

As expectations for statistics rise and opportunities to use it multiply, I feel that this kind of "statistical literacy" is becoming necessary. Beyond keeping subjectivity out of the analysis, anyone working with statistics must be aware of how data, including personal information, are collected and managed; otherwise legal and ethical problems can arise. It is important not only to take a strong interest in the methods and results of data analysis, but also to bring knowledge, sensitivity, and imagination to the many things that lie beyond them. Without these, data analysis may become nothing more than a processing task.

Statistics is an academic field, but I believe its ultimate goal is to be used in society. Although my research has focused on statistical theory, I now teach at the College of Economics because I am strongly interested not only in the mathematics but also in the role statistics plays in society. Theoretical approaches from mathematics, economic approaches to society, and statistical literacy grounded in objectivity and ethics: by acquiring these foundations, you will be able to use statistical data analysis to benefit society.

Handling missing values and multivariate analysis methods are merely "tools." After mathematically analyzing data and interpreting it from a social and economic perspective, how can we use statistical literacy to apply the results of the analysis? I believe that this entire process is "data science." While understanding and pursuing detailed methods, I hope to use statistics to benefit society without losing sight of the bigger picture. (Published in November 2022)

Related books

  • "Data Science as Liberal Arts" by Seiichi Uchida, Yoshinori Kawasaki, Daisuke Kochu, Jun Sakuma, Hiroshi Shiina, Hiroshi Nakagawa, Tomoyuki Higuchi, and Hiroshi Maruyama, edited by Genshiro Kitagawa and Akimichi Takemura (Kodansha: 2021)
  • "Introduction/Exercise Mathematical Statistics" by Kazuo Noda and Etsuo Miyaoka (Kyoritsu Publishing: 1990)
  • "Introduction to Multivariate Analysis" by Hideyuki Douke and Tsunehisa Imada (Tokai University Press: 2001)

Study this topic at Aoyama Gakuin University

College of Economics Department of Economics

