Volume 7, Issue 6, December 2021, Pages: 132-138
Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter
Jung-hun Baeck, St. Mark’s School, Southborough, United States
Teresa Hyoju Chang, Seoul International School, Seoul, South Korea
Jaden Chunho Chyu, Phillips Academy Andover, Andover, United States
Bryan Chunwoo Chyu, Phillips Academy Andover, Andover, United States
Chaehyun Lim, McLean High School, McLean, United States
Received: Sep. 23, 2021; Accepted: Oct. 20, 2021; Published: Nov. 10, 2021
DOI: 10.11648/j.ijdsa.20210706.11
Abstract
Stop Asian Hate, or Stop Asian American Pacific Islanders (AAPI) Hate, refers to the national movement against racially motivated attacks on Asians. The protest was initiated in line with the Black Lives Matter (BLM) movement to dismantle ongoing hate and targeted crimes against Asians and to educate people about such threats. Hate crimes targeting Asians have occurred steadily across the U.S., but their number began to rise with the onset of COVID-19. The situation was exacerbated by people blaming certain Asian countries as the source of COVID-19. In 2021, Asian Americans reported the single biggest increase in serious incidents of online hate and harassment, with racist and xenophobic slurs blaming people of Asian descent for the spread of COVID-19. To assess the impact of the movement, this research examined the racial slurs used towards Asians on social media, specifically Twitter. Python was used to collect the data and to analyze the proportion of tweets containing racial slurs and anti-Asian hate. The data were collected with Twint, a Python scraping tool for Twitter, gathering a total of over 2,000 recent tweets for keywords relevant to the movement. Each tweet was then labeled as one of two types: type 1, which contains racial content or information against Asians, and type 0, which contains non-racial content. A preprocessing step followed, consisting of lowercasing, lemmatizing, and tokenizing. The processed data were visualized with graphs and word clouds showing some of the terms most commonly used to target Asians on social media. Lastly, the data were used to design a binary classification model for filtering tweets with racial content.
We compared the accuracy of classification models built with three different algorithms: logistic regression, random forest, and support vector machine (SVM). The resulting model could help shield users from exposure to the racist terms that pervade the internet.
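The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the placeholder tweets, the labels, the TF-IDF features, the linear SVM kernel, the two-fold cross-validation, and the names `preprocess` and `compare_models` are all assumptions made for illustration, and scikit-learn is assumed to be installed. Lemmatization is left out to keep the sketch light; a lemmatizer such as NLTK's `WordNetLemmatizer` could be applied to each token inside `preprocess`.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def preprocess(tweet):
    """Lowercase a tweet and split it into word tokens (no lemmatization)."""
    return re.findall(r"[a-z']+", tweet.lower())


def compare_models(texts, labels, folds=2):
    """Return the mean cross-validated accuracy of each candidate model."""
    candidates = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "SVM": SVC(kernel="linear"),
    }
    results = {}
    for name, model in candidates.items():
        # TF-IDF features are an assumption; the abstract does not name
        # the feature representation that was used.
        pipeline = make_pipeline(TfidfVectorizer(analyzer=preprocess), model)
        fold_scores = cross_val_score(
            pipeline, texts, labels, cv=folds, scoring="accuracy"
        )
        results[name] = float(fold_scores.mean())
    return results


# Hypothetical placeholder tweets: 1 = racial content, 0 = non-racial content.
TOY_TWEETS = [
    "they are to blame for the virus",
    "those people brought the virus here",
    "blame them for spreading the virus",
    "they should go back where they came from",
    "Stop Asian Hate rally downtown this weekend",
    "donating to support local Asian businesses",
    "great turnout at the solidarity march today",
    "learning about AAPI history this month",
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

scores = compare_models(TOY_TWEETS, TOY_LABELS)
for model_name, accuracy in scores.items():
    print(f"{model_name}: mean accuracy {accuracy:.2f}")
```

With only eight placeholder tweets the printed accuracies carry no meaning; the sketch shows only how the three algorithms can be evaluated on the same labeled data under identical preprocessing.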
Keywords
Data Science, Machine Learning, EDA, Stop Asian Hate, COVID-19
To cite this article
Jung-hun Baeck, Teresa Hyoju Chang, Jaden Chunho Chyu, Bryan Chunwoo Chyu, Chaehyun Lim, Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter, International Journal of Data Science and Analysis. Vol. 7, No. 6, 2021, pp. 132-138. doi: 10.11648/j.ijdsa.20210706.11
Copyright
Copyright © 2021 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.