Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter

Jung-hun Baeck; Teresa Hyoju Chang; Jaden Chunho Chyu; Bryan Chunwoo Chyu; Chaehyun Lim

doi:10.11648/j.ijdsa.20210706.11

Peer-Reviewed

Published in International Journal of Data Science and Analysis (Volume 7, Issue 6)

Received: 23 September 2021 Accepted: 20 October 2021 Published: 10 November 2021

Abstract

Stop Asian Hate, or Stop Asian American Pacific Islander (AAPI) Hate, refers to the national movement against racially motivated attacks on Asians. The protest was initiated in line with the Black Lives Matter (BLM) movement to dismantle the ongoing hate and targeted crimes against Asians and to educate people about such threats. Hate crimes targeting Asians have occurred steadily across the U.S., but their number began to rise with the onset of COVID-19. For the Stop Asian Hate movement, the problem was exacerbated as people blamed certain Asian countries as the source of COVID-19. In 2021, Asian Americans reported the single biggest increase in serious incidents of online hate and harassment, with racist and xenophobic slurs blaming people of Asian descent for the spread of COVID-19. To assess the impacts and measures of each movement, this research examined the racial slurs used against Asians on social media, specifically Twitter. Python was used to collect the data and to analyze the ratio of racial slurs and anti-Asian hate. The data set was labeled by classifying each tweet into one of two types: type 1, which contains racial content or information against Asians, and type 0, which contains non-racial content. The data were collected with Twint, a Python scraping tool for Twitter, gathering over 2,000 recent tweets for keywords relevant to the movement. The tweets were then preprocessed in Python by lowercasing, lemmatizing, and tokenizing. The data were visualized with graphs and word clouds displaying some of the most commonly used terms targeting Asians on social media. Lastly, a binary classification model was designed to filter tweets with racial content. We compared the accuracy of classification models built with three different algorithms: logistic regression, random forest, and support vector machine (SVM). The resulting model could help safeguard users from exposure to the racist terms that pervade the internet.
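The pipeline described in the abstract (lowercasing, lemmatizing, tokenizing, term counting, and a binary type 1 / type 0 classifier) can be illustrated with a minimal sketch. This is not the authors' code. It uses only the Python standard library, and everything named here is an assumption for illustration: the toy tweets with the placeholder tokens `SLUR_A` and `SLUR_B`, the helper names `preprocess`, `top_terms`, `racial_ratio`, `train_logreg`, and `predict`, and the training settings. A crude suffix stripper stands in for a real lemmatizer, and only logistic regression (one of the three algorithms compared) is shown, trained by plain stochastic gradient descent on bag-of-words counts.

```python
import math
import re
from collections import Counter

# Hypothetical toy data; SLUR_A / SLUR_B are placeholders, not real terms.
# Label 1 = racial content against Asians, label 0 = non-racial content.
TOY_TWEETS = [
    ("They are SLUR_A and spread the virus", 1),
    ("Go back home SLUR_B", 1),
    ("SLUR_A people caused this pandemic", 1),
    ("Stop Asian Hate rally downtown this weekend", 0),
    ("Proud to support our local Asian community", 0),
    ("New report on hate crimes released today", 0),
]

# Crude suffix stripping: a stand-in for a real lemmatizer, not equivalent to one.
SUFFIXES = ("ing", "ed", "s")


def preprocess(tweet):
    """Lowercase a tweet, drop URLs, tokenize, and roughly normalize tokens."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    tokens = []
    for tok in re.findall(r"[a-z_]+", text):
        for suffix in SUFFIXES:
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        tokens.append(tok)
    return tokens


def top_terms(docs, n=5):
    """Count tokens across documents (the counts a word cloud would display)."""
    return Counter(tok for doc in docs for tok in doc).most_common(n)


def racial_ratio(labels):
    """Share of tweets labeled type 1."""
    return sum(labels) / len(labels)


def train_logreg(docs, labels, epochs=200, lr=0.5):
    """Fit bag-of-words logistic regression with stochastic gradient descent."""
    weights = {tok: 0.0 for doc in docs for tok in doc}
    bias = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            z = bias + sum(weights[tok] for tok in doc)
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log loss with respect to z
            bias -= lr * grad
            for tok in doc:  # repeated tokens act as count features
                weights[tok] -= lr * grad
    return weights, bias


def predict(model, doc):
    """Return 1 (racial content) if the linear score is positive, else 0."""
    weights, bias = model
    z = bias + sum(weights.get(tok, 0.0) for tok in doc)
    return 1 if z > 0 else 0


docs = [preprocess(text) for text, _ in TOY_TWEETS]
labels = [label for _, label in TOY_TWEETS]
model = train_logreg(docs, labels)

print("type 1 ratio:", racial_ratio(labels))
print("most common terms:", top_terms(docs, 3))
print("prediction for an unseen tweet:",
      predict(model, preprocess("SLUR_B spread the virus")))
```

Fitting six toy tweets only checks that the pieces connect. A comparison like the one the abstract reports would need the labeled tweet set, a held-out test split, and library implementations of all three algorithms.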

Published in	International Journal of Data Science and Analysis (Volume 7, Issue 6)
DOI	10.11648/j.ijdsa.20210706.11
Page(s)	132-138
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Data Science, Machine Learning, EDA, Stop Asian Hate, COVID-19

Cite This Article

APA Style

Jung-hun Baeck, Teresa Hyoju Chang, Jaden Chunho Chyu, Bryan Chunwoo Chyu, Chaehyun Lim. (2021). Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter. International Journal of Data Science and Analysis, 7(6), 132-138. https://doi.org/10.11648/j.ijdsa.20210706.11

ACS Style

Jung-hun Baeck; Teresa Hyoju Chang; Jaden Chunho Chyu; Bryan Chunwoo Chyu; Chaehyun Lim. Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter. Int. J. Data Sci. Anal. 2021, 7(6), 132-138. doi: 10.11648/j.ijdsa.20210706.11

AMA Style

Jung-hun Baeck, Teresa Hyoju Chang, Jaden Chunho Chyu, Bryan Chunwoo Chyu, Chaehyun Lim. Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter. Int J Data Sci Anal. 2021;7(6):132-138. doi: 10.11648/j.ijdsa.20210706.11

@article{10.11648/j.ijdsa.20210706.11,
author = {Jung-hun Baeck and Teresa Hyoju Chang and Jaden Chunho Chyu and Bryan Chunwoo Chyu and Chaehyun Lim},
title = {Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter},
journal = {International Journal of Data Science and Analysis},
volume = {7},
number = {6},
pages = {132-138},
doi = {10.11648/j.ijdsa.20210706.11},
url = {https://doi.org/10.11648/j.ijdsa.20210706.11},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20210706.11},
year = {2021}
}

TY - JOUR
T1 - Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter
AU - Jung-hun Baeck
AU - Teresa Hyoju Chang
AU - Jaden Chunho Chyu
AU - Bryan Chunwoo Chyu
AU - Chaehyun Lim
Y1 - 2021/11/10
PY - 2021
N1 - https://doi.org/10.11648/j.ijdsa.20210706.11
DO - 10.11648/j.ijdsa.20210706.11
T2 - International Journal of Data Science and Analysis
JF - International Journal of Data Science and Analysis
JO - International Journal of Data Science and Analysis
SP - 132
EP - 138
PB - Science Publishing Group
SN - 2575-1891
UR - https://doi.org/10.11648/j.ijdsa.20210706.11
VL - 7
IS - 6
ER -

Author Information

Jung-hun Baeck
St. Mark’s School, Southborough, United States

Teresa Hyoju Chang
Seoul International School, Seoul, South Korea

Jaden Chunho Chyu
Phillips Academy Andover, Andover, United States

Bryan Chunwoo Chyu
Phillips Academy Andover, Andover, United States

Chaehyun Lim
McLean High School, McLean, United States
