Facebook Tells Me Your Gender: An Exploratory Study
of Gender Prediction for Turkish Facebook Users
ÖNDER ÇOBAN, Cukurova University
ALI İNAN, Adana Alparslan Turkes Science and Technology University
SELMA AYŞE ÖZEL, Cukurova University
Online Social Networks (OSNs) are very popular platforms for social interaction. Data posted publicly over
OSNs pose various threats against the individual privacy of OSN users. Adversaries can try to predict private
attribute values, such as gender, as well as links/connections. Quantifying an adversary’s capacity in inferring
the gender of an OSN user is an important first step towards privacy protection. Numerous studies have been
made on the problem of predicting the gender of an author/user, especially in the context of the English
language. In contrast, studies in this field are quite limited for the Turkish language, specifically in the
domain of OSNs. Previous studies on gender prediction of Turkish OSN users have mostly been performed
by using the content of tweets and Facebook comments. In this article, we propose using various features,
not just user comments, for the gender prediction problem over the Facebook OSN. Unlike existing studies,
we exploited features extracted from profile, wall content, and network structure, as well as wall interactions
of the user. Therefore, our study differs from the existing work in the broadness of the features considered,
machine learning and deep learning methods applied, and the size of the OSN dataset used in the experimental
evaluation. Our results indicate that basic profile information provides better results than the other feature sets; moreover, combining it with wall interactions further improves prediction quality. We measured the best accuracy
value as 0.982, which was obtained by combining profile data and wall interactions of Turkish OSN users. In
the wall interactions model, we introduced 34 different features that provide better results than the existing
content-based studies for Turkish.
CCS Concepts: • Social and professional topics → Gender; • Computing methodologies → Natural
language processing; Machine learning; • Networks → Online social networks;
Additional Key Words and Phrases: Facebook, online social networks, attribute inference, gender detection,
text categorization
ACM Reference format:
Önder Çoban, Ali İnan, and Selma Ayşe Özel. 2021. Facebook Tells Me Your Gender: An Exploratory Study of
Gender Prediction for Turkish Facebook Users. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 20, 4, Article
66 (May 2021), 38 pages.
https://doi.org/10.1145/3448253
Authors’ addresses: Ö. Çoban (corresponding author) and S. A. Özel, Çukurova University, Department of Computer Engineering, 01330 Balcalı, Sarıçam, Adana/TÜRKİYE; emails: [email protected], [email protected]; A. İnan, Adana Alparslan
Turkes Science and Technology University Department of Computer Engineering 01250 Sarıçam Adana/TÜRKİYE; email:
[email protected].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2021 Association for Computing Machinery.
2375-4699/2021/05-ART66 $15.00
https://doi.org/10.1145/3448253
1 INTRODUCTION
Online Social Networks (OSNs) are very popular communication mediums in today’s world.
Prominent OSNs such as Facebook, Twitter, and Instagram are so popular that they
have hundreds of millions of users and are even said to reflect real life [Wani et al.
2018]. It is well known that OSN users provide a substantial amount of personal information
to OSN service providers and other users [Gayo Avello 2011]. Despite the availability of privacy settings to control and customize the visibility of personal data, privacy emerges as an
important concern [Lindamood et al. 2009]. The main reason for this concern stems from the
nature of OSNs in promoting high connectivity and the potential threat of misuse of personal
data.
Concerns about the protection of personal OSN data are not ill-founded. The Guardian has
covered news reporting that trillions of tweets were up for sale [Garside 2015]. In several disclosed incidents, third parties with direct or indirect access to the OSN data have attacked the individual
privacy of OSN users [Tang et al. 2011]. For instance, the profile information of millions of users
were collected through a Facebook application and used inappropriately for political purposes in
2018 [Graham-Harrison and Cadwalladr 2018].
In an OSN environment, the adversary attacking individual privacy could be anyone in the range
from a simple end user to a third-party application, or an insider—an OSN service provider employee. We focus on adversaries that are end users, and following related work, we assume that
such adversaries employ statistical, analytical tools as well as machine learning (ML) techniques
to infer private traits (e.g., gender, age, political view, links) of OSN users [Tang et al. 2011]. Gender prediction is an important instance of the user attribute inference problem. This is because
a successful gender inference mechanism can be used to harm the privacy of OSN users. For instance, users, especially females, may be exposed to cyberstalking or real-world stalking. Gender
attribute can also be used as auxiliary information to reveal other personal data of an OSN user.
In such a case, gender inference can give way to serious attacks (e.g., phishing, blackmailing) that
may result in economic and societal harms for users. For instance, a mother’s maiden name is one
of the most frequently used authentication traits in Turkey. In such a scenario, an attacker can use
the surname and gender information of a targeted user’s OSN friends to reveal his/her mother’s
maiden name. Such an attack can, for example, harm the user in online banking,
where an adversary can impersonate the target to apply for a new credit card or transfer funds
through telephone or online banking. All of these cases show that gender inference over OSN
data not only violates the individual privacy of OSN users but also has other potential adversarial
consequences.
Various solutions have been proposed to the problem of predicting an OSN user’s gender. There
are studies that rely on different approaches that generally use (i) profile information [Burger
et al. 2011; Liu and Ruths 2013; Tang et al. 2011], (ii) network structure [Jernigan and Mistree
2009; Zheleva and Getoor 2009], or (iii) the content of the profile owner’s wall [Alowibdi et al.
2013a]. Profile-based approaches use one or more selected attributes (e.g., display name) to infer
a profile owner’s gender. Network structure–based approaches take advantage of the social
network structure to detect users’ genders. Such approaches often use the links and groups of a
wall owner (WO), along with attributes of other users who have friendship connections with the
WO in a network [Zheleva and Getoor 2009]. For instance, a simple friendship aggregate model
can look at the attribute distribution among the friends of the WO in question so as to
infer the related attribute. Content-based approaches, however, use text mining
methods in gender prediction. As such, OSNs have attracted the attention of the research community
as a rich source of data for different text mining tasks [Fatima et al. 2017; Peersman et al. 2011;
Rangel and Rosso 2013; Sap et al. 2014], as the most important and well-known characteristic of
OSNs is that users communicate with each other via short text messages [Peersman et al. 2011].
In this article, we perform an exploratory study for the purpose of predicting the gender of Turkish Facebook users. To achieve our goal, we first crawled the Facebook OSN and collected public
data of 20K users with the help of our crawler presented in the work of Coban et al. [2020]. Then,
we used different models that utilize various information obtained from a profile owner’s account
to predict the gender of the corresponding user.
Our models consider the network structure, profile information, wall interactions, and wall content (i.e., posts, comments, and replies) of users, and apply well-known classification techniques
from ML and deep learning (DL). Notice that our work is closely related to authorship classification and gender classification, which have been studied many times before (see our literature
review in Section 2), especially in the context of the English language (Table 1). However, studies
for gender prediction over Turkish OSN data are quite limited. The main reason for this is that
accessing OSN data is a very challenging task that requires additional effort to collect the network
structure along with every bit of publicly available data of users. There are also several additional
challenges against text mining research for Turkish. For example, (i) language resources and NLP
tools for Turkish are limited, (ii) there is no publicly available OSN data, and (iii) linguistic and
semantic analyses such as word sense disambiguation and morphological analysis are complicated.
One of the earliest gender detection studies in Turkish is that of Amasyali and Diri [2006],
which focuses on author profiling over Turkish news texts. To the best of our knowledge, there
exist only a few other studies with the aim of gender prediction on chat messages [Kucukyilmaz
et al. 2006], tweets [Sezerer et al. 2019a, 2019b], and Facebook comments [Talebi and Köse 2013;
Çelik and Aslan 2019]. However, our study appears to be the first that aims to perform
gender prediction of Turkish Facebook users based on various information extracted from users’
profiles. Our main contributions in this work are as follows:
• We apply various models involving profile information, network structure, wall interaction,
and wall content to perform gender prediction. These models are evaluated both individually
and in combination with each other.
• We conduct intensive experiments for the wall content model to explore which type of
information is more effective for the gender prediction task in Facebook.
• We use different traditional ML algorithms in all of our models. Additionally, we use different DL algorithms (e.g., Convolutional Neural Network (CNN), Recurrent Neural
Network (RNN)) and language models (e.g., Bidirectional Encoder Representations
from Transformers (BERT)) in the wall content model so as to obtain comprehensive
results.
• We introduce 34 different features in the wall interactions model, which provide better
results than the existing content-based studies for Turkish. These features can easily be
adapted to other languages as long as a lexicon of gender-oriented words in the target
language is available.
• We report better results than the best results previously reported in other related studies
that focus on gender prediction for Turkish OSN users.
• We report challenges for the gender prediction task on Facebook with respect to our models.
The rest of this article is structured as follows. Section 2 reviews the gender prediction studies
over OSNs. Section 3 and Section 4 introduce the datasets and methods we used, respectively.
Section 5 outlines the experimental results. Section 6 presents a discussion of our experimental
results, and Section 7 provides our conclusion and possible directions for future work.
Table 1. Summary of Research Works for Gender Detection over Different OSNs

| Work | OSN | Features | Data / User Set | Language | Method | Best Result | Year |
|---|---|---|---|---|---|---|---|
| [Peersman et al. 2011] | Netlog | Word and character n-grams | 1.53M posts | Dutch | SVM | 0.663 | 2011 |
| [Tang et al. 2011] | Facebook | 12 different profile attributes including display name | Profile information of 679K users | English | MNB and J48 | 0.952 | 2011 |
| [Deitrick et al. 2012] | Twitter | Character uni-grams and bi-grams | 3.03K tweets | English | Modified BW | 0.985 | 2012 |
| [Burger et al. 2011] | Twitter | Free-text profile fields and word and character n-grams | 4.10M tweets of 184K users | Multilingual | SVM, NB, and BW | 0.920 | 2011 |
| [Alowibdi et al. 2013b] | Twitter | 5 color-based features | 4 subsets of 53.3K users | Language independent | NB-Tree | 0.743 | 2013 |
| [Alowibdi et al. 2013a] | Twitter | First name, user name, 5 color-based features, and phoneme sequence n-grams | Randomly sampled 180K of 194.2K users | Language independent | NB, Decision Tree, and NB-Tree | 0.825 | 2013 |
| [Tellez et al. 2018] | Twitter | Word and character n-grams and bag of visual words | 3 different datasets from PAN@CLEF | Arabic, English, and Spanish | SVM and Rocchio | 0.827 | 2018 |
| [Fatima et al. 2017] | Facebook | Word and character n-grams and 64 different stylistic-based features | Posts/comments of 479 users | Roman Urdu and English | J48, RF, NB, and SVM | 0.875 | 2017 |
| [Keeshin et al. 2010] | Facebook | Word, structure, and count-based features | 170K status updates | English | NB, MaxEnt, and Perceptron | 0.677 | 2010 |
| [Sap et al. 2014] | Facebook, Twitter | Lexicon with words and weights | Data of 75K Facebook users and tweets of 11K Twitter users | English | SVM | 0.919 | 2014 |
| [Rangel and Rosso 2013] | Facebook | Writing style–based features | 1.2K comments | Spanish | SVM | 0.590 | 2013 |
| [Khandelwal 2019] | Twitter | Bow, character n-grams, emotions, etc. | First name, profile picture, and 7.5K tweets of 1K users | Code-mixed Hindi-English | SVM, RF, NB | 0.805 | 2019 |
| [Giannakopoulos et al. 2018] | Twitter | 7 features based on three fields | Profile picture, display name, and theme color of 8.4K users | English | SVM and PNNs | 0.872 | 2018 |
| [Nicvist et al. 2018] | Twitter | Total number of male/female words, links, hashtags, emojis, etc. | 436 tweets from PAN'16 | English | Basic Linear Classifier | 0.610 | 2018 |
| [Bsir and Zrigui 2018a] | Twitter | Word embeddings | 240K tweets of 2.4K users | Arabic | GRU, LSTM, and CNN | 0.796 | 2018 |
| [Han et al. 2018] | Instagram | Features based on tags, activities, and images | 1.2M tag, activity, and image data of 3.7K users | English | LR, SVM, MLP, and CNN | 0.820 | 2018 |
| [Corriga et al. 2018] | Facebook | Bow | Textual tags extracted from 3K images of users | Language independent | NB and RF | 0.630 | 2018 |
| [Ciot et al. 2013] | Twitter | 20-top words/trigrams/hashtags and tweet/retweet/link/mention frequencies, etc. | 4 subsets of 8.61K users | Multilingual including Turkish | SVM | 0.870 | 2013 |
| [Sezerer et al. 2019a] | Twitter | Word embeddings | Tweets of 4K users | Turkish | ANN | 0.806 | 2019 |
| [Sezerer et al. 2019b] | Twitter | Bow | Tweets of 5.2K users | Turkish | SVM | 0.723 | 2019 |
| [Talebi and Köse 2013] | Facebook | n-Grams | 50K comments of 550 users | Turkish | NB | 0.908 | 2013 |
| [Çelik and Aslan 2019] | Facebook | Bow | 8.7K comments of 3K users | Turkish | LR | 0.741 | 2019 |
| Ours | Facebook | Profile attributes, content and interaction-based features | Profile information and 361K/1.27M activities of 20K users depending on the created dataset | Turkish | Profile information (RF, 0.981); Network structure (SVM, 0.792); Wall interactions (RF, 0.966); Wall content (BERT, 0.926); Profile information + Wall interactions (RF, 0.982) | 0.982 | 2021 |
2 RELATED WORK
Gender detection is one of the well-studied topics in computer science. Previously, this task was
performed many times for different text genres, languages, information sources, and so on. In the
literature, gender detection has been made from images, textual content, and/or other manually
created features depending on the domain of the task. Some of the most recently published studies
on this topic include the following. Santamaria and Mihaljevic [2018] utilized five name-to-gender
inference services to classify 7,076 manually labeled names and concluded that three of the tested
services are able to guess the correct gender for more than 98% of the names that the system is able
to classify. Cheung and She [2017] performed gender detection both for Flickr and Fotolog users
from their shared images. The best accuracy rate, measured at 0.79, has been obtained by their
analytical system that is based on the well-known homophily phenomenon [Khanam et al. 2020]:
OSN users of the same gender tend to share similar content. Flekova et al. [2016] performed age
and gender detection from tweets, and analyzed differences between real user traits and perceived
traits from text. Through experimental analysis, it has been concluded that humans use stereotypes
that lead to systematic biases [Flekova et al. 2016]. Karimi et al. [2016] compared several unsupervised gender detection methods that rely on the name or image of users. They also suggested
a novel method that augments name-based approaches along with gender recognition in facial
images [Karimi et al. 2016]. The authors concluded that their method outperforms name-based
approaches with an accuracy of 0.92 in the gender detection task. Kosinski et al. [2013] performed
personal trait prediction for Facebook users based on their likes. The authors considered different
traits of users including gender and obtained an 0.98 AUC value using logistic linear regression.
As stated before, even though there are many existing studies in this field, in this section, we
present our literature review that mainly focuses on gender prediction studies over OSNs. The
previous works that use text messages, profile information, and images of OSN users can be summarized as follows. Peersman et al. [2011] tried to infer age and gender of OSN users by applying
text categorization [Gupta and Lehal 2009] on textual content of users’ chat activities that are
collected from a Belgian OSN. It has been reported that age and gender categorization are performed with an accuracy of 0.663 by using 50K of the most informative uni-grams. Deitrick et al.
[2012] extracted 9,170 features from 3,031 tweets to perform gender categorization of Twitter users.
They were able to predict gender with an accuracy of 0.985 by applying a modified Balanced
Winnow (BW) algorithm on the most informative 53 features selected by Weka’s feature selectors. Burger et al. [2011] performed gender detection on a large-scale tweet corpus (i.e., 4.1M tweets
of 184K users) that is crawled from Twitter. They have extracted 15.6M character- and word-level
n-gram features from their corpus. It is worth noting that they used both profile information (i.e.,
screen name, profile name, description) and textual content extracted from tweets in their ML-based model. The authors reported the best accuracy as 0.92, which was observed when tweets were
utilized together with profile information. Using only tweets, accuracy declined to as low as 0.76.
Having a huge number of features makes gender classification computationally expensive; however, Alowibdi et al. [2013b] use only five language-independent features (e.g., background color,
text color) that are based on colors preferred by the user in his/her profile to detect gender of
Twitter users. The authors report that they were able to achieve a quite successful result with
an F1-score equal to 0.718. Alowibdi et al. [2013a] also explored profile characteristics for gender classification on Twitter. The authors proposed a technique that uses phonemes to extract
and reduce feature space. It is reported that the highest observed accuracy value is 0.825, which
is obtained by using 16K 3-gram phoneme-based features. Tellez et al. [2018] performed gender
classification based on both text and images posted by Twitter users. It is shown that the best F1-score for English is 0.826 when they solely use a text categorization approach. Ciot et al. [2013]
performed gender prediction of Twitter users by using textual content in several languages other than English. They crawled messages posted in four different languages to create the
datasets and achieved the highest accuracy (i.e., 0.87) for Turkish on a dataset that includes 3.6K
Turkish users’ tweets. Tang et al. [2011] employed a display name–centric approach to infer the
gender of Facebook users. In this study, a dataset was created by collecting information of 1.2M users
from the New York network with the help of a focused crawler. Their dataset contains profile information (e.g., gender, relationship, friend list) of 679K users who reveal their gender attributes. The
authors achieved an accuracy of 0.952 by using an ML-based prediction model that mainly uses
the first name and other profile information of users.
Fatima et al. [2017] performed gender classification for the purpose of author profiling on Facebook. In this study, the authors asked selected Facebook users to provide their demographic information along with their comments/posts to create their dataset. Based on the experimental results,
the best accuracy value observed is 0.875, which is for a multilingual corpus that includes posts and
comments of users with word uni-gram, character 3-gram, and character 8-gram features. Keeshin
et al. [2010] performed gender classification on Facebook statuses (i.e., posts) using word-based,
count-based, and structure-based features. The best accuracy value is 0.677, which is achieved
with the help of the Naive Bayes (NB) classifier on a dataset including 170K statuses. Sap et al.
[2014] created a lexicon from a corpus of 300M words written by 75K Facebook users to predict
their age and gender. Rangel and Rosso [2013] used some style features to predict the gender of
Facebook users. The authors obtained an accuracy of 0.59 on a dataset that is comprised of 1.2K
comments written in Spanish. Schwartz et al. [2013] applied a lexicon-based learning approach
similar to Sap et al. [2014] for gender prediction of Facebook users and obtained an accuracy of
0.919. Khandelwal [2019] identified both humor and gender of the users based on their tweets, first
names, and profile pictures. This study obtained 0.805 accuracy as the highest result with the help
of the Support Vector Machine (SVM) classifier for the gender prediction task. Giannakopoulos
et al. [2018] performed gender prediction over a Twitter dataset from Liu and Ruths [2013] and
obtained the best accuracy of 0.872 by using color, image, and display name fields from user profiles. Nicvist et al. [2018] performed gender detection using a simple classifier and obtained 0.610
accuracy on a small dataset that includes 456 tweets. Bsir and Zrigui [2018a] comparatively evaluated three DL models (i.e., Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM),
CNN) with word embeddings for gender detection of Arabic Twitter users. They obtained the
highest accuracy value, equal to 0.796, with the help of the GRU model. Han et al. [2018] executed
age and gender classification of Instagram users based on activities, image objects, and tags. They
obtained an accuracy of 0.820 as the highest result for gender detection by using the Multi-Layer
Perceptron (MLP) algorithm. Corriga et al. [2018] detected the gender and emotion of Facebook
users from the University of Cagliari. They used bag of words (bow) of textual tags that are extracted from 3K images of these users. The highest accuracy value observed is 0.63 for the gender
prediction task.
In the context of the Turkish language, the previous studies we are aware of are as follows.
Amasyali and Diri [2006] performed author profiling over Turkish news texts. The authors obtained an accuracy of 0.96 using bi-gram and tri-gram features. Kucukyilmaz et al. [2006] performed gender prediction on Turkish chat messages and reported that they were able to obtain an
accuracy of 0.81. Sezerer et al. [2019a, 2019b] performed gender detection on Turkish tweets and
obtained the best accuracies as 0.806 and 0.72, respectively. In the context of Facebook, however,
we could find only a few related studies [Çelik and Aslan 2019; Talebi and Köse 2013]. Talebi and Köse [2013]
created a dataset that includes 50K comments collected from Turkish
Facebook pages. The authors manually labeled the comments and obtained an accuracy of 0.908
using n-gram features. Çelik and Aslan [2019] performed gender prediction over Turkish comments collected from Facebook pages of companies. The authors obtained an accuracy of 0.741 by
using bow features.
To provide an overview, we summarize studies that focus on gender prediction over OSNs in
Table 1. Notice that if a study was performed on a multilingual corpus or multiple OSNs, the best
Acc/F1 score is given for a single language or OSN. As is seen from Table 1, the
majority of works in this field have focused on Twitter users, and only a few studies have been
done for the Turkish language. Our study differs from the others with regard to the used features,
methods, and size of the data. To the best of our knowledge, our study is the first one that tries to
predict the gender of Turkish Facebook users by using features extracted from profile information,
network structure, wall interactions, and wall content of users. In addition, our study produces
better results than all other studies in the context of the Turkish language both in wall interactions
and wall content models that require language-dependent processing and features. Notice that our
result also outperforms the majority of studies in the context of other languages, except for Deitrick
et al. [2012], whose results are very close to ours.
3 DATASETS
We collected data from Facebook by using our crawler, which uses the HTTP approach and interacts with a browser through the Selenium API (https://www.seleniumhq.org/) [Coban et al. 2020]. As is seen from Table 2, this dataset
includes both profile information and wall (i.e., timeline) activities (i.e., 2.76M posts, 2.74M comments, and 466K replies) of 20K users from Turkey. It can also be understood that we are able to
discover 2.35M unique users by visiting friends of friends of these users in the breadth-first search
Table 2. Quantitative Description of Collected Public Facebook Data

| Property (# of) | Count | Property (# of) | Count |
|---|---|---|---|
| Visited users | 20,000 | Posts | 2,762,023 |
| Discovered users | 3,980,270 | Comments | 2,743,513 |
| Discovered unique users | 2,350,454 | Replies | 466,995 |
| Graph representation: Nodes | 20K | Edges (undirected) | 201,150 |
Table 3. Distribution of Users with Respect to Publicity of the Wall Page and Gender Attribute

| Criteria | Male users (#) | Female users (#) | Total |
|---|---|---|---|
| Gender revealed | 9,800 | 6,486 | 16,286 |
| Open wall | 9,323 | 6,070 | 15,393 |
| Wall content | 8,563 | 4,211 | 12,774 |
| Selected set | 4,211 | 4,211 | 8,422 |
approach. As is seen from Table 3, only 16,286 of the crawled users disclose their gender, and the
majority of them do not keep their walls private. The number of users whose walls are public is
9,323 for males and 6,070 for females. To avoid biased results, we selected a subset
of users having a complete set of features that enables us to apply different gender prediction
models. Notice that in the rest of the article, we use the term wall owner (WO) to represent the
user for whom gender prediction is to be done. In this phase, the following filters are applied on
the crawled user base: (i) there must be at least one activity written by the WO in his/her profile,
and (ii) there must be at least one activity that targets (i.e., written to answer the WO) the WO in
his/her profile. Afterward, we selected an equal number of male and female users and created a
balanced user set including 8,422 users in total.
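As an illustration of this selection step, the sketch below builds such a balanced subset with pandas; the DataFrame layout and column names are our own assumptions, not the authors' actual pipeline.

```python
import pandas as pd

def select_balanced_users(users: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Filter and balance the crawled user set (columns are illustrative)."""
    eligible = users[
        users["gender"].isin(["male", "female"])
        & (users["n_own_activities"] >= 1)        # at least one activity written by the WO
        & (users["n_targeting_activities"] >= 1)  # at least one activity targeting the WO
    ]
    n = eligible["gender"].value_counts().min()   # size of the smaller class (4,211 in the paper)
    males = eligible[eligible["gender"] == "male"].sample(n=n, random_state=seed)
    females = eligible[eligible["gender"] == "female"].sample(n=n, random_state=seed)
    return pd.concat([males, females])            # 8,422 users in total in our case
```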
To perform gender detection, we used different information of these users such as profile attributes, node embeddings, wall interactions, and textual content. To employ textual content, we
grouped activity content into three different sets for the purpose of exploring which type of activity has better potential to give insight into the WO’s gender. As shown in Figure 1, we use the
prefix “WD” to name our text content sets and combine them as a single document by concatenating all activities in the same set. In other words, we extract three different documents from
the wall activities of the WO. The three documents obtained for each user are called WD1, WD2,
and WD3. Document WD1 contains all activity content that addresses/targets the WO (e.g., we are
sure that the related activity is written to answer the WO). For instance, C1 and R4 are included in
WD1, as they directly target the WO (i.e., Müge in this case). In Facebook, there is a similarity between post-comment and comment-reply structures. Users can type their opinions under any post
or comment, and the complexity of a post may change depending on the number of related comments and replies. As such, detecting activities targeting the WO is a challenging task, especially
when an activity contains more than two related activities. Therefore, in this study, we include
an activity in document WD1 only if it is the first related comment or the reply of an activity by
the WO. Document WD2 contains all activity content written directly by the WO. For instance,
activities P1, R1, R2, and R3 are typed by Müge, who is the WO. Document WD3 combines the previous two sets: it contains all activity content on the WO's wall except activities that are neither written by the WO nor determined to
Fig. 1. An example Facebook post with related comments and replies. For each post with its related comments and replies, if they exist, we group the activity contents into three different document sets: WD1, WD2, and WD3.
Table 4. Quantitative Description of Three Different Document Sets Created in the Wall Content Model

| Attribute / Dataset | WD1 Male | WD1 Female | WD2 Male | WD2 Female | WD3 Male | WD3 Female |
|---|---|---|---|---|---|---|
| Posts | 22,956 | 15,236 | 276,176 | 156,846 | 299,132 | 172,082 |
| Comments | 189,574 | 126,675 | 161,061 | 124,663 | 350,635 | 251,338 |
| Replies | 4,541 | 2,658 | 118,221 | 81,351 | 122,762 | 125,420 |
| All activities (AA) | 217K | 144K | 555K | 362K | 772K | 506K |
| Total words in AA | 1.37M | 813K | 6.27M | 3.05M | 7.64M | 3.81M |
| Average words per activity | 6.35 | 5.64 | 11.3 | 8.30 | 9.90 | 6.08 |
target the WO. For instance, all activity content except for C2 and C3 is included in WD3,
as it is very hard to automatically determine the exact target of C2 and C3 (see Figure 1).
In other words, WD3 is the union of WD1 and WD2.
In this work, we created WD1, WD2, and WD3 datasets by concatenating documents of users
in related types. Table 4 presents quantitative properties of these document sets and shows that
the size of the datasets varies in the range of 361K to 1.27M activities depending on the type of
the content. It is also clear that male users have more activities than females, and their activities
include more words on average.
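To make the construction of the three document sets concrete, the sketch below groups a list of wall activities into WD1, WD2, and WD3; the Activity fields and the simplified targeting rule are illustrative assumptions, not the authors' exact data model (in particular, the "first related answer" restriction is omitted for brevity).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Activity:
    author: str                             # display name of the writer
    text: str                               # activity content
    parent_author: Optional[str] = None     # author of the activity this one answers

def build_documents(wo: str, activities: List[Activity]):
    """Split wall activities into the WD1/WD2/WD3 documents of Section 3 (simplified)."""
    wd1, wd2 = [], []
    for a in activities:
        if a.author == wo:                  # written by the WO -> WD2
            wd2.append(a.text)
        elif a.parent_author == wo:         # answers an activity by the WO -> WD1
            wd1.append(a.text)
        # otherwise the target is unclear (like C2/C3 in Figure 1) and the activity is dropped
    wd3 = wd1 + wd2                         # WD3 is the union of WD1 and WD2
    return " ".join(wd1), " ".join(wd2), " ".join(wd3)
```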
4 METHODS
In this work, we employ five different gender prediction models, of which the naive model is
a simple baseline that provides a lower bound on the expected performance of all other
models in our study. The other four models are based on (i) profile information, (ii) network structure,
(iii) wall interactions, and (iv) wall content of the WO. The profile information model uses basic attributes and friend lists of the account owner to predict his/her gender. The network structure
model is based on learned distributed node embeddings of users. The wall interaction model exploits attributes of activities found both on the profile owner's wall and on the walls of other users
who interact with the profile owner. This model also considers the other OSN users who interact
with the profile owner, and the gender-oriented words observed both in activities of the profile
owner and the interacted users. The wall content model, however, uses classical text categorization techniques for the purpose of gender prediction. In the following sections, we describe the
details of our models.
4.1 Naive Model (Baseline)
The naive model is our baseline model that does gender prediction by making a random decision.
It uses a dummy classifier to randomly predict the gender of the WO without considering any
information. Therefore, the gender decision is made by flipping an unbiased coin.
4.2 Network Structure Model
In the network structure model, we use features obtained from connections and locations of a node
in the network. As such, this model uses network structure information to detect the gender of
users. A well-known way to extract features from a graph is to use node2vec, which is an algorithmic framework for representational learning on graphs [Grover and Leskovec 2016]. Node2vec
is derived from the word2vec model [Mikolov et al. 2013a, 2013b], and given any graph, it can
learn continuous feature representations for the nodes by taking random walks through the graph
starting at a target node, which can then be used for various downstream ML tasks. In this work,
we used node embeddings of users for gender detection on a graph representation of our largest
crawled snapshot [Coban et al. 2020], where nodes and edges represent users and their friendship
connections, respectively. We would like to note that the graph representation of our data includes
20K nodes and their 201,150 undirected edges (please see Table 2 for more details).
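A minimal sketch of this step is given below, assuming the widely used open source node2vec package; the hyperparameter values and the edge-list file name are illustrative and not the authors' settings.

```python
import networkx as nx
from node2vec import Node2Vec   # pip install node2vec (an assumption about tooling)

# Build the undirected friendship graph from an edge list (the paper's snapshot has
# 20K nodes and 201,150 edges). The file name is hypothetical.
G = nx.read_edgelist("friendship_edges.txt")

# Learn embeddings from biased random walks (hyperparameters are ours, not the paper's).
n2v = Node2Vec(G, dimensions=128, walk_length=30, num_walks=10, workers=4)
model = n2v.fit(window=10, min_count=1)

user_vector = model.wv["12345"]  # per-user feature vector fed to the downstream classifier
```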
4.3 Profile Information Model
In the profile information model, we use both profile and friend list information to predict the
gender of the WO. The assumptions we make and the features used in this model are explained in
the following sections.
4.3.1 First Name–Centric Prediction. In profile information and wall interaction models, we
extract features that mostly require knowing the gender of the interacted users. However, we are
not able to know the gender attributes of all users who interact with the profile owner, especially in
wall activities. This is because any user who interacts with the profile owner may not reveal his/her
gender; moreover, his/her profile may not have been visited by our crawler even if he/she discloses
his/her gender. Note that our crawler discovered 2.35M users (see Table 2), but only 20K of these
users were visited. To handle this challenge, we use a simple function that is inspired by Tang
et al. [2011]. We call this function First Name–Centric Gender (FCG); it returns a first name–centric
value for a given first name according to its usage frequency among male and female users
on the OSN. Frequency represents how often a given first name is used by male and female users.
Therefore, we also define FNM and FNF values to represent the number of male and female users
Fig. 2. Example FCG calculation and Facebook posts. (a) FCG calculation. (b) A usual post with one of its
relative comments. (c) A direct post without relative comment(s).
who have the given first name. Notice that FNM and FNF values should be obtained from a global
online or offline external source of Turkish person names. However, we obtain these values by
practically counting frequencies of first names of users who revealed their gender on our crawled
snapshot (see Section 3).
As Facebook insists that its members use their real names, users mostly adhere to Facebook’s
real name policy. Therefore, the FCG function returns a value based on the number of male and
female users who have the same first name as the WO on the OSN. Let A be an arbitrary user,
and let FNM and FNF represent the number of male and female users who use the same first
name as A. Then, the FCG of user A is determined as in Equation (1), where 2 means that the
probability of being male is higher, and 1 represents that the probability of being female is higher.
If the FCG value is equal to 0, this means that the first name is a unisex name that cannot be used
for gender prediction.
FCG = \begin{cases} 2, & \text{if } FNM > FNF \\ 1, & \text{if } FNM < FNF \\ 0, & \text{otherwise} \end{cases} \qquad (1)
We use the FCG value both in profile information and wall interaction (see Section 4.4) models
by using the first token of the display name as the first name of the user at hand. However, we
use the second token of the display name as the first name if we have an abbreviation in the
first token. In the profile information model, we also use the FCG value based on the username to
extract additional features, because usernames are one of the most discriminative and fundamental
elements of OSNs for user identification [Malhotra et al. 2012]. Therefore, usernames can reflect
users' characteristics and may give a clue about the WO's gender. Notice that a username can also
be a completely numeric value (e.g., "123***89") assigned by Facebook. In this study, we ignore
such usernames and consider the usernames that have multiple tokens concatenated by dots. For
instance, “onder.coban.32”, “inan.ali”, and “ahmet.y.someone” would be valid usernames. In such a
case, the first token of the username is accepted as the first name (i.e., “onder”, “inan”, and “ahmet”)
of the WO and the FCG value is also calculated according to Equation (1).
An example calculation of the FCG value is shown in Figure 2(a), where the central node (i.e.,
a user) represents any user who does not reveal his/her gender. In this example, the FCG value of
this user is 1, as there are two other female users who use the same first name (i.e., “Elif”).
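A direct implementation of Equation (1) is straightforward; in the sketch below, the frequency tables fnm and fnf (first-name counts among male and female users in the crawled snapshot) are assumed to be precomputed dictionaries.

```python
def fcg(first_name: str, fnm: dict, fnf: dict) -> int:
    """First Name-Centric Gender value as defined in Equation (1)."""
    m = fnm.get(first_name, 0)   # FNM: males using this first name
    f = fnf.get(first_name, 0)   # FNF: females using this first name
    if m > f:
        return 2                 # more likely male
    if m < f:
        return 1                 # more likely female
    return 0                     # unisex or unknown name

# Example from Figure 2(a): two female users named "Elif", none male -> FCG = 1
print(fcg("elif", fnm={}, fnf={"elif": 2}))
```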
4.3.2 Family Members and Relationships. Facebook users frequently disclose their family members and private relationships. This information can give a clue about the gender of the WO. We
can explain our assumption as follows. Let A (i.e., the WO in this case) and B be two Facebook users
such that B reveals that A is his wife. If so, A is most likely female, and this information can be
accessed from the family member section of A and B (if B also accepts to reveal this information).
This is also true for other relationships revealed by the users. Therefore, such information in users’
profiles can be used to infer their genders. Note that we take a family member into consideration
only if his/her kinship type is observed in one of the following two groups, K1 and K2:
• K1 (Male oriented): {brother, uncle, husband, son, father, grandfather}
• K2 (Female oriented): {sister, aunt, wife, daughter, mother, grandmother}
The groups K1 and K2 given above are defined by considering the gender orientation of kinship
types. For instance, if B reveals that A is his/her family member with one of the kinship types in
K1, then A is most likely male. Using this idea, we use the number of male- and female-oriented
kinships of the WO to extract features. Additionally, we extract features based on the FCG values
of users who are revealed by the WO as a family member or private relationship.
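The kinship-based counting can be sketched as follows; the English kinship labels stand in for whatever labels the crawler actually records.

```python
K1 = {"brother", "uncle", "husband", "son", "father", "grandfather"}     # male oriented
K2 = {"sister", "aunt", "wife", "daughter", "mother", "grandmother"}     # female oriented

def kinship_counts(kinship_types):
    """Count male- and female-oriented kinships revealed for a user."""
    male = sum(1 for k in kinship_types if k in K1)
    female = sum(1 for k in kinship_types if k in K2)
    return male, female

print(kinship_counts(["wife", "mother", "son"]))   # -> (1, 2)
```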
4.3.3 Important Events. Facebook users generally share important events about their private
relationship, education, and so on. This information is included both on the user’s wall and in the
important events section of Facebook. As such information of users only contains important cases
(e.g., buying a car, moving home) in their lives, we believe that it can give some hints about the
gender of the WO. For instance, if an event from the WO’s profile says that the user gave birth
(“doğum yaptı”) or became a mother (“anne oldu”), then we can infer that the WO is most likely
female. Similarly, if the WO reveals that he or she is in a relationship with a female user (i.e., event
actor), then the WO is most likely male. Hence, we believe that using event information is useful
for the gender prediction task.
Based on the assumptions described in the preceding sections, we extracted 15 features that
include basic profile and friendship interaction information of the users. Table 5 presents a list of
these extracted features and their description used in the profile information model. Note that a
value of zero represents the cases that related information is not disclosed or could not be obtained
by our crawler.
4.4 Wall Interaction Model
This model makes its decision by using features that are extracted from the following:
• All activities written by the WO both on his/her own wall and any other user’s wall.
• Users who interact (e.g., by typing a message or tagging the WO) with the WO in these activities.
Before giving the details of the extracted features, we first introduce the basic assumptions behind the wall interaction model. As is seen from Figure 2(b), Facebook shows each post in a separate
block, and comments are written under the related posts. It is also possible to write a reply under
the related comment(s) of any post. If the WO’s wall is public, all activities are visible to anyone
and it is possible to extract what is written by the WO and his/her friends. In the wall interaction
model, by using these assumptions, we introduce and extract previously unused features from the
activities and interactions described above. We would like to emphasize that this is the first study
that extracts and uses features from wall interactions of users. In this model, we grouped features
into three sets, which are described in the following sections. Notice that we use first name–centric
prediction (see Section 4.3.1) as in the profile information model to extract features that require
knowing the gender of any user who interacts with the WO.
Table 5. Features and Their Definitions Used in the Profile Information Model

| No. | Definition | Value(s) |
|---|---|---|
| 1 | Relation status of the WO: single (1), in a relationship (2), engaged (3), married (4) | 0–4 |
| 2 | FCG value for a user who is in a relationship with the WO (based on the display name) | 0–2 |
| 3 | FCG value for the WO (based on the display name) | 0–2 |
| 4 | FCG value for the WO (based on the username) | 0–2 |
| 5 | Interest (or sexual orientation) of the WO: females (1), males (2), or males and females (3) | 0–3 |
| 6 | Number of users in the OSN who have male (M)- and female (F)-oriented kinships (see K1 and K2) and use the same first name as the WO: M > F (1), M < F (2) | 0–2 |
| 7 | If exists, the number of male (M)- and female (F)-oriented kinships (see K1 and K2) of the WO: M > F (1), M < F (2) | 0–2 |
| 8 | WO's total number of friends (t) | t |
| 9 | Total number of male (1) and female (2) friends of the WO (only considers the friends who reveal their gender) | 0–2 |
| 10 | Overall FCG value (based on display names) of all friends of the WO | 0–2 |
| 11 | Whether the WO has an event post that is related to his/her gender or not | 0, 1 |
| 12 | The WO has an event post that discloses he or she: is in a relationship (1), is married (2), is engaged (3), met someone (4) | 0, 1 |
| 13 | FCG value for the event actor (based on the display name), if exists | 0–2 |
| 14 | Whether the WO shared a biography or not | 0, 1 |
| 15 | Whether the WO shared a quotation or not | 0, 1 |
4.4.1 Feature Set Extracted from What Users Say to Each Other. In this group, we use a lexicon-based approach and quantify the number of gender-specific Turkish words used in wall activities.
The basic idea is that what the WO says to others and what others say to the WO can give some
clues about his/her gender. As is seen from Figure 3(a), if other users mostly use male words in
their activities on the WO's wall, the WO is most likely male. However, this approach has drawbacks.
First, it is hard to detect which activity directly targets the WO
under any activity on his/her wall. For instance, there may be many comments and replies under
any post from the WO. In this case, it is a challenging task to detect which activity written by
other users directly targets (i.e., written to answer the WO) the WO, as other users may answer
each other under an activity from the WO. Therefore, we assume an activity targets only the WO
when it satisfies one of the following two conditions:
• It is a direct post shared by another user. Direct posts are used when an information is
directly posted on another user’s wall. Therefore, the targeted user is clear in this case.
• It is the first activity under any activity from WO.
However, there are other difficulties. Words may be used in different senses, and it is possible
to produce different surface forms of a word from its root in Turkish. Additionally, users often
Fig. 3. (a) User A is most likely female with respect to both the number of interacted female users and
female words observed on his/her wall activities. (b) The word cloud obtained from simple surface forms
(without inflections) of Turkish gender words in our lexicon. Blue and pink colors represent male- and female-oriented words, respectively. Notice that this cloud does not include different surface forms and variations
(e.g., "emmi" → "amca (paternal uncle)") of given words.
write their messages with an informal language. These challenges make extracting features more
intricate in the context of the Turkish language. For instance, it is possible to observe different
gender words (see Figure 3(a)) even in activities that target the WO. The main reason for this
case stems from some challenges that need to be handled. Let A and C be two Facebook users
from Figure 3(a), where user A is female. The main reasons behind observing a male word (i.e., "abi
(brother)”) in a targeting activity from C (or F ) are as follows:
• Context: Even if it is the first activity located under any activity from A, user C may mention
another user (most likely male in this case) in this activity.
• Topic: The topic of the wall activity may be important, as it can affect what other users say.
For instance, let female A share a photograph of her son in a post. In this case, other users
mostly mention A’s son, not herself, and the majority of gender words will be male oriented
in this case.
• Word sense: User C may use the gender-oriented word in a non-gender-oriented sense. Let
us consider two activities, namely A1 and A2, that include the same word “kız (girl)”:
A1: “Bugün kız kardeşimin dogum günü (Today is my sister’s birthday)”
A2: “Kız kulesi küçük, şirin bir kuledir (Girl tower is a small, cozy tower)”
The word kız is used to show the gender of someone in A1, but the same is not true for the
word observed in A2 where it is the name of the tower. Detecting such cases requires an
effective word sense disambiguation process.
Handling these difficulties requires solving numerous challenging tasks, and they are outside
the scope of this article. Here, we extract features by assuming that the “majority of OSN users often
interact with the users of the same gender.” This assumption is based on the principle of homophily,
which is the tendency of people to form edges (i.e., friendships) with other people who have similar
traits [Korkmaz et al. 2020]. Homophily is a well-established phenomenon that has been observed
to occur frequently in OSNs [Khanam et al. 2020]. For instance, it is observed that users who share
visually similar images are more likely to have the same gender [Cheung and She 2017]. According
to another study, there is a substantial level of topical similarity between users who are close to each
other in a social network [Aiello et al. 2012]. It is also possible to derive many such examples, since
studying homophily is an attractive field of OSN research and can provide eminent insight into
other important sub-research fields, such as social tie prediction and link prediction.
To extract features with respect to this assumption, we first build a lexicon that consists of
lexically and semantically gender-oriented Turkish words, as well as their frequently used abbreviations and different surface forms [Doğan 2011] in Facebook. As Facebook users use informal
language and mostly write their messages without adhering to grammatical rules, we also include the most frequently observed variations, including forms with common spelling errors (e.g.,
“kardeş” (brother) → {“gardaş”, “birader”, “bilader”}) of these syntactically correct words. Figure 3(b)
depicts the word cloud of these syntactically correct words (different surface forms and variations
are not included to reduce complexity) used in this study. Afterward, we count male and female words in wall activities (by searching for terms in our lexicon using regular expressions) and
use these counts as features. In this phase, we apply the following steps to all activities:
• As mentioned previously, a gender-oriented word may also be used in a different sense.
Therefore, we also created an exceptional words lexicon that includes frequently used
but not gender-oriented words and phrases (e.g., “anneler”, “analı”, “kardeşlik”, “annesiz”,
“hanım köylü”). Then, we remove such words from the wall activity content, if observed,
to avoid biased results. Note that this lexicon is very small when compared to the gender-oriented words lexicon, as it is just based on our observations.
• ASCII conversion of Turkish vowels is applied to the text content obtained from the wall
activity to increase the matching rate with the lexicon. Note that users mostly use English
equivalents of the Turkish vowel letters (e.g., instead of writing “kızım”, users generally
type “kizim” , which requires conversion between letters i and ı).
• Text content in the wall activity is converted into tokens.
• Each token is normalized by removing repetitive letters to increase the matching rate with
the lexicon. This step is applied as described in Section 4.5.2.
• Each normalized token is searched in the lexicon by using regular expressions, and term
frequencies are counted.
Finally, we use the number of counted male- and female-oriented words to extract different
features in the wall interaction model. As stated before, we extract different features based on
whether the activity targets the WO or not. Notice that the fifth and sixth features in Table 6 allow
the model to apply a gender-based filter while counting gender-oriented words. This filter is used
so as to eliminate words such that gender orientation of the word conflicts with the targeted user’s
gender. For instance, let us consider again the example depicted in Figure 3(a). In this case, if the
WO is male, the gender filter will eliminate the words “abla”, “teyze”, and “yenge” , which are
female-oriented words.
4.4.2 Features Based on Interacted Users. Another assumption is that the number of male and
female users who write an activity on the WO’s wall may be used as a feature to determine the
gender of the WO. As shown in Figure 3(a), if the number of female users who write under any
activity of the WO is more than the number of male users, the WO is most likely female. Using this
idea, we extract several features based on the number of male and female users who interact with
the WO in activities. The extracted features in this group mainly depend on the following values:
(i) the number of male and female users tagged by the WO (this information is extracted from the
post titles, e.g., “Ahmet, Ali ve Ayşe ile birlikte film seyrediyor (Ahmet is watching a movie with Ali
and Ayşe)”), if they exist); (ii) the number of male and female users who interact with the WO both
on their wall and the WO’s wall; and (iii) quantities of the activity types.
4.4.3 Features Based on the WO’s Activities. This feature set only considers the activities written
by the WO. The extracted features are mainly based on the quantity and types of the WO’s own
activities on his/her wall.
Table 6. Features and Their Definitions Used in the Wall Interaction Model

| No. | Definition |
|---|---|
| 1, 2 | Number of male/female words in all activities written by the WO (2 attributes) |
| 3, 4 | Number of male/female words addressed to the WO (the target is the WO; 2 attributes) |
| 5, 6 | Number of male/female words addressed to the WO (the target is the WO and the term gender filter is applied; 2 attributes) |
| 7 | Average number of replies written under comments by the WO (includes comments by the WO on other users' walls) |
| 8 | Average number of replies for other comments written under posts by the WO |
| 9, 10 | Average number of comments/replies written under the posts by the WO (2 attributes) |
| 11, 12 | Number of all male/female users who write an answer under the posts and comments by the WO (whether or not the target is the WO; 2 attributes) |
| 13, 14 | Number of male/female words in activities written by other users under posts by the WO (whether or not the target is the WO; 2 attributes) |
| 15 | Total number of emoticons (i.e., like, sad, angry, wow, haha, and love) found in posts by the WO |
| 16, 17 | Number of male/female users who are labeled/tagged (e.g., "Ali is together with Ahmet and Mehmet") by the WO in his/her post title (2 attributes) |
| 18–20 | Total number of posts/comments/replies written by the WO (3 attributes) |
| 21–24 | Number of posts by the WO that include a photograph/user-generated text/video/memory (4 attributes) |
| 25–27 | Number of posts by the WO that include a location/survey/link (3 attributes) |
| 28, 29 | Number of posts by the WO that include information about his/her playing a game/important event (2 attributes) |
| 30 | Number of direct posts (see Figure 2(c)) on the WO's wall; such posts are often used to give a direct message or publish a greeting (e.g., "Happy birthday brother") on another user's wall on Facebook |
| 31, 32 | Number of male/female users who are responded to by the WO under their comments and replies (2 attributes) |
| 33, 34 | Number of male/female users who write an activity under activities by the WO (the target is the WO; 2 attributes) |
Using these three groups of features as explained above, we obtained a list of 34 features that are
used to infer the gender of the WO. Table 6 summarizes all of these features and their definitions
used in the wall interaction model.
4.5 Wall Content Model
The wall content model uses a content-based approach and extracts features from the activities to perform gender detection with the help of classical text categorization techniques
[Gupta and Lehal 2009]. The basic idea behind this model is similar to our assumptions used in
the wall interaction model. We assume that male and female users use different words, phrases,
emoticons, and so on. In addition, activities that are written by the WO and targeting the WO
may help to infer his/her gender. In this model, we use WD1, WD2, and WD3 datasets and extract
word- and character-level information to represent activities by using different approaches, which
are described as follows:
• Word-level information: We use classical bow as features. We also use distributed representations of words (i.e., word embeddings) that are obtained by the word2vec model.
• Character-level information: We use character n-grams, which is also known as the bag of
character n-grams.
In text categorization, preprocessing directly affects the accuracy of the results because the applied preprocessing steps determine the set of features extracted. In addition, OSN users generally
use informal language, and their writings mostly include grammar and syntax errors. As such,
this model, like the wall interaction model, is challenging in the context of the Turkish language. In the
wall content model, we applied two preprocessing methods: basic preprocessing and linguistic
preprocessing.
4.5.1 Basic Preprocessing. We apply this type of preprocessing only when the character-level
information is extracted from the text content. In basic preprocessing, we just remove all punctuation marks and digits from the textual data and apply lowercase conversion.
4.5.2 Linguistic Preprocessing. We apply this preprocessing only when the word-level information is used to represent texts. As word-level features often lead to high-dimensional feature space,
linguistic preprocessing is mostly applied to reduce the number of features and extract more meaningful features for text categorization. Our linguistic preprocessing includes the following steps:
• De-asciifying: Users may write words without Turkish characters, and this prevents us from
applying linguistic tasks, as our NLP tool cannot recognize these words as valid Turkish
words. Hence, we apply de-asciifying (e.g., “aldik” is converted into “aldık”) at first.
• Stemming: In Turkish, it is possible to produce many surface forms from a single root word.
In this step, we reduce words to their roots (e.g., “geldi” is stemmed as “gel”).
• Stopwords removal: We remove stopwords introduced in Lucene API.2
• Normalization: We reduce the number of repetitive letters with more than two occurrences
until the word is recognized as a Turkish word by Zemberek.3 If the word is still not recognized
as Turkish even after the repetitive letters are reduced to two, we keep this form of the word,
thereby allowing words with double consecutive letters (e.g., “kardeşiiiiimm” → “kardeşim”,
“gardaaaaş” → “gardaaş”).
Note that we use Zemberek 2.0 [Akın and Akın 2007], which is an open source Turkish NLP
tool to perform all linguistic analyses.
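As an illustration only, the following minimal Python sketch implements the repeated-letter normalization rule described above; the is_turkish() callback is a hypothetical stand-in for the Zemberek check, and the sketch is not the exact implementation used in this work.

import re

def normalize_repeats(word, is_turkish):
    # Shrink runs of three or more identical letters one letter at a time; if a fully
    # collapsed candidate is recognized as Turkish, return it, otherwise fall back to
    # the form with at most two consecutive letters.
    current = word
    while re.search(r"(.)\1\1", current):
        current = re.sub(r"(.)\1\1", r"\1\1", current, count=1)
        candidate = re.sub(r"(.)\1+", r"\1", current)
        if is_turkish(candidate):
            return candidate
    return current

lexicon = {"kardeşim", "geldi"}  # toy recognizer standing in for Zemberek
print(normalize_repeats("kardeşiiiiimm", lambda w: w in lexicon))  # kardeşim
print(normalize_repeats("gardaaaaş", lambda w: w in lexicon))      # gardaaş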
4.5.3 Feature Extraction. We consider each word as a feature in the bow model by ignoring its
position in the text. We also use character bi-grams and tri-grams as features in the n-gram model.
In the bow and n-gram models, we apply term weighting after the feature extraction with the
help of the term frequency (tf), binary, and term frequency-inverse document frequency (tf*idf)
2 http://lucene.apache.org/.
3 https://code.google.com/archive/p/zemberek/.
Table 7. Obtaining the Document Vector by Averaging Its Word Vectors [Lin et al. 2015]

Word            d1       d2     d3      ...   d300
room            –1,102   –202   –668    ...   –646
very            –6       355    –605    ...   –460
clean           –287     –343   1,077   ...   –232
neat            –101     –399   –274    ...   –986
Average Value   –374     –148   –118    ...   –581
methods, which are well-known traditional unsupervised weighting schemes employed in text
categorization [Salton and Buckley 1988]. These schemes can be formulated as follows:
W_Binary(t, d_i) = 1 if d_i contains term t, 0 otherwise,    (2)

W_Tf*Idf(t, d_i) = Tf(t, d_i) · log( N / |{d_i ∈ D : t ∈ d_i}| ),    (3)

where t is a term and d_i is the document that is processed. D and N represent the document collection and number of documents in the collection, respectively. Tf(t, d_i) also corresponds to the observed frequency of term t in document d_i.
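As an illustration only, the sketch below computes the same three weighting schemes with scikit-learn over a toy corpus; note that scikit-learn's tf*idf uses a smoothed variant of Equation (3), and this tool choice is an assumption rather than the setup used in this work.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["kardeşim çok iyi", "abla bugün geldi", "kardeşim bugün geldi"]  # toy wall activities

binary_vec = CountVectorizer(binary=True)   # Equation (2): 1 if the term occurs, else 0
X_binary = binary_vec.fit_transform(docs)

tf_vec = CountVectorizer()                  # raw term frequencies (tf)
X_tf = tf_vec.fit_transform(docs)

tfidf_vec = TfidfVectorizer()               # tf*idf weights (smoothed form of Equation (3))
X_tfidf = tfidf_vec.fit_transform(docs)

print(binary_vec.get_feature_names_out())
print(X_binary.toarray())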
In the distributed representation model, we represent each word with a real-valued and low-dimensional vector that is obtained by the word embedding method that is trained on our
textual data [Manchanda and Karypis 2018]. The Word2Vec model is one of the word embedding methods that is based on a neural network structure [Mikolov et al. 2013a, 2013b].
This model uses two different architectures, namely continuous bag of words (CBOW) and
skip-gram, to learn a vector representation of words. The CBOW architecture predicts the
current word based on its context, whereas the skip-gram predicts surrounding words for
the given current word [Zhang et al. 2015]. Unlike traditional feature models, the Word2Vec
model does not lose or ignore the ordering and semantics of the words. Therefore, this
model is quite popular and often shows superior performance in the text categorization field.
In this work, we use the Word2Vec model to build distributed representation of the wall
activities.
4.5.4 Text Representation. Unlike the previous models, we use both DL and traditional ML
classifiers in the wall content model. On the ML side, we employ classical content features to
form each instance vector directly. However, we apply a simple averaging approach to represent text instances with distributed representations of words (i.e., word embeddings). This is because the word2vec model is an unsupervised approach, and each text has to be converted into
a single feature vector so as to be used with supervised ML techniques. In this task, we apply
the averaging method [Hayran and Sert 2017; Lin et al. 2015] on word embeddings as exemplified in Table 7. Note that for convenient display, the value of each dimension is multiplied
by 10,000 and indicated by d_i (i = 1, . . . , 300). In this technique, we represent each text document by taking the average of all word vectors in the document. If the Word2Vec model does
not have a vector representation for a word, it is represented by a zero vector. For DL methods,
however, we convert each text document into an n × k matrix, where n represents the number
of unique words in the vocabulary, whereas k corresponds to the layer size of word embeddings
[Kim 2014].
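A minimal sketch of the averaging scheme described above is given below, using the first three dimensions of the example in Table 7 (display values divided back by 10,000); the helper function is illustrative, not the code used in this work.

import numpy as np

word_vectors = {                                    # first three dimensions of Table 7
    "room":  np.array([-0.1102, -0.0202, -0.0668]),
    "very":  np.array([-0.0006,  0.0355, -0.0605]),
    "clean": np.array([-0.0287, -0.0343,  0.1077]),
    "neat":  np.array([-0.0101, -0.0399, -0.0274]),
}

def doc_vector(tokens, vectors, dim=3):
    # Average the vectors of the observed words; out-of-vocabulary words count as zero vectors.
    vecs = [vectors.get(t, np.zeros(dim)) for t in tokens]
    return np.mean(vecs, axis=0)

print(doc_vector(["room", "very", "clean", "neat"], word_vectors))
# approximately [-0.0374, -0.0147, -0.0118], matching the Average Value row of Table 7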
Fig. 4. (a) General CNN architecture for NLP [Kim 2014]. (b) Basic RNN architecture before (left) and after
(right) the unfolding [Zhang et al. 2018].
4.6 Classification
4.6.1 Traditional ML Classifiers. We employ ML classifiers to build our gender prediction models. On the ML side, we use Weka’s4 well-known classifiers, including NB, Naive Bayes Multinomial (NBM), SVM by importing the LibSVM5 package, Decision Tree (C4.5), Random Forest
(RF), and k-Nearest Neighbor (IBk).
4.6.2 DL Classifiers. In this work, we additionally employ DL classifiers in the wall content
model. On the DL side, we use CNN and RNN, which are two main architectures used in text
categorization and content-based gender detection [Bsir and Zrigui 2018a, 2018b; Kowsari et al.
2020]. Additionally, we apply the recently proposed BERT [Devlin et al. 2019] language model to
predict gender attributes of users.
A general CNN architecture for NLP tasks is shown in Figure 4(a), where each text with length
of n words is represented by k-dimensional word embeddings. Input is prepared by concatenating
vectors of each word in the text, and then the convolution step(s) involving filter(s) in a window
of h words is(are) applied to generate new features. Next, a max-over-time pooling is applied over
the feature map obtained in the previous step so as to get the maximum value as the feature for the
related filter. In this work, we build and utilize a CNN structure that is almost identical to the CNN
used in the work of Kim [2014] with two variations. These CNN variants are listed as follows:
• CNN-static: Uses pre-trained word vectors that are not modified during the training.
• CNN-non-static: Same as the preceding model, except word vectors are updated during the
training.
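A compact keras sketch of such an architecture is given below, with the filter sizes (3, 4, 5), 128 filters, and dropout rate of 0.5 reported in Section 5.5.2; the vocabulary size and sequence length are placeholders, and the trainable flag switches between the two variants listed above.

from tensorflow.keras import layers, models

vocab_size, seq_len, emb_dim = 20000, 100, 50        # placeholder sizes

def kim_style_cnn(embedding_matrix=None, trainable=True):
    # trainable=False with pre-trained vectors corresponds to CNN-static;
    # trainable=True corresponds to CNN-non-static.
    inp = layers.Input(shape=(seq_len,))
    emb = layers.Embedding(vocab_size, emb_dim,
                           weights=None if embedding_matrix is None else [embedding_matrix],
                           trainable=trainable)(inp)
    pooled = []
    for h in (3, 4, 5):                              # filter windows of h words
        conv = layers.Conv1D(128, h, activation="relu")(emb)
        pooled.append(layers.GlobalMaxPooling1D()(conv))   # max-over-time pooling
    x = layers.Dropout(0.5)(layers.Concatenate()(pooled))
    out = layers.Dense(2, activation="softmax")(x)   # male/female
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model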
A basic architecture of RNN, however, is depicted in Figure 4(b), which shows the RNN architecture before and after the unfolding where x t and ht represent the input and hidden state vectors,
respectively at timestep t. Unlike traditional neural networks, inputs and outputs are not independent of each other in RNNs. They are called recurrent because they perform the same task for
every element of the sequence with the output being dependent on the previous computations.
There are two commonly used RNN architectures, namely LSTM and GRU, which were developed
to overcome the exploding and vanishing gradient problem observed in basic RNN. The hidden
state of the LSTM is computed by the following equations [Karpathy et al. 2015; Lipton et al. 2015;
Zhang et al. 2018]:
i_t = σ(W_i · x_t + U_i · h_{t−1} + b_i),    (4)
f_t = σ(W_f · x_t + U_f · h_{t−1} + b_f),    (5)
4 https://www.cs.waikato.ac.nz/ml/weka/.
5 https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
C̃_t = tanh(W_c · x_t + U_c · h_{t−1} + b_c),    (6)
C_t = f_t × C_{t−1} + i_t × C̃_t,    (7)
o_t = σ(W_o · x_t + U_o · h_{t−1} + b_o),    (8)
h_t = o_t × tanh(C_t),    (9)
where i_t, f_t, and o_t are the input, forget, and output gates, respectively. x_t and h_t are the input vector and hidden layer value of the cell at timestep t. C_t, C̃_t, and C_{t−1} represent the current, candidate, and previous cell states, respectively, whereas σ (i.e., sigmoid) and tanh are activation functions. Similarly, how a GRU cell is updated at each timestep t is computed by the following equations [Karpathy et al. 2015; Zhang et al. 2018]:
z_t = σ(W_z · x_t + U_z · h_{t−1} + b_z),    (10)
r_t = σ(W_r · x_t + U_r · h_{t−1} + b_r),    (11)
h̃_t = σ(W_h · x_t + U_h · (r_t × h_{t−1}) + b_h),    (12)
h_t = (1 − z_t) × h_{t−1} + z_t × h̃_t,    (13)
where r_t and z_t represent the vectors of the reset and update gates, differently from the variables used above. In all equations given above, × represents the element-wise (i.e., Hadamard) product of two
matrices/vectors. Notice that unidirectional RNN layers can be wrapped in bidirectional layers
as well by allowing bidirectional connections in the hidden layer [Zhang et al. 2018]. In bidirectional RNNs, the forward RNN reads the input sequences from start to end, whereas the backward
RNN reads it from end to start [Lipton et al. 2015]. The reader can refer to other works [Lipton
et al. 2015; Schuster and Paliwal 1997] for more detailed information about bidirectional RNN
architectures.
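A keras sketch of a stacked bidirectional LSTM classifier in this spirit is shown below, loosely following the tuned configuration reported in Section 5.5.2 (three bidirectional LSTM layers with 16 units each, dropout rate of 0.2, rmsprop); the remaining details are assumptions.

from tensorflow.keras import layers, models

vocab_size, emb_dim = 20000, 50                       # placeholder sizes

model = models.Sequential([
    layers.Embedding(vocab_size, emb_dim),
    layers.Bidirectional(layers.LSTM(16, return_sequences=True)),   # forward and backward reads
    layers.Bidirectional(layers.LSTM(16, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(16)),
    layers.Dropout(0.2),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"])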
The BERT technique is actually a language model that is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both the left and right context in all
layers [Devlin et al. 2019]. Whereas previous word representation models (e.g., word2vec [Mikolov
et al. 2013a, 2013b]) focus on learning context-independent word representations, BERT focuses
on learning context-dependent word representations. BERT learns to understand the relationship
between words with the help of the masked language model (MLM), which randomly masks
words in a sequence and hence can be used for learning bidirectional representations [Lee et al.
2020]. The BERT architecture is based on bidirectional transformers, and it comes with the following two pre-trained general types:
• BERT BASE : 12 layers (i.e., transformer blocks), 12 attention heads, and 110 million parameters.
• BERT LARGE : 24 layers, 16 attention heads, and 340 million parameters.
The authors of the BERT model trained their models for two NLP tasks including MLM and
next sentence prediction. The BERT model can be applied by fine tuning the pre-trained language representations to downstream tasks, and it obtains state-of-the-art performance on most
NLP tasks, including question answering, text categorization, sentiment analysis, and so on. We
refer readers to the work of Devlin et al. [2019] for a more detailed description of the BERT
model.
In this work, we build our CNN and RNN architectures with the help of keras,6 which is a
DL library in Python. To employ the BERT model, we use two pre-trained models from hugging
face7 (an open source provider of NLP technologies) and build our own BERT model with the help
of the keras library.
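A rough sketch of the fine-tuning route through the transformers library is shown below, using one of the pre-trained models named in Section 5.5.2; the toy texts, label coding, and training hyperparameters are assumptions, not the settings used in this work.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "dbmdz/bert-base-turkish-128k-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert_clf = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["kardeşim nasılsın", "abla çok güzel olmuş"]      # toy wall activities
labels = tf.constant([0, 1])                               # 0 = male, 1 = female (placeholder coding)

enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="tf")
bert_clf.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
                 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 metrics=["accuracy"])
bert_clf.fit(dict(enc), labels, epochs=2)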
4.6.3 Performance Evaluation. In this article, all ML classifiers are tuned with default parameters and all results (except for the dummy classifier in the naive model) are obtained by fivefold
cross validation. DL algorithms used in this work are also configured to run with fivefold cross
validation so as to make a fair comparison. The performance evaluation for all ML and DL methods
is done using several well-known evaluation metrics that are based on values from the confusion
matrix. True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN)
values are used to compare the actual and predicted class labels in the confusion matrix [Tripathy
et al. 2016]. Here, we use accuracy (Acc for short) for performance evaluation of both ML and DL
classifiers. This score is based on the TP, TN, FP, and FN values, and it is calculated as follows:
Acc = (TP + TN) / N,    (14)

where N is the sum of all predictions such that N = TP + TN + FP + FN.
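For reference, a minimal sketch of a fivefold evaluation with accuracy scoring is given below, with a scikit-learn classifier and synthetic data standing in for the Weka and keras setups used in this work.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=42)   # toy user vectors

scores = cross_val_score(RandomForestClassifier(random_state=42), X, y,
                         cv=5, scoring="accuracy")    # Acc = (TP + TN) / N on each fold
print(scores.mean())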
4.7 Measuring Feature Importance
To measure feature importance, we use the mutual information (MI) method, which is also
widely used for the purpose of feature selection. Given two random variables x and y, their MI is defined in terms of their probability density functions p(x), p(y), and p(x, y) [Peng et al. 2005]:

MI(x; y) = ∬ p(x, y) log( p(x, y) / (p(x) p(y)) ) dx dy.    (15)
MI is a measure between two random variables that quantifies the amount of information obtained
about one random variable through the other random variable. In this work, we use MI to measure dependence between features and user gender by using its implementation in yellowbrick8
Python package.
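A minimal sketch of this measurement with scikit-learn's mutual_info_classif is shown below (the article uses the yellowbrick visualizer for the same purpose); the toy matrix and feature names are placeholders.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

# rows are users, columns are handcrafted features (e.g., counts from Table 6)
X = np.array([[5, 0, 3], [0, 4, 1], [6, 1, 2], [1, 5, 0]])
y = np.array([0, 1, 0, 1])                      # 0 = male, 1 = female (placeholder coding)

mi = mutual_info_classif(X, y, discrete_features=True, random_state=42)
for name, score in zip(["feat_1", "feat_2", "feat_3"], mi):
    print(name, round(score, 3))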
5 EXPERIMENTAL RESULTS
In this section, we present our experimental results obtained by using both stand-alone and an
ensemble of our gender prediction models in the following sections.
5.1 Results of the Naive Model
We first obtained the experimental results by using the naive model (i.e., dummy classifier), which
gives the baseline result to make the performance evaluation for the other models. As expected,
the naive model achieves an accuracy of 0.504 on our dataset.
5.2 Results of the Network Structure Model
In the network structure model, we run node2vec to learn feature representation for every node
in our network that includes 20K nodes and 201K undirected edges (see Table 2). In this phase,
we used default parameter settings provided by Grover and Leskovec [2016], applying an efficient
SGD optimization process. Specifically, we set the number of random walks (r ) to 10, length of
walk (l) to 80, and context size (k) to 10, and used other parameters with their default values. We
6 https://keras.io/.
7 https://huggingface.co/.
8 https://www.scikit-yb.org/en/latest/index.html.
Table 8. Accuracy Values of the Network Structure Model with Respect to Different Numbers of Dimensions and Classifiers

Dimension (d)   SVM     IBk     C4.5    NB      RF
16              0.765   0.751   0.738   0.689   0.770
32              0.769   0.745   0.713   0.662   0.773
64              0.778   0.751   0.705   0.642   0.772
128             0.792   0.750   0.702   0.630   0.769
256             0.791   0.741   0.705   0.622   0.763
Table 9. Accuracy Values of the Profile Information and Wall Interaction Models with Respect to Different Classifiers

Model                  NB      NBM     SVM     C4.5    RF      IBk
Profile information    0.974   0.616   0.918   0.980   0.981   0.973
Wall interaction       0.688   0.917   0.574   0.962   0.966   0.828
also utilized different numbers of dimensions (d), where d ∈ {16, 32, 64, 128, 256}, so as to explore
the effects of the number of dimensions on the results. After obtaining the feature representations,
we associated each user’s node vector with his/her gender. Finally, we fed the classification-ready
data into different classifiers and obtained results with respect to different numbers of dimensions.
To evaluate this model, we applied well-known traditional ML classifiers, namely NB, NBM, SVM,
C4.5, RF, and IBk, as described in Section 4.6. Table 8 presents our results, which show that the
best accuracy (i.e., 0.792) is obtained with the SVM classifier when the number of dimensions is
equal to 128.
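A rough sketch of this step with the community node2vec package is given below (the reference implementation of Grover and Leskovec is used in the article); the toy graph is a placeholder for the 20K-node friendship network.

import networkx as nx
from node2vec import Node2Vec

G = nx.karate_club_graph()                       # toy undirected graph

# r = 10 walks per node, walk length l = 80, context size k = 10, d = 128 dimensions
n2v = Node2Vec(G, dimensions=128, walk_length=80, num_walks=10, workers=2)
model = n2v.fit(window=10, min_count=1)

print(model.wv["0"].shape)                       # (128,); the package stores node ids as strings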
5.3 Results of the Profile Information Model
Similar to the network structure model, we evaluated the profile information model by using traditional ML classifiers. As is seen from Table 9, the results of classifiers are slightly different from
each other. Nevertheless, the best result is obtained by the RF classifier, whereas the worst is produced by the NBM classifier, which is a specific instance of NB classifier and uses a multinomial
distribution for each feature. NBM provides the worst result in this model as it works well only for
data that can easily be converted into frequency values, such as word counts in text. In the profile
information model, we achieved an accuracy of 0.981 as the best value, whereas our baseline and
network structure models achieve a value of 0.504 and 0.792 accuracy, respectively.
5.4 Results of the Wall Interaction Model
Next, we performed experiments on the wall interaction model that extracts features (e.g., writer,
gender of writer, number of total posts) from the activities of the WO’s wall. It also employs a
lexicon-based approach that makes use of the male/female words observed in these activities to
extract features. Notice that in this step, only female/male words are considered, not all content
of activities used for feature extraction. As this model is based on the homophily phenomenon,
we first performed a simple analysis so as to explore correctness of our assumption that users
often interact with users of the same gender. We obtained an average number of male and female
Table 10. Number of Unique Features Extracted with Respect to the Document Set, Feature Model, and Preprocessing (PRP) Method for the Wall Content Model

Feature Model   PRP   WD1       WD2       WD3
bow             ✗     223,577   594,695   480,009
bow             ✓     24,117    62,295    48,651
bi-gram         ✗     990       1,020     1,021
tri-gram        ✗     11,222    12,474    11,974
friends of users in our user set. Our results showed that male users have 19.67 male and 3.72
female friends on average. Female users, in turn, have 9.67 female and 5.54 male friends on average.
Our feature dependency analysis (see Section 5.6) also shows that the most discriminative features
in this model are based on the numbers of male/female users with whom the WO interacts. These results
show that our assumption is based on a correct basis, and users often form edges and interact with
others of the same gender.
Afterward, we applied the well-known classifiers as in the profile information model to infer the
gender of the WO. As is seen from Table 9, the most successful result is again obtained by the RF
classifier. However, the worst result is obtained by the SVM classifier. NBM has better performance
in the wall interaction model with respect to its performance in the profile information model. In
the wall interaction model, we achieved the best result at an accuracy of 0.966 as compared to
0.504, 0.792, and 0.981 obtained by the naive, network structure, and profile information models,
respectively. When an overall comparison is made, the profile information model has the best performance among the three models except for the NBM classifier. As we use lexicon-based features
that are extracted from the wall activities in the wall interaction model, the NBM classifier has
better performance for this model.
5.5 Results of the Wall Content Model
The wall content model utilizes text categorization techniques and uses word-level (i.e., bow and
word embeddings) and character-level (i.e., n-grams) information to extract features from content
of the wall activities of users. In this model, we used wall datasets (i.e., WD1, WD2, and WD3) so
as to explore which type of activity contains more valuable information to predict the gender of
Facebook users when content-based analysis is done. We performed experiments using the wall
content model in two different ways, namely by applying traditional ML and DL methods.
5.5.1 Results of Traditional ML Methods. In this phase, we first applied classical text categorization steps on WD1, WD2, and WD3 document sets, including feature extraction, term weighting,
and classification. To reach our goal, we extracted word and n-gram features (with different n values) to represent documents. Note that before feature extraction, we applied preprocessing (PRP
for short) in two different ways (i.e., basic and linguistic) in all cases, and linguistic preprocessing
was only applied on word-level models (bow and word embeddings). For the wall content model,
results of the task after linguistic preprocessing (if applied) are shown with ✓, whereas results
for basic preprocessing are shown with ✗. After the preprocessing, we performed feature extraction, and the number of features obtained are presented in Table 10. According to this table, in
the content-based analysis with classical text categorization techniques, we have to deal with very
high-dimensional feature spaces, especially in the bow model without linguistic preprocessing. The
Table 11. Accuracy Values of the Wall Content Model Using Bow and n-Gram Methods with Various Combinations of Preprocessing (PRP), Classifiers (CLS), and Weighting Schemes

                            WD1                        WD2                        WD3
CLS    Feature Set   PRP    Binary  Tf      Tf*Idf     Binary  Tf      Tf*Idf     Binary  Tf      Tf*Idf
NB     bow           ✓      0.634   0.629   0.640      0.596   0.597   0.597      0.629   0.592   0.592
NB     bi-gram       ✗      0.597   0.629   0.628      0.607   0.642   0.646      0.627   0.683   0.689
NB     tri-gram      ✗      0.650   0.684   0.682      0.643   0.663   0.662      0.683   0.716   0.716
NBM    bow           ✓      0.830   0.862   0.843      0.823   0.826   0.820      0.823   0.836   0.828
NBM    bi-gram       ✗      0.736   0.790   0.750      0.691   0.713   0.718      0.754   0.797   0.771
NBM    tri-gram      ✗      0.821   0.830   0.822      0.750   0.737   0.757      0.811   0.815   0.815
SVM    bow           ✓      0.613   0.718   0.741      0.585   0.690   0.696      0.621   0.720   0.748
SVM    bi-gram       ✗      0.806   0.811   0.773      0.759   0.758   0.737      0.807   0.803   0.781
SVM    tri-gram      ✗      0.819   0.816   0.844      0.764   0.777   0.786      0.790   0.794   0.823
C4.5   bow           ✓      0.804   0.806   0.806      0.774   0.787   0.765      0.804   0.804   0.802
C4.5   bi-gram       ✗      0.682   0.694   0.673      0.652   0.650   0.654      0.627   0.696   0.688
C4.5   tri-gram      ✗      0.760   0.768   0.764      0.682   0.687   0.682      0.747   0.747   0.749
RF     bow           ✓      0.740   0.750   0.743      0.687   0.707   0.707      0.741   0.756   0.745
RF     bi-gram       ✗      0.671   0.714   0.675      0.630   0.656   0.644      0.703   0.730   0.706
RF     tri-gram      ✗      0.708   0.720   0.715      0.639   0.661   0.659      0.712   0.724   0.729
IBk    bow           ✓      0.644   0.622   0.592      0.659   0.670   0.686      0.692   0.664   0.690
IBk    bi-gram       ✗      0.643   0.662   0.596      0.608   0.648   0.585      0.667   0.684   0.610
IBk    tri-gram      ✗      0.631   0.610   0.607      0.626   0.614   0.606      0.662   0.646   0.642
number of features in the WD2 documents set is higher than the other sets as it contains activity
contents written by the WO.
Following the feature extraction, we applied term weighting using three different schemes (binary, tf, and tf*idf), and then performed classification experiments on the three document sets. As is seen from Table 11, we obtained our highest results as 0.862, 0.826, and 0.836 on
WD1, WD2, and WD3 sets, respectively. The most successful classifier is generally NBM as we
do text classification, whereas the worst is IBk. Among the weighting schemes, tf often produces
better results than the others. In the n-gram model, there is a direct proportion between length of
the n-grams and the classification performance. It is observed that tri-grams generally outperform
the others. In all cases, the best result is obtained on the WD1 dataset that includes content of
activities that are targeting the WO. These results in the first step show that classical feature models fall behind both the profile information and wall interaction models with respect to accuracy
values, which are 0.981 and 0.966, respectively.
Second, we used word vectors (i.e., word embeddings) to represent documents in the wall content model. To create distributed representations of our document sets (i.e., WD1, WD2, and WD3),
we first trained our Word2Vec model on a huge training collection that contains all activities
(i.e., 5.9M activities from collected data including 2.76M posts, 2.74M comments, and 466K replies
crawled from the 20K users’ wall (see Table 2). We employed the DL4J9 library to build the word
vectors and applied different layer sizes n where n ∈ {50, 100, 200, 300, 500} to investigate its effects
on results. The selected parameter values are as follows: minimum word frequency is 1, learning
rate equals 0.025, iterations is set to 5, window size is equal to 5, and epochs is set to 5. Note that
9 https://deeplearning4j.org/.
Table 12. Accuracy Values of the Wall Content Model Using Word Embeddings with Various Combinations of Preprocessing (PRP), Layer Size, and Classifiers (CLS)

              WD1 (layer size)                       WD2 (layer size)                       WD3 (layer size)
CLS    PRP    50      100     200     300     500    50      100     200     300     500    50      100     200     300     500
NB     ✓      0.744   0.732   0.716   0.710   0.692  0.723   0.736   0.696   0.679   0.657  0.764   0.761   0.736   0.715   0.691
NB     ✗      0.736   0.722   0.711   0.706   0.699  0.737   0.734   0.714   0.700   0.680  0.796   0.784   0.772   0.763   0.741
SVM    ✓      0.839   0.823   0.795   0.771   0.725  0.822   0.797   0.761   0.730   0.688  0.842   0.833   0.812   0.798   0.775
SVM    ✗      0.842   0.833   0.807   0.785   0.749  0.826   0.808   0.774   0.752   0.716  0.850   0.842   0.824   0.813   0.794
IBk    ✓      0.778   0.786   0.789   0.793   0.791  0.774   0.772   0.778   0.781   0.776  0.813   0.803   0.810   0.815   0.810
IBk    ✗      0.789   0.791   0.796   0.791   0.793  0.778   0.775   0.780   0.784   0.782  0.808   0.820   0.820   0.818   0.820
C4.5   ✓      0.765   0.756   0.753   0.751   0.743  0.753   0.746   0.730   0.740   0.740  0.783   0.780   0.773   0.773   0.761
C4.5   ✗      0.771   0.762   0.750   0.740   0.739  0.755   0.750   0.730   0.727   0.724  0.793   0.781   0.774   0.759   0.763
RF     ✓      0.810   0.809   0.801   0.800   0.783  0.797   0.798   0.793   0.784   0.783  0.830   0.825   0.820   0.820   0.811
RF     ✗      0.814   0.801   0.805   0.789   0.787  0.800   0.790   0.781   0.784   0.779  0.834   0.832   0.822   0.810   0.805
we used different values from their typical values for epochs and iterations to improve training
success. We selected the number of epochs as lower than 15 since greater values may cause overfitting [Tezgider et al. 2018].
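The corresponding training call is sketched below with gensim instead of the DL4J library used here; parameter names differ slightly across libraries, and the CBOW/skip-gram choice is left at gensim's default because the architecture actually trained is not stated above.

from gensim.models import Word2Vec

# tokenized wall activities; the real collection contains about 5.9M activities
sentences = [["kardeşim", "çok", "iyi"], ["abla", "bugün", "geldi"], ["kardeşim", "geldi"]]

w2v = Word2Vec(sentences,
               vector_size=50,      # layer size (also tried: 100, 200, 300, 500)
               window=5,            # window size 5
               min_count=1,         # minimum word frequency 1
               alpha=0.025,         # learning rate 0.025
               epochs=5)            # 5 epochs
w2v.save("facebook_w2v_50.model")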
We built two distributed models for each of the document sets depending on the preprocessing
option chosen. Then, we used the mean vector of the observed words’ vector to represent each
document in each set. Finally, we performed classification experiments and obtained results under
different circumstances as presented in Table 12. In this experiment, we were unable to obtain
results with the NBM classifier as distributed representations of our texts contain negative values.10
As is seen from Table 12, the most successful classifier is SVM, whereas the worst one is generally
NB. The highest classification accuracies for each of the three sets are obtained when the layer size
is set to 50, and it is observed that classification performance often decreases while the layer size
increases. The Word2Vec model is not sensitive to the preprocessing option used, as the results
that are obtained with basic preprocessing are slightly different from the results obtained with
linguistic preprocessing in all cases.
The best result observed is an accuracy of 0.850 on the WD3 document set, which shows that
the wall content model cannot outperform the profile information and wall interaction models
again. However, the Word2Vec model is very successful when compared to the classical feature
models used in the first experiment of the wall content model. The distributed text representation
method achieves comparable results using only 50-dimensional document vectors, whereas
classical feature models need tens of thousands of features to achieve a similar result. Note that
the Word2Vec model outperforms classical feature models on the WD3 dataset, and it produces
slightly different results on WD1 and WD2 sets as well. Figure 5 lists the top 10 nearest words
to the male-oriented query term “abi” and the female-oriented term “abla” with respect to the
cosine similarity measure. The word vectors were computed from the WD3 dataset using basic
preprocessing and with layer size set to 50. This figure shows that the Word2Vec model is very
successful in clustering semantically similar words.
5.5.2 Results of DL Methods. In this phase, we first used CNN and RNN algorithms, which
automatically discover features from word embeddings. In experiments of this section, we employed word embeddings with layer size 50 that produced better results for ML-based classifiers in
10 The NBM classifier cannot handle negative feature values.
Fig. 5. Top 10 nearest words for query terms “abi” and “abla” with respect to the cosine similarity.
Table 13. Accuracy Values for CNN with Respect to the Wall Datasets

CLS   Variant       WD1     WD2     WD3
CNN   Static        0.852   0.850   0.866
CNN   Non-static    0.919   0.879   0.913
previous sections. To run DL algorithms, we used Google Colab,11 which is a free-of-charge service and enables the research community to use the Tesla K80 GPU. First of all, we compared
classification accuracy of the two variants of the CNN algorithm on wall datasets. Note that we
obtained results of our CNN structure with the parameter tuning of Kim [2014], where we applied
filter sizes of 3, 4, and 5 with 128 filters, dropout rate of 0.5, and batch size of 64. Figure 8(a) in the
Appendix depicts the summary of our CNN structure. We configured the CNN to run with five
epochs based on our experiments showing that it often achieves the best result with five epochs.
We then injected our previously trained word embeddings with layer size of 50 in CNN-static and
CNN-non-static variations.
As is seen from Table 13, CNN-non-static provides better results than CNN-static for each of
three datasets. The best result, however, is obtained with an accuracy of 0.919 on the WD1 dataset.
As a second step of our DL experiments, we used the RNN algorithm to perform gender detection.
In this step, we performed experiments only for the WD1 dataset, which provides the best result
for CNN. We would also like to note that we used trainable (i.e., non-static) word embeddings
for the RNN algorithm because the non-static word embeddings provided better results for the
CNN algorithm in the previous step of our DL experiments. Before running the experiments, we
performed parameter tuning by using Bayesian optimization with the help of the skopt12 package.
We performed tuning by using an RNN structure with bidirectional LSTM cells for the purpose of
detecting the number of layers and network parameters. We searched for many parameters, which
are the number of LSTM layer(s), the number of neurons in LSTM layer(s), the number of neurons
in layer(s), whether to use dropout and batch normalization, dropout rate (if dropout is used),
and optimizer. After searching, we detected the optimum network structure, which is stacked
with three bidirectional LSTM layers having 16 neurons each, dropout rate of 0.2, optimizer with
rmsprop, and without batch normalization and a dense layer before the output layer. Figure 8(b)
11 https://colab.research.google.com/notebooks/intro.ipynb.
12 https://scikit-optimize.github.io/stable/.
Table 14. Accuracy Values of RNN and Combination of CNN and RNN with Respect to Their Different Variants on the WD1 Dataset

CLS        Variant                                 Acc
RNN        Unidirectional GRU                      0.865
RNN        Bidirectional GRU                       0.864
RNN        Unidirectional LSTM                     0.868
RNN        Bidirectional LSTM                      0.873
Combined   Bidirectional LSTM + CNN-non-static     0.881
Combined   CNN-non-static + Bidirectional LSTM     0.887
shows the summary of created network structure after the parameter optimization process. Next,
we performed experiments on this structure to detect the number of epochs for better results.
Based on these experiments, we additionally found that this structure provides better results with
eight epochs.
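A compact sketch of such a search with skopt's gp_minimize is given below; the search space is an abbreviated, assumed version of the one described above, and build_and_evaluate() is a hypothetical stand-in for training the RNN and returning its cross-validated accuracy.

from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real
from skopt.utils import use_named_args

space = [Integer(1, 3, name="num_lstm_layers"),
         Integer(8, 64, name="lstm_units"),
         Real(0.0, 0.5, name="dropout_rate"),
         Categorical(["adam", "rmsprop"], name="optimizer")]

def build_and_evaluate(num_lstm_layers, lstm_units, dropout_rate, optimizer):
    # Hypothetical stand-in: would build the stacked bidirectional LSTM with these settings
    # and return its cross-validated accuracy; a synthetic score keeps the sketch runnable.
    return 0.80 + 0.01 * num_lstm_layers - 0.05 * dropout_rate

@use_named_args(space)
def objective(num_lstm_layers, lstm_units, dropout_rate, optimizer):
    return -build_and_evaluate(num_lstm_layers, lstm_units, dropout_rate, optimizer)  # minimize negative accuracy

result = gp_minimize(objective, space, n_calls=20, random_state=42)
print(result.x)                                   # best parameter combination found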
Using our RNN structure and parameter settings, we ran experiments by changing the recurrent
unit (i.e., LSTM or GRU) and type (i.e., unidirectional or bidirectional) so as to explore their effects
on the results as well. Table 14 presents our experimental results, which are obtained with LSTM
and GRU units with unidirectional and bidirectional types of our network. As is seen from the table,
the best accuracy (i.e., 0.873) is obtained with the bidirectional LSTM variant of the RNN algorithm.
All variants of RNN outperform static CNN, and the best result is obtained with an accuracy of
0.873 with bidirectional LSTM. However, variants of RNN still fall behind CNN-non-static.
Additionally, we combined best-performer variants of the CNN and RNN algorithms by using
their optimum structure and parameter settings that have been observed from the previous two
experiments. As stated earlier, bidirectional LSTM and CNN-non-static are the most successful
variants of the CNN and RNN algorithms, respectively. As such, we used these two variants to
create our combined models, namely CNN-non-static + Bidirectional LSTM and Bidirectional LSTM
+ CNN-non-static with the same parameter settings. Figure 8(c) and (d) show the structures of
these combined models, respectively. Note that we configured these models to run with five and
seven epochs, respectively. Table 14 also presents obtained accuracy values for these combined
models. As is seen from the table, the combined models provide better results than the single run
of bidirectional LSTM, but they cannot outperform the single run of CNN-non-static.
As a last step in our DL experiments, we utilized the BERT method in our wall content model
by applying three different variations. Our aim is to investigate the effect of the domain of texts
on which the BERT model is trained. For this purpose, we used two pre-trained BERT models with
the help of the transformers13 library. These models are bert-base-multilingual-uncased14
and dbmdz/bert-base-turkish-128k-cased15 from which the first one is trained on lowercased
text in the top 102 languages including Turkish with Wikipedia dump, whereas the second one
is trained on a large Turkish corpus that has a size of 35 GB and a vocabulary size of 128K. Additionally, we trained our own BERT-like model that uses MLM with the help of the open source
implementation16 of the keras library. In the rest of this article, we use the term bert_fb to refer to this model, which is trained on 300K randomly selected activity contents crawled from
Facebook.
13 https://pypi.org/project/transformers/.
14 https://github.com/google-research/bert/blob/master/multilingual.md.
15 https://huggingface.co/dbmdz/bert-base-turkish-128k-cased.
16 https://keras.io/examples/nlp/masked_language_modeling/.
Table 15. Accuracy Values of Three Different Fine-Tuned BERT Models on the WD1 Dataset

BERT Model                              Acc
bert-base-multilingual-uncased          0.849
dbmdz/bert-base-turkish-128k-cased      0.884
bert_fb                                 0.926
For pre-trained models, we downloaded each of them and fine-tuned over our WD1 dataset so
as to predict gender of users. We selected the WD1 dataset as the best accuracy values in the wall
content model are generally obtained with this dataset. To train the bert_fb model, we randomly
selected 300K messages from all of the crawled activities of 20K users. We then included [CLS] and
[SEP] special tokens for each sentence both in these 300K messages and WD1 dataset. We then
trained the bert_fb model with 5 epochs and default parameter settings with 128 hidden layers,
and 8 attention heads. We also set the number of maximum sequence length to 256, embedding
dimension to 128, and vocabulary size to 24K. After training the bert_fb model, we again trained
and fine-tuned it on the WD1 dataset with 6 epochs to predict gender of users. We would like to
note that the best results are often achieved with 6 epochs for bert_fb model.
For each of the BERT models, we obtained results with fivefold cross validation and used the accuracy metric so as to make a fair comparison with the other ML and DL methods. We present the results of the BERT models in Table 15, which shows that the best accuracy (i.e., 0.926) among the BERT models is achieved by the bert_fb model. This shows that one needs to train the BERT model on a corpus from the same domain as the training data. These results also show that the most successful method in the wall content model is BERT, with the highest accuracy of 0.926 compared to the best accuracies of the CNN (0.919), RNN (0.873), CNN + RNN (0.887), and traditional ML methods (0.862).
5.6 Feature Dependence on Gender in Different Models
In this section, we measure and present the dependence between user gender and features obtained
with our models. To measure the dependence, we used the MI method, which is described briefly
in Section 4.7. We conducted this experiment on features obtained from profile information, wall
interaction, and wall content models. We excluded features that correspond to distributed representation of words in the wall content model and nodes in the network structure model. This
is because word2vec and its derivative, node2vec, automatically use the distributed vectors they have learned. Note that with the word2vec model, document representation is performed by taking the average of the vectors of the words observed in the relevant document (see Table 7). Measuring
attribute importance gives information about which index of these distributed vectors is more important. Since each index stands for a learned feature that does not have a specific name, measuring
MI dependence does not provide meaningful information. We would also like to note that we measured feature dependencies only for bow features extracted from the WD1 dataset and weighted
with raw term frequencies (i.e., tf) in the wall content model. The reason for this is that bow features produce the best result among traditional content features using raw term frequencies on
the WD1 dataset (see Table 11).
Figure 6(a) depicts obtained dependencies of features (see Table 5) used in the profile information
model. As is seen from Figure 6(a), the most important features in profile information are numbered
with 6, 3, 4, and 8, respectively. When definitions of these features are examined from Table 5, it
is clear that kinship relations and the display name of the WO are the most important features in
the profile information model.
Fig. 6. Mutual information dependence between user gender and features with respect to the profile information (a), wall interaction (b), and wall content model (c).
Figure 6(b) likewise shows dependencies of features (see Table 6) used in the wall interaction
model. As is seen from Figure 6(b), the most dependent features on user gender in the wall interaction model are numbered with 32, 33, 31, 11, 34, 13, and 12. Having a closer look at Table 6
for feature descriptions, it is seen that the gender of users who interact with the WO and the gender-oriented words they use in their activities are the most important features in the wall interaction
model.
Figure 6(c), however, shows the top 50 words with respect to their MI dependency on user gender
in the wall content model. As is seen from Figure 6(c), the most dependent features include gender-oriented words in Turkish along with their informal forms: “abi”, “kardeş”, “bey”, “abla”, “gardaş”,
“abim”, “kız”, “adam”, and so on. This result shows that extracting features based on a lexicon of
gender-oriented words is an appropriate method in the wall interaction model.
5.7 Experimental Results of the Ensemble Models
In the last step of our experiments, we combined profile information, network structure, wall interaction, and wall content models to investigate whether any combination of the three models
can improve our results.
To combine models, we applied a feature extension approach that extends the current feature
space by combining it with the other features obtained from different model(s). For instance, to
combine the profile information and wall interaction models, we use an extended feature space
that includes a union of features from Table 5 and Table 6. For instance, assume that we have
two different feature models and an arbitrary male user’s instance vector that is <2, 3, Male> with
feature model 1, whereas it is <4, 5, Male> for feature model 2. When we combine feature models 1
and 2, the instance vector will be <2, 3, 4, 5, Male> with the help of feature extension. In this phase,
we used this simple approach to combine our feature extraction models.
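In code, this extension is a plain horizontal concatenation of the per-model feature matrices, as in the toy sketch below (the first row reuses the example vectors above; the second row and the label coding are placeholders).

import numpy as np

X_model1 = np.array([[2, 3],          # feature model 1 for two users
                     [7, 1]])
X_model2 = np.array([[4, 5],          # feature model 2 for the same two users
                     [0, 6]])
y = np.array(["Male", "Female"])      # class labels kept separately

X_combined = np.hstack([X_model1, X_model2])   # e.g., <2, 3, 4, 5, Male> for the first user
print(X_combined)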
Table 16. Accuracy Values of the Ensemble Model with Respect to Different ML Classifiers

Model                                      Code   NB      NBM     SVM     IBk     C4.5    RF
Profile Information + Wall Interaction     C1     0.805   0.876   0.563   0.968   0.979   0.982
Profile Information + Wall ContentWE       C2     0.936   NA      0.917   0.968   0.979   0.981
Profile Information + Wall ContentBow      C3     0.650   0.759   0.703   0.946   0.979   0.894
Wall Interaction + Wall ContentWE          C4     0.800   NA      0.615   0.819   0.958   0.964
Wall Interaction + Wall ContentBow         C5     0.671   0.921   0.842   0.589   0.961   0.875
Wall ContentBow + Wall ContentWE           C6     0.642   NA      0.709   0.675   0.826   0.785
C1 + Wall ContentWE                        C7     0.905   NA      0.589   0.968   0.977   0.981
C1 + Wall ContentBow                       C8     0.683   0.889   0.795   0.943   0.977   0.913
C1 + Network Structure                     C9     0.624   NA      0.513   0.608   0.680   0.761
C1 + C6                                    C10    0.685   NA      0.795   0.951   0.980   0.899
For this purpose, we selected the best cases (i.e., feature set, document set, parameter settings)
for the four models and experimentally evaluate all combinations of these models.
We would like to note that we have taken different feature sets in the wall content model and
therefore used the best cases for each of them separately to combine this model with other models.
Among traditional feature models (i.e., bow, n-gram), the best result (i.e., 0.862) is provided by the
WD1 document set with bow features and tf weighting. As such, we used this document set and
term weighting and named it wall contentBow . Similarly, wall contentWE represents the wall content
model applied on the WD3 dataset, with the layer size set to 50, as this case gave the best result
(i.e., 0.850).
We obtained results of our ensemble models in two steps: (i) combining best cases of the wall
content model with the handcrafted features of the profile information and wall interaction models,
as well as node embedding features of the network structure model, and (ii) combining learned
features of the DL-based wall content model with the best case of the previous step.
The experimental results of the first step are presented in Table 16, which shows that combining profile information and wall interaction models, coded with C1, improves the performance of
gender prediction and produces an accuracy of 0.982, which is the highest value obtained so far
in this work. Note that NA means that we are unable to obtain the accuracy value, as the NBM
classifier cannot run over datasets that have negative feature values. In this phase, the best results
are obtained with the help of C4.5 or RF classifiers.
In the second step, we combined features learned by the CNN algorithm with the best feature set
detected in the previous step of experiments. This is because CNN with trainable word embeddings
produces higher results (i.e., 0.919) than that of RNN and CNN + RNN methods on the WD1 dataset
in the wall content model. In this phase, we first combined CNN with ML algorithms by using
features extracted by the CNN to feed traditional ML classifiers. This process is actually a means
to create an ensemble at the algorithm level. To do this, we used outputs of the fully connected layer
(i.e., dense layer with 30 neurons) as the learned features and made the predictions by replacing
the softmax layer of our CNN (see Figure 8(a)) with any of the ML algorithms.
We implemented our ensemble model that is inspired from Wu et al. [2018] and again used fivefold cross validation. The ensemble of CNN and any ML classifier (with/without feature extension)
is depicted in Figure 7 and can be summarized as follows:
• For the training process, the WD1 dataset and pre-trained word embeddings (with layer
size of 50) are fed to the CNN.
Fig. 7. Implementation process of ensemble CNN-any ML model.
Table 17. Accuracy Values for the Ensemble of CNN with Traditional ML Classifiers with Respect to Feature Extension

Feature Set                          C4.5    RF      IBk     NB      NBM     SVM
CNN Features                         0.880   0.883   0.875   0.888   0.888   0.886
CNN Features + C1 (see Table 16)     0.878   0.885   0.869   0.887   0.819   0.538
• After training the CNN, the corresponding feature vector is automatically extracted for each
input text.
• The softmax layer is replaced with the ML classifier that is trained with the automatically
extracted feature vectors.
• For the test process, a given text is fed to the well-trained CNN and the test feature vector
is obtained.
• The well-trained ML classifier performs the classification using the test feature vector.
Note that this process is shown with a dashed blue line in Figure 7. We also performed experiments by using this ensemble classifier on the extended feature space that includes CNN features
and other model features. This additional process is also shown with dashed red line in Figure 7.
Using this two-way implementation, we obtained results for the ensemble of CNN with traditional ML classifiers in the wall content model. Next, we also obtained results for the ensemble
model on the extended feature space that includes wall content features automatically extracted
by CNN and the best features coded with C1 in the previous step of our experiments.
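A self-contained keras/scikit-learn sketch of this feature-extraction step is given below; the tiny CNN, the random data, and the placeholder C1 features are illustrative only and do not reproduce the exact models used in this work.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from tensorflow.keras import layers, models

# tiny stand-in CNN whose named dense layer plays the role of the 30-neuron feature layer
inp = layers.Input(shape=(20,))
x = layers.Embedding(100, 8)(inp)
x = layers.Conv1D(16, 3, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
feat = layers.Dense(30, activation="relu", name="features")(x)
out = layers.Dense(2, activation="softmax")(feat)
cnn = models.Model(inp, out)
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

X_ids = np.random.randint(0, 100, size=(64, 20))      # toy token-id sequences
y = np.random.randint(0, 2, size=64)                  # toy gender labels
cnn.fit(X_ids, y, epochs=1, verbose=0)

# replace the softmax layer: reuse the dense-layer activations as learned features
feature_extractor = models.Model(cnn.input, cnn.get_layer("features").output)
feats = feature_extractor.predict(X_ids, verbose=0)

C1_feats = np.random.rand(64, 5)                      # placeholder for the handcrafted C1 features
clf = GaussianNB().fit(np.hstack([feats, C1_feats]), y)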
Table 17 presents results of the experiments with/without feature extension. As is seen from
Table 17, the best results are obtained with the NB classifier and the results are not sensitive to
feature extension. Based on our previous results, it is clear that this ensemble model produces
better results in both cases than a single run of the traditional ML classifiers on the WD1 dataset
(see Table 11 and Table 12). However, it falls behind the accuracy (i.e., 0.919 obtained on
the WD1 dataset) of the single run of the CNN (see Table 13).
Finally, we tried to create an ensemble of the BERT-based wall content model with other models.
This is because the best accuracy in the wall content model is obtained by the BERT model (i.e., bert_fb). For this purpose, we extracted sentence vectors from the bert_fb model and fed them into the
ML classifiers as done by Kazameini et al. [2020]. Notice that, as in the CNN ensemble, this
process is actually an algorithmic-level combination of the methods. To extract sentence vectors
from the BERT model, we used two approaches: (i) concatenating the last four hidden layers that
Table 18. Accuracy Values for the Ensemble of BERT with Traditional ML Classifiers

BERT Sentence Vectors          C4.5    RF      IBk     NB      NBM    SVM
[CLS]                          0.654   0.691   0.637   0.575   NA     0.627
Concatenation of last four     0.699   0.749   0.697   0.724   NA     0.785
Table 19. Comparison of Our Study with Previous Gender Prediction Studies Focusing on Turkish OSN Users with Respect to Employed Information and the Ensemble Model in the Feature or Algorithm Level

                          Prediction Model(s) Depend(s) on . . .
Work                      Profile        Network      Wall           Content           Ensemble
                          Information    Structure    Interactions   (Wall Content)    Model
[Ciot et al. 2013]        ✗              ✗            ✗              ✓ (0.870)         ✗
[Sezerer et al. 2019a]    ✗              ✗            ✗              ✓ (0.806)         ✗
[Sezerer et al. 2019b]    ✗              ✗            ✗              ✓ (0.723)         ✗
[Talebi and Köse 2013]    ✗              ✗            ✗              ✓ (0.908)         ✗
[Çelik and Aslan 2019]    ✗              ✗            ✗              ✓ (0.741)         ✗
Ours                      ✓ (0.981)      ✓ (0.792)    ✓ (0.966)      ✓ (0.926)         ✓ (0.982)
give the best representation for a word [Devlin et al. 2019], and (ii) using a fully connected layer
over the final hidden state corresponding to the [CLS] input token [Adhikari et al. 2019]. We
again applied this process with fivefold cross validation that is similar to the ensemble process of
the CNN. We obtained sentence vectors using the preceding two ways by getting predictions of
the BERT model for our encoded activity content in the WD1 dataset. Table 18 presents results of
this experiment, which shows that running ML classifiers over sentence vectors learned by BERT
does not improve accuracy.
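A sketch of the two sentence-vector extraction routes with the transformers library is shown below; pooling details beyond what is stated above (e.g., taking the [CLS] position of each of the last four layers) are assumptions.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

model_name = "dbmdz/bert-base-turkish-128k-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = TFAutoModel.from_pretrained(model_name, output_hidden_states=True)

enc = tokenizer(["kardeşim nasılsın"], padding=True, truncation=True, return_tensors="tf")
out = bert(dict(enc))

# (i) concatenate the last four hidden layers at the [CLS] position
last_four = tf.concat([h[:, 0, :] for h in out.hidden_states[-4:]], axis=-1)   # shape (1, 4 * 768)

# (ii) use the final hidden state of the [CLS] token directly
cls_vec = out.last_hidden_state[:, 0, :]                                       # shape (1, 768)

print(last_four.shape, cls_vec.shape)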
6 DISCUSSION
In this article, we studied the problem of gender prediction of Facebook users from Turkey. As is
seen from Table 1 and Table 19,17 existing gender prediction studies often rely on the content generated by OSN
users. However, in this study, we employed different models that use basic profile information,
network structure, wall interactions, and wall content of users. As is seen from our summarized
results in Table 19, the profile information model is more successful in gender prediction with
respect to other models. The main reason for this is that revealed profile attributes are more reliable
than node embeddings, and content-based and interaction-based features. As stated before (see
Table 16), however, building an ensemble of profile and wall interactions models at the feature level
provides the best accuracy (i.e., 0.982) in this study. Table 19 additionally makes it clear that we
obtained better results than the results of the existing Turkish-oriented studies by the previously
unused 34 features (see Table 6) in the wall interactions model and employing the BERT language
model in the wall content model.
17 The value between parentheses next to the ✓ indicates the best accuracy of the corresponding study with respect to the
related information/model.
Despite the challenges described in Section 4.4.1, the wall interaction model produces the
second-best results. This shows that wall activities and other interacting users in these activities can be very effective in determining the gender of a Facebook user. This is one of the most
important results in this work, because gender detection based on profile information may not always be possible, as OSN data is often incomplete and privacy-aware users keep their attributes
private. If an adversary is unable to employ the profile information model, then he/she can use the
wall interaction model, which is very effective. Even though the wall interaction model requires
language-dependent processes to detect whether an observed gender-oriented word in any activity is used in a gender-based sense, the model's success can be improved by applying morphological
analysis and word sense disambiguation to eliminate words used in non-gender-oriented sense.
In the wall content model, the best result is often achieved from the WD1 document set. This
proves that activities that target the WO are more important in predicting the WO’s gender. However, the success of the wall content model falls behind the profile information and wall interaction
models. The reason for this is that Facebook activity content is often dirty and contains many typos. To handle the dirty content, spelling correction can also be employed in the preprocessing
step to extract more meaningful features. For spelling correction, well-known algorithms or tools
may not provide satisfactory performance, as users mostly use different abbreviations and write
words without obeying any grammatical rules. Creating and using a lexicon may help to correct
typos.
The wall content model has also some other drawbacks: (i) compared to the other three models, the wall content model that is applied with bow and n-gram features has a huge number of
features that makes gender classification computationally expensive, and (ii) extracting discriminative gender-specific features in this model is hard because Facebook users generally post activities about everything they face in their daily lives. In this model, DL algorithms including CNN,
RNN, and their combinations (e.g., CNN + RNN) often outperform traditional ML algorithms used
with word embeddings. This shows that it is possible to overcome the challenge of having a high-dimensional feature space without loss of performance by employing word embeddings. However,
using the BERT language model outperforms all of the traditional ML classifiers and other DL algorithms in the wall content model. This is because BERT automatically learns context-aware word
embeddings, unlike traditional word embeddings obtained by the word2vec model. An important
note here is that BERT produces better results when trained on a corpus that is from the same domain as the training data. This is because textual content from the same domain tends to include similar words, phrases, sentences, and so on. Creating an ensemble of CNN and ML
classifiers (especially SVM) often improves classification accuracy for image data. However, in this
work, a single run of the CNN and BERT models produces better results than their ensemble models created with ML classifiers. This is possibly because the discriminative power of the
automatically learned features does not increase without going through the activation function in
the last softmax layer. Therefore, we suggest this investigation for other languages in the context
of gender prediction based on user-generated content.
Facebook has some obstacles stemming from its nature: detecting whom an activity targets
among the users who posted earlier under the same parent activity is a very challenging task.
Developing an effective solution to this problem will make it possible to further increase the performance of the wall interaction and wall content models. However, a first name–centric strategy
used in the profile information and wall interaction models is quite successful, and the first name
is very effective in the gender prediction task. It is possible to infer the gender attribute if the username has been selected to include the user’s first name, even if the account is completely private.
This outcome is also verified by feature dependencies of user gender in different models. In the
profile information model, the most discriminative feature is the display name of users, whereas
the most important features in the wall interaction model depend on the gender of interacted users
and the number of gender-oriented words in their activities. This also proves that our assumption
in the wall interaction model is well founded.
As stated before, OSN data is often incomplete. Users may keep their profile attributes, friendship connections, and wall activities private. Some of them may just keep some information private, whereas others may completely keep their Facebook account private. This depends on privacy
awareness of each user in the network. Therefore, profile information and wall interaction models
can be employed together for detecting the gender of users with a high degree of accuracy. Therefore, the models evaluated in this article can be employed for any Facebook user. If the WO keeps
his/her data private, then an adversary can use the naive model to make a random guess. Otherwise, he/she can use our models separately or in combination depending on the available public
data on the target user’s account. Note that our fivefold results indicate that an adversary is able to
perform gender detection with accuracies of 0.981, 0.792, 0.966, and 0.926 by using the profile information, network structure, wall interaction, and wall content models, respectively. He/she can
also improve the best accuracy to 0.982 by combining the profile information and wall interaction models if
profile information and wall interactions exist in the WO’s account.
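The following sketch outlines, under the assumption of scikit-learn-style fitted classifiers and optional per-user feature vectors, how such an adversary could choose among the models according to the data that is publicly visible; it is an illustrative outline rather than our exact implementation.

import random

def predict_gender(target, models):
    # `target` is assumed to expose feature vectors that are None when the corresponding
    # data is private; `models` maps model names to fitted classifiers with predict().
    if target.profile is not None and target.interactions is not None:
        # Best observed combination: profile information plus wall interactions (accuracy 0.982).
        return models["profile+interaction"].predict([target.profile + target.interactions])[0]
    if target.profile is not None:
        return models["profile"].predict([target.profile])[0]            # accuracy 0.981
    if target.interactions is not None:
        return models["interaction"].predict([target.interactions])[0]   # accuracy 0.966
    if target.wall_content is not None:
        return models["content"].predict([target.wall_content])[0]       # accuracy 0.926
    if target.network is not None:
        return models["network"].predict([target.network])[0]            # accuracy 0.792
    return random.choice(["male", "female"])                             # naive model: random guess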
To ensure the privacy of Facebook users, we advise that (i) the wall and other profile pages be kept private, and (ii) the selected username consist of only numbers or at least not include any
part of the WO’s real name, especially the first name. However, these precautions may not be
enough if your connections are not careful. Even if you keep your profile information and wall
content private, if one of your connections in your social network does not keep his/her connection
with you private, your privacy may be violated. For instance, if any user reveals that he/she is in
a relationship with you, or that you are his/her family member, then any other person is able to gain insight into your sensitive information, including your gender. Additionally, if one of your friends
posts an activity that includes any gender-oriented word(s) and targets you, whether on your wall
or his/her wall, then he/she may reveal your gender.
7 CONCLUSION AND FUTURE WORK
In this article, we perform gender detection of Facebook users not only by using textual content
but also by using the profile information, network structure, and wall interactions. We explore and
report the best model by evaluating our models separately and in combination. Based on our experimental results, we conclude that the gender of Facebook users can be inferred even just by using a person's display name. Moreover, other models can be employed for gender detection with a high degree of accuracy if profile information is not available. The wall interaction model is one of these models, as it achieves the second-best accuracy and outperforms existing content-based studies in the context of the Turkish language. We conclude that this is one of the
most important findings of this work.
The wall content and wall interaction models require language-specific processing, and therefore further improvement is possible by using effective preprocessing, word sense disambiguation, and language models. In the wall content model, the BERT model trained on a corpus from the same domain as the training data outperforms all ML classifiers and other DL algorithms. We also
conclude that the BERT model may provide much better results when trained on a much larger
corpus.
As future work, we are planning to focus on wall interaction and wall content models. In the
wall interaction model, we will try to create a lexicon including words along with their polarity
scores in terms of their use by male and female users. In the wall content model, we will investigate
how the performance of the BERT language model changes depending on the size of the corpus it is
trained on.
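Returning to the planned lexicon for the wall interaction model, one possible scoring scheme (an assumption rather than a finalized design) is to assign each word a smoothed log-odds score of its relative frequency in male- versus female-authored activities, as in the following sketch.

import math
from collections import Counter

def gender_polarity(male_docs, female_docs, smoothing=1.0):
    # Count word occurrences separately for male- and female-authored activities.
    male_counts, female_counts = Counter(), Counter()
    for doc in male_docs:
        male_counts.update(doc.lower().split())
    for doc in female_docs:
        female_counts.update(doc.lower().split())
    vocabulary = set(male_counts) | set(female_counts)
    total_m = sum(male_counts.values()) + smoothing * len(vocabulary)
    total_f = sum(female_counts.values()) + smoothing * len(vocabulary)
    scores = {}
    for word in vocabulary:
        p_m = (male_counts[word] + smoothing) / total_m
        p_f = (female_counts[word] + smoothing) / total_f
        scores[word] = math.log(p_m / p_f)  # > 0: male-leaning usage, < 0: female-leaning usage
    return scores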
APPENDIX
Fig. 8. The summary of our CNN (a), stacked RNN (b), CNN + RNN (c), and RNN + CNN (d) structures.
DATA AVAILABILITY
The experiments in this article use publicly available OSN data collected from Facebook according to the methodology published in the work of Coban et al. [2020]. Although the data were shared publicly by the corresponding users, the dataset contains sensitive personal information. To respect these users' individual privacy, the data will not be shared in its raw form. The corresponding author will make his best effort to provide a thoroughly anonymized form of the dataset given a reasonable justification.
ACKNOWLEDGMENTS
We would like to thank our referees and editors for their valuable suggestions that helped us
significantly improve the article.
REFERENCES
Ashutosh Adhikari, Achyudh Ram, Raphael Tang, and Jimmy Lin. 2019. DocBERT: BERT for document classification.
arxiv:1904.08398
Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. 2012. Friendship prediction and homophily in social media. ACM Transactions on the Web 6, 2 (2012), 1–33.
Ahmet Afsin Akın and Mehmet Dündar Akın. 2007. Zemberek, an open source NLP framework for Turkic languages.
Structure 10 (2007), 1–5.
Jalal S. Alowibdi, Ugo A. Buy, and Philip Yu. 2013a. Empirical evaluation of profile characteristics for gender classification
on Twitter. In Proceedings of the 2013 12th International Conference on Machine Learning and Applications, Vol. 1. IEEE,
Los Alamitos, CA, 365–369.
Jalal S. Alowibdi, Ugo A. Buy, and Philip Yu. 2013b. Language independent gender classification on Twitter. In Proceedings
of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE, Los Alamitos,
CA, 739–743.
M. Fatih Amasyalı and Banu Diri. 2006. Automatic Turkish text categorization in terms of author, genre and gender. In
Proceedings of the International Conference on Application of Natural Language to Information Systems. 221–226.
Bassem Bsir and Mounir Zrigui. 2018a. Gender identification: A comparative study of deep learning architectures. In Proceedings of the International Conference on Intelligent Systems Design and Applications. 792–800.
Bassem Bsir and Mounir Zrigui. 2018b. Enhancing deep learning gender identification with gated recurrent units architecture in social text. Computación y Sistemas 22, 3 (2018), 757–766.
John D. Burger, John Henderson, George Kim, and Guido Zarrella. 2011. Discriminating gender on Twitter. In Proceedings
of the Conference on Empirical Methods in Natural Language Processing. ACM, New York, NY, 1301–1309.
Özer Çelik and Ahmet Faruk Aslan. 2019. Gender prediction from social media comments with artificial intelligence.
Sakarya Üniversitesi Fen Bilimleri Enstitüsü Dergisi 23, 6 (2019), 1256–1264.
Ming Cheung and James She. 2017. An analytic system for user gender identification through user shared images. ACM
Transactions on Multimedia Computing, Communications, and Applications 13, 3 (2017), 1–20.
Morgane Ciot, Morgan Sonderegger, and Derek Ruths. 2013. Gender inference of Twitter users in non-English contexts. In
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1136–1145.
Onder Coban, Ali Inan, and Selma Ayse Ozel. 2020. Towards the design and implementation of an OSN crawler: A case of
Turkish Facebook users. International Journal of Information Security Science 9, 2 (2020), 76–93.
Andrea Corriga, Simone Cusimano, Francesca Malloci, Lodovica Marchesi, and Diego Reforgiato Recupero. 2018. Leveraging cognitive computing for gender and emotion detection. In Proceedings of the 4th Workshop on Sentic Computing,
Sentiment Analysis, Opinion Mining, and Emotion Detection (EMSASW’18). 47–56.
William Deitrick, Zachary Miller, Benjamin Valyou, Brian Dickinson, Timothy Munson, and Wei Hu. 2012. Gender identification on Twitter using the Modified Balanced Winnow. Communications and Network 4, 3 (2012), 189–195.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. arxiv:cs.CL/1810.04805
Enfel Doğan. 2011. Türkiye Türkçesine Cinsiyet Kategorisinin İzleri [Traces of the gender category in Turkey Turkish]. Journal of International Social Research 4, 17 (2011),
89–98.
Mehwish Fatima, Komal Hasan, Saba Anwar, and Rao Muhammad Adeel Nawab. 2017. Multilingual author profiling on
Facebook. Information Processing & Management 53, 4 (2017), 886–904.
Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preoţiuc-Pietro. 2016. Analyzing biases in human
perception of user age and gender from text. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 843–854.
Juliette Garside. 2015. Twitter puts trillions of tweets up for sale to data miners. The Guardian. Retrieved March 23, 2021
from https://www.theguardian.com/technology/2015/mar/18/twitter-puts-trillions-tweets-for-sale-data-miners.
Daniel Gayo Avello. 2011. All liaisons are dangerous when all your friends are known to us. In Proceedings of the 22nd ACM
Conference on Hypertext and Hypermedia. ACM, New York, NY, 171–180.
Orestis Giannakopoulos, Nikos Kalatzis, Ioanna Roussaki, and Symeon Papavassiliou. 2018. Gender recognition based on
social networks for multimedia production. In Proceedings of the 2018 IEEE 13th Image, Video, and Multidimensional
Signal Processing Workshop (IVMSP’18). IEEE, Los Alamitos, CA, 1–5.
Emma Graham-Harrison and Carole Cadwalladr. 2018. Revealed: 50 million Facebook profiles harvested for Cambridge
Analytica in major data breach. The Guardian. Retrieved March 23, 2021 from https://www.theguardian.com/news/
2018/mar/17/cambridge-analytica-facebook-influence-us-election.
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 855–864.
Vishal Gupta and Gurpreet S. Lehal. 2009. A survey of text mining techniques and applications. Journal of Emerging Technologies in Web Intelligence 1, 1 (2009), 60–76.
Kyungsik Han, Yonggeol Jo, Youngseung Jeon, Bogoan Kim, Junho Song, and Sang-Wook Kim. 2018. Photos don’t have me,
but how do you know me? Analyzing and predicting users on Instagram. In Adjunct Publication of the 26th Conference
on User Modeling, Adaptation and Personalization. ACM, New York, NY, 251–256.
Ahmet Hayran and Mustafa Sert. 2017. Sentiment analysis on microblog data based on word embedding and fusion techniques. In Proceedings of the 2017 25th Signal Processing and Communications Applications Conference (SIU’17). IEEE, Los
Alamitos, CA, 1–4.
Carter Jernigan and Behram F. T. Mistree. 2009. Gaydar: Facebook friendships expose sexual orientation. First Monday 14,
10 (2009). https://firstmonday.org/ojs/index.php/fm/article/download/2611/2302.
Fariba Karimi, Claudia Wagner, Florian Lemmerich, Mohsen Jadidi, and Markus Strohmaier. 2016. Inferring gender from
names on the web: A comparative evaluation of gender detection methods. In Proceedings of the 25th International
Conference Companion on World Wide Web. 53–54.
Andrej Karpathy, Justin Johnson, and Li Fei-Fei. 2015. Visualizing and understanding recurrent networks.
arxiv:cs.LG/1506.02078
Amirmohammad Kazameini, Samin Fatehi, Yash Mehta, Sauleh Eetemadi, and Erik Cambria. 2020. Personality trait detection using bagged SVM over BERT word embedding ensembles. arxiv:cs.CL/2010.01309
Jeremy Keeshin, Zach Galant, and David Kravitz. 2010. Machine Learning and Feature Based Approaches to Gender Classification of Facebook Statuses.
Kazi Zainab Khanam, Gautam Srivastava, and Vijay Mago. 2020. The homophily principle in social network analysis.
arxiv:cs.SI/2008.10383
Ankush Khandelwal. 2019. Towards Identifying Humor and Author’s Gender in Code-Mixed Social Media Content. Ph.D.
Dissertation. International Institute of Information Technology Hyderabad.
Yoon Kim. 2014. Convolutional neural networks for sentence classification. arxiv:cs.CL/1408.5882
Gizem Korkmaz, Chris J. Kuhlman, Joshua Goldstein, and Fernando Vega-Redondo. 2020. A computational study of homophily and diffusion of common knowledge on social networks based on a model of Facebook. Social Network Analysis
and Mining 10, 1 (2020), 5.
Michal Kosinski, David Stillwell, and Thore Graepel. 2013. Private traits and attributes are predictable from digital records
of human behavior. Proceedings of the National Academy of Sciences 110, 15 (2013), 5802–5805.
Kamran Kowsari, Mojtaba Heidarysafa, Tolu Odukoya, Philip Potter, Laura E. Barnes, and Donald E. Brown. 2020. Gender
detection on social networks using ensemble deep learning. In Proceedings of the Future Technologies Conference. 346–
358.
Tayfun Kucukyilmaz, B. Barla Cambazoglu, Cevdet Aykanat, and Fazli Can. 2006. Chat mining for gender prediction. In
Proceedings of the International Conference on Advances in Information Systems. 274–283.
Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT:
A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2020), 1234–
1240.
Yiou Lin, Hang Lei, Jia Wu, and Xiaoyu Li. 2015. An empirical study on sentiment classification of Chinese review using
word embedding. arxiv:cs.CL/1511.01665
Jack Lindamood, Raymond Heatherly, Murat Kantarcioglu, and Bhavani Thuraisingham. 2009. Inferring private information using social network data. In Proceedings of the 18th International Conference on World Wide Web. ACM, New York,
NY, 1145–1146.
Zachary C. Lipton, John Berkowitz, and Charles Elkan. 2015. A critical review of recurrent neural networks for sequence
learning. arxiv:cs.LG/1506.00019
Wendy Liu and Derek Ruths. 2013. What’s in a name? Using first names as features for gender inference in Twitter. In
Proceedings of the 2013 AAAI Spring Symposium Series. 10–16.
Anshu Malhotra, Luam Totti, Wagner Meira Jr., Ponnurangam Kumaraguru, and Virgilio Almeida. 2012. Studying user
footprints in different online social networks. In Proceedings of the 2012 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining. IEEE, Los Alamitos, CA, 1065–1070.
Saurav Manchanda and George Karypis. 2018. Distributed representation of multi-sense words: A loss driven approach. In
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 337–349.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector
space. arxiv:cs.CL/1301.3781
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and
phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
Sergei Nicvist, Daria Bogatireva, and Victoria Bobivec. 2018. Tweet author gender identification, PAN 2016 task. In Proceedings of the International Conference on Telecommunications, Electronics, and Informatics. 344–347.
Claudia Peersman, Walter Daelemans, and Leona Van Vaerenbergh. 2011. Predicting age and gender in online social networks. In Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents. ACM, New York,
NY, 37–44.
Hanchuan Peng, Fuhui Long, and Chris Ding. 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27,
8 (2005), 1226–1238.
Francisco Rangel and Paolo Rosso. 2013. On the identification of emotions and authors’ gender in Facebook comments
on the basis of their writing style. In Proceedings of the International Workshop on Emotion and Sentiment in Social and
Expressive Media. 34–46.
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523.
Lucia Santamaria and Helena Mihaljevic. 2018. Comparison and benchmark of name-to-gender inference services. PeerJ
Computer Science 4 (2018), e156.
Maarten Sap, Gregory Park, Johannes Eichstaedt, Margaret Kern, David Stillwell, Michal Kosinski, Lyle Ungar, and H.
Andrew Schwartz. 2014. Developing age and gender predictive lexica over social media. In Proceedings of the 2014
Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1146–1151.
Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing
45, 11 (1997), 2673–2681.
H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha
Agrawal, Achal Shah, et al. 2013. Personality, gender, and age in the language of social media: The open-vocabulary
approach. PLoS ONE 8, 9 (2013), e73791.
Erhan Sezerer, Ozan Polatbilek, and Selma Tekir. 2019a. Gender prediction from Turkish tweets with neural networks. In
Proceedings of the 2019 27th Signal Processing and Communications Applications Conference (SIU’19). IEEE, Los Alamitos,
CA, 1–4.
Erhan Sezerer, Ozan Polatbilek, and Selma Tekir. 2019b. A Turkish dataset for gender identification of Twitter users. In
Proceedings of the 13th Linguistic Annotation Workshop. 203–207.
Masoud Talebi and Cemal Köse. 2013. Identifying gender, age and education level by analyzing comments on Facebook. In
Proceedings of the 2013 21st Signal Processing and Communications Applications Conference (SIU’13). IEEE, Los Alamitos,
CA, 1–4.
Cong Tang, Keith Ross, Nitesh Saxena, and Ruichuan Chen. 2011. What’s in a name: A study of names, gender inference,
and gender behavior in Facebook. In Proceedings of the International Conference on Database Systems for Advanced
Applications. 344–356.
Eric S. Tellez, Sabino Miranda-Jiménez, Daniela Moctezuma, Mario Graff, Vladimir Salgado, and José Ortiz-Bejar. 2018.
Gender identification through multi-modal tweet analysis using MicroTC and bag of visual words. In Proceedings of the
9th International Conference of the CLEF Association (CLEF’18). http://ceur-ws.org/Vol-2125/.
Murat Tezgider, Beytullah Yıldız, and Galip Aydın. 2018. Improving word representation by tuning Word2Vec parameters
with deep learning model. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing
(IDAP’18). IEEE, New York, NY, 1–7.
Abinash Tripathy, Ankit Agrawal, and Santanu Kumar Rath. 2016. Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications 57 (2016), 117–126.
Mudasir Ahmad Wani, Nancy Agarwal, Suraiya Jabin, and Syed Zeeshan Hussain. 2018. Design and implementation of
iMacros-based data crawler for behavioral analysis of Facebook users. arxiv:cs.SI/1802.09566
Haifeng Wu, Qing Huang, Daqing Wang, and Lifu Gao. 2018. A CNN-SVM combined model for pattern recognition of knee
motion using mechanomyography signals. Journal of Electromyography and Kinesiology 42 (2018), 136–142.
Dongwen Zhang, Hua Xu, Zengcai Su, and Yunfeng Xu. 2015. Chinese comments sentiment classification based on
word2vec and SVMperf. Expert Systems with Applications 42, 4 (2015), 1857–1863.
Lei Zhang, Shuai Wang, and Bing Liu. 2018. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews:
Data Mining and Knowledge Discovery 8, 4 (2018), e1253.
Elena Zheleva and Lise Getoor. 2009. To join or not to join: The illusion of privacy in social networks with mixed public
and private user profiles. In Proceedings of the 18th International Conference on World Wide Web. ACM, New York, NY,
531–540.
Received April 2020; revised November 2020; accepted January 2021