Facebook Tells Me Your Gender: An Exploratory Study
of Gender Prediction for Turkish Facebook Users
ÖNDER ÇOBAN, Cukurova University
ALI İNAN, Adana Alparslan Turkes Science and Technology University
SELMA AYŞE ÖZEL, Cukurova University
Online Social Networks (OSNs) are very popular platforms for social interaction. Data posted publicly over
OSNs pose various threats against the individual privacy of OSN users. Adversaries can try to predict private
attribute values, such as gender, as well as links/connections. Quantifying an adversary’s capacity in inferring
the gender of an OSN user is an important first step towards privacy protection. Numerous studies have been
made on the problem of predicting the gender of an author/user, especially in the context of the English
language. In contrast, studies in this field are quite limited for the Turkish language, specifically in the
domain of OSNs. Previous studies on gender prediction of Turkish OSN users have mostly been performed
by using the content of tweets and Facebook comments. In this article, we propose using various features,
not just user comments, for the gender prediction problem over the Facebook OSN. Unlike existing studies,
we exploited features extracted from profile, wall content, and network structure, as well as wall interactions
of the user. Therefore, our study differs from the existing work in the broadness of the features considered,
machine learning and deep learning methods applied, and the size of the OSN dataset used in the experimental
evaluation. Our results indicate that basic profile information provides better results than the other feature sets; moreover, combining it with wall interactions further improves prediction quality. We measured the best accuracy
value as 0.982, which was obtained by combining profile data and wall interactions of Turkish OSN users. In
the wall interactions model, we introduced 34 different features that provide better results than the existing
content-based studies for Turkish.
CCS Concepts: • Social and professional topics → Gender; • Computing methodologies → Natural
language processing; Machine learning; • Networks → Online social networks;
Additional Key Words and Phrases: Facebook, online social networks, attribute inference, gender detection,
text categorization
ACM Reference format:
Önder Çoban, Ali İnan, and Selma Ayşe Özel. 2021. Facebook Tells Me Your Gender: An Exploratory Study of
Gender Prediction for Turkish Facebook Users. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 20, 4, Article
66 (May 2021), 38 pages.
https://doi.org/10.1145/3448253
Authors’ addresses: Ö. Çoban (corresponding author) and S. A. Özel, Çukurova University, Department of Computer Engineering, 01330 Balcalı, Sarıçam, Adana/TÜRKİYE; emails: [email protected], [email protected]; A. İnan, Adana Alparslan
Turkes Science and Technology University Department of Computer Engineering 01250 Sarıçam Adana/TÜRKİYE; email:
[email protected].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2021 Association for Computing Machinery.
2375-4699/2021/05-ART66 $15.00
https://doi.org/10.1145/3448253
1 INTRODUCTION
Online Social Networks (OSNs) are very popular communication mediums in today’s world.
Prominent OSNs such as Facebook, Twitter, and Instagram are so popular that they
have hundreds of millions of users and are even said to reflect real life [Wani et al.
2018]. It is well known that OSN users provide a substantial amount of personal information
to OSN service providers and other users [Gayo Avello 2011]. Despite the availability of privacy settings to control and customize the visibility of personal data, privacy emerges as an
important concern [Lindamood et al. 2009]. The main reason for this concern stems from the
nature of OSNs in promoting high connectivity and the potential threat of misuse of personal
data.
Concerns about the protection of personal OSN data are not ill-founded. The Guardian has
covered news reporting that trillions of tweets were up for sale [Garside 2015]. In several disclosed incidents, third parties with direct or indirect access to the OSN data have attacked the individual
privacy of OSN users [Tang et al. 2011]. For instance, the profile information of millions of users
were collected through a Facebook application and used inappropriately for political purposes in
2018 [Graham-Harrison and Cadwalladr 2018].
In an OSN environment, the adversary attacking individual privacy could be anyone in the range
from a simple end user to a third-party application, or an insider—an OSN service provider employee. We focus on adversaries that are end users, and following related work, we assume that
such adversaries employ statistical, analytical tools as well as machine learning (ML) techniques
to infer private traits (e.g., gender, age, political view, links) of OSN users [Tang et al. 2011]. Gender prediction is an important instance of the user attribute inference problem. This is because
a successful gender inference mechanism can be used to harm the privacy of OSN users. For instance, users, especially females, may be exposed to cyberstalking or real-world stalking. Gender
attribute can also be used as auxiliary information to reveal other personal data of an OSN user.
In such a case, gender inference can give way to serious attacks (e.g., phishing, blackmailing) that
may result in economic and societal harms for users. For instance, a mother’s maiden name is one
of the most frequently used authentication traits in Turkey. In such a scenario, an attacker can use
the surname and gender information of a targeted user’s OSN friends to reveal his/her mother’s
maiden name. Such an attack can, for example, harm the user in online banking,
where an adversary can impersonate the target to apply for a new credit card or transfer funds
through telephone or online banking. All of these cases show that gender inference over OSN
data not only violates the individual privacy of OSN users but also has other potential adversarial
consequences.
Various solutions have been proposed to the problem of predicting an OSN user’s gender. There
are studies that rely on different approaches that generally use (i) profile information [Burger
et al. 2011; Liu and Ruths 2013; Tang et al. 2011], (ii) network structure [Jernigan and Mistree
2009; Zheleva and Getoor 2009], or (iii) the content of the profile owner’s wall [Alowibdi et al.
2013a]. Profile-based approaches use one or more selected attributes (e.g., display name) to infer
a profile owner’s gender. Network structure–based approaches take advantage of the social
network structure to detect users’ genders. Such approaches often use the links and groups of a
wall owner (WO), along with attributes of other users who have friendship connections with the
WO in a network [Zheleva and Getoor 2009]. For instance, a simple friendship aggregate model
can look at the attribute distribution among the friends of the WO in question so as to
infer the related attribute. Content-based approaches, however, use text mining
methods in gender prediction. As such, OSNs have attracted the attention of the research community
as a rich source of data for different text mining tasks [Fatima et al. 2017; Peersman et al. 2011;
Rangel and Rosso 2013; Sap et al. 2014], as the most important and well-known characteristic of
OSNs is that users communicate with each other via short text messages [Peersman et al. 2011].
In this article, we perform an exploratory study for the purpose of predicting the gender of Turkish Facebook users. To achieve our goal, we first crawled the Facebook OSN and collected public
data of 20K users with the help of our crawler presented in the work of Coban et al. [2020]. Then,
we used different models that utilize various information obtained from a profile owner’s account
to predict the gender of the corresponding user.
Our models consider the network structure, profile information, wall interactions, and wall content (i.e., posts, comments, and replies) of users, and apply well-known classification techniques
from ML and deep learning (DL). Notice that our work is closely related to authorship classification and gender classification, which have been studied many times before (see our literature
review in Section 2), especially in the context of the English language (Table 1). However, studies
for gender prediction over Turkish OSN data are quite limited. The main reason for this is that
accessing OSN data is a very challenging task that requires additional effort to collect the network
structure along with every bit of publicly available data of users. There are also several additional
challenges against text mining research for Turkish. For example, (i) language resources and NLP
tools for Turkish are limited, (ii) there is no publicly available OSN data, and (iii) linguistic and
semantic analyses such as word sense disambiguation and morphological analysis are complicated.
One of the earliest gender detection studies in Turkish is that of Amasyali and Diri [2006],
which focuses on author profiling over Turkish news texts. To the best of our knowledge, there
exist only a few other studies with the aim of gender prediction on chat messages [Kucukyilmaz
et al. 2006], tweets [Sezerer et al. 2019a, 2019b], and Facebook comments [Talebi and Köse 2013;
Çelik and Aslan 2019]. However, our study appears to be the first that aims to perform
gender prediction of Turkish Facebook users based on various information extracted from users’
profiles. Our main contributions in this work are as follows:
• We apply various models involving profile information, network structure, wall interaction,
and wall content to perform gender prediction. These models are evaluated both individually
and in combination with each other.
• We conduct intensive experiments for the wall content model to explore which type of
information is more effective for the gender prediction task in Facebook.
• We use different traditional ML algorithms in all of our models. Additionally, we use different DL algorithms (e.g., Convolutional Neural Network (CNN), Recurrent Neural
Network (RNN)) and language models (e.g., Bidirectional Encoder Representations
from Transformers (BERT)) in the wall content model so as to obtain comprehensive
results.
• We introduce 34 different features in the wall interactions model, which provide better
results than the existing content-based studies for Turkish. These features can easily be
adapted to other languages as long as a lexicon of gender-oriented words in the target
language is available.
• We report better results than the best results previously reported in other related studies
that focus on gender prediction for Turkish OSN users.
• We report challenges for the gender prediction task on Facebook with respect to our models.
The rest of this article is structured as follows. Section 2 reviews the gender prediction studies
over OSNs. Section 3 and Section 4 introduce the datasets and methods we used, respectively.
Section 5 outlines the experimental results. Section 6 presents a discussion of our experimental
results, and Section 7 provides our conclusion and possible directions for future work.
Table 1. Summary of Research Works for Gender Detection over Different OSNs

| Work | OSN | Features | Data / User Set | Language | Method | Best Result | Year |
|---|---|---|---|---|---|---|---|
| [Peersman et al. 2011] | Netlog | Word and character n-grams | 1.53M posts | Dutch | SVM | 0.663 | 2011 |
| [Tang et al. 2011] | Facebook | 12 different profile attributes including display name | Profile information of 679K users | English | MNB and J48 | 0.952 | 2011 |
| [Deitrick et al. 2012] | Twitter | Character uni-grams and bi-grams | 3.03K tweets | English | Modified BW | 0.985 | 2012 |
| [Burger et al. 2011] | Twitter | Free-text profile fields and word and character n-grams | 4.10M tweets of 184K users | Multilingual | SVM, NB, and BW | 0.920 | 2011 |
| [Alowibdi et al. 2013b] | Twitter | 5 color-based features | 4 subsets of 53.3K users | Language independent | NB-Tree | 0.743 | 2013 |
| [Alowibdi et al. 2013a] | Twitter | First name, user name, 5 color-based features, and phoneme sequence n-grams | Randomly sampled 180K of 194.2K users | Language independent | NB, Decision Tree, and NB-Tree | 0.825 | 2013 |
| [Tellez et al. 2018] | Twitter | Word and character n-grams and bag of visual words | 3 different datasets from PAN@CLEF | Arabic, English, and Spanish | SVM and Rocchio | 0.827 | 2018 |
| [Fatima et al. 2017] | Facebook | Word and character n-grams and 64 different stylistic-based features | Posts/comments of 479 users | Roman Urdu and English | J48, RF, NB, and SVM | 0.875 | 2017 |
| [Keeshin et al. 2010] | Facebook | Word, structure, and count-based features | 170K status updates | English | NB, MaxEnt, and Perceptron | 0.677 | 2010 |
| [Sap et al. 2014] | Facebook, Twitter | Lexicon with words and weights | Data of 75K Facebook users and tweets of 11K Twitter users | English | SVM | 0.919 | 2014 |
| [Rangel and Rosso 2013] | Facebook | Writing style–based features | 1.2K comments | Spanish | SVM | 0.590 | 2013 |
| [Khandelwal 2019] | Twitter | Bow, character n-grams, emotions, etc. | First name, profile picture, and 7.5K tweets of 1K users | Code-mixed Hindi-English | SVM, RF, NB | 0.805 | 2019 |
| [Giannakopoulos et al. 2018] | Twitter | 7 features based on three fields | Profile picture, display name, and theme color of 8.4K users | English | SVM and PNNs | 0.872 | 2018 |
| [Nicvist et al. 2018] | Twitter | Total number of male/female words, links, hashtags, emojis, etc. | 436 tweets from PAN'16 | English | Basic Linear Classifier | 0.610 | 2018 |
| [Bsir and Zrigui 2018a] | Twitter | Word embeddings | 240K tweets of 2.4K users | Arabic | GRU, LSTM, and CNN | 0.796 | 2018 |
| [Han et al. 2018] | Instagram | Features based on tags, activities, and images | 1.2M tag, activity, and image data of 3.7K users | English | LR, SVM, MLP, and CNN | 0.820 | 2018 |
| [Corriga et al. 2018] | Facebook | Bow | Textual tags extracted from 3K images of users | Language independent | NB and RF | 0.630 | 2018 |
| [Ciot et al. 2013] | Twitter | 20-top words/trigrams/hashtags and tweet/retweet/link/mention frequencies, etc. | 4 subsets of 8.61K users | Multilingual including Turkish | SVM | 0.870 | 2013 |
| [Sezerer et al. 2019a] | Twitter | Word embeddings | Tweets of 4K users | Turkish | ANN | 0.806 | 2019 |
| [Sezerer et al. 2019b] | Twitter | Bow | Tweets of 5.2K users | Turkish | SVM | 0.723 | 2019 |
| [Talebi and Köse 2013] | Facebook | n-Grams | 50K comments of 550 users | Turkish | NB | 0.908 | 2013 |
| [Çelik and Aslan 2019] | Facebook | Bow | 8.7K comments of 3K users | Turkish | LR | 0.741 | 2019 |
| Ours | Facebook | Profile attributes, content and interaction-based features | Profile information and 361K/1.27M activities of 20K users depending on the created dataset | Turkish | Profile information (RF, 0.981); Network structure (SVM, 0.792); Wall interactions (RF, 0.966); Wall content (BERT, 0.926); Profile information + Wall interactions (RF, 0.982) | 0.982 | 2021 |
2 RELATED WORK
Gender detection is one of the well-studied topics in computer science. Previously, this task was
performed many times for different text genres, languages, information sources, and so on. In the
literature, gender detection has been made from images, textual content, and/or other manually
created features depending on the domain of the task. Some of the most recently published studies
on this topic include the following. Santamaria and Mihaljevic [2018] utilized five name-to-gender
inference services to classify 7,076 manually labeled names and concluded that three of the tested
services are able to guess the correct gender for more than 98% of the names that the system is able
to classify. Cheung and She [2017] performed gender detection both for Flickr and Fotolog users
from their shared images. The best accuracy rate, measured at 0.79, has been obtained by their
analytical system that is based on the well-known homophily phenomenon [Khanam et al. 2020]:
OSN users of the same gender tend to share similar content. Flekova et al. [2016] performed age
and gender detection from tweets, and analyzed differences between real user traits and perceived
traits from text. Through experimental analysis, it has been concluded that humans use stereotypes
that lead to systematic biases [Flekova et al. 2016]. Karimi et al. [2016] compared several unsupervised gender detection methods that rely on the name or image of users. They also suggested
a novel method that augments name-based approaches along with gender recognition in facial
images [Karimi et al. 2016]. The authors concluded that their method outperforms name-based
approaches with an accuracy of 0.92 in the gender detection task. Kosinski et al. [2013] performed
personal trait prediction for Facebook users based on their likes. The authors considered different
traits of users including gender and obtained an 0.98 AUC value using logistic linear regression.
As stated before, even though there are many existing studies in this field, in this section, we
present our literature review that mainly focuses on gender prediction studies over OSNs. The
previous works that use text messages, profile information, and images of OSN users can be summarized as follows. Peersman et al. [2011] tried to infer age and gender of OSN users by applying
text categorization [Gupta and Lehal 2009] on textual content of users’ chat activities that are
collected from a Belgian OSN. It has been reported that age and gender categorization are performed with an accuracy of 0.663 by using 50K of the most informative uni-grams. Deitrick et al.
[2012] extracted 9,170 features from 3,031 tweets to perform gender categorization of Twitter users.
They were able to predict gender with an accuracy of 0.985 by applying a modified Balanced
Winnow (BW) algorithm on the most informative 53 features selected by Weka’s feature selectors. Burger et al. [2011] performed gender detection on a large-scale tweet corpus (i.e., 4.1M tweets
of 184K users) that is crawled from Twitter. They have extracted 15.6M character- and word-level
n-gram features from their corpus. It is worth noting that they used both profile information (i.e.,
screen name, profile name, description) and textual content extracted from tweets in their ML-based model. The authors reported the best accuracy as 0.92, which was observed when tweets were
utilized together with profile information. Using only tweets, accuracy declined to as low as 0.76.
Having a huge number of features makes gender classification computationally expensive; however, Alowibdi et al. [2013b] use only five language-independent features (e.g., background color,
text color) that are based on colors preferred by the user in his/her profile to detect gender of
Twitter users. The authors report that they were able to achieve a quite successful result with
an F1-score equal to 0.718. Alowibdi et al. [2013a] also explored profile characteristics for gender classification on Twitter. The authors proposed a technique that uses phonemes to extract
and reduce feature space. It is reported that the highest observed accuracy value is 0.825, which
is obtained by using 16K 3-gram phoneme-based features. Tellez et al. [2018] performed gender
classification based on both text and images posted by Twitter users. It is shown that the best F1-score for English is 0.826 when they solely use a text categorization approach. Ciot et al. [2013]
performed gender prediction of Twitter users by using textual content in several languages other than English. They crawled messages posted in four different languages to create the
datasets and achieved the highest accuracy (i.e., 0.87) for Turkish on a dataset that includes 3.6K
Turkish users’ tweets. Tang et al. [2011] employed a display name–centric approach to infer the
gender of Facebook users. In this study, a dataset was created by collecting information of 1.2M users
from the New York network with the help of a focused crawler. Their dataset contains profile information (e.g., gender, relationship, friend list) of 679K users who reveal their gender attributes. The
authors achieved an accuracy of 0.952 by using an ML-based prediction model that mainly uses
the first name and other profile information of users.
Fatima et al. [2017] performed gender classification for the purpose of author profiling on Facebook. In this study, the authors asked selected Facebook users to provide their demographic information along with their comments/posts to create their dataset. Based on the experimental results,
the best accuracy value observed is 0.875, which is for a multilingual corpus that includes posts and
comments of users with word uni-gram, character 3-gram, and character 8-gram features. Keeshin
et al. [2010] performed gender classification on Facebook statuses (i.e., posts) using word-based,
count-based, and structure-based features. The best accuracy value is 0.677, which is achieved
with the help of the Naive Bayes (NB) classifier on a dataset including 170K statuses. Sap et al.
[2014] created a lexicon from a corpus of 300M words written by 75K Facebook users to predict
their age and gender. Rangel and Rosso [2013] used some style features to predict the gender of
Facebook users. The authors obtained an accuracy of 0.59 on a dataset that is comprised of 1.2K
comments written in Spanish. Schwartz et al. [2013] applied a lexicon-based learning approach
similar to Sap et al. [2014] for gender prediction of Facebook users and obtained an accuracy of
0.919. Khandelwal [2019] identified both humor and gender of the users based on their tweets, first
names, and profile pictures. This study obtained 0.805 accuracy as the highest result with the help
of the Support Vector Machine (SVM) classifier for the gender prediction task. Giannakopoulos
et al. [2018] performed gender prediction over a Twitter dataset from Liu and Ruths [2013] and
obtained the best accuracy of 0.872 by using color, image, and display name fields from user profiles. Nicvist et al. [2018] performed gender detection using a simple classifier and obtained 0.610
accuracy on a small dataset that includes 456 tweets. Bsir and Zrigui [2018a] comparatively evaluated three DL models (i.e., Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM),
CNN) with word embeddings for gender detection of Arabic Twitter users. They obtained the
highest accuracy value, equal to 0.796, with the help of the GRU model. Han et al. [2018] executed
age and gender classification of Instagram users based on activities, image objects, and tags. They
obtained an accuracy of 0.820 as the highest result for gender detection by using the Multi-Layer
Perceptron (MLP) algorithm. Corriga et al. [2018] detected the gender and emotion of Facebook
users from the University of Cagliari. They used bag of words (bow) of textual tags that are extracted from 3K images of these users. The highest accuracy value observed is 0.63 for the gender
prediction task.
In the context of the Turkish language, the previous studies we are aware of are as follows.
Amasyali and Diri [2006] performed author profiling over Turkish news texts. The authors obtained an accuracy of 0.96 using bi-gram and tri-gram features. Kucukyilmaz et al. [2006] performed gender prediction on Turkish chat messages and reported that they were able to obtain an
accuracy of 0.81. Sezerer et al. [2019a, 2019b] performed gender detection on Turkish tweets and
obtained the best accuracies as 0.806 and 0.72, respectively. In the context of Facebook, however,
we could find only a few related studies [Çelik and Aslan 2019; Talebi and Köse 2013]. Talebi and Köse [2013]
created a dataset that includes 50K comments collected from Turkish
Facebook pages. The authors manually labeled the comments and obtained an accuracy of 0.908
using n-gram features. Çelik and Aslan [2019] performed gender prediction over Turkish comments collected from Facebook pages of companies. The authors obtained an accuracy of 0.741 by
using bow features.
To provide an overview, we summarize studies that focus on gender prediction over OSNs in
Table 1. Notice that if a study was performed on a multilingual corpus or multiple OSNs, the best
Acc/F1 score is given for a single language or OSN. As is seen from Table 1, the
majority of works in this field have focused on Twitter users, and only a few studies have been
done for the Turkish language. Our study differs from the others with regard to the used features,
methods, and size of the data. To the best of our knowledge, our study is the first one that tries to
predict the gender of Turkish Facebook users by using features extracted from profile information,
network structure, wall interactions, and wall content of users. In addition, our study produces
better results than all other studies in the context of the Turkish language both in wall interactions
and wall content models that require language-dependent processing and features. Notice that our
result also outperforms the majority of studies in the context of other languages, except for Deitrick
et al. [2012], whose results are very close to ours.
3 DATASETS
We collected data from Facebook by using our crawler, which uses the HTTP approach and interacts with a browser through the Selenium API (https://www.seleniumhq.org/) [Coban et al. 2020]. As is seen from Table 2, this dataset
includes both profile information and wall (i.e., timeline) activities (i.e., 2.76M posts, 2.74M comments, and 466K replies) of 20K users from Turkey. It can also be understood that we are able to
discover 2.35M unique users by visiting friends of friends of these users in the breadth-first search
Table 2. Quantitative Description of Collected Public Facebook Data

| Property (# of) | Count | Property (# of) | Count |
|---|---|---|---|
| Visited users | 20,000 | Posts | 2,762,023 |
| Discovered users | 3,980,270 | Comments | 2,743,513 |
| Discovered unique users | 2,350,454 | Replies | 466,995 |
| Graph representation: Nodes | 20K | Edges (undirected) | 201,150 |
Table 3. Distribution of Users with Respect to Publicity of the Wall Page and Gender Attribute

| Criteria | Male users (#) | Female users (#) | Total |
|---|---|---|---|
| Gender revealed | 9,800 | 6,486 | 16,286 |
| Open wall | 9,323 | 6,070 | 15,393 |
| Wall content | 8,563 | 4,211 | 12,774 |
| Selected set | 4,211 | 4,211 | 8,422 |
approach. As is seen from Table 3, only 16,286 of the crawled users disclose their gender, and the
majority of them do not keep their walls private. The number of users whose walls are public is
9,323 for males and 6,070 for females. To avoid biased results, we selected a subset
of users having a complete set of features that enables us to apply different gender prediction
models. Notice that in the rest of the article, we use the term wall owner (WO) to represent the
user for whom gender prediction is to be done. In this phase, the following filters are applied on
the crawled user base: (i) there must be at least one activity written by the WO in his/her profile,
and (ii) there must be at least one activity that targets (i.e., written to answer the WO) the WO in
his/her profile. Afterward, we selected an equal number of male and female users and created a
balanced user set including 8,422 users in total.
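As an illustration of this selection step, the sketch below builds such a balanced subset with pandas; the DataFrame layout and column names are our own assumptions, not the authors' actual pipeline.

```python
import pandas as pd

def select_balanced_users(users: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Filter and balance the crawled user set (columns are illustrative)."""
    eligible = users[
        users["gender"].isin(["male", "female"])
        & (users["n_own_activities"] >= 1)        # at least one activity written by the WO
        & (users["n_targeting_activities"] >= 1)  # at least one activity targeting the WO
    ]
    n = eligible["gender"].value_counts().min()   # size of the smaller class (4,211 in the paper)
    males = eligible[eligible["gender"] == "male"].sample(n=n, random_state=seed)
    females = eligible[eligible["gender"] == "female"].sample(n=n, random_state=seed)
    return pd.concat([males, females])            # 8,422 users in total in our case
```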
To perform gender detection, we used different information of these users such as profile attributes, node embeddings, wall interactions, and textual content. To employ textual content, we
grouped activity content into three different sets for the purpose of exploring which type of activity has better potential to give insight into the WO’s gender. As shown in Figure 1, we use the
prefix “WD” to name our text content sets and combine them as a single document by concatenating all activities in the same set. In other words, we extract three different documents from
the wall activities of the WO. The three documents obtained for each user are called WD1, WD2,
and WD3. Document WD1 contains all activity content that addresses/targets the WO (e.g., we are
sure that the related activity is written to answer the WO). For instance, C1 and R4 are included in
WD1, as they directly target the WO (i.e., Müge in this case). In Facebook, there is a similarity between post-comment and comment-reply structures. Users can type their opinions under any post
or comment, and the complexity of a post may change depending on the number of related comments and replies. As such, detecting activities targeting the WO is a challenging task, especially
when an activity contains more than two related activities. Therefore, in this study, we include
an activity in document WD1 only if it is the first related comment or the reply of an activity by
the WO. Document WD2 contains all activity content written directly by the WO. For instance,
activities P1, R1, R2, and R3 are typed by Müge, who is the WO. Document WD3 combines the previous two sets: it contains all activity content on the WO's wall except activities that are neither written by the WO nor determined to
Fig. 1. An example Facebook post with related comments and replies. For each post with its related comments and replies, if they exist, we group the activity contents into three different document sets: WD1, WD2, and WD3.
Table 4. Quantitative Description of Three Different Document Sets Created in the Wall Content Model

| Attribute / Dataset | WD1 Male | WD1 Female | WD2 Male | WD2 Female | WD3 Male | WD3 Female |
|---|---|---|---|---|---|---|
| Posts | 22,956 | 15,236 | 276,176 | 156,846 | 299,132 | 172,082 |
| Comments | 189,574 | 126,675 | 161,061 | 124,663 | 350,635 | 251,338 |
| Replies | 4,541 | 2,658 | 118,221 | 81,351 | 122,762 | 125,420 |
| All activities (AA) | 217K | 144K | 555K | 362K | 772K | 506K |
| Total words in AA | 1.37M | 813K | 6.27M | 3.05M | 7.64M | 3.81M |
| Average words per activity | 6.35 | 5.64 | 11.3 | 8.30 | 9.90 | 6.08 |
target the WO. For instance, all activity content except for C2 and C3 is included in WD3,
as it is very hard to automatically determine the exact target of C2 and C3 (see Figure 1).
In other words, WD3 is the union of WD1 and WD2.
In this work, we created WD1, WD2, and WD3 datasets by concatenating documents of users
in related types. Table 4 presents quantitative properties of these document sets and shows that
the size of the datasets varies in the range of 361K to 1.27M activities depending on the type of
the content. It is also clear that male users have more activities than females, and their activities
include more words on average.
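To make the construction of the three document sets concrete, the sketch below groups a list of wall activities into WD1, WD2, and WD3; the Activity fields and the simplified targeting rule are illustrative assumptions, not the authors' exact data model (in particular, the "first related answer" restriction is omitted for brevity).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Activity:
    author: str                             # display name of the writer
    text: str                               # activity content
    parent_author: Optional[str] = None     # author of the activity this one answers

def build_documents(wo: str, activities: List[Activity]):
    """Split wall activities into the WD1/WD2/WD3 documents of Section 3 (simplified)."""
    wd1, wd2 = [], []
    for a in activities:
        if a.author == wo:                  # written by the WO -> WD2
            wd2.append(a.text)
        elif a.parent_author == wo:         # answers an activity by the WO -> WD1
            wd1.append(a.text)
        # otherwise the target is unclear (like C2/C3 in Figure 1) and the activity is dropped
    wd3 = wd1 + wd2                         # WD3 is the union of WD1 and WD2
    return " ".join(wd1), " ".join(wd2), " ".join(wd3)
```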
4 METHODS
In this work, we employ five different gender prediction models, of which the naive model is
a simple baseline that provides a lower bound on the expected performance of all other
models in our study. The other four models are based on (i) profile information, (ii) network structure,
(iii) wall interactions, and (iv) wall content of the WO. The profile information model uses basic attributes and friend lists of the account owner to predict his/her gender. The network structure
model is based on learned distributed node embeddings of users. The wall interaction model exploits attributes of activities found both on the profile owner's wall and on the walls of other users
who interact with the profile owner. This model also considers the other OSN users who interact
with the profile owner, and the gender-oriented words observed both in activities of the profile
owner and the interacted users. The wall content model, however, uses classical text categorization techniques for the purpose of gender prediction. In the following sections, we describe the
details of our models.
4.1 Naive Model (Baseline)
The naive model is our baseline model that does gender prediction by making a random decision.
It uses a dummy classifier to randomly predict the gender of the WO without considering any
information. Therefore, the gender decision is made by flipping an unbiased coin.
4.2 Network Structure Model
In the network structure model, we use features obtained from connections and locations of a node
in the network. As such, this model uses network structure information to detect the gender of
users. A well-known way to extract features from a graph is to use node2vec, which is an algorithmic framework for representational learning on graphs [Grover and Leskovec 2016]. Node2vec
is derived from the word2vec model [Mikolov et al. 2013a, 2013b], and given any graph, it can
learn continuous feature representations for the nodes by taking random walks through the graph
starting at a target node, which can then be used for various downstream ML tasks. In this work,
we used node embeddings of users for gender detection on a graph representation of our largest
crawled snapshot [Coban et al. 2020], where nodes and edges represent users and their friendship
connections, respectively. We would like to note that the graph representation of our data includes
20K nodes and their 201,150 undirected edges (please see Table 2 for more details).
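A minimal sketch of this step is given below, assuming the widely used open source node2vec package; the hyperparameter values and the edge-list file name are illustrative and not the authors' settings.

```python
import networkx as nx
from node2vec import Node2Vec   # pip install node2vec (an assumption about tooling)

# Build the undirected friendship graph from an edge list (the paper's snapshot has
# 20K nodes and 201,150 edges). The file name is hypothetical.
G = nx.read_edgelist("friendship_edges.txt")

# Learn embeddings from biased random walks (hyperparameters are ours, not the paper's).
n2v = Node2Vec(G, dimensions=128, walk_length=30, num_walks=10, workers=4)
model = n2v.fit(window=10, min_count=1)

user_vector = model.wv["12345"]  # per-user feature vector fed to the downstream classifier
```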
4.3 Profile Information Model
In the profile information model, we use both profile and friend list information to predict the
gender of the WO. The assumptions we make and the features used in this model are explained in
the following sections.
4.3.1 First Name–Centric Prediction. In profile information and wall interaction models, we
extract features that mostly require knowing the gender of the interacted users. However, we are
not able to know the gender attributes of all users who interact with the profile owner, especially in
wall activities. This is because any user who interacts with the profile owner may not reveal his/her
gender; moreover, his/her profile may not have been visited by our crawler even if he/she discloses
his/her gender. Note that our crawler discovered 2.35M users (see Table 2), but only 20K of these
users were visited. To handle this challenge, we use a simple function that is inspired by Tang
et al. [2011]. We call this function First Name–Centric Gender (FCG); it returns a first name–centric
value for a given first name according to its usage frequency among male and female users
on the OSN. Frequency represents how often a given first name is used by male and female users.
Therefore, we also define FNM and FNF values to represent the number of male and female users
Fig. 2. Example FCG calculation and Facebook posts. (a) FCG calculation. (b) A usual post with one of its
relative comments. (c) A direct post without relative comment(s).
who have the given first name. Notice that FNM and FNF values should be obtained from a global
online or offline external source of Turkish person names. However, we obtain these values by
practically counting frequencies of first names of users who revealed their gender on our crawled
snapshot (see Section 3).
As Facebook insists that its members use their real names, users mostly adhere to Facebook’s
real name policy. Therefore, the FCG function returns a value based on the number of male and
female users who have the same first name as the WO on the OSN. Let A be an arbitrary user,
and let FNM and FNF represent the number of male and female users who use the same first
name as A. Then, the FCG of user A is determined as in Equation (1), where 2 means that the
probability of being male is higher, and 1 represents that the probability of being female is higher.
If the FCG value is equal to 0, this means that the first name is a unisex name that cannot be used
for gender prediction.
FCG = \begin{cases} 2, & \text{if } FNM > FNF \\ 1, & \text{if } FNM < FNF \\ 0, & \text{otherwise} \end{cases} \qquad (1)
We use the FCG value both in profile information and wall interaction (see Section 4.4) models
by using the first token of the display name as the first name of the user at hand. However, we
use the second token of the display name as the first name if we have an abbreviation in the
first token. In the profile information model, we also use the FCG value based on the username to
extract additional features, because usernames are one of the most discriminative and fundamental
elements of OSNs for user identification [Malhotra et al. 2012]. Therefore, usernames can reflect
users' characteristics and may give a clue about the WO's gender. Notice that a username can also
be a completely numeric value (e.g., "123***89") assigned by Facebook. In this study, we ignore
such usernames and consider the usernames that have multiple tokens concatenated by dots. For
instance, “onder.coban.32”, “inan.ali”, and “ahmet.y.someone” would be valid usernames. In such a
case, the first token of the username is accepted as the first name (i.e., “onder”, “inan”, and “ahmet”)
of the WO and the FCG value is also calculated according to Equation (1).
An example calculation of the FCG value is shown in Figure 2(a), where the central node (i.e.,
a user) represents any user who does not reveal his/her gender. In this example, the FCG value of
this user is 1, as there are two other female users who use the same first name (i.e., “Elif”).
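A direct implementation of Equation (1) is straightforward; in the sketch below, the frequency tables fnm and fnf (first-name counts among male and female users in the crawled snapshot) are assumed to be precomputed dictionaries.

```python
def fcg(first_name: str, fnm: dict, fnf: dict) -> int:
    """First Name-Centric Gender value as defined in Equation (1)."""
    m = fnm.get(first_name, 0)   # FNM: males using this first name
    f = fnf.get(first_name, 0)   # FNF: females using this first name
    if m > f:
        return 2                 # more likely male
    if m < f:
        return 1                 # more likely female
    return 0                     # unisex or unknown name

# Example from Figure 2(a): two female users named "Elif", none male -> FCG = 1
print(fcg("elif", fnm={}, fnf={"elif": 2}))
```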
4.3.2 Family Members and Relationships. Facebook users frequently disclose their family members and private relationships. This information can give a clue about the gender of the WO. We
can explain our assumption as follows. Let A (i.e., the WO in this case) and B be two Facebook users
such that B reveals that A is his wife. If so, A is most likely female, and this information can be
accessed from the family member section of A and B (if B also accepts to reveal this information).
This is also true for other relationships revealed by the users. Therefore, such information in users’
profiles can be used to infer their genders. Note that we take a family member into consideration
only if his/her kinship type is observed in one of the following two groups, K1 and K2:
• K1 (Male oriented): {brother, uncle, husband, son, father, grandfather}
• K2 (Female oriented): {sister, aunt, wife, daughter, mother, grandmother}
The groups K1 and K2 given above are defined by considering the gender orientation of kinship
types. For instance, if B reveals that A is his/her family member with one of the kinship types in
K1, then A is most likely male. Using this idea, we use the number of male- and female-oriented
kinships of the WO to extract features. Additionally, we extract features based on the FCG values
of users who are revealed by the WO as a family member or private relationship.
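The kinship-based counting can be sketched as follows; the English kinship labels stand in for whatever labels the crawler actually records.

```python
K1 = {"brother", "uncle", "husband", "son", "father", "grandfather"}     # male oriented
K2 = {"sister", "aunt", "wife", "daughter", "mother", "grandmother"}     # female oriented

def kinship_counts(kinship_types):
    """Count male- and female-oriented kinships revealed for a user."""
    male = sum(1 for k in kinship_types if k in K1)
    female = sum(1 for k in kinship_types if k in K2)
    return male, female

print(kinship_counts(["wife", "mother", "son"]))   # -> (1, 2)
```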
4.3.3 Important Events. Facebook users generally share important events about their private
relationship, education, and so on. This information is included both on the user’s wall and in the
important events section of Facebook. As such information of users only contains important cases
(e.g., buying a car, moving home) in their lives, we believe that it can give some hints about the
gender of the WO. For instance, if an event from the WO’s profile says that the user gave birth
(“doğum yaptı”) or became a mother (“anne oldu”), then we can infer that the WO is most likely
female. Similarly, if the WO reveals that he or she is in a relationship with a female user (i.e., event
actor), then the WO is most likely male. Hence, we believe that using event information is useful
for the gender prediction task.
Based on the assumptions described in the preceding sections, we extracted 15 features that
include basic profile and friendship interaction information of the users. Table 5 presents a list of
these extracted features and their description used in the profile information model. Note that a
value of zero represents the cases that related information is not disclosed or could not be obtained
by our crawler.
4.4 Wall Interaction Model
This model makes its decision by using features that are extracted from the following:
• All activities written by the WO both on his/her own wall and any other user’s wall.
• Users who interact (e.g., by typing a message or tagging the WO) with the WO in these activities.
Before giving the details of the extracted features, we first introduce the basic assumptions behind the wall interaction model. As is seen from Figure 2(b), Facebook shows each post in a separate
block, and comments are written under the related posts. It is also possible to write a reply under
the related comment(s) of any post. If the WO’s wall is public, all activities are visible to anyone
and it is possible to extract what is written by the WO and his/her friends. In the wall interaction
model, by using these assumptions, we introduce and extract previously unused features from the
activities and interactions described above. We would like to emphasize that this is the first study
that extracts and uses features from wall interactions of users. In this model, we grouped features
into three sets, which are described in the following sections. Notice that we use first name–centric
prediction (see Section 4.3.1) as in the profile information model to extract features that require
knowing the gender of any user who interacts with the WO.
Table 5. Features and Their Definitions Used in the Profile Information Model

| No. | Definition | Value(s) |
|---|---|---|
| 1 | Relation status of the WO: single (1), in a relationship (2), engaged (3), married (4) | 0–4 |
| 2 | FCG value for a user who is in a relationship with the WO (based on the display name) | 0–2 |
| 3 | FCG value for the WO (based on the display name) | 0–2 |
| 4 | FCG value for the WO (based on the username) | 0–2 |
| 5 | Interest (or sexual orientation) of the WO: females (1), males (2), or males and females (3) | 0–3 |
| 6 | Number of users in the OSN who have male (M)- and female (F)-oriented kinships (see K1 and K2) and use the same first name as the WO: M > F (1), M < F (2) | 0–2 |
| 7 | If exists, the number of male (M)- and female (F)-oriented kinships (see K1 and K2) of the WO: M > F (1), M < F (2) | 0–2 |
| 8 | WO's total number of friends (t) | t |
| 9 | Total number of male (1) and female (2) friends of the WO (only considers the friends who reveal their gender) | 0–2 |
| 10 | Overall FCG value (based on display names) of all friends of the WO | 0–2 |
| 11 | Whether the WO has an event post that is related to his/her gender or not | 0, 1 |
| 12 | The WO has an event post that discloses he or she: is in a relationship (1), is married (2), is engaged (3), met someone (4) | 0, 1 |
| 13 | FCG value for the event actor (based on the display name), if exists | 0–2 |
| 14 | Whether the WO shared a biography or not | 0, 1 |
| 15 | Whether the WO shared a quotation or not | 0, 1 |
4.4.1 Feature Set Extracted from What Users Say to Each Other. In this group, we use a lexicon-based approach and quantify the number of gender-specific Turkish words used in wall activities.
The basic idea is that what the WO says to others and what others say to the WO can give some
clues about his/her gender. As is seen from Figure 3(a), if other users mostly use male words in
their activities on the WO's wall, the WO is most likely male. However, this approach has drawbacks.
First, it is hard to detect which activity directly targets the WO
under any activity on his/her wall. For instance, there may be many comments and replies under
any post from the WO. In this case, it is a challenging task to detect which activity written by
other users directly targets (i.e., written to answer the WO) the WO, as other users may answer
each other under an activity from the WO. Therefore, we assume an activity targets only the WO
when it satisfies one of the following two conditions:
• It is a direct post shared by another user. Direct posts are used when an information is
directly posted on another user’s wall. Therefore, the targeted user is clear in this case.
• It is the first activity under any activity from WO.
However, there are other difficulties. Words may be used in different senses, and it is possible
to produce different surface forms of a word from its root in Turkish. Additionally, users often
Fig. 3. (a) User A is most likely female with respect to both the number of interacted female users and
female words observed on his/her wall activities. (b) The word cloud obtained from simple surface forms
(without inflections) of Turkish gender words in our lexicon. Blue and pink colors represent male- and female-oriented words, respectively. Notice that this cloud does not include different surface forms and variations
(e.g., "emmi" → "amca (paternal uncle)") of given words.
write their messages with an informal language. These challenges make extracting features more
intricate in the context of the Turkish language. For instance, it is possible to observe different
gender words (see Figure 3(a)) even in activities that target the WO. The main reason for this
case stems from some challenges that need to be handled. Let A and C be two Facebook users
from Figure 3(a), where user A is female. The main reasons behind observing a male word (i.e., "abi
(brother)”) in a targeting activity from C (or F ) are as follows:
• Context: Even if it is the first activity located under any activity from A, user C may mention
another user (most likely male in this case) in this activity.
• Topic: The topic of the wall activity may be important, as it can affect what other users say.
For instance, let female A share a photograph of her son in a post. In this case, other users
mostly mention A’s son, not herself, and the majority of gender words will be male oriented
in this case.
• Word sense: User C may use the gender-oriented word in a non-gender-oriented sense. Let
us consider two activities, namely A1 and A2, that include the same word “kız (girl)”:
A1: “Bugün kız kardeşimin dogum günü (Today is my sister’s birthday)”
A2: “Kız kulesi küçük, şirin bir kuledir (Girl tower is a small, cozy tower)”
The word kız is used to show the gender of someone in A1, but the same is not true for the
word observed in A2 where it is the name of the tower. Detecting such cases requires an
effective word sense disambiguation process.
Handling these difficulties requires solving numerous challenging tasks, and they are outside
the scope of this article. Here, we extract features by assuming that the “majority of OSN users often
interact with the users of the same gender.” This assumption is based on the principle of homophily,
which is the tendency of people to form edges (i.e., friendships) with other people who have similar
traits [Korkmaz et al. 2020]. Homophily is a well-established phenomenon that has been observed
to occur frequently in OSNs [Khanam et al. 2020]. For instance, it is observed that users who share
visually similar images are more likely to have the same gender [Cheung and She 2017]. According
to another study, there is a substantial level of topical similarity between users who are close to each
other in a social network [Aiello et al. 2012]. It is also possible to derive many such examples, since
studying homophily is an attractive field of OSN research and can provide eminent insight into
other important sub-research fields, such as social tie prediction and link prediction.
To extract features with respect to this assumption, we first build a lexicon that consists of
lexically and semantically gender-oriented Turkish words, as well as their frequently used abbreviations and different surface forms [Doğan 2011] in Facebook. As Facebook users use informal
language and mostly write their messages without adhering to grammatical rules, we also include the most frequently observed variations, including forms with common spelling errors (e.g.,
“kardeş” (brother) → {“gardaş”, “birader”, “bilader”}) of these syntactically correct words. Figure 3(b)
depicts the word cloud of these syntactically correct words (different surface forms and variations
are not included to reduce complexity) used in this study. Afterward, we count male and female words in wall activities (by searching for terms in our lexicon using regular expressions) and
use these counts as features. In this phase, we apply the following steps to all activities:
• As mentioned previously, a gender-oriented word may also be used in a different sense.
Therefore, we also created an exceptional words lexicon that includes frequently used
but not gender-oriented words and phrases (e.g., “anneler”, “analı”, “kardeşlik”, “annesiz”,
“hanım köylü”). Then, we remove such words from the wall activity content, if observed,
to avoid biased results. Note that this lexicon is very small when compared to the gender-oriented words lexicon, as it is just based on our observations.
• ASCII conversion of Turkish vowels is applied to the text content obtained from the wall
activity to increase the matching rate with the lexicon. Note that users mostly use English
equivalents of the Turkish vowel letters (e.g., instead of writing “kızım”, users generally
type “kizim” , which requires conversion between letters i and ı).
• Text content in the wall activity is converted into tokens.
• Each token is normalized by removing repetitive letters to increase the matching rate with
the lexicon. This step is applied as described in Section 4.5.2.
• Each normalized token is searched in the lexicon by using regular expressions, and term
frequencies are counted.
Finally, we use the number of counted male- and female-oriented words to extract different
features in the wall interaction model. As stated before, we extract different features based on
whether the activity targets the WO or not. Notice that the fifth and sixth features in Table 6 allow
the model to apply a gender-based filter while counting gender-oriented words. This filter is used
so as to eliminate words such that gender orientation of the word conflicts with the targeted user’s
gender. For instance, let us consider again the example depicted in Figure 3(a). In this case, if the
WO is male, the gender filter will eliminate the words “abla”, “teyze”, and “yenge” , which are
female-oriented words.
4.4.2 Features Based on Interacted Users. Another assumption is that the number of male and
female users who write an activity on the WO’s wall may be used as a feature to determine the
gender of the WO. As shown in Figure 3(a), if the number of female users who write under any
activity of the WO is more than the number of male users, the WO is most likely female. Using this
idea, we extract several features based on the number of male and female users who interact with
the WO in activities. The extracted features in this group mainly depend on the following values:
(i) the number of male and female users tagged by the WO (this information is extracted from the
post titles, e.g., “Ahmet, Ali ve Ayşe ile birlikte film seyrediyor (Ahmet is watching a movie with Ali
and Ayşe)”), if they exist); (ii) the number of male and female users who interact with the WO both
on their wall and the WO’s wall; and (iii) quantities of the activity types.
4.4.3 Features Based on the WO’s Activities. This feature set only considers the activities written
by the WO. The extracted features are mainly based on the quantity and types of the WO’s own
activities on his/her wall.
Table 6. Features and Their Definitions Used in the Wall Interaction Model

| No. | Definition |
|---|---|
| 1, 2 | Number of male/female words in all activities written by the WO (2 attributes) |
| 3, 4 | Number of male/female words addressed to the WO (the target is the WO; 2 attributes) |
| 5, 6 | Number of male/female words addressed to the WO (the target is the WO and the term gender filter is applied; 2 attributes) |
| 7 | Average number of replies written under comments by the WO (includes comments by the WO on other users' walls) |
| 8 | Average number of replies for other comments written under posts by the WO |
| 9, 10 | Average number of comments/replies written under the posts by the WO (2 attributes) |
| 11, 12 | Number of all male/female users who write an answer under the posts and comments by the WO (whether or not the target is the WO; 2 attributes) |
| 13, 14 | Number of male/female words in activities written by other users under posts by the WO (whether or not the target is the WO; 2 attributes) |
| 15 | Total number of emoticons (i.e., like, sad, angry, wow, haha, and love) found in posts by the WO |
| 16, 17 | Number of male/female users who are labeled/tagged (e.g., "Ali is together with Ahmet and Mehmet") by the WO in his/her post title (2 attributes) |
| 18–20 | Total number of posts/comments/replies written by the WO (3 attributes) |
| 21–24 | Number of posts by the WO that include a photograph/user-generated text/video/memory (4 attributes) |
| 25–27 | Number of posts by the WO that include a location/survey/link (3 attributes) |
| 28, 29 | Number of posts by the WO that include information about his/her playing a game/important event (2 attributes) |
| 30 | Number of direct posts (see Figure 2(c)) on the WO's wall; such posts are often used to give a direct message or publish a greeting (e.g., "Happy birthday brother") on another user's wall on Facebook |
| 31, 32 | Number of male/female users who are responded to by the WO under their comments and replies (2 attributes) |
| 33, 34 | Number of male/female users who write an activity under activities by the WO (the target is the WO; 2 attributes) |
Using these three groups of features as explained above, we obtained a list of 34 features that are
used to infer the gender of the WO. Table 6 summarizes all of these features and their definitions
used in the wall interaction model.
4.5 Wall Content Model
The wall content model uses a content-based approach and extracts features from the activities to perform gender detection with the help of classical text categorization techniques
[Gupta and Lehal 2009]. The basic idea behind this model is similar to our assumptions used in
the wall interaction model. We assume that male and female users use different words, phrases,
emoticons, and so on. In addition, activities that are written by the WO and targeting the WO
may help to infer his/her gender. In this model, we use WD1, WD2, and WD3 datasets and extract
word- and character-level information to represent activities by using different approaches, which
are described as follows:
• Word-level information: We use classical bow as features. We also use distributed representations of words (i.e., word embeddings) that are obtained by the word2vec model.
• Character-level information: We use character n-grams, which is also known as the bag of
character n-grams.
In text categorization, preprocessing directly affects the accuracy of the results because the applied preprocessing steps determine the set of features extracted. In addition, OSN users generally
use informal language, and their writings mostly include grammar and syntax errors. As such,
this model, like the wall interaction model, is challenging in the context of the Turkish language. In the
wall content model, we applied two preprocessing methods: basic preprocessing and linguistic
preprocessing.
4.5.1 Basic Preprocessing. We apply this type of preprocessing only when the character-level
information is extracted from the text content. In basic preprocessing, we just remove all punctuation marks and digits from the textual data and apply lowercase conversion.
4.5.2 Linguistic Preprocessing. We apply this preprocessing only when the word-level information is used to represent texts. As word-level features often lead to high-dimensional feature space,
linguistic preprocessing is mostly applied to reduce the number of features and extract more meaningful features for text categorization. Our linguistic preprocessing includes the following steps:
• De-asciifying: Users may write words without Turkish characters, and this prevents us from
applying linguistic tasks, as our NLP tool cannot recognize these words as valid Turkish
words. Hence, we apply de-asciifying (e.g., “aldik” is converted into “aldık”) at first.
• Stemming: In Turkish, it is possible to produce many surface forms from a single root word.
In this step, we reduce words to their roots (e.g., “geldi” is stemmed as “gel”).
• Stopwords removal: We remove stopwords introduced in Lucene API.2
• Normalization: We reduce the number of repetitive letters with more than two occurrences
until the word is recognized as a Turkish word by Zemberek.3 If the word is still not recognized
as Turkish even after the repetitive letters are reduced to two, we keep this form of the word,
thereby allowing words with double consecutive letters (e.g., “kardeşiiiiimm” → “kardeşim”,
“gardaaaaş” → “gardaaş”).
Note that we use Zemberek 2.0 [Akın and Akın 2007], which is an open source Turkish NLP
tool to perform all linguistic analyses.
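As an illustration only, the following minimal Python sketch implements the repeated-letter normalization rule described above; the is_turkish() callback is a hypothetical stand-in for the Zemberek check, and the sketch is not the exact implementation used in this work.

import re

def normalize_repeats(word, is_turkish):
    # Shrink runs of three or more identical letters one letter at a time; if a fully
    # collapsed candidate is recognized as Turkish, return it, otherwise fall back to
    # the form with at most two consecutive letters.
    current = word
    while re.search(r"(.)\1\1", current):
        current = re.sub(r"(.)\1\1", r"\1\1", current, count=1)
        candidate = re.sub(r"(.)\1+", r"\1", current)
        if is_turkish(candidate):
            return candidate
    return current

lexicon = {"kardeşim", "geldi"}  # toy recognizer standing in for Zemberek
print(normalize_repeats("kardeşiiiiimm", lambda w: w in lexicon))  # kardeşim
print(normalize_repeats("gardaaaaş", lambda w: w in lexicon))      # gardaaş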
4.5.3 Feature Extraction. We consider each word as a feature in the bow model by ignoring its
position in the text. We also use character bi-grams and tri-grams as features in the n-gram model.
In the bow and n-gram models, we apply term weighting after the feature extraction with the
help of the term frequency (tf), binary, and term frequency-inverse document frequency (tf*idf)
2 http://lucene.apache.org/.
3 https://code.google.com/archive/p/zemberek/.
Table 7. Obtaining the Document Vector by Averaging Its Word Vectors [Lin et al. 2015]

Word            d1       d2     d3      ...   d300
room            –1,102   –202   –668    ...   –646
very            –6       355    –605    ...   –460
clean           –287     –343   1,077   ...   –232
neat            –101     –399   –274    ...   –986
Average Value   –374     –148   –118    ...   –581
methods, which are well-known traditional unsupervised weighting schemes employed in text
categorization [Salton and Buckley 1988]. These schemes can be formulated as follows:
W_Binary(t, d_i) = 1 if d_i contains term t, 0 otherwise,    (2)

W_Tf*Idf(t, d_i) = Tf(t, d_i) · log( N / |{d_i ∈ D : t ∈ d_i}| ),    (3)

where t is a term and d_i is the document that is processed. D and N represent the document collection and number of documents in the collection, respectively. Tf(t, d_i) also corresponds to the observed frequency of term t in document d_i.
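As an illustration only, the sketch below computes the same three weighting schemes with scikit-learn over a toy corpus; note that scikit-learn's tf*idf uses a smoothed variant of Equation (3), and this tool choice is an assumption rather than the setup used in this work.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["kardeşim çok iyi", "abla bugün geldi", "kardeşim bugün geldi"]  # toy wall activities

binary_vec = CountVectorizer(binary=True)   # Equation (2): 1 if the term occurs, else 0
X_binary = binary_vec.fit_transform(docs)

tf_vec = CountVectorizer()                  # raw term frequencies (tf)
X_tf = tf_vec.fit_transform(docs)

tfidf_vec = TfidfVectorizer()               # tf*idf weights (smoothed form of Equation (3))
X_tfidf = tfidf_vec.fit_transform(docs)

print(binary_vec.get_feature_names_out())
print(X_binary.toarray())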
In the distributed representation model, we represent each word with a real-valued and low-dimensional vector that is obtained by the word embedding method that is trained on our
textual data [Manchanda and Karypis 2018]. The Word2Vec model is one of the word embedding methods that is based on a neural network structure [Mikolov et al. 2013a, 2013b].
This model uses two different architectures, namely continuous bag of words (CBOW) and
skip-gram, to learn a vector representation of words. The CBOW architecture predicts the
current word based on its context, whereas the skip-gram predicts surrounding words for
the given current word [Zhang et al. 2015]. Unlike traditional feature models, the Word2Vec
model does not lose or ignore the ordering and semantics of the words. Therefore, this
model is quite popular and often shows superior performance in the text categorization field.
In this work, we use the Word2Vec model to build distributed representation of the wall
activities.
4.5.4 Text Representation. Unlike the previous models, we use both DL and traditional ML
classifiers in the wall content model. On the ML side, we employ classical content features to
form each instance vector directly. However, we apply a simple averaging approach to represent text instances with distributed representations of words (i.e., word embeddings). This is because the word2vec model is an unsupervised approach, and each text has to be converted into
a single feature vector so as to be used with supervised ML techniques. In this task, we apply
the averaging method [Hayran and Sert 2017; Lin et al. 2015] on word embeddings as exemplified in Table 7. Note that for convenient display, the value of each dimension is multiplied
by 10,000 and indicated by d_i (i = 1, . . . , 300). In this technique, we represent each text document by taking the average of all word vectors in the document. If the Word2Vec model does
not have a vector representation for a word, it is represented by a zero vector. For DL methods,
however, we convert each text document into an n × k matrix, where n represents the number
of unique words in the vocabulary, whereas k corresponds to the layer size of word embeddings
[Kim 2014].
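A minimal sketch of the averaging scheme described above is given below, using the first three dimensions of the example in Table 7 (display values divided back by 10,000); the helper function is illustrative, not the code used in this work.

import numpy as np

word_vectors = {                                    # first three dimensions of Table 7
    "room":  np.array([-0.1102, -0.0202, -0.0668]),
    "very":  np.array([-0.0006,  0.0355, -0.0605]),
    "clean": np.array([-0.0287, -0.0343,  0.1077]),
    "neat":  np.array([-0.0101, -0.0399, -0.0274]),
}

def doc_vector(tokens, vectors, dim=3):
    # Average the vectors of the observed words; out-of-vocabulary words count as zero vectors.
    vecs = [vectors.get(t, np.zeros(dim)) for t in tokens]
    return np.mean(vecs, axis=0)

print(doc_vector(["room", "very", "clean", "neat"], word_vectors))
# approximately [-0.0374, -0.0147, -0.0118], matching the Average Value row of Table 7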
Fig. 4. (a) General CNN architecture for NLP [Kim 2014]. (b) Basic RNN architecture before (left) and after
(right) the unfolding [Zhang et al. 2018].
4.6 Classification
4.6.1 Traditional ML Classifiers. We employ ML classifiers to build our gender prediction models. On the ML side, we use Weka’s4 well-known classifiers, including NB, Naive Bayes Multinomial (NBM), SVM by importing the LibSVM5 package, Decision Tree (C4.5), Random Forest
(RF), and k-Nearest Neighbor (IBk).
4.6.2 DL Classifiers. In this work, we additionally employ DL classifiers in the wall content
model. On the DL side, we use CNN and RNN, which are two main architectures used in text
categorization and content-based gender detection [Bsir and Zrigui 2018a, 2018b; Kowsari et al.
2020]. Additionally, we apply the recently proposed BERT [Devlin et al. 2019] language model to
predict gender attributes of users.
A general CNN architecture for NLP tasks is shown in Figure 4(a), where each text with length
of n words is represented by k-dimensional word embeddings. Input is prepared by concatenating
vectors of each word in the text, and then the convolution step(s) involving filter(s) in a window
of h words is(are) applied to generate new features. Next, a max-over-time pooling is applied over
the feature map obtained in the previous step so as to get the maximum value as the feature for the
related filter. In this work, we build and utilize a CNN structure that is almost identical to the CNN
used in the work of Kim [2014] with two variations. These CNN variants are listed as follows:
• CNN-static: Uses pre-trained word vectors that are not modified during the training.
• CNN-non-static: Same as the preceding model, except word vectors are updated during the
training.
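A compact keras sketch of such an architecture is given below, with the filter sizes (3, 4, 5), 128 filters, and dropout rate of 0.5 reported in Section 5.5.2; the vocabulary size and sequence length are placeholders, and the trainable flag switches between the two variants listed above.

from tensorflow.keras import layers, models

vocab_size, seq_len, emb_dim = 20000, 100, 50        # placeholder sizes

def kim_style_cnn(embedding_matrix=None, trainable=True):
    # trainable=False with pre-trained vectors corresponds to CNN-static;
    # trainable=True corresponds to CNN-non-static.
    inp = layers.Input(shape=(seq_len,))
    emb = layers.Embedding(vocab_size, emb_dim,
                           weights=None if embedding_matrix is None else [embedding_matrix],
                           trainable=trainable)(inp)
    pooled = []
    for h in (3, 4, 5):                              # filter windows of h words
        conv = layers.Conv1D(128, h, activation="relu")(emb)
        pooled.append(layers.GlobalMaxPooling1D()(conv))   # max-over-time pooling
    x = layers.Dropout(0.5)(layers.Concatenate()(pooled))
    out = layers.Dense(2, activation="softmax")(x)   # male/female
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model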
A basic architecture of RNN, however, is depicted in Figure 4(b), which shows the RNN architecture before and after the unfolding where x t and ht represent the input and hidden state vectors,
respectively at timestep t. Unlike traditional neural networks, inputs and outputs are not independent of each other in RNNs. They are called recurrent because they perform the same task for
every element of the sequence with the output being dependent on the previous computations.
There are two commonly used RNN architectures, namely LSTM and GRU, which were developed
to overcome the exploding and vanishing gradient problem observed in basic RNN. The hidden
state of the LSTM is computed by the following equations [Karpathy et al. 2015; Lipton et al. 2015;
Zhang et al. 2018]:
i_t = σ(W_i · x_t + U_i · h_{t−1} + b_i),    (4)
f_t = σ(W_f · x_t + U_f · h_{t−1} + b_f),    (5)
4 https://www.cs.waikato.ac.nz/ml/weka/.
5 https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
C̃_t = tanh(W_c · x_t + U_c · h_{t−1} + b_c),    (6)
C_t = f_t × C_{t−1} + i_t × C̃_t,    (7)
o_t = σ(W_o · x_t + U_o · h_{t−1} + b_o),    (8)
h_t = o_t × tanh(C_t),    (9)
where i_t, f_t, and o_t are the input, forget, and output gates, respectively. x_t and h_t are the input vector and hidden layer value of the cell at timestep t. C_t, C̃_t, and C_{t−1} represent the current, candidate, and previous cell states, respectively, whereas σ (i.e., sigmoid) and tanh are activation functions. Similarly, how a GRU cell is updated at each timestep t is computed by the following equations [Karpathy et al. 2015; Zhang et al. 2018]:
z_t = σ(W_z · x_t + U_z · h_{t−1} + b_z),    (10)
r_t = σ(W_r · x_t + U_r · h_{t−1} + b_r),    (11)
h̃_t = σ(W_h · x_t + U_h · (r_t × h_{t−1}) + b_h),    (12)
h_t = (1 − z_t) × h_{t−1} + z_t × h̃_t,    (13)
where r_t and z_t represent the vectors of the reset and update gates, differently from the variables used above. In all equations given above, × represents the element-wise (i.e., Hadamard) product of two
matrices/vectors. Notice that unidirectional RNN layers can be wrapped in bidirectional layers
as well by allowing bidirectional connections in the hidden layer [Zhang et al. 2018]. In bidirectional RNNs, the forward RNN reads the input sequences from start to end, whereas the backward
RNN reads it from end to start [Lipton et al. 2015]. The reader can refer to other works [Lipton
et al. 2015; Schuster and Paliwal 1997] for more detailed information about bidirectional RNN
architectures.
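A keras sketch of a stacked bidirectional LSTM classifier in this spirit is shown below, loosely following the tuned configuration reported in Section 5.5.2 (three bidirectional LSTM layers with 16 units each, dropout rate of 0.2, rmsprop); the remaining details are assumptions.

from tensorflow.keras import layers, models

vocab_size, emb_dim = 20000, 50                       # placeholder sizes

model = models.Sequential([
    layers.Embedding(vocab_size, emb_dim),
    layers.Bidirectional(layers.LSTM(16, return_sequences=True)),   # forward and backward reads
    layers.Bidirectional(layers.LSTM(16, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(16)),
    layers.Dropout(0.2),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"])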
The BERT technique is actually a language model that is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both the left and right context in all
layers [Devlin et al. 2019]. Whereas previous word representation models (e.g., word2vec [Mikolov
et al. 2013a, 2013b]) focus on learning context-independent word representations, BERT focuses
on learning context-dependent word representations. BERT learns to understand the relationship
between words with the help of the masked language model (MLM), which randomly masks
words in a sequence and hence can be used for learning bidirectional representations [Lee et al.
2020]. The BERT architecture is based on bidirectional transformers, and it comes with the following two pre-trained general types:
• BERT BASE : 12 layers (i.e., transformer blocks), 12 attention heads, and 110 million parameters.
• BERT LARGE : 24 layers, 16 attention heads, and 340 million parameters.
The authors of the BERT model trained their models for two NLP tasks including MLM and
next sentence prediction. The BERT model can be applied by fine tuning the pre-trained language representations to downstream tasks, and it obtains state-of-the-art performance on most
NLP tasks, including question answering, text categorization, sentiment analysis, and so on. We
refer readers to the work of Devlin et al. [2019] for a more detailed description of the BERT
model.
In this work, we build our CNN and RNN architectures with the help of keras,6 which is a
DL library in Python. To employ the BERT model, we use two pre-trained models from hugging
face7 (an open source provider of NLP technologies) and build our own BERT model with the help
of the keras library.
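A rough sketch of the fine-tuning route through the transformers library is shown below, using one of the pre-trained models named in Section 5.5.2; the toy texts, label coding, and training hyperparameters are assumptions, not the settings used in this work.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "dbmdz/bert-base-turkish-128k-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert_clf = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["kardeşim nasılsın", "abla çok güzel olmuş"]      # toy wall activities
labels = tf.constant([0, 1])                               # 0 = male, 1 = female (placeholder coding)

enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="tf")
bert_clf.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
                 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 metrics=["accuracy"])
bert_clf.fit(dict(enc), labels, epochs=2)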
4.6.3 Performance Evaluation. In this article, all ML classifiers are tuned with default parameters and all results (except for the dummy classifier in the naive model) are obtained by fivefold
cross validation. DL algorithms used in this work are also configured to run with fivefold cross
validation so as to make a fair comparison. The performance evaluation for all ML and DL methods
is done using several well-known evaluation metrics that are based on values from the confusion
matrix. True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN)
values are used to compare the actual and predicted class labels in the confusion matrix [Tripathy
et al. 2016]. Here, we use accuracy (Acc for short) for performance evaluation of both ML and DL
classifiers. This score is based on the TP, TN, FP, and FN values, and it is calculated as follows:
Acc = (TP + TN) / N,    (14)

where N is the sum of all predictions such that N = TP + TN + FP + FN.
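For reference, a minimal sketch of a fivefold evaluation with accuracy scoring is given below, with a scikit-learn classifier and synthetic data standing in for the Weka and keras setups used in this work.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=42)   # toy user vectors

scores = cross_val_score(RandomForestClassifier(random_state=42), X, y,
                         cv=5, scoring="accuracy")    # Acc = (TP + TN) / N on each fold
print(scores.mean())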
4.7 Measuring Feature Importance
To measure feature importance, we use the mutual information (MI) method, which is also
widely used for the purpose of feature selection. Given two random variables x and y, their MI is defined in terms of their probability density functions p(x), p(y), and p(x, y) [Peng et al. 2005]:

MI(x; y) = ∬ p(x, y) log( p(x, y) / (p(x) p(y)) ) dx dy.    (15)
MI is a measure between two random variables that quantifies the amount of information obtained
about one random variable through the other random variable. In this work, we use MI to measure dependence between features and user gender by using its implementation in yellowbrick8
Python package.
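A minimal sketch of this measurement with scikit-learn's mutual_info_classif is shown below (the article uses the yellowbrick visualizer for the same purpose); the toy matrix and feature names are placeholders.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

# rows are users, columns are handcrafted features (e.g., counts from Table 6)
X = np.array([[5, 0, 3], [0, 4, 1], [6, 1, 2], [1, 5, 0]])
y = np.array([0, 1, 0, 1])                      # 0 = male, 1 = female (placeholder coding)

mi = mutual_info_classif(X, y, discrete_features=True, random_state=42)
for name, score in zip(["feat_1", "feat_2", "feat_3"], mi):
    print(name, round(score, 3))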
5 EXPERIMENTAL RESULTS
In this section, we present our experimental results obtained by using both stand-alone and an
ensemble of our gender prediction models in the following sections.
5.1 Results of the Naive Model
We first obtained the experimental results by using the naive model (i.e., dummy classifier), which
gives the baseline result to make the performance evaluation for the other models. As expected,
the naive model achieves an accuracy of 0.504 on our dataset.
5.2 Results of the Network Structure Model
In the network structure model, we run node2vec to learn feature representation for every node
in our network that includes 20K nodes and 201K undirected edges (see Table 2). In this phase,
we used default parameter settings provided by Grover and Leskovec [2016], applying an efficient
SGD optimization process. Specifically, we set the number of random walks (r ) to 10, length of
walk (l) to 80, and context size (k) to 10, and used other parameters with their default values. We
6 https://keras.io/.
7 https://huggingface.co/.
8 https://www.scikit-yb.org/en/latest/index.html.
Table 8. Accuracy Values of the Network Structure Model with Respect to Different Numbers of Dimensions and Classifiers

Dimension (d)   SVM     IBk     C4.5    NB      RF
16              0.765   0.751   0.738   0.689   0.770
32              0.769   0.745   0.713   0.662   0.773
64              0.778   0.751   0.705   0.642   0.772
128             0.792   0.750   0.702   0.630   0.769
256             0.791   0.741   0.705   0.622   0.763
Table 9. Accuracy Values of the Profile Information and Wall Interaction Models with Respect to Different Classifiers

Model                  NB      NBM     SVM     C4.5    RF      IBk
Profile information    0.974   0.616   0.918   0.980   0.981   0.973
Wall interaction       0.688   0.917   0.574   0.962   0.966   0.828
also utilized different numbers of dimensions (d), where d ∈ {16, 32, 64, 128, 256}, so as to explore
the effects of the number of dimensions on the results. After obtaining the feature representations,
we associated each user’s node vector with his/her gender. Finally, we fed the classification-ready
data into different classifiers and obtained results with respect to different numbers of dimensions.
To evaluate this model, we applied well-known traditional ML classifiers, namely NB, NBM, SVM,
C4.5, RF, and IBk, as described in Section 4.6. Table 8 presents our results, which show that the
best accuracy (i.e., 0.792) is obtained with the SVM classifier when the number of dimensions is
equal to 128.
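A rough sketch of this step with the community node2vec package is given below (the reference implementation of Grover and Leskovec is used in the article); the toy graph is a placeholder for the 20K-node friendship network.

import networkx as nx
from node2vec import Node2Vec

G = nx.karate_club_graph()                       # toy undirected graph

# r = 10 walks per node, walk length l = 80, context size k = 10, d = 128 dimensions
n2v = Node2Vec(G, dimensions=128, walk_length=80, num_walks=10, workers=2)
model = n2v.fit(window=10, min_count=1)

print(model.wv["0"].shape)                       # (128,); the package stores node ids as strings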
5.3 Results of the Profile Information Model
Similar to the network structure model, we evaluated the profile information model by using traditional ML classifiers. As is seen from Table 9, the results of classifiers are slightly different from
each other. Nevertheless, the best result is obtained by the RF classifier, whereas the worst is produced by the NBM classifier, which is a specific instance of NB classifier and uses a multinomial
distribution for each feature. NBM provides the worst result in this model as it works well only for
data that can easily be converted into frequency values, such as word counts in text. In the profile
information model, we achieved an accuracy of 0.981 as the best value, whereas our baseline and
network structure models achieve a value of 0.504 and 0.792 accuracy, respectively.
5.4 Results of the Wall Interaction Model
Next, we performed experiments on the wall interaction model that extracts features (e.g., writer,
gender of writer, number of total posts) from the activities of the WO’s wall. It also employs a
lexicon-based approach that makes use of the male/female words observed in these activities to
extract features. Notice that in this step, only female/male words are considered, not all content
of activities used for feature extraction. As this model is based on the homophily phenomenon,
we first performed a simple analysis so as to explore correctness of our assumption that users
often interact with users of the same gender. We obtained an average number of male and female
Table 10. Number of Unique Features Extracted with Respect to the Document Set, Feature Model, and Preprocessing (PRP) Method for the Wall Content Model

Feature Model   PRP   WD1       WD2       WD3
bow             ✗     223,577   594,695   480,009
bow             ✓     24,117    62,295    48,651
bi-gram         ✗     990       1,020     1,021
tri-gram        ✗     11,222    12,474    11,974
friends of users in our user set. Our results showed that male users have 19.67 male and 3.72
female friends on average. Female users, in turn, have 9.67 female and 5.54 male friends on average.
Our feature dependency analysis (see Section 5.6) also shows that the most discriminative features
in this model are based on the numbers of male/female users with whom the WO interacts. These results
show that our assumption is based on a correct basis, and users often form edges and interact with
others of the same gender.
Afterward, we applied the well-known classifiers as in the profile information model to infer the
gender of the WO. As is seen from Table 9, the most successful result is again obtained by the RF
classifier. However, the worst result is obtained by the SVM classifier. NBM has better performance
in the wall interaction model with respect to its performance in the profile information model. In
the wall interaction model, we achieved the best result at an accuracy of 0.966 as compared to
0.504, 0.792, and 0.981 obtained by the naive, network structure, and profile information models,
respectively. When an overall comparison is made, the profile information model has the best performance among the three models except for the NBM classifier. As we use lexicon-based features
that are extracted from the wall activities in the wall interaction model, the NBM classifier has
better performance for this model.
5.5 Results of the Wall Content Model
The wall content model utilizes text categorization techniques and uses word-level (i.e., bow and
word embeddings) and character-level (i.e., n-grams) information to extract features from content
of the wall activities of users. In this model, we used wall datasets (i.e., WD1, WD2, and WD3) so
as to explore which type of activity contains more valuable information to predict the gender of
Facebook users when content-based analysis is done. We performed experiments using the wall
content model in two different ways, namely by applying traditional ML and DL methods.
5.5.1 Results of Traditional ML Methods. In this phase, we first applied classical text categorization steps on WD1, WD2, and WD3 document sets, including feature extraction, term weighting,
and classification. To reach our goal, we extracted word and n-gram features (with different n values) to represent documents. Note that before feature extraction, we applied preprocessing (PRP
for short) in two different ways (i.e., basic and linguistic) in all cases, and linguistic preprocessing
was only applied on word-level models (bow and word embeddings). For the wall content model,
results of the task after linguistic preprocessing (if applied) are shown with ✓, whereas results
for basic preprocessing are shown with ✗. After the preprocessing, we performed feature extraction, and the number of features obtained are presented in Table 10. According to this table, in
the content-based analysis with classical text categorization techniques, we have to deal with very
high-dimensional feature spaces, especially in the bow model without linguistic preprocessing. The
Table 11. Accuracy Values of the Wall Content Model Using Bow and n-Gram Methods with Various Combinations of Preprocessing (PRP), Classifiers (CLS), and Weighting Schemes

                            WD1                        WD2                        WD3
CLS    Feature Set   PRP    Binary  Tf      Tf*Idf     Binary  Tf      Tf*Idf     Binary  Tf      Tf*Idf
NB     bow           ✓      0.634   0.629   0.640      0.596   0.597   0.597      0.629   0.592   0.592
NB     bi-gram       ✗      0.597   0.629   0.628      0.607   0.642   0.646      0.627   0.683   0.689
NB     tri-gram      ✗      0.650   0.684   0.682      0.643   0.663   0.662      0.683   0.716   0.716
NBM    bow           ✓      0.830   0.862   0.843      0.823   0.826   0.820      0.823   0.836   0.828
NBM    bi-gram       ✗      0.736   0.790   0.750      0.691   0.713   0.718      0.754   0.797   0.771
NBM    tri-gram      ✗      0.821   0.830   0.822      0.750   0.737   0.757      0.811   0.815   0.815
SVM    bow           ✓      0.613   0.718   0.741      0.585   0.690   0.696      0.621   0.720   0.748
SVM    bi-gram       ✗      0.806   0.811   0.773      0.759   0.758   0.737      0.807   0.803   0.781
SVM    tri-gram      ✗      0.819   0.816   0.844      0.764   0.777   0.786      0.790   0.794   0.823
C4.5   bow           ✓      0.804   0.806   0.806      0.774   0.787   0.765      0.804   0.804   0.802
C4.5   bi-gram       ✗      0.682   0.694   0.673      0.652   0.650   0.654      0.627   0.696   0.688
C4.5   tri-gram      ✗      0.760   0.768   0.764      0.682   0.687   0.682      0.747   0.747   0.749
RF     bow           ✓      0.740   0.750   0.743      0.687   0.707   0.707      0.741   0.756   0.745
RF     bi-gram       ✗      0.671   0.714   0.675      0.630   0.656   0.644      0.703   0.730   0.706
RF     tri-gram      ✗      0.708   0.720   0.715      0.639   0.661   0.659      0.712   0.724   0.729
IBk    bow           ✓      0.644   0.622   0.592      0.659   0.670   0.686      0.692   0.664   0.690
IBk    bi-gram       ✗      0.643   0.662   0.596      0.608   0.648   0.585      0.667   0.684   0.610
IBk    tri-gram      ✗      0.631   0.610   0.607      0.626   0.614   0.606      0.662   0.646   0.642
number of features in the WD2 documents set is higher than the other sets as it contains activity
contents written by the WO.
Following the feature extraction, we applied term weighting using three different schemes (binary, tf, and tf*idf), and then performed classification experiments on the three document sets. As is seen from Table 11, we obtained our highest results as 0.862, 0.826, and 0.836 on
WD1, WD2, and WD3 sets, respectively. The most successful classifier is generally NBM as we
do text classification, whereas the worst is IBk. Among the weighting schemes, tf often produces
better results than the others. In the n-gram model, there is a direct proportion between length of
the n-grams and the classification performance. It is observed that tri-grams generally outperform
the others. In all cases, the best result is obtained on the WD1 dataset that includes content of
activities that are targeting the WO. These results in the first step show that classical feature models fall behind both the profile information and wall interaction models with respect to accuracy
values, which are 0.981 and 0.966, respectively.
Second, we used word vectors (i.e., word embeddings) to represent documents in the wall content model. To create distributed representations of our document sets (i.e., WD1, WD2, and WD3),
we first trained our Word2Vec model on a huge training collection that contains all activities
(i.e., 5.9M activities from collected data including 2.76M posts, 2.74M comments, and 466K replies
crawled from the 20K users’ wall (see Table 2). We employed the DL4J9 library to build the word
vectors and applied different layer sizes n where n ∈ {50, 100, 200, 300, 500} to investigate its effects
on results. The selected parameter values are as follows: minimum word frequency is 1, learning
rate equals 0.025, iterations is set to 5, window size is equal to 5, and epochs is set to 5. Note that
9 https://deeplearning4j.org/.
Table 12. Accuracy Values of the Wall Content Model Using Word Embeddings with Various Combinations of Preprocessing (PRP), Layer Size, and Classifiers (CLS)

              WD1 (layer size)                       WD2 (layer size)                       WD3 (layer size)
CLS    PRP    50      100     200     300     500    50      100     200     300     500    50      100     200     300     500
NB     ✓      0.744   0.732   0.716   0.710   0.692  0.723   0.736   0.696   0.679   0.657  0.764   0.761   0.736   0.715   0.691
NB     ✗      0.736   0.722   0.711   0.706   0.699  0.737   0.734   0.714   0.700   0.680  0.796   0.784   0.772   0.763   0.741
SVM    ✓      0.839   0.823   0.795   0.771   0.725  0.822   0.797   0.761   0.730   0.688  0.842   0.833   0.812   0.798   0.775
SVM    ✗      0.842   0.833   0.807   0.785   0.749  0.826   0.808   0.774   0.752   0.716  0.850   0.842   0.824   0.813   0.794
IBk    ✓      0.778   0.786   0.789   0.793   0.791  0.774   0.772   0.778   0.781   0.776  0.813   0.803   0.810   0.815   0.810
IBk    ✗      0.789   0.791   0.796   0.791   0.793  0.778   0.775   0.780   0.784   0.782  0.808   0.820   0.820   0.818   0.820
C4.5   ✓      0.765   0.756   0.753   0.751   0.743  0.753   0.746   0.730   0.740   0.740  0.783   0.780   0.773   0.773   0.761
C4.5   ✗      0.771   0.762   0.750   0.740   0.739  0.755   0.750   0.730   0.727   0.724  0.793   0.781   0.774   0.759   0.763
RF     ✓      0.810   0.809   0.801   0.800   0.783  0.797   0.798   0.793   0.784   0.783  0.830   0.825   0.820   0.820   0.811
RF     ✗      0.814   0.801   0.805   0.789   0.787  0.800   0.790   0.781   0.784   0.779  0.834   0.832   0.822   0.810   0.805
we used different values from their typical values for epochs and iterations to improve training
success. We selected the number of epochs as lower than 15 since greater values may cause overfitting [Tezgider et al. 2018].
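The corresponding training call is sketched below with gensim instead of the DL4J library used here; parameter names differ slightly across libraries, and the CBOW/skip-gram choice is left at gensim's default because the architecture actually trained is not stated above.

from gensim.models import Word2Vec

# tokenized wall activities; the real collection contains about 5.9M activities
sentences = [["kardeşim", "çok", "iyi"], ["abla", "bugün", "geldi"], ["kardeşim", "geldi"]]

w2v = Word2Vec(sentences,
               vector_size=50,      # layer size (also tried: 100, 200, 300, 500)
               window=5,            # window size 5
               min_count=1,         # minimum word frequency 1
               alpha=0.025,         # learning rate 0.025
               epochs=5)            # 5 epochs
w2v.save("facebook_w2v_50.model")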
We built two distributed models for each of the document sets depending on the preprocessing
option chosen. Then, we used the mean vector of the observed words’ vector to represent each
document in each set. Finally, we performed classification experiments and obtained results under
different circumstances as presented in Table 12. In this experiment, we were unable to obtain
results with the NBM classifier as distributed representations of our texts contain negative values.10
As is seen from Table 12, the most successful classifier is SVM, whereas the worst one is generally
NB. The highest classification accuracies for each of the three sets are obtained when the layer size
is set to 50, and it is observed that classification performance often decreases while the layer size
increases. The Word2Vec model is not sensitive to the preprocessing option used, as the results
that are obtained with basic preprocessing are slightly different from the results obtained with
linguistic preprocessing in all cases.
The best result observed is an accuracy of 0.850 on the WD3 document set, which shows that
the wall content model cannot outperform the profile information and wall interaction models
again. However, the Word2Vec model is very successful when compared to the classical feature
models used in the first experiment of the wall content model. The distributed text representation
method achieves comparable results using only 50-dimensional document vectors, whereas
classical feature models need tens of thousands of features to achieve a similar result. Note that
the Word2Vec model outperforms classical feature models on the WD3 dataset, and it produces
slightly different results on WD1 and WD2 sets as well. Figure 5 lists the top 10 nearest words
to the male-oriented query term “abi” and the female-oriented term “abla” with respect to the
cosine similarity measure. The word vectors were computed from the WD3 dataset using basic
preprocessing and with layer size set to 50. This figure shows that the Word2Vec model is very
successful in clustering semantically similar words.
5.5.2 Results of DL Methods. In this phase, we first used CNN and RNN algorithms, which
automatically discover features from word embeddings. In experiments of this section, we employed word embeddings with layer size 50 that produced better results for ML-based classifiers in
10 The NBM classifier cannot handle negative feature values.
Fig. 5. Top 10 nearest words for query terms “abi” and “abla” with respect to the cosine similarity.
Table 13. Accuracy Values for CNN with Respect to the Wall Datasets

CLS   Variant       WD1     WD2     WD3
CNN   Static        0.852   0.850   0.866
CNN   Non-static    0.919   0.879   0.913
previous sections. To run DL algorithms, we used Google Colab,11 which is a free-of-charge service and enables the research community to use the Tesla K80 GPU. First of all, we compared
classification accuracy of the two variants of the CNN algorithm on wall datasets. Note that we
obtained results of our CNN structure with the parameter tuning of Kim [2014], where we applied
filter sizes of 3, 4, and 5 with 128 filters, dropout rate of 0.5, and batch size of 64. Figure 8(a) in the
Appendix depicts the summary of our CNN structure. We configured the CNN to run with five
epochs based on our experiments showing that it often achieves the best result with five epochs.
We then injected our previously trained word embeddings with layer size of 50 in CNN-static and
CNN-non-static variations.
As is seen from Table 13, CNN-non-static provides better results than CNN-static for each of
three datasets. The best result, however, is obtained with an accuracy of 0.919 on the WD1 dataset.
As a second step of our DL experiments, we used the RNN algorithm to perform gender detection.
In this step, we performed experiments only for the WD1 dataset, which provides the best result
for CNN. We would also like to note that we used trainable (i.e., non-static) word embeddings
for the RNN algorithm because the non-static word embeddings provided better results for the
CNN algorithm in the previous step of our DL experiments. Before running the experiments, we
performed parameter tuning by using Bayesian optimization with the help of the skopt12 package.
We performed tuning by using an RNN structure with bidirectional LSTM cells for the purpose of
detecting the number of layers and network parameters. We searched for many parameters, which
are the number of LSTM layer(s), the number of neurons in LSTM layer(s), the number of neurons
in layer(s), whether to use dropout and batch normalization, dropout rate (if dropout is used),
and optimizer. After searching, we detected the optimum network structure, which is stacked
with three bidirectional LSTM layers having 16 neurons each, dropout rate of 0.2, optimizer with
rmsprop, and without batch normalization and a dense layer before the output layer. Figure 8(b)
11 https://colab.research.google.com/notebooks/intro.ipynb.
12 https://scikit-optimize.github.io/stable/.
Table 14. Accuracy Values of RNN and Combination of CNN and RNN with Respect to Their Different Variants on the WD1 Dataset

CLS        Variant                                 Acc
RNN        Unidirectional GRU                      0.865
RNN        Bidirectional GRU                       0.864
RNN        Unidirectional LSTM                     0.868
RNN        Bidirectional LSTM                      0.873
Combined   Bidirectional LSTM + CNN-non-static     0.881
Combined   CNN-non-static + Bidirectional LSTM     0.887
shows the summary of created network structure after the parameter optimization process. Next,
we performed experiments on this structure to detect the number of epochs for better results.
Based on these experiments, we additionally found that this structure provides better results with
eight epochs.
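A compact sketch of such a search with skopt's gp_minimize is given below; the search space is an abbreviated, assumed version of the one described above, and build_and_evaluate() is a hypothetical stand-in for training the RNN and returning its cross-validated accuracy.

from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real
from skopt.utils import use_named_args

space = [Integer(1, 3, name="num_lstm_layers"),
         Integer(8, 64, name="lstm_units"),
         Real(0.0, 0.5, name="dropout_rate"),
         Categorical(["adam", "rmsprop"], name="optimizer")]

def build_and_evaluate(num_lstm_layers, lstm_units, dropout_rate, optimizer):
    # Hypothetical stand-in: would build the stacked bidirectional LSTM with these settings
    # and return its cross-validated accuracy; a synthetic score keeps the sketch runnable.
    return 0.80 + 0.01 * num_lstm_layers - 0.05 * dropout_rate

@use_named_args(space)
def objective(num_lstm_layers, lstm_units, dropout_rate, optimizer):
    return -build_and_evaluate(num_lstm_layers, lstm_units, dropout_rate, optimizer)  # minimize negative accuracy

result = gp_minimize(objective, space, n_calls=20, random_state=42)
print(result.x)                                   # best parameter combination found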
Using our RNN structure and parameter settings, we ran experiments by changing the recurrent
unit (i.e., LSTM or GRU) and type (i.e., unidirectional or bidirectional) so as to explore their effects
on the results as well. Table 14 presents our experimental results, which are obtained with LSTM
and GRU units with unidirectional and bidirectional types of our network. As is seen from the table,
the best accuracy (i.e., 0.873) is obtained with the bidirectional LSTM variant of the RNN algorithm.
All variants of RNN outperform static CNN, and the best result is obtained with an accuracy of
0.873 with bidirectional LSTM. However, variants of RNN still fall behind CNN-non-static.
Additionally, we combined best-performer variants of the CNN and RNN algorithms by using
their optimum structure and parameter settings that have been observed from the previous two
experiments. As stated earlier, bidirectional LSTM and CNN-non-static are the most successful
variants of the CNN and RNN algorithms, respectively. As such, we used these two variants to
create our combined models, namely CNN-non-static + Bidirectional LSTM and Bidirectional LSTM
+ CNN-non-static with the same parameter settings. Figure 8(c) and (d) show the structures of
these combined models, respectively. Note that we configured these models to run with five and
seven epochs, respectively. Table 14 also presents obtained accuracy values for these combined
models. As is seen from the table, the combined models provide better results than the single run
of bidirectional LSTM, but they cannot outperform the single run of CNN-non-static.
As a last step in our DL experiments, we utilized the BERT method in our wall content model
by applying three different variations. Our aim is to investigate the effect of the domain of texts
on which the BERT model is trained. For this purpose, we used two pre-trained BERT models with
the help of the transformers13 library. These models are bert-base-multilingual-uncased14
and dbmdz/bert-base-turkish-128k-cased15 from which the first one is trained on lowercased
text in the top 102 languages including Turkish with Wikipedia dump, whereas the second one
is trained on a large Turkish corpus that has a size of 35 GB and a vocabulary size of 128K. Additionally, we trained our own BERT-like model that uses MLM with the help of the open source
implementation16 of the keras library. In the rest of this article, we use the term bert_fb to refer to this model, which is trained on 300K randomly selected activity contents crawled from
Facebook.
13 https://pypi.org/project/transformers/.
14 https://github.com/google-research/bert/blob/master/multilingual.md.
15 https://huggingface.co/dbmdz/bert-base-turkish-128k-cased.
16 https://keras.io/examples/nlp/masked_language_modeling/.
Table 15. Accuracy Values of Three Different Fine-Tuned BERT Models on the WD1 Dataset

BERT Model                              Acc
bert-base-multilingual-uncased          0.849
dbmdz/bert-base-turkish-128k-cased      0.884
bert_fb                                 0.926
For pre-trained models, we downloaded each of them and fine-tuned over our WD1 dataset so
as to predict gender of users. We selected the WD1 dataset as the best accuracy values in the wall
content model are generally obtained with this dataset. To train the bert_fb model, we randomly
selected 300K messages from all of the crawled activities of 20K users. We then included [CLS] and
[SEP] special tokens for each sentence both in these 300K messages and WD1 dataset. We then
trained the bert_fb model with 5 epochs and default parameter settings with 128 hidden layers,
and 8 attention heads. We also set the number of maximum sequence length to 256, embedding
dimension to 128, and vocabulary size to 24K. After training the bert_fb model, we again trained
and fine-tuned it on the WD1 dataset with 6 epochs to predict gender of users. We would like to
note that the best results are often achieved with 6 epochs for bert_fb model.
For each of the BERT models, we obtained results with fivefold cross validation and used the accuracy metric so as to make a fair comparison with the other ML and DL methods. We present the results of the BERT models in Table 15, which shows that the best accuracy (i.e., 0.926) among the BERT models is achieved by the bert_fb model. This shows that one needs to train the BERT model on a corpus from the same domain as the training data. These results also show that the most successful method in the wall content model is BERT, with the highest accuracy of 0.926 compared to the best accuracies of the CNN (0.919), RNN (0.873), CNN + RNN (0.887), and traditional ML methods (0.862).
5.6 Feature Dependence on Gender in Different Models
In this section, we measure and present the dependence between user gender and features obtained
with our models. To measure the dependence, we used the MI method, which is described briefly
in Section 4.7. We conducted this experiment on features obtained from profile information, wall
interaction, and wall content models. We excluded features that correspond to distributed representation of words in the wall content model and nodes in the network structure model. This
is because word2vec and its derivative, node2vec, automatically use the distributed vectors they have learned. Note that with the word2vec model, document representation is performed by taking the average of the vectors of the words observed in the relevant document (see Table 7). Measuring
attribute importance gives information about which index of these distributed vectors is more important. Since each index stands for a learned feature that does not have a specific name, measuring
MI dependence does not provide meaningful information. We would also like to note that we measured feature dependencies only for bow features extracted from the WD1 dataset and weighted
with raw term frequencies (i.e., tf) in the wall content model. The reason for this is that bow features produce the best result among traditional content features using raw term frequencies on
the WD1 dataset (see Table 11).
Figure 6(a) depicts obtained dependencies of features (see Table 5) used in the profile information
model. As is seen from Figure 6(a), the most important features in profile information are numbered
with 6, 3, 4, and 8, respectively. When definitions of these features are examined from Table 5, it
is clear that kinship relations and the display name of the WO are the most important features in
the profile information model.
Fig. 6. Mutual information dependence between user gender and features with respect to the profile information (a), wall interaction (b), and wall content model (c).
Figure 6(b) likewise shows dependencies of features (see Table 6) used in the wall interaction
model. As is seen from Figure 6(b), the most dependent features on user gender in the wall interaction model are numbered with 32, 33, 31, 11, 34, 13, and 12. Having a closer look at Table 6
for feature descriptions, it is seen that the gender of users who interact with the WO and the gender-oriented words they use in their activities are the most important features in the wall interaction
model.
Figure 6(c), however, shows the top 50 words with respect to their MI dependency on user gender
in the wall content model. As is seen from Figure 6(c), the most dependent features include gender-oriented words in Turkish along with their informal forms: “abi”, “kardeş”, “bey”, “abla”, “gardaş”,
“abim”, “kız”, “adam”, and so on. This result shows that extracting features based on a lexicon of
gender-oriented words is an appropriate method in the wall interaction model.
5.7 Experimental Results of the Ensemble Models
In the last step of our experiments, we combined profile information, network structure, wall interaction, and wall content models to investigate whether any combination of the three models
can improve our results.
To combine models, we applied a feature extension approach that extends the current feature
space by combining it with the other features obtained from different model(s). For instance, to
combine the profile information and wall interaction models, we use an extended feature space
that includes a union of features from Table 5 and Table 6. For instance, assume that we have
two different feature models and an arbitrary male user’s instance vector that is <2, 3, Male> with
feature model 1, whereas it is <4, 5, Male> for feature model 2. When we combine feature models 1
and 2, the instance vector will be <2, 3, 4, 5, Male> with the help of feature extension. In this phase,
we used this simple approach to combine our feature extraction models.
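In code, this extension is a plain horizontal concatenation of the per-model feature matrices, as in the toy sketch below (the first row reuses the example vectors above; the second row and the label coding are placeholders).

import numpy as np

X_model1 = np.array([[2, 3],          # feature model 1 for two users
                     [7, 1]])
X_model2 = np.array([[4, 5],          # feature model 2 for the same two users
                     [0, 6]])
y = np.array(["Male", "Female"])      # class labels kept separately

X_combined = np.hstack([X_model1, X_model2])   # e.g., <2, 3, 4, 5, Male> for the first user
print(X_combined)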
Table 16. Accuracy Values of the Ensemble Model with Respect to Different ML Classifiers

Model                                      Code   NB      NBM     SVM     IBk     C4.5    RF
Profile Information + Wall Interaction     C1     0.805   0.876   0.563   0.968   0.979   0.982
Profile Information + Wall ContentWE       C2     0.936   NA      0.917   0.968   0.979   0.981
Profile Information + Wall ContentBow      C3     0.650   0.759   0.703   0.946   0.979   0.894
Wall Interaction + Wall ContentWE          C4     0.800   NA      0.615   0.819   0.958   0.964
Wall Interaction + Wall ContentBow         C5     0.671   0.921   0.842   0.589   0.961   0.875
Wall ContentBow + Wall ContentWE           C6     0.642   NA      0.709   0.675   0.826   0.785
C1 + Wall ContentWE                        C7     0.905   NA      0.589   0.968   0.977   0.981
C1 + Wall ContentBow                       C8     0.683   0.889   0.795   0.943   0.977   0.913
C1 + Network Structure                     C9     0.624   NA      0.513   0.608   0.680   0.761
C1 + C6                                    C10    0.685   NA      0.795   0.951   0.980   0.899
For this purpose, we selected the best cases (i.e., feature set, document set, parameter settings)
for the four models and experimentally evaluate all combinations of these models.
We would like to note that we have taken different feature sets in the wall content model and
therefore used the best cases for each of them separately to combine this model with other models.
Among traditional feature models (i.e., bow, n-gram), the best result (i.e., 0.862) is provided by the
WD1 document set with bow features and tf weighting. As such, we used this document set and
term weighting and named it wall contentBow . Similarly, wall contentWE represents the wall content
model applied on the WD3 dataset, with the layer size set to 50, as this case gave the best result
(i.e., 0.850).
We obtained results of our ensemble models in two steps: (i) combining best cases of the wall
content model with the handcrafted features of the profile information and wall interaction models,
as well as node embedding features of the network structure model, and (ii) combining learned
features of the DL-based wall content model with the best case of the previous step.
The experimental results of the first step are presented in Table 16, which shows that combining profile information and wall interaction models, coded with C1, improves the performance of
gender prediction and produces an accuracy of 0.982, which is the highest value obtained so far
in this work. Note that NA means that we are unable to obtain the accuracy value, as the NBM
classifier cannot run over datasets that have negative feature values. In this phase, the best results
are obtained with the help of C4.5 or RF classifiers.
In the second step, we combined features learned by the CNN algorithm with the best feature set
detected in the previous step of experiments. This is because CNN with trainable word embeddings
produces higher results (i.e., 0.919) than that of RNN and CNN + RNN methods on the WD1 dataset
in the wall content model. In this phase, we first combined CNN with ML algorithms by using
features extracted by the CNN to feed traditional ML classifiers. This process is actually a means
to create an ensemble at the algorithm level. To do this, we used outputs of the fully connected layer
(i.e., dense layer with 30 neurons) as the learned features and made the predictions by replacing
the softmax layer of our CNN (see Figure 8(a)) with any of the ML algorithms.
We implemented our ensemble model that is inspired from Wu et al. [2018] and again used fivefold cross validation. The ensemble of CNN and any ML classifier (with/without feature extension)
is depicted in Figure 7 and can be summarized as follows:
• For the training process, the WD1 dataset and pre-trained word embeddings (with layer
size of 50) are fed to the CNN.
Fig. 7. Implementation process of ensemble CNN-any ML model.
Table 17. Accuracy Values for the Ensemble of CNN with Traditional ML Classifiers with Respect to Feature Extension

Feature Set                          C4.5    RF      IBk     NB      NBM     SVM
CNN Features                         0.880   0.883   0.875   0.888   0.888   0.886
CNN Features + C1 (see Table 16)     0.878   0.885   0.869   0.887   0.819   0.538
• After training the CNN, the corresponding feature vector is automatically extracted for each
input text.
• The softmax layer is replaced with the ML classifier that is trained with the automatically
extracted feature vectors.
• For the test process, a given text is fed to the well-trained CNN and the test feature vector
is obtained.
• The well-trained ML classifier performs the classification using the test feature vector.
Note that this process is shown with a dashed blue line in Figure 7. We also performed experiments by using this ensemble classifier on the extended feature space that includes CNN features
and other model features. This additional process is also shown with dashed red line in Figure 7.
Using this two-way implementation, we obtained results for the ensemble of CNN with traditional ML classifiers in the wall content model. Next, we also obtained results for the ensemble
model on the extended feature space that includes wall content features automatically extracted
by CNN and the best features coded with C1 in the previous step of our experiments.
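A self-contained keras/scikit-learn sketch of this feature-extraction step is given below; the tiny CNN, the random data, and the placeholder C1 features are illustrative only and do not reproduce the exact models used in this work.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from tensorflow.keras import layers, models

# tiny stand-in CNN whose named dense layer plays the role of the 30-neuron feature layer
inp = layers.Input(shape=(20,))
x = layers.Embedding(100, 8)(inp)
x = layers.Conv1D(16, 3, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
feat = layers.Dense(30, activation="relu", name="features")(x)
out = layers.Dense(2, activation="softmax")(feat)
cnn = models.Model(inp, out)
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

X_ids = np.random.randint(0, 100, size=(64, 20))      # toy token-id sequences
y = np.random.randint(0, 2, size=64)                  # toy gender labels
cnn.fit(X_ids, y, epochs=1, verbose=0)

# replace the softmax layer: reuse the dense-layer activations as learned features
feature_extractor = models.Model(cnn.input, cnn.get_layer("features").output)
feats = feature_extractor.predict(X_ids, verbose=0)

C1_feats = np.random.rand(64, 5)                      # placeholder for the handcrafted C1 features
clf = GaussianNB().fit(np.hstack([feats, C1_feats]), y)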
Table 17 presents results of the experiments with/without feature extension. As is seen from
Table 17, the best results are obtained with the NB classifier and the results are not sensitive to
feature extension. Based on our previous results, it is clear that this ensemble model produces
better results in both cases than a single run of the traditional ML classifiers on the WD1 dataset
(see Table 11 and Table 12). However, it falls behind the accuracy (i.e., 0.919 obtained on
the WD1 dataset) of the single run of the CNN (see Table 13).
Finally, we tried to create an ensemble of the BERT-based wall content model with other models.
This is because the best accuracy in the wall content model is obtained by the BERT model (i.e., bert_fb). For this purpose, we extracted sentence vectors from the bert_fb model and fed them into the
ML classifiers as done by Kazameini et al. [2020]. Notice that, as in the CNN ensemble, this
process is actually an algorithmic-level combination of the methods. To extract sentence vectors
from the BERT model, we used two approaches: (i) concatenating the last four hidden layers that
Table 18. Accuracy Values for the Ensemble of BERT with Traditional ML Classifiers

BERT Sentence Vectors          C4.5    RF      IBk     NB      NBM    SVM
[CLS]                          0.654   0.691   0.637   0.575   NA     0.627
Concatenation of last four     0.699   0.749   0.697   0.724   NA     0.785
Table 19. Comparison of Our Study with Previous Gender Prediction Studies Focusing on Turkish OSN Users with Respect to Employed Information and the Ensemble Model in the Feature or Algorithm Level

                          Prediction Model(s) Depend(s) on . . .
Work                      Profile        Network      Wall           Content           Ensemble
                          Information    Structure    Interactions   (Wall Content)    Model
[Ciot et al. 2013]        ✗              ✗            ✗              ✓ (0.870)         ✗
[Sezerer et al. 2019a]    ✗              ✗            ✗              ✓ (0.806)         ✗
[Sezerer et al. 2019b]    ✗              ✗            ✗              ✓ (0.723)         ✗
[Talebi and Köse 2013]    ✗              ✗            ✗              ✓ (0.908)         ✗
[Çelik and Aslan 2019]    ✗              ✗            ✗              ✓ (0.741)         ✗
Ours                      ✓ (0.981)      ✓ (0.792)    ✓ (0.966)      ✓ (0.926)         ✓ (0.982)
give the best representation for a word [Devlin et al. 2019], and (ii) using a fully connected layer
over the final hidden state corresponding to the [CLS] input token [Adhikari et al. 2019]. We
again applied this process with fivefold cross validation that is similar to the ensemble process of
the CNN. We obtained sentence vectors using the preceding two ways by getting predictions of
the BERT model for our encoded activity content in the WD1 dataset. Table 18 presents results of
this experiment, which shows that running ML classifiers over sentence vectors learned by BERT
does not improve accuracy.
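A sketch of the two sentence-vector extraction routes with the transformers library is shown below; pooling details beyond what is stated above (e.g., taking the [CLS] position of each of the last four layers) are assumptions.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

model_name = "dbmdz/bert-base-turkish-128k-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = TFAutoModel.from_pretrained(model_name, output_hidden_states=True)

enc = tokenizer(["kardeşim nasılsın"], padding=True, truncation=True, return_tensors="tf")
out = bert(dict(enc))

# (i) concatenate the last four hidden layers at the [CLS] position
last_four = tf.concat([h[:, 0, :] for h in out.hidden_states[-4:]], axis=-1)   # shape (1, 4 * 768)

# (ii) use the final hidden state of the [CLS] token directly
cls_vec = out.last_hidden_state[:, 0, :]                                       # shape (1, 768)

print(last_four.shape, cls_vec.shape)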
6 DISCUSSION
In this article, we studied the problem of gender prediction of Facebook users from Turkey. As is
seen from Table 1 and Table 19,17 existing gender prediction studies often rely on the content generated by OSN
users. However, in this study, we employed different models that use basic profile information,
network structure, wall interactions, and wall content of users. As is seen from our summarized
results in Table 19, the profile information model is more successful in gender prediction with
respect to other models. The main reason for this is that revealed profile attributes are more reliable
than node embeddings, and content-based and interaction-based features. As stated before (see
Table 16), however, building an ensemble of profile and wall interactions models at the feature level
provides the best accuracy (i.e., 0.982) in this study. Table 19 additionally makes it clear that we
obtained better results than the results of the existing Turkish-oriented studies by the previously
unused 34 features (see Table 6) in the wall interactions model and employing the BERT language
model in the wall content model.
17 The value between parentheses next to the ✓ indicates the best accuracy of the corresponding study with respect to the
related information/model.
Despite the challenges described in Section 4.4.1, the wall interaction model produces the
second-best results. This shows that wall activities and other interacting users in these activities can be very effective in determining the gender of a Facebook user. This is one of the most
important results in this work, because gender detection based on profile information may not always be possible, as OSN data is often incomplete and privacy-aware users keep their attributes
private. If an adversary is unable to employ the profile information model, then he/she can use the
wall interaction model, which is very effective. Even though the wall interaction model requires
language-dependent processes to detect whether an observed gender-oriented word in any activity is used in a gender-based sense, the model's success can be improved by applying morphological
analysis and word sense disambiguation to eliminate words used in non-gender-oriented sense.
In the wall content model, the best result is often achieved from the WD1 document set. This
proves that activities that target the WO are more important in predicting the WO’s gender. However, the success of the wall content model falls behind the profile information and wall interaction
models. The reason for this is that Facebook activity content is often dirty and contains many typos. To handle the dirty content, spelling correction can also be employed in the preprocessing
step to extract more meaningful features. For spelling correction, well-known algorithms or tools
may not provide satisfactory performance, as users mostly use different abbreviations and write
words without obeying any grammatical rules. Creating and using a lexicon may help to correct
typos.
The wall content model has also some other drawbacks: (i) compared to the other three models, the wall content model that is applied with bow and n-gram features has a huge number of
features that makes gender classification computationally expensive, and (ii) extracting discriminative gender-specific features in this model is hard because Facebook users generally post activities about everything they face in their daily lives. In this model, DL algorithms including CNN,
RNN, and their combinations (e.g., CNN + RNN) often outperform traditional ML algorithms used
with word embeddings. This shows that it is possible to overcome the challenge of having a high-dimensional feature space without loss of performance by employing word embeddings. However,
using the BERT language model outperforms all of the traditional ML classifiers and other DL algorithms in the wall content model. This is because BERT automatically learns context-aware word
embeddings, unlike traditional word embeddings obtained by the word2vec model. An important
note here is that BERT produces better results when trained on a corpus that is from the same domain as the training data. This is because textual content from the same domain tends to include similar words, phrases, sentences, and so on. Creating an ensemble of CNN and ML
classifiers (especially SVM) often improves classification accuracy for image data. However, in this
work, a single run of the CNN and BERT models produces better results than their ensemble models created with ML classifiers. This is possibly because the discriminative power of the
automatically learned features does not increase without going through the activation function in
the last softmax layer. Therefore, we suggest this investigation for other languages in the context
of gender prediction based on user-generated content.
Facebook has some obstacles stemming from its nature: detecting whom an activity targets
among the users who posted earlier under the same parent activity is a very challenging task.
Developing an effective solution to this problem will make it possible to further increase the performance of the wall interaction and wall content models. However, a first name–centric strategy
used in the profile information and wall interaction models is quite successful, and the first name
is very effective in the gender prediction task. It is possible to infer the gender attribute if the username has been selected to include the user’s first name, even if the account is completely private.
This outcome is also verified by feature dependencies of user gender in different models. In the
profile information model, the most discriminative feature is the display name of users, whereas
the most important features in the wall interaction model depend on the gender of interacted users
and the number of gender-oriented words in their activities. This also proves that our assumption
in the wall interaction model is well founded.
As stated before, OSN data is often incomplete. Users may keep their profile attributes, friendship connections, and wall activities private. Some of them may just keep some information private, whereas others may completely keep their Facebook account private. This depends on privacy
awareness of each user in the network. Therefore, profile information and wall interaction models
can be employed together for detecting the gender of users with a high degree of accuracy. Therefore, the models evaluated in this article can be employed for any Facebook user. If the WO keeps
his/her data private, then an adversary can use the naive model to make a random guess. Otherwise, he/she can use our models separately or in combination depending on the available public
data on the target user’s account. Note that our fivefold results indicate that an adversary is able to
perform gender detection with accuracies of 0.981, 0.792, 0.966, and 0.926 by using the profile information, network structure, wall interaction, and wall content models, respectively. He/she can
also improve the best accuracy to 0.982 by combining the profile information and wall interaction models if
profile information and wall interactions exist in the WO’s account.
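The following sketch outlines, under the assumption of scikit-learn-style fitted classifiers and optional per-user feature vectors, how such an adversary could choose among the models according to the data that is publicly visible; it is an illustrative outline rather than our exact implementation.

import random

def predict_gender(target, models):
    # `target` is assumed to expose feature vectors that are None when the corresponding
    # data is private; `models` maps model names to fitted classifiers with predict().
    if target.profile is not None and target.interactions is not None:
        # Best observed combination: profile information plus wall interactions (accuracy 0.982).
        return models["profile+interaction"].predict([target.profile + target.interactions])[0]
    if target.profile is not None:
        return models["profile"].predict([target.profile])[0]            # accuracy 0.981
    if target.interactions is not None:
        return models["interaction"].predict([target.interactions])[0]   # accuracy 0.966
    if target.wall_content is not None:
        return models["content"].predict([target.wall_content])[0]       # accuracy 0.926
    if target.network is not None:
        return models["network"].predict([target.network])[0]            # accuracy 0.792
    return random.choice(["male", "female"])                             # naive model: random guess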
To ensure the privacy of Facebook users, we advise that (i) the wall and other profile pages be kept private, and (ii) the selected username consist of only numbers or at least not include any
part of the WO’s real name, especially the first name. However, these precautions may not be
enough if your connections are not careful. Even if you keep your profile information and wall
content private, if one of your connections in your social network does not keep his/her connection
with you private, your privacy may be violated. For instance, if any user reveals that he/she is in
a relationship with you, or that you are his/her family member, then any other person is able to gain insight into your sensitive information, including your gender. Additionally, if one of your friends
posts an activity that includes any gender-oriented word(s) and targets you, whether on your wall
or his/her wall, then he/she may reveal your gender.
7 CONCLUSION AND FUTURE WORK
In this article, we perform gender detection of Facebook users not only by using textual content
but also by using the profile information, network structure, and wall interactions. We explore and
report the best model by evaluating our models separately and in combination. Based on our experimental results, we conclude that the gender of Facebook users can be inferred even just by using a person's display name. Moreover, other models can be employed for gender detection with a high degree of accuracy if profile information is not available. The wall interaction model is one of these models, as it achieves the second-best accuracy and outperforms existing content-based studies in the context of the Turkish language. We conclude that this is one of the
most important findings of this work.
The wall content and wall interaction models require language-specific processing, and therefore further improvement is possible by using effective preprocessing, word sense disambiguation, and language models. In the wall content model, the BERT model trained on a corpus from the same domain as the training data outperforms all ML classifiers and other DL algorithms. We also
conclude that the BERT model may provide much better results when trained on a much larger
corpus.
As future work, we are planning to focus on wall interaction and wall content models. In the
wall interaction model, we will try to create a lexicon including words along with their polarity
scores in terms of their use by male and female users. In the wall content model, we will investigate
how the performance of the BERT language model changes depending on the size of the corpus it is
trained on.
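Returning to the planned lexicon for the wall interaction model, one possible scoring scheme (an assumption rather than a finalized design) is to assign each word a smoothed log-odds score of its relative frequency in male- versus female-authored activities, as in the following sketch.

import math
from collections import Counter

def gender_polarity(male_docs, female_docs, smoothing=1.0):
    # Count word occurrences separately for male- and female-authored activities.
    male_counts, female_counts = Counter(), Counter()
    for doc in male_docs:
        male_counts.update(doc.lower().split())
    for doc in female_docs:
        female_counts.update(doc.lower().split())
    vocabulary = set(male_counts) | set(female_counts)
    total_m = sum(male_counts.values()) + smoothing * len(vocabulary)
    total_f = sum(female_counts.values()) + smoothing * len(vocabulary)
    scores = {}
    for word in vocabulary:
        p_m = (male_counts[word] + smoothing) / total_m
        p_f = (female_counts[word] + smoothing) / total_f
        scores[word] = math.log(p_m / p_f)  # > 0: male-leaning usage, < 0: female-leaning usage
    return scores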
APPENDIX
Fig. 8. The summary of our CNN (a), stacked RNN (b), CNN + RNN (c), and RNN + CNN (d) structures.
DATA AVAILABILITY
The experiments in this article use publicly available OSN data collected from Facebook according to the methodology published in the work of Coban et al. [2020]. Although the data were shared publicly by the corresponding users, the dataset contains sensitive personal information. To respect these users' individual privacy, the data will not be shared in its raw form. The corresponding author will make his best effort to provide a thoroughly anonymized form of the dataset given a reasonable justification.
ACKNOWLEDGMENTS
We would like to thank our referees and editors for their valuable suggestions that helped us
significantly improve the article.
REFERENCES
Ashutosh Adhikari, Achyudh Ram, Raphael Tang, and Jimmy Lin. 2019. DocBERT: BERT for document classification.
arxiv:1904.08398
Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. 2012. Friendship prediction and homophily in social media. ACM Transactions on the Web 6, 2 (2012), 1–33.
Ahmet Afsin Akın and Mehmet Dündar Akın. 2007. Zemberek, an open source NLP framework for Turkic languages.
Structure 10 (2007), 1–5.
Jalal S. Alowibdi, Ugo A. Buy, and Philip Yu. 2013a. Empirical evaluation of profile characteristics for gender classification
on Twitter. In Proceedings of the 2013 12th International Conference on Machine Learning and Applications, Vol. 1. IEEE,
Los Alamitos, CA, 365–369.
Jalal S. Alowibdi, Ugo A. Buy, and Philip Yu. 2013b. Language independent gender classification on Twitter. In Proceedings
of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE, Los Alamitos,
CA, 739–743.
M. Fatih Amasyalı and Banu Diri. 2006. Automatic Turkish text categorization in terms of author, genre and gender. In
Proceedings of the International Conference on Application of Natural Language to Information Systems. 221–226.
Bassem Bsir and Mounir Zrigui. 2018a. Gender identification: A comparative study of deep learning architectures. In Proceedings of the International Conference on Intelligent Systems Design and Applications. 792–800.
Bassem Bsir and Mounir Zrigui. 2018b. Enhancing deep learning gender identification with gated recurrent units architecture in social text. Computación y Sistemas 22, 3 (2018), 757–766.
John D. Burger, John Henderson, George Kim, and Guido Zarrella. 2011. Discriminating gender on Twitter. In Proceedings
of the Conference on Empirical Methods in Natural Language Processing. ACM, New York, NY, 1301–1309.
Özer Çelik and Ahmet Faruk Aslan. 2019. Gender prediction from social media comments with artificial intelligence.
Sakarya Üniversitesi Fen Bilimleri Enstitüsü Dergisi 23, 6 (2019), 1256–1264.
Ming Cheung and James She. 2017. An analytic system for user gender identification through user shared images. ACM
Transactions on Multimedia Computing, Communications, and Applications 13, 3 (2017), 1–20.
Morgane Ciot, Morgan Sonderegger, and Derek Ruths. 2013. Gender inference of Twitter users in non-English contexts. In
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1136–1145.
Onder Coban, Ali Inan, and Selma Ayse Ozel. 2020. Towards the design and implementation of an OSN crawler: A case of
Turkish Facebook users. International Journal of Information Security Science 9, 2 (2020), 76–93.
Andrea Corriga, Simone Cusimano, Francesca Malloci, Lodovica Marchesi, and Diego Reforgiato Recupero. 2018. Leveraging cognitive computing for gender and emotion detection. In Proceedings of the 4th Workshop on Sentic Computing,
Sentiment Analysis, Opinion Mining, and Emotion Detection (EMSASW’18). 47–56.
William Deitrick, Zachary Miller, Benjamin Valyou, Brian Dickinson, Timothy Munson, and Wei Hu. 2012. Gender identification on Twitter using the Modified Balanced Winnow. Communications and Network 4, 3 (2012), 189–195.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. arxiv:cs.CL/1810.04805
Enfel Doğan. 2011. Türkiye Türkçesine Cinsiyet Kategorisinin İzleri [Traces of the gender category in Turkey Turkish]. Journal of International Social Research 4, 17 (2011),
89–98.
Mehwish Fatima, Komal Hasan, Saba Anwar, and Rao Muhammad Adeel Nawab. 2017. Multilingual author profiling on
Facebook. Information Processing & Management 53, 4 (2017), 886–904.
Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preoţiuc-Pietro. 2016. Analyzing biases in human
perception of user age and gender from text. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 843–854.
Juliette Garside. 2015. Twitter puts trillions of tweets up for sale to data miners. The Guardian. Retrieved March 23, 2021
from https://www.theguardian.com/technology/2015/mar/18/twitter-puts-trillions-tweets-for-sale-data-miners.
Daniel Gayo Avello. 2011. All liaisons are dangerous when all your friends are known to us. In Proceedings of the 22nd ACM
Conference on Hypertext and Hypermedia. ACM, New York, NY, 171–180.
Orestis Giannakopoulos, Nikos Kalatzis, Ioanna Roussaki, and Symeon Papavassiliou. 2018. Gender recognition based on
social networks for multimedia production. In Proceedings of the 2018 IEEE 13th Image, Video, and Multidimensional
Signal Processing Workshop (IVMSP’18). IEEE, Los Alamitos, CA, 1–5.
Emma Graham-Harrison and Carole Cadwalladr. 2018. Revealed: 50 million Facebook profiles harvested for Cambridge
Analytica in major data breach. The Guardian. Retrieved March 23, 2021 from https://www.theguardian.com/news/
2018/mar/17/cambridge-analytica-facebook-influence-us-election.
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 855–864.
Vishal Gupta and Gurpreet S. Lehal. 2009. A survey of text mining techniques and applications. Journal of Emerging Technologies in Web Intelligence 1, 1 (2009), 60–76.
Kyungsik Han, Yonggeol Jo, Youngseung Jeon, Bogoan Kim, Junho Song, and Sang-Wook Kim. 2018. Photos don’t have me,
but how do you know me? Analyzing and predicting users on Instagram. In Adjunct Publication of the 26th Conference
on User Modeling, Adaptation and Personalization. ACM, New York, NY, 251–256.
Ahmet Hayran and Mustafa Sert. 2017. Sentiment analysis on microblog data based on word embedding and fusion techniques. In Proceedings of the 2017 25th Signal Processing and Communications Applications Conference (SIU’17). IEEE, Los
Alamitos, CA, 1–4.
Carter Jernigan and Behram F. T. Mistree. 2009. Gaydar: Facebook friendships expose sexual orientation. First Monday 14,
10 (2009). https://firstmonday.org/ojs/index.php/fm/article/download/2611/2302.
Fariba Karimi, Claudia Wagner, Florian Lemmerich, Mohsen Jadidi, and Markus Strohmaier. 2016. Inferring gender from
names on the web: A comparative evaluation of gender detection methods. In Proceedings of the 25th International
Conference Companion on World Wide Web. 53–54.
Andrej Karpathy, Justin Johnson, and Li Fei-Fei. 2015. Visualizing and understanding recurrent networks.
arxiv:cs.LG/1506.02078
Amirmohammad Kazameini, Samin Fatehi, Yash Mehta, Sauleh Eetemadi, and Erik Cambria. 2020. Personality trait detection using bagged SVM over BERT word embedding ensembles. arxiv:cs.CL/2010.01309
Jeremy Keeshin, Zach Galant, and David Kravitz. 2010. Machine Learning and Feature Based Approaches to Gender Classification of Facebook Statuses.
Kazi Zainab Khanam, Gautam Srivastava, and Vijay Mago. 2020. The homophily principle in social network analysis.
arxiv:cs.SI/2008.10383
Ankush Khandelwal. 2019. Towards Identifying Humor and Author’s Gender in Code-Mixed Social Media Content. Ph.D.
Dissertation. International Institute of Information Technology Hyderabad.
Yoon Kim. 2014. Convolutional neural networks for sentence classification. arxiv:cs.CL/1408.5882
Gizem Korkmaz, Chris J. Kuhlman, Joshua Goldstein, and Fernando Vega-Redondo. 2020. A computational study of homophily and diffusion of common knowledge on social networks based on a model of Facebook. Social Network Analysis
and Mining 10, 1 (2020), 5.
Michal Kosinski, David Stillwell, and Thore Graepel. 2013. Private traits and attributes are predictable from digital records
of human behavior. Proceedings of the National Academy of Sciences 110, 15 (2013), 5802–5805.
Kamran Kowsari, Mojtaba Heidarysafa, Tolu Odukoya, Philip Potter, Laura E. Barnes, and Donald E. Brown. 2020. Gender
detection on social networks using ensemble deep learning. In Proceedings of the Future Technologies Conference. 346–
358.
Tayfun Kucukyilmaz, B. Barla Cambazoglu, Cevdet Aykanat, and Fazli Can. 2006. Chat mining for gender prediction. In
Proceedings of the International Conference on Advances in Information Systems. 274–283.
Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT:
A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2020), 1234–
1240.
Yiou Lin, Hang Lei, Jia Wu, and Xiaoyu Li. 2015. An empirical study on sentiment classification of Chinese review using
word embedding. arxiv:cs.CL/1511.01665
Jack Lindamood, Raymond Heatherly, Murat Kantarcioglu, and Bhavani Thuraisingham. 2009. Inferring private information using social network data. In Proceedings of the 18th International Conference on World Wide Web. ACM, New York,
NY, 1145–1146.
Zachary C. Lipton, John Berkowitz, and Charles Elkan. 2015. A critical review of recurrent neural networks for sequence
learning. arxiv:cs.LG/1506.00019
Wendy Liu and Derek Ruths. 2013. What’s in a name? Using first names as features for gender inference in Twitter. In
Proceedings of the 2013 AAAI Spring Symposium Series. 10–16.
Anshu Malhotra, Luam Totti, Wagner Meira Jr., Ponnurangam Kumaraguru, and Virgilio Almeida. 2012. Studying user
footprints in different online social networks. In Proceedings of the 2012 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining. IEEE, Los Alamitos, CA, 1065–1070.
Saurav Manchanda and George Karypis. 2018. Distributed representation of multi-sense words: A loss driven approach. In
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 337–349.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector
space. arxiv:cs.CL/1301.3781
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and
phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
Sergei Nicvist, Daria Bogatireva, and Victoria Bobivec. 2018. Tweet author gender identification, PAN 2016 task. In Proceedings of the International Conference on Telecommunications, Electronics, and Informatics. 344–347.
Claudia Peersman, Walter Daelemans, and Leona Van Vaerenbergh. 2011. Predicting age and gender in online social networks. In Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents. ACM, New York,
NY, 37–44.
Hanchuan Peng, Fuhui Long, and Chris Ding. 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27,
8 (2005), 1226–1238.
Francisco Rangel and Paolo Rosso. 2013. On the identification of emotions and authors’ gender in Facebook comments
on the basis of their writing style. In Proceedings of the International Workshop on Emotion and Sentiment in Social and
Expressive Media. 34–46.
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523.
Lucia Santamaria and Helena Mihaljevic. 2018. Comparison and benchmark of name-to-gender inference services. PeerJ
Computer Science 4 (2018), e156.
Maarten Sap, Gregory Park, Johannes Eichstaedt, Margaret Kern, David Stillwell, Michal Kosinski, Lyle Ungar, and H.
Andrew Schwartz. 2014. Developing age and gender predictive lexica over social media. In Proceedings of the 2014
Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1146–1151.
Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing
45, 11 (1997), 2673–2681.
H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha
Agrawal, Achal Shah, et al. 2013. Personality, gender, and age in the language of social media: The open-vocabulary
approach. PLoS ONE 8, 9 (2013), e73791.
Erhan Sezerer, Ozan Polatbilek, and Selma Tekir. 2019a. Gender prediction from Turkish tweets with neural networks. In
Proceedings of the 2019 27th Signal Processing and Communications Applications Conference (SIU’19). IEEE, Los Alamitos,
CA, 1–4.
Erhan Sezerer, Ozan Polatbilek, and Selma Tekir. 2019b. A Turkish dataset for gender identification of Twitter users. In
Proceedings of the 13th Linguistic Annotation Workshop. 203–207.
Masoud Talebi and Cemal Köse. 2013. Identifying gender, age and education level by analyzing comments on Facebook. In
Proceedings of the 2013 21st Signal Processing and Communications Applications Conference (SIU’13). IEEE, Los Alamitos,
CA, 1–4.
Cong Tang, Keith Ross, Nitesh Saxena, and Ruichuan Chen. 2011. What’s in a name: A study of names, gender inference,
and gender behavior in Facebook. In Proceedings of the International Conference on Database Systems for Advanced
Applications. 344–356.
Eric S. Tellez, Sabino Miranda-Jiménez, Daniela Moctezuma, Mario Graff, Vladimir Salgado, and José Ortiz-Bejar. 2018.
Gender identification through multi-modal tweet analysis using MicroTC and bag of visual words. In Proceedings of the
9th International Conference of the CLEF Association (CLEF’18). http://ceur-ws.org/Vol-2125/.
Murat Tezgider, Beytullah Yıldız, and Galip Aydın. 2018. Improving word representation by tuning Word2Vec parameters
with deep learning model. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing
(IDAP’18). IEEE, New York, NY, 1–7.
Abinash Tripathy, Ankit Agrawal, and Santanu Kumar Rath. 2016. Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications 57 (2016), 117–126.
Mudasir Ahmad Wani, Nancy Agarwal, Suraiya Jabin, and Syed Zeeshan Hussain. 2018. Design and implementation of
iMacros-based data crawler for behavioral analysis of Facebook users. arxiv:cs.SI/1802.09566
Haifeng Wu, Qing Huang, Daqing Wang, and Lifu Gao. 2018. A CNN-SVM combined model for pattern recognition of knee
motion using mechanomyography signals. Journal of Electromyography and Kinesiology 42 (2018), 136–142.
Dongwen Zhang, Hua Xu, Zengcai Su, and Yunfeng Xu. 2015. Chinese comments sentiment classification based on
word2vec and SVMperf. Expert Systems with Applications 42, 4 (2015), 1857–1863.
Lei Zhang, Shuai Wang, and Bing Liu. 2018. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews:
Data Mining and Knowledge Discovery 8, 4 (2018), e1253.
Elena Zheleva and Lise Getoor. 2009. To join or not to join: The illusion of privacy in social networks with mixed public
and private user profiles. In Proceedings of the 18th International Conference on World Wide Web. ACM, New York, NY,
531–540.
Received April 2020; revised November 2020; accepted January 2021