Content Analysis with Stata

M. Escobar ([email protected]) and J.L. Alonso Berrocal ([email protected])
Universidad de Salamanca
8th Spanish Stata Users Group meeting
Madrid, 22nd October 2015
Modesto Escobar & J.L. A. Berrocal (USAL)
Content Analysis
22nd October 2015
Table of Contents
Overview
Background
Content analysis
Social network analysis
Coincidence analysis
Stata user-written commands
The command precoin
Multiple variables
Thesaurus strings
Words
The command coin
Next steps
Background
Content Analysis
Definitions
Content analysis is a technique used in the social sciences for the
systematic study of the content of communication.
“A systematic, replicable technique for compressing many words of
text into fewer content categories based on explicit rules of coding”
[Berelson, 1952].
“Any technique for making inferences by objectively and
systematically identifying specified characteristics of messages”
[Holsti, 1969].
“Content analysis is a research technique for making replicable and
valid inferences from data to their context” [Krippendorff, 1980].
Programs for content analysis
Software for content analysis
Programs
Qualitative analyzers
Nvivo
Atlas-ti
QDA miner
Statistical analyzers
WordStat
TextAnalyst
LIWC
Qualitative analysis programs
Nvivo
Qualitative analysis programs
Atlas-ti
Statistical analyzers
WordStat for QDA (and for Stata)
Social network analysis
Stata programs
Although Stata has no built-in tools for social network analysis (SNA), some
advanced users have begun to write routines. I wish to highlight the following
works, from which I have drawn insights:
Corten [2011] wrote a routine to visualize social networks [netplot].
Miura [2012] created the Stata graph library (SGL) to calculate network
centrality measures, including two Stata commands [netsis and netsummarize].
White presented a suite of Stata programs for network meta-analysis, which
includes the network graphs of Anna Chaimani, at the 2013 UK users group
meeting.
Cerulli and Zinilli presented a procedure [datanet] to prepare a dataset for
network analysis purposes at the 2014 Italian Stata Users Group meeting.
Grund [2014] created a collection of programs to plot and analyze social
networks [nwcommands], presented at the Nordic and Baltic Stata Users Group
meeting.
Coincidence analysis
Definition
Coincidence analysis is a set of techniques whose object is to detect which
people, subjects, objects, attributes or events tend to appear at the same
time in different delimited spaces.
These delimited spaces are called scenarios; there are n of them, indexed
by i, and they are the units of analysis.
In each scenario, each of the J events Xj either occurs (1) or does not
occur (0).
The starting point is an incidence matrix X, an n × J matrix of 0s and 1s
recording whether each event Xj occurs in each scenario.
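To make the incidence matrix concrete, here is a minimal sketch in Python (used only for illustration; it is not the Stata implementation). It counts, for every pair of events, the number of scenarios in which both occur, which is the same as computing X'X. The data and the function name coincidence_matrix are hypothetical.

```python
def coincidence_matrix(incidence):
    """Return the J x J matrix of coincidence counts for an n x J 0/1 matrix."""
    n_events = len(incidence[0])
    counts = [[0] * n_events for _ in range(n_events)]
    for row in incidence:
        # Events that occur in this scenario.
        present = [j for j, value in enumerate(row) if value == 1]
        for j in present:
            for k in present:
                counts[j][k] += 1
    return counts


# Hypothetical data: 4 scenarios (rows) and 3 events (columns).
X = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
F = coincidence_matrix(X)
# The diagonal holds each event's frequency; off-diagonal cells hold coincidences.
```

With this layout, F[j][j] is how often event j occurs and F[j][k] is how often events j and k appear in the same scenario.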
Old example
Pictures analysis
4 pictures (scenarios) & 8 different people (events)
Albums Analysis
Example with names
Father, mother, grandmother and 5 children
Albums Analysis
Example with codes
Turina, Garzón, Joaquín, María, Concha, José Luis, Obdulia, Valle
Albums Analysis
Coincidences graphs
MDS-Biplot-CA-PCA
[Figure: four coincidence graphs of the Turina album, titled Turina (MDS), Turina (Biplot), Turina (CA) and Turina (PCA), drawn on MDS, biplot, CA and PCA coordinates respectively. Nodes: Joaquín Turina, Obdulia Garzón, Josefa Garzón, Josefa Valle, Joaquín, María, Concha, José Luis and Obdulia.]
Other analysis
Other uses of coincidence analysis
From survey analysis to cultural trends
Coincidence analysis has many applications. Among others:
Survey analysis
  Unemployment
  Social problems
  Mass media audience
Data Mining
  Samples (Composition of genes)
  Corruption (Black cards)
Cultural trends
  Composers
  Painters
  Creators
Content analysis
Survey analysis
Ways of looking for jobs (EPA-2014)
[Figure: MDS coincidence graph of job-search methods and age groups. Nodes: Agency: public, Agency: private, Contacts: employers, Contacts: informal, Ads: placing, Ads: looking at, Others: exams, Others: interviews, Others: others, Waiting: offers, Waiting: results, Self: employment, Self: loan, Age: young, Age: adult, Age: older. Legend: Agencies/Search, Contacts/Search, Competition/Search, Ads/Search, Others/Search, Self-emp./Search, Age/Age. Axes: MDS coordinates.]
Survey analysis
Social problems in Spain (2014) CIS-3045
[Figure: MDS coincidence graph of social problems and respondent characteristics. Problems (translated): unemployment, corruption and fraud, economic problems, social problems, politicians in general, health care, education, other answers. Characteristics: gender (P26: male, female), age (Edad: young, adult, middle-aged, older) and education (ESTUDIOS: no schooling, primary, lower secondary, upper secondary, vocational training, higher). Legend: Economic/Problem, Social/Problem, Politicians/Problem, Others/Problem, Gender/Gender, Age/Age, Education/Education. Axes: MDS coordinates.]
Survey analysis
Mass media audience (EGM-2013)
[Figure: coincidence graph of Spanish media audiences. Nodes are newspapers, magazines, radio stations and TV channels (for example EL PAÍS, EL MUNDO, LA VANGUARDIA, MARCA, HOLA, MUY INTERESANTE, SER, COPE, TVE1, TELECINCO, LA SEXTA, TV3), plus the nodes NO newspaper, NO magazine, NO channel and NO cadena.]
Data mining
Gene composition of samples. Source: http://www.1000genomes.org/data
[Figure: coincidence graph linking population samples (ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN, FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU, TSI, YRI) to genotype data (Has Omni Genotypes, Has Axiom Genotypes, Has Affy 6.0 Genotypes, Has Exome/LOF Genotypes). Legend: Africa/Sample, America/Sample, Asia/Sample, Europe/Sample, Omni/Genotype, Axiom/Genotype, Affy/Genotype, Exome/Genotype. Axes: Fruchterman-Reingold coordinates.]
Data mining
Mean expenses per person with Bankia black cards (2003-2011)
[Figure: coincidence graph of spending categories and political affiliation. Spending categories (translated): jewellery, petrol stations, hotels, metro, shopping, flowers, clubs, luxury restaurants, ordinary restaurants, taxis, culture, cash machines. Affiliation: right, social democracy, left.]
Cultural trends
Bachtrack concerts reviewed (2009-2015)
[Figure: MDS coincidence graph of composers. Nodes: Handel, Beethoven, Bach, Mozart, Haydn, Bruckner, Bartok, Schubert, Liszt, Chopin, Vivaldi, Mendelssohn, Brahms, Schumann, Stravinsky, Mahler, Ravel, Debussy, StraussR, Prokofiev, Wagner, Rachmaninov, Berlioz, Dvorak, Tchaikovsky, Elgar, Sibelius, Britten, Shostakovich. Legend (translated): Baroque, Classical, Romantic, 20th century. Axes: MDS coordinates.]
Cultural trends
Creators in Juan March exhibitions (1975-2015)
[Figure: dense coincidence network of the creators exhibited together, one labelled node per artist (for example Picasso, Kandinsky, Klee, Matisse, Miró, Rothko, Warhol, Tàpies, Chillida, Saura and Zóbel), plus a "No authors" node.]
Cultural trends
Timeline of famous portrait painters
[Figure: timeline network of famous portrait painters, one labelled node per painter (for example Jan Van Eyck, Leonardo da Vinci, Tiziano, Rembrandt Harmenszoon van Rijn, Thomas Gainsborough, Édouard Manet, Pablo Picasso, Francis Bacon and Andy Warhol).]
Stata users’ commands
Stata user-written commands
Main
txttool provides a set of tools for managing and analyzing free-form text.
The command integrates several built-in Stata functions with new text
capabilities, including a utility to create a bag-of-words representation of
text and an implementation of Porter’s word-stemming algorithm.
wordfreq inputs a set of text files and produces in memory a set of
frequencies of all words that occur in at least one of the input texts. The
resulting dataset consists of a text variable word containing a list of the
words themselves.
wordscores implements the computerized content analysis techniques
described in “Extracting Policy Positions From Political Texts Using Words
as Data” by Laver et al. [2003].
Others
strdist is a module to calculate the Levenshtein distance (or edit
distance) between strings.
matchit is a tool to join observations from two datasets based on
string variables which do not necessarily need to be exactly the same.
It performs many different string-based matching techniques, allowing
for a fuzzy similarity between the two different text variables.
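As an illustration of the edit distance that strdist is described as computing, the following is a minimal Python sketch of the standard dynamic-programming algorithm. It is not the module's code; the function name levenshtein is chosen here for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


distance = levenshtein("Tchaikovsky", "Tchaikovski")
```

A small distance between two spellings of the same name is what makes this measure useful before coding text into categories.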
precoin
Classification
precoin
Distinct uses
precoin converts polytomous variables into binary variables for
coincidence analysis. Original variables can be either numerical or string.
It can also split the content of a single variable into several
dichotomous variables according to a separator.
It has three kinds of use:
Multiple variables
Thesaurus strings
Words
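The separator-based use can be sketched as follows. This Python fragment only illustrates the idea of turning one delimited string variable into 0/1 indicators; it does not reproduce precoin's options (stub, min, stop and so on), and the function name and sample data are hypothetical.

```python
def split_to_indicators(values, separator=";"):
    """Turn separator-delimited strings into one 0/1 indicator per distinct item."""
    items_per_row = [
        {item.strip() for item in value.split(separator) if item.strip()}
        for value in values
    ]
    # One column per distinct item, in alphabetical order.
    categories = sorted(set().union(*items_per_row))
    rows = [
        [1 if category in items else 0 for category in categories]
        for items in items_per_row
    ]
    return categories, rows


# Hypothetical scenarios: composers performed in each concert.
composers = ["Beethoven; Mozart", "Mozart", "Bach;Beethoven", ""]
categories, rows = split_to_indicators(composers)
```

Each row of the result is one scenario and each column one event, which is the incidence matrix that coincidence analysis starts from. An empty string yields a row of zeros, a scenario with no events.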
precoin uses
Multiple variables
precoin uses
Frequencies of multiple variables
. precoin P701-P703, stub(problem) min(.02) sort freq replace
Categories                            f   %/events   %/scenar
El paro                            1897       30.6       77.1
La corrupción y el fraude          1573       25.4       63.9
Los problemas de índole económic    629       10.1       25.6
Los/as políticos/as en general,     574        9.3       23.3
Others                              327        5.3       13.3
Los problemas de índole social      220        3.5        8.9
La sanidad                          213        3.4        8.7
La educación                        190        3.1        7.7
Otras respuestas                    145        2.3        5.9
Los recortes                        101        1.6        4.1
La Administración de Justicia        88        1.4        3.6
El Gobierno y partidos o polític     68        1.1        2.8
La inmigración                       62        1.0        2.5
Los problemas relacionados con l     58        0.9        2.4
La crisis de valores                 54        0.9        2.2

Events: 6199   Scenarios: 2461   Missing scenarios: 4
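The two percentage columns appear to relate each frequency to two different bases: the total number of events and the number of scenarios. The Python sketch below checks that reading against the figures printed above; it is an interpretation of the output, not code taken from precoin, and the function name is hypothetical.

```python
def percentages(f, total_events, n_scenarios):
    """Return (% of all events, % of scenarios) for a category with frequency f."""
    pct_events = round(100 * f / total_events, 1)
    pct_scenarios = round(100 * f / n_scenarios, 1)
    return pct_events, pct_scenarios


# Totals printed under the table: 6199 events in 2461 scenarios.
el_paro = percentages(1897, 6199, 2461)
corrupcion = percentages(1573, 6199, 2461)
```

Because a scenario can contain several events, the %/scenar column can add up to more than 100, while %/events adds up to 100.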
precoin uses
Transformation of multiple variables
. describe problem*
                storage   display    value
variable name   type      format     label      variable label
problem01       byte      %8.0g                 El paro
problem02       byte      %8.0g                 La corrupción y el fraude
problem03       byte      %8.0g                 Los problemas de índole económic
problem04       byte      %8.0g                 Los/as políticos/as en general,
problem05       byte      %8.0g                 Los problemas de índole social
problem06       byte      %8.0g                 La sanidad
problem07       byte      %8.0g                 La educación
problem08       byte      %8.0g                 Otras respuestas
problem09       byte      %8.0g                 "Los recortes"
problem10       byte      %8.0g                 La Administración de Justicia
problem11       byte      %8.0g                 El Gobierno y partidos o polític
problem12       byte      %8.0g                 La inmigración
problem13       byte      %8.0g                 Los problemas relacionados con l
problem14       byte      %8.0g                 La crisis de valores
problem99       byte      %8.0g                 Others
problem_miss    byte      %8.0g                 No events
precoin uses
Thesauri strings
precoin uses
Frequencies of thesauri strings
. precoin composers, stub(composer) sep(;) freq sort min(.05) missing replace
Categories         f   %/events   %/scenar
Others          3606       57.2       86.4
Beethoven        528        8.4       12.7
Mozart           387        6.1        9.3
Brahms           329        5.2        7.9
Bach             309        4.9        7.4
Ravel            247        3.9        5.9
Tchaikovsky      239        3.8        5.7
Schubert         231        3.7        5.5
Shostakovich     215        3.4        5.2
Mahler           214        3.4        5.1
No events         46        0.7        1.1

Events: 6305   Scenarios: 4173
precoin uses
Variables of thesauri strings
. describe composer*
                storage   display    value
variable name   type      format     label      variable label
composers       str100    %-50s                 Composers
composer0001    byte      %8.0g                 Beethoven
composer0002    byte      %8.0g                 Mozart
composer0003    byte      %8.0g                 Brahms
composer0004    byte      %8.0g                 Bach
composer0005    byte      %8.0g                 Ravel
composer0006    byte      %8.0g                 Tchaikovsky
composer0007    byte      %8.0g                 Schubert
composer0008    byte      %8.0g                 Shostakovich
composer0009    byte      %8.0g                 Mahler
composer1823    byte      %8.0g                 Others
composer_miss   byte      %8.0g                 No events
precoin uses
Words
precoin uses
Simple conversion
. precoin Plataforma, stub(labels) freq
Warning: separator has been set to space
Categories      f   %/events   %/scenar
Instagram      79        6.0        6.0
Twitter       225       17.0       17.0
Web          1020       77.0       77.0

Events: 1324   Scenarios: 1324   Missing scenarios: 2
. describe Instagram-Web
                storage   display    value
variable name   type      format     label      variable label
Instagram       byte      %8.0g                 Instagram
Twitter         byte      %8.0g                 Twitter
Web             byte      %8.0g                 Web
precoin uses
Words
. precoin Mensaje, stub(labels) sort freq replace stop(stopwords.txt) separator(" ") min(.03)
Categories                f   %/events   %/scenar
Others                 1275       49.4       96.5
Autentico               369       14.3       27.9
Elpoderdeloautentico    146        5.7       11.1
Vida                    121        4.7        9.2
Playa                    87        3.4        6.6
Familia                  82        3.2        6.2
Disfrutar                73        2.8        5.5
Mar                      58        2.2        4.4
GarnierEs                55        2.1        4.2
Amigos                   51        2.0        3.9
Pelo                     51        2.0        3.9
Sonrisa                  50        1.9        3.8
Amor                     42        1.6        3.2
Sol                      41        1.6        3.1
Verano                   40        1.5        3.0
Sentir                   40        1.5        3.0

Events: 2581   Scenarios: 1321   Missing scenarios: 5
precoin uses
Simple conversion
. describe Autentico-Mensaje_miss
                  storage   display    value
variable name     type      format     label      variable label
Autentico         byte      %8.0g                 Autentico
Elpoderdeloau~o   byte      %8.0g                 Elpoderdeloautentico
Vida              byte      %8.0g                 Vida
Playa             byte      %8.0g                 Playa
Familia           byte      %8.0g                 Familia
Disfrutar         byte      %8.0g                 Disfrutar
Mar               byte      %8.0g                 Mar
GarnierEs         byte      %8.0g                 GarnierEs
Amigos            byte      %8.0g                 Amigos
Pelo              byte      %8.0g                 Pelo
Sonrisa           byte      %8.0g                 Sonrisa
Amor              byte      %8.0g                 Amor
Sol               byte      %8.0g                 Sol
Verano            byte      %8.0g                 Verano
Sentir            byte      %8.0g                 Sentir
Mensaje_others    byte      %8.0g                 Others
Mensaje_miss      byte      %8.0g                 No events
coin
Definition
coin
What is it?
coin is an ado program that performs coincidence analysis.
Its input is a dataset with scenarios as rows and events as columns.
Its outputs are:
Different matrices (frequencies, percentages, residuals (3), distances,
adjacencies and edges)
Several bar graphs, network graphs (circle, mds, pca, ca, biplot) and
dendrograms (single, average, waverage, complete, wards, median,
centroid)
Measures of centrality (degree, closeness, betweenness, information)
(eigenvector and power)
Options to export to Ucinet, Pajek, nwcommands, Excel and csv files
Its syntax is simple but flexible, with many options (output, bonferroni, p
value, minimum, special event, graph control and options, ...)
Syntax
Command
coin
coin varlist [if] [in] [weight] [using filename] [, options]
Options can be classified into the following groups:
Outputs:
Frequencies: frequencies g-relative-frequencies vertical% horizontal%,
expected-frequencies odd-ratios,
Residuals: residuals standard-residuals normalized-residuals
Significance: phaberman podd ratios pfisher-exact-test
Others: tetrachoric-correlations, adjacencies-matrix distances list-key
centrality measures, all-previous-statistics
Coordinates: x (with plot) xy(circle|mds|ca|pca|biplot).
Plots
Bar: bar, cbar(varlist) and ccbar(varlist)
Residuals: rgraph(varlist) and ograph(varlist)
Graph: graph(circle|mds|ca|pca|biplot)
Dendrograms: dendrogram(single|complete|average|wards)
coin (continued)
coin varlist [if] [in] [weight] [, options]
Options can be classified into the following groups (continued):
Controls: head(varlist), variable(varname), ascending, descending,
minimum (#), support(#), pvalue(#), levels(# # #), bonferroni,
lminimum(#), iterations(#).
Exports
Edges: export(filename) with .csv .xls .nw .pjk and .dl extensions
Nodes: varsave(filename) or export(filename) with .csv or .xls extensions
Examples
coin example (I)
Matrix of coincidences in L’Oreal’s messages
. coin Vida-Mar Amigos-Sentir, frequencies
1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir
Frequencies  Vida Playa Fam~a Dis~r   Mar Ami~s  Pelo Son~a  Amor   Sol Ver~o Sen~r
Vida          121
Playa           2    87
Familia         6    15    82
Disfrutar      13     6     9    73
Mar             4     5     1     8    58
Amigos          1     9    17     8     3    51
Pelo            4     2     2     3     6     1    51
Sonrisa         7     0     0     1     2     1     1    50
Amor            8     0     1     1     1     0     4     0    42
Sol             3    11     0     3     5     2     4     3     2    41
Verano          2     7     6     1     2     4     3     0     0     1    40
Sentir          6     0     1     1     2     0     6     0     3     0     3    40
coin example (II)
Matrix of expected coincidences in L’Oreal’s messages
. coin Vida-Mar Amigos-Sentir, expected
1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir
Expected frequencies
             Vida Playa Fam~a Dis~r   Mar Ami~s  Pelo Son~a  Amor   Sol Ver~o Sen~r
Vida         11.0
Playa         7.9   5.7
Familia       7.5   5.4   5.1
Disfrutar     6.7   4.8   4.5   4.0
Mar           5.3   3.8   3.6   3.2   2.5
Amigos        4.7   3.3   3.2   2.8   2.2   2.0
Pelo          4.7   3.3   3.2   2.8   2.2   2.0   2.0
Sonrisa       4.6   3.3   3.1   2.8   2.2   1.9   1.9   1.9
Amor          3.8   2.8   2.6   2.3   1.8   1.6   1.6   1.6   1.3
Sol           3.7   2.7   2.5   2.3   1.8   1.6   1.6   1.5   1.3   1.3
Verano        3.7   2.6   2.5   2.2   1.7   1.5   1.5   1.5   1.3   1.2   1.2
Sentir        3.7   2.6   2.5   2.2   1.7   1.5   1.5   1.5   1.3   1.2   1.2   1.2
coin example (III)
Matrix of normalized residuals in L’Oreal’s messages
. coin Vida-Mar Amigos-Sentir, normalized
1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir
Haberman residuals
             Vida Playa Fam~a Dis~r   Mar Ami~s  Pelo Son~a  Amor   Sol Ver~o Sen~r
Vida         36.4
Playa        -2.3  36.4
Familia      -0.6   4.4  36.4
Disfrutar     2.7   0.6   2.2  36.4
Mar          -0.6   0.6  -1.4   2.8  36.4
Amigos       -1.8   3.3   8.2   3.3   0.5  36.4
Pelo         -0.3  -0.8  -0.7   0.1   2.6  -0.7  36.4
Sonrisa       1.2  -1.9  -1.9  -1.1  -0.1  -0.7  -0.7  36.4
Amor          2.3  -1.7  -1.0  -0.9  -0.6  -1.3   1.9  -1.3  36.4
Sol          -0.4   5.3  -1.7   0.5   2.5   0.3   2.0   1.2   0.6  36.4
Verano       -0.9   2.8   2.4  -0.8   0.2   2.1   1.2  -1.3  -1.2  -0.2  36.4
Sentir        1.3  -1.7  -1.0  -0.8   0.2  -1.3   3.7  -1.3   1.6  -1.1   1.7  36.4
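The figures in examples (II) and (III) are consistent with the usual independence model for a 2 × 2 table: the expected coincidence of two events is the product of their frequencies divided by the number of scenarios, and the Haberman (adjusted standardized) residual divides the gap between observed and expected by its estimated standard error. The Python sketch below is a check under that assumption, inferred from the printed output rather than from coin's source code; the function names are hypothetical.

```python
import math


def expected_frequency(f_j, f_k, n):
    """Expected coincidences of two events under independence."""
    return f_j * f_k / n


def haberman_residual(observed, f_j, f_k, n):
    """Adjusted standardized residual for one pair of events."""
    expected = expected_frequency(f_j, f_k, n)
    variance = expected * (1 - f_j / n) * (1 - f_k / n)
    return (observed - expected) / math.sqrt(variance)


n = 1326               # scenarios reported by coin
vida, playa = 121, 87  # frequencies of Vida and Playa
expected_vida_playa = expected_frequency(vida, playa, n)
residual_vida_playa = haberman_residual(2, vida, playa, n)  # 2 observed coincidences
```

On the diagonal, where an event is paired with itself, this formula reduces to the square root of n, which matches the 36.4 shown for every event.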
coin example (IV)
Adjacencies matrix in L’Oreal’s messages
. coin Vida-Mar Amigos-Sentir, adjace
1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir
Adjacency matrix
             Vida Playa Fam~a Dis~r   Mar Ami~s  Pelo Son~a  Amor   Sol Ver~o Sen~r
Vida          0.0
Playa         0.0   0.0
Familia       0.0   1.0   0.0
Disfrutar     1.0   1.0   1.0   0.0
Mar           0.0   1.0   0.0   1.0   0.0
Amigos        0.0   1.0   1.0   1.0   1.0   0.0
Pelo          0.0   0.0   0.0   1.0   1.0   0.0   0.0
Sonrisa       1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
Amor          1.0   0.0   0.0   0.0   0.0   0.0   1.0   0.0   0.0
Sol           0.0   1.0   0.0   1.0   1.0   1.0   1.0   1.0   1.0   0.0
Verano        0.0   1.0   1.0   0.0   1.0   1.0   1.0   0.0   0.0   0.0   0.0
Sentir        1.0   0.0   0.0   0.0   1.0   0.0   1.0   0.0   1.0   0.0   1.0   0.0
coin example (V)
Centrality measures in L’Oreal’s messages
. coin Vida-Mar Amigos-Sentir, centrality
1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir
Centrality measures   Degree   Close   Between   Inform
Vida                    0.36    0.61      0.06     0.07
Playa                   0.55    0.69      0.03     0.09
Familia                 0.36    0.55      0.00     0.07
Disfrutar               0.64    0.73      0.12     0.10
Mar                     0.64    0.73      0.05     0.10
Amigos                  0.55    0.69      0.03     0.09
Pelo                    0.55    0.69      0.05     0.09
Sonrisa                 0.18    0.50      0.01     0.05
Amor                    0.36    0.58      0.02     0.07
Sol                     0.64    0.73      0.18     0.10
Verano                  0.55    0.65      0.06     0.09
Sentir                  0.45    0.65      0.05     0.08
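Two of the reported figures can be reproduced by hand. The density in the header (0.48) matches 32 edges out of the 66 possible pairs among 12 events, and the Degree column matches the number of adjacent events divided by the 11 other events. The Python sketch below shows both calculations; the small adjacency matrix is hypothetical and the function names are chosen for illustration, not taken from coin.

```python
def density(n_edges, n_nodes):
    """Share of possible undirected pairs that are linked."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


def normalized_degree(adjacency):
    """Degree of each node divided by the number of other nodes."""
    n = len(adjacency)
    return [sum(row) / (n - 1) for row in adjacency]


# Figures from the coin header: 32 probable coincidences amongst 12 events.
network_density = density(32, 12)

# Hypothetical symmetric adjacency matrix for 4 events.
A = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
degrees = normalized_degree(A)
```

For instance, an event adjacent to 4 of the 11 others, as Vida is in the adjacency matrix of example (IV), gets a normalized degree of 4/11, about 0.36.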
coin example (VI)
Simple graph
coin Vida-Mar Amigos-Sentir, graph(mds) levels(.5 .05 .01) goptions(name(Network))
[Figure: network graph of the 12 events (Sonrisa, Amigos, Disfrutar, Sol, Playa, Familia, Vida, Mar, Amor, Verano, Pelo, Sentir). Axes: MDS coordinates.]
coin example (VII)
Color graph
coin Vida-Mar Amigos-Sentir using Words, graph(mds) levels(.5 .05 .01) color(Tipo) legend
[Figure: network graph of the same 12 events, colored by Tipo. Legend: Valores (values), Gente (people), Sensaciones (sensations), Lugares (places). Axes: MDS coordinates.]
coin example (VIII)
Words in their context
. list Mensaje if Pelo & Amor, clean string(120)
       Mensaje
234.   Que hay más auténtico que tu hija de 1 año acariciándote el pelo puro amor
237.   Que hay más auténtico que tu hija de 1 año acariciándote el pelo puro amor
449.   Disfrutar de un atardecer con el sonido de las olas y el aroma en mi pelo de Original Remedies, en compañía del amor de m..
636.   Lo realmente auténtico es el amor de mi familia. A mi hermana y a mi nos encanta peinarnos y tener un pelo suave, con br..
coin example (IX)
Automatic color graph (Communities)
coin Vida-Mar Amigos-Sentir using Words, graph(mds) groups(5)
[Figure: network graph of the same 12 events, colored automatically by community. Axes: MDS coordinates.]
coin example (X)
Dendrogram
coin Vida-Mar Amigos-Sentir, dendrogram(ward)
[Figure: dendrogram titled "Clusters (method:wards)". Leaves from top to bottom: Vida, Amor, Sonrisa, Pelo, Sentir, Verano, Playa, Sol, Disfrutar, Mar, Familia, Amigos. Axis: Haberman distance, from 0 to 50.]
coin example (XI)
Graph with manual codes
coin K1-K69 using Nodes, graph(mds) color(tipo)
[Figure: network graph of the manually coded categories, colored by tipo. Node labels include Cerveza, Comida, Felicidad, Vida, Sonrisa, Amistad, Familia, Amor, Playa, Verano, Sol, Mar, Naturaleza, Pelo and Belleza. Axes: MDS coordinates.]
Last example
Components of self identity
[Figure: MDS coincidence graph of the components of self-identity. Nodes (translated): woman, man, friendly, hard-working, activity, complementary activity, adjective, preference, non-family primary group, nuclear family, social self-evaluation, practical self-evaluation, character and moral self-evaluation, intellectual self-evaluation, universal definition, relational, collective, no qualifiers. Legend: Gender/Sociodemographics, Consensual/Codes, Anchoring/Codes, Attitudinal/Codes, Qualifiers/Codes, Other attributes/Codes. Axes: MDS coordinates.]
Availability
Availability of precoin and coin
If you use a version of Stata later than 11.2, you can
install a free copy of coin by typing:
net install coin, from(http://sociocav.usal.es/stata/)
This is still the first version, but it works reasonably well and is being
improved. It can be updated as follows:
adoupdate, update
Comments and suggestions are welcome!
Next steps
For coin and precoin
Automatic codification through regular expressions.
Similar graph representations of correlations among quantitative
variables.
Use of log-linear models to discover n-coincidences.
Time-based study of coincidences using dynamic networks.
Using objects in the Mata code of the command coin.
It would be great if Stata implemented sparse matrices in Mata!
References
Bernard Berelson. Content Analysis in Communication Research. Free
Press, New York, 1952.
Ole R. Holsti. Content Analysis for the Social Sciences and Humanities.
Addison-Wesley, Reading, 1969.
Klaus Krippendorff. Content Analysis: An Introduction to Its Methodology.
Sage, Beverly Hills, 1980.
Rense Corten. Visualization of social networks in Stata using
multidimensional scaling. The Stata Journal, 11(1):52–63, 2011.
Hirotaka Miura. Stata graph library for network analysis. The Stata
Journal, 12(1):94–129, 2012.
Thomas E. Grund. nwcommands: Software tools for statistical modeling
of network data in Stata, 2014. URL http://nwcommands.org.
Michael Laver, Kenneth Benoit, and John Garry. Extracting policy
positions from political texts using words as data. American Political
Science Review, 97(2):311–331, 2003.
Final
Last slide
Thanks
Thank you very much!
[email protected] & [email protected]