Content Analysis with Stata

M. Escobar ([email protected]) and J.L. Alonso Berrocal ([email protected])
Universidad de Salamanca
8th Spanish Stata Users Group meeting
Madrid, 22 October 2015

Modesto Escobar & J.L. A. Berrocal (USAL), Content Analysis, 22 October 2015

Table of Contents: Overview

- Background
  - Content analysis
  - Social network analysis
  - Coincidence analysis
  - Stata user-written commands
- The command precoin
  - Multiple variables
  - Thesaurus strings
  - Words
- The command coin
- Next steps

Content Analysis: Definitions

Content analysis is a technique used in the social sciences for the systematic study of the contents of communication.

- "A systematic, replicable technique for compressing many words of text into fewer content categories based on explicit rules of coding" [Berelson, 1952].
- "Any technique for making inferences by objectively and systematically identifying specified characteristics of messages" [Holsti, 1969].
- "Content analysis is a research technique for making replicable and valid inferences from data to their context" [Krippendorff, 1980].

Software for content analysis: Programs

- Qualitative analyzers
  - Nvivo
  - Atlas-ti
  - QDA Miner
- Statistical analyzers
  - WordStat
  - TextAnalyst
  - LIWC

Qualitative analysis programs: Nvivo

Qualitative analysis programs: Atlas-ti

Statistical analyzers: WordStat for QDA (and for Stata)

Social network analysis: Stata programs

Although Stata has no built-in tools for social network analysis (SNA), some advanced users have begun to write routines. I wish to highlight the following works, from which I have drawn insights:

- Corten [2011] wrote a routine to visualize social networks [netplot].
- Miura [2012] created routines (SGL) to calculate network centrality measures, including two Stata commands [netsis and netsummarize].
- White presented, at the 2013 UK Stata Users Group meeting, a suite of Stata programs for network meta-analysis, which includes the network graphs of Anna Chaimani.
- Cerulli and Zinilli presented, at the 2014 Italian Stata Users Group meeting, a procedure [datanet] to prepare a dataset for network analysis.
- Grund [2014] has created a collection of programs to plot and analyze social networks [nwcommands], presented at the Nordic and Baltic Stata Users Group meeting.

Coincidence analysis: Definition

Coincidence analysis is a set of techniques whose object is to detect which people, subjects, objects, attributes or events tend to appear at the same time in different delimited spaces.

These delimited spaces are called scenarios. There are n of them, and they are the units of analysis (i).

In each scenario, each of the J events X_j may occur (1) or not occur (0).

The starting point is an incidence matrix X: an n × J matrix of 0s and 1s that records whether each event X_j occurs in each scenario.

Old example: Pictures analysis

4 pictures (scenarios) and 8 different people (events).

Albums analysis: Example with names

Father, mother, grandmother and 5 children.

Albums analysis: Example with codes

Turina, Garzón, Joaquín, María, Concha, José Luis, Obdulia, Valle.

Albums analysis: Coincidence graphs (MDS, biplot, CA, PCA)

[Figure: four coincidence graphs of the Turina album, one panel each in MDS, biplot, CA and PCA coordinates. Nodes: Joaquín Turina, Obdulia Garzón, Josefa Garzón, Josefa Valle, Joaquín, María, Concha, José Luis and Obdulia.]

Other uses of coincidence analysis: From survey analysis to cultural trends

Coincidence analysis has many applications. Among others:

- Survey analysis
  - Unemployment
  - Social problems
  - Mass media audience
- Data mining
  - Samples (composition of genes)
  - Corruption (black cards)
- Cultural trends
  - Composers
  - Painters
  - Creators
- Content analysis

Survey analysis: Ways of looking for jobs (EPA-2014)

[Figure: network graph in MDS coordinates. Nodes are job-search methods (public and private agencies, informal contacts, contacts with employers, placing and looking at ads, exams, interviews, waiting for offers or results, self-employment, loans, others) and age groups (young, adult, older). Legend: Agencies, Contacts, Competition, Ads, Self-employment, Others (search); Age.]

Survey analysis: Social problems in Spain (2014), CIS-3045

[Figure: network graph in MDS coordinates. Nodes are problems (unemployment, corruption and fraud, economic problems, social problems, politicians in general, health, education, other answers), sex, age group and education level. Legend: Economic, Social, Politicians, Others (problem); Gender; Age; Education.]

Survey analysis: Mass media audience (EGM-2013)

[Figure: network graph of media outlets (newspapers, magazines, radio stations and television channels), including "no newspaper", "no magazine", "no radio station" and "no channel" nodes.]

Data mining: Gene composition of samples
Source: http://www.1000genomes.org/data

[Figure: network graph in Fruchterman-Reingold coordinates linking population samples (TSI, PUR, ASW, CHB, IBS, YRI, MXL, GIH, CHS, LWK, ACB, PEL, JPT, CLM, CDX, FIN, BEB, KHV, GBR, CEU, ITU, PJL, MSL, GWD, STU, ESN) to the genotype data available for them (Omni, Axiom, Affy 6.0, Exome/LOF). Legend: Africa, America, Asia, Europe (sample); Omni, Axiom, Affy, Exome (genotype).]

Data mining: Mean expenses per person with Bankia black cards (2003-2011)

[Figure: network graph linking expense categories (jewellery, petrol stations, hotels, underground, shopping, flowers, clubs, luxury restaurants, ordinary restaurants, taxis, culture, cash machines) to political groups (right, social democracy, left).]

Cultural trends: Bachtrack concerts reviewed (2009-2015)

[Figure: network graph in MDS coordinates of composers: Handel, Beethoven, Bach, Mozart, Haydn, Bruckner, Bartok, Schubert, Liszt, Chopin, Vivaldi, Mendelssohn, Brahms, Schumann, Stravinsky, Mahler, Ravel, Debussy, R. Strauss, Prokofiev, Wagner, Rachmaninov, Berlioz, Dvorak, Tchaikovsky, Elgar, Sibelius, Britten, Shostakovich. Legend: Baroque, Classical, Romantic, 20th century.]

Cultural trends: Creators in Juan March exhibitions (1975-2015)

[Figure: network graph of the artists shown in the exhibitions, for example Picasso, Kandinsky, Klee, Matisse, Miró, Warhol, Tàpies and Chillida; node labels not reproduced here.]

Cultural trends: Timeline of famous portrait painters

[Figure: graph of portrait painters, for example Jan Van Eyck, Leonardo da Vinci, Tiziano, Rembrandt, Goya, Manet, Picasso, Francis Bacon, Andy Warhol and Chuck Close; node labels not reproduced here.]

Stata user-written commands: Main

txttool provides a set of tools for managing and analyzing free-form text.
The command integrates several built-in Stata functions with new text capabilities, including a utility to create a bag-of-words representation of text and an implementation of Porter's word-stemming algorithm.

wordfreq reads a set of text files and produces, in memory, a dataset with the frequencies of all words that occur in at least one of the input texts. The resulting dataset includes a text variable, word, containing the words themselves.

wordscores implements the computerized content analysis techniques described in "Extracting Policy Positions From Political Texts Using Words as Data" by Laver et al. [2003].

Stata user-written commands: Others

strdist is a module to calculate the Levenshtein distance (or edit distance) between strings.

matchit is a tool to join observations from two datasets based on string variables that do not need to be exactly the same. It offers many string-based matching techniques, allowing for fuzzy similarity between the two text variables.

precoin: Distinct uses

precoin converts polytomous variables into binary variables for coincidence analysis. The original variables can be either numeric or string.

It can also split the content of a single variable into several dichotomous variables according to a separator.

It has three kinds of use:

- Multiple variables
- Thesaurus strings
- Words

precoin uses: Multiple variables

precoin uses: Frequencies of multiple variables

    . precoin P701-P703, stub(problem) min(.02) sort freq replace

    Categories                               f   %/events   %/scenar
    El paro                               1897       30.6       77.1
    La corrupción y el fraude             1573       25.4       63.9
    Los problemas de índole económic       629       10.1       25.6
    Los/as políticos/as en general,        574        9.3       23.3
    Others                                 327        5.3       13.3
    Los problemas de índole social         220        3.5        8.9
    La sanidad                             213        3.4        8.7
    La educación                           190        3.1        7.7
    Otras respuestas                       145        2.3        5.9
    Los recortes                           101        1.6        4.1
    La Administración de Justicia           88        1.4        3.6
    El Gobierno y partidos o polític        68        1.1        2.8
    La inmigración                          62        1.0        2.5
    Los problemas relacionados con l        58        0.9        2.4
    La crisis de valores                    54        0.9        2.2

    Events:             6199
    Scenarios:          2461
    Missing scenarios:     4

precoin uses: Transformation of multiple variables

    . describe problem*

    variable name   storage type   display format   value label   variable label
    problem01       byte           %8.0g                          El paro
    problem02       byte           %8.0g                          La corrupción y el fraude
    problem03       byte           %8.0g                          Los problemas de índole económic
    problem04       byte           %8.0g                          Los/as políticos/as en general,
    problem05       byte           %8.0g                          Los problemas de índole social
    problem06       byte           %8.0g                          La sanidad
    problem07       byte           %8.0g                          La educación
    problem08       byte           %8.0g                          Otras respuestas
    problem09       byte           %8.0g                          "Los recortes"
    problem10       byte           %8.0g                          La Administración de Justicia
    problem11       byte           %8.0g                          El Gobierno y partidos o polític
    problem12       byte           %8.0g                          La inmigración
    problem13       byte           %8.0g                          Los problemas relacionados con l
    problem14       byte           %8.0g                          La crisis de valores
    problem99       byte           %8.0g                          Others
    problem_miss    byte           %8.0g                          No events

precoin uses: Thesauri strings

precoin uses: Frequencies of thesauri strings

    . precoin composers, stub(composer) sep(;) freq sort min(.05) missing replace

    Categories           f   %/events   %/scenar
    Others            3606       57.2       86.4
    Beethoven          528        8.4       12.7
    Mozart             387        6.1        9.3
    Brahms             329        5.2        7.9
    Bach               309        4.9        7.4
    Ravel              247        3.9        5.9
    Tchaikovsky        239        3.8        5.7
    Schubert           231        3.7        5.5
    Shostakovich       215        3.4        5.2
    Mahler             214        3.4        5.1
    No events           46        0.7        1.1

    Events:     6305
    Scenarios:  4173

precoin uses: Variables of thesauri strings

    . describe composer*

    variable name   storage type   display format   value label   variable label
    composers       str100         %-50s                          Composers
    composer0001    byte           %8.0g                          Beethoven
    composer0002    byte           %8.0g                          Mozart
    composer0003    byte           %8.0g                          Brahms
    composer0004    byte           %8.0g                          Bach
    composer0005    byte           %8.0g                          Ravel
    composer0006    byte           %8.0g                          Tchaikovsky
    composer0007    byte           %8.0g                          Schubert
    composer0008    byte           %8.0g                          Shostakovich
    composer0009    byte           %8.0g                          Mahler
    composer1823    byte           %8.0g                          Others
    composer_miss   byte           %8.0g                          No events

precoin uses: Words

precoin uses: Simple conversion

    . precoin Plataforma, stub(labels) freq
    Warning: separator has been set to space

    Categories        f   %/events   %/scenar
    Instagram        79        6.0        6.0
    Twitter         225       17.0       17.0
    Web            1020       77.0       77.0

    Events:             1324
    Scenarios:          1324
    Missing scenarios:     2

    . describe Instagram-Web

    variable name   storage type   display format   value label   variable label
    Instagram       byte           %8.0g                          Instagram
    Twitter         byte           %8.0g                          Twitter
    Web             byte           %8.0g                          Web

precoin uses: Words

    . precoin Mensaje, stub(labels) sort freq replace stop(stopwords.txt) separator(" ") min(.03)

    Categories                  f   %/events   %/scenar
    Others                   1275       49.4       96.5
    Autentico                 369       14.3       27.9
    Elpoderdeloautentico      146        5.7       11.1
    Vida                      121        4.7        9.2
    Playa                      87        3.4        6.6
    Familia                    82        3.2        6.2
    Disfrutar                  73        2.8        5.5
    Mar                        58        2.2        4.4
    GarnierEs                  55        2.1        4.2
    Amigos                     51        2.0        3.9
    Pelo                       51        2.0        3.9
    Sonrisa                    50        1.9        3.8
    Amor                       42        1.6        3.2
    Sol                        41        1.6        3.1
    Verano                     40        1.5        3.0
    Sentir                     40        1.5        3.0

    Events:             2581
    Scenarios:          1321
    Missing scenarios:     5

precoin uses: Simple conversion

    . describe Autentico-Mensaje_miss

    variable name     storage type   display format   value label   variable label
    Autentico         byte           %8.0g                          Autentico
    Elpoderdeloau~o   byte           %8.0g                          Elpoderdeloautentico
    Vida              byte           %8.0g                          Vida
    Playa             byte           %8.0g                          Playa
    Familia           byte           %8.0g                          Familia
    Disfrutar         byte           %8.0g                          Disfrutar
    Mar               byte           %8.0g                          Mar
    GarnierEs         byte           %8.0g                          GarnierEs
    Amigos            byte           %8.0g                          Amigos
    Pelo              byte           %8.0g                          Pelo
    Sonrisa           byte           %8.0g                          Sonrisa
    Amor              byte           %8.0g                          Amor
    Sol               byte           %8.0g                          Sol
    Verano            byte           %8.0g                          Verano
    Sentir            byte           %8.0g                          Sentir
    Mensaje_others    byte           %8.0g                          Others
    Mensaje_miss      byte           %8.0g                          No events

coin: What is it?

coin is an ado program that performs coincidence analysis.

Its input is a dataset with scenarios as rows and events as columns.

Its outputs are:

- Different matrices (frequencies, percentages, three kinds of residuals, distances, adjacencies and edges)
- Several bar graphs, network graphs (circle, mds, pca, ca, biplot) and dendrograms (single, average, waverage, complete, wards, median, centroid)
- Measures of centrality (degree, closeness, betweenness, information) (eigenvector and power)
- Options to export to Ucinet, Pajek, nwcommands, Excel and csv files

Its syntax is simple but flexible, with many options (output, bonferroni, p value, minimum, special event, graph control and options, ...).
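To make the input and the first two output matrices concrete, here is a minimal sketch in plain Stata that does not use coin at all. It assumes that the coincidence frequencies are the cross-product X'X of the incidence matrix, that expected frequencies are f_j × f_k / n, and that the normalized (Haberman) residual is the usual adjusted residual. These assumptions reproduce the values printed in the coin examples of this talk (for Vida and Playa, 121 × 87 / 1326 ≈ 7.9 expected and a residual of about -2.3), but the code is not coin's implementation. The toy data and the names vida, playa, sol, F, E, d, e12 and r12 are hypothetical.

```stata
* Toy incidence matrix (hypothetical data): 5 scenarios (rows), 3 events (columns)
clear
input byte(vida playa sol)
1 0 0
1 1 0
0 1 1
0 1 1
1 0 0
end

* Coincidence frequencies F = X'X:
* the diagonal holds event counts, the off-diagonal cells hold joint counts
matrix accum F = vida playa sol, noconstant
local n = r(N)

* Expected coincidences under independence (assumed formula): e_jk = f_j * f_k / n
matrix d = vecdiag(F)
matrix E = (d' * d) / `n'

* Haberman (adjusted) residual for one pair, here vida and playa (assumed formula)
scalar e12 = E[1,2]
scalar r12 = (F[1,2] - e12) / sqrt(e12 * (1 - F[1,1]/`n') * (1 - F[2,2]/`n'))

matrix list F
matrix list E, format(%5.2f)
display "Haberman residual (vida, playa): " %5.2f r12
```

With real data the three toy variables would be replaced by the binary variables that precoin creates; matrix accum is used here only because it builds the cross-product in one step.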
Command coin: Syntax

    coin varlist if in weight using filename , options

Options can be classified into the following groups:

- Outputs
  - Frequencies: frequencies g-relative-frequencies vertical% horizontal%, expected-frequencies odd-ratios
  - Residuals: residuals standard-residuals normalized-residuals
  - Significance: phaberman podd ratios pfisher-exact-test
  - Others: tetrachoric-correlations, adjacencies-matrix distances list-key centrality measures, all-previous-statistics
  - Coordinates: x (with plot) xy(circle|mds|ca|pca|biplot)
- Plots
  - Bar: bar, cbar(varlist) and ccbar(varlist)
  - Residuals: rgraph(varlist) and ograph(varlist)
  - Graph: graph(circle|mds|ca|pca|biplot)
  - Dendrograms: dendrogram(single|complete|average|wards)

Command coin: Syntax (continued)

    coin varlist if in weight , options

Options can be classified into the following groups (continued):

- Controls: head(varlist), variable(varname), ascending, descending, minimum(#), support(#), pvalue(#), levels(# # #), bonferroni, lminimum(#), iterations(#)
- Exports
  - Edges: export(filename) with .csv, .xls, .nw, .pjk and .dl extensions
  - Nodes: varsave(filename) or export(filename) with .csv or .xls extensions

coin example (I): Matrix of coincidences in L'Oreal's messages

    . coin Vida-Mar Amigos-Sentir, frequencies
    1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
    12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir

    Frequencies
                 Vida  Playa  Fam~a  Dis~r    Mar  Ami~s   Pelo  Son~a   Amor    Sol  Ver~o  Sen~r
    Vida          121
    Playa           2     87
    Familia         6     15     82
    Disfrutar      13      6      9     73
    Mar             4      5      1      8     58
    Amigos          1      9     17      8      3     51
    Pelo            4      2      2      3      6      1     51
    Sonrisa         7      0      0      1      2      1      1     50
    Amor            8      0      1      1      1      0      4      0     42
    Sol             3     11      0      3      5      2      4      3      2     41
    Verano          2      7      6      1      2      4      3      0      0      1     40
    Sentir          6      0      1      1      2      0      6      0      3      0      3     40

coin example (II): Matrix of expected coincidences in L'Oreal's messages

    . coin Vida-Mar Amigos-Sentir, expected
    1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
    12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir

    Expected frequencies
                 Vida  Playa  Fam~a  Dis~r    Mar  Ami~s   Pelo  Son~a   Amor    Sol  Ver~o  Sen~r
    Vida         11.0
    Playa         7.9    5.7
    Familia       7.5    5.4    5.1
    Disfrutar     6.7    4.8    4.5    4.0
    Mar           5.3    3.8    3.6    3.2    2.5
    Amigos        4.7    3.3    3.2    2.8    2.2    2.0
    Pelo          4.7    3.3    3.2    2.8    2.2    2.0    2.0
    Sonrisa       4.6    3.3    3.1    2.8    2.2    1.9    1.9    1.9
    Amor          3.8    2.8    2.6    2.3    1.8    1.6    1.6    1.6    1.3
    Sol           3.7    2.7    2.5    2.3    1.8    1.6    1.6    1.5    1.3    1.3
    Verano        3.7    2.6    2.5    2.2    1.7    1.5    1.5    1.5    1.3    1.2    1.2
    Sentir        3.7    2.6    2.5    2.2    1.7    1.5    1.5    1.5    1.3    1.2    1.2    1.2

coin example (III): Matrix of normalized residuals in L'Oreal's messages

    . coin Vida-Mar Amigos-Sentir, normalized
    1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
    12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir

    Haberman residuals
                 Vida  Playa  Fam~a  Dis~r    Mar  Ami~s   Pelo  Son~a   Amor    Sol  Ver~o  Sen~r
    Vida         36.4
    Playa        -2.3   36.4
    Familia      -0.6    4.4   36.4
    Disfrutar     2.7    0.6    2.2   36.4
    Mar          -0.6    0.6   -1.4    2.8   36.4
    Amigos       -1.8    3.3    8.2    3.3    0.5   36.4
    Pelo         -0.3   -0.8   -0.7    0.1    2.6   -0.7   36.4
    Sonrisa       1.2   -1.9   -1.9   -1.1   -0.1   -0.7   -0.7   36.4
    Amor          2.3   -1.7   -1.0   -0.9   -0.6   -1.3    1.9   -1.3   36.4
    Sol          -0.4    5.3   -1.7    0.5    2.5    0.3    2.0    1.2    0.6   36.4
    Verano       -0.9    2.8    2.4   -0.8    0.2    2.1    1.2   -1.3   -1.2   -0.2   36.4
    Sentir        1.3   -1.7   -1.0   -0.8    0.2   -1.3    3.7   -1.3    1.6   -1.1    1.7   36.4

coin example (IV): Adjacency matrix in L'Oreal's messages

    . coin Vida-Mar Amigos-Sentir, adjace
    1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
    12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir

    Adjacency matrix
                 Vida  Playa  Fam~a  Dis~r    Mar  Ami~s   Pelo  Son~a   Amor    Sol  Ver~o  Sen~r
    Vida          0.0
    Playa         0.0    0.0
    Familia       0.0    1.0    0.0
    Disfrutar     1.0    1.0    1.0    0.0
    Mar           0.0    1.0    0.0    1.0    0.0
    Amigos        0.0    1.0    1.0    1.0    0.0    0.0
    Pelo          0.0    0.0    0.0    1.0    1.0    0.0    0.0
    Sonrisa       1.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
    Amor          1.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0    0.0
    Sol           0.0    1.0    0.0    1.0    1.0    1.0    1.0    1.0    1.0    0.0
    Verano        0.0    1.0    1.0    0.0    1.0    1.0    1.0    0.0    0.0    0.0    0.0
    Sentir        1.0    0.0    0.0    0.0    1.0    0.0    1.0    0.0    1.0    0.0    1.0    0.0

coin example (V): Centrality measures in L'Oreal's messages

    . coin Vida-Mar Amigos-Sentir, centrality
    1326 scenarios. 32 probable coincidences amongst 12 events. Density: 0.48. Components: 1.
    12 events(n>=5): Vida Playa Familia Disfrutar Mar Amigos Pelo Sonrisa Amor Sol Verano Sentir

    Centrality measures
                 Degree    Close  Between   Inform
    Vida           0.36     0.61     0.06     0.07
    Playa          0.55     0.69     0.03     0.09
    Familia        0.36     0.55     0.00     0.07
    Disfrutar      0.64     0.73     0.12     0.10
    Mar            0.64     0.73     0.05     0.10
    Amigos         0.55     0.69     0.03     0.09
    Pelo           0.55     0.69     0.05     0.09
    Sonrisa        0.18     0.50     0.01     0.05
    Amor           0.36     0.58     0.02     0.07
    Sol            0.64     0.73     0.18     0.10
    Verano         0.55     0.65     0.06     0.09
    Sentir         0.45     0.65     0.05     0.08

coin example (VI): Simple graph

    coin Vida-Mar Amigos-Sentir, graph(mds) levels(.5 .05 .01) goptions(name(Network))

[Figure: network graph of the 12 events (Vida, Playa, Familia, Disfrutar, Mar, Amigos, Pelo, Sonrisa, Amor, Sol, Verano, Sentir) in MDS coordinates.]

coin example (VII): Color graph

    coin Vida-Mar Amigos-Sentir using Words, graph(mds) levels(.5 .05 .01) color(Tipo) legend

[Figure: the same network graph in MDS coordinates, with nodes colored by type. Legend: Values, People, Sensations, Places.]

coin example (VIII): Words in their context

    . list Mensaje if Pelo & Amor, clean string(120)

           Mensaje
    234.   Que hay más auténtico que tu hija de 1 año acariciándote el pelo puro amor
    237.   Que hay más auténtico que tu hija de 1 año acariciándote el pelo puro amor
    449.   Disfrutar de un atardecer con el sonido de las olas y el aroma en mi pelo de Original Remedies, en compañía del amor de m..
    636.   Lo realmente auténtico es el amor de mi familia. A mi hermana y a mi nos encanta peinarnos y tener un pelo suave, con br..

coin example (IX): Automatic color graph (communities)

    coin Vida-Mar Amigos-Sentir using Words, graph(mds) groups(5)

[Figure: network graph of the 12 events in MDS coordinates, with nodes colored by automatically detected group.]

coin example (X): Dendrogram

    coin Vida-Mar Amigos-Sentir, dendrogram(ward)

[Figure: dendrogram, "Clusters (method: wards)". Leaf order: Vida, Amor, Sonrisa, Pelo, Sentir, Verano, Playa, Sol, Disfrutar, Mar, Familia, Amigos. Axis: Haberman distance, from 0 to 50.]

coin example (XI): Graph with manual codes

    coin K1-K69 using Nodes, graph(mds) color(tipo)

[Figure: network graph in MDS coordinates of manually coded categories, for example Cerveza, Comida, Felicidad, Vida, Sonrisa, Amistad, Familia, Amor, Playa, Verano, Sol, Mar, Autenticidad, Naturaleza, Pelo and Belleza, colored by type.]

Last example: Components of self identity

[Figure: network graph in MDS coordinates of self-identity codes, for example Mujer, Hombre, Simpático/a, Trabajador/a, Actividad, Preferencia, Familia nuclear, Autoevaluación social and Definición universal. Legend: Gender (sociodemographic); Consensual, Attitudinal, Other attributes, Anchoring, Qualifiers (codes).]

Availability of precoin and coin

If you use a version of Stata later than 11.2, you can get a free copy of coin by typing:

    net install coin, from(http://sociocav.usal.es/stata/)

It is still in its first version, but it works reasonably well and is being improved. It can be updated as follows:

    adoupdate, update

Comments and suggestions are welcome!

Next steps: For coin and precoin

- Automatic codification through regular expressions.
- Similar graph representations of correlations among quantitative variables.
- Use of log-linear models to discover n-coincidences.
- Time-based study of coincidences using dynamic networks.
- Using objects in the Mata code of the command coin.
- It would be great if Stata implemented sparse matrices in Mata!

References

Bernard Berelson. Content Analysis in Communication Research. Free Press, New York, 1952.
Ole R. Holsti. Content Analysis for the Social Sciences and Humanities. Addison-Wesley, Reading, 1969.
Klaus Krippendorff. Content Analysis. An Introduction to its Methodology. Sage, Beverly Hills, 1980.
Rense Corten. Visualization of social networks in Stata using multidimensional scaling. The Stata Journal, 11(1):52-63, 2011.
Hirotaka Miura. Stata graph library for network analysis. The Stata Journal, 12(1):94-129, 2012.
Thomas E. Grund. nwcommands: Software tools for statistical modeling of network data in Stata, 2014. URL http://nwcommands.org.
Michael Laver, Kenneth Benoit, and John Garry. Extracting policy positions from political texts using words as data. American Political Science Review, 97(2):311-331, 2003.

Thanks

Thank you very much!
[email protected] & [email protected]