Social network analysis

Wednesday, 11 December 2013

SNA 101: History of social network analysis

History of social network analysis

John Scott

The roots of social network thinking lie in the relational or structural approaches to social analysis that developed in classical sociology. While some approaches in sociology and anthropology used ideas of culture and cultural formation to explain social patterns of feeling, thought, and behavior, and others stressed the material environment and the physical body as crucial determinants, a particularly important strand of social thought focused its attention on the actual patterns of interaction and interconnection through which individuals and social groups are related to each other. In some cases, this involved a conception of the 'social organism', or social system, as a structure of institutions that constrain the subjectivity and actions of those who occupy positions within them. For other theorists, however, greater attention was given to the immediate face-to-face encounters through which individuals relate to each other and which are constantly reshaped by the actions of those individuals.

It was among the latter theorists that the metaphors of the social 'network' and its equivalents, such as the social 'web' or the social 'fabric', first emerged. German social theorists such as Ferdinand Tönnies and Georg Simmel took up this idea in their 'formal sociology', seen as a sociology of the 'forms' of interaction that carry and contain the diverse, subjectively meaningful contents that motivate individuals' actions. The translation of this work, and the publication of much of it in the American Journal of Sociology, encouraged many sociologists in the United States in the first decade of the twentieth century to pursue an 'interactionist' approach to social life. Charles Cooley, Albion Small, and George Mead were, perhaps, the leading figures in this movement of thought. In Germany itself, the sociologies of Alfred Vierkandt and Leopold von Wiese explored the interweaving of actions into large-scale social forms such as the market and the state. A number of these theorists explicitly adopted a terminology of 'points' and 'lines' to represent the networks of connections that tie individuals into the 'knots' and 'webs' of the social structure. Wiese (1931) was, perhaps, among the first to use these quasi-mathematical ideas explicitly in a theoretical monograph, labeling points with letters and referring to the directionality and circularity of interweaving lines of connection.

Sociometry, small groups, and communities

Early empirical work on small groups and communities took place in the United States, where researchers trained in psychology and psychoanalysis undertook a series of investigations into the friendship choices made by schoolchildren and college students in educational settings. Friendship choices among classmates were seen as a way of exploring the cohesion of school class groups and the relative popularity of particular pupils. Growing out of a long tradition of child study that had peaked with Stanley Hall's (1904) study of adolescent development, the earliest published report on friendship networks was that of Helen Bott (1928), who studied play activities among nursery school children.

The principal influence on the development of this work, however, was an Austrian psychoanalyst who had migrated to the United States. Influenced by the way in which Alfred Vierkandt had combined a relational approach with a phenomenological concern for the meanings and emotional significance of relationships, Jacob Moreno devised systematic formal methods for charting social relations among children. Moreno's aim was both to measure and to draw social relations, and he referred to his work as 'sociometry' and to his drawings as 'sociograms' (Moreno 1934).

Moreno observed children's interaction and counted the number of friendship choices made and received by the various members of a class, combining these into sociograms that represented each child as a point and their friendship choices as lines with arrowheads. The arrowheads show the direction in which a choice was made, distinguishing the 'outgoing' choices directed at others from the 'incoming' choices received from others. This method allowed Moreno to identify the most popular 'stars' of attraction and the 'isolates' who received few or no friendship choices. An example of one of his sociograms is shown in Figure 1. Moreno was also able to see whether some children tried to make friends with others but were unable to secure reciprocal choices from those they sought. By compiling the various friendship choices made by class members into a single sociogram, Moreno intended to use a sociometric investigation to model the overall connectedness and emotional climate of the group (see Jennings 1948).
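The counting that lies behind a sociogram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the children's names and their choices are invented, the function name `sociometric_summary` is hypothetical, and an 'isolate' is simplified here to a child who receives no choices at all (the text also allows 'few').

```python
from collections import Counter

# Hypothetical friendship choices as (chooser, chosen) pairs; all names are invented.
children = {"Ana", "Ben", "Cal", "Dee", "Eve"}
choices = [
    ("Ana", "Ben"), ("Cal", "Ben"), ("Dee", "Ben"),
    ("Ben", "Ana"), ("Dee", "Cal"), ("Eve", "Dee"),
]

def sociometric_summary(children, choices):
    """Count outgoing and incoming choices and pick out stars and isolates."""
    made = Counter(chooser for chooser, _ in choices)     # outgoing choices
    received = Counter(chosen for _, chosen in choices)   # incoming choices
    top = max(received.values(), default=0)
    # Stars: the children who receive the largest number of choices.
    stars = sorted(c for c in children if top > 0 and received[c] == top)
    # Isolates (simplified assumption): children who receive no choices.
    isolates = sorted(c for c in children if received[c] == 0)
    # Choices that the chosen child does not return.
    pairs = set(choices)
    unreciprocated = sorted((a, b) for a, b in pairs if (b, a) not in pairs)
    return made, received, stars, isolates, unreciprocated
```

With these invented data, Ben is the only star, Eve is the only isolate, and the choice between Ana and Ben is the only reciprocated one.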

Figure 1: A friendship sociogram

One influence on Moreno's work was the tradition of Gestalt psychology. Kurt Lewin, a German who had also emigrated to the United States, was even more firmly rooted in this tradition of analysis and pioneered a more general psychology of small groups. Gestalt psychology involved a focus on the mental structures that allow people to organize and make sense of their experiences, and it based its ideas on observations of non-human primates (see Köhler 1917). Lewin aimed to translate this basic idea to the social level, seeking to show that the social structures of groups are the means through which their actions are organized and constrained. His investigations into social groups were concerned, then, with how such structures are produced and with the effects they have on the communication and actions of their members.

Lewin's starting point was to see groups as 'fields' of interaction, hence his adoption of the term 'field theory' to describe his approach. A group field is the life space within which people act, and their friendship choices and other social relations are to be understood as creating forces of attraction and repulsion in the field that constrain the flow of ideas through the group. A particular individual, for example, is able to communicate ideas only through direct contacts or through intermediaries who are able to pass them on. The diffusion of ideas, then, depends on the structure of the group relations within which communicating individuals are located.

Lewin's work inspired a series of experimental studies that led to the establishment of 'group dynamics' as a specialty within social psychology (Cartwright and Zander 1953; Harary and Norman 1953). It was in this specialty that researchers began to use more systematic mathematical arguments to model group structure. Using the mathematical approach called graph theory, which investigates the formal properties of networks, or 'graphs', of all kinds, they began to operationalize ideas of the 'density' of sociograms and the 'centrality' of individuals. In graph theory, the formal properties of the points and lines in a network become the objects of a mathematical analysis that discloses the constraints that give the network its shape.
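As a rough illustration of how 'density' and 'centrality' can be turned into numbers, the sketch below uses two common textbook definitions: density as the share of possible lines that are actually present, and centrality as the number of lines attached to a point (degree). The text does not say which definitions the group dynamics researchers used, so both choices, like the example data and function names, are assumptions.

```python
def density(n_points, lines):
    """Share of all possible undirected lines that are actually present."""
    possible = n_points * (n_points - 1) / 2
    return len(lines) / possible

def degree_centrality(points, lines):
    """Number of lines attached to each point (one simple notion of centrality)."""
    degree = {p: 0 for p in points}
    for a, b in lines:
        degree[a] += 1
        degree[b] += 1
    return degree

# Hypothetical five-person group; the labels and lines are invented.
points = ["A", "B", "C", "D", "E"]
lines = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
```

Here 5 of the 10 possible lines are present, so the density is 0.5, and A, with three lines, is the most central point on this measure.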

Researchers in group dynamics constructed formal models of group structure, such as the star, the 'Y', the chain, and the circle (see Figure 2), and held that these structures have very different implications for effective communication because some individuals are in critical 'central' positions. If a group is structured as a long chain of connections in which information is communicated by passing it through a series of intermediaries, then meanings are likely to be slightly distorted and altered at each step in the flow of communication. Just as in the children's game of Chinese whispers, the message received at the end of the chain may be very different from the one sent at the beginning. In a group with many direct connections and alternative channels of communication, on the other hand, meanings are less likely to be altered as they flow through the group because the multiple channels introduce 'corrections', and so greater conformity in thought and behavior is to be expected.
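The contrast between the four structures can be made concrete by counting how many times a message must be passed on to get from one position to another. The sketch below assumes five-position versions of the star, Y, chain, and circle (the group size is not given in the text) and treats the position with the smallest total number of relay steps to everyone else as the most central; the function names are hypothetical.

```python
from collections import deque

# Five-position versions of the four structures; the group size is an assumption.
structures = {
    "star":   [(0, 1), (0, 2), (0, 3), (0, 4)],
    "Y":      [(0, 2), (1, 2), (2, 3), (3, 4)],
    "chain":  [(0, 1), (1, 2), (2, 3), (3, 4)],
    "circle": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)],
}

def relay_steps(lines, source):
    """Breadth-first search: fewest hand-overs needed to reach each position."""
    neighbours = {}
    for a, b in lines:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    steps = {source: 0}
    queue = deque([source])
    while queue:
        point = queue.popleft()
        for nxt in neighbours[point]:
            if nxt not in steps:
                steps[nxt] = steps[point] + 1
                queue.append(nxt)
    return steps

def summarize(lines):
    """Return (longest relay path, list of most central positions)."""
    points = sorted({p for line in lines for p in line})
    totals = {p: sum(relay_steps(lines, p).values()) for p in points}
    longest = max(max(relay_steps(lines, p).values()) for p in points)
    fewest = min(totals.values())
    central = [p for p in points if totals[p] == fewest]
    return longest, central
```

On these assumptions the chain has the longest relay path (four steps), the star and the circle the shortest (two), and only the circle has no single central position.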

Figure 2: Sociometric structures: star, Y, chain, circle

Intermediaries in social groups, especially those at the centers of 'stars', have been seen as potentially the most powerful members of their groups: they are influential opinion leaders because of their central location within the group. Research in group dynamics has explored the ways in which relations of dependence within groups can increase or decrease power and encourage particular structures of leadership (French and Raven 1959). Other researchers have focused more on the extent to which social relations are reciprocated and, therefore, on the patterns of 'balance' and imbalance that characterize different groups (Davis 1941). Festinger's (1957) research linked this with ideas of subjective mental balance in attitudes and ideas in order to explore particular patterns of group response (Festinger 1956).

In the 1930s, George Lundberg (1936; Lundberg and Steele 1938) had extended basic sociometric techniques to the study of village communities, but it was not until the 1950s that sociometric techniques really began to move beyond small and experimental groups to larger groups in real settings. The Canadian social psychologist Elizabeth Bott, daughter of the pioneering sociometric researcher Helen Bott, worked at the Tavistock Institute, where sociometry and group dynamics had their British base. There she carried out a comparative study of working-class and middle-class couples in London. Bott (1957) showed that the members of each class were embedded in different, class-specific structures of kinship and friendship, and that these networks influenced the 'conjugal' domestic relations within their households. She examined, in particular, gender differences in social networks and conjugal roles.

Bott worked closely with social anthropologists who were already beginning to explore the implications of Alfred Radcliffe-Brown's (1940) view that the social relations of tribal societies could be investigated by constructing models of the 'structural forms' of those relations. John Barnes (1954) brought these ideas into his report on the communal structure of the fishing village of Bremnes in western Norway, and his work was particularly influential on the work on central Africa that Max Marwick was establishing as a branch of the Department of Sociology and Social Anthropology at the University of Manchester. Clyde Mitchell, Scarlett Epstein, Bruce Kapferer, and others worked together on a series of studies of community and kinship and their effects on gossip and strikes. These studies were brought together in an influential collection that aimed to show the power of graph theory as a model for social relations in complex societies (Mitchell 1969).

From the late 1960s, researchers in the United States took advantage of the advances being made in the use of computers for data analysis and began to apply more systematic and rigorous ideas in their studies of wider community and economic networks. Granovetter (1974) examined friendship relations as sources of information about job opportunities and developed the influential idea that people acquire the most useful information from their more distant ties. When a job opportunity becomes available locally, information flows quickly through the dense and well-connected local networks, and everyone tends to acquire the same information. More distant job opportunities, however, become known only to those with looser and more extensive connections beyond their immediate locality. People with such connections therefore have a clear advantage in their job-search activities, as they will hear of more opportunities than those who have only locally dense connections (see Figure 3). Granovetter (1973) described this as the thesis of the strength of weak ties.

Figure 3: Strong and weak ties

Wellman (1979; Wellman and Hogan 2006; see also Fischer 1977; 1982) used social survey methods to collect information on friendship and kinship relations in Toronto, Canada. His aim was to explore whether individuals relied exclusively on local connections or were able to keep in contact with those who had moved to other parts of the city or country. He was able to examine the range of people's immediate social contacts, the frequency and perceived intensity of those ties, and the opportunities that the telephone and the car offered for maintaining contacts over long distances.

Bearden and colleagues (1975; see also Mizruchi 1982; Mintz and Schwartz 1985) investigated board-level connections among the leading companies of the United States, while Helmers and colleagues (1975) carried out a similar study in the Netherlands and rapidly extended this into an international comparison (Stokman et al. 1985). This research highlighted the relative 'centrality' of banks and financial enterprises in corporate networks and the changing relations between financial and industrial enterprises in the major economies. The researchers documented the structures of coordination and communication among large business enterprises and pointed to their effects on economic performance and class cohesion.

Cliques, roles, and matrices

The work of the anthropologists studying African societies at Manchester was not the first attempt by social anthropologists to investigate social networks. Lloyd Warner worked in the Durkheimian tradition of Radcliffe-Brown and had carried out a conventional study of kinship among indigenous Australians before moving to the United States and joining the psychologist Elton Mayo to undertake an anthropological study of a Chicago factory and its workforce. This pioneering use of anthropological techniques to study 'advanced' societies proved important in generating an alternative to purely sociometric studies.

Industrial psychologists working at the Hawthorne electrical works in Chicago had been carrying out experimental studies of the effects of physical conditions on work satisfaction and output. They had discovered that improving heating and lighting conditions, allowing rest periods, and making other physical alterations to the workplace improved worker morale and led to higher productivity. They were puzzled to find, however, that similar changes occurred when physical conditions were restored to their original state or even allowed to deteriorate. Uncertain how to interpret these results, they called in Mayo and Warner to advise them. It was soon concluded that the workers were responding to the experiment itself and not to the changes in physical conditions. The managers had specially selected a group for study, had located them in an area of the factory separate from the other workers, and had, for the first time, appeared to show an interest in their welfare. This phenomenon became known as the 'Hawthorne effect' in experimental studies.

To arrive at these conclusions, Mayo and Warner pursued their own observational and experimental studies in the factory. Of particular importance was their observational study of a wiring room, where they recorded friendly and hostile interactions, cooperation, and offers of help (Roethlisberger and Dickson 1939). Some of their findings were reported as sociograms, although these seem to have been inspired by the electrical wiring diagrams that abounded in the factory rather than by published sociometric work. Their most important work, however, used tables or matrix representations to describe the formation of 'cliques'. Presenting the observed interactions in a table in which the rows represent individuals and the columns represent the occasions in which they participate allowed the researchers to identify the individuals who interact frequently and the occasions or circumstances in which they interact. They identified a clique in the wiring room comprising those who tended to help each other, together with a set of individuals who had difficulty obtaining help when they needed it, and they drew this as a diagram (see Figure 4). Cliques were seen as formal representations of the commonly expressed idea that people may feel that they are part of an 'in-crowd' or that they are outsiders to it.
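A table of individuals by occasions can be read for cliques by counting, for each pair of individuals, the occasions they share. The sketch below is a minimal illustration with an invented table: the worker labels, the occasions, and the cut-off of two shared occasions for 'frequent' interaction are all assumptions, not values from the Hawthorne study.

```python
from itertools import combinations

# Hypothetical table: rows are individuals, columns are four occasions (1 = took part).
table = {
    "W1": [1, 1, 1, 0],
    "W2": [1, 1, 0, 0],
    "W3": [1, 0, 1, 0],
    "W4": [0, 0, 0, 1],
}

def co_participation(table):
    """Number of occasions shared by each pair of individuals."""
    counts = {}
    for a, b in combinations(sorted(table), 2):
        counts[(a, b)] = sum(x * y for x, y in zip(table[a], table[b]))
    return counts

def frequent_pairs(table, minimum=2):
    """Pairs sharing at least `minimum` occasions (the cut-off is an assumption)."""
    return [pair for pair, n in co_participation(table).items() if n >= minimum]
```

In this invented table W1 shares two occasions with each of W2 and W3, while W4 shares none with anyone and so stands outside the group.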

Figure 4: Clique structure

Warner went on to pursue similar questions through a community study in the New England town of Newburyport. Regarding this as a typical American town with its roots in the early colonial period, he referred to it as 'Yankee City' in his published studies (Warner and Lunt 1941; 1942; Warner and Srole 1945; Warner and Low 1947; Warner 1963). Later in the 1930s he supervised a similar study in the equally old Southern town of Natchez, referring to it as 'Old City' (Davis 1963). In these studies, Warner and his team explored the formation of 'cliques', understood as informal communal groupings based on feelings of intimacy and solidarity that existed alongside the formal associations of church, business, leisure, and politics. This relationship between communal and associational patterns, between informal and formal ties, can be seen in relation to Tönnies's (1887) earlier suggestions about the relationship between Gemeinschaft and Gesellschaft.

Warner's research showed that people tend to be members of numerous overlapping cliques and that it is through the intersection of cliques that overall structures of communal solidarity and cohesion are produced. Warner and his colleagues argued that examining the structure of rows and columns in a data table can show the existence of cliques and the relations among them. Instead of using sociograms, they reported the cliques in Venn diagrams in which circles and ellipses represent sets of interacting individuals, as they had done in the Hawthorne studies (see Figure 5). At the aggregate level of the whole community, they assigned people to a social class and represented each social class as a row in a matrix of connections among the various cliques. This procedure allowed them to identify the macro-structural positions found in the communities. Their research highlighted social divisions of class and ethnicity, showing the existence of a rigid 'color line' separating the black and white communities.

Figure 5: Overlapping cliques in a social hierarchy

George Homans (1950) undertook a systematic review of small-group studies, seeking a theoretical synthesis of their ideas. At the heart of this synthesis was his use of sociometric ideas about the frequency and direction of social relations, but he also explored the matrix methods that Warner had used for clique identification. Re-examining the analyses of informal interaction carried out in Natchez by Warner's team, he began to develop a systematic method of matrix analysis that has since become a central part of social network analysis. Homans looked at the attendance of 18 women at 14 social events, represented as an 18 × 14 matrix, and claimed that a simultaneous manual reordering of rows and columns could bring out the internal clique structure. A matrix is typically arranged in an arbitrary order, listing individuals and events alphabetically or, at best, chronologically. Shuffling the order of the rows and columns until a strong pattern appears in the cells reveals the structure that is hidden by these arbitrary arrangements.
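The idea of reordering rows and columns until a pattern appears can be illustrated with a simple automatic heuristic. This is not Homans's manual procedure: the sketch swaps in a 'barycenter' rule that repeatedly sorts rows by the average position of their filled cells and then sorts columns in the same way. The 4 × 4 attendance matrix is invented, and the heuristic is not guaranteed to find the clearest arrangement for every matrix.

```python
def barycenter(cells):
    """Mean position of the filled cells in a row or column (0.0 if empty)."""
    filled = [i for i, value in enumerate(cells) if value]
    return sum(filled) / len(filled) if filled else 0.0

def reorder(matrix, max_rounds=20):
    """Return (row_order, col_order, reordered_matrix) after barycenter sorting."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(max_rounds):
        new_rows = sorted(rows, key=lambda r: barycenter([matrix[r][c] for c in cols]))
        new_cols = sorted(cols, key=lambda c: barycenter([matrix[r][c] for r in new_rows]))
        if new_rows == rows and new_cols == cols:
            break  # the ordering has stopped changing
        rows, cols = new_rows, new_cols
    return rows, cols, [[matrix[r][c] for c in cols] for r in rows]

# Hypothetical attendance matrix (rows: people, columns: events) in arbitrary order.
attendance = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
```

Calling `reorder(attendance)` moves the second and fourth people to the top and groups the first and third events together, so that the two sets of attendees appear as two solid blocks on the diagonal.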

This kind of trial-and-error reordering is slow and cumbersome, even for a relatively small group, and Homans's method was not pursued until much later. It did, however, point the way toward systematic investigations of structural positions within networks of the kind that were being suggested by the social anthropologist Nadel (1957). Aiming at a formal, mathematical approach to anthropology, Nadel showed that algebraic methods of set analysis could be used to model the 'roles' within a social structure. Where sociometric uses of graph theory focused on the actual interactions of particular individuals, algebraic set theories focused on the positions and roles that these individuals occupied and on the role relations at the level of the social structure as a whole.

Algebraic and matrix methods were developed by Harrison White and others (1963; Lorrain and White 1971), who were able to take advantage of advances in computing to carry out matrix reorderings for large-scale social networks. In an approach that they called 'blockmodelling', they saw blocks of cells in a matrix as representing the structural positions that Nadel had sought (White 1976; Boorman and White 1976). The individuals occupying each position are 'structurally equivalent' to each other: they are, for network purposes, interchangeable, and their individual characteristics and connections can be ignored. Thus, all fathers can be expected to relate in similar ways to their sons and daughters, while all teachers can be expected to relate in similar ways to their pupils. The characteristics of roles are social facts that are independent of the attitudes and outlooks of their particular individual occupants.
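A very simple reading of structural equivalence can be sketched as follows: two actors occupy the same position if they send ties to exactly the same others and receive ties from exactly the same others. This is a deliberate simplification and not the blockmodelling procedure itself, which tolerates approximate matches; it also fails to group equivalent actors who are tied to each other. The teacher and pupil labels and the function name are invented.

```python
from collections import defaultdict

# Hypothetical directed ties: two teachers who each instruct the same three pupils.
ties = [
    ("T1", "P1"), ("T1", "P2"), ("T1", "P3"),
    ("T2", "P1"), ("T2", "P2"), ("T2", "P3"),
]

def structural_positions(ties):
    """Group actors whose sets of outgoing and incoming ties are identical."""
    out_ties, in_ties = defaultdict(set), defaultdict(set)
    actors = set()
    for source, target in ties:
        out_ties[source].add(target)
        in_ties[target].add(source)
        actors.update((source, target))
    blocks = defaultdict(list)
    for actor in sorted(actors):
        profile = (frozenset(out_ties[actor]), frozenset(in_ties[actor]))
        blocks[profile].append(actor)
    return sorted(blocks.values())
```

For these invented ties the function returns two positions, one holding the three pupils and one holding the two teachers, regardless of who the individual occupants are.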

Space and distance

Sociometric uses of graph theory measured the 'distance' from one individual to another by the number of links that must be traversed to connect the two. This is a useful measure of closeness, but it does not correspond to the everyday idea of distance as something measured across a physical space. In a sociogram, the physical arrangement of points is arbitrary, limited only by the aesthetic attempt to minimize overlaps among the lines. A measure of physical distance, however, requires a non-arbitrary representation of the data. The distance between two towns in miles, for example, can be measured 'as the crow flies' rather than by traversing a network of roads (of varying length) and intersections. A number of network researchers have, therefore, attempted to construct models of 'social space' in which straight-line, 'crow-flies' distances can be measured (Bogardus 1925; 1959). Embedding a network of connections in such a space allows both the pattern of connections and the relative distances to be studied.
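The difference between distance through a network and distance across a space can be shown with the road example. In the sketch below, three hypothetical towns are given invented map coordinates and road lengths in miles; the road distance is found with Dijkstra's shortest-path algorithm, and the crow-flies distance is the ordinary straight-line distance between the coordinates.

```python
import heapq
import math

# Hypothetical towns: map coordinates and road lengths, both in miles.
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4)}
roads = {("A", "B"): 3.0, ("B", "C"): 4.0}

def road_distance(roads, start, goal):
    """Shortest distance travelling along the roads (Dijkstra's algorithm)."""
    graph = {}
    for (a, b), length in roads.items():
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, length in graph.get(node, []):
            candidate = dist + length
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                heapq.heappush(queue, (candidate, nxt))
    return math.inf  # no route exists

def crow_distance(coords, a, b):
    """Straight-line distance between two mapped points."""
    return math.dist(coords[a], coords[b])
```

From A to C the road distance is 7 miles across two links, while the crow-flies distance is 5 miles; only the second depends on where the points are placed in the space.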

In some applications, a simple idea of perceived social distance has been used. Individuals can be presented with a simple friendship chart (see Figure 6) and asked to plot the position of those they know in terms of their subjective or emotional distance: close friends, people they are familiar with on a day-to-day basis, or more distant friends. The resulting chart gives a visual representation of a person's social world in terms of the degree of intimacy that they have with varying numbers of other people (Wallman 1984: 61, 66-7; Spencer and Pahl 2006). These charts have provided a useful approach to affective personal networks.

Figure 6: Role structure in block models

A more formal idea of social space is inherent in Lewin's early work, but it was actually developed as a formal method in psychology. Psychometric studies of attitudes had used 'scaling' methods to show the relative strength of attitudes, and this led to attempts to measure two or more attitudes through the intersection of their scales in a cognitive space that could be taken to represent a structural part of the mind. These approaches were called smallest space analysis (SSA) because they attempted to define the smallest number of dimensions (scales) that would represent a particular group of attitudes. This approach was generalized as multidimensional scaling (Kruskal and Wish 1978) and began to be applied to social phenomena as a way of reporting the structural features of social space (Coxon 1982; see applications in Hope 1972).

Figure 7: Affective distance in friendship

Of particular importance in plotting networks in social space have been the methods of factor analysis and principal components analysis, both of which try to construct a social space of smaller dimension in which groups of people or positions can be represented. The most recent development of this approach has been multiple correspondence analysis (Rouanet and Le Roux 2009), a version of which was used in the stratification studies undertaken by Bourdieu (1979). Computer programs now produce multidimensional scaling images that show the actual social distance between points, as shown in Figure 8.

Figure 8: Multidimensional scaling of a network

Multidimensional scaling was applied in a study of communities by Edward Laumann (1966; 1973; see also Laumann and Pappi 1976). Focusing on positions rather than individuals, Laumann charted the friendship patterns of people in particular occupational categories. The frequency of friendship ties between pairs of positions was taken as a measure of the distance between the positions, and computerized techniques were used to generate an overall space in which the classes could be located according to their friendship distance. This was, therefore, an attempt to measure differential association among social classes. Laumann's study generated a three-dimensional model of community structure in which the social classes themselves could be represented as clouds or clusters of points.

One of the most influential studies using multidimensional scaling was that of Joel Levine (1972) on corporate interlocking directorships. Using data on large United States banks in 1966, Levine constructed measures of similarity in the patterns of connection around each of the major banks. He later used this method to produce a comprehensive 'atlas' of corporate connections (Levine 1984).

Advances in computer software have made it very easy to carry out multidimensional scaling. One of the leading software programs (Pajek) uses a spring-embedding technique to position a network in a multidimensional space and allows the network to be rotated for visual inspection of its structure.

Dynamics and social change

Most early studies of social networks were static and descriptive. They reported on the features of social networks as they exist at a particular time, but they did not generally attempt to explore the internal dynamics that lead a network to change from one state to another. Where a concern for time and change has been apparent, this has involved constructing a series of cross-sectional studies, with the processes of change being imputed but not directly studied (see Scott and Griff 1984; Scott and Hughes 1980). A move toward dynamic models is relatively recent and has come about through the work of physicists who have been aware of the earlier research by social psychologists, social anthropologists, and sociologists.

Motivated by an apparent decline in the fundamental theoretical problems of physics itself, a number of physicists have explored possible extensions of the mathematical models of physics to other intellectual fields. Barabási (2002) has been the leading advocate of applying physical models to social and economic phenomena, seeing himself as a pioneer in uncharted territory (Scott 2011b). Highlighting the importance of a paper by Watts and Strogatz (1998), Barabási has produced an approach that, despite its many limitations, provides some new ideas that have helped to bring about a greater awareness of the importance of dynamic analysis.

Work in this area has used Stanley Milgram's (1967; Travers and Milgram 1969) 'small world' studies to explore the limits to certain kinds of variation in network structure. Milgram was interested in the fact that strangers are often able to identify mutual acquaintances or connections and will exclaim 'What a small world!' To explore this phenomenon, he carried out an experiment in which he asked volunteers to pass a message to a named, but unknown, person in another part of the country. The volunteers were instructed that they must pass the message only to a person known to them and that this second person must also pass the message on to a known person. The message must, therefore, be passed on in the manner of a chain letter. Each person receiving the message was instructed, however, to give it to an acquaintance who they felt was likely to be able to pass the message, directly or indirectly, to the target person. Milgram discovered that messages could typically get from origin to destination through an average of six connections, or five intermediate individuals. This is the now-famous idea of 'six degrees of connection'.

Watts and Strogatz began to explore the mathematical properties of the networks in which these experimental findings hold. They showed that only certain kinds of network have these small-world properties and that many of the measures used in social network analysis depend on their presence in the networks studied. Their focus of interest was, therefore, variations in network structure and the changes of state from a small-world network to more or less densely connected networks. Duncan Watts (1999; 2003) has shown that small-world properties exist in networks that are clustered into zones of relatively high density and that are marked by a differentiation between strong and weak ties. In such a network, the overlap of connections is so great that the path distances between points are optimally low. A small-world graph contains many 'redundant' links, so that points tend to be connected through several alternative paths. Watts showed that small changes in the connectivity of such networks can significantly alter their properties if these changes occur close to the threshold levels that define small-world conditions. Radical structural changes can, therefore, follow from minor changes at the local level.
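The way a few local changes can alter the overall properties of a clustered network can be illustrated with a toy example. The sketch below builds a ring in which each point is linked to its two nearest neighbors on either side, then adds two long-range links by hand and compares the average path length and the average clustering before and after. The network size, the chosen shortcuts, and the function names are all assumptions made for illustration; this is not a reproduction of the Watts and Strogatz model, which rewires links at random.

```python
from collections import deque
from itertools import combinations

def ring_lattice(n, reach):
    """Ring of n points, each linked to its `reach` nearest neighbors on each side."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, reach + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph

def average_path_length(graph):
    """Mean number of links on the shortest path between reachable pairs of points."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(graph):
    """Mean share of each point's neighbors that are themselves linked."""
    scores = []
    for nbrs in graph.values():
        if len(nbrs) < 2:
            scores.append(0.0)
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        scores.append(linked / (len(nbrs) * (len(nbrs) - 1) / 2))
    return sum(scores) / len(scores)

def add_shortcuts(graph, shortcuts):
    """Return a copy of the graph with extra long-range links added."""
    new = {node: set(nbrs) for node, nbrs in graph.items()}
    for a, b in shortcuts:
        new[a].add(b)
        new[b].add(a)
    return new

lattice = ring_lattice(20, 2)
small_world = add_shortcuts(lattice, [(0, 10), (5, 15)])
```

Comparing `average_path_length` and `average_clustering` for `lattice` and `small_world` shows the path length falling after only two links are added, while local clustering stays fairly high.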

Structural change, then, is seen as the result of the local-level making and breaking of connections, and it occurs as an unintended consequence of those actions. These kinds of change have been modelled in agent-based computational models that aim to simulate agents' decision making and so trace change over time. This work developed independently of the work of the physicists (see Snijders 2010), but it has rapidly been recognised as providing an essential element in the dynamic models proposed by the physicists. Tom Snijders's model represents individuals as rule followers who make or break their social relations according to particular decision rules. Individuals act "myopically", without awareness of the larger consequences of their actions (which are, generally, unknown and unpredictable to them). Individuals acting in this way produce gradual, linear changes in the overall structure of the network. When their actions reduce the number of redundant links beyond a certain point, however, the change can be radical and non-linear. At critical threshold points there is what Watts has called a "phase transition" that disrupts the capacity of the network to continue developing as before. The diffusion of innovations and the flow of capital, for example, may be completely disrupted by such a transition. When local-level actions increase the number of redundant links beyond an upper threshold point, on the other hand, the network becomes so highly connected that ideas and resources can spread through it at such a speed that all positional advantages are lost.

I have reviewed the history of social network analysis by tracing advances in relation to a number of specific methodological approaches. The first of these was graph theory and the associated techniques of sociometry. This provides an intuitive modelling of social networks and allows a number of fundamental and advanced measures of network organisation to be calculated. The second mathematical approach I considered was the algebraic use of sets and matrices to uncover the structure of positions and roles within a network. While this approach is perfectly compatible with graph theory, it highlights quite distinct sets of issues. Next I looked at spatial models that emulate geographical mapping techniques to produce spatial configurations of points. These techniques allow a move away from the arbitrary configurations of sociometric sociograms and towards more visually meaningful arrangements. Finally, I considered some new approaches to network dynamics that make it possible to build on the static models of graph theory, matrix algebra, and multidimensional scaling and to construct accounts of structural change and network development. In the chapter that follows, I will introduce the key concepts employed in these approaches, and in Chapter 4 I will look in more detail at some of the applications of these concepts in substantive studies.

Further reading

Scott, John, 2012 "Social Network Analysis." Third edition. Sage, London. Chapter 2 gives a fuller account of the history of social network analysis.

Prell, Christina, 2012 "Social Network Analysis: History, Theory and Methodology." Sage, London. Chapter 3 provides an alternative account of this same history.

Freeman, Linton C., 2004 "The Development of Social Network Analysis: A Study in the Sociology of Science." Empirical Press, Vancouver. The definitive and comprehensive history of social network analysis.

Bott, Elizabeth, 1957 "Family and Social Network." Tavistock Publications, London. A good example of a classic early study using network ideas.

Fischer, Claude S., 1982 "To Dwell Among Friends: Personal Networks in Town and City." University of Chicago Press, Chicago. A more advanced middle-period study of personal networks.

Bloomsbury Academic

Monday, December 9, 2013

An application for monitoring a company's internal social networks

ViewDo Labs wants to help you find the stars in your company's internal social networks

The ViewPoint Enterprise app tracks data across enterprise social networks

Eric Blattberg

ViewDo Labs is betting that large companies want to know more about what is happening on their internal social networks.

The startup has launched its ViewPoint Enterprise analytics application, which tracks user adoption, group activity, and influential users on Yammer and Microsoft SharePoint. Support for Jive, Chatter, Tibbr, and Huddle will be available very soon.

"Sometimes it's a stab in the dark as to who you feel the true influencer within an organization is, and there's no way to measure that influencer tab for any product without our help," Tim Yandel, ViewDo's regional vice president of sales, told VentureBeat.

The most visible people who talk the loudest are not always the most influential, Yandel stressed. To determine relative influence, ViewPoint considers shares, likes, and references, in addition to total messages and posting frequency.

ViewDo influencer

You can also set up custom alerts for specific keywords, presumably to make sure employees are not leaking sensitive information (or badmouthing coworkers).

"It's really the cockpit of a company's social network," Yandel said. "People see the value of building a collaboration network, but their main problem is the lack of insight they have into it and the lack of control they have over it."

ViewDo will address the second half of that problem (control) when it updates the product with additional governance capabilities that let you manage users, groups, and content. These features are scheduled to arrive early next year, according to Yandel.

ViewDo's main competitors may not be analytics companies but rather the enterprise social networks themselves, which could conceivably implement similar functionality directly in their platforms. We know of at least one company (which VentureBeat agreed not to name) that plans to do precisely that.

ViewDo Labs is based in Woburn, Massachusetts, and currently has 35 employees. It has not taken any funding apart from an undisclosed angel round from its board of advisors.

VentureBeat

Friday, December 6, 2013

Kivran-Swaine et al.: Effects of gender and tie strength on Twitter interactions

First Monday

We examine the connection between language, gender, and social relationships, as manifested through communication patterns in social media. Building on an analysis of 78,000 Twitter messages exchanged between 1,753 gender–coded couples, we quantitatively study how the gender composition of conversing users influences the linguistic style apparent in the messages. Using Twitter data, we also model and control for the strength of ties between conversing users. Our findings show that, in line with existing theories, women use more intensifier adverbs, pronouns, and emoticons, especially when communicating with other women. Our results extend the understanding of gender–driven language use in the semi–public settings of social media services, and suggest implications for theory and insights for sociolinguistics.

Contents

1. Introduction
2. Theoretical background and hypotheses
3. Study: Gender and language in Twitter interactions
4. Discussion
5. Conclusion

1. Introduction

Popular social media platforms, like Facebook and Twitter, are home to a significant amount of interpersonal interaction among their users. Many of these systems match the communication model of social awareness streams (SAS): one–to–many communication channels such as Twitter or Facebook’s “News Feed” (Naaman, et al., 2010). SAS expose various types of communication acts, such as sharing of information, creating new relationships, and managing existing ones. When analyzed in aggregate, SAS data can help us study human behavior in naturalistic settings and at scale.

Twitter, a widely used social network site, provides an exceptional opportunity to observe and analyze interpersonal communication patterns in their intended social surroundings. In Twitter, there are a number of communication conventions, such as replies, mentions, and re–tweets, which allow users of the platform to frame their messages with respect to their social connections. Replies, the communication convention we investigate in the current paper, facilitate directed conversations between individuals, where one user publicly “replies to” a message from another. The “reply” interactions on Twitter are acts of communication that (1) can be observed in their natural settings, not in an environment controlled for the purposes of research; and, (2) are semi–public in nature, bringing with them a potential “audience effect”, which may differentiate this communication convention from other computer–mediated communication (CMC) or non–mediated settings. In addition, data about users and their relationships, available from Twitter, offers the prospect of discovering connections between communication patterns, individual traits, and social relationships.

In this work, we look at how the gender of interacting users affects language use on Twitter. Gender is one construct that has been widely studied in relation to communication. A number of previous studies looked at the gender of the communicator as one of the main factors influencing content as well as the style of communication (Brownlow, et al., 2003; Eckert, 1996; Labov, 1990; Lakoff, 1975; Mulac, et al., 1988). In this work, our goal is to further understand the relationship between gender and communication style, looking at exchanges between dyads of different gender compositions. While inspecting communication style, we build on the concept of linguistic style, the ways in which individuals use language. Linguistic style reveals attributes of the individuals, as well as their relationships with others (Pennebaker, et al., 2002). We look at 1,753 dyadic relationships between users of Twitter, and 78,000 directed semi–public Twitter replies exchanged between the users in these dyads. We address the following research question: “How does the gender composition of interacting dyads relate to linguistic styles in online conversations?”

When looking at language used in dyadic communications, it is beneficial to control for the strength of the connection between the interacting users. The intimacy and the intensity of the relationship between communicating parties have been shown to influence communication style (Bergs, 2006). The strength of ties between individuals can significantly influence the linguistic style of conversations that take place between them. Thus, in this inquiry on understanding the relationship between gender and communication, we take tie strength into account.

Studying the association between gender, communication, and relationships can help us better understand communication theories, which currently are largely based on settings that are either non–mediated or tightly controlled. We use Twitter data to reveal how these theories apply to new communication environments within SAS, and at scale. Moreover, by examining gender and interpersonal interactions in their intended natural environments, we can better understand communication behavior in SAS settings.

1.1. Social interaction in Twitter

Twitter is a highly popular social media service, used for many purposes including personal communication, information sharing, business communication, and marketing. User activity in Twitter is primarily focused around “streams” of content. Twitter allows users to post short messages (tweets) up to 140 characters long. Users in Twitter are connected to others via asymmetric “follow” relationships (e.g., if Jeremy follows Devon, it does not imply that Devon is following Jeremy). A follow relationship on Twitter means that when a user logs into Twitter, she will be shown posts from those she follows, in reverse–chronological order. Unless users decide to make their accounts private, all messages in Twitter are publicly available to view. In this work, we report solely on publicly available data.

Twitter has a number of communication conventions such as mentions, replies, and re–tweets, allowing for different modes of interaction. For our study, we focus on the “reply” convention within Twitter. In Twitter, a user can send a reply message to another user, by initiating the message with an “@” immediately followed by the username of the correspondent. By default, reply messages are shown publicly on the sender’s Twitter profile page. Reply messages also are displayed in the timeline of all users following both the sender and the recipient of the message. For example, if Eddy follows both Jeremy and Devon, replies from Jeremy to Devon (or vice versa) will be displayed in Eddy’s timeline.
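As a rough illustration of the reply convention just described, the sketch below checks whether a message opens with "@" immediately followed by a username. The function name `reply_target` and the username pattern (letters, digits, underscores) are assumptions made for this example; the paper does not describe how replies were identified in its data.

```python
import re

# Assumption: a username is a run of letters, digits, or underscores.
# The paper does not specify a pattern; this one is illustrative only.
REPLY_RE = re.compile(r"^@([A-Za-z0-9_]+)")

def reply_target(text):
    """Return the username a message replies to, or None if it is not a reply.

    A message counts as a reply only when it *starts* with "@username";
    an "@username" later in the text is a mention, not a reply.
    """
    match = REPLY_RE.match(text)
    return match.group(1) if match else None
```

Under this convention, "@devon see you at noon" is a reply addressed to devon, while "lunch with @devon" is treated as a mention rather than a reply.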

2. Theoretical background and hypotheses

We base our work on theories and prior work in language and gender. Grounded in theory, we develop a set of hypotheses to be tested on a dataset of replies from Twitter.

2.1. Linguistic style

In CMC environments, language often is the primary if not the only form of communication, and people utilize language to form and maintain online relationships (Baym, 2002). Linguistic style can be defined as how individuals or groups of individuals use language in communication. Linguistic style is a way of language use, specific to a community or a sub–population, supporting the construction of identification for speakers of a language (Eckert, 1996). In other words, style is how a person uses language in relation to other people.

Linguistic style, especially in the new communication environments of SNS, has the potential to reveal not only intentions (Searle, 1969) but also identity and social group membership (Tagliomente, 2006). As individuals alter their linguistic style for their audiences to achieve an expected effect (Bell, 1997), an examination of linguistic style can uncover rather complex social dynamics between gender, language, and social relationships, which may be hard or impossible to untangle otherwise.

A common approach to studying linguistic style has been observing the use of individual words, or categories of words, including various parts of speech (e.g., adverbs, pronouns, prepositions). Even though words as units, by themselves, do not carry or reveal the meaning behind speech acts, studying frequency or type of use of categories of words divorces the utterance from the context, and helps analysis focus on style (Pennebaker, et al., 2002).

2.2. Language, linguistic style, and gender

A significant number of research studies have examined gender differences with respect to language use. Past studies have illuminated differences in linguistic style between men and women, revealing patterns that emerge in language when men and women interact with one another (Brownlow, et al., 2003; Cegala, 1989; Mulac, et al., 1988; Savicki, et al., 1997; Sillars, et al., 1997). In our study, we concentrate on analyzing use of three types of linguistic style markers (personal pronouns, intensifier adverbs, and emoticons), the most widely studied features in dyadic settings.

We next report on previous work examining the aforementioned markers that focused on the gender of the communicating person, and explain our motivation for studying them in SAS. We then report on the literature on the effect of dyadic gender composition on communication.

2.2.1. Personal pronouns

People use personal pronouns to reference others or themselves. A recipient of a message containing personal pronouns should be able to infer whom the pronouns refer to, in order to comprehend the full meaning of the message. Thus, frequent use of personal pronouns can imply that speakers make assumptions about their audiences’ ability to follow references (Pennebaker, et al., 2002), presupposing a level of intimacy and common ground. Previous studies have shown that women, when compared to men, use first person singular (FPS) pronouns significantly more often (Brownlow, et al., 2003; Mulac, 2006; Pennebaker, et al., 2002; Savicki, et al., 1997). However, there are conflicting findings regarding how the use of first person plural (FPP) pronouns differs between men and women (Brownlow, et al., 2003; Mulac, 2006). Personal pronoun use in relation to gender in SAS may indicate how speakers situate themselves within the social context of interactions (Mühlhäusler and Harré, 1990), and reflect on their social surroundings. For example, according to the cognitive grammar framework (Langacker, 1987), use of the first person singular pronoun “I” increases the prominence of the speaker radically, maximally objectifying the speaker’s self. On the other hand, use of first person plural pronouns such as “we” indicates a sense of “oneness” and “communality”, which generally is associated with socially constructed gender roles for women (Eagly and Wood, 1991). Gender differences in the use of pronouns can expose how men and women may feel and act differently about identities that they construct online, as well as relationships they maintain in these platforms.

2.2.2. Adverbs

Adverbs are parts of speech that carry very little meaning by themselves, but significantly change the style as well as the strength of the utterances that they appear in. Simply, adverbs modify other words in a sentence. Intensifier adverbs (intensifiers), such as “really”, are used to strengthen, diminish, or otherwise change the meaning of the word they precede. People are motivated to use intensifiers to capture the attention of their audiences (Peters, 1994). A higher level of adverb use increases the narrative characteristic of the communication act, making utterances more embellished, and possibly alluring for listeners. The use of adverbs is a linguistic style feature that has previously been shown to be used more by women (Lakoff, 1975; Mulac, 2006). In fact, the dominance of adverb use by women has been noted by scholars since the mid–eighteenth century (Partington, 1993). An early (and simplistic) interpretation from the beginning of the twentieth century for women’s increased adverb use was women’s inherent fondness for hyperbole in expressions (Stoffel, 1901). An examination of adverb use and how it changes between men and women in today’s technologically advanced communication platforms is valuable, as differences in adverb use can reveal how men and women decide to make their statements more captivating to their audiences in the performative spaces of SAS. For example, we may discover under what circumstances men and women choose to increase adverb use, thus coloring their expressions to influence their audience’s perceptions of specific messages.

2.2.3. Emoticons

Emoticons are series of symbols that represent non–verbal cues such as smiling, frowning, or winking, and are frequently used in CMC channels, where non–verbal cues often are impossible to put forth in text–based settings (Wolf, 2000). The manner in which emoticons are used in language (e.g., type, frequency) can also be considered a linguistic style choice. Earlier work that researched emoticon use in various CMC channels such as discussion boards, chat rooms, or blogs found significant gender differences in trends of emoticon use. Previously, women were found to use emoticons more than men (Witmer and Katzman, 1997; Wolf, 2000). However, in a study of blogs authored by teenagers, boys were shown to use emoticons significantly more than girls (Huffaker and Calvert, 2005). Thus, age and context can be confounding factors in the relationship between gender and emoticon use. Emoticons, like pronouns, suggest intimacy between conversing parties. Looking at emoticon use in SAS and whether or how men and women use emoticons differently can help us better understand circumstances in which men and women feel at ease to include non–verbal cues in their semi–public communications. Moreover, by studying emoticons in relation to gender, we can uncover how men and women enrich the meaning of their interactions differently by portraying traces of their sentiment and mood in rather casual ways.

2.2.4. Gender and language in dyadic interactions

While the studies described earlier mostly considered the gender of the person communicating, researchers have also examined the interaction between linguistic style and the gender composition of the conversing pair. For example, language accommodation (Giles, et al., 1991), known to occur on Twitter (Danescu–Niculescu–Mizil, et al., 2011), leads, in same–gender dyads, to stressing tendencies for gender–specific language differences as converging tendencies (Mulac, et al., 1998). Other effects of gender composition have been shown; for example women use more pronouns (a typical female language feature) when interacting with other women (Brownlow, et al., 2003). Moreover, it was observed that in mixed–gender groups of interaction, men’s level of emoticon use rose to the level of women (Wolf, 2000).

2.3. Language and social networks

A person’s language use depends on whom the person is talking with. The speech community, a group sharing common language specifics in which an individual participates, plays an important role in the linguistic style of the individual. Online social networks are contemporary virtual speech communities (Paolillo, 1999) and manifest fundamental attributes of speech communities in traditional settings, such as language accommodation (Danescu–Niculescu–Mizil, et al., 2011).

Both the audience design framework from sociolinguistics and accommodation theory from communication were used previously to explain changes in one’s language in relation to social environments where a speech act takes place. In the audience design framework, Bell (1997) first defines “style” as what a person does with language in relation to others, and then builds the framework around the main idea that speakers design their styles based on their audiences (Bell, 1997). Very similar to the audience design framework is communication accommodation theory (Giles, et al., 1991). This notion states that speakers adjust their speech to seek social attractiveness or to increase efficiency of communication.

The association between language and networks of social relationships has also been a subject of inquiry in CMC. Previous work found features in e–mail text to detect power relationships (Bramsen, et al., 2011), as well as roles in corporate settings (McCallum, et al., 2007). Speech accommodation was observed in small groups, where word count, pronoun, and tense use related to a group’s cohesiveness (Gonzales, et al., 2010). Most recently, it was shown that linguistic accommodation is observable in social media, namely interactions in Twitter (Danescu–Niculescu–Mizil, et al., 2011).

Overall, previous work indicates that people shape their language to their audiences, and this phenomenon takes place in SAS as well. Interactions in Twitter have well–defined audiences, properties of which may influence the linguistic style of interactions, beyond the gender composition of the participants. The size of an audience in Twitter interactions directly correlates with the number of contacts shared between the conversing parties. While a larger audience in traditional settings may bring with it qualms about disclosure, for Twitter replies, a large audience is indicative of a large number of shared contacts, therefore possibly a closer and more intimate relationship. Accordingly, in our analysis of gender and language, we account for the size of the audience of interactions, which may be a proxy for the strength of ties between interacting parties.

2.4. Hypotheses

Building on previous work, we develop the following hypotheses about gender composition and linguistic style in Twitter replies:

H1a: Women use personal pronouns (first person singular, and first person plural) more than men.

H1b: Women use personal pronouns more when interacting with women, than they do when interacting with men.

H2a: Women use intensifiers more than men.

H2b: Women use intensifiers more when interacting with women than they do when interacting with men.

H3a: Women use emoticons (positive and negative) more than men.

H3b: Women use emoticons more frequently when interacting with women than they do when interacting with men.

3. Study: Gender and language in Twitter interactions

We begin reporting our study by describing the dataset we used for our analysis. We then describe in detail the samples we selected from our dataset, and our motivations for doing so. We follow up by explaining how we calculated the variables to be used in our analyses. Finally we go through the analyses we have performed, to understand the relationship between gender composition of conversing individuals and the linguistic style assumed by them in Twitter interactions.

3.1. Dataset

We extracted a dataset of Twitter interactions, consisting of replies between Twitter dyads, as well as the genders of participating users, and the Twitter contact network (followers and followees) around the users in each dyad.

We began forming our dataset by identifying 715 Twitter users, whose data was available from a previous study (omitted for anonymity). This set, S, includes Twitter users that were randomly selected and manually identified as active, personal users of Twitter, not representing commercial entities or celebrities. The users followed, and were followed by, fewer than 5,000 users. For each user s∈S, the dataset included the tweets s posted over a period of four months, including replies by s to other users, as well as replies posted by other users that targeted s. Using this initial dataset of Twitter posts, we identified all dyads (s, f) in the data such that: 1) s replied to f at least twice, and 2) f replied to s at least once. Selecting users in this way ensures that a) the seed interacts with the follower in a non–trivial manner; and, b) both sides are engaged in the exchange and have a meaningful interaction.
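The dyad selection rule above can be sketched in a few lines. This is only an illustration under assumed inputs: `replies` is a hypothetical list of (sender, recipient) pairs and `seeds` is the seed set S; the function name `select_dyads` is not from the paper.

```python
from collections import Counter

def select_dyads(replies, seeds):
    """Return the set of dyads (s, f) that satisfy both selection criteria.

    replies: iterable of (sender, recipient) pairs, one per reply message.
    seeds:   the set S of seed users.
    A dyad (s, f) qualifies when s replied to f at least twice
    and f replied to s at least once.
    """
    counts = Counter(replies)  # (sender, recipient) -> number of replies
    dyads = set()
    for (sender, recipient), n in counts.items():
        # Counter returns 0 for pairs that never occurred, without inserting them.
        if sender in seeds and n >= 2 and counts[(recipient, sender)] >= 1:
            dyads.add((sender, recipient))
    return dyads
```

For instance, if seed s1 replied twice to user a and a replied once to s1, the pair (s1, a) is kept; a user who never answered s1, or whom s1 addressed only once, is dropped.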

We further filtered the dataset to only include dyads where we could obtain gender labels for both users. We used the Amazon Mechanical Turk (AMT) crowd sourcing platform to code the gender for all users in our initial dataset. For each user, we asked the online worker to categorize the profile as one of the following four categories: male, female, undetermined sex, or not a person. We asked the workers to click through to the user’s Twitter page and to consider the photo of the user, their name, screen name, and any other information they provide in their profile description text (e.g., “I’m a mother” in the profile text indicates the user is likely to identify as a woman). To assess the accuracy of the resultant AMT codes, one of the authors manually coded a sample of the dataset (2.5 percent), creating ground truth data. The coders agreed with our ground truth 89.6 percent of the time, and 93.4 percent if we hold out users that were labeled as “undetermined”, since these are more ambiguous and difficult, and were not used for our analysis. This rate corresponds to a Scott’s Pi reliability measure of 0.83, considered “excellent”.
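For readers unfamiliar with Scott's Pi, the sketch below computes it from two coders' labels using the standard definition: observed agreement corrected by the agreement expected from the pooled label proportions. The function name and the example labels in the usage note are invented for illustration and are not the study's data.

```python
from collections import Counter

def scotts_pi(labels_a, labels_b):
    """Scott's Pi for two coders who labeled the same items.

    pi = (P_o - P_e) / (1 - P_e), where P_o is the share of items on which
    the coders agree and P_e is the sum of squared pooled category proportions.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    pooled = Counter(labels_a) + Counter(labels_b)  # 2n labels in total
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)
```

With four items labeled m, m, f, f by one coder and m, m, f, m by the other, observed agreement is 0.75 and expected agreement is 34/64, giving a Pi of 7/15 (about 0.47).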

The resultant dataset of interacting dyads included 1,753 pairs of Twitter users where for each dyad (x,y) we have a set I_x,y of reply messages exchanged between the two users. We refer to the subset of replies in this dyad that were directed from x to y as I_x»y ⊆ I_x,y. The content dataset resulted in a total of 77,989 replies, an average of 44.5 interactions per dyad (i.e., mean size of I_x,y across all dyads; these sets show a skewed distribution, with median=20, SD=92.6). Some seeds were represented more than others in the dataset as they participated in more dyadic conversations (the mean number of conversations per seed x is 6.0; median=4, SD=6.99). For our analysis, we provide some control for the skewed distributions by looking at the subsamples of these exchanges at the level of exchanges and at the level of individual tweets, as described below.

For each dyad, we computed the set of common neighbors for the dyad’s users in the Twitter social network. We retrieved all users z such that z is either followed by or following one of the users in the dyad. We used that network to compute the common neighbors variable described below. To retrieve this network information, we used the Twitter social network snapshot that was collected at the same time as the content dataset we use here, available from Kwak, et al. (2010).

3.1.1. Tweet sample

The first sample we used from the dataset, the tweet sample, consists of individual tweets as units. We used the tweet sample to examine whether linguistic style markers existed in each tweet. In this sample, each tweet is assigned to one of the four categories of gender composition, according to the “direction” of the communication: Male to Male (shorthand: M»M), Male to Female (M»F), Female to Male (F»M), or Female to Female (F»F). In this sample, we included at most 10 messages from each person in a dyad, to minimize bias that may be created by users who participate in longer exchanges. In other words, if one side of a dyadic exchange had more than 10 messages (|I_x»y|>10) we selected 10 of these tweets at random to use for this sample. This sample was then used to look at the directed use of different linguistic markers at the dyadic level, e.g., whether a tweet directed from a man to a woman included any intensifier adverbs. The final tweet sample consists of 25,641 tweets from 1,753 dyads (4,248 F»F, 5,261 F»M, 5,482 M»F, and 10,650 M»M tweets). The average number of tweets per dyad was 14.6.
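The cap of 10 messages per direction could be applied along the following lines. This is a sketch, not the authors' procedure: the function name `cap_messages` and the fixed `seed` argument (added here only so the draw is reproducible) are assumptions.

```python
import random

def cap_messages(messages, limit=10, seed=0):
    """Return at most `limit` messages from one side of a dyadic exchange.

    If the sender wrote `limit` messages or fewer, all are kept;
    otherwise `limit` of them are drawn at random without replacement.
    """
    messages = list(messages)
    if len(messages) <= limit:
        return messages
    rng = random.Random(seed)  # seeded generator: an assumption for reproducibility
    return rng.sample(messages, limit)
```

Applying the function separately to I_x»y and I_y»x yields at most 20 tweets per dyad, which is in line with the reported average of 14.6 tweets per dyad.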

3.1.2. Exchange sample

To capture language use in more substantial exchanges between two individuals, we prepared a sample of “exchanges”. To ensure a significant body of interaction between users in each dyad, in the Exchange Sample, each unit is an interaction that included at least 10 messages between the users in the dyad (|I_x»y|≥10). This sample was then used to look at the magnitude of use of different linguistic markers at the dyadic level, e.g., the proportion of pronouns included in an exchange. We assigned each exchange to a gender composition category using a three–level categorical variable, as each exchange can be characterized as Male–Male (MM), Male–Female mixed (MF), or Female–Female (FF). This process resulted in an Exchange Sample consisting of 1,343 dyads (222 FF, 565 MF, and 556 MM) that exchanged an average of 56.2 messages (median=29, SD=102.9) each.

3.2. Computed properties

Below, we summarize how the variables capturing linguistic style and social network properties were computed.

3.2.1. Linguistic markers

For our analysis, we computed language use variables for both tweet and exchange samples, mostly by using the “Linguistic Inquiry and Word Count” (LIWC) dictionary, a widely used and studied language analysis system. We also derived descriptive network variables for each dyad in our dataset, described below.

We used the LIWC dictionary to generate tweet–level variables for each tweet in the tweet sample, and exchange–level variables for each exchange in the exchange sample. For the tweet sample, as tweets are too short for generating distinguishing and meaningful counts or proportions of words, we used the LIWC dictionary to code each tweet as 1 (includes at least one word in the linguistic category) or 0 (none). For the exchange sample, for each linguistic category, we calculated ratio_C, the ratio of the number of category words (identified by the LIWC dictionary) in each exchange (conversation) C, to the total number of words in the conversation.
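The two kinds of variables described here, a 0/1 indicator per tweet and a ratio per exchange, can be sketched as follows. The word list `FPS_PRONOUNS` is a small hypothetical stand-in, since the actual LIWC dictionary is not reproduced in the paper, and the simple tokenizer is likewise an assumption.

```python
import re

# Hypothetical stand-in for one LIWC category (first person singular pronouns).
# The real LIWC dictionary is far larger and is not reproduced here.
FPS_PRONOUNS = {"i", "me", "my", "mine", "myself"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def has_category_word(tweet, category):
    """Tweet-level variable: 1 if the tweet contains any category word, else 0."""
    return int(any(token in category for token in tokenize(tweet)))

def category_ratio(messages, category):
    """Exchange-level variable ratio_C: category words divided by all words."""
    tokens = [token for message in messages for token in tokenize(message)]
    if not tokens:
        return 0.0
    return sum(token in category for token in tokens) / len(tokens)
```

For an exchange made of the two messages "I like my bike" and "nice", two of the five words are in the category, so ratio_C is 0.4.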

In this study we focused on use of pronouns (First Person Singular and First Person Plural), intensifier adverbs, and emoticons as linguistic style markers. For pronouns and adverbs, we used the LIWC dictionaries to compute the values in tweet and exchange samples. We calculated variables capturing the use of emoticons by tokenizing tweets into unigrams, and using a regular expression that identified 228 distinct “faces” in our data.
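The emoticon step can be sketched as below. The regular expression used in the study is not given in the text, so the pattern here is a much simplified, hypothetical one that covers only a handful of common “faces”; it is meant to show the shape of the approach (split into unigrams, then match each token), not to reproduce the 228 faces identified in the data.

```python
import re

# Simplified, hypothetical pattern; the study's actual expression is not published here.
EMOTICON_RE = re.compile(
    r"[:;=][\-o*']?[)\](\[dDpP/\\|]"   # eyes, optional nose, mouth, e.g. :) ;-D =P
    r"|[)\](\[][\-o*']?[:;=]"          # mirrored form, e.g. (: or (-;
)


def find_emoticons(tweet):
    """Split a tweet into whitespace-separated unigrams and keep those that are faces."""
    return [tok for tok in tweet.split() if EMOTICON_RE.fullmatch(tok)]
```

Matching whole tokens rather than searching inside them keeps URLs and times such as “10:30” from being counted as faces, at the cost of missing emoticons typed without a preceding space.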

3.2.2. Structural network properties

While we are interested in conversations between dyads of different gender compositions, other variables may offer an alternative explanation for the content and style of conversation. Most prominently, the type and strength of connection between individuals in a dyad may play a role. The strength of ties (Granovetter, 1973) between people is known to affect linguistic style used in conversations. To account for the effect the strength of ties between users may have on interactions, we calculated the dyad’s number of common neighbors, a number that has been previously shown to be associated with tie strength in social media (Kivran–Swaine, et al., 2011).

The number of common neighbors is the number of Twitter connections shared by the members of the dyad. Formally, if the neighbors of node z are defined as N_z={w|w→z or w←z} then the number of common neighbors for a dyad (x,y) is |N_x∩N_y|. The number of common neighbors showed a log–normal distribution; we used the log–normalized number of common neighbors variable in all the tests reported below.
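The definition above translates directly into set operations. The following is a minimal sketch, assuming the follow graph is available as a list of directed (source, target) edges; the function names are illustrative, and the base–10 logarithm with a +1 offset (so that dyads with zero common neighbors remain defined) is an assumption, since the text does not spell out the exact transformation.

```python
import math


def neighbors(node, edges):
    """N_z = {w | w→z or w←z}: every user who follows, or is followed by, `node`."""
    result = set()
    for src, dst in edges:
        if src == node and dst != node:
            result.add(dst)
        elif dst == node and src != node:
            result.add(src)
    return result


def common_neighbors(x, y, edges):
    """|N_x ∩ N_y| for the dyad (x, y)."""
    return len(neighbors(x, edges) & neighbors(y, edges))


def log_common_neighbors(x, y, edges):
    """Log-transformed count; log10 and the +1 offset are assumptions of this sketch."""
    return math.log10(common_neighbors(x, y, edges) + 1)
```

For large graphs one would precompute each user's neighbor set once rather than rescanning the edge list for every dyad.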

3.3. Analysis

The goal of the analysis was to capture the differences between dyads in different gender compositions, in their levels of use in each linguistic category we examined. For each linguistic category we examine, we test the influence of gender composition on two dependent variables, one computed from the tweet sample, and one from the exchange sample, as described next.

As previously mentioned, the unit of analysis in the tweet sample is a single tweet, and the dependent variable is the existence (0–1) of a linguistic category word in the tweet. We begin our analysis by constructing a binary logistic regression model for each language category, with the existence of the category word in the tweet as a two–level response variable. Each model uses two independent variables (IV): (1) the gender composition of the dyad interacting; and, (2) the number of common neighbors shared by the dyad’s members (see Figure 1). The gender composition IV is a four–level categorical variable representing the possible gender compositions for the dyad responsible for that tweet (M»M, M»F, F»M, F»F). The common neighbors IV is continuous, and log–normalized. We thus verify whether the existence of category words in replies can be explained by the dyad’s gender composition, beyond the effect of the number of common neighbors shared by the dyad.

Figure 1: Regression model for the tweet sample.
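As a reading aid, the model described above can be written out as follows. The notation is ours and is only a sketch of a standard binary logistic regression with a four–level categorical predictor: y is 1 if the tweet contains a category word, g is the gender composition of the dyad, CN is the number of common neighbors, and F»F is the reference level (so it has no coefficient of its own).

```latex
\[
\operatorname{logit} \Pr(y = 1)
  = \beta_0
  + \beta_{MM}\,\mathbf{1}[g = \mathrm{M \to M}]
  + \beta_{MF}\,\mathbf{1}[g = \mathrm{M \to F}]
  + \beta_{FM}\,\mathbf{1}[g = \mathrm{F \to M}]
  + \beta_{CN}\,\log(\mathrm{CN})
\]
\[
\mathrm{OR}_k = e^{\beta_k}
\]
```

Under this formulation, an odds ratio below 1 for a composition means that the odds of the category word appearing are lower for that composition than for the F»F reference, holding the common neighbors term fixed.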

An additional Pearson’s chi–square test, using the gender variable and the linguistic variable, helps us inquire further about nuanced differences between gender compositions. For a categorical variable like gender composition, the logistic regression model only allows us to reason about the difference between all levels and a single reference level (we used F»F as the reference level in our model, as shown in Figure 1). To supplement our statistical analysis we used the chi–square test and tested the null hypotheses about differences between gender compositions.
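For readers unfamiliar with the test, the Pearson chi–square statistic can be computed from a contingency table as sketched below. The function is a generic textbook implementation, not the authors' code. Only the statistic and degrees of freedom are computed; obtaining a p–value would additionally require the chi–square distribution, which is omitted here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    table: list of rows, e.g. one row per gender composition and one column
    per level of the linguistic variable.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

A table with four gender compositions and a binary linguistic variable has (4−1)(2−1)=3 degrees of freedom, and one with three compositions and three use levels has (3−1)(3−1)=4.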

In the exchange sample, the unit of analysis is a dyad’s group of exchanges, and the dependent variable is based on the ratio_C value, capturing the proportion of words that belong to the linguistic category in exchanges. The distribution of the ratio score is not normal, log–normal, or standardized. Therefore, we turn the ratio value for each dyad into a use level {low, medium, high}. To do that, for each linguistic category, we calculate the ratio’s mean (M) and standard deviation (SD). Then, we label the use level of a dyad’s use of the linguistic category in a conversation as “low” if ratio_C<M–SD; “medium” if M–SD<ratio_C<M+SD; and, “high” if ratio_C>M+SD.
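The binning rule can be sketched as follows. Two details are assumptions of this example rather than statements from the text: the sample standard deviation is used (the text does not say whether sample or population SD was computed), and values falling exactly on a cut–off are labeled “medium”, since the inequalities leave that case unspecified.

```python
from statistics import mean, stdev


def use_levels(ratios):
    """Label each ratio_C value as 'low', 'medium', or 'high' relative to M ± SD."""
    m = mean(ratios)
    sd = stdev(ratios)  # sample SD; an assumption of this sketch
    low_cut, high_cut = m - sd, m + sd
    levels = []
    for r in ratios:
        if r < low_cut:
            levels.append("low")
        elif r > high_cut:
            levels.append("high")
        else:
            levels.append("medium")  # includes values exactly on a cut-off
    return levels
```

The cut–offs are computed separately for each linguistic category, so the function would be called once per category on that category's ratio values.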

Similar to the analysis of the tweet sample, we start our analysis by constructing a multinomial logistic regression model for each category, to explain use level of the category words in the conversation (a three–level response variable). Each model uses two independent variables (IV): (1) the dyad’s gender composition; and, (2) the number of common neighbors shared by the dyad’s members. The gender composition IV for the exchange sample is a three–level categorical variable representing the possible gender compositions for the dyad (MM, MF, FF). The common neighbors IV is the same variable we used in the tweet sample. For the purpose of capturing group differences between gender composition categories, a Pearson chi–square test was performed to test the null hypothesis (i.e., that no differences between gender compositions will be observed). The test looks at the relationship between two variables: the low–medium–high language use levels, and gender composition.

Since our hypotheses involved multiple tests using the same variables (i.e., the gender composition), we controlled for the higher likelihood of false positive results by using the Bonferroni correction, which asks for a significance level of α/n when conducting n tests at once. Thus, for the chi–square tests for both samples, when reporting the results, we point out those that are significant within the Bonferroni correction (p<.01, given the number of tests we perform). As regression models as a whole do not have associated significance values, we cannot perform this sort of correction for our regression models.
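The correction itself is a one–line computation. In the sketch below, the defaults are illustrative assumptions: a family–wise α of .05 divided across five tests gives the .01 threshold mentioned above, but the text does not state the exact α or number of tests used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level alpha / n under the Bonferroni correction."""
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Flag which p-values fall below the corrected threshold for this family of tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]
```

For instance, with five tests at α=.05, a result with p=.02 would be significant on its own but not after the correction.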

3.4. Results

Before looking at the relationship between gender composition of dyads and language in communication, we investigated the relationship between the number of common neighbors and gender composition. The results of the ANOVA test suggest, as expected, that there are significant differences between gender groups in relation to number of common neighbors (for tweet sample, F(3,23818)=87.66, p<.001). Post–hoc analysis in both cases revealed that MM dyads had significantly more common neighbors than FF dyads, who had significantly more common neighbors than mixed–gender dyads. These homophilous tendencies suggest that stronger ties exist between individuals of the same gender, and further demonstrate the need to control for the number of common neighbors in our analyses.

3.4.1. Linguistic style

We now present the results for the use of each linguistic style feature by dyads of different gender compositions. For the regression models, we report odds ratios (OR) and only report the significant contributing factors (with p<.05).

3.4.1.1. First person singular pronouns

The use of first person singular pronouns (FPS) is significantly affected by gender composition, even when controlling for the effect of the common neighbors variable, with FPS use more likely between all–female dyads and less likely for all–male, supporting H1a and H1b. For the tweet sample, the binary logistic regression model predicting existence of FPS in replies showed that an increase in number of common neighbors makes the existence of FPS in replies slightly less likely (OR=0.94, p<.001, meaning that for each ten–fold increase in number of common neighbors, the odds of FPS in replies are multiplied by 0.94; the further OR is from 1, the stronger the effect of the variable). But even beyond the common neighbors effect, the sex compositions M»F (OR=0.81, p<.001), F»M (OR=0.86, p<.001), and M»M (OR=0.78, p<.001) make the existence of FPS in tweets less likely compared to the F»F reference category. In this case, OR reflects the ratio of the odds of FPS in an M»F tweet (for example) to the odds of FPS in the reference F»F tweet. In the exchange sample, the multinomial logistic regression results showed that MM sex composition (OR=0.96, p<.005) makes high–level FPS use slightly less likely than low–level.

The chi–square tests reveal more nuanced information about group differences in FPS use (here, without controlling for the number of common neighbors). The results for the tweet sample indicated that a significantly higher proportion of replies by F»F dyads (52.4 percent) and a lower proportion of replies by M»M dyads (46.7 percent) contained FPS compared to M»F and F»M replies (48.4 percent between them). The exchange sample also showed similar group differences with respect to gender composition of dyads: a significantly higher proportion of FF dyads (20.3 percent) and a significantly lower proportion of MM dyads (9.7 percent) exhibited high levels of FPS use compared to MF dyads (13.1 percent). The test results were significant for both the tweet sample, χ² (3, N=25641)=41.02, p<.001, as well as the exchange sample, χ² (4, N=1343)=16.55, p<.005.

3.4.1.2. First person plural pronouns

There are significant differences between gender groups with regards to their use of first person plural pronouns (FPP), even when controlling for the effect of the common neighbors variable: FPP use is more likely between all–female dyads, supporting H1a and H1b.

The binary logistic model on the tweet sample showed that the number of common neighbors had a significant positive effect on FPP existence (OR=1.26, p<.001) (i.e., the more common neighbors a dyad has, the more likely they are to use words like “we” in a tweet). The model also revealed that beyond the effect of common neighbors, the M»F (OR=0.79, p<.005) and F»M (OR=0.84, p<.05) gender compositions make the existence of FPP in replies less likely compared to the F»F reference. However, the regression model on the exchange sample did not expose significant results.

The chi–square test results from the tweet sample showed that a significantly higher proportion of replies from F»F dyads (7.8 percent) contained at least one FPP, compared to the other groups (6.6 percent between them). The analysis of exchanges showed a similar, significant, yet less conclusive outcome, with a higher proportion of FF dyads (9.9 percent) using high levels of FPP compared to MF (7.3 percent) and MM (8.8 percent) dyads. However, for both samples, differences were not significant according to the Bonferroni correction requirements, with the tweet sample results at χ² (3, N=25641)=11.01, p=.01, and the conversation sample at χ² (4, N=1343)=9.88, p<.05.

3.4.1.3. Intensifiers

The use of intensifier adverbs is significantly affected by gender composition, even after controlling for the number of common neighbors. Intensifiers are more likely to be used between all female dyads, and less likely to be used when a woman is interacting with a man, lending support to H2b and partial support to H2a.

In the binary regression model on the tweet sample, the contribution of the common neighbors variable to the overall model was not significant. Nevertheless, the existence of intensifiers was more likely in tweets by F»F dyads than F»M (OR=0.79, p<.001), M»F (OR=0.85, p<.001), or M»M (OR=0.84, p<.001) dyads. On the other hand, no significant effects were seen in the regression model for the exchange sample.

Chi–square test results from the tweet sample show that women used intensifiers more frequently when interacting with women, but less frequently when interacting with men. A significantly higher proportion of replies by F»F dyads (42.6 percent) and a significantly lower proportion of replies by F»M dyads (37 percent) contained at least one intensifier, compared to the replies by M»F and M»M dyads (38.5 percent between them). Similarly, in the exchange sample, a significantly higher proportion of conversations in FF dyads (90.5 percent) exhibited high or medium level of intensifier use, when compared to the conversations in MF (83 percent) and MM (83.5 percent) dyads. The test results for the tweet sample were significant, χ² (3, N=25641)=33.53, p<.001. The exchange sample results were not significant according to the Bonferroni correction requirements, χ² (4, N=1343)=11.96, p<.05.

3.4.1.4. Emoticons

The use of positive emoticons (e.g., “(:”) in Twitter interactions was influenced by the gender composition of the dyad, controlling for the number of common neighbors. Overall, women exhibited more frequent and higher levels of positive emoticon use when compared to men, and men used emoticons more when talking to women than talking to men, supporting hypotheses H3a and H3b. We must note that neither of our samples included sufficient numbers of negative emoticons to be used in analysis and statistical tests.

The regression model on the tweet sample illustrated that even when accounting for the number of common neighbors, M»F (OR=0.81, p<.005) and especially M»M (OR=0.57, p<.001) dyads make the existence of positive emoticons in replies less likely. The number of common neighbors did not contribute significantly to the model in the tweet sample, but did show an effect in the exchange sample, with higher values of common neighbors making the high level use of positive emoticons more likely (OR=2.29, p<.05).

The chi–square test results on the tweet sample analysis suggested that a significantly higher proportion of replies by F»F (11 percent) and F»M (11 percent) dyads, and a significantly lower proportion of replies by M»M dyads (7 percent) included a positive emoticon compared to the proportion of M»F dyads (9 percent). Similarly, the exchange sample analysis shows, for example, that a significantly higher proportion of MM interactions (60 percent) exhibited low levels of positive emoticon use, when compared to other groups (47.1 percent). Group differences between gender compositions in their use of positive emoticons were significant in both the tweet sample, χ² (3, N=25641)=126.14, p<.001, and the exchange sample, χ² (4, N=1343)=23.76, p<.001.

4. Discussion

Overall, our findings highlight key gender differences in linguistic style, even after controlling for tie strength between conversing users. Gender differences revealed in our analysis have mostly confirmed observations in traditional settings; women use higher levels of FPP, FPS, intensifiers, and emoticons in their speech, with levels escalating even more when they converse with other women, hinting at accommodation.

Linguistic style differences may be exhibited not only through shifts in levels of use, but also through how certain linguistic features are used. For instance, use of different kinds of intensifiers (e.g., “totally” vs. “absolutely”) may signify different linguistic styles. Therefore, following our initial inquiry, we performed secondary analyses at the token level, to find words in each language category that are the best predictors of each gender composition. Since our language categories are based on dictionaries, where each category is defined by a set of words, e.g., intensifiers, we can measure how the use of individual words from each category differs between dyads of different gender compositions. To this end, we used the tweet sample to perform an analysis of word use for each category.

For our token–level analysis, we looked at the degree of “predictiveness” of each word in each category with regard to the gender composition of a message. Specifically, each word in the vocabulary that was used by a number of users above a predefined threshold (100 in our analysis) was scored with regard to each gender composition according to the following function: pred(t,c)=f(t|c)/f(t), where f(t|c) is the fraction of tweets with the gender composition c that contain the token t, and f(t) is the fraction of tweets containing the token t in our dataset as a whole. We can then examine the high–scoring tokens for each gender composition, i.e., words which are much more likely to occur in that gender composition compared to others.
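The scoring function can be sketched as follows. The input format (a list of user, composition, and token–list triples) and the function name are assumptions made for illustration, and the threshold is applied as “at least min_users distinct users”, which may differ slightly from the exact cut–off rule used in the study.

```python
from collections import Counter, defaultdict


def predictiveness(tweets, min_users=100):
    """Score pred(t, c) = f(t|c) / f(t) for every token t and gender composition c.

    tweets: iterable of (user, composition, tokens) triples (hypothetical format).
    Returns a dict mapping (token, composition) to its score.
    """
    tweets = list(tweets)
    users_per_token = defaultdict(set)
    tweets_with_token = Counter()
    tweets_with_token_in_comp = Counter()
    tweets_in_comp = Counter()
    for user, comp, tokens in tweets:
        tweets_in_comp[comp] += 1
        for tok in set(tokens):  # count each token at most once per tweet
            users_per_token[tok].add(user)
            tweets_with_token[tok] += 1
            tweets_with_token_in_comp[(tok, comp)] += 1
    total = len(tweets)
    scores = {}
    for (tok, comp), n in tweets_with_token_in_comp.items():
        if len(users_per_token[tok]) < min_users:
            continue  # token not used by enough distinct users
        f_t_given_c = n / tweets_in_comp[comp]
        f_t = tweets_with_token[tok] / total
        scores[(tok, comp)] = f_t_given_c / f_t
    return scores
```

Under this definition a score of 1 means the token is exactly as frequent in that composition as in the dataset overall, and a score of 1.18 means it appears in 18 percent more of that composition's tweets than in the dataset as a whole.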

Next, we report insights we gained from the results in each category, along with further findings at the token level, to inform future hypotheses and directions of research.

4.1. Presupposed familiarity and talking about “us”

In our models, the number of common neighbors partially explained FPP and FPS use (i.e., the more people a dyad knows in common, the more likely they are to use the word “we”, and the less likely to use “me”). But even after accounting for the effect of the number of common neighbors, the use of personal pronouns was found to be associated with gender composition, with female–to–female exchanges much more likely to contain both FPS (“I”) and FPP (“we”) words, supporting hypothesis H1b and providing partial support to H1a.

We performed token–level analysis on FPS and FPP use across gender compositions. Terms that stand out in their predictive power for each gender composition can be seen in Table 1. For example, the table indicates that F»F tweets use the FPS “my” 18 percent more frequently than it is used in the dataset as a whole, whereas tweets containing “mine” are 25 percent more frequent among M»F messages than they are overall. Token–level analysis for FPP use showed that the FPP “we” can be a significant marker for distinguishing replies by females (F»M and F»F messages; see Table 1).

These results suggest that, in their Twitter interactions, women tend to reference both themselves and others more than men do. Moreover, the finding that the FPP “we” is a strong predictor of an utterance by a woman may imply that women in fact make communal references more frequently in their speech. In other words, it is likely that in their interactions, women refer to their partner in communication in unity with themselves, or they speak on behalf of others in the social context of a given communication. The increased FPS use by women also reveals how women might tend to make themselves the primary subject of their speech significantly more often than men. In general, the female linguistic style that was manifested in our study is more socially aware than the linguistic style exhibited by men. This may be due to the fact that even when conversing with those they feel close to, in Twitter, women’s interactions are more about people and social happenings, whereas men prefer a style that is less personal. While our findings are not conclusive, and do not explain why these differences exist, future studies on social and behavioral antecedents of this particular linguistic difference could be valuable.

4.2. Embellished language and cues of discourse

The use of intensifiers was indeed shown to be more common with and between women, supporting the hypotheses H2a and H2b. Consistent with previous work, increased use of intensifiers is a marker of “female–style language”. This effect is heightened even more when the recipient of the message, as well as the sender, is a woman, suggesting communication accommodation.

It is possible that in their social media interactions, especially when the interactions are with a familiar audience, women, more so than men, aim to make their messages more captivating, influencing, and interesting. Intensifier adverbs can also be perceived as powerful tools to economically add detail and color to utterances in Twitter, where the length of utterances is strictly limited.

When we investigated, through token–level analysis, whether intensifiers were used differently between men and women, we discovered that intensifiers might in fact be strong linguistic features for further investigation of predicting gender composition of conversations. We found that for each gender composition group, there was at least one distinct intensifier that distinguished the group from the others (for example, M»M dyads used “actually” more often than other groups; see Table 1).

While intensifiers alone do not bear significant meanings, they can set up the tone of the message that they accompany, and clarify the intention of the speaker that uses them. A look at the most distinctive intensifiers for gender compositions exposed a potential trend: while the adverb “too” was the strongest predictor of an interaction by a woman directed to another woman, “actually” was the strongest predictor of an interaction from a man directed to a man. This finding puts forward the possibility that in their interactions, especially with other women, women aim to emphasize their intentions of compliance and social harmony (e.g., “I like it too”, “You can do it too”). However, in man–to–man messages, more so than in other gender compositions, the tone is more argumentative, as the distinctive use of “actually”, an adverb used for clarification or correction purposes, exposes.

4.3. Men (still) don’t put on happy faces :)

The analysis of use of emoticons followed the same trend, supporting H3a; women used significantly higher levels of emoticons than men. As hypothesized (H3b), men used emoticons more frequently when interacting with women. We do not include a token–level analysis for individual emoticons, as the most predictive emoticons did not appear frequently enough in the dataset for us to treat them as sufficient indicators.

We noted that the use of emoticons exhibited a drastic drop in conversations between men. Almost two–thirds of the MM interactions had a low level of emoticon use. As emoticons essentially are symbolic representations of non–verbal cues, we look at literature in sex roles and communication to explain the trend that we observed. Stereotypically, women have been believed to be more emotionally expressive, verbally and non–verbally, than men. These perceptions have also been empirically observed in previous studies (Briton and Hall, 1995). In most Western cultures, it is the expected norm for men to suppress any emotional expression. Brody and Hall conclude that this training of suppression results in men perceiving non–verbal communication as irrelevant and unimportant, and accordingly giving less emphasis to non–verbally appending their communicative acts (Brody and Hall, 1993). Our results suggest that, even in CMC channels, where individuals supposedly are liberated in selecting their communication styles, they conform to established gender norms, knowingly or unknowingly.

4.4. “Love” versus “dude”

Finally, for each of the language categories we could point out distinctive tokens (words) used by different gender compositions, even when the use of that language category overall was not different between gender compositions. Indeed, the special characteristics of Twitter and our dataset motivated us to explore other stylistic categories, which may be associated with gender. To this end, we further examined the most predictive words for each gender class out of all the terms that appear in the data. The predictiveness of terms was measured similarly to the token–level analysis described earlier in the Analysis section. Here, we again required a support threshold of 100 occurrences per word (we only include terms that occur at least 100 times in the conversations of one of the gender compositions).

The results are shown in Table 2, with the top 10 most predictive words for each gender composition, along with their predictive value. Among the words in the table, we can see several which belong to the categories we examined (e.g., positive emotion words such as “good” and “love”, and intensifiers such as “so” and “too”). We also see several other categories of potential interest, such as words used to address the recipient (“u”, “dude”, “man”), third person pronouns (“her”, “him”, “she”), and question words (“when”, “how”, “will” and “what’s”). This last category is especially of interest, since it pertains to the way that users express interest and involvement in their conversational partner’s feelings and emotional status (“what’s up?”, “how are you?”). These features can suggest additional differences in style of interaction between gender compositions, within and outside Twitter.

5. Conclusion

The Internet (and social media) has often been lauded as a liberative space in which individuals can express themselves in any way they choose. However, our findings show that large majorities of people continue to engage in familiar patterns — namely, gendered communication. We show that gender communication on directed semi–public responses on Twitter confirms known language and expression tendencies, and largely follows what is known from other settings. Our results are reminiscent of Nakamura’s findings (2002) about race online, particularly that structured issues that affect our non–digital lives follow our digital lives as well. However, we are cautious not to reduce the findings of this study to a simple conclusion of “Men talk like this, and women talk like that.” It is possibly the case that the majority presence of gender normative users in Twitter is drowning out others. As such, particularly in an era of machine learning, classifiers, and big data, we believe that future research should focus on ways of detecting more subtle variations of gender performance.

Our study exhibits a number of key limitations. Focusing our research on Twitter, we acknowledge that there is a significant bias in terms of the people using this service, and further, participating in publicly directed correspondence in it. Indeed, there is opportunity to extend and further verify our results. Are there additional variables that can explain language variation? A survey method, or further coding of profiles or relationships for additional characteristics (e.g., geographic location) can help refine sociolinguistic elements that are in play.

Social media services can be an exciting laboratory for studying human language, where language and its variation can be studied in an environment where it naturally occurs. Social media thus provides a significant opportunity for research, to extend and develop an understanding of language use in social and cultural groups, and relate language style and variation to other forms of social processes like relationship formation, status, emotional well–being, and more.

About the authors

Funda Kivran–Swaine is a Ph.D. candidate in the School of Communication & Information at Rutgers University.
Web: http://www.fundakivranswaine.com
E–mail: funda [at] rutgers [dot] edu

Samuel Brody is a software engineer at Google.
E–mail: sdbrody [at] gmail [dot] com

Mor Naaman is an Associate Professor at the Jacobs Technion–Cornell Innovation Institute at Cornell NYC Tech.
E–mail: mor [dot] naaman [at] cornell [dot] edu

Acknowledgements

All authors were affiliated with Rutgers University at the time of the research.

References

Nancy Baym, 2002. “Interpersonal life online,” In: Leah A. Lievrouw and Sonia Livingstone (editors). The handbook of new media: Social shaping and consequences of ICTs. Thousand Oaks, Calif.: Sage, pp. 35–55.

Alexander Bergs, 2006. “Analyzing online communication from a social network point of view: Questions, problems, perspectives,” Language@Internet, volume 3, at http://www.languageatinternet.org/articles/2006/371, accessed 26 August 2013.

Allan Bell, 1997. “Language style as audience design,” In: Nikolas Coupland and Adam Jaworski (editors). Sociolinguistics: A reader. New York: St. Martin’s Press, pp. 240–250.

Philip Bramsen, Martha Escobar–Molano, Ami Patel, and Rafael Alonso, 2011. “Extracting social power relationships from natural language,” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 773–782, and at http://www.aclweb.org/anthology-new/P/P11/P11-1078.pdf, accessed 26 August 2013.

Nancy J. Briton and Judith A. Hall, 1995. “Beliefs about female and male nonverbal communication,” Sex Roles, volume 32, numbers 1–2, pp. 79–90.
http://dx.doi.org/10.1007/BF01544758, accessed 26 August 2013.

Leslie R. Brody and Judith A. Hall, 1993. “Gender and emotion,” In: Michael Lewis and Jeannette M. Haviland (editors). Handbook of emotions. New York: Guilford Press, pp. 447–460.

Sheila Brownlow, Julia A. Rosamond, and Jennifer A. Parker, 2003. “Gender–linked linguistic behavior in television interviews,” Sex Roles, volume 49, numbers 3–4, pp. 121–132.
http://dx.doi.org/10.1023/A:1024404812972, accessed 26 August 2013.

Donald J. Cegala, 1989. “A study of selected linguistic components of involvement in interaction,” Western Journal of Speech Communication, volume 53, number 3, pp. 311–326.
http://dx.doi.org/10.1080/10570318909374309, accessed 26 August 2013.

Cristian Danescu–Niculescu–Mizil, Michael Gamon, and Susan Dumais, 2011. “Mark my words: Linguistic accommodation in social media,” WWW ’11: Proceedings of the 20th International Conference on World Wide Web, pp. 745–754.
http://dx.doi.org/10.1145/1963405.1963509, accessed 26 August 2013.

Penelope Eckert, 1996. “Vowels and nail polish: The emergence of linguistic style in the preadolescent heterosexual marketplace,” In: Natasha Warner, Jocelyn Ahlers, Leela Bilmes, Monica Oliver, Suzanne Wertheim and Melinda Chen (editors). Gender and belief systems. Berkeley, Calif.: Berkeley Woman and Language Group, pp. 183–190; version at http://www.stanford.edu/~eckert/PDF/nailpolish.pdf, accessed 26 August 2013.

Alice H. Eagly and Wendy Wood, 1991. “Explaining sex differences in social behavior: A meta–analytic perspective,” Personality and Social Psychology Bulletin, volume 17, number 3, pp. 306–315.
http://dx.doi.org/10.1177/0146167291173011, accessed 26 August 2013.

Howard Giles, Justine Coupland, and Nikolas Coupland (editors), 1991. Contexts of accommodation: Developments in applied sociolinguistics. Cambridge: Cambridge University Press.

Amy L. Gonzales, Jeffrey T. Hancock, and James W. Pennebaker, 2010. “Language style matching as a predictor of social dynamics in small groups,” Communication Research, volume 37, number 1, pp. 3–19.
http://dx.doi.org/10.1177/0093650209351468, accessed 26 August 2013.

Mark S. Granovetter, 1973. “The strength of weak ties,” American Journal of Sociology, volume 78, number 6, pp. 1,360–1,380.

David A. Huffaker and Sandra L. Calvert, 2005. “Gender, identity, and language use in teenage blogs,” Journal of Computer–Mediated Communication, volume 10, number 2, at http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00238.x/full, accessed 26 August 2013.
http://dx.doi.org/10.1111/j.1083-6101.2005.tb00238.x, accessed 26 August 2013.

Funda Kivran–Swaine, Priya Govindan, and Mor Naaman, 2011. “The impact of network structure on breaking ties in online social networks: Unfollowing on Twitter,” CHI ’11: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1,101–1,104.
http://dx.doi.org/10.1145/1978942.1979105, accessed 26 August 2013.

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon, 2010. “What is Twitter, a social network or a news media?” WWW ’10: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600.
http://dx.doi.org/10.1145/1772690.1772751, accessed 26 August 2013.

William Labov, 1990. “The intersection of sex and social class in the course of linguistic change,” Language Variation and Change, volume 2, number 2, pp. 205–254.
http://dx.doi.org/10.1017/S0954394500000338, accessed 26 August 2013.

Robin T. Lakoff, 1975. Language and woman’s place. New York: Harper & Row.

Ronald W. Langacker, 1987. Foundations of cognitive grammar. Volume 1: Theoretical prerequisites. Stanford, Calif.: Stanford University Press.

LIWC, Inc., 2007. “The LIWC2007 application,” at http://www.liwc.net/liwcdescription.php, accessed 18 March 2013.

Andrew McCallum, Xuerui Wang, and Andrés Corrada–Emmanuel, 2007. “Topic and role discovery in social networks with experiments on Enron and academic e–mail,” Journal of Artificial Intelligence Research, volume 30, number 1, pp. 249–272.
http://dx.doi.org/10.1613/jair.2229, accessed 26 August 2013.

Peter Mühlhäusler and Rom Harré, 1990. Pronouns and people: The linguistic construction of social and personal identity. Oxford: Basil Blackwell.

Anthony Mulac, 2006. “The gender–linked language effect: Do language differences really make a difference?” In: Kathryn Dindia and Daniel J. Canary (editors). Sex differences and similarities in communication. Second edition. Mahwah, N.J.: Lawrence Erlbaum Associates, pp. 211–231.

Anthony Mulac, John M. Wiemann, Sally J. Wiedemann, and Toni W. Gibson, 1988. “Male/female language differences and effects in same–sex and mixed–sex dyads: The gender–linked language effect,” Communication Monographs, volume 55, number 4, pp. 315–335.
http://dx.doi.org/10.1080/03637758809376175, accessed 26 August 2013.

Mor Naaman, Jeffrey Boase, and Chih–Hui Lai, 2010. “Is it really about me? Message content in social awareness streams,” CSCW ’10: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pp. 189–192.
http://dx.doi.org/10.1145/1718918.1718953, accessed 26 August 2013.

Lisa Nakamura, 2002. Cybertypes: Race, ethnicity, and identity on the Internet. New York: Routledge.

John C. Paolillo, 1999. “The virtual speech community: Social network and language variation in IRC,” Journal of Computer–Mediated Communication, volume 4, number 4, at http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.1999.tb00109.x/full, accessed 26 August 2013.
http://dx.doi.org/10.1111/j.1083-6101.1999.tb00109.x, accessed 26 August 2013.

Victor Savicki, Merle Kelley, and Erica Oesterreich, 1997. “Effects of instructions on computer–mediated communication in single or mixed–gender small task groups,” Computers in Human Behavior, volume 14, number 1, pp. 163–180.
http://dx.doi.org/10.1016/S0747-5632(97)00038-1, accessed 26 August 2013.

John R. Searle, 1969. Speech acts: An essay in the philosophy of language. Cambridge: Cambridge University Press.

Alan Sillars, Wesley Shellen, Anne McIntosh, and Maryann Pomegranate, 1997. “Relational characteristics of language: Elaboration and differentiation in marital conversations,” Western Journal of Communication, volume 61, number 4, pp. 403–422.
http://dx.doi.org/10.1080/10570319709374587, accessed 26 August 2013.

Cornelis Stoffel, 1901. Intensives and down–toners: A study in English adverbs. Heidelberg: C. Winter’s Universitätsbuchhandlung.

Diane F. Witmer and Sandra Lee Katzman, 1997. “On–line smiles: Does gender make a difference in the use of graphic accents?” Journal of Computer–Mediated Communication, volume 2, number 4, at http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.1997.tb00192.x/full, accessed 26 August 2013.
http://dx.doi.org/10.1111/j.1083-6101.1997.tb00192.x, accessed 26 August 2013.

Alecia Wolf, 2000. “Emotional expression online: Gender differences in emoticon use,” CyberPsychology & Behavior, volume 3, number 5, pp. 827–833.
http://dx.doi.org/10.1089/10949310050191809, accessed 26 August 2013.

Editorial history

Received 22 March 2013; accepted 21 August 2013.