Thursday, October 25, 2012

Priming, assimilation bias, and social proof on Twitter


Priming, Assimilation Bias,
Social Proof in Social Media


The social media world is abuzz about the upcoming US election. Who has the upper hand in the debates? Who has the better budget plan? Who has the heart of middle-class voters? How are social media users trying to win support for their favorite candidates? To get at some of these questions, it is useful to draw on social theory and psychological research that helps explain these phenomena.
First, there is the research on impression formation. Psychologists have long known that impression formation is deeply connected to priming effects, although these effects remain the subject of considerable debate. Priming occurs when exposure to one stimulus influences the response to a later stimulus, whether through perceptual, semantic, or conceptual repetition. For example, repeatedly seeing the word "economy" associated with a candidate helps lead voters to think that candidate cares more about the economy. Despite the recent debate about whether priming effects can be replicated reliably, most psychologists agree that the effect is real in many contexts. A good overview of priming and impression formation is the 2004 meta-analysis by DeCoster and Claypool [1]. So yes, some people are posting exactly the same debate points over and over on many social media sites, but what they're hoping for is a priming effect when voters go to the polls.
Why is social media so important in recent elections? Well, that's where many people now spend their spare time, and you want to go where the attention lies. A recent report by Lee Rainie et al. at Pew Internet Research shows that 60% of American adults use social networking sites, and that 66% of those users have engaged in political activities on social media.
Second, once the impression forms, there is clearly a lot of Assimilation Bias going on in social media. Also called confirmation bias, this is the tendency of people to favor information that confirms their beliefs or hypotheses. We would all like to believe that voters are rational actors who weigh the evidence presented by the candidates equally and then make an informed decision, but of course that's not the case. What's worse, Assimilation Bias contributes directly to Attitude Polarization: people hold on to their beliefs more strongly by searching for and interpreting evidence selectively.
So how does this play out in social media? Adamic and Glance's well-known paper [2] shows this effect quite directly with a visualization: bloggers link mostly within their own side of the political spectrum, and few bridge the two spheres. Worse, some recent research posted on SSRN also suggests that polarization spreads to neighboring issues [3].
Third, once attitude polarization sets in, it appears that Social Proof adds fuel to the fire. Social Proof is social psychology's fancy term for herd behavior: the tendency to assume that the actions of others reflect the correct thing to do. (Network analysts sometimes refer to this as Preferential Attachment when they're talking about tie formation [4].) By this logic, a voter living in a blue neighborhood is more likely to vote blue, and vice versa. It's conformity, pure and simple.
How does this play out in social media? There has been considerable debate about whether it is influence or homophily that causes people to take on certain viewpoints. The question is a favorite research topic, particularly among marketers, because a definitive answer would help them better understand how to spend their ad dollars (including political ads, of course!). A good place to start reading up on this debate is the recent publication by Aral and Walker [5] in Science.
Computational social science has a long way to go in collaborating with social scientists, political scientists, and other information scientists to understand these phenomena. The wealth of data hints at the possibility of new findings and of confirming effects that have already been studied extensively in the lab. Beyond confirming these effects, what else might we find? How does social media affect these processes? Presumably it helps speed them up, but perhaps there is also a negative consequence of increasing polarization? I can't wait to find out!
References
  1. DeCoster, J., & Claypool, H. M. (2004). A meta-analysis of priming effects on impression formation supporting a general model of informational biases. Personality and Social Psychology Review, 8, 2-27. doi: 10.1207/S15327957PSPR0801_1
  2. Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 U.S. election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery (LinkKDD '05). ACM, New York, NY, USA, 36-43. DOI=10.1145/1134271.1134277 http://doi.acm.org/10.1145/1134271.1134277
  3. Palmer, Carl L., Driven to Extremes? Motivated Bias and Attitude Polarization in One-Sided Communication Flows (July 10, 2012). Available at SSRN: http://ssrn.com/abstract=1733983 or http://dx.doi.org/10.2139/ssrn.1733983
  4. Golder, S.A.; Yardi, S.; , "Structural Predictors of Tie Formation in Twitter: Transitivity and Mutuality," Social Computing (SocialCom), 2010 IEEE Second International Conference on , vol., no., pp.88-95, 20-22 Aug. 2010.  doi: 10.1109/SocialCom.2010.22
  5. Aral, S., & Walker, D. (2012). Identifying influential and susceptible members of social networks. Science, 337(6092), 337-341. Published online 21 June 2012. doi: 10.1126/science.1215842


Communications of the ACM

Tuesday, October 23, 2012

Digital social networks help couples


New social network helps with romantic relationships

BY MAURICIO NOLO PEDRAT

Men and women share details of their first date, their first kiss, and the gifts they exchanged, and the other users weigh in. Clarín




The Internet is reaching further and further into private life. Men and women open up under the anonymity the web allows and use it to settle doubts of every kind. Now the site Wotwentwrong.com (a play on the English "what went wrong") proposes going a step further. It has created a virtual community that lets users share their love lives with all of its members. The site already has nearly 3,000 registered users and about 1,500 visits a day.
It works in a simple, dynamic way. Users create an account and can build a timeline of their romantic experiences, past or present.
There they record the first date with details of the meeting, the first kiss, gifts, attentions, and gestures. That way, the other members of the community have a record and a history of how each partner in the couple behaves.
That history also helps guide the virtual pundits in times of crisis. For this the site offers a feature called Impressions. There users can enter details about difficult moments, such as the typical text message that at first glance seems confrontational or appears to hide a complaint. The user must specify the context, their own conclusions, and the exact wording that raised the doubt. The rest of the community can then offer another point of view on the message that prompted the questions.
This need for mutual help and for sharing intimate matters can be beneficial and, at the same time, can point to a problem in the person seeking it. So explains Dr. Carlos Emilio Antar, a member of the Asociación Psicoanalítica Argentina (APA) and of the International Psychoanalytical Association (IPA): “It all depends on how one handles it. There are people who need to consult others about everything, all the time. The issue is that an excess in that area may indicate a difficulty in the person's own perception.”
Antar also describes another kind of case, in which people are unable to see the difficulties for themselves: “There are people who find it very hard to make a decision. A truck could be passing right beside them and they can't see it. Those personalities need someone to tell them, ‘Listen, don't you realize that this story is a bit murky, that there's someone else on the playing field?’” he says by way of example.
These are two specific, extreme cases, according to the psychologist, but the reality is that “one needs comparative communication” to clear up doubts, as when a person is not sure how real what they see is, or whether it is simply something they believe they are seeing. That commonly happens to someone who suspects their partner may be unfaithful.
Although the appeal of anonymous advice is great, heeding the words of a biased person can harm a relationship that has a future, and Wotwentwrong.com offers an ideal setting for such negative practices. In any case, as with any kind of advice, it is the user who must decide whether to take it, and whether to let the web, little by little, keep reaching into their private life.

Monday, October 22, 2012

Anatomy of a spam attack

Deconstructing a Twitter spam attack

Data analysis shows the structure of a network can separate true influencers from fake accounts.


Strata - Making Data Work (http://s.tt/1q7Z0)

There has been a lot of discussion recently about the effect fake Twitter accounts have on brands trying to keep track of social media engagement. A recent tweet spam attack offers an instructive example.
On the morning of October 1, the delegates attending the Strata Conference in London started to notice that a considerable number of spam tweets were being sent using the #strataconf hashtag. Using a tool developed by Bloom Agency, with data from DataSift, an analysis has been done that sheds light on the spam attack directed at the conference.
The following diagram shows a snapshot of the Twitter conversation after a few tweets had been received containing the #strataconf hashtag. Each red or blue line represents a connection between two Twitter accounts and shows how information flowed as a result of the tweet being sent. By 11 a.m., individual communities had started to emerge that were talking to each other about the conference, and these can clearly be seen in the diagram.
Strataconf tweeting communities
The diagram below shows a further visualisation, this time after 30 minutes of listening to the conversation. In an organic conversation, developing of its own accord, you would expect to see lots of random connections and a number of communities spread across the network.
Strataconf Twitter conversation: random conversations and communities taking shape
If we zoom into the network to seek out the spammers the tool has identified, we start to see some different patterns, as shown in the diagram below.
Spammers show up on the fringe of the real Twitter conversation
The spammers are not involved in the conversation, but exist on the fringe of the conversation. They aren’t able to get a message directly to the people tweeting about #strataconf, as those accounts don’t follow the spammers, but by putting #strataconf at the beginning of their tweet, the clear intention is that those searching for tweets about the conference will pick up on their content.
If we pull out just those accounts we identify as spammers, we see a far-from-random pattern emerging. These patterns are well known to the researchers at Bloom Agency and are used to train the tool to identify and spot potential spammers. The spammers’ network is too highly organised and shows too much structure. There isn’t enough randomness in this network: it has clearly been generated for a purpose, and likely by a computer.
Spammers show up on the fringe of the real Twitter conversation
By identifying spamming accounts through how much structure they bring to the network, the tool can produce a list of true influencers or a list of true followers, rather than including a list of fake accounts.
For example, at 11:15 on the Monday morning during the conference, a tweet from @MarieBoyd14 was flagged as suspicious. It said:
“#strataconf Can not believe I ran across this kind of http://t.co/79fGWudr”
If you search for @MarieBoyd14 right now, you’ll find the account has been suspended. The account was seemingly suspended within minutes of posting the tweet.
The same shortened URL was posted six times in quick succession, between 11:15:43 and 11:17, before the account was suspended.
The first tweet picked up here, at 11:15:43, was shown as the user’s 78th tweet: the account had not been active for very long. By the time the sixth tweet featuring this shortened URL was observed, the tweet count was up to 93. Even the most prolific of conference tweeters couldn’t manage 15 tweets in less than two minutes, unless their finger got stuck on the “tweet” button.
Another tweet that began with the hashtag was received five seconds after @MarieBoyd14’s, at 11:15:48. The tweet was from @RosalindaKline8. Again, if you search for this account, you’ll find it’s been suspended. The tweet said:
“#strataconf I can’t believe this… Is the real deal? http://t.co/GKc4rnr5”
Although the format of the t.co link is different, this link directs the user to the same domain: the http://barsa1.free-football.tv domain.
@RosalindaKline8 tweeted this link, with different text, seven times between 11:15 and 11:19. This account fits the same profile as the @MarieBoyd14 account, where the account is relatively new, posts up to 100 tweets very quickly, and is then suspended.
Two clear patterns emerged. First, the accounts being used to generate the messages were named after females with a number at the end of the account. Next, the messages all started with the conference hashtag.
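The two surface patterns just described could be written as a simple filter. What follows is only a minimal sketch of that heuristic, not the Bloom Agency tool, which the article says works from network structure. The function name `looks_like_spam` and the regular expression are assumptions made for illustration, and the check cannot tell whether a name is female, only that the handle is letters followed by digits.

```python
import re

# Assumed pattern: an optional "@", then letters, then one or more trailing digits.
HANDLE_WITH_TRAILING_DIGITS = re.compile(r"^@?[A-Za-z]+\d+$")


def looks_like_spam(handle: str, text: str, hashtag: str = "#strataconf") -> bool:
    """Flag a tweet when both surface patterns from the article hold."""
    # Pattern 1: the account name ends in a number.
    name_matches = bool(HANDLE_WITH_TRAILING_DIGITS.match(handle))
    # Pattern 2: the message starts with the conference hashtag.
    starts_with_tag = text.lstrip().lower().startswith(hashtag.lower())
    return name_matches and starts_with_tag


flagged = looks_like_spam(
    "@MarieBoyd14",
    "#strataconf Can not believe I ran across this kind of http://t.co/79fGWudr",
)
```

A filter this crude would also flag genuine delegates whose handles end in digits, which is presumably why structural signals are more useful in practice.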
In a 30-minute period, 424 tweets were recorded from 140 different accounts, at a rate of 14 tweets per minute. On deeper investigation, it was found that all the spammer accounts had IDs starting with 85613, suggesting the accounts had all been created around the same time. The accounts were all seemingly suspended within a few minutes of the last tweet being sent.
In the 30-minute time period being discussed here, there were 750 tweets recorded, from 306 different accounts, at a rate of 25 tweets per minute. More than half the tweets were from spammers: discounting the spammers, the rate would have been just under 11 tweets per minute.
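The rates quoted in the last two paragraphs follow from simple division, and the arithmetic can be checked directly. The sketch below uses only the counts given in the text; the variable names are illustrative.

```python
WINDOW_MINUTES = 30

total_tweets, total_accounts = 750, 306   # everything seen on the hashtag
spam_tweets, spam_accounts = 424, 140     # the subset attributed to spammers

total_rate = total_tweets / WINDOW_MINUTES        # tweets per minute overall
spam_rate = spam_tweets / WINDOW_MINUTES          # tweets per minute from spammers
organic_tweets = total_tweets - spam_tweets       # tweets left after filtering
organic_rate = organic_tweets / WINDOW_MINUTES    # tweets per minute without spam
spam_share = spam_tweets / total_tweets           # fraction of tweets that were spam

print(f"overall: {total_rate:.1f}/min, spam: {spam_rate:.1f}/min, "
      f"organic: {organic_rate:.1f}/min, spam share: {spam_share:.1%}")
```

The gap between the overall and organic rates is the inflation a listening tool would report if it did not filter the spam.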
Another link being propagated by these accounts was to the URL: http://yourson999.tk/rivers.php. On investigation, it was found that this site is generating headers with the HTTP 203 response, rather than the 200 or 301 header response we expected. This suggested something unusual was going on. Upon further inspection, it was found that the URL was directing traffic to different end points, seemingly at random. Each time the URL was generating traffic to third-party ecommerce sites, and each time with an affiliate referrer attached. This was likely an attempt to direct traffic to ecommerce websites while securing affiliate referrer fees for the organisation or individual behind the attack.
On the surface, a tweet spam attack may seem like a limited hindrance, but there’s an important repercussion that needs to be considered. The spammers had a big impact on the basic metrics used to measure the spread of the #strataconf hashtag. Without spam filtering built into a social media listening tool, the tool is in danger of giving inflated figures to the organisation using it. If these figures are used by brands to make decisions about future campaigns, the spammers can change the numbers so much that the wrong decisions could be made.
Peter Laflin is Head of Data Insight at Bloom Agency, an integrated marketing agency based in Leeds, UK. Peter is interested in using big data to predict how consumers behave and how predictive modelling can be used to gain a commercial edge.


Thursday, October 18, 2012

The 2012 political book network


2012 Political Book Network

I have been mapping political book networks since before the 2004 U.S. presidential election. These network maps are like a social graph of books. The data is gathered from Amazon.com -- their list of top political books. Two books are linked if they were often bought together, or by the same buyer. These are also-bought pairs -- people who bought this book also bought that book.
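
The linking rule above amounts to building an undirected graph from also-bought pairs. Here is a minimal sketch of that idea in plain Python; the function name and the placeholder titles are assumptions for illustration, not my actual data or the software used to draw the maps.

```python
from collections import defaultdict


def build_also_bought_graph(pairs):
    """Turn (book, also_bought_book) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for book, other in pairs:
        if book == other:
            continue  # ignore self-links
        graph[book].add(other)
        graph[other].add(book)
    return dict(graph)


# Placeholder titles: two red books, two blue books, and one bridging book.
pairs = [
    ("Red Book A", "Red Book B"),
    ("Blue Book A", "Blue Book B"),
    ("Red Book A", "Bridge Book"),
    ("Blue Book A", "Bridge Book"),
]
graph = build_also_bought_graph(pairs)
print(sorted(graph["Bridge Book"]))
```

A book whose neighbors come from both clusters, like the placeholder bridge here, is what shows up as a connector on the map.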

During the 2008 election the political book map reflected the deep divide in the country between conservative (RED) voters and liberal (BLUE) voters. There were no connections, nor any intermediaries, between red and blue books -- each cluster was completely closed off to the other. There was a separate cluster of people reading books on the then-new candidate, Obama, but they were not interested in reading or purchasing other political books (upper left corner of the network map below).

2008 Political Book Network Map


I expected a similar pattern for 2012 -- a big chasm between right and left. I thought the map would show each group boning up on their side's talking/debating points and ignoring books of non-conforming opinions. I was surprised: the two clusters in October 2012 were connected by several books! The hub in the center of the network, with spokes to many blue and red books, is The Price of Politics by Bob Woodward. Woodward is viewed as a center-right journalist, and this book is about politics in general, so it makes sense that both sides would be reading his usually excellent prose. No Easy Day, by one of the Navy SEALs who took out bin Laden, reads more like a novel than a history book, attracting readers from all political persuasions.

The third bridging book was a surprise! The Little Blue Book is intended for a progressive audience -- it is a handbook for how to argue effectively with the right wing. So, you would expect it to be firmly in the center of the dense blue cluster, right? Wrong! It has both blue and red readers! I checked all editions of the book -- hardback, paperback and Kindle. For the Kindle version, The Little Blue Book was connected (also bought) with other blue books, as expected. It was with the paperback edition that I was surprised -- 4 of the first 10 also-bought books were red books! Amazon lists its also-boughts by decreasing count/volume, so there were many instances of readers of certain red books buying The Little Blue Book. Why is this so? Maybe the right wing is trying to understand the left wing by reading their blue handbook -- similar to how they read the far-left book Rules for Radicals during the 2008 election campaign.

2012 Political Book Network Map


This year we also have books about the candidates -- their biographies and positions on major issues. Obama has the same set of books as last election, Romney has his No Apology series, and Romney's running mate is written up in the Young Guns book. Potential voters appear to be reading books about both of the candidates -- Amazon readers are buying books about Romney and Obama together! See books in upper left frame (2012 Candidate biographies) above.

Another pattern is different in 2012 than in 2008. Now, people reading about the candidates are also reading other political books. The pattern is positive for Romney -- people reading about him are reading other red books -- but not so for Obama. People reading his positive biographies and position books are also reading polemics attacking Obama. The most influential anti-Obama book in the above network is Obama's America -- it is read by potential voters who are reading about both Obama and Romney. See the link patterns in the upper left corner of the above diagram.

Even though the two book networks are connected, we still have a polarized voter base -- those are two strongly defined communities. Running one of the network metrics from the InFlow software reveals two tightly defined clusters. The E/I Ratio (External/Internal) is near -1.0 for both the blue and red groups, indicating two exclusionary communities. Polarization persists in America.
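
One common way to define an external/internal measure like the one quoted above is the E-I index, (E - I) / (E + I), where E counts a group's links to outsiders and I counts links inside the group. It equals -1.0 when every link is internal. Whether InFlow computes its E/I Ratio in exactly this way is an assumption on my part, so treat the following as a sketch of the general idea; the function name and the toy network are hypothetical.

```python
def ei_index(edges, group_of, group):
    """E-I index for one group: (external - internal) / (external + internal)."""
    external = 0
    internal = 0
    for a, b in edges:
        a_in = group_of[a] == group
        b_in = group_of[b] == group
        if a_in and b_in:
            internal += 1      # both ends inside the group
        elif a_in or b_in:
            external += 1      # exactly one end inside the group
    if external + internal == 0:
        raise ValueError(f"group {group!r} has no links")
    return (external - internal) / (external + internal)


# Toy network: a red triangle, a blue triangle, and one link between them.
group_of = {"r1": "red", "r2": "red", "r3": "red",
            "b1": "blue", "b2": "blue", "b3": "blue"}
edges = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("r1", "b1")]
red_score = ei_index(edges, group_of, "red")
print(red_score)
```

In this toy case each cluster has three internal links and one external link, so the score lands halfway between balanced (0) and fully closed (-1.0). Clusters near -1.0 have almost no links to the other side.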

Can we use these network maps to predict the election? Probably not. The main insight I get from these maps is that the 2008 election provided a more clear-cut choice for voters. Although supporters of each candidate today would also say the choice is clear this time around (they always say that), the data does not support that. This time some readers are examining books from both sides... are these the small percentage of undecided voters who will likely decide this close election? I bet each campaign would love to know who these Amazon readers are... and they may want to know each other!

What news travels via Twitter


BBC vs. Wired: Whose news travels on Twitter?



U. ARIZONA (US) —News from BBC, Mashable, and the New York Times has the maximum reach on Twitter, according to an analysis of a dozen news organizations.


Researchers tracked what happened to a news article after it was tweeted by a news organization. They rendered the data they collected from each organization visually as images showing how the news is diffused. The network visualizations appear something like fireworks, with dots representing individual twitter users and cascade streams from those dots depicting retweets. (Credit: University of Arizona)


Sudha Ram, a professor of management at the University of Arizona, used network analysis to gauge how news agencies use Twitter to share news and how that news spreads via retweets.
Ram, who recently presented her findings at the International Workshop on Business Applications of Social Network Analysis in Istanbul, examined, over a six-month period, the Twitter activity of 12 major news organizations focused on US news, global news, technology news, or financial news.

The Twitter activity network for the New York Times shows a high number of users participating in long chains of tweeting and retweeting. (Credit: University of Arizona)

The Twitter activity network for Reuters shows a high number of users posting direct retweets of news agencies’ tweets. (Credit: University of Arizona)
All of the agencies selected—the New York Times, Washington Post, BBC, NPR, Reuters, Guardian, Forbes, Financial Times, Mashable, Arstechnica, Wired, and Bloomberg—regularly share news articles on Twitter.
Ram and doctoral student Devi Bhattacharya tracked what happened to a news article after it was tweeted by a news organization. Together, they looked at how many people retweeted, or reposted, the article on their own Twitter feeds, then how many times it was subsequently retweeted from those accounts and so forth.
They were then able to evaluate the volume and extent of spread of an article on Twitter, as well as its overall lifespan.
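The three quantities named here (volume, extent of spread, and lifespan) can be read off a retweet cascade once each retweet is linked to the account it was retweeted from. The sketch below is an illustrative assumption about how such measures might be computed, not the researchers' method; the record layout `(user, retweeted_from, hours_after_original)` and the function name are hypothetical.

```python
def cascade_stats(root, retweets):
    """Summarize one article's cascade: users reached, chain depth, lifespan."""
    depth = {root: 0}  # hops from the news organization's original tweet
    # Process in time order so each parent is seen before the retweets of it.
    for user, parent, _hours in sorted(retweets, key=lambda r: r[2]):
        if parent in depth and user not in depth:
            depth[user] = depth[parent] + 1
    lifespan = max((hours for _, _, hours in retweets), default=0.0)
    return {
        "users_reached": len(depth) - 1,    # volume, excluding the source
        "max_depth": max(depth.values()),   # extent: longest retweet chain
        "lifespan_hours": lifespan,         # time of the last observed retweet
    }


# Made-up cascade: two direct retweets, then a chain continuing from one of them.
retweets = [
    ("a", "news_org", 0.5),
    ("b", "news_org", 1.0),
    ("c", "a", 2.0),
    ("d", "c", 12.0),
]
stats = cascade_stats("news_org", retweets)
print(stats)
```

A cascade with many direct retweets but little depth and one with long chains can have the same volume, which is why network measures say more than raw retweet counts.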
“The goal for a news agency is to have a lot of people reading and following your articles,” says Ram, who is also a professor of computer science. “What we’ve done is use network analysis, which is quite different from just looking at the total number of tweets or total number of retweets. You’re starting to see, over time, how information is spreading.”
Ram and Bhattacharya rendered the data they collected from each organization visually as images showing how the news is diffused. The network visualizations appear something like fireworks, with dots representing individual twitter users and cascade streams from those dots depicting retweets.
The images reveal different diffusion patterns for the different agencies, which can provide clues to those organizations about how their news is spreading and what they might want to focus on to be successful, Ram says.
“This gives them good feedback, and it’s kind of a performance report for them,” Bhattacharya adds. “It gives them an idea about the reading habits of people online and how they like to consume news.”

Of the organizations analyzed, BBC had the maximum reach in terms of affected users and retweet levels. BBC articles also had the highest chance of survival on Twitter, with 0.1 percent of articles surviving, through continual retweets, for three or more days.
The BBC’s high numbers were likely due in large part to the fact that the main “bbcnews” Twitter account also is supported by two other agency accounts, “bbcbreaking” and “bbcworld,” Ram notes.
The New York Times and Mashable had the second highest reach. Articles from Forbes, Wired, and Bloomberg had the shortest Twitter lifespans.
Overall, Ram says the data showed that articles on Twitter dissipate fairly quickly, with retweeting typically ending between 10 and 72 hours after an article is originally shared.
The Twitter study is a jumping off point for further research into how news is disseminated through various social media platforms, Ram adds. In December, Ram will present a follow-up paper at the Workshop on Information Technologies and Systems in Florida on the importance of Twitter-follower engagement for news organizations, as opposed to volume of followers.
“The term ‘social media’ refers to a lot of things. The first thing people think about is Facebook and then Twitter, but it’s so much more than that,” Ram explains. “It’s really all the various forums—the blogs, photo sharing sites, video sharing sites, microblogging, social bookmarking like Digg, Delicious and Reddit, and so on.”
Ram says she hopes to do more extensive research on news sharing and develop partnerships with news agencies to help them answer specific questions about their social media practices and performance.
“The idea is really to see if we can make some predictions,” Ram says. “What are some attributes of these networks that will help us make predictions? Is it number of followers? Is it engagement of followers?
“Is it what time you tweet? Is it who else is tweeting at the same time? Which are the more useful attributes that will help us predict, and therefore will help us give organizations suggestions on how to be more effective in spreading their news?
“Because ultimately their goal is more people reading their articles and talking about them.”