Thursday, October 3, 2013

Software: Commetrix CMX for dynamic social networks

Commetrix CMX Analyzer: Dynamic Social Network Visualization
Commetrix CMX Analyzer is a social network analysis platform from the German company Trilexis (www.trilexis.com), which originated in a research group at the Technical University of Berlin. (Note: the website, user interface and documentation are all in English.) What is interesting about this particular tool is its emphasis on the dynamics of social interactions over time. It achieves this through a data format that captures information about each individual link event: not only originator, destination and time, but also user-specified attributes, which could include communication mode (email, IM, Twitter), type of exchange (social, work, ecommerce) and topic (e.g. keywords extracted from the subject).
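To make the idea of a per-event record concrete, here is a minimal Python sketch of what one link event might look like. This is not the actual Commetrix data format; the class name, field names, attribute keys and sample values are all our own assumptions, invented purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LinkEvent:
    """One communication event. Field names are illustrative, not the Commetrix schema."""
    sender: str
    recipient: str
    timestamp: datetime
    # Free-form, user-specified attributes (mode, type of exchange, keywords, ...).
    attributes: dict = field(default_factory=dict)


# Invented sample data (not taken from the Enron demo set).
EVENTS = [
    LinkEvent("alice", "bob", datetime(2000, 1, 5, 9, 30),
              {"mode": "email", "type": "work", "keywords": ["contract"]}),
    LinkEvent("bob", "carol", datetime(2000, 1, 6, 14, 0),
              {"mode": "im", "type": "social", "keywords": ["lunch"]}),
    LinkEvent("alice", "carol", datetime(2000, 12, 1, 8, 15),
              {"mode": "email", "type": "work", "keywords": ["contract", "california"]}),
]


def events_between(events, start, end):
    """Return events with start <= timestamp < end, oldest first."""
    selected = [e for e in events if start <= e.timestamp < end]
    return sorted(selected, key=lambda e: e.timestamp)


january = events_between(EVENTS, datetime(2000, 1, 1), datetime(2000, 2, 1))
for event in january:
    print(event.sender, "->", event.recipient, event.attributes["mode"])
```

Keeping the time stamp and attributes on every single event, rather than on an aggregated link, is what makes time-based slicing and attribute filtering possible later on.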

Commetrix CMX Analyzer User Interface

A small subset of the Enron Email dataset is provided for demonstration purposes; from its size and the individuals referenced, we are guessing it comes from a single custodian. Part of our interest in this particular software is that we are familiar with the Enron dataset and had researched it using the social network analysis functionality of an eDiscovery system called MetaLINCS. We were curious to see what additional insights CMX Analyzer might provide.

CMX Analyzer is a desktop tool built in Java, incorporating the 3D graphical capabilities of Java 3D and the Java Media Framework. Once we had obtained the license key, the application was straightforward to install, and it comes with a user guide. To date we have only been able to try it out on the sample data set provided, as creating new data sets requires end-user coding (of link attributes) followed by a data transformation step that requires either a separate tool (Commetrix Producer) or sending the data to Trilexis for processing on their systems.

Commetrix Data Preparation Process:

Commetrix is not as functionally or visually rich as some of the other tools we have investigated and reported on in previous blogs (e.g. Gephi, NodeXL). However, where it comes into its own is in the dynamic visualization of email communications over time. The MetaLINCS software we had used in the past provided a “time-slider” but was essentially a “snapshot” approach. Commetrix has time-sliders too, but it also animates the traffic, creating a unique perspective on what is, after all, a time-based series of events. (We should also warn readers that the resulting animations make for highly addictive viewing. We were totally captivated!) The start and end of the time period can be set, as can the intervals and speed of animation. It is also possible to run the time line backwards as well as forwards. This makes it possible to identify “hot spots” of communication activity between group subsets at particular points in time. For other types of communication, e.g. Twitter or Facebook, we can see how this would provide valuable insight into the evolution of a topic of discussion or a social group.
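The windowing idea behind such an animation can be sketched in a few lines of Python. This is our own simplified illustration, not how Commetrix works internally: the function names, the fixed-width windows, the threshold and the sample events are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

# Invented sample events: (sender, recipient, timestamp).
EVENTS = [
    ("alice", "bob", datetime(2000, 1, 3)),
    ("bob", "alice", datetime(2000, 1, 4)),
    ("alice", "bob", datetime(2000, 1, 5)),
    ("carol", "dave", datetime(2000, 1, 12)),
    ("alice", "carol", datetime(2000, 1, 20)),
]


def frames(events, start, end, step, reverse=False):
    """Yield (window_start, Counter of undirected pairs), one frame per window."""
    window_starts = []
    t = start
    while t < end:
        window_starts.append(t)
        t += step
    if reverse:
        window_starts.reverse()  # play the time line backwards
    for w in window_starts:
        w_end = min(w + step, end)
        counts = Counter(
            frozenset((s, r)) for s, r, ts in events if w <= ts < w_end
        )
        yield w, counts


def hot_spots(events, start, end, step, threshold):
    """Return (window_start, pair, count) where a pair exchanged >= threshold messages."""
    return [
        (w, tuple(sorted(pair)), n)
        for w, counts in frames(events, start, end, step)
        for pair, n in counts.items()
        if n >= threshold
    ]


spots = hot_spots(EVENTS, datetime(2000, 1, 1), datetime(2000, 1, 29),
                  timedelta(days=7), threshold=2)
print(spots)
```

Each frame corresponds to one step of the animation; changing `step` plays the role of the interval setting, and `reverse=True` corresponds to running the time line backwards.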

Snapshot of Communications: Jan 2000

Snapshot of Communications: Dec 2000
Visually, Commetrix is more limited than some of the other packages we have used; for example, it is not possible to pan or zoom. There are options to change node size and color to represent parameters such as communications sent, communications received and number of direct contacts. Color schemes cannot be chosen directly but can be set to show selected attributes. For example, the following screenshot shows nodes color coded by the ‘function’ attribute, where dark blue represents employees, pale blue represents directors, green represents traders, wholly purple circles represent managers and purple circles with yellow centers represent in-house lawyers. (Note: we found the use of full and semi-colored circles to be somewhat confusing.)

Color Coding by Function

Included in Commetrix is an “egoview” option which allows you to select a particular node and investigate communication to and from that individual node. Links can be filtered to include only direct communications (a 1-step link) or communications involving two or more steps. The image below, for example, shows communications to and from Sara Shackleton. While this capability is helpful for focusing on traffic to and from a node, it has limited value for email data drawn from only one custodian: for any node other than that custodian, the egoview will show only those communications that happen to have been referenced in emails sent to and from the primary custodian, i.e. it is an imperfect sample.

Screenshot Showing Ego View - Tana Jones
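The 1-step versus multi-step distinction in an ego view amounts to a breadth-first search limited to a given depth. The sketch below is our own illustration under simple assumptions (links treated as undirected, invented node names, a hypothetical `ego_view` function); it does not reflect the Commetrix implementation.

```python
from collections import defaultdict, deque

# Invented sample links: (sender, recipient).
LINKS = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("erin", "alice"),
]


def ego_view(links, ego, max_steps=1):
    """Return {node: steps from ego} for every node within max_steps of ego."""
    # Treat communication "to and from" a node as an undirected link.
    adjacency = defaultdict(set)
    for sender, recipient in links:
        adjacency[sender].add(recipient)
        adjacency[recipient].add(sender)

    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the requested number of steps
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    del distance[ego]  # report only the ego's contacts, not the ego itself
    return distance


print(ego_view(LINKS, "alice", max_steps=1))
print(ego_view(LINKS, "alice", max_steps=2))
```

With a single-custodian data set, the 2-step contacts found this way are only those that happen to appear in the custodian's own mail, which is the sampling limitation noted above.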

Commetrix also comes with a Keyword filter. The intent is to allow the user to focus on interactions “about” the selected keywords. The interface is less obvious than some of the other areas, and we confess to wondering if there was a bug until, rereading the manual, we realized that “In” meant “include” rather than “inbound” and “Out” meant “exclude”. Selecting the terms was also rather tedious, as it meant scrolling through a long list of options. To validate the filtering, we took ‘california’ related terms and looked to see if Jeff Dasovich was included, which he is (see screenshot below). It would be interesting to see this concept developed further with better keyword lists, more complex keyword filtering options and possibly automated topic determination techniques such as keyword clustering.

Screenshot Showing Use of Keyword Filter
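The include/exclude logic can be sketched as follows. This is a hedged illustration only: the message structure, the function names and the rule that a message passes if it matches any included keyword and no excluded keyword are our assumptions, and the sample data is invented.

```python
# Invented sample messages, each with a set of keywords.
MESSAGES = [
    {"sender": "alice", "recipient": "bob", "keywords": {"california", "energy"}},
    {"sender": "bob", "recipient": "carol", "keywords": {"lunch"}},
    {"sender": "carol", "recipient": "alice", "keywords": {"california", "draft"}},
]


def keyword_filter(messages, include=(), exclude=()):
    """Keep messages matching any 'include' keyword and no 'exclude' keyword.

    An empty include set means "no include restriction".
    """
    include, exclude = set(include), set(exclude)
    kept = []
    for message in messages:
        keywords = message["keywords"]
        if include and not (keywords & include):
            continue  # "In" list given, but none of its terms appear
        if keywords & exclude:
            continue  # at least one "Out" term appears
        kept.append(message)
    return kept


def participants(messages):
    """Return everyone who sent or received one of the given messages."""
    return {m["sender"] for m in messages} | {m["recipient"] for m in messages}


about_california = keyword_filter(MESSAGES, include={"california"}, exclude={"draft"})
print(participants(about_california))
```

Checking whether a known individual appears in `participants(...)` mirrors the validation we did by hand with the ‘california’ terms.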
Although the Enron data set was provided only for demo purposes, having worked with this data we were curious about two things. Firstly, how were the keywords derived? We guessed email subject, but some of the keywords were email domains, indicating that other metadata might have been used as well. In addition, some phrases had been concatenated (e.g. ‘californiaattached’), some included a leading article (e.g. ‘thenumber’) and some were word fragments (e.g. ‘t’, ‘e’). Secondly, and more importantly, how were the “identities” of the individuals represented by the nodes resolved? This is always a major issue in email communications if the only information about senders and recipients is an email address. Most individuals have multiple email addresses, even within companies, and the names on email addresses may be difficult to resolve to a single individual. We raise this question because MetaLINCS included functionality that attempted to link individuals with their email accounts based not only on email address but also on communication patterns. Even then, many individuals and email accounts that a human would identify as probably being connected could not be automatically linked. We are guessing that the identity of individuals was manually coded, since the node table has a clean one-to-one mapping between individuals and a single email address.

In summary, while we think some of the other software we have used and researched offers better social network visualization options, we really liked the time-line animation Commetrix provides and believe it could be very helpful when studying the evolution of a network or communication patterns over time. While the keyword filtering option was disappointing in both the implementation and the demo dataset provided, we think it has obvious potential, particularly when analyzing large data sets of email, IM and Twitter, in enabling users to focus in on only those communications “about” a particular topic. Of course, with that come all the provisos of using keywords as a substitute for “aboutness”, but if it were combined with stemming, a better stop word list and some form of thesaurus (to apply synonyms automatically), it would be very powerful.
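As a rough sketch of what that combination might look like, the Python below applies a stop word list, a small synonym table and a deliberately crude suffix stripper to an email subject line. Everything here is an assumption for illustration: the word lists are invented, `crude_stem` is a stand-in and not a real stemming algorithm such as Porter's, and nothing is taken from Commetrix.

```python
import re

# Illustrative lists only; a real deployment would need far larger ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "re", "fw"}
SYNONYMS = {"calif": "california", "ca": "california"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common suffix; a placeholder for a proper stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalise(subject):
    """Turn a subject line into a list of normalised keywords."""
    tokens = re.findall(r"[a-z]+", subject.lower())
    keywords = []
    for token in tokens:
        if token in STOP_WORDS or len(token) < 2:
            continue  # drop stop words and one-letter fragments
        token = SYNONYMS.get(token, token)  # thesaurus lookup
        keywords.append(crude_stem(token))
    return keywords


print(normalise("RE: The Calif contracts attached"))
```

Tokenising on word boundaries before any other step also avoids gluing adjacent words together into a single keyword.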

Chroma Scope
