Tuesday, 27 November 2012

Graph statistics and graph clustering
Participants: Daniel Archambault, Romain Bourqui, Maylis Delest, Frédéric Gilbert, Guy Melançon, François Queyroi, Arnaud Sallaberry, Paolo Simonetto, Faraz Zaidi.

Community detection in static networks
Searching for information on the web is a frequent task that requires some form of organization to be efficient. This information is often distributed, semi-structured, overlapping and heterogeneous. Organizing and structuring it is an active area of research whose goal is to help users locate the information they need efficiently. Clustering is a well-known technique for grouping similar information. Although often described as unsupervised learning, clustering is not trivial, as it often requires human intervention in the form of parameters that guide the process. We address the problem of clustering web pages, where the goal is to organize information so that users can reach what they need faster. In [30], we introduce a hierarchical fuzzy clustering algorithm to organize web pages. The algorithm uses a topological decomposition of the co-occurrence network of keywords to devise heuristics that help determine the input parameters of our clustering algorithm. Finally, we compare the results of the proposed algorithm with existing algorithms from the literature.
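To make the notion of a keyword co-occurrence network concrete, here is a minimal sketch. It assumes each web page is already reduced to a list of keywords and that two keywords are linked whenever they appear on the same page; the function name `cooccurrence_network` and the sample pages are hypothetical, and the topological decomposition and fuzzy clustering of [30] are not reproduced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(pages):
    """Return a weighted edge map {(kw_a, kw_b): count} with kw_a < kw_b.

    Assumption: `pages` is an iterable of keyword lists, one per web page.
    """
    edges = Counter()
    for keywords in pages:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample data: three pages and their keywords.
pages = [
    ["graph", "clustering", "web"],
    ["graph", "clustering"],
    ["web", "search"],
]
network = cooccurrence_network(pages)
```

The edge weight counts the pages on which both keywords occur, so `network[("clustering", "graph")]` is 2 for the sample above. Any decomposition or clustering step would then operate on this weighted graph.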

The exponential growth of data in fields such as social networks and the Internet has stimulated a great deal of activity in network analysis and data mining. Identifying communities remains a fundamental technique for exploring and organizing these networks. A few metrics are widely used to detect the presence of communities in a network. We argue, by presenting counterexamples, that these metrics do not truly reflect the presence of communities. The reason is that they concentrate on local cohesiveness among nodes, where the goal is to judge whether or not two nodes belong to the same community, and thus lose the overall perspective of the presence of communities in the entire network. In [29], we propose a new metric to identify the presence of communities in real-world networks. This metric is based on a topological decomposition of networks that takes into account two important ingredients of real-world networks: the degree distribution and the density of nodes. We show the effectiveness of the proposed metric by testing it on various real-world data sets.

Community detection in dynamic social networks


Figure 7. Community detection in social networks: framework overview.

Detection of community structures in social networks has attracted a lot of attention in sociology and the behavioral sciences. Social networks are also dynamic: they change continuously over time. They may also present a hierarchical structure led by individuals who play important roles in a society, such as managers and decision makers. Detecting and visualizing networks that change over time is a challenging problem, since communities change as a function of the events taking place in the society and the roles people play in it. In [15], we address these issues by presenting a system to analyze dynamic social networks (see Fig. 7). The proposed system is based on dynamic graph discretization and graph clustering. The system detects major structural changes taking place in social communities over time and reveals hierarchies by identifying influential people in social networks. We use two different data sets for the empirical evaluation and observe that our system helps to discover interesting facts about the social and hierarchical structures present in these social networks.
In [35], we give the complete description of the graph decomposition algorithm used in [15] to generate overlapping clusters. The complexity of this algorithm is $O(|E| \cdot deg_{max}^2 + |V| \cdot \log(|V|))$. This algorithm is particularly efficient due to its ability to detect major modifications along dynamic processes such as time-related ones.
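The discretization step can be illustrated with a short sketch. It assumes the dynamic network is given as timestamped edges `(u, v, t)` and is cut into fixed-width time windows; the function names, the window width and the sample edges are hypothetical. The Jaccard-based change score is only a simple stand-in for spotting structural change between consecutive snapshots and is not the decomposition algorithm of [35].

```python
from collections import defaultdict


def discretize(timed_edges, window):
    """Group (u, v, t) edges into snapshots keyed by window index t // window.

    Assumption: fixed-width windows; edges are treated as undirected.
    """
    snapshots = defaultdict(set)
    for u, v, t in timed_edges:
        snapshots[t // window].add(frozenset((u, v)))
    return dict(snapshots)


def jaccard_change(prev, curr):
    """Return 1 - Jaccard similarity of two edge sets (0 = identical, 1 = disjoint)."""
    union = prev | curr
    if not union:
        return 0.0
    return 1.0 - len(prev & curr) / len(union)


# Hypothetical timestamped interactions.
timed_edges = [
    ("a", "b", 0), ("b", "c", 1),
    ("a", "b", 5), ("c", "d", 6),
    ("d", "e", 11),
]
snapshots = discretize(timed_edges, window=5)
changes = [
    jaccard_change(snapshots[i], snapshots[i + 1])
    for i in range(len(snapshots) - 1)
]
```

With these sample edges there are three snapshots, and the second transition has no edge in common with the previous window, which is the kind of abrupt change a clustering step would then be asked to explain.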

Figure 8. Result of our decomposition algorithm on a subgraph of the "Hollywood graph" (actors graph) containing 421 movies. Our algorithm detected 404 of these movies.

Evaluation of clustering quality

Many real-world systems can be modeled as networks or graphs. Clustering algorithms that help us organize and understand these networks are usually referred to as graph-based clustering algorithms. Many algorithms exist in the literature for clustering network data, and evaluating the quality of the clusterings they produce is an important task addressed by different researchers. An important ingredient of these evaluations is the node-edge density of a cluster. We argue that evaluation methods based on density are heavily biased toward networks with dense components, such as social networks, and are not well suited for data sets with other network topologies where the nodes are not densely connected. Examples of such data sets are transportation networks and Internet networks. We justify our hypothesis by presenting examples from real-world data sets.
In [28], we present a new metric for evaluating the quality of a clustering algorithm that overcomes the limitations of existing cluster evaluation techniques. This new metric is based on the path length between the elements of a cluster and avoids judging quality based on cluster density. We show the effectiveness of the proposed metric by comparing its results with other existing evaluation methods on artificially generated and real-world data sets.
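The contrast between density and path length can be seen on a hub-and-spoke cluster, the typical shape of an airline network around a major airport. The sketch below is not the metric of [28]; it only computes the two underlying quantities, assuming an undirected graph stored as an adjacency dictionary. The function names and the toy star graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def density(nodes, adj):
    """Fraction of possible undirected edges present inside `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])
    return internal / (n * (n - 1) / 2)


def mean_path_length(nodes, adj):
    """Mean BFS distance over connected pairs in the subgraph induced by `nodes`."""
    nodes = set(nodes)
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Hypothetical star-shaped cluster: one hub connected to four leaves.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
star_density = density(star, star)
star_path_length = mean_path_length(star, star)
```

The star has a density of only 0.4, yet every node is at most two hops from any other (mean distance 1.6). A density-based score would rate this cluster poorly, while a path-length view recognizes it as tightly knit.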

Figure 9. Air traffic network drawn with Hong Kong at the center and some airports directly connected to it.

In [33], we design and study a multilevel modularity quality measure for clustered graphs that explicitly takes the nesting structure of clusters into account. Multilevel models appear crucial in the natural and social sciences. The multilevel modularity quality measure generalizes a modularity quality measure introduced by Mancoridis in the context of software reverse engineering. The measure we designed recursively traverses the hierarchy of clusters and computes a one-variable polynomial encoding the intra- and inter-cluster connectivity ratios appearing at all levels of a hierarchical clustering. The resulting polynomial reflects how the graph combines with the hierarchy of clusters and can be used to assess the quality of a hierarchical clustering. We discuss examples as a proof of concept.
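For orientation, the flat (single-level) measure being generalized can be sketched as follows. This assumes the commonly stated form of the Mancoridis measure for a directed graph split into k non-empty clusters: intra-connectivity $A_i = \mu_i / N_i^2$, inter-connectivity $E_{ij} = \varepsilon_{ij} / (2 N_i N_j)$, and $MQ = \frac{1}{k}\sum_i A_i - \frac{1}{k(k-1)/2}\sum_{i<j} E_{ij}$, with $MQ = A_1$ when $k = 1$. Summing over unordered cluster pairs is one reading of that definition. The function name `mq` and the toy graph are hypothetical, and the multilevel polynomial of [33] is not reproduced here.

```python
def mq(clusters, edges):
    """Flat MQ: mean intra-connectivity minus mean inter-connectivity.

    Assumptions: `clusters` is a list of non-empty, disjoint node lists and
    `edges` is a list of directed (u, v) pairs between clustered nodes.
    """
    k = len(clusters)
    label = {node: i for i, members in enumerate(clusters) for node in members}
    intra = [0] * k   # mu_i: edges with both endpoints in cluster i
    inter = {}        # epsilon_ij: edges between clusters i < j, both directions
    for u, v in edges:
        i, j = label[u], label[v]
        if i == j:
            intra[i] += 1
        else:
            key = (min(i, j), max(i, j))
            inter[key] = inter.get(key, 0) + 1
    sizes = [len(members) for members in clusters]
    a = sum(intra[i] / sizes[i] ** 2 for i in range(k)) / k
    if k == 1:
        return a
    e = sum(count / (2 * sizes[i] * sizes[j]) for (i, j), count in inter.items())
    return a - e / (k * (k - 1) / 2)


# Hypothetical toy graph: two clusters of two nodes each.
clusters = [["a", "b"], ["c", "d"]]
edges = [("a", "b"), ("b", "a"), ("c", "d"), ("b", "c")]
score = mq(clusters, edges)
```

A hierarchical variant would evaluate such intra- and inter-cluster ratios at every level of the cluster tree; how those ratios are combined into a polynomial is specific to [33].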

