Graphs, Texts and Statistics
Since the early work of Moreno in the 1930s, graph analysis has become an intensive area of research that is no longer limited to sociology.
Indeed, a wide range of data is now available in the form of graphs in our digital world, and impressive progress has been made in modeling and analyzing such data. In a recent work, Charles Bouveyron and Pierre Latouche propose a new statistical method, called STBM (Stochastic Topic Block Model), which clusters the nodes of a network with textual edges while simultaneously identifying the main topics of discussion. One can, for example, analyze text exchanges between people in a social network, or email exchanges within a firm.
From a mathematical point of view, STBM generalizes the Stochastic Block Model (SBM), dedicated to node clustering, and Latent Dirichlet Allocation (LDA), dedicated to text analysis. STBM was used to analyze the emails of Enron, the firm that went through a highly publicized bankruptcy in the early 2000s. It found that the network consisted of 10 groups of people and 5 discussion topics.
Figure 1 gives a visualization of the groups (by the color of the nodes) together with the topics (by the color of the links).
Alongside the expected topics related to the firm's activity, STBM highlighted topics 2 and 3, which turn out to be two aspects of the Enron scandal: the relationship between Enron, the White House and the Taliban, and Enron's involvement in the bankruptcy of the Edison firm.
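To make the combination of SBM and LDA more concrete, here is a minimal Python sketch of how a network with textual edges could be simulated under an STBM-like generative scheme. It is not the authors' implementation and performs no inference. The function name sample_stbm and the parameter names rho (group proportions), pi (connection probabilities between groups), theta (topic proportions for each pair of groups) and beta (word distributions for each topic) are assumptions chosen for illustration, and the topic proportions are taken as fixed inputs rather than drawn from a prior.

```python
import random


def sample_stbm(n_nodes, rho, pi, theta, beta, vocab, n_words=5, seed=0):
    """Simulate a directed network with textual edges (illustrative sketch).

    rho[q]         : probability that a node belongs to group q
    pi[q][r]       : probability of an edge from group q to group r
    theta[q][r][k] : proportion of topic k in messages from group q to group r
    beta[k][v]     : probability of word vocab[v] under topic k
    """
    rng = random.Random(seed)
    groups = list(range(len(rho)))
    topics = list(range(len(beta)))

    # SBM part: each node is assigned to one latent group.
    labels = rng.choices(groups, weights=rho, k=n_nodes)

    edges = {}
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue  # no self-loops
            q, r = labels[i], labels[j]
            # SBM part: the edge probability depends only on the two groups.
            if rng.random() < pi[q][r]:
                words = []
                for _ in range(n_words):
                    # LDA part: pick a topic for each word, then a word
                    # from that topic's distribution over the vocabulary.
                    k = rng.choices(topics, weights=theta[q][r])[0]
                    words.append(rng.choices(vocab, weights=beta[k])[0])
                edges[(i, j)] = words
    return labels, edges
```

Calling sample_stbm with, say, two groups and two topics returns the group label of each node and a dictionary mapping each edge (i, j) to the list of words exchanged. A method such as STBM works in the opposite direction: it starts from the observed edges and texts and estimates the groups and topics.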
Have a go at the Interactive Results for the Enron Email Network online.
Link to the detailed French version of the article
Reference:
C. Bouveyron, P. Latouche and R. Zreik, The Stochastic Topic Block Model for the Clustering of Networks with Textual Edges, Statistics and Computing, in press, 2017.
Contacts:
Charles Bouveyron | Mathématiques Appliquées à Paris 5 (MAP5) | UMR 8145 | CNRS & Université Paris Descartes.
Pierre Latouche | Laboratoire Statistique, Analyse, Modélisation Multidisciplinaire (SAMM) | EA 4543 | Université Paris 1 (Panthéon-Sorbonne).