Lectures

There will be five sets of lectures by the following distinguished lecturers:

Opinion Mining and Sentiment Analysis

Bing Liu

Abstract

Sentiment analysis, or opinion mining, is the computational study of people's opinions, sentiments, evaluations, attitudes, and emotions expressed in written text. It is one of the most active research areas in natural language processing, and is also widely studied in data mining and Web mining. Its popularity is mainly due to two reasons. First, it is useful to every organization and individual because opinions are central to almost all human activities and are key influencers of our behaviors. Whenever we need to make a decision, we want to hear others' opinions. That is why applications of sentiment analysis have flourished in recent years. Second, it presents many challenging research problems, which had never been attempted before the year 2000. Part of the reason for this earlier lack of study was that there was little opinionated text in digital form. It is thus no surprise that the inception and rapid growth of the field coincide with those of social media on the Web. This research has in fact spread beyond computer science to the management sciences and social sciences because of its importance to business and society as a whole. In this lecture, I will introduce the key problems of sentiment analysis, the main existing techniques, and their applications. I will also go further and discuss the problem of detecting fake and deceptive opinions in social media.

Biography

Bing Liu is a professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC, he was with the National University of Singapore. His current research interests include sentiment analysis and opinion mining, opinion spam detection, and social media modeling. He has published extensively in top conferences and journals in these areas, and has given many keynote and invited talks. His work has also received wide press coverage, including a front-page article in The New York Times. In 2012, he published a book titled "Sentiment Analysis and Opinion Mining" (Morgan and Claypool Publishers). Liu's earlier work was in the areas of data mining, Web mining, and machine learning, where he also published extensively in leading conferences and journals, and wrote a textbook titled "Web Data Mining: Exploring Hyperlinks, Contents and Usage Data" (Springer). In professional service, Liu has served as program chair of KDD, ICDM, CIKM, WSDM, SDM, and PAKDD, and as area/track chair or senior PC member of many natural language processing, Web technology, data mining, and Artificial Intelligence conferences. Additional information about him can be found at http://www.cs.uic.edu/~liub/.

Mobility Data Analysis and Mining

Fosca Giannotti

Abstract

The wireless networks that surround us, as a by-product of their normal operations, allow for sensing and collecting massive repositories of spatio-temporal data, such as the call detail records from mobile phones and the GPS tracks from car navigation devices, which represent society-wide proxies of human mobile activities. These big mobility data provide an unprecedentedly powerful social microscope, which helps us understand human mobility and discover the hidden patterns and profiles that characterize the trajectories we follow during our daily activity. The lecture will illustrate the basic methods of mobility data mining, designed to extract from the big mobility data the patterns of collective movement behavior, i.e., discover the subgroups of travelers characterized by a common purpose, and the profiles of individual movement activity, i.e., characterize the routine mobility of each traveler. We also present how mobility data mining can be combined with complex network analysis to address fascinating new questions, such as how to discover the geographical borders that emerge from the network of flows between any two zones in a territory, and how to measure to what extent mobility patterns shape and impact the social networks we inhabit.

Biography

Fosca Giannotti is a director of research at the Information Science and Technology Institute of the National Research Council, ISTI-CNR, Pisa, Italy. Her current research interests include spatio-temporal data mining, privacy-preserving data mining, social network analysis, and data mining query languages. She has been the coordinator of various European research projects, including the FP6-IST project GeoPKDD. She is a member of the steering committee of the FP7 European Coordination Action MODAP: Mobility, Data mining and Privacy. She is the author of more than one hundred publications and has served on the scientific committees of the main conferences in the area of Databases and Data Mining. She chaired ECML/PKDD 2004, the European Conf. on Machine Learning and Knowledge Discovery in Data Bases, and ICDM 2008, the IEEE Int. Conf. on Data Mining. Fosca Giannotti co-leads the KDD Lab (Knowledge Discovery and Data Mining Laboratory), a joint research initiative of the University of Pisa and ISTI-CNR.

Inference of State Machines

Frits Vaandrager

Abstract

Once they have high-level models of the behavior of software components, engineers can construct better software in less time. A key problem in practice, however, is the construction of models for existing software components, for which no or only limited documentation is available. I will present an overview of recent work by my group, done in close collaboration with the Universities of Dortmund and Uppsala, in which we use machine learning to infer state diagram models of embedded controllers and network protocols fully automatically through observation and test, that is, through black box reverse engineering. Starting from the well-known L* algorithm of Angluin, our aim is to develop algorithms for active learning of richer classes of (extended) finite state machines. Abstraction is the key when learning behavioral models of realistic systems. Hence, in practical applications, researchers manually define abstractions which, depending on the history, map a large set of concrete events to a small set of abstract events that can be handled by automata learning tools. Our work, which builds on earlier results from concurrency theory and the theory of abstract interpretation, shows how such abstractions can be constructed fully automatically for a restricted class of extended finite state machines in which one can test for equality of data parameters, but no operations on data are allowed. Our approach uses counterexample-guided abstraction refinement (CEGAR): whenever the current abstraction is too coarse and induces nondeterministic behavior, the abstraction is refined automatically. Using the LearnLib tool from Dortmund in combination with Tomte, a prototype implementation of our CEGAR algorithm, we have succeeded in learning models of several realistic software components, such as the TCP and SIP protocols, the new biometric passport, banking cards, and printer controllers.
Once we have learned a model of a software component, we may use model checking technology to analyze this model and model-based testing to automatically derive test suites. This allows us to check, for instance, whether new faults have been introduced in a modified version of the component (regression testing), whether an alternative implementation by some other vendor agrees with a reference implementation, or whether some communication protocol is secure.
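
As a rough illustration of the last point, the following Python sketch compares two learned models and reports an input sequence on which they disagree. It is not taken from LearnLib or Tomte: the dictionary encoding of a Mealy machine, the function name find_difference, and the toy "reference" and "variant" models are all assumptions made for this example, and both machines are assumed to be complete and deterministic over the same input alphabet.

```python
from collections import deque

# Assumed encoding (hypothetical, for illustration only):
# a Mealy machine is a dict  state -> {input: (output, next_state)}.

def find_difference(m1, s1, m2, s2, inputs):
    """Breadth-first search over pairs of states.

    Returns a shortest input sequence on which the two machines produce
    different outputs, or None if they agree on every input sequence.
    """
    seen = {(s1, s2)}
    queue = deque([(s1, s2, [])])
    while queue:
        a, b, word = queue.popleft()
        for i in inputs:
            out_a, next_a = m1[a][i]
            out_b, next_b = m2[b][i]
            if out_a != out_b:
                return word + [i]
            if (next_a, next_b) not in seen:
                seen.add((next_a, next_b))
                queue.append((next_a, next_b, word + [i]))
    return None

# Hypothetical toy models of a connection handshake.
reference = {
    "closed": {"open": ("ack", "opened"), "send": ("error", "closed")},
    "opened": {"open": ("error", "opened"), "send": ("ok", "opened")},
}
# The variant wrongly acknowledges a second "open".
variant = {
    "closed": {"open": ("ack", "opened"), "send": ("error", "closed")},
    "opened": {"open": ("ack", "opened"), "send": ("ok", "opened")},
}
INPUTS = ["open", "send"]

witness = find_difference(reference, "closed", variant, "closed", INPUTS)
print(witness)
```

A returned sequence can be replayed against the real components as a regression test; a result of None means the two models are indistinguishable by any input sequence.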

Biography

Frits Vaandrager is a full professor and principal investigator within the Institute for Computing and Information Sciences of the Radboud University Nijmegen. He received his Ph.D. from the University of Amsterdam in 1990. After postdoc positions at MIT, Cambridge, USA, and at the École des Mines, Sophia Antipolis, France, he was group leader at the Centrum voor Wiskunde en Informatica (CWI) in Amsterdam until his appointment in Nijmegen in 1995. Vaandrager has a strong interest in the development and application of theory, (formal) methods and tools for the specification and analysis of computer-based systems. In particular, he is interested in real-time embedded systems, distributed algorithms and protocols. Together with Lynch, Segala, and Kaynar he developed the (timed, probabilistic and hybrid) input/output automata formalisms, which are basic mathematical frameworks to support description and analysis of computing systems. He has been involved in more than 30 European and national research projects. Within many of these projects, formal verification and model checking technology has been applied to tackle practical problems from industrial partners. Currently, his main research interest is automata learning. Vaandrager is an editor of the journals Information & Computation and Logical Methods in Computer Science, and has been a PC member of leading conferences in the field such as CAV, TACAS, CONCUR, HSCC and RTSS.

Privacy Technologies: What Works, What Doesn't, and What Is To Be Done

Vitaly Shmatikov

(The University of Texas at Austin)

Abstract

These lectures will (1) survey the state of the art in privacy technologies, (2) analyze the mismatch between these technologies and users' privacy expectations, business requirements, and privacy laws and regulations, and (3) describe open research problems in domain-specific privacy protection, including privacy of health-care data and privacy in perceptual computing.

Biography

Vitaly Shmatikov is an associate professor of computer science at the University of Texas at Austin, where he leads the security and privacy group. Research papers published by Vitaly and his students have received multiple awards, including the 2008 PET Award for Outstanding Research in Privacy-Enhancing Technologies, the 2012 AT&T Best Applied Security Paper Award, and the Best Student Paper Awards at both the 2012 IEEE Symposium on Security and Privacy (Oakland) and the 2013 Network and Distributed System Security Symposium (NDSS). Before joining UT Austin in 2004, Vitaly worked at SRI International on formal methods for analyzing security protocols. He received his PhD in computer science and MS in management science and engineering from Stanford University.

Social Network Analysis: A Crash Mini-Course

Dino Pedreschi

(KDD Lab, Dept. of Computer Science, University of Pisa, Italy)

Abstract

Over the past decade there has been a growing public fascination with the complex “connectedness” of modern society. This connectedness is found in many contexts: in the rapid growth of the Internet and the Web, in the ease with which global communication now takes place, and in the ability of news and information as well as epidemics and financial crises to spread around the world with surprising speed and intensity. These are phenomena that involve networks and the aggregate behavior of groups of people; they are based on the links that connect us and the ways in which each of our decisions can have subtle consequences for the outcomes of everyone else. This crash mini-course is an introduction to the analysis of complex networks, with a special focus on the social network and its structure and function. Drawing on ideas from computing and information science, applied mathematics, economics and sociology, this lecture sketches the emerging field of study that is growing at the interface of all these areas, addressing fundamental questions about how the social, economic, and technological worlds are connected.

Syllabus

• Big graph data and social, information, biological and technological networks

• How real networks differ from random: node degree and long tails, social distance and small worlds, clustering and triadic closure. Comparing real networks and random graphs.

• Strong and weak ties, community structure and long-range bridges. Diffusion and epidemics. The strength of weak ties for the diffusion of information. The strength of the strong ties for the diffusion of innovation.

• The correlation between the social network and human mobility.
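
To make two of the syllabus notions concrete (node degree, and clustering as a measure of triadic closure), here is a minimal Python sketch. The four-node graph and the function name local_clustering are made up for this illustration and are not part of the course material; the local clustering coefficient of a node is computed as the fraction of pairs of its neighbours that are themselves linked.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are connected.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Nodes with fewer than two neighbours get 0.0 by convention.
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy network: a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Node degree is simply the number of neighbours.
degrees = {node: len(nb) for node, nb in adj.items()}

for node in sorted(adj):
    print(node, degrees[node], round(local_clustering(adj, node), 3))
```

In this toy graph, node a sits in a closed triangle, while node c has an unclosed triad through d, so its coefficient is lower. Real social networks typically show far higher clustering than random graphs with the same number of nodes and edges.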

Biography

Dino Pedreschi is a Professor of Computer Science at the University of Pisa, and a pioneering scientist in mobility data mining, social network mining and privacy-preserving data mining. He co-leads the Pisa KDD Lab (Knowledge Discovery and Data Mining Laboratory, http://kdd.isti.cnr.it), a joint research initiative of the University of Pisa and the Information Science and Technology Institute of the Italian National Research Council, and one of the earliest research labs centered on data mining. His research focus is on big data analytics and mining and their impact on society. He is a founder of the Business Informatics MSc program at Univ. Pisa, a course targeted at the education of interdisciplinary data scientists. Dino has been a visiting scientist at Barabasi Lab (Center for Complex Network Research) of Northeastern University, Boston, and earlier at the University of Texas at Austin, at CWI Amsterdam and at UCLA. In 2009, Dino received a Google Research Award for his research on privacy-preserving data mining.

12th Estonian Summer School on Computer and Systems Science
