- Mining Spammers in Social Media: Techniques and Applications
Lecturers: Xia Hu and Huan Liu
The rapid growth of social media has attracted billions of users, promoting information sharing and communication to a new stage. With the growing popularity of social media, spamming has become rampant on these newly emerged platforms. Many (fake) accounts, known as spammers, are employed to overwhelm other users with unwanted information and to launch various attacks, such as befriending victims and then surreptitiously grabbing their personal information, sneaking in ads to generate sales, and disseminating pornography, viruses, or phishing. Spamming is now a serious issue in almost every kind of social networking service. Therefore, characterizing and detecting spammers in social media can significantly improve the quality of user experience and promote the healthy use and development of social networking systems.
In the proposed tutorial, we provide a structured overview of recent developments in spammer detection in social media, as well as future research directions. In particular, we discuss basic concepts, fundamental techniques, methodologies, and applications in characterizing and detecting spammers in social media. We start with content analysis, which has proven effective in traditional spammer detection. Then we introduce the distinct feature of social spammer detection, in which social network analysis methods are extensively used. As a new dimension of spammer detection research, we further discuss the problem of spam campaigns in social media, including its concepts and applications.
Xia Hu Affiliation: Computer Science and Engineering, Arizona State University
Xia Hu is a research assistant in Computer Science and Engineering at Arizona State University. His research interests include cybersecurity, text analytics in social media, social network analysis, machine learning, and sentiment analysis. He has published nearly 30 papers in major academic venues, including WWW, SIGIR, WSDM, CIKM, IJCAI, AAAI, SDM, ACL, COLING, and RecSys. He was awarded Best Paper Shortlist in WSDM’13, University Graduate Fellowship, Machine Learning Summer School at Purdue Fellowship, SDM Doctoral Student Forum Fellowship, and various Student Travel Awards and Scholarships. Before joining ASU, he obtained his MSc and BSc from Beihang University. He also worked as a research intern at Microsoft Research and the National University of Singapore. His research has attracted a wide range of external government and industry sponsors, including NSF, ONR, AFOSR, Yahoo!, and Microsoft. Updated information is available at: http://www.public.asu.edu/~xiahu.
Huan Liu Affiliation: Computer Science and Engineering, Arizona State University
Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his Ph.D. in Computer Science at the University of Southern California and B.Eng. in Computer Science and Electrical Engineering at Shanghai JiaoTong University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the faculty at the National University of Singapore. He was recognized for excellence in teaching and research in Computer Science and Engineering at Arizona State University. His research interests are in data mining, machine learning, social computing, artificial intelligence, and investigating problems that arise in real-world, data-intensive applications with high-dimensional data of disparate forms, such as social media. His well-cited publications include books, book chapters, and encyclopedia entries as well as conference and journal papers. He serves on journal editorial boards and numerous conference program committees and is a founding organizer of the International Conference Series on Social Computing, Behavioral Cultural Modeling, and Prediction (http://sbp.asu.edu/). He is an IEEE Fellow and an ACM Distinguished Scientist. Updated information is available at: http://www.public.asu.edu/~huanliu .
- Social Media and Network Mining - Models, Systems and Applications
Lecturers: Ee-Peng Lim and Freddy Chong Tat Chua
Mining social media and social networks is now an important research topic due to the increasing presence of users' online activities. The data generated through these activities represent a wealth of information that business platforms can tap for marketing and decision-making purposes. As with other big data research, conducting social media/network data analytics research for consumer insights and turning those insights into useful business applications is challenging. In this tutorial, we shall describe some ongoing research efforts to address the above challenges. We shall cover three main topics, namely: (a) social media and network mining framework, (b) social media and network mining techniques, and (c) social media analytics systems and applications. We shall also showcase some working social analytics systems for Twitter data. The tutorial does not assume any data mining or machine learning background. It is tailored for both researchers and practitioners who would like to venture into the social media and network analytics area.
Ee-Peng Lim is a Professor of Information Systems at Singapore Management University (SMU). He received his Ph.D. from the University of Minnesota, Minneapolis in 1994 and his B.Sc. in Computer Science from the National University of Singapore. His research interests include social network and web mining, information integration, and digital libraries. He has authored and co-authored more than 250 refereed papers in international journals and conferences. He is the co-director of the Living Analytics Research Center (LARC), jointly established by Singapore Management University and Carnegie Mellon University to pursue research in experimentation-driven data and decision analytics. He is currently an Associate Editor of the IEEE TKDE, ACM TOIS, IPM, SNAM, JWE, IEEE Intelligent Systems, International Journal of Digital Libraries (IJDL) and International Journal of Data Warehousing and Mining (IJDWM). Professor Lim is also active in organizing international conferences. He was General Co-Chair of SocInfo2011, WICOW2008, and PAKDD2006, and Program Co-Chair of SocInfo2013, WI2009, JCDL2004, WIDM2001-2003, ICADL2002 and ICADL2004. He serves on the Steering Committees of the International Conference on Asian Digital Libraries (ICADL), the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), and the International Conference on Social Informatics. He was a member of the ACM Publications Board until December 2012.
Freddy Chua is a graduating PhD student in the School of Information Systems, Singapore Management University. He received his Bachelor of Computer Science from the School of Computing, National University of Singapore in 2007. He started the PhD programme in 2009 under the supervision of Professor Ee-Peng Lim. His main research interests are in the modeling of social networks. In 2011, he visited Professor William W. Cohen at Carnegie Mellon University to work on Information Extraction in Twitter. In the summer of 2012, he worked as an intern at Hewlett Packard Research Labs, Social Computing Group, with Sitaram Asur and Bernardo A. Huberman on Event Summarization in social media. He has published papers in top international computer science conferences and journals, including KDD, SDM, CIKM, ICDM, ICWSM and TKDE. He has also served as a program committee member of SocInfo2013 and SAC2014, and as an external reviewer for WSDM2014, WSDM2013, SDM2013 and ICDM2012.
- Research Issues and Challenges on Brain Informatics
Lecturer: Ning Zhong
Brain Informatics (BI) is a new interdisciplinary and multidisciplinary field that focuses on studying the mechanisms underlying the human information processing system. It brings together researchers and practitioners from diverse fields to explore the main research problems that lie in the interplay between the study of the human brain and informatics research, using powerful equipment, including functional magnetic resonance imaging (fMRI), electroencephalogram (EEG), positron emission tomography (PET), and eye-tracking, as well as various wearable, ubiquitous, active, micro and nano devices. The systematic BI methodology has resulted in brain big data, including various raw brain data, data-related information, extracted data features, discovered domain knowledge related to human intelligence, and so forth. In this talk, I demonstrate a systematic approach to an integrated understanding of the macroscopic and microscopic working principles of the brain by means of experimental, computational, and cognitive neuroscience studies, as well as advanced Web intelligence centric information technologies. I discuss research issues and challenges from three aspects of Brain Informatics studies that deserve closer attention: systematic investigations for complex brain science problems, new information technologies for supporting systematic brain science studies, and Brain Informatics studies based on Web intelligence research needs. These three aspects offer different ways to study traditional cognitive science, neuroscience, mental health and artificial intelligence.
Ning Zhong received his Ph.D. degree from the University of Tokyo. He is currently head of the Knowledge Information Systems Laboratory and a professor in the Department of Life Science and Informatics at Maebashi Institute of Technology, Japan. He is also director and an adjunct professor in the International WIC Institute (WICI), Beijing University of Technology. Prof. Zhong's present research interests include Web Intelligence (WI), Brain Informatics (BI), Data Mining, Granular Computing, and Intelligent Information Systems. In 2000 and 2004, Zhong and colleagues introduced WI and BI as new research directions, respectively. Currently, he is focusing on "WI meets BI" research with three aspects: (1) systematic investigations for complex brain science problems; (2) new information technologies for supporting systematic brain science studies; and (3) BI studies based on WI research needs. The synergy between WI and BI advances our ways of analyzing and understanding data, information, knowledge, and wisdom, as well as their interrelationships, organizations, and creation processes, to achieve human-level Web intelligence. In 2010, Zhong and colleagues extended this vision to develop the Wisdom Web of Things (W2T) as a holistic framework for computing and intelligence in the big data era.
- Managing the Quality of Crowdsourced Databases
Lecturer: Reynold Cheng
Crowdsourcing systems, such as the Amazon Mechanical Turk (AMT), CrowdFlower, and Facebook, have attracted a lot of interest from academia and industry. On these Internet-based platforms, a human worker performs jobs such as rating a physician, commenting on a product, and translating a sentence. These tasks are often difficult for a computer but easier for a human. Due to the growth of the Internet, a large amount of human-provided (or “crowdsourced”) information has been obtained, which enables interesting applications like product recommendation and spam detection.
In this proposal, we would like to discuss the management of the quality of crowdsourced databases. This is a very important issue, since the crowdsourced information may be incorrect – a worker may be careless or incapable of performing a task. This problem has to be overcome, or else applications depending on these data may make wrong decisions. We will first give an overview of crowdsourced databases. We then examine how existing techniques, collectively known as voting, are used to gauge the quality of these data.
While voting is simple, it may lead to wrong results, or ignore the fact that opinions can be split. This problem could be solved by the use of probabilistic databases, which store noisy and uncertain information. However, their use for this purpose has only recently been studied. In the second part of the tutorial, we will give an overview of probabilistic databases and how they are used to store, query, and measure the quality of crowdsourced data. We also point out possible directions and problems for handling crowdsourced information with probabilistic databases.
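The contrast between voting and a probabilistic representation can be illustrated with a minimal sketch. This is not the tutorial's method: the task names, the worker answers, and the choice of Shannon entropy as a quality measure are all assumptions made for illustration. Majority voting returns one answer per task, while keeping the full answer distribution preserves the information that workers disagreed.

```python
from collections import Counter
from math import log2


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def answer_distribution(answers):
    """Return the empirical probability of each distinct answer."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: count / total for answer, count in counts.items()}


def entropy(dist):
    """Shannon entropy in bits; 0.0 means the workers fully agree."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)


# Hypothetical worker answers for two labeling tasks (illustrative data only).
votes = {
    "task_1": ["spam", "spam", "spam", "spam", "spam"],
    "task_2": ["spam", "ham", "spam", "ham", "spam"],
}

for task, answers in votes.items():
    dist = answer_distribution(answers)
    print(task, majority_vote(answers), dist, round(entropy(dist), 3))
```

Both tasks receive the same majority label, yet the second task's distribution and higher entropy show a split in opinions that the single voted answer hides.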
Dr. Reynold Cheng is an Associate Professor of the Department of Computer Science in the University of Hong Kong. He was an Assistant Professor in HKU in 2008-11. He received his BEng (Computer Engineering) in 1998, and MPhil (Computer Science and Information Systems) in 2000, from the Department of Computer Science in the University of Hong Kong. He then obtained his MSc and PhD from Department of Computer Science of Purdue University in 2003 and 2005 respectively. Dr. Cheng was an Assistant Professor in the Department of Computing of the Hong Kong Polytechnic University during 2005-08. He was a visiting scientist in the Institute of Parallel and Distributed Systems in the University of Stuttgart during the summer of 2006.
Dr. Cheng was granted an Outstanding Young Researcher Award 2011-12 by HKU. He was the recipient of the 2010 Research Output Prize in the Department of Computer Science of HKU. He also received the U21 Fellowship in 2011. He received the Performance Reward in 2006 and 2007 from the Hong Kong Polytechnic University. He is the Chair of the Department Research Postgraduate Committee, and is the Vice Chairperson of the ACM (Hong Kong Chapter). He is a member of the IEEE, the ACM, the Special Interest Group on Management of Data (ACM SIGMOD), and the UPE (Upsilon Pi Epsilon Honor Society). He is also a guest editor for a special issue in TKDE. He was a keynote speaker at the First International Workshop on Quality of Context (QuaCon ’09). He received an Outstanding Service Award at the CIKM 2009 conference. He has served as a PC member and reviewer for international conferences and journals including TODS, TKDE, TMC, VLDBJ, IS, DKE, KAIS, VLDB, ICDE, ICDM, DEXA and DASFAA.
- Non-IIDness Learning in Big Data
Lecturer: Longbing Cao
Most existing data mining and machine learning algorithms are based on the IID assumption, which assumes that objects are independent of each other and identically distributed. In the real world, particularly in big data, objects are either loosely or tightly coupled with each other. The interactions, or coupling relationships, between objects are ubiquitous and occur at various levels: between objects, between attributes describing an object, and between attribute values within an attribute. On the other hand, the usual patterns identified by data mining are based on independent objects or items. In fact, due to the object coupling relationships, patterns are associated with each other in structural and/or semantic aspects. Pattern relationship analysis is often ignored.
This tutorial will explore the needs, challenges, and opportunities of non-IIDness learning in analyzing complex object and pattern relations. Following a framework for non-IID-based coupled object and pattern analysis, we will introduce several corresponding techniques: coupled object analysis to define and quantify the coupling relationships within and between objects and within and between attributes, combined pattern mining to identify a group of patterns coupled by certain relationships, coupled behavior analysis to analyse a group of actors' behaviors, and coupled ensemble clustering to cater for relations between clusterings. We will show how such new frameworks redefine the learning of complex data, behavior, relation, environment and pattern in clustering, pattern mining, classification and pattern relation learning. Further discussion will cover how knowledge engineering and the semantic web can be connected with non-IIDness learning for complex but actionable object and pattern relation analysis.
Dr Longbing Cao is a professor of information technology at the Faculty of Engineering and IT, University of Technology Sydney, Australia, and the founding Director of the university's Advanced Analytics Institute. Longbing was awarded a PhD in Computing Sciences and a PhD in Intelligent Sciences. Before joining UTS, Longbing had several years of research experience at the Chinese Academy of Sciences, and working experience in managing and leading industry and commercial projects in telecommunications, banking and publishing, as a manager or chief technical officer. Besides general interests in areas such as data mining, machine learning, artificial intelligence, multi-agent systems and software engineering, Longbing has initiated and now leads research on particular topics including behavior informatics and computing, non-IIDness learning, pattern relation learning, agent mining, and complex intelligent systems. He was one of the few people who started to talk about data science, and he established the Advanced Analytics Institute, dedicated to data science research, education and development. He established the first research degrees in analytics: Master of Analytics by Research and PhD Analytics.
He is very keen on bridging the gap between academia and industry, and has contributed tremendous effort to enterprise innovation and applications of data mining and behavior informatics in the real world. In Australia, Longbing has solid links with broad-based major business, industry, vendor and government organizations, leading and managing many projects in areas such as social security, taxation, banking, telecommunication, capital markets, insurance, the public sector and the airline business. Through this work, Longbing fosters a strong research culture of conducting cutting-edge and applied research inspired by challenging critical business and social problems, forming a strong interaction and balance between high quality Research, high calibre analyst Education, and high impact Development (the so-called RED model).
- [CANCELED] Towards Building Analytical Models for Monitoring Large-scale System
Lecturer: Jian Cao
We are sorry to announce that, for unexpected reasons, Tutorial 6 on May 16th has been canceled.
The core businesses of many companies rely on the support of their IT systems, which include large quantities of heterogeneous software and hardware resources. Maintaining the performance and functionality of the whole IT system when failures inevitably happen from time to time has become an ever bigger challenge for the IT administrators of these companies. A monitoring system helps IT administrators know the ever-changing states of resources and can speed up problem solving when a failure or an anomaly happens. Although a number of open source and commercial tools with varying capabilities are available on the market, they cannot satisfy requirements such as scalability, intelligent analysis and high-level decision-making support as the IT system becomes more and more complicated. To solve part of the problem of system monitoring, analytical models should be developed. In this tutorial, the challenges of developing these models will be introduced. Then three important topics will be discussed in detail: an event-based approach for information representation, processing and transferring; performance prediction algorithms; and anomaly detection algorithms.
Dr. Cao is a Professor in the Department of Computer Science and Technology at Shanghai Jiaotong University (SJTU). He is the director of the Morgan Stanley and SJTU Joint Research Center of Computing in Financial Service. He received his B.Sc. and Ph.D. from Nanjing University of Science and Technology (P.R. China) in 1997 and 2000 respectively. He was a Post-doctoral Research Fellow at Shanghai Jiaotong University from Jan 2000 to Dec 2001 and then joined SJTU. Dr. Cao's research interests include Network Computing and Service Computing. He has authored or co-authored over 100 journal and conference papers. Recently, he has been cooperating with several companies on research into developing the next-generation monitoring system, which has been deployed in real applications.
- Feature Engineering in Health Informatics
Lecturer: Fei Wang
Dr. Fei Wang is now a Research Staff Member in the healthcare analytics research group at the IBM T. J. Watson Research Center. He received his M.S. and Ph.D. degrees from the Department of Automation, Tsinghua University in 2008. After that, he spent one year in the School of Computing and Information Science, Florida International University as a postdoc and another year in the Department of Statistical Science, Cornell University as a postdoc. His research interests include semi-supervised learning, clustering, relational learning, optimization, social network analysis and healthcare data analytics. He has published over 100 papers at leading conferences such as SIGKDD, SIGIR, ICML, IJCAI, AAAI, SDM, and ICDM. He also serves as a referee for many distinguished journals including IEEE TPAMI, IEEE TKDE, DMKD, and ACM TKDD, and as a program committee member for many international conferences including KDD, ICDM and SDM. For more details, one can refer to his personal homepage at https://sites.google.com/site/feiwang03/.
Dr. Fei Wang has been very active in data mining in recent years. He has given the tutorials “Information and Knowledge Management with Matrices and Graphs” at CIKM2008, “Data Mining with Matrices and Graphs” at SDM2009 and ICDM2009, “Distance Metric Learning in Data Mining” at SDM 2012, “Recent Advances in Applied Matrix Technologies” at SDM 2013, “Applied Matrix Analytics: Recent Advances and Case Studies” at ICDM 2013, and “Large Scale Similarity Learning and Indexing” at CIKM 2013. A short version of this tutorial was invited for presentation at the University of Rochester Big Data Forum 2013 and the Stanford Biomedical Informatics Research Center, Stanford University, in 2013.