Bridging Collaborative Filtering and Semi-Supervised Learning
Bridging Collaborative Filtering and Semi-Supervised Learning: A Neural Approach for POI Recommendation. Authors: Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan, Jiawei Han (all University of Illinois, Urbana Champaign) Abstract: Recommender systems are one of the most popular data mining topics and keep drawing extensive attention from both academia and industry. Among them, POI (point of interest) recommendation is extremely practical but challenging: it greatly benefits both users and businesses in real life, but it is hard due to data scarcity and varied context. While a number of algorithms attempt to tackle the problem with respect to specific data and problem settings, they often fail when the scenarios change. In this work, we propose a general and principled SSL (semi-supervised learning) framework that alleviates data scarcity via smoothing among neighboring users and POIs, and handles varied context by regularizing user preference based on context graphs. To enable such a framework, we develop PACE (Preference And Context Embedding), a deep neural architecture that jointly learns the embeddings of users and POIs to predict both user preference over POIs and the various context associated with users and POIs. We show that PACE successfully bridges CF (collaborative filtering) and SSL by generalizing their de facto methods: matrix factorization for CF and graph Laplacian regularization for SSL. Extensive experiments on two real location-based social network datasets demonstrate the effectiveness of PACE. More on http://www.kdd.org/kdd2017/
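The abstract names two classical ingredients that PACE is said to generalize: matrix factorization and graph Laplacian regularization. The NumPy sketch below combines just those two in their simplest textbook form; it is not PACE itself and not the authors' code. The check-in matrix, the user friendship graph, the hyperparameters, and all function names are made up for illustration.

```python
import numpy as np

def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def objective(R, mask, U, V, L_user, lam):
    """Squared error on observed entries plus a smoothness penalty on users.

    trace(U^T L U) equals the sum over graph edges of ||u_i - u_j||^2,
    so it pulls the embeddings of neighboring users together.
    """
    err = mask * (R - U @ V.T)
    return np.sum(err ** 2) + lam * np.trace(U.T @ L_user @ U)

def fit(R, mask, adj_user, dim=2, lam=0.1, lr=0.01, steps=500, seed=0):
    """Plain gradient descent on the regularized factorization objective."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], dim))  # user embeddings
    V = 0.1 * rng.standard_normal((R.shape[1], dim))  # POI embeddings
    L_user = graph_laplacian(adj_user)
    history = []
    for _ in range(steps):
        history.append(objective(R, mask, U, V, L_user, lam))
        err = mask * (R - U @ V.T)
        grad_U = -2 * err @ V + 2 * lam * L_user @ U
        grad_V = -2 * err.T @ U
        U -= lr * grad_U
        V -= lr * grad_V
    history.append(objective(R, mask, U, V, L_user, lam))
    return U, V, history

# Toy data (hypothetical): 4 users x 5 POIs, 1 = visited.
R = np.array([[1, 0, 1, 0, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 1, 0],
              [0, 1, 0, 1, 1]], dtype=float)
mask = np.ones_like(R)          # 1 = observed entry, 0 = unknown
mask[0, 4] = 0
mask[2, 0] = 0
adj_user = np.array([[0, 1, 0, 0],   # users 0-1 and 2-3 are friends
                     [1, 0, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=float)

U, V, history = fit(R, mask, adj_user)
scores = U @ V.T                # predicted preference for every user-POI pair
```

The masked entries of `scores` are the ones a recommender would rank. A larger `lam` trades reconstruction accuracy for smoother embeddings across friends, which is the sense in which the graph term can compensate for sparse check-in data.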
Mining Knowledge from Databases: An Information Network Analysis Approach
Most people consider a database merely a data repository that supports data storage and retrieval. In fact, a database contains rich, inter-related, multi-typed data and information, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology. In this talk, we introduce database-oriented information network analysis methods and demonstrate how information networks can be used to improve data quality and consistency, facilitate data integration, and generate interesting knowledge. Moreover, we present case studies on real datasets, including DBLP and Flickr, and show how interesting and organized knowledge can be generated from database-oriented information networks.
Heterogeneous Network Embedding via Deep Architectures
Authors: Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, Thomas S. Huang Abstract: Data embedding is used in many machine learning applications to create low-dimensional feature representations that preserve the structure of data points in their original space. In this paper, we examine the scenario of a heterogeneous network with nodes and content of various types. Such networks are notoriously difficult to mine because of the bewildering combination of heterogeneous contents and structures. The creation of a multidimensional embedding of such data opens the door to the use of a wide variety of off-the-shelf mining techniques for multidimensional data. Despite the importance of this problem, limited efforts have been made on embedding a network of scalable, dynamic and heterogeneous data. In such cases, both the content and linkage structure provide important cues for creating a unified feature representation of the underlying network. In this paper, we design a deep embedding algorithm for networked data. A highly nonlinear multi-layered embedding function is used to capture the complex interactions between the heterogeneous data in a network. Our goal is to create a multi-resolution deep embedding function that reflects both the local and global network structures and makes the resulting embedding useful for a variety of data mining tasks. In particular, we demonstrate that the rich content and linkage information in a heterogeneous network can be captured by such an approach, so that similarities among cross-modal data can be measured directly in a common embedding space. Once this goal has been achieved, a wide variety of data mining problems can be solved by applying off-the-shelf algorithms designed for handling vector representations. Our experiments on real-world network datasets show the effectiveness and scalability of the proposed algorithm as compared to the state-of-the-art embedding methods.
ACM DL: http://dl.acm.org/citation.cfm?id=2783296 DOI: http://dx.doi.org/10.1145/2783258.2783296
Fake product review detection and removal using opinion mining
In our final-year project, we first apply VADER for sentiment analysis and then use our own classification method, based on a basic neural network, to label each review as suspicious, clear, or hazy. Each review is then annotated with this label together with its polarity, so the user can tell whether a flagged review is positive spam or negative spam.
TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams
Authors: Chao Zhang, Liyuan Liu, Dongming Lei, Quan Yuan, Honglei Zhuang (University of Illinois at Urbana-Champaign), Tim Hanratty (U.S. Army Research Lab), Jiawei Han (University of Illinois at Urbana-Champaign) Abstract: Detecting local events (e.g., protest, disaster) at their onsets is an important task for a wide spectrum of applications, ranging from disaster control to crime monitoring and place recommendation. Recent years have witnessed growing interest in leveraging geo-tagged tweet streams for online local event detection. Nevertheless, the accuracy of existing methods remains unsatisfactory for building reliable local event detection systems. We propose TrioVecEvent, a method that leverages multimodal embeddings to achieve accurate online local event detection. The effectiveness of TrioVecEvent is underpinned by its two-step detection scheme. First, it ensures a high coverage of the underlying local events by dividing the tweets in the query window into coherent geo-topic clusters. To generate quality geo-topic clusters, we capture short-text semantics by learning multimodal embeddings of the location, time, and text, and then perform online clustering with a novel Bayesian mixture model. Second, TrioVecEvent considers the geo-topic clusters as candidate events and extracts a set of features for classifying the candidates. Leveraging the multimodal embeddings as background knowledge, we introduce discriminative features that characterize local events well, which enables pinpointing true local events from the candidate pool with a small amount of training data.
We have used crowdsourcing to evaluate TrioVecEvent, and found that it improves the detection precision of the state-of-the-art method from 36.8% to 80.4% and the pseudo recall from 48.3% to 61.2%. More on http://www.kdd.org/kdd2017/
Structured Learning from Heterogeneous Behavior for Social Identity Linkage
A Survey of Link-oriented Prediction in Signed Social Media Networks
Presented by Manoj Krishnaraj, Delft University of Technology. Most social networks allow us to create only positive relations and can be called unsigned social networks. The emergence of signed social networks like Slashdot adds negative links to distinguish friends from enemies, creating richer relationships among the users. The literature on traditional unsigned networks needs to be revisited in order to be suitable for signed networks. Also, the implementation and meaning of a negative link vary across social media platforms, which prevents generalizing all of the properties. The study of signed social networks is on the rise, and recent work that incorporates negative links has shown improved prediction accuracy over unsigned social networks. In this survey, we review the latest literature on the prediction of sign, link, and tie strength in signed networks.
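To make the sign prediction task concrete, here is a minimal sketch of one classic heuristic, structural balance ("a friend of my friend is my friend, an enemy of my enemy is my friend"). It is offered only as an illustration of the task; the description above does not say which methods the survey covers. The sketch assumes an undirected signed graph, which is a simplification since links on sites like Slashdot are directed, and the user names and function names are hypothetical.

```python
from collections import defaultdict

def build_signed_graph(edges):
    """Store an undirected signed graph as {node: {neighbor: +1 or -1}}."""
    graph = defaultdict(dict)
    for u, v, sign in edges:
        graph[u][v] = sign
        graph[v][u] = sign
    return graph

def predict_sign(graph, u, v):
    """Predict the sign of the missing edge (u, v) from shared neighbors.

    Each common neighbor w votes with sign(u, w) * sign(w, v): the sign
    that would make the triangle u-w-v balanced. Returns +1, -1, or 0
    when there is no evidence or the votes tie.
    """
    neighbors_u = graph.get(u, {})
    neighbors_v = graph.get(v, {})
    common = set(neighbors_u) & set(neighbors_v)
    score = sum(neighbors_u[w] * neighbors_v[w] for w in common)
    if score == 0:
        return 0
    return 1 if score > 0 else -1

# Toy network (hypothetical): alice and bob share a friend and an enemy.
edges = [
    ("alice", "carol", +1),
    ("bob", "carol", +1),
    ("alice", "dave", -1),
    ("bob", "dave", -1),
    ("carol", "dave", -1),
]
graph = build_signed_graph(edges)
prediction = predict_sign(graph, "alice", "bob")
```

Here both common neighbors vote for a positive link between alice and bob, so `prediction` is `1`. The same vote count could serve as a crude tie strength score, though learned models typically combine many such triad features.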
NetSpam: A Network-Based Spam Detection Framework for Reviews in Online Social Media
