Machine Learning for Text (Springer), Authored by Charu Aggarwal, April 2018. -- Comprehensive textbook on machine learning for text.

PDF Download Link (Free for computers connected to subscribing institutions only)

Buy hard-cover or PDF (for the general public. The PDF has embedded links and can be loaded on a Kindle reader. Its equations read better on a Kindle e-reader than those in the Kindle edition from Amazon.)

This book covers machine learning techniques for text using both bag-of-words and sequence-centric methods. The scope of coverage is vast, and it includes traditional information retrieval methods and also recent methods from neural networks and deep learning. The chapters of this book can be organized into three categories:

Classical machine learning methods: These chapters discuss the classical machine learning methods such as matrix factorization, topic modeling, dimensionality reduction, clustering, classification, linear models, and evaluation. All these techniques treat text as a bag of words. Contextual learning methods that combine different types of text and also combine text with heterogeneous data types are covered.

Classical information retrieval and search engines: Although this book is focussed on text mining, the importance of retrieval and ranking methods in mining applications is quite significant. Therefore, the book covers the key aspects of information retrieval, such as data structures, Web ranking, crawling, and search engine design. Importance is given to different types of information retrieval scoring models and learning-to-rank techniques.

Sequence-centric, deep learning, and linguistic methods for mining: While the bag-of-words representation can be useful for traditional applications like classification and clustering, more advanced applications like machine translation, image captioning, opinion mining, information extraction, and text segmentation require one to treat text as a sequence. These chapters discuss sequence-centric mining methods such as deep learning techniques, word2vec, recurrent neural networks, LSTMs, maximum entropy Markov models, and conditional random fields. Custom methods for applications like text summarization, opinion mining, and event detection are also discussed.

The book can be used as a textbook, and it contains numerous examples and exercises. However, it is also designed to be useful to researchers and industrial practitioners. It therefore contains extensive bibliographic references for researchers, and the bibliographic section also contains software references for practitioners.

The book is available in both hardcopy and electronic form. The electronic version is available at this Springerlink pointer, which might allow you to download the book for free, depending on your institution's subscriptions. To attempt a free download, click from a computer directly connected to your institution's network. To be eligible, your institution must subscribe to "e-book package english Computer Science" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. The Springer site uses the domain name of your computer to regulate access. Sometimes you may be able to download it from your library e-collection, even when it is not Web-accessible from your institution. Members of eligible (subscribing) institutions might also be able to buy a low-cost paperback edition ($25 MyCopy edition) from the same Web page at which the free book can be downloaded.

Recommender Systems: The Textbook (Springer), Authored by Charu Aggarwal, April 2016. -- Comprehensive textbook on recommender systems.

Book page with book description, solution manual, and other resources

PDF Download Link (Free for computers connected to subscribing institutions only)

Buy hard-cover or PDF (for general public)

Buy low-cost paperback edition (Instructions for computers connected to subscribing institutions only)

This book covers the topic of recommender systems comprehensively, starting with the fundamentals and then exploring the advanced topics. The chapters of this book can be organized into three categories:

Algorithms and evaluation: These chapters discuss the fundamental algorithms in recommender systems, including collaborative filtering methods, content-based methods, knowledge-based methods, ensemble-based methods, and evaluation.

Recommendations in specific domains and contexts: The context of a recommendation can be viewed as important side information that affects the recommendation goals. Different types of context such as temporal data, spatial data, social data, tagging data, and trustworthiness are explored.

Advanced topics and applications: Various robustness aspects of recommender systems, such as shilling systems, attack models, and their defenses are discussed. In addition, recent topics, such as multi-armed bandits, learning to rank, group systems, multi-criteria systems, and active learning systems, are discussed together with applications.

Although this book is primarily written as a textbook, it is recognized that a large portion of the audience will comprise industrial practitioners and researchers. Therefore, the book is also designed to be useful from an applied and reference point of view. Numerous examples and exercises have been provided.

The book is available in both hardcopy and electronic form. The electronic version is available at this Springerlink pointer, which might allow you to download the book for free, depending on your institution's subscriptions. To attempt a free download, click from a computer directly connected to your institution's network. To be eligible, your institution must subscribe to "e-book package english Computer Science" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. The Springer site uses the domain name of your computer to regulate access. Sometimes you may be able to download it from your library e-collection, even when it is not Web-accessible from your institution. Members of eligible (subscribing) institutions might also be able to buy a low-cost paperback edition ($25 MyCopy edition) from the same Web page at which the free book can be downloaded. Here is a screenshot and description of what the download/MyCopy Web page will look like when you access it from a computer connected to a subscribing institution. Interestingly, you can use these methods for virtually any Springer book.

Data Mining: The Textbook (Springer), Authored by Charu Aggarwal, April 2015. -- Comprehensive textbook on data mining.

Book page with book description, solution manual, and other resources

The emergence of data science as a discipline requires the development of a book that goes beyond the traditional focus of books on fundamental data mining problems. More emphasis needs to be placed on the advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. This comprehensive data mining book explores the different aspects of data mining, starting from the fundamentals, and subsequently explores the complex data types and their applications. Therefore, this book may be used for both introductory and advanced data mining courses. The chapters of this book fall into one of three categories:

The fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems.

Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data.

Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor.

The book carefully balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners. Numerous illustrations, examples, and exercises are included with an emphasis on semantically interpretable examples.

The book is available in both hardcopy and electronic form. The electronic version is available at this Springerlink pointer, which might allow you to download the book for free, depending on your institution's subscriptions. To attempt a free download, click from a computer directly connected to your institution's network. To be eligible, your institution must subscribe to "e-book package english Computer Science" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. Members of eligible institutions might also be able to buy a low-cost hardcopy. See the explanation of the MyCopy program above for details.

Outlier Analysis (Springer), Authored by Charu Aggarwal, 2017. Comprehensive textbook on outlier analysis, including examples and exercises for classroom teaching. Most of the previous books on outlier detection were written by statisticians for statisticians, with little or no coverage from the data mining and computer science perspective. This book is intended to fill that gap. Each chapter contains key research content on the topic, case studies, extensive bibliographic notes, and future directions of research in this field.

Covers applications such as credit card fraud, network intrusion detection, and law enforcement.

Content is simplified so students and practitioners can also benefit from this book.

Chapters typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as text, categorical, mixed-attribute, time-series, streaming, discrete sequence, spatial, and network data; and key applications of these methods to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis. The second edition contains significant new material on one-class support vector machines, neural networks, matrix factorization, outlier ensembles, text outliers, and graph mining. It also has a solution manual to aid classroom teaching.

The original (first edition) book was selected among the best publications of 2013 by ACM Computing Reviews.

Table of Contents and Sample Chapters

Sample chapter on outlier detection for high dimensional data

Springer Link (For subscribing institutions click from within your institution network. If your institution is eligible, you will see a (free) `Download Book' button. Otherwise, you will see a (paid) `Get Access' button.)

Frequent Pattern Mining (Springer), Ed. Charu Aggarwal and Jiawei Han, September 2014. -- Comprehensive survey driven book on frequent pattern mining with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapters

Springerlink for Electronic Version (For subscribing institutions click from within your institution network. If your institution is eligible, you should see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button.)

Data Classification: Algorithms and Applications (CRC Press), Ed. Charu Aggarwal, June 2014. -- Comprehensive survey driven book on data classification with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

CRC Netbase Link for Electronic Book

Data Clustering: Algorithms and Applications (CRC Press), Ed. Charu Aggarwal, Chandan Reddy, August 2013. -- Comprehensive survey driven book on data clustering with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

Electronic Book from CRC Press Netbase

Healthcare Data Analytics (CRC Press), Ed. Chandan Reddy, Charu Aggarwal, June 2015. -- Comprehensive survey driven book on healthcare with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

CRC Netbase Link for Electronic Book

Outlier Detection for Temporal Data, by M. Gupta, J. Gao, C. Aggarwal, J. Han, 2014. Morgan and Claypool Publishers.

Managing and Mining Sensor Data Ed. C. Aggarwal, 2013. PDF on Springerlink

Mining Text Data. Ed. C. Aggarwal, C. Zhai, 2012. PDF on Springerlink

Social Network Data Analytics. Ed. C. Aggarwal, 2011. Here is the Introductory Chapter. PDF on Springerlink

Managing and Mining Graph Data (Springer) Ed. C. Aggarwal, H. Wang, 2010. PDF on Springerlink

Managing and Mining Uncertain Data (Springer) Ed. C. Aggarwal, 2009. PDF on Springerlink

Privacy-Preserving Data Mining: Models and Algorithms Ed. C. Aggarwal, P. Yu, 2008. Here is an Introductory Survey. PDF on Springerlink

Data Streams: Models and Algorithms Ed. C. Aggarwal, 2007. Survey Chapter on Synopsis Construction in Data Streams PDF on Springerlink

My podcast on data streams from IBM Research

J. Cadena, A. Vullikanti, and C. Aggarwal. On Dense Subgraphs in Signed Network Streams, ICDM Conference, 2016.

A. Khan and C. Aggarwal. Query-Friendly Compression of Graph Streams. ASONAM Conference, 2016.

P. Zhao, C. Aggarwal, and G. He. Link Prediction in Graph Streams. ICDE Conference, 2016.

C. Aggarwal, P. Zhao, and G. He. Edge Classification in Networks. ICDE Conference, 2016.

R. Hu, C. Aggarwal, S. Ma, J. Huai. An Embedding Approach to Anomaly Detection. ICDE Conference, 2016.

Z. Wu, C. Aggarwal, and J. Sun. The Troll-Trust Model for Ranking in Signed Networks. WSDM Conference, 2016.

L. Duan, C. Aggarwal, S. Ma, R. Hu, and J. Huai. Scaling up Link Prediction with Ensembles. WSDM Conference, 2016.

J. Tang, C. Aggarwal, and H. Liu. Recommendations in Signed Social Networks. World Wide Web Conference, 2016.

S. Sathe, and C. Aggarwal. LODES: Local Density Meets Spectral Outlier Detection. SDM Conference, 2016.

J. Tang, C. Aggarwal, and H. Liu. Node Classification in Signed Social Networks. SDM Conference, 2016.

K. Subbian, C. Aggarwal, J. Srivastava, and V. Kumar. Rare Class Detection in Networks. SDM Conference, 2015.

M.-H. Tsai, C. Aggarwal, and T. Huang. Towards Classification of Social Streams. SDM Conference, 2015.

J. Liu, C. Wang, J. Gao, Q. Gu, C. Aggarwal, L. Kaplan, and J. Han. GIN: A Clustering Model for Capturing Dual Heterogeneity in Networked Data. SDM Conference, 2015.

W. Feng, J. Han, J. Wang, C. Aggarwal, and J. Huang. StreamCube: Hierarchical Spatiotemporal Hashtag Clustering for Event Detection in the Twitter Stream. ICDE Conference, 2015.

S. Chang, G. Qi, C. Aggarwal, M. Weng, J. Zhou, and T. Huang. Factorized Similarity Learning in Networks, ICDM Conference, 2014. ** (Best Student Paper Award) **

S. Chang, C. Aggarwal and T. Huang. Learning Local Semantic Distances with Limited Supervision, ICDM Conference, 2014.

C. Aggarwal and K. Subbian. Evolutionary Network Analysis: A Survey, ACM Computing Surveys, July 2014.

M. Tsai, C. Aggarwal, and T. Huang. Ranking in Heterogeneous Social Media. WSDM Conference, 2014.

Y. Wu, C. Aggarwal, S. Ma, H. Wang. On Anomalous Hotspot Discovery in Graph Streams. ICDM Conference, 2013.

K. Subbian, C. Aggarwal, J. Srivastava. Content-centric Flow Mining for Influence Analysis in Social Streams. CIKM Conference, 2013 (Influence Analysis in Social Streams)

Q. Gu, C. Aggarwal, J. Han. Selective Sampling on Graphs for Classification. KDD Conference, 2013.

A. Khan, Y. Wu, C. Aggarwal, X. Yan. Fast Graph Search with Label Similarity. VLDB Conference, 2013.

K. Subbian, C. Aggarwal, J. Srivastava, P. Yu. Community Detection with Prior Knowledge. SDM Conference, 2013 (Community Detection with prior knowledge).

G. Qi, C. Aggarwal, T. Huang. Link Prediction across Networks by Cross-Network Biased Sampling, ICDE Conference, 2013. (Link Prediction across Networks)

L. Liu, R. Jin, C. Aggarwal, Y. Shen. Reliable Clustering on Uncertain Graphs. ICDM Conference, 2012 (Reliable clustering on uncertain graphs)

C. Aggarwal, Y. Xie, P. Yu. On Dynamic Link Inference in Heterogeneous Networks, SDM Conference, 2012. (Link Inference in Heterogeneous Networks)

G. Qi, C. C. Aggarwal, T. Huang. Community Detection with Edge Content in Social Media Networks, ICDE Conference, 2012 (Community Detection with Edge Content)

P. Zhao, C. Aggarwal, M. Wang. gSketch: On Query Estimation in Graph Streams, PVLDB 5(3): 193-204 (2011) (Query Estimation in Graph Streams)

C. Aggarwal, K. Subbian. Event Detection in Social Streams, SDM Conference, 2012 (Event Detection in Social Streams)

C. Aggarwal, S. Lin, P. Yu. On Influential Node Discovery in Dynamic Social Networks, SDM Conference, 2012 (Dynamic Influence Analysis)

Y. Sun, J. Han, C. Aggarwal, N. Chawla. When will it happen?-- Relationship Prediction in Heterogeneous Information Networks, WSDM Conference, 2012 (Predicting the time at which a link will appear in a network)

G. Qi, C. Aggarwal, T. Huang. On Clustering Heterogeneous Social Media Objects with Outlier Links, WSDM Conference, 2012 (Clustering Social Media Objects in Noisy Networks)

Y. Sun, C. Aggarwal, J. Han. Relation Strength aware clustering of Heterogeneous Information Networks with Incomplete Attributes, VLDB Conference, 2012 (Relation-strength aware clustering of heterogeneous information networks)

C. Aggarwal, Y. Li, P. Yu. On the Hardness of Graph Anonymization. ICDM Conference, 2011 Presentation slides (Hardness of Graph Anonymization-- Re-identification Attacks are easy in graphs!)

G. Qi, C. Aggarwal, Q. Tian, H. Ji, T. Huang. Exploring Content and Context Links in Social Media: A Latent Space Method. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear. (Image Classification in Social Media with Content and Social Context Information)

M. Gupta, C. Aggarwal, J. Han. Finding top-k shortest path distance changes in an evolutionary network. SSTD Conference, 2011. (Finding evolution in distances in a network)

R. Jin, L. Liu, C. Aggarwal. Discovering highly reliable subgraphs in uncertain graphs. KDD Conference, 2011 (Highly reliable subgraph mining in uncertain graphs)

C. Aggarwal, A. Bar-Noy, S. Shamoun. On Sensor Selection in Linked Information Networks. DCOSS Conference, 2011 (Sensor Selection in Linked Information Networks)

M. Gupta, C. Aggarwal, J. Han, Y. Sun. On Evolutionary Clustering and Analysis in Heterogeneous Bibliographic Networks. ASONAM Conference, 2011 ** (Best Paper Award) **

C. Aggarwal, A. Khan, X. Yan. On Flow Authority Discovery in Social Networks. Proceedings of the SDM Conference, 2011 Presentation slides (Finding Influential Nodes in Social Networks)

C. Aggarwal, N. Li. On Node Classification in Dynamic Content-based Networks. Proceedings of the SDM Conference, 2011 Presentation slides (Node Classification in Dynamic Networks with the use of both node text and links)

C. Aggarwal, Y. Xie, P. Yu. Towards Community Detection in Locally Heterogeneous Networks. Proceedings of the SDM Conference, 2011 Presentation slides (Finding Communities in a Locally Heterogeneous Network)

C. Aggarwal. On Classification of Graph Streams. Proceedings of the SDM Conference, 2011 Presentation slides (Classification of Graph Streams)

V. Lee, N. Ruan, R. Jin, C. Aggarwal. A Survey of Algorithms for Dense Subgraph Discovery, Managing and Mining Graph Data, Springer, 2010 (Dense subgraph mining survey)

C. Li, C. C. Aggarwal, J. Wang. On Anonymization of Multigraphs. Proceedings of the SDM Conference, 2011. (Anonymization of Multigraphs)

C. Aggarwal, Y. Zhao, P. Yu. Outlier Detection in Graph Streams. Proceedings of the ICDE Conference, 2011 Presentation slides (Outlier Detection in Graph Streams)

C. Aggarwal, H. Wang. On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval. Proceedings of the ICDE Conference, 2011 Presentation slides (Indexing and Dimensionality Reduction of Massive Disk-resident Graphs)

C. Aggarwal, and T. Abdelzaher. Social Sensing. Book Chapter in Managing and Mining Sensor Data, Springer, 2013. (Extended version of Chapter below)

C. Aggarwal, and T. Abdelzaher. Integrating Sensors and Social Networks, Book Chapter in Social Network Data Analytics, Springer, 2011.

C. Aggarwal, Y. Li, P. Yu, R. Jin. On Dense Pattern Mining in Graph Streams. Proceedings of the VLDB Conference, 2010 (Dense Pattern Mining in Graph Streams)

C. Aggarwal, Y. Zhao, P. Yu. On Clustering Graph Streams. Proceedings of the SDM Conference, 2010 (Clustering Graph Streams)

C. Aggarwal, Y. Xie, P. Yu. GConnect: A Connectivity Index for Massive Disk-Resident Graphs. Proceedings of the VLDB Conference, 2009 Presentation slides in PDF (A Connectivity Index)

M. J. Zaki, C. C. Aggarwal. XRules: An Effective Structural Classifier for XML Data. Proceedings of the KDD Conference, 2003. (Structural Classification of XML documents: can also be used for graphs).

C. C. Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, M. J. Zaki. XProj: A Framework for Structural Projected Clustering of XML Documents. Proceedings of the ACM KDD Conference, 2007 (Structural Clustering of XML Documents: can also be used for graphs)

C. C. Aggarwal, P. S. Yu. Online Analysis of Community Evolution in Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2005. (Community Detection and Evolution in Graph and Social Network Edge Streams)

A. Khan and C. Aggarwal. Query-Friendly Compression of Graph Streams. ASONAM Conference, 2016.

K. Subbian, C. Aggarwal and J. Srivastava. Querying and Tracking Influencers in Social Streams. Web Search and Data Mining Conference, 2016.

A. Haque, L. Khan, M. Barony, B. Thuraisingham, and C. Aggarwal. Efficient Handling of Concept Drift and Concept Evolution over Stream Data. IEEE ICDE Conference, 2016.

P. Zhao, C. Aggarwal, and G. He. Link Prediction in Graph Streams. ICDE Conference, 2016.

M.-H. Tsai, C. Aggarwal, and T. Huang. Towards Classification of Social Streams. SDM Conference, 2015.

W. Feng, J. Han, J. Wang, C. Aggarwal, and J. Huang. StreamCube: Hierarchical Spatiotemporal Hashtag Clustering for Event Detection in the Twitter Stream. ICDE Conference, 2015.

Y. Wu, C. Aggarwal, S. Ma, H. Wang. On Anomalous Hotspot Discovery in Graph Streams. ICDM Conference, 2013.

K. Subbian, C. Aggarwal, J. Srivastava. Content-centric Flow Mining for Influence Analysis in Social Streams. CIKM Conference, 2013 (Influence Analysis in Social Streams)

P. Zhao, C. Aggarwal, M. Wang. gSketch: On Query Estimation in Graph Streams, PVLDB 5(3): 193-204 (2011) (Query Estimation in Graph Streams)

C. Aggarwal, K. Subbian. Event Detection in Social Streams, SDM Conference, 2012 (Event Detection in Social Streams)

C. Aggarwal. On Classification of Graph Streams. Proceedings of the SDM Conference, 2011 Presentation slides (Classification of Graph Streams)

C. Aggarwal, Y. Zhao, P. Yu. Outlier Detection in Graph Streams. Proceedings of the ICDE Conference, 2011 Presentation slides (Outlier Detection in Graph Streams)

C. Aggarwal, Y. Xie, P. Yu. On Dynamic Data-Driven Selection of Sensor Streams. KDD Conference, 2011 (Sensor Selection in Dynamic Scenarios)

C. Aggarwal, Y. Li, P. Yu, R. Jin. On Dense Pattern Mining in Graph Streams. Proceedings of the VLDB Conference, 2010 (Dense Pattern Mining in Graph Streams)

C. Aggarwal, Y. Zhao, P. Yu. On Clustering Graph Streams. Proceedings of the SDM Conference, 2010 (Clustering Graph Streams)

C. Aggarwal, Y. Zhao, P. Yu. On Classification of High-Cardinality Data Streams. Proceedings of the SIAM Data Mining Conference, 2010 (Classification of high Cardinality Graph Streams)

C. Aggarwal. On Segment Based Stream Modeling and its Applications. Proceedings of the SDM Conference, 2009 (Method for Segment-based Stream Summarization and its Applications)

D. Thomas, R. Bordawekar, C. Aggarwal, P. Yu. On Efficient Query-Processing of Stream Counts on the Cell Processor. Proceedings of the ICDE Conference, 2009 (Method for Parallelizing Sketches on the Multi-Core Cell Processor)

C. C. Aggarwal. A Framework for Clustering Massive-Domain Data Streams. Proceedings of the ICDE Conference, 2009 (Method for Massive-Domain Clustering).

C. C. Aggarwal, P. S. Yu. LOCUST: An Online Analytical Processing Framework for High Dimensional Classification of Data Streams. Proceedings of the ICDE Conference, 2008 (Extending Lazy Learning to High Dimensional Stream Classification).

C. C. Aggarwal. On Biased Reservoir Sampling in the Presence of Stream Evolution. Proceedings of the VLDB Conference, 2006 PDF of Presentation Slides (Reservoir Sampling for Evolving Data Streams).

C. C. Aggarwal, P. S. Yu. A Survey of Synopsis Construction Algorithms in Data Streams. Data Streams: Models and Algorithms ed. C. Aggarwal, Springer. (Survey on Synopsis Construction Methods - Reservoir Sampling, wavelets, histograms, sketches)

C. C. Aggarwal, P. S. Yu. On String Classification in Data Streams. Proceedings of the ACM KDD Conference, 2007. (String Classification in Data Streams)

C. C. Aggarwal. A Framework for Classification and Segmentation of Massive Audio Data Streams. Proceedings of the ACM KDD Conference, 2007. (Micro-clustering for speaker recognition)

C. C. Aggarwal, P. Yu. A Framework for Clustering Massive Text and Categorical Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2006 Here is the extended version in KAIS journal which combines clustering and outlier detection in text streams. (Text and Categorical Clustering and outlier detection in Data Streams).

C. C. Aggarwal. On Futuristic Query Processing in Data Streams. Proceedings of the EDBT Conference, 2006 (Query Processing of future stream behavior).

C. C. Aggarwal. On Abnormality Detection in Spuriously Populated Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2005. (Detecting Abnormal Events in Noisy Data Streams)

C. C. Aggarwal, J. Han, J. Wang, P. Yu. A Framework for High Dimensional Projected Clustering of Data Streams. Proceedings of the VLDB Conference, 2004 (Projected Clustering of High Dimensional Data Streams).

C. C. Aggarwal, J. Han, J. Wang, P. Yu. On Demand Classification of Data Streams. Proceedings of the ACM KDD Conference, 2004 (Classifying a point from an evolving data stream with the most optimized model when you receive it.)

C. C. Aggarwal, J. Han, J. Wang, P. Yu. A Framework for Clustering Evolving Data Streams. Proceedings of the VLDB Conference, 2003 (An OLAP Framework for Clustering Data Streams.)

C. C. Aggarwal. A Framework for Diagnosing Changes in Evolving Data Streams. Proceedings of the ACM SIGMOD Conference, 2003. (Change Detection in Data Streams with diagnosis and visualization capability).

C. C. Aggarwal. An Intuitive Framework for Understanding Changes in Evolving Data Streams. Proceedings of the ICDE Conference, 2002 (Detecting Change in Data Streams (summary))

G. Qi, C. Aggarwal, and T. Huang. Online Community Detection in Social Sensing. WSDM Conference, 2013.

C. Aggarwal, J. Han. A Survey of RFID Data Processing, Managing and Mining Sensor Data, Springer, 2013 (Survey on RFID Data Processing)

C. Aggarwal, N. Ashish, A. Sheth. The Internet of Things: A Survey from the Data-Centric Perspective, Managing and Mining Sensor Data, Springer, 2013 (Survey on Data Processing Issues in the Internet of Things)

C. Aggarwal, Y. Xie, P. Yu. On Dynamic Data-Driven Selection of Sensor Streams. KDD Conference, 2011 (Sensor Selection in Dynamic Scenarios)

C. Aggarwal, and T. Abdelzaher. Social Sensing. Book Chapter in Managing and Mining Sensor Data, Springer, 2013. (Extended version of Chapter below)

C. Aggarwal, and T. Abdelzaher. Integrating Sensors and Social Networks, Book Chapter in Social Network Data Analytics, Springer, 2011.

C. Aggarwal, A. Bar-Noy, S. Shamoun. On Sensor Selection in Linked Information Networks. DCOSS Conference, 2011 (Sensor Selection in Linked Information Networks)

C. Aggarwal, Y. Li, P. Yu. On the Hardness of Graph Anonymization. ICDM Conference, 2011 (Hardness of Graph Anonymization-- Re-identification Attacks are easy in graphs!)

C. Li, C. C. Aggarwal, J. Wang. On Anonymization of Multigraphs. SDM Conference, 2011. (Anonymization of Multi-graphs)

C. C. Aggarwal. On Unifying Privacy and Uncertain Data Models. ICDE Conference, 2008.

C. C. Aggarwal, P. S. Yu. On Privacy-Preservation of Text and Sparse Binary Data with Sketches. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal, P. S. Yu. On Anonymization of Strings. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal. On Randomization, Public Information, and the Curse of Dimensionality. ICDE Conference, 2007.

C. C. Aggarwal. On k-anonymity and the curse of dimensionality. VLDB Conference, 2005. Slides. In PDF format. (The line between quasi-identifiers and sensitive attributes is often unclear because of partial knowledge. An analysis in high dimensionality when a large fraction of attributes is included in the anonymization process- the curse is ubiquitous!)

D. Agrawal, C. C. Aggarwal. On the design and quantification of Privacy Preserving Data Mining. ACM PODS Conference, 2001. (Perturbation Approach to Privacy Preserving Data Mining in a General Environment).

C. C. Aggarwal, S. Parthasarathy. Mining Massively Incomplete Data Sets by Conceptual Reconstruction. ACM KDD Conference, 2001. (Privacy Preserving Data Mining, when many values are hidden or incomplete.)

C. C. Aggarwal, J. Pei, B. Zhang. On Privacy Preservation against Adversarial Data Mining. ACM KDD Conference, 2006. (Privacy Preserving Data Mining with a Data Mining Proficient Adversary)

C. C. Aggarwal, P. S. Yu. On Variable Constraints in Privacy Preserving Data Mining. Proceedings of the ACM SIAM Conference on Data Mining, 2005. (Privacy Preserving Data Mining with Personalized Levels of Anonymity)

C. C. Aggarwal, P. S. Yu. A Condensation Based Approach to Privacy Preserving Data Mining. Proceedings of the EDBT Conference, 2004. (Condensation Approach to Privacy in a Trusted Server Environment.)

Y. Zhao, C. Aggarwal, P. Yu. On Wavelet Decomposition of Uncertain Time Series Data Sets, ACM CIKM Conference, 2010. (Wavelet Decomposition of Uncertain Data)

C. Aggarwal. On Multidimensional Sharpening of Uncertain Data. Proceedings of the SIAM Conference on Data Mining, 2010 (Sharpening Multidimensional Uncertain Data with PCA)

C. C. Aggarwal, Yan Li, Jianyong Wang, Jing Wang. Frequent Pattern Mining with Uncertain Data. ACM KDD Conference, 2009. Presentation Slides

C. C. Aggarwal, P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. IEEE TKDE, May 2009.

C. C. Aggarwal, P. S. Yu. Outlier Detection with Uncertain Data. SIAM Data Mining Conference, 2008.

C. C. Aggarwal, P. S. Yu. On Indexing High Dimensional Data with Uncertainty. (11-page version) SIAM Data Mining Conference, 2008. (2-page poster version appears in ICDE Conference, 2008).

C. C. Aggarwal, P. S. Yu. A Framework for Clustering Uncertain Data Streams. ICDE Conference, 2008.

C. C. Aggarwal. On Density Based Transforms for Uncertain Data Mining. ICDE Conference, 2007.

S. Sathe and C. Aggarwal. Subspace Outlier Detection in Linear Time with Randomized Hashing. ICDM Conference, 2016. (Subspace histograms for outlier detection)

C. C. Aggarwal. Towards Local Supervised Dimensionality Reduction of High Dimensional Data. Proceedings of the ACM SIAM Conference on Data Mining, 2006 (Explores Local Supervised Dimensionality Reduction).

C. C. Aggarwal. Towards Systematic Design of Distance Functions for Data Mining Applications. Proceedings of the ACM KDD Conference, 2003. (Framework for tailoring distance functions to the summary characteristics of high dimensional data sets and user preferences.)

C. C. Aggarwal. Hierarchical Subspace Sampling: A Unified Framework for High Dimensional Data Reduction, Selectivity Estimation and Nearest Neighbor Search. Proceedings of the ACM SIGMOD Conference, 2002. (Subspace Sampling for Projected Clustering and Local Dimensionality Reduction)

C. C. Aggarwal, A. Hinneburg, D. A. Keim. On the Surprising Behavior of Distance Metrics in High Dimensional Space. International Conference on Database Theory (ICDT Conference), 2001. (Manhattan Metric is better than the Euclidean Metric. Fractional Metrics are even better.)

A. Hinneburg, C. C. Aggarwal, D. A. Keim. What is the nearest neighbor in high dimensional spaces? Proceedings of the VLDB Conference, 2000. (Discusses the roots of high dimensional sparsity.)

C. C. Aggarwal. A Human Computer Interactive Method for Projected Clustering. IEEE Transactions on Knowledge and Data Engineering. 16(4), pp 448--460, 2004. (Extended Version: IPCLUS: An Interactive Projected Clustering Algorithm).

C. C. Aggarwal, P. Yu. Finding Generalized Projected Clusters in High Dimensional Spaces. ACM SIGMOD Conference, 2000. (Finds non-axis parallel projected clusters. Also known as local dimensionality reduction.)

C. C. Aggarwal, C. Procopiuc, J. Wolf, P. Yu, J. Park. Fast Algorithms for Projected Clustering. ACM SIGMOD Conference, 1999. (Finds axis parallel projected clusters.)

C. C. Aggarwal, P. Yu. Outlier Detection for High Dimensional Data. ACM SIGMOD Conference, 2001. (Methods for projected outlier search.)

C. C. Aggarwal. Re-designing distance functions and distance based applications for high dimensional data. ACM SIGMOD Record, March 2001. (A summary paper on high dimensional data mining.)

C. C. Aggarwal. On the Effects of Dimensionality Reduction on High Dimensional Similarity Search. ACM PODS Conference, 2001. (Dimensionality Reduction Effects on Similarity Search.)

C. C. Aggarwal. On Point Sampling versus Space Sampling for Dimensionality Reduction. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal. Towards Exploratory Test Instance Specific Algorithms for High Dimensional Classification. Proceedings of the ACM KDD Conference, 2005. (Test Instance specific visual exploration and classification to find diagnostic classification causality of individual test instances.)

C. C. Aggarwal. Towards Meaningful Nearest Neighbor Search by Human-Computer Interaction. Proceedings of the ICDE Conference, 2002 (Visual nearest neighbor search by projections.)

C. C. Aggarwal. Towards Effective and Interpretable Data Mining by Visual Interaction. Proceedings of the ACM KDD Explorations, January 2002. (Summary paper on visual data mining.)

C. C. Aggarwal. A Human-Computer Cooperative System for Effective High Dimensional Clustering. Proceedings of the KDD Conference, 2001. (Visual methods for projected clustering- IPCLUS - an interactive projected clustering algorithm.)

C. C. Aggarwal. Collaborative Crawling: Mining User Experiences for Topical Resource Discovery. Proceedings of the KDD Conference, 2002. (Focussed Crawling by learning user access behavior.)

C. C. Aggarwal, F. Al-Garawi, P. Yu. Intelligent Crawling on the World Wide Web with Arbitrary Predicates. WWW Conference, 2001 (Focussed Crawling by learning linkage patterns.)

C. C. Aggarwal. On Learning Strategies for Topic-Specific Web Crawling. Next Generation Data Mining Applications. Edited by: Zurada and Kantardzic, Published by IEEE. ISBN 0-471-65605-4. (Book Chapter on Focussed Crawling by learning and other adaptive strategies.)

C. C. Aggarwal, J. L. Wolf, P. S. Yu. Caching on the Worldwide Web. IEEE Transactions on Knowledge and Data Engineering, Vol 11, No 1, January 1999. (Miscellaneous performance analysis paper related to the web but not data mining.)

C. C. Aggarwal, J. L. Wolf, K. L. Wu, P. Yu. Horting Hatches an Egg: A New Graph Theoretic Approach for Collaborative Filtering. ACM KDD Conference, 1999. (A Collaborative Filtering Paper.)

C. C. Aggarwal, P. S. Yu. A System for Automated Personalization of Web Portals. Proceedings of the VLDB Conference, 2002. (Methods for Targeted advertising and Portal Personalization.)

C. C. Aggarwal, P. S. Yu. Online Auctions: There can be only one. IEEE CEC Conference, 2009. (A mathematical analysis of the network effect in online auctions.) Slides with audio: click the audio button on each slide.

Extended Version as IBM Research Report: On the Network Effect in Web 2.0 Applications, IBM Research Report, RC24842, 2009.

R. C. Agarwal, C. C. Aggarwal, V. V. V. Prasad. A Tree Projection Algorithm For Generation of Frequent Itemsets. Journal on Parallel and Distributed Computing, (Special Issue on High Performance Data mining), 2001. (Tree Projection Algorithm)

R. C. Agarwal, C. C. Aggarwal, V. V. V. Prasad. Depth First Generation of Long Patterns. KDD Conference, 2000. (Depth First Version of Tree Projection)

C. C. Aggarwal. Towards Long Pattern Generation in Dense Databases. ACM SIGKDD Explorations, Volume 3, Issue 1, 2001. (Summary Paper on the Topic.)

C. C. Aggarwal, P. Yu. Online Generation of Association Rules. ICDE Conference, 1998. (OLAP framework for association rule mining.)

C. C. Aggarwal, P. Yu. A New Framework for Itemset Generation. ACM PODS Conference, 1998. (Mining interesting itemsets).

C. C. Aggarwal, C. Procopiuc, P. Yu. Finding Localized Associations in Market Basket Data. Proceedings of the IEEE TKDE Journal, March 2002 (Magnifying the association rule discovery process by segmenting the data in a way which is friendly to association discovery.)

C. C. Aggarwal, W. Lin, P. Yu. Searching by Corpus with Fingerprints. EDBT Conference, 2012 (Searching text collections when the query is a full document or set of documents rather than a set of keywords)

C. C. Aggarwal, P. Zhao. Towards Graphical Models for Text Processing. KAIS Journal, to appear, 2013. Poster version in SIGIR. (New paradigm for text representation with graphical models; retains some word sequence information)

C. C. Aggarwal, Y. Zhao, P. Yu. On Text Clustering with Side Information. ICDE Conference, 2012 (Text Clustering with Side Information)

C. C. Aggarwal, C. Zhai. A survey of Text Clustering Algorithms. Mining Text Data, Ed. C. Aggarwal, C. Zhai, Springer 2012. (Survey on Text Clustering)

C. C. Aggarwal, C. Zhai. A survey of Text Classification Algorithms. Mining Text Data, Ed. C. Aggarwal, C. Zhai, Springer 2012. (Survey on Text Classification)

C. C. Aggarwal, P. Yu. On Effective Conceptual Indexing and Similarity Search in Text Data. IEEE International Conference on Data Mining (ICDM Conference), 2001. (New representation of text which provides effective and efficient similarity search).

G. Qi, C. Aggarwal, Q. Tian, H. Ji, T. Huang. Exploring Content and Context Links in Social Media: A Latent Space Method. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear. (Image Classification in Social Media with Content and Social Context Information)

G. Qi, C. C. Aggarwal, Y. Rui, Q. Tian, S. Chang, T. Huang. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts. CVPR Conference, 2011. (Transfer Learning across different categories in multimedia data)

C. C. Aggarwal, S. Gates, P. Yu. On the Merits of Using Supervised Clustering for Building Categorization Systems. ACM KDD Conference, 1999. (Partial Supervision for Text Classification).

C. C. Aggarwal, P. Yu. A Framework for Clustering Massive Text and Categorical Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2006. An extended version in the KAIS journal combines clustering and outlier detection in text streams. (Text and Categorical Clustering of High Dimensional Data Streams).

G. Qi, C. C. Aggarwal, T. Huang. Towards Semantic Knowledge Propagation from Text Corpus to Web Images. WWW Conference, 2011. Presentation Slides (Transfer Learning from Text to Images with Linkage Hints)

G. Qi, C. C. Aggarwal, T. Huang. Transfer Learning of Distance Metrics by Cross-Domain Metric Sampling across Heterogeneous Domains, SDM Conference, 2012. Presentation Slides (Distance Metric Transfer Learning)

C. C. Aggarwal, P. Yu. The IGrid Index: Reversing the Dimensionality Curse for Similarity Indexing in High Dimensional Space. ACM KDD Conference, 2000. (Indexing high dimensional data by designing index-friendly distance functions.)

C. C. Aggarwal, J. Wolf, P. Yu. A New Method for Similarity Indexing of Market Basket Data. ACM SIGMOD Conference, 1999. (Indexing categorical data or Indexing Market Basket Data).

C. C. Aggarwal, D. Agrawal. On Nearest Neighbor Indexing of Nonlinear Trajectories. Proceedings of the ACM PODS Conference, 2003. (First method for indexing mobile objects which are moving in a nonlinear trajectory).

M. Gupta, J. Gao, C. Aggarwal, J. Han. Temporal Outlier Detection. SDM Conference, 2013 (Tutorial). A survey paper has also been submitted based on this work.

C. C. Aggarwal. Outlier Ensembles. ACM SIGKDD Explorations, December 2012. (A Position Paper on Outlier Ensembles) Presentation Slides

C. C. Aggarwal and S. Sathe. Theoretical foundations and algorithms for outlier ensembles. ACM SIGKDD Explorations, June 2015 (Theoretical foundations of outlier ensembles)

X. Liu, C. Aggarwal, Y.-F. Li, X. Kong, X. Sun, S. Sathe. Kernelized Matrix Factorization for Collaborative Filtering. SDM Conference, 2016.

C. C. Aggarwal, S. C. Gates and P. S. Yu. On using partial supervision for text categorization. IEEE TKDE, 2014. (Partially supervised clustering for text categorization)

C. C. Aggarwal. Representation is Everything: Towards Efficient and Adaptable Similarity Measures for Biological Data. Proceedings of the ACM SIAM Conference on Data Mining, 2006 (Explores Alternatives to Alignment Based Similarity Measures).

C. C. Aggarwal, C. Chen, J. Han. On the Inverse Classification Problem and its Applications. ICDE Conference, 2006. (A Framework for changing attributes to match a desired class. Extended version: IBM Research Report)

C. Aggarwal. The Generalized Dimensionality Reduction Problem. Proceedings of the SIAM Conference on Data Mining, 2010 (Designing Dimensionality reduction to optimize for arbitrary data mining criteria rather than simple variance preservation)

C. C. Aggarwal. On Effective Classification of Strings with Wavelets. Proceedings of the KDD Conference, 2002. (String Classification by Wavelet Decomposition).

G. Qi, C. Aggarwal, J. Han, T. Huang. Mining Collective Intelligence in Diverse Groups. WWW Conference, 2013.
