Data Mining: The Textbook (Springer), Authored by Charu Aggarwal, May 2015. -- Comprehensive textbook on data mining.

Book page with book description, solution manual, and other resources

The emergence of data science as a discipline requires the development of a book that goes beyond the traditional focus of books on fundamental data mining problems. More emphasis needs to be placed on the advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. This comprehensive data mining book explores the different aspects of data mining, starting from the fundamentals, and subsequently explores the complex data types and their applications. Therefore, this book may be used for both introductory and advanced data mining courses. The chapters of this book fall into one of three categories:

The fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems.

Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data.

Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor.

The book carefully balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners. Numerous illustrations, examples, and exercises are included with an emphasis on semantically interpretable examples.

The book is available in both hardcopy and electronic form. The electronic version is available at this Springerlink pointer, which might allow you to download the book for free, depending on your institution's subscriptions. To attempt a free download, click from a computer directly connected to your institution network. To be eligible, your institution must subscribe to "e-book package english Computer Science" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. Members of eligible institutions might also be able to buy a low-cost hardcopy. See explanation of MyCopy program above for details.

Frequent Pattern Mining (Springer), Ed. Charu Aggarwal and Jiawei Han, September 2014. -- Comprehensive survey driven book on frequent pattern mining with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapters

Springerlink for Electronic Version (For subscribing institutions click from within your institution network. If your institution is eligible, you should see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. )

Data Classification: Algorithms and Applications (CRC Press), Ed. Charu Aggarwal, June 2014. -- Comprehensive survey driven book on data classification with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

CRC Netbase Link for Electronic Book

Data Clustering: Algorithms and Applications (CRC Press), Ed. Charu Aggarwal, Chandan Reddy, August 2013. -- Comprehensive survey driven book on data clustering with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

Electronic Book from CRC Press Netbase

Outlier Analysis (Springer) Authored by Charu Aggarwal, January 2013. First comprehensive book on outlier analysis from a computer science/data mining perspective (rather than purely statistical perspective). Each chapter contains key research content on the topic, case studies, extensive bibliographic notes and the future direction of research in this field. Includes exercises as well.

Covers applications for credit card fraud, network intrusion detection, law enforcement and more

Content is simplified so students and practitioners can also benefit from this book

Chapters typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as text, categorical, mixed-attribute, time-series, streaming, discrete sequence, spatial, and network data; and key applications of these methods to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis.

The book has been selected among the Best publications of 2013 by ACM Computing Reviews.

Table of Contents and Sample Chapters

Sample chapter on outlier detection for high dimensional data

Springer Link (For subscribing institutions click from within your institution network. If your institution is eligible, you should see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. )

Managing and Mining Sensor Data (Springer) Ed. Charu Aggarwal, March 2013. -- Comprehensive survey driven book on sensor data management and mining with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

Springer Link (For subscribing institutions click from within your institution network. If your institution is eligible, you should see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. )

Mining Text Data (Springer) Ed. Charu Aggarwal, ChengXiang Zhai, March 2012. -- Comprehensive survey driven book on text mining with chapters contributed by prominent researchers in the field.

Table of Contents and Sample Survey Chapters on Clustering and Classification

Springer Link (For subscribing institutions click from within your institution network. If your institution is eligible, you should see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button.)

SOCIAL NETWORK DATA ANALYTICS BOOK:

Social Network Data Analytics (Springer) Ed. Charu Aggarwal, March 2011. -- Comprehensive survey driven book on social networks with chapters contributed by prominent researchers in the field.

Springer Link (For subscribing institutions click from within your institution network. If your institution is eligible, you should see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button.)

GRAPH MANAGEMENT AND MINING BOOK:

Managing and Mining Graph Data (Springer) Ed. Charu Aggarwal, Haixun Wang, February 2010. -- Comprehensive survey driven book on graph data with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Survey Chapters

Springer Link (For subscribing institutions click from within your institution network. If your institution is eligible, you should see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. )

UNCERTAIN DATA BOOK:

Managing and Mining Uncertain Data (Springer) Ed. Charu Aggarwal, February 2009. -- Comprehensive survey driven book on Uncertain Data with chapters contributed by prominent researchers in the field.

Table of Contents and introductory survey chapters

ACM Computing Reviews for the Book

PRIVACY-PRESERVING DATA MINING BOOK:

Privacy-Preserving Data Mining: Models and Algorithms (Springer) Ed. Charu Aggarwal, Philip S. Yu, July 2008. -- Comprehensive survey driven book on Privacy-Preserving Data Mining Research with chapters contributed by prominent researchers in the field.

ACM Computing Reviews for the book

DATA STREAM BOOK:

Data Streams: Models and Algorithms (Springer) Ed. Charu Aggarwal, January 2007. -- Comprehensive survey driven book on Data Stream Research with chapters contributed by prominent researchers in the field.

Table of Contents

ACM Computing Reviews for the Book

Survey Chapter on Synopsis Construction in Data Streams

My podcast on data streams from IBM Research

K. Subbian, C. Aggarwal, J. Srivastava, and V. Kumar. Rare Class Detection in Networks. SDM Conference, 2015.

M.-H. Tsai, C. Aggarwal, and T. Huang. Towards Classification of Social Streams. SDM Conference, 2015.

J. Liu, C. Wang, J. Gao, Q. Gu, C. Aggarwal, L. Kaplan, and J. Han. GIN: A Clustering Model for Capturing Dual Heterogeneity in Networked Data. SDM Conference, 2015.

W. Feng, J. Han, J. Wang, C. Aggarwal, and J. Huang. StreamCube: Hierarchical Spatiotemporal Hashtag Clustering for Event Detection in the Twitter Stream. ICDE Conference, 2015.

S. Chang, G. Qi, C. Aggarwal, M. Weng, J. Zhou, and T. Huang. Factorized Similarity Learning in Networks, ICDM Conference, 2014. ** (Best Student Paper Award) **

S. Chang, C. Aggarwal and T. Huang. Learning Local Semantic Distances with Limited Supervision, ICDM Conference, 2014.

C. Aggarwal and K. Subbian. Evolutionary Network Analysis: A Survey, ACM Computing Surveys, July 2014.

M. Tsai, C. Aggarwal, and T. Huang. Ranking in Heterogeneous Social Media. WSDM Conference, 2014.

Y. Wu, C. Aggarwal, S. Ma, H. Wang. On Anomalous Hotspot Discovery in Graph Streams. ICDM Conference, 2013.

K. Subbian, C. Aggarwal, J. Srivastava. Content-centric Flow Mining for Influence Analysis in Social Streams. CIKM Conference, 2013 (Influence Analysis in Social Streams)

Q. Gu, C. Aggarwal, J. Han. Selective Sampling on Graphs for Classification. KDD Conference, 2013.

A. Khan, Y. Wu, C. Aggarwal, X. Yan. Fast Graph Search with Label Similarity. VLDB Conference, 2013.

K. Subbian, C. Aggarwal, J. Srivastava, P. Yu. Community Detection with Prior Knowledge. SDM Conference, 2013 (Community Detection with prior knowledge).

G. Qi, C. Aggarwal, T. Huang. Link Prediction across Networks by Cross-Network Biased Sampling, ICDE Conference, 2013. (Link Prediction across Networks)

L. Liu, R. Jin, C. Aggarwal, Y. Shen. Reliable Clustering on Uncertain Graphs. ICDM Conference, 2012 (Reliable clustering on uncertain graphs)

C. Aggarwal, Y. Xie, P. Yu. On Dynamic Link Inference in Heterogeneous Networks, SDM Conference, 2012. (Link Inference in Heterogeneous Networks)

G. Qi, C. C. Aggarwal, T. Huang. Community Detection with Edge Content in Social Media Networks, ICDE Conference, 2012 (Community Detection with Edge Content)

P. Zhao, C. Aggarwal, M. Wang. gSketch: On Query Estimation in Graph Streams, PVLDB 5(3): 193-204 (2011) (Query Estimation in Graph Streams)

C. Aggarwal, K. Subbian. Event Detection in Social Streams, SDM Conference, 2012 (Event Detection in Social Streams)

C. Aggarwal, S. Lin, P. Yu. On Influential Node Discovery in Dynamic Social Networks, SDM Conference, 2012 (Dynamic Influence Analysis)

Y. Sun, J. Han, C. Aggarwal, N. Chawla. When will it happen?-- Relationship Prediction in Heterogeneous Information Networks, WSDM Conference, 2012 (Predicting the time at which a link will appear in a network)

G. Qi, C. Aggarwal, T. Huang. On Clustering Heterogeneous Social Media Objects with Outlier Links, WSDM Conference, 2012 (Clustering Social Media Objects in Noisy Networks)

Y. Sun, C. Aggarwal, J. Han. Relation Strength aware clustering of Heterogeneous Information Networks with Incomplete Attributes, VLDB Conference, 2012 (Relation-strength aware clustering of heterogeneous information networks)

C. Aggarwal, Y. Li, P. Yu. On the Hardness of Graph Anonymization. ICDM Conference, 2011 Presentation slides (Hardness of Graph Anonymization-- Re-identification Attacks are easy in graphs!)

G. Qi, C. Aggarwal, Q. Tian, H. Ji, T. Huang. Exploring Content and Context Links in Social Media: A Latent Space Method. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear. (Image Classification in Social Media with Content and Social Context Information)

M. Gupta, C. Aggarwal, J. Han. Finding top-k shortest path distance changes in an evolutionary network. SSTD Conference, 2011. (Finding evolution in distances in a network)

R. Jin, L. Liu, C. Aggarwal. Discovering highly reliable subgraphs in uncertain graphs. KDD Conference, 2011 (Highly reliable subgraph mining in uncertain graphs)

C. Aggarwal, A. Bar-Noy, S. Shamoun. On Sensor Selection in Linked Information Networks. DCOSS Conference, 2011 (Sensor Selection in Linked Information Networks)

M. Gupta, C. Aggarwal, J. Han, Y. Sun. On Evolutionary Clustering and Analysis in Heterogeneous Bibliographic Networks. ASONAM Conference, 2011 ** (Best Paper Award) **

C. Aggarwal, A. Khan, X. Yan. On Flow Authority Discovery in Social Networks. Proceedings of the SDM Conference, 2011 Presentation slides (Finding Influential Nodes in Social Networks)

C. Aggarwal, N. Li. On Node Classification in Dynamic Content-based Networks. Proceedings of the SDM Conference, 2011 Presentation slides (Node Classification in Dynamic Networks with the use of both node text and links)

C. Aggarwal, Y. Xie, P. Yu. Towards Community Detection in Locally Heterogeneous Networks. Proceedings of the SDM Conference, 2011 Presentation slides (Finding Communities in a Locally Heterogeneous Network)

C. Aggarwal. On Classification of Graph Streams. Proceedings of the SDM Conference, 2011 Presentation slides (Classification of Graph Streams)

V. Lee, N. Ruan, R. Jin, C. Aggarwal. A Survey of Algorithms for Dense Subgraph Discovery, Managing and Mining Graph Data, Springer, 2010 (Dense subgraph mining survey)

C. Li, C. C. Aggarwal, J. Wang. On Anonymization of Multigraphs. Proceedings of the SDM Conference, 2011. (Anonymization of Multigraphs)

C. Aggarwal, Y. Zhao, P. Yu. Outlier Detection in Graph Streams. Proceedings of the ICDE Conference, 2011 Presentation slides (Outlier Detection in Graph Streams)

C. Aggarwal, H. Wang. On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval. Proceedings of the ICDE Conference, 2011 Presentation slides (Indexing and Dimensionality Reduction of Massive Disk-resident Graphs)

C. Aggarwal, and T. Abdelzaher. Social Sensing. Book Chapter in Managing and Mining Sensor Data, Springer, 2013. (Extended version of Chapter below)

C. Aggarwal, and T. Abdelzaher. Integrating Sensors and Social Networks, Book Chapter in Social Network Data Analytics, Springer, 2011.

C. Aggarwal, Y. Li, P. Yu, R. Jin. On Dense Pattern Mining in Graph Streams. Proceedings of the VLDB Conference, 2010 (Dense Pattern Mining in Graphs Streams)

C. Aggarwal, Y. Zhao, P. Yu. On Clustering Graph Streams. Proceedings of the SDM Conference, 2010 (Clustering Graph Streams)

C. Aggarwal, Y. Xie, P. Yu. GConnect: A Connectivity Index for Massive Disk-Resident Graphs. Proceedings of the VLDB Conference, 2009 Presentation slides in PDF (A Connectivity Index)

M. J. Zaki, C. C. Aggarwal. XRules: An Effective Structural Classifier for XML Data. Proceedings of the KDD Conference, 2003. (Structural Classification of XML documents: can also be used for graphs).

C. C. Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, M. J. Zaki. XProj: A Framework for Structural Projected Clustering of XML Documents. Proceedings of the ACM KDD Conference, 2007 (Structural Clustering of XML Documents: can also be used for graphs)

C. C. Aggarwal, P. S. Yu. Online Analysis of Community Evolution in Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2005. (Community Detection and Evolution in Graph and Social Network Edge Streams)

M.-H. Tsai, C. Aggarwal, and T. Huang. Towards Classification of Social Streams. SDM Conference, 2015.

W. Feng, J. Han, J. Wang, C. Aggarwal, and J. Huang. StreamCube: Hierarchical Spatiotemporal Hashtag Clustering for Event Detection in the Twitter Stream. ICDE Conference, 2015.

Y. Wu, C. Aggarwal, S. Ma, H. Wang. On Anomalous Hotspot Discovery in Graph Streams. ICDM Conference, 2013.

K. Subbian, C. Aggarwal, J. Srivastava. Content-centric Flow Mining for Influence Analysis in Social Streams. CIKM Conference, 2013 (Influence Analysis in Social Streams)

P. Zhao, C. Aggarwal, M. Wang. gSketch: On Query Estimation in Graph Streams, PVLDB 5(3): 193-204 (2011) (Query Estimation in Graph Streams)

C. Aggarwal, K. Subbian. Event Detection in Social Streams, SDM Conference, 2012 (Event Detection in Social Streams)

C. Aggarwal. On Classification of Graph Streams. Proceedings of the SDM Conference, 2011 Presentation slides (Classification of Graph Streams)

C. Aggarwal, Y. Zhao, P. Yu. Outlier Detection in Graph Streams. Proceedings of the ICDE Conference, 2011 Presentation slides (Outlier Detection in Graph Streams)

C. Aggarwal, Y. Xie, P. Yu. On Dynamic Data-Driven Selection of Sensor Streams. KDD Conference, 2011 (Sensor Selection in Dynamic Scenarios)

C. Aggarwal, Y. Li, P. Yu, R. Jin. On Dense Pattern Mining in Graph Streams. Proceedings of the VLDB Conference, 2010 (Dense Pattern Mining in Graphs Streams)

C. Aggarwal, Y. Zhao, P. Yu. On Clustering Graph Streams. Proceedings of the SDM Conference, 2010 (Clustering Graph Streams)

C. Aggarwal, Y. Zhao, P. Yu. On Classification of High-Cardinality Data Streams. Proceedings of the SIAM Data Mining Conference, 2010 (Classification of high Cardinality Graph Streams)

C. Aggarwal. On Segment Based Stream Modeling and its Applications. Proceedings of the SDM Conference, 2009 (Method for Segment-based Stream Summarization and its Applications)

D. Thomas, R. Bordawekar, C. Aggarwal, P. Yu. On Efficient Query-Processing of Stream Counts on the Cell Processor. Proceedings of the ICDE Conference, 2009 (Method for Parallelizing Sketches on the Multi-Core Cell Processor)

C. C. Aggarwal. A Framework for Clustering Massive-Domain Data Streams. Proceedings of the ICDE Conference, 2009 (Method for Massive-Domain Clustering).

C. C. Aggarwal, P. S. Yu. LOCUST: An Online Analytical Processing Framework for High Dimensional Classification of Data Streams. Proceedings of the ICDE Conference, 2008 (Extending Lazy Learning to High Dimensional Stream Classification).

C. C. Aggarwal. On Biased Reservoir Sampling in the Presence of Stream Evolution. Proceedings of the VLDB Conference, 2006 PDF of Presentation Slides (Reservoir Sampling for Evolving Data Streams).

C. C. Aggarwal, P. S. Yu. A Survey of Synopsis Construction Algorithms in Data Streams. Data Streams: Models and Algorithms ed. C. Aggarwal, Springer. (Survey on Synopsis Construction Methods - Reservoir Sampling, wavelets, histograms, sketches)

C. C. Aggarwal, P. S. Yu. On String Classification in Data Streams. Proceedings of the ACM KDD Conference, 2007. (String Classification in Data Streams)

C. C. Aggarwal. A Framework for Classification and Segmentation of Massive Audio Data Streams. Proceedings of the ACM KDD Conference, 2007. (Micro-clustering for speaker recognition)

C. C. Aggarwal, P. Yu. A Framework for Clustering Massive Text and Categorical Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2006. Here is the extended version in KAIS journal, which combines clustering and outlier detection in text streams. (Text and Categorical Clustering and outlier detection in Data Streams).

C. C. Aggarwal. On Futuristic Query Processing in Data Streams. Proceedings of the EDBT Conference, 2006 (Query Processing of future stream behavior).

C. C. Aggarwal. On Abnormality Detection in Spuriously Populated Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2005. (Detecting Abnormal Events in Noisy Data Streams)

C. C. Aggarwal, J. Han, J. Wang, P. Yu. A Framework for High Dimensional Projected Clustering of Data Streams. Proceedings of the VLDB Conference, 2004 (Projected Clustering of High Dimensional Data Streams).

C. C. Aggarwal, J. Han, J. Wang, P. Yu. On Demand Classification of Data Streams. Proceedings of the ACM KDD Conference, 2004 (Classifying a point from an evolving data stream with the most optimized model when you receive it.)

C. C. Aggarwal, J. Han, J. Wang, P. Yu. A Framework for Clustering Evolving Data Streams. Proceedings of the VLDB Conference, 2003 (An OLAP Framework for Clustering Data Streams.)

C. C. Aggarwal. A Framework for Diagnosing Changes in Evolving Data Streams. Proceedings of the ACM SIGMOD Conference, 2003. ( Change Detection in Data Streams with diagnosis and visualization capability).

C. C. Aggarwal. An Intuitive Framework for Understanding Changes in Evolving Data Streams. Proceedings of the ICDE Conference, 2002 ( Detecting Change in Data Streams (summary))

G. Qi, C. Aggarwal, and T. Huang. Online Community Detection in Social Sensing. WSDM Conference, 2013.

C. Aggarwal, J. Han. A Survey of RFID Data Processing, Managing and Mining Sensor Data, Springer, 2013 (Survey on RFID Data Processing)

C. Aggarwal, N. Ashish, A. Sheth. The Internet of Things: A Survey from the Data-Centric Perspective, Managing and Mining Sensor Data, Springer, 2013 (Survey on Data Processing Issues in the Internet of Things)

C. Aggarwal, Y. Xie, P. Yu. On Dynamic Data-Driven Selection of Sensor Streams. KDD Conference, 2011 (Sensor Selection in Dynamic Scenarios)

C. Aggarwal, and T. Abdelzaher. Social Sensing. Book Chapter in Managing and Mining Sensor Data, Springer, 2013. (Extended version of Chapter below)

C. Aggarwal, and T. Abdelzaher. Integrating Sensors and Social Networks, Book Chapter in Social Network Data Analytics, Springer, 2011.

C. Aggarwal, A. Bar-Noy, S. Shamoun. On Sensor Selection in Linked Information Networks. DCOSS Conference, 2011 (Sensor Selection in Linked Information Networks)

C. Aggarwal, Y. Li, P. Yu. On the Hardness of Graph Anonymization. ICDM Conference, 2011 (Hardness of Graph Anonymization-- Re-identification Attacks are easy in graphs!)

C. Li, C. C. Aggarwal, J. Wang. On Anonymization of Multigraphs. SDM Conference, 2011. (Anonymization of Multi-graphs)

C. C. Aggarwal. On Unifying Privacy and Uncertain Data Models. ICDE Conference, 2008.

C. C. Aggarwal, P. S. Yu. On Privacy-Preservation of Text and Sparse Binary Data with Sketches. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal, P. S. Yu. On Anonymization of Strings. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal. On Randomization, Public Information, and the Curse of Dimensionality. ICDE Conference, 2007.

C. C. Aggarwal. On k-anonymity and the curse of dimensionality. VLDB Conference, 2005. Slides. In PDF format. (The line between quasi-identifiers and sensitive attributes is often unclear because of partial knowledge. An analysis in high dimensionality when a large fraction of attributes is included in the anonymization process- the curse is ubiquitous!)

D. Agrawal, C. C. Aggarwal. On the design and quantification of Privacy Preserving Data Mining. ACM PODS Conference, 2001. (Perturbation Approach to Privacy Preserving Data Mining in a General Environment).

C. C. Aggarwal, S. Parthasarathy. Mining Massively Incomplete Data Sets by Conceptual Reconstruction. ACM KDD Conference, 2001. ( Privacy Preserving Data Mining , when many values are hidden or incomplete.)

C. C. Aggarwal, J. Pei, B. Zhang. On Privacy Preservation against Adversarial Data Mining. ACM KDD Conference, 2006. (Privacy Preserving Data Mining with a Data Mining Proficient Adversary)

C. C. Aggarwal, P. S. Yu. On Variable Constraints in Privacy Preserving Data Mining. Proceedings of the ACM SIAM Conference on Data Mining, 2005. (Privacy Preserving Data Mining with Personalized Levels of Anonymity)

C. C. Aggarwal, P. S. Yu. A Condensation Based Approach to Privacy Preserving Data Mining. Proceedings of the EDBT Conference, 2004. (Condensation Approach to Privacy in a Trusted Server Environment.)

Y. Zhao, C. Aggarwal, P. Yu. On Wavelet Decomposition of Uncertain Time Series Data Sets, ACM CIKM Conference, 2010. (Wavelet Decomposition of Uncertain Data)

C. Aggarwal. On Multidimensional Sharpening of Uncertain Data. Proceedings of the SIAM Conference on Data Mining, 2010 (Sharpening Multidimensional Uncertain Data with PCA)

C. C. Aggarwal, Yan Li, Jianyong Wang, Jing Wang. Frequent Pattern Mining with Uncertain Data. ACM KDD Conference, 2009. Presentation Slides

C. C. Aggarwal, P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. IEEE TKDE, May 2009.

C. C. Aggarwal, P. S. Yu. Outlier Detection with Uncertain Data. SIAM Data Mining Conference, 2008.

C. C. Aggarwal, P. S. Yu. On Indexing High Dimensional Data with Uncertainty. (11-page version) SIAM Data Mining Conference, 2008. (2-page poster version appears in ICDE Conference, 2008).

C. C. Aggarwal, P. S. Yu. A Framework for Clustering Uncertain Data Streams. ICDE Conference, 2008.

C. C. Aggarwal. On Density Based Transforms for Uncertain Data Mining. ICDE Conference, 2007.

C. C. Aggarwal. Towards Local Supervised Dimensionality Reduction of High Dimensional Data. Proceedings of the ACM SIAM Conference on Data Mining, 2006 (Explores Local Supervised Dimensionality Reduction).

C. C. Aggarwal. Towards Systematic Design of Distance Functions for Data Mining Applications. Proceedings of the ACM KDD Conference, 2003. (Framework for tailoring distance functions to the summary characteristics of high dimensional data sets and user preferences.)

C. C. Aggarwal. Hierarchical Subspace Sampling: A Unified Framework for High Dimensional Data Reduction, Selectivity Estimation and Nearest Neighbor Search. Proceedings of the ACM SIGMOD Conference, 2002 (Subspace Sampling for Projected Clustering and Local Dimensionality Reduction )

C. C. Aggarwal, A. Hinneburg, D. A. Keim. On the Surprising Behavior of Distance Metrics in High Dimensional Space. International Conference on Database Theory (ICDT Conference), 2001. (Manhattan Metric is better than the Euclidean Metric. Fractional Metrics are even better.)

A. Hinneburg, C. C. Aggarwal, D. A. Keim. What is the nearest neighbor in high dimensional spaces? Proceedings of the VLDB Conference, 2000. (Discusses the roots of high dimensional sparsity.)

C. C. Aggarwal. A Human Computer Interactive Method for Projected Clustering. IEEE Transactions on Knowledge and Data Engineering. 16(4), pp 448--460, 2004. (Extended Version: IPCLUS: An Interactive Projected Clustering Algorithm).

C. C. Aggarwal, P. Yu. Finding Generalized Projected Clusters in High Dimensional Spaces. ACM SIGMOD Conference, 2000. (Finds non-axis parallel projected clusters. Also known as local dimensionality reduction.)

C. C. Aggarwal, C. Procopiuc, J. Wolf, P. Yu, J. Park. Fast Algorithms for Projected Clustering. ACM SIGMOD Conference, 1999. (Finds axis parallel projected clusters.)

C. C. Aggarwal, P. Yu. Outlier Detection for High Dimensional Data. ACM SIGMOD Conference, 2001. (Methods for projected outlier search.)

C. C. Aggarwal. Re-designing distance functions and distance based applications for high dimensional data. ACM SIGMOD Record, March 2001. (A summary paper on high dimensional data mining. )

C. C. Aggarwal. On the Effects of Dimensionality Reduction on High Dimensional Similarity Search. ACM PODS Conference, 2001. (Dimensionality Reduction Effects on Similarity Search.)

C. C. Aggarwal. On Point Sampling versus Space Sampling for Dimensionality Reduction. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal. Towards Exploratory Test Instance Specific Algorithms for High Dimensional Classification. Proceedings of the ACM KDD Conference, 2005 (Test Instance specific visual exploration and classification to find diagnostic classification causality of individual test instances.)

C. C. Aggarwal. Towards Meaningful Nearest Neighbor Search by Human-Computer Interaction. Proceedings of the ICDE Conference, 2002 (Visual nearest neighbor search by projections.)

C. C. Aggarwal. Towards Effective and Interpretable Data Mining by Visual Interaction. Proceedings of the ACM KDD Explorations, January 2002 (Summary paper on visual data mining.)

C. C. Aggarwal. A Human-Computer Cooperative System for Effective High Dimensional Clustering. Proceedings of the KDD Conference, 2001. (Visual methods for projected clustering- IPCLUS - an interactive projected clustering algorithm.)

C. C. Aggarwal. Collaborative Crawling: Mining User Experiences for Topical Resource Discovery. Proceedings of the KDD Conference, 2002. (Focussed Crawling by learning user access behavior.)

C. C. Aggarwal, F. Al-Garawi, P. Yu. Intelligent Crawling on the World Wide Web with Arbitrary Predicates. WWW Conference, 2001 (Focussed Crawling by learning linkage patterns.)

C. C. Aggarwal. On Learning Strategies for Topic-Specific Web Crawling. Next Generation Data Mining Applications. Edited by: Zurada and Kantardzic, Published by IEEE. ISBN 0-471-65605-4. (Book Chapter on Focussed Crawling by learning and other adaptive strategies.)

C. C. Aggarwal, J. L. Wolf, P. S. Yu. Caching on the Worldwide Web. IEEE Transactions on Knowledge and Data Engineering, Vol 11, No 1, January 1999. (Miscellaneous performance analysis paper related to the web but not data mining.)

C. C. Aggarwal, J. L. Wolf, K. L. Wu, P. Yu. Horting Hatches an Egg: A New Graph Theoretic Approach for Collaborative Filtering. ACM KDD Conference, 1999. (A Collaborative Filtering Paper.)

C. C. Aggarwal, P. S. Yu. A System for Automated Personalization of Web Portals. Proceedings of the VLDB Conference, 2002. (Methods for Targeted advertising and Portal Personalization.)

C. C. Aggarwal, P. S. Yu. Online Auctions: There can be only one. IEEE CEC Conference, 2009. (A mathematical analysis of the network effect in online auctions.) Slides with audio: click the audio button on each slide

Extended Version as IBM Research Report: On the Network Effect in Web 2.0 Applications, IBM Research Report, RC24842, 2009.

R. C. Agarwal, C. C. Aggarwal, V. V. V. Prasad. A Tree Projection Algorithm For Generation of Frequent Itemsets. Journal on Parallel and Distributed Computing, (Special Issue on High Performance Data mining), 2001. (Tree Projection Algorithm)

R. C. Agarwal, C. C. Aggarwal, V. V. V. Prasad. Depth First Generation of Long Patterns. KDD Conference, 2000. (Depth First Version of Tree Projection)

C. C. Aggarwal. Towards Long Pattern Generation in Dense Databases. ACM SIGKDD Explorations, Volume 3, Issue 1, 2001. (Summary Paper on the Topic.)

C. C. Aggarwal, P. Yu. Online Generation of Association Rules. ICDE Conference, 1998. (OLAP framework for association rule mining.)

C. C. Aggarwal, P. Yu. A New Framework for Itemset Generation. ACM PODS Conference, 1998. (Mining interesting itemsets).

C. C. Aggarwal, C. Procopiuc, P. Yu. Finding Localized Associations in Market Basket Data. Proceedings of the IEEE TKDE Journal, March 2002 (Magnifying the association rule discovery process by segmenting the data in a way which is friendly to association discovery.)

C. C. Aggarwal, W. Lin, P. Yu. Searching by Corpus with Fingerprints. EDBT Conference, 2012 (Searching text collections when the query is a full document or set of documents rather than a set of keywords)

C. C. Aggarwal, P. Zhao. Towards Graphical Models for Text Processing, KAIS Journal, to appear, 2013. Poster version in SIGIR (New paradigm for text representation with graphical model; retains some word sequence information)

C. C. Aggarwal, Y. Zhao, P. Yu. On Text Clustering with Side Information. ICDE Conference, 2012 (Text Clustering with Side Information)

C. C. Aggarwal, C. Zhai. A survey of Text Clustering Algorithms. Mining Text Data, Ed. C. Aggarwal, C. Zhai, Springer 2012. (Survey on Text Clustering)

C. C. Aggarwal, C. Zhai. A survey of Text Classification Algorithms. Mining Text Data, Ed. C. Aggarwal, C. Zhai, Springer 2012. (Survey on Text Classification)

C. C. Aggarwal, P. Yu. On Effective Conceptual Indexing and Similarity Search in Text Data. IEEE International Conference on Data Mining (ICDM Conference), 2001. (New representation of text which provides effective and efficient similarity search).

G. Qi, C. Aggarwal, Q. Tian, H. Ji, T. Huang. Exploring Content and Context Links in Social Media: A Latent Space Method. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear. (Image Classification in Social Media with Content and Social Context Information)

G. Qi, C. C. Aggarwal, Y. Rui, Q. Tian, S. Chang, T. Huang. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts. CVPR Conference, 2011. (Transfer Learning across different categories in multimedia data)

C. C. Aggarwal, S. Gates, P. Yu. On the Merits of Using Supervised Clustering for Building Categorization Systems. ACM KDD Conference, 1999. ( Partial Supervision for Text Classification ).

C. C. Aggarwal, P. Yu. A Framework for Clustering Massive Text and Categorical Data Streams. Proceedings of the ACM SIAM Conference on Data Mining, 2006. Here is the extended version in KAIS journal, which combines clustering and outlier detection in text streams. (Text and Categorical Clustering of High Dimensional Data Streams).

G. Qi, C. C. Aggarwal, T. Huang. Towards Semantic Knowledge Propagation from Text Corpus to Web Images. WWW Conference, 2011. Presentation Slides (Transfer Learning from Text to Images with Linkage Hints)

G. Qi, C. C. Aggarwal, T. Huang. Transfer Learning of Distance Metrics by Cross-Domain Metric Sampling across Heterogeneous Domains, SDM Conference, 2012. Presentation Slides (Distance Metric Transfer Learning)

C. C. Aggarwal, P. Yu. The IGrid Index: Reversing the Dimensionality Curse for Similarity Indexing in High Dimensional Space. ACM KDD Conference, 2000. (Indexing high dimensional data by designing index-friendly distance functions.)

C. C. Aggarwal, P. Yu. On Effective Conceptual Indexing and Similarity Search in Text Data. IEEE International Conference on Data Mining (ICDM Conference), 2001. (New representation of text which provides effective and efficient similarity search).

C. C. Aggarwal, J. Wolf, P. Yu. A New Method for Similarity Indexing of Market Basket Data. ACM SIGMOD Conference, 1999. ( Indexing categorical data or Indexing Market Basket Data ).

C. C. Aggarwal, D. Agrawal. On Nearest Neighbor Indexing of Nonlinear Trajectories. Proceedings of the ACM PODS Conference, 2003. (First method for indexing mobile objects which are moving in a nonlinear trajectory).

M. Gupta, J. Gao, C. Aggarwal, J. Han. Temporal Outlier Detection. SDM Conference, 2013 (Tutorial). A survey paper has also been submitted based on this work.

C. C. Aggarwal. Outlier Ensembles. ACM SIGKDD Explorations, December 2012. (A Position Paper on Outlier Ensembles) Presentation Slides

C. C. Aggarwal, S. C. Gates, and P. S. Yu. On using partial supervision for text categorization. IEEE TKDE, 2014. (Partially supervised clustering for text categorization)

C. C. Aggarwal. Representation is Everything: Towards Efficient and Adaptable Similarity Measures for Biological Data. Proceedings of the ACM SIAM Conference on Data Mining, 2006 (Explores Alternatives to Alignment Based Similarity Measures).

C. C. Aggarwal, C. Chen, J. Han. On the Inverse Classification Problem and its Applications, ICDE Conference, 2006 (A Framework for changing attributes to match a desired class: IBM Research Report (Extended Version) )

C. Aggarwal. The Generalized Dimensionality Reduction Problem. Proceedings of the SIAM Conference on Data Mining, 2010 (Designing Dimensionality reduction to optimize for arbitrary data mining criteria rather than simple variance preservation)

C. C. Aggarwal. On Effective Classification of Strings with Wavelets. Proceedings of the KDD Conference, 2002. (String Classification by Wavelet Decomposition).

G. Qi, C. Aggarwal, J. Han, T. Huang. Mining Collective Intelligence in Diverse Groups. WWW Conference, 2013.
