Charu Aggarwal

Biography

Charu Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He completed his B.S. from IIT Kanpur in 1993 and his Ph.D. from Massachusetts Institute of Technology in 1996. He has worked extensively in the field of data mining, with particular interests in data streams, privacy, uncertain data and social network analysis. He has published 14 (3 authored and 11 edited) books, over 250 papers in refereed venues, and has applied for or been granted over 80 patents. His h-index is 80. Because of the commercial value of the above-mentioned patents, he has received several invention achievement awards and has thrice been designated a Master Inventor at IBM. He is a recipient of an IBM Corporate Award (2003) for his work on bio-terrorist threat detection in data streams, a recipient of the IBM Outstanding Innovation Award (2008) for his scientific contributions to privacy technology, a recipient of two IBM Outstanding Technical Achievement Awards (2008) for his scientific contributions to high-dimensional and data stream analytics. He has received two best paper awards and an EDBT Test-of-Time Award (2014). He is a recipient of the IEEE ICDM Research Contributions Award (2015). He has served as the general or program co-chair of the IEEE Big Data Conference (2014), the ICDM Conference (2015), the ACM CIKM Conference (2015), and the KDD Conference (2016). He also co-chaired the data mining track at the WWW Conference 2009. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering from 2004 to 2008. He is an associate editor of the ACM Transactions on Knowledge Discovery and Data Mining , an action editor of the Data Mining and Knowledge Discovery Journal , an associate editor of the IEEE Transactions on Big Data, and an associate editor of the Knowledge and Information Systems Journal. He is editor-in-chief of the ACM SIGKDD Explorations. He is a fellow of the IEEE (2010), ACM (2013), and the SIAM (2015) for "contributions to knowledge discovery and data mining algorithms."
DBLP Publication Profile

Google Scholar Citation Profile

C.V.

Research Interests:

Graph mining and Social Networks, Data Stream Mining, Uncertain Data Mining, Text and Multimedia Data Mining, Privacy Preserving Data Mining, High Dimensional Data Mining, Data Mining for Electronic Commerce


You can download the postscript/PDF files of my frequently accessed papers from my publication page. A more comprehensive list of publications is available from the DBLP database maintained by Michael Ley.

My citations can be accessed from this link to google scholar search. My h-index is 91.

My list of granted patents with full text is available from the patent office . A searchable link (by patent number) from the US patent office can be used to access the full text of the patents.

Here is a list of my academic honors and professional activities.

A resume may be found here .

Contact Information: Charu Aggarwal

IBM T. J. Watson Research Center, 1101 Kitchawan Rd, Yorktown, NY 10598

Email: charu (at) us (dot) ibm (dot) com

In case you have sent me an email at my earlier address with domain name watson.ibm.com, it is likely that I have not received it.


BOOKS: Most of my books are published with Springer (some with CRC Press) in both hard copy and electronic form. Both Springer and CRC Press generally have an excellent electronic distribution (through Springer link or CRCNetbase) in addition to hard copies. The web pointers to Springer Link and CRCNetbase for each book are also provided below. Some institutions also have agreements or subscriptions with Springer or CRC Press which allow them access to Springer link or CRC Press electronic material. You may want to check with your library. Springer also has a unique MyCopy Program, whereby you might be able to order a very low-priced ($25) personal softcover copy of Springer published books under certain circumstances (depending on your institution's subscriptions with Springer). Check here for details. To check whether your institution is eligible, you can also search for my Springer books using this link on a computer directly connected to your institution's network. In the event that your institution subscribes to a package containing the relevant book, then you should be able to download the relevant book for free and even order a $25 softcover MyCopy directly from the Springerlink book page. If your institution is eligible, you will see a (free) `Download Book' button. Otherwise, you will see a (paid) `Get Access' button. On the same Web page, you will also see the option to buy the Mycopy book for subscribing institutions. The free download page with the MyCopy option looks like this (see lower right of image for MyCopy option). The MyCopy book is in paperback binding, and contains black and white figures. However, the technical material is otherwise identical to the official hard-cover version sold on regular channels.

NEURAL NETWORKS AND DEEP LEARNING: A TEXTBOOK

Neural Networks and Deep Learning: A Textbook (Springer), authored by Charu C. Aggarwal, September 2018.

Table of Contents

Book Page with slides

PDF Download Link (Free for computers connected to subscribing institutions only)

Buy hardcover from Springer or Amazon (for general public)

Buy low-cost paperback edition (MyCopy link on right appears only for computers connected to subscribing institutions)

This is a textbook on neural networks and deep learning. The book discusses the theory and algorithms of deep learning. The theory and algorithms of neural networks are particularly important for understanding important concepts in deep learning, so that one can understand the important design concepts of neural architectures in different applications. Why do neural networks work? When do they work better than off-the-shelf machine learning models? When is depth useful? Why is training neural networks so hard? What are the pitfalls? Even though the book is not implementation-oriented, it is rich in discussing different applications. Applications associated with many different areas like recommender systems, machine translation, captioning, image classification, reinforcement-learning based gaming, and text analytics are covered. The following aspects are covered:

The basics of neural networks: Chapters 1 and 2 discuss the basics of neural network design and also the fundamentals of training them. The simulation of various machine learning models with neural networks is provided. Examples include least-squares regression, SVMs, logistic regression, Widrow-Hoff learning, singular value decomposition, and recommender systems. Recent models like word2vec are also explored, together with their connections with traditional matrix factorization. Exploring the interface between machine learning and neural networks is important because it provides a deeper understanding of how neural networks generalize known machine learning methods, and the cases in which neural networks have advantages over traditional machine learning.

Challenges in training neural networks: Although Chapters 1 and 2 provide an overview of the training methods for neural networks, a more detailed understanding of the training challenges is provided in Chapters 3 and 4. In particular, issues related to network depth and also overfitting are discussed. Chapter 5 presents a classical architecture, referred to as radial-basis function networks. Even though this architecture is no longer used frequently, it is important because it represents a direct generalization of the kernel support-vector machine.

Advanced architectures and applications: A lot of the success in neural network design is a result of the specialized architectures for various domains and applications. Examples of such specialized architectures include recurrent neural networks and convolutional neural networks. Since the specialized architectures form the key to the understanding of neural network performance in various domains, most of the book will be devoted to this setting. Several advanced topics like deep reinforcement learning, neural Turing mechanisms, and generative adversarial networks are discussed.

Some of the ``forgotten'' architectures like RBF networks and Kohonen self-organizing maps are included because of their potential in many applications. The book is written for graduate students, researchers, and practitioners. The book does require knowledge of probability and linear algebra. Furthermore basic knowledge of machine learning is helpful. Numerous exercises are available along with a solution manual to aid in classroom teaching. Where possible, an application-centric view is highlighted in order to give the reader a feel for the technology.

The book is available in both hardcopy (hardcover) and electronic versions. The hardcover is available at all the usual channels (e.g, Amazon, Barnes and Noble etc.), in Kindle format, and also directly from Springer in hardcopy and pdf format. The good thing about Springer is that electronic versions are often widely accessible at no cost to subscribing institutions, which is particularly convenient for students. My understanding is that a very large fraction of universities in North America, Europe, Australia, and New Zealand are subscribers, and a rapidly increasing number of universities in Asia are also subscribing. The electronic version is available at the following Springerlink pointer . For subscribing institutions click from a computer directly connected to your institution network to download the book for free. Springer uses the domain name of your computer to regulate access. To be eligible, your institution must subscribe to "e-book package english (Computer Science)" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. Sometimes you may be able to download it from your library e-collection, even when it is not Web-accessible from your institution. For those who prefer desk copies rather than electronic books, there are some very cost-effective methods to obtain a paperback MyCopy edition for $25 or less (subscribing institutions only). If you have ever published an article (even journal) with Springer, you are also entitled to an additional 40% author discount for any Springer book (including the $25 paperback edition) using the approach described here .


MACHINE LEARNING FOR TEXT

Machine Learning for Text (Springer), Authored by Charu Aggarwal, April 2018. -- Comprehensive textbook on machine learning for Text.

Table of Contents

Book page

PDF Download Link (Free for computers connected to subscribing institutions only)

Buy hard-cover or PDF (for general public- The PDF has embedded links and can be loaded on a kindle reader. The PDF version's equations read better on a kindle e-reader than the kindle edition from Amazon)

This book covers machine learning techniques from text using both bag-of-words and sequence-centric methods. The scope of coverage is vast, and it includes traditional information retrieval methods and also recent methods from neural networks and deep learning. The chapters of this book can be organized into three categories:

Classical machine learning methods: These chapters discuss the classical machine learning methods such as matrix factorization, topic modeling, dimensionality reduction, clustering, classification, linear models, and evaluation. All these techniques treat text as a bag of words. Contextual learning methods that combine different types of text and also combine text with heterogeneous data types are covered.

Classical information retrieval and search engines: Although this book is focussed on text mining, the importance of retrieval and ranking methods in mining applications is quite significant. Therefore, the book covers the key aspects of information retrieval, such as data structures, Web ranking, crawling, and search engine design. Importance is given to different types of information retrieval scoring models and learning-to-rank techniques.

Sequence-centric, deep learning, and linguistic methods for mining: While the bag-of-words representation can be useful for traditional applications like classification and clustering, more advanced applications like machine translation, image captioning, opinion mining, information extraction, and text segmentation require one to treat text as a sequence. These chapters discuss methods for sequence-centric mining methods such as deep learning techniques, word2vec, recurrent neural networks, LSTMs, maximum entropy Markov models, and Conditional Random Fields. Custom methods for applications like text summarization, opinion mining, and event detection are also discussed.

The book can be used as a textbook and it is numerous exercises. However, it is also designed to be useful to researchers and industrial practitioners. It therefore contains extensive bibliographic references for researchers, and the bibliographic section also contains software references for practitioners. Numerous examples and exercises have been provided.

The book is available in both hardcopy and in electronic form. The electronic version is available at this Springerlink pointer, which might allow you to download the book for free, depending on your institution's subscriptions. To attempt a free download, click from a computer directly connected to your institution network. To be eligible, your institution must subscribe to "e-book package english Computer Science" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. The Springer site uses the domain name of your computer to regulate access. Sometimes you may be able to download it from your library e-collection, even when it is not Web-accessible from your institution. Members of eligible (subscribing) institutions might also be able to buy a low-cost paperback edition ($25 MyCopy edition) from the same Web page at which the free book can be downloaded .


RECOMMENDER SYSTEMS: THE TEXTBOOK

Recommender Systems: The Textbook (Springer), Authored by Charu Aggarwal, April 2016. -- Comprehensive textbook on recommender systems.

Table of Contents

Book page with book description, solution manual, and other resources

PDF Download Link (Free for computers connected to subscribing institutions only)

Buy hard-cover or PDF (for general public- The PDF has embedded links and can be loaded on a kindle reader. The PDF version's equations read better on a kindle e-reader than the kindle edition from Amazon)

Buy low-cost paperback edition (Instructions for computers connected to subscribing institutions only)

This book covers the topic of recommender systems comprehensively, starting with the fundamentals and then exploring the advanced topics. The chapters of this book can be organized into three categories:

Algorithms and evaluation: These chapters discuss the fundamental algorithms in recommender systems, including collaborative filtering methods, content-based methods, knowledge-based methods, ensemble-based methods, and evaluation.

Recommendations in specific domains and contexts: The context of a recommendation can be viewed as important side information that affects the recommendation goals. Different types of context such as temporal data, spatial data, social data, tagging data, and trustworthiness are explored.

Advanced topics and applications: Various robustness aspects of recommender systems, such as shilling systems, attack models, and their defenses are discussed. In addition, recent topics, such as multi-armed bandits, learning to rank, group systems, multi-criteria systems, and active learning systems, are discussed together with applications.

Although this book is primarily written as a textbook, it is recognized that a large portion of the audience will comprise industrial practitioners and researchers. Therefore, the book is also designed to be useful from an applied and reference point of view. Numerous examples and exercises have been provided.

The book is available in both hardcopy and in electronic form. The electronic version is available at this Springerlink pointer, which might allow you to download the book for free, depending on your institution's subscriptions. To attempt a free download, click from a computer directly connected to your institution network. To be eligible, your institution must subscribe to "e-book package english Computer Science" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. The Springer site uses the domain name of your computer to regulate access. Sometimes you may be able to download it from your library e-collection, even when it is not Web-accessible from your institution. Members of eligible (subscribing) institutions might also be able to buy a low-cost paperback edition ($25 MyCopy edition) from the same Web page at which the free book can be downloaded . Here is an screenshot and description of what the download/MyCopy Web page will look like, when you are accessing it from a computer connected to a subscribing institution. Interestingly, you can use these methods for virtually any Springer book.


DATA MINING: THE TEXTBOOK

Data Mining: The Textbook (Springer), Authored by Charu Aggarwal, May 2015. -- Comprehensive textbook on data mining.

Table of Contents

Book page with book description, solution manual, and other resources

PDF Download Link (Free for computers connected to subscribing institutions only)

Buy hard-cover or PDF (for general public- The PDF has embedded links and can be loaded on a kindle reader. The PDF version's equations read better on a kindle e-reader than the kindle edition from Amazon)

Buy low-cost paperback edition (Instructions for computers connected to subscribing institutions only)

The emergence of data science as a discipline requires the development of a book that goes beyond the traditional focus of books on fundamental data mining problems. More emphasis needs to be placed on the advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. This comprehensive data mining book explores the different aspects of data mining, starting from the fundamentals, and subsequently explores the complex data types and their applications. Therefore, this book may be used for both introductory and advanced data mining courses. The chapters of this book fall into one of three categories:

The fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems.

Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data.

Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor.

The book carefully balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners. Numerous illustrations, examples, and exercises are included with an emphasis on semantically interpretable examples.

The book is available in both hardcopy and in electronic form. The electronic version is available at this Springerlink pointer, which might allow you to download the book for free, depending on your institution's subscriptions. To attempt a free download, click from a computer directly connected to your institution network. To be eligible, your institution must subscribe to "e-book package english Computer Science" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. The Springer site uses the domain name of your computer to regulate access. Sometimes you may be able to download it from your library e-collection, even when it is not Web-accessible from your institution. Members of eligible (subscribing) institutions might also be able to buy a low-cost paperback edition ($25 MyCopy edition) from the same Web page at which the free book can be downloaded . Here is an screenshot and description of what the download/MyCopy Web page will look like, when you are accessing it from a computer connected to a subscribing institution. Interestingly, you can use these methods for virtually any Springer book.


OUTLIER ANALYSIS: Second Edition (2017)

Outlier Analysis (Springer) Authored by Charu Aggarwal, 2017. Comprehensive text book on outlier analysis, including examples and exercises for classroom teaching. Most of the previous books on outlier detection were written by statisticians for statisticians, with little or no coverage from the data mining and computer science perspective. This book is intended to fill that gap. Each chapter contains key research content on the topic, case studies, extensive bibliographic notes and the future direction of research in this field. Includes exercises as well.

Covers applications for credit card fraud, network intrusion detection, law enforcement etc.

Content is simplified so students and practitioners can also benefit from this book.

Chapters will typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as, text, categorical, mixed-attribute, time-series, streaming, discrete sequence, spatial and network data; and key applications of these methods as applied to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis. The second edition contains significant new material in one-class support vector machines, neural networks, matrix factorization, outlier ensembles, text outliers, and graph mining. It also has a solution manual to aid class room teaching.

The original (first edition) book had been selected among the Best publications of 2013 by ACM Computing Reviews.

Table of Contents and Sample Chapters

Sample chapter on outlier detection for high dimensional data

Springer Link (For subscribing institutions click from within your institution network. If your institution is eligible, you will see a (free) `Download Book' button. Otherwise, you will see a (paid) `Get Access' button.)


FREQUENT PATTERN MINING

Frequent Pattern Mining (Springer), Ed. Charu Aggarwal and Jiawei Han, September 2014. -- Comprehensive survey driven book on frequent pattern mining with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapters

Springerlink for Electronic Version (For subscribing institutions click from within your institution network. If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button.)


DATA CLASSIFICATION BOOK

Data Classification: Algorithms and Applications (CRC Press), Ed. Charu Aggarwal, June 2014. -- Comprehensive survey driven book on data classification with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

CRC Netbase Link for Electronic Book


DATA CLUSTERING BOOK

Data Clustering: Algorithms and Applications (CRC Press), Ed. Charu Aggarwal, Chandan Reddy, September 2013. -- Comprehensive survey driven book on data clustering with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

CRC Netbase Link for Electronic Book


HEALTHCARE DATA ANALYTICS BOOK

Healthcare data analytics (CRC Press), Ed. Chandan Reddy, Charu Aggarwal, June 2015. -- Comprehensive survey driven book on healthcare with chapters contributed by prominent researchers in the field.

Table of Contents and Introductory Chapter

CRC Netbase Link for Electronic Book



OTHER BOOKS (PUBLISHER'S ELECTRONIC LINK to PDF INCLUDED)

Outlier Ensembles: An Introduction. C. Aggarwal, and S. Sathe. Springer, 2017.

Outlier Detection for Temporal Data. by M. Gupta, J. Gao, C. Aggarwal, J. Han, 2014. Morgan and Claypool Publishers.

Managing and Mining Sensor Data Ed. C. Aggarwal, 2013. PDF on Springerlink

Mining Text Data. Ed. C. Aggarwal, C. Zhai, 2012. PDF on Springerlink

Social Network Data Analytics. Ed. C. Aggarwal, 2011 Here is the Introductory Chapter PDF on Springerlink

Managing and Mining Graph Data (Springer) Ed. C. Aggarwal, H. Wang, 2010. PDF on Springerlink

Managing and Mining Uncertain Data (Springer) Ed. C. Aggarwal, 2009. PDF on Springerlink

Privacy-Preserving Data Mining: Models and Algorithms Ed. C. Aggarwal, P. Yu, 2008. Here is an Introductory Survey. PDF on Springerlink

Data Streams: Models and Algorithms Ed. C. Aggarwal, 2007. Survey Chapter on Synopsis Construction in Data Streams PDF on Springerlink


My podcast on data streams from IBM Research
Download Link for PS/PDF files of frequently accessed papers


KDNuggets, Analytics and Data Mining Resources