Here I post reviews of books that I receive from publishers and that, in my view, students and others practising data science may find useful. Please note that I only post reviews of books that I consider neutral or good.

Introduction to Data Mining, Second Edition, by Tan, Steinbach, Karpatne and Kumar

When I first looked at the contents pages of the book, I was not sure whether it would be a good recommendation for my MSc Data Mining module. However, once I started reading various sections I realised that this is indeed a suitable data mining book. The sections are informative, with nice (non-programming) examples, and the maths is properly explained, which is useful for students who may not have a maths background. The book covers important topics such as data pre-processing, classification, clustering, imbalanced data, and anomaly detection (i.e. noisy data, observations that do not fit the normal distribution, outliers etc.) using statistical, clustering and other techniques.

The downside of the book is that it does not explain important feature engineering concepts in enough depth and breadth. The feature extraction approaches (SVD, PCA, etc.) that are supposed to be described in the book are missing from the printed version: the subject index refers to non-existent pages, e.g. PCA, 877-880, whereas the last page of the book is page 859. The book covers some of the basics of feature selection, but only very briefly. Feature filtering, feature selection, and feature extraction are all very important data mining topics, and in my view each of them requires a dedicated chapter.
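
For readers unfamiliar with feature extraction, the following is a minimal sketch of PCA computed via SVD using NumPy. It is not taken from the book; the function name `pca_svd`, the random toy data, and the choice of two components are illustrative assumptions only.

```python
import numpy as np


def pca_svd(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper for illustration; not from the reviewed book.
    """
    # PCA requires each feature (column) to be centred on zero.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each retained direction.
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    # Extracted features: coordinates of each sample in the new basis.
    Z = X_centered @ components.T
    return Z, components, explained_variance


# Toy data (an assumption): 100 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, components, explained_variance = pca_svd(X, n_components=2)
print(Z.shape)
```

The sketch reduces five original features to two extracted ones, which is the kind of feature extraction step that the index entries for SVD and PCA point to.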

The book teaches data mining concepts, and its content is not tied to any programming language. Hence one can adopt the book regardless of whether one wants to learn or teach data mining in Python, MATLAB, R, etc. This also means that you will not see any case studies or exciting applications of data mining concepts in this book.

So, if you are looking for a book to learn the main concepts of data mining then this is a good choice. If you prefer to read about concepts and then go through programming examples to strengthen your understanding, then it is worth searching for such examples online.

For my data mining module, I will use this book to teach the concepts of data mining. I will put together programming examples with large data sets to enable students to gain the vital hands-on experience that will strengthen their understanding of data mining concepts and applications.

Review date: 24 July 2019. More information about the book can be found here.

Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science, by Thomas W. Miller, published by Pearson Education Inc.

This book introduces a number of predictive modelling techniques for data science, focusing on applications in text analytics, sentiment analysis, sports analysis, economic data analysis and spatial data analysis, which are essentially tasks requiring regression or classification techniques. Each chapter focuses on an application and presents a solution using Python or R. The book's solutions to these regression and classification problems use conventional machine learning approaches (such as Support Vector Machines for classification, ARIMA models for time series economic data analysis, and regression models for spatial data). The book teaches a lot of concepts on how to deal with and analyse non-image data. It can serve as a nice textbook for learning basic data analysis concepts and getting a good understanding of how to code in Python or R.

The book presents code in each chapter without explaining it in detail. Hence, a suitable reader for this book is someone who is already familiar with, and has some experience of, coding in Python and/or R. However, the reader may soon resort to looking for alternative approaches to solving these problems using more current data science libraries and the more recent deep learning approaches.

Review date: 7 June 2019. More information about the book can be found here.