Tag Archives: data mining

Free Book, Mining Massive Datasets, 2nd Edition

A new edition of Mining Massive Datasets is now available. It is used in a number of data mining courses at colleges across the US (and around the globe). Here are just a few of the topics the book covers:

  • Map-reduce
  • Clustering
  • Recommendation Systems
  • Dimensionality Reduction
  • Social Network Analysis
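Map-reduce, the first topic on that list, is easiest to see with the classic word-count example. Below is a minimal single-process sketch in Python. The function names are hypothetical, and the code only simulates the map, shuffle, and reduce phases that a real framework such as Hadoop would distribute across many machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input document.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for a single key into one result.
    return key, sum(values)


def word_count(documents: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
    # prints {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}
```

Because each call to `map_phase` and each call to `reduce_phase` depends only on its own input, a real framework can run them in parallel; only the shuffle step requires moving data between workers.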

Data Mining and Analysis Textbook (Free Download)

Mohammed J. Zaki, Computer Science Professor at RPI, and Wagner Meira Jr., Computer Science Professor at Universidade Federal de Minas Gerais, have written the textbook Data Mining and Analysis: Fundamental Concepts and Algorithms. The book is currently available as a PDF download.

Based on the available chapters, the book looks very good. It contains large sections on data analysis, clustering, and classification. The final book will be published sometime in 2014.

Data Mining MOOC

The University of Waikato in New Zealand will be offering a free online course titled Data Mining with Weka.

Weka is a widely used toolkit for data mining and machine learning, developed at the University of Waikato.

Don’t wait too long to sign up; the course starts September 9, 2013.

Here is a video in which the course instructor gives a brief overview.

Data Mining Standard Processes

There are a couple of standard processes for approaching data mining problems.

CRISP-DM

The most common approach is the Cross-Industry Standard Process for Data Mining (CRISP-DM).

Steps of CRISP-DM

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment

The steps are mostly self-explanatory, but the CRISP-DM Wikipedia page has a lengthier description.

SEMMA

The second most popular process for data mining is SEMMA.

Steps of SEMMA

  1. Sample
  2. Explore
  3. Modify
  4. Model
  5. Assess
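To give a rough feel for how the five SEMMA steps chain together, here is a toy Python sketch on synthetic data. Everything in it is an assumption chosen for brevity: the hypothetical data generator (two Gaussian classes), the nearest-centroid model, and the 300/100 train/holdout split. SEMMA itself does not prescribe any particular model or tool.

```python
import random
import statistics
from typing import Callable

Row = tuple[float, int]  # (feature value, class label 0 or 1)


def make_population(n_per_class: int, seed: int = 0) -> list[Row]:
    # Hypothetical synthetic data: class 0 ~ N(0, 1), class 1 ~ N(3, 1).
    rng = random.Random(seed)
    return ([(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
            + [(rng.gauss(3.0, 1.0), 1) for _ in range(n_per_class)])


def sample(population: list[Row], k: int, seed: int = 1) -> list[Row]:
    # Sample: draw a random subset small enough to work with.
    return random.Random(seed).sample(population, k)


def explore(rows: list[Row]) -> dict[int, float]:
    # Explore: mean feature value for each class.
    return {label: statistics.mean(x for x, y in rows if y == label) for label in (0, 1)}


def modify(train: list[Row], test: list[Row]) -> tuple[list[Row], list[Row]]:
    # Modify: standardize with training statistics only, then apply to both sets.
    xs = [x for x, _ in train]
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)

    def scale(rows: list[Row]) -> list[Row]:
        return [((x - mu) / sd, y) for x, y in rows]

    return scale(train), scale(test)


def model(train: list[Row]) -> Callable[[float], int]:
    # Model: nearest-centroid classifier, a deliberately simple stand-in.
    centroids = explore(train)

    def classify(x: float) -> int:
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return classify


def assess(classifier: Callable[[float], int], test: list[Row]) -> float:
    # Assess: accuracy on rows the model never saw during training.
    return sum(classifier(x) == y for x, y in test) / len(test)


def semma(seed: int = 0) -> float:
    rows = sample(make_population(1000, seed), 400, seed + 1)
    print("class means:", explore(rows))
    train, test = modify(rows[:300], rows[300:])
    return assess(model(train), test)


if __name__ == "__main__":
    print(f"holdout accuracy: {semma():.2f}")
```

The Modify step reuses the training mean and standard deviation on the holdout rows so that the Assess step does not leak information from data the model was not trained on.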

More details can be found on the SEMMA Wikipedia page.

A Data Science Process?

Other than The Data Scientific Method (which is not a standard), I am not aware of any other process for data science.

Do you know of any processes for data science? Is anyone aware of a group working on standardizing a data science process?

Win-Vector Blog » Data Science, Machine Learning, and Statistics: what is in a name?


This is an excellent write-up on the differences between:

  • Statistics
  • Machine Learning
  • Data Mining
  • Informatics
  • Big Data
  • Predictive Analytics
  • Data Science

Best Free Data Mining Tools

I recently saw the article The Best Data Mining Tools You Can Use for Free in Your Company. It contains a very brief description of each of the following tools.

  1. RapidMiner
  2. RapidAnalytics
  3. Weka
  4. PSPP
  5. KNIME
  6. Orange
  7. Apache Mahout
  8. jHepWork
  9. Rattle

See The Best Data Mining Tools You Can Use for Free in Your Company for more details, links, and pictures.