Tag Archives: algorithms

Analytics vs Data Science

The line between analytics and data science can be blurry. Different companies might call the same position by either name, but at their core the two roles do differ.
Below is an infographic from the faculty of the Online MS in Analytics at American University. I think the infographic is accurate.

In my opinion, a true data scientist should spend more time creating and programming new algorithms while a business analyst should spend more time applying existing algorithms.

A few notes
  1. Years of education are not much different, but the academic disciplines are very different. Data scientists tend to have degrees with more rigorous mathematical training. For me, this is the biggest differentiator.
  2. It appears financial institutions prefer business analysts, while the government and colleges prefer data scientists.
  3. Surprisingly, business analyst jobs are projected to grow faster than data scientist jobs (27% versus 15%). I am not sure I totally agree with that!

Know Of Other Differences?

Please leave a comment.

Brought to you by American University’s Analytics@American, a master’s in business analytics

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means

This week, Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, and Todd Mytkowicz will be presenting Yinyang K-means at the 2015 International Conference on Machine Learning.

The algorithm guarantees the same results as traditional K-means, but it is reported to run an order of magnitude faster.
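For context, here is a minimal sketch of the classic K-means (Lloyd's algorithm) that Yinyang K-means is designed to replace. This is only the baseline, not the Yinyang algorithm itself; the function name `kmeans`, its parameters, and the sample points are my own assumptions for illustration.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Classic Lloyd's K-means on tuples of floats (illustrative baseline only)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

# Hypothetical sample data: two obvious groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

The assignment step compares every point against every center on every iteration, and that repeated distance work is the cost a faster drop-in replacement would have to cut while still returning the same clusters.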

An abstract of the paper and a PDF download can be accessed at Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.

Top ten algorithms in data mining (2007) [pdf] | Hacker News

Top ten algorithms in data mining (2007) [pdf] | Hacker News.

The discussion below the link is also very good.

If you are curious, here are the 10 algorithms, and the paper is displayed below.

  1. C4.5
  2. k-Means
  3. SVM
  4. Apriori
  5. EM
  6. PageRank
  7. AdaBoost
  8. kNN
  9. Naive Bayes
  10. CART
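To give a feel for how simple some of these algorithms are at their core, here is a minimal sketch of kNN (number 8 on the list). It is not taken from the paper; the function name `knn_predict`, the choice of Euclidean distance, and the toy training data are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training examples by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (features, label) pairs in two clusters.
train = [
    ((1.0, 1.0), "a"), ((1.0, 2.0), "a"), ((2.0, 1.0), "a"),
    ((8.0, 8.0), "b"), ((8.0, 9.0), "b"), ((9.0, 8.0), "b"),
]
print(knn_predict(train, (2.0, 2.0)))
```

There is no training phase here at all: the "model" is just the stored examples, which is why kNN is often the first classifier people implement.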

6 Machine Learning Algorithms

6 Machine Learning Algorithms

This post provides a nice, quick overview of 6 machine learning algorithms.

  1. Decision Trees
  2. Linear Regression
  3. Neural Networks
  4. Bayesian Networks
  5. Support Vector Machines (SVMs)
  6. Nearest Neighbor
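As a small illustration of one item on this list, here is a sketch of simple (one-variable) linear regression fitted by ordinary least squares. It is not from the linked post; the function name `fit_line` and the sample data are assumptions chosen to keep the arithmetic easy to check.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x.
slope, intercept = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(slope, intercept)
```

With real, noisy data the points will not sit exactly on a line, and the same formulas return the line that minimizes the sum of squared vertical errors.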

More Free Courses from Stanford

Also this spring, Stanford will be offering two more courses that might benefit a person learning data science.

If you feel these two classes might be a bit too advanced at this point, then here are a couple of more fundamental computer science classes. If you are new to computer science and programming, CS 101 would be a good choice. If you are not as new to computer science, or might be a bit rusty on your core algorithms knowledge, then Design and Analysis of Algorithms 1 might be appropriate.

Actually, the courses are no longer being offered by just Stanford. A few other schools have been added, and the courses are now being offered through Coursera. Plus, all the courses are free.

Did You Miss Strata 2012?

The 2012 Strata Conference: Making Data Work just finished up. If you (like me) were unable to attend, you may have missed out on some of the networking and excitement of actually being there, but you can still glean some knowledge from the videos.

Steve Schoettler “Learning Analytics”

This is a good video about how data can be used to help people learn.
There are many other Strata 2012 videos available as well. See below for links to them.

Other Strata 2012 Videos

See the O’Reilly Strata CA 2012 Playlist on YouTube for more videos. The videos include numerous interviews with the speakers and even a few of the talks. Also, many of the slide decks can be found on the Strata Conference website.

Have fun catching up on everything that happened at Strata Conference 2012.

What is a data scientist?

If I am going to create a blog about becoming a data scientist, I must at least provide some type of definition. One of the best definitions I have read is by Hilary Mason, Chief Scientist at Bit.ly:

A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics, and machine learning.

This definition is short and simple, but there are many more definitions out there. In fact, CITO Research, a site for CIOs and CTOs, set out to define what a data scientist is. They interviewed six leaders in the data science community and posted all of the interviews online. The interviews produced varied results, but they focused on some main themes of what a data scientist should know.

After reading Hilary’s definition, the CITO Research interviews, a great post at Quora, and numerous other articles, I created a list of data science skills:

  • Machine Learning
  • Statistics
  • Storytelling (Communication)
  • Big Data
  • Algorithms
  • Curiosity

I am sure this list will change and evolve over time, but that is where I am going to focus for now. If you have anything to add to the list, please leave a comment. If you are interested in gaining some data science skills, please follow along and let’s learn together.