Visual.ly Launches an Infographic Site

I love infographics because they are a great way to convey information about data. They fit well with the idea that data scientists also need to be good storytellers. Visual.ly is a startup aimed at helping people create, share, and discover infographics. Here is a quick example I created about my Twitter account.

My Twitter Infographic

Machine Learning – Big Data – Hadoop

http://ec2-50-17-219-147.compute-1.amazonaws.com/wordpress/2011/07/26/machine-learning-against-big-data-with-hadoop/

This is a nice post by Socketware. It provides an overview of a few machine learning algorithms:

  • Recommendation Mining
  • Document Clustering
  • Document Classification
  • Frequent Itemset Mining
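The post itself contains no code. As a rough illustration of what frequent itemset mining computes, here is a minimal single-machine Python sketch that counts how often pairs of items appear together across transactions and keeps only the pairs that meet a minimum support threshold. The function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are hypothetical. This is not how a Hadoop-based implementation works, since that would distribute the counting across many machines.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions.

    Hypothetical, simplified sketch of the support-counting step behind
    frequent itemset mining; it only looks at pairs (itemsets of size 2).
    """
    pair_counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each unordered pair is counted once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    # Keep only the pairs whose support meets the threshold.
    return {pair: count for pair, count in pair_counts.items() if count >= min_support}


if __name__ == "__main__":
    # Hypothetical shopping baskets used only for illustration.
    baskets = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    print(frequent_pairs(baskets, min_support=3))
    # {('bread', 'milk'): 3, ('beer', 'diapers'): 3, ('bread', 'diapers'): 3, ('diapers', 'milk'): 3}
```

Real frequent itemset miners (for example, Apriori-style algorithms) extend this idea to larger itemsets by building on the pairs that already passed the support threshold.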

Another Big Data startup launches.

Don't Miss – Stanford Machine Learning

In a matter of days, Stanford will begin the second round of its free online machine learning course. I enrolled in the course last fall, and it exceeded all my expectations. Professor Andrew Ng is a great teacher. The prerequisites are minimal, so don’t worry if your math is a little rusty. The videos are also short (around 8 to 12 minutes), so you don’t need to set aside large blocks of time. Just watch a video or two during lunch and you should be able to keep up. Review questions and optional programming assignments accompany the videos.

Don’t worry if you fall behind. The videos will still be there. The material you learn is more important than the pace. If you don’t know machine learning, the Stanford class is a great opportunity to get started.

Here is Professor Ng’s introduction to the class.

Hot Tech Gig of 2022: Data Scientist

What is a data scientist?

If I am going to create a blog about becoming a data scientist, I should at least provide some kind of definition. One of the best definitions I have read is by Hilary Mason, Chief Scientist at Bit.ly:

A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics, and machine learning.

This definition is short and simple, but there are many others out there. In fact, CITO Research, a site for CIOs and CTOs, set out to define what a data scientist is. They interviewed six leaders in the data science community and posted all of the interviews online. The answers varied, but they centered on a few main themes of what a data scientist should know.

After reading Hilary’s definition, the CITO Research interviews, a great post on Quora, and numerous other articles, I created a list of data science skills:

  • Machine Learning
  • Statistics
  • Story Telling (Communication)
  • Big Data
  • Algorithms
  • Curiosity

I am sure this list will change and evolve over time, but that is where I am going to focus for now. If you have anything to add to the list, please leave a comment. If you are interested in gaining some data science skills, please follow along and let’s learn together.

The impact of Big Data on the World

http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?_r=2&hpw=

A great article on data science appeared in the New York Times over the weekend. It is worth your time to read it.

Why did I create Data Science 101?

Obviously, the world does not need another blog. However, blogs are a great way to share information, so I am creating one anyway.

The analysis of data is becoming more important every day. Data science is quickly becoming a hot topic, and I want to become a data scientist. This blog will therefore contain information I find useful during my data science journey. I hope others find it useful too.

If you are interested in becoming a data scientist, please follow along and let’s start learning together.

Learning To Be A Data Scientist