The tagline for Kaggle is “We’re making data science a sport.” They have successfully turned data science into a competition, one that is both fun and yields excellent results. A portion of the site, called Kaggle in Class, is dedicated to classroom use.
Here is how it works. A company that needs some data analyzed can contact Kaggle and host a competition. Then data scientists all over the world compete to find the best solution. The company benefits from having many algorithms and techniques applied to the same data set, far more than it could try on its own. The contestants benefit from networking, pre-cleaned data, and learning from others. It is a win-win situation. Plus, the winner gets prize money.
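To make the idea concrete, here is a minimal sketch of how a competition could rank entries: every contestant predicts answers for the same hidden data, and the entries are sorted by score. This is only an illustration, not Kaggle's actual scoring code. The names `accuracy`, `leaderboard`, and the contestants are hypothetical, and plain accuracy is an assumed metric; real competitions each define their own.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[int], truth: List[int]) -> float:
    """Fraction of predictions that match the hidden answers (assumed metric)."""
    if len(predictions) != len(truth):
        raise ValueError("prediction count must match the answer count")
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


def leaderboard(
    submissions: Dict[str, List[int]], truth: List[int]
) -> List[Tuple[str, float]]:
    """Score every submission on the same answers and sort best-first."""
    scored = [(name, accuracy(preds, truth)) for name, preds in submissions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical hidden answers held by the competition host.
hidden_answers = [1, 0, 1, 1, 0]

# Hypothetical contestants, each applying a different technique to the same data.
submissions = {
    "alice": [1, 0, 1, 0, 0],
    "bob": [1, 0, 1, 1, 0],
    "carol": [0, 0, 0, 1, 1],
}

for rank, (name, score) in enumerate(leaderboard(submissions, hidden_answers), start=1):
    print(f"{rank}. {name}: {score:.2f}")
```

The point of the sketch is that the host only has to hold back the answers once; every new technique is then judged on exactly the same footing.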
Currently, the large featured competition is the Heritage Health Prize. It is a $3,000,000 competition to identify patients who will be admitted to the hospital in the next year. The contest lasts until April 2013.
This is definitely a site I want to be involved with in the future. I just wish they could make it a spectator sport as well.
Yesterday, Visual.ly launched a new way of creating infographics. It claims to be able to “create free customizable infographics in seconds.” I have not tried it yet, but it is on my to-do list.
Science will use gaming for crunching data
This is an interesting article about how science will use online gaming to solve some of science’s most difficult questions. Players think they are just solving puzzles.
I love infographics because they are a great way to convey information about data. They fit well with the idea that data scientists also need to be good storytellers. Visual.ly is a startup aimed at helping people create, share, and discover infographics. Here is a quick example I created about my Twitter account.
My Twitter Infographic
If I am going to create a blog about becoming a data scientist, I must at least provide some type of definition. One of the best definitions I have read is by Hilary Mason, Chief Scientist at Bit.ly:
A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics, and machine learning.
This definition is short and simple, but there are many more definitions out there. In fact, CITO Research, a site for CIOs and CTOs, set out to define what a data scientist is. They interviewed six leaders in the data science community and posted all of the interviews online. The interviews produced varied results, but they converged on a few main themes about what a data scientist should know.
After reading Hilary’s definition, the CITO Research interviews, a great post at Quora, and numerous other articles, I created a list of data science skills:
- Machine Learning
- Storytelling (Communication)
- Big Data
I am sure this list will change and evolve over time, but that is where I am going to focus for now. If you have anything to add to the list, please leave a comment. If you are interested in gaining some data science skills, please follow along and let’s learn together.
Obviously the world does not need another blog. However, blogs are a great way to share information, and I am creating a new one anyway.
The analysis of data is becoming more important every day. Data science is quickly becoming a hot topic, and I want to become a data scientist. Thus, this blog will contain information I find useful during my data science journey. I hope others find the blog useful too.
If you are interested in becoming a data scientist, please follow along and let’s start learning together.