Also this spring, Stanford will be offering two more courses that might benefit a person learning data science.
- Probabilistic Graphical Models – combining ideas from statistics, probability, and computer science to solve really hard problems…sounds neat
- Natural Language Processing (NLP) – applying algorithms to human language and text
If you feel these two classes might be a bit too advanced at this point, then here are a couple of more fundamental computer science classes. If you are new to computer science and programming, CS 101 would be a good choice. If you are not as new to computer science, or might be a bit rusty on your core algorithms knowledge, then Design and Analysis of Algorithms 1 might be appropriate.
Actually, the courses are no longer being offered by just Stanford. A few other schools have been added. The courses are now being offered through Coursera. Plus, all the courses are free.
This is an interesting article about how science will use online gaming to solve some of science's most difficult questions. Players think they are just solving puzzles.
- A Good Skillset
Jeremy Howard of Kaggle at Strata 2012
In this brief interview he covers a range of other data science topics:
- Big Data is an engineering problem
- Analytics generate value/insight from data
- Predictive Modeling is about answering a question – build a model to do that
- Is Data Science about tools or people? – watch the video for Jeremy’s answer
- And others…
See this previous post for more videos from Strata 2012.
The Strata Conference Making Data Work for 2012 just finished up. If you (like me) were unable to attend the conference, you may have missed out on some of the networking and excitement of actually being at the conference, but you can still glean some knowledge from the videos.
Steve Schoettler “Learning Analytics”
This is a good video about how data can be used to help people learn.
There are many other Strata 2012 videos available as well. See below for links to them.
Other Strata 2012 Videos
See the O’Reilly Strata CA 2012 Playlist on YouTube for more videos. The videos include numerous interviews with the speakers and even a few of the talks. Also, many of the slide decks can be found on the Strata Conference website.
Have fun catching up on everything that happened at Strata Conference 2012.
In the article Do you need a data scientist?, the following questions get answered:
- What do data scientists do?
- Who makes a good data scientist?
- When is the right time to hire a data scientist?
Hopefully, I will discuss each of these questions in more detail in a later blog post.
To answer the question: if your data is growing, you would probably benefit from a data scientist.
The following is a video that goes along with this topic.
It is a great time to be working on a startup in the Big Data arena. First, the topics of big data and data science are really popular in the tech world right now. Second, it appears that investors are interested as well. Below are two examples:
Accel Partners formed a $100 million fund for startups that are focused on Big Data. The fund is not limited to storage or analysis. It is really all encompassing. If you have a startup that, in any way, helps people or businesses deal with lots of data, then you are welcome to apply. For more on the fund and how to apply, see the page on the Accel Partners’ website.
IA Ventures has also set up a $105 million fund for startups that are focused on data.
So if you have a “data” startup, now is a great time to get some funding.
A few days ago, I mentioned that the Stanford Machine Learning class will be starting soon. I thought I should quickly mention some of the topics covered. The list also serves as a great outline for machine learning.
In supervised learning, one has a set of data with features and labels.
- Linear Regression – one/multiple variables
- Gradient Descent – a general algorithm for minimizing a function
- Logistic Regression – This is useful when predicting classification-type results, for example when you are looking for a yes or no answer. Does the patient have cancer? Will the customer buy my new product? It can also be helpful when there are more than 2 possible results. What color will a person choose (red, blue, green, silver)?
- Neural Networks – A learning algorithm that is modeled after the brain. Think of neurons.
- Support Vector Machines
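To make the first two items in the list above a little more concrete, here is a minimal sketch of gradient descent fitting a one-variable linear regression. It is not taken from the class; the function name `fit_line`, the learning rate, the step count, and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.

    fit_line, learning_rate and steps are illustrative names and values,
    not something specified by the class.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

The same loop works for other models: only the two gradient lines change when the function being minimized changes.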
In unsupervised learning, one has a set of data with features but no labels. Can some structure be found for the data?
- Clustering – The most popular technique is K-means.
- PCA (Principal Components Analysis) – reduces the number of features, which can speed up a learning algorithm
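As a rough illustration of clustering, here is a small K-means sketch in plain Python. The helper names (`sq_dist`, `mean_point`, `kmeans`), the fixed starting centroids, and the sample points are assumptions made for this example; real implementations usually choose starting centroids randomly and stop when assignments no longer change.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Made-up 2D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

No labels are supplied anywhere; the two groups are found from the features alone.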
Anomaly detection covers methods for determining whether a data point is bad. Bad data is considered an anomaly.
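One simple way to flag anomalies, assumed here purely for illustration, is to measure how many standard deviations each value sits from the mean and flag the ones that are too far away. The function name `find_anomalies`, the threshold of 2.0, and the sample readings are all hypothetical.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean.

    The threshold is an illustrative choice, not a recommended setting.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Made-up sensor readings with one obviously bad value.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0]
anomalies = find_anomalies(readings)
print(anomalies)
```

Note that a single extreme value also inflates the mean and standard deviation it is compared against, so this sketch suits small examples better than production data.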
Like the name says, recommender systems are used to make recommendations. Companies like Netflix use recommender systems to recommend new movies to customers. LinkedIn also recommends people to connect with. This is a fairly hot topic in the tech world right now.
- Content Based (Features)
  - Modified Linear Regression
- Non-content Based (No Features)
  - Collaborative Filtering
  - Matrix Factorization
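To give a feel for matrix factorization, the sketch below learns a short vector for every user and every item so that their dot product approximates the known ratings, then uses the same dot product to guess a missing rating. It is an assumption-laden toy, not the method of Netflix, LinkedIn, or the class: the names `factorize`, `predict`, and `squared_error`, the hyperparameters, and the ratings are all invented for illustration.

```python
import random


def predict(P, Q, user, item):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[user], Q[item]))


def squared_error(ratings, P, Q):
    """Total squared error over the known ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items())


def factorize(ratings, n_users, n_items, k=2, learning_rate=0.01,
              reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q by stochastic gradient descent.

    All hyperparameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Move both factors to shrink the error, with a small
                # regularization pull toward zero.
                P[u][f] += learning_rate * (err * qi - reg * pu)
                Q[i][f] += learning_rate * (err * pu - reg * qi)
    return P, Q


# Made-up ratings: (user, item) -> stars. Unrated pairs are simply absent.
ratings = {(0, 0): 5, (0, 1): 4, (1, 0): 4,
           (1, 2): 1, (2, 1): 5, (2, 2): 2}
P0, Q0 = factorize(ratings, n_users=3, n_items=3, epochs=0)  # untrained
P, Q = factorize(ratings, n_users=3, n_items=3)
print(squared_error(ratings, P0, Q0), squared_error(ratings, P, Q))
print(predict(P, Q, 0, 2))  # guess for a rating user 0 never gave
```

No item features are supplied, which is what makes this a non-content-based approach: everything is inferred from the pattern of ratings.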
If any of these topics sound interesting to you, sign up for the Stanford Machine Learning class. Professor Andrew Ng will do an excellent job explaining the details.