OpenIntro (Free online Stats Book) is getting a new version. The updates sound good. If you are looking to learn statistics, this is an excellent and cost-effective solution.
Yesterday, I posted about some traditional strategies to acquire data science skills. Today, I will post a nontraditional strategy.
There is a wealth of data science information available on the internet for free. With enough personal motivation, a person could learn all the necessary skills online for free (or cheaply). Coursera is probably a great place to start. Other good sites include Udacity, the Kaggle Wiki, and various blogs and websites.
The problem with this approach is knowing exactly what to learn. A course in machine learning is great, but data science is more than just machine learning. How do you know what to learn? It would be really nice to have a collection of data science topics and the associated online training materials.
Would this strategy work for you?
Based upon the popularity of a previous post about a certificate program from the University of Washington, it appears that many people are interested in learning the skills necessary to become a data scientist. Thus, I decided to compile a list of some of the possible learning strategies.
Traditional College Education
The most obvious path would be to study at a traditional college or university. Colleges and universities are starting to notice the demand for data science skills, and many colleges are currently offering programs to prepare someone as a data scientist. This path is safe and predictable. Do the homework, complete the courses, and get the degree or certificate. Most people are familiar with the process, and it offers few surprises. The problems here are the costs, lack of flexibility, and time involved.
Companies are now starting to offer training programs for data science. EMC is leading the way in this category with its data science training program. Cloudera also offers lots of training related to Hadoop and big data. Wolfram offers data science training with Mathematica. One of the problems with this category is the cost. Another problem is that companies tend to teach and promote their own products. This may leave the student with numerous gaps in the full data science spectrum.
What are your thoughts about the above approaches? What are the positives and negatives? Also, later this week I will be posting some less-traditional approaches to learning data science.
Starting in the fall of 2012, the University of Washington will be offering a certificate in Data Science. The program has two sections: one located in Seattle and the other online. The certificate consists of three separate courses each lasting approximately 3 months. Thus the program can be completed in 9 months, and the cost is around $3000.
There are some information sessions later this summer. If you are in Seattle, there is an information session on July 19. If you are interested in the online program, a webinar is scheduled for August 29.
The program content looks quite good. Some of the topics to be covered include Hadoop, NoSQL, machine learning, statistics, graph algorithms, and more. If you are looking to become a data scientist, this just might be the program for you.
I also added this certificate program to my list of colleges offering data science degrees.
Just yesterday, MIT and Harvard University announced a new partnership to offer online education. The goal is to increase learning for students on-campus and others throughout the globe. Both schools plan to study the results of edX to better understand how students learn and how technology affects learning.
See the official announcement here.
EdX Video Announcement
How will this affect Data Science Learning?
It is too early to know exactly what courses will be offered, but given MIT’s strength in engineering, engineering courses would seem a reasonable guess. I am guessing (and hopeful) that many courses pertinent to data science will be offered by edX. Also, the announcement is most likely a move by MIT and Harvard to compete with Coursera, a company started by two Stanford University faculty members. Obviously, the elite schools do not want to be outdone by each other. In any case, I only see these new and different methods for education as a good thing. Happy Learning!
Since recently announcing $16M in funding, Coursera has been making quite a bit of noise. Last fall, Stanford University decided to freely offer a couple of computer science classes online. The response was huge, and that led to the creation of Coursera.
The courses are no longer limited to computer science, and Stanford is no longer the only school involved. Here is a list of academic areas being offered and another list with the schools involved.
- Healthcare, Medicine, and Biology
- Economics, Finance, and Business
- Humanities and Social Sciences
- Mathematics and Statistics
- Computer Science
- Society, Networks, and Information
Although not all of the courses will be directly related to data science, many of them are very close. Naturally, the Math, Statistics, and Computer Science areas relate directly to data science. However, other areas such as Networks, Biology, and Economics are among the most popular application areas for data science. This is very exciting. My only concern is that the courses are a bit too much like traditional university courses, with specific start/end dates and homework due dates. It will be interesting to see if the course structures change over time.
Anyhow, the following courses are starting today. Sign up and start learning.
- Machine Learning – A major focus area of data science
- Computer Science 101 – probably a good starting point if you don’t know how to program
- Compilers – good for understanding how programming languages work
- Automata – hard to explain in 1 line, but it contains some fundamental principles in computer science
- Intro to Logic – learn to reason systematically
- Computer Vision – not sure of the relation to data science, but I am sure there is one; if you know, please leave a comment
Are you going to enroll in any of these courses?
This infographic displays the need for colleges and universities to start preparing more data science graduates.
With some of the top tech entrepreneurs in the U.S. either dropping out of college or not attending, there is some debate about whether college is the right choice or not. This post will focus on college for data science. However, for college in general, if you know what you want to study, then college or graduate school is a great option. If you are going to college because you do not know what else to do, I would say college is too expensive for that.
Most would agree that an undergraduate degree in some highly analytical field (math, CS, economics, physics) is definitely beneficial. Plus college has a strict set of guidelines and a specific order for the learning. A formal degree program often provides the necessary motivation for a person to continue learning. The U.S. college education system is not perfect, but if it keeps a person from quitting, it will help to reach the goal of becoming a data scientist.
All this leads to a second point. Only a few colleges offer undergraduate degree programs for data science. Thus, graduate school or more learning will still be required. College should provide the necessary prerequisites and many employers will pay for the continued learning.
A highly motivated person could probably learn most, if not all, of the data science skills on the internet for free or at very low cost. The key is being a highly motivated person. That person must have the drive to not quit when the learning becomes difficult. Also, there are no classmates or professors to help with difficult concepts. Sure, the internet can help there, but it requires a bit more work to find the help. Plus, knowing what topics to learn and in what order can be challenging. Already, this blog has much helpful content, but it is not organized as a learning sequence. Not attending college presents some obstacles that only the most highly motivated students will overcome. As more and more learning resources appear online, the no-college option may become more popular.
What is the Answer?
Strictly speaking, I would say the answer is NO. However, many people will not succeed without the rigor of school, and some companies will not hire a person without a degree. So, college is not 100% essential to being a data scientist, but for many it is probably the best option.
Yes, I just declared this week as Data Science Education Week. As far as I know, I have no authority to do such a thing. Hey, I did it anyway. All week I will be posting information related to data science education. So, pull up a chair and get ready for some serious data science education topics.
Having trouble keeping track of what schools offer what courses for free online? Problem solved!
Class Central maintains an updated list of courses from Coursera (Stanford), Udacity, MITx, and others as they become available. Not all of the courses are related to data science, but I still thought it was valuable to share the link.
Check it out and start learning.