The University of California at Berkeley is hosting AMP Camp, the Big Data Bootcamp, starting today. The conference is sold out for in-person attendance, but registration is free and live streaming is available. The agenda looks good (including machine learning, parallel programming, Mesos, and hands-on exercises), so this might be a good opportunity for some learning.
Thanks to Mark Nickel for the link.
In the spring of 2012, Alex Smola taught a course at Berkeley on Scalable Machine Learning. Alex is an Adjunct Professor at the University of California at Berkeley and a Visiting Scientist at Google.
Alex was kind enough to put all the course materials on the internet. That includes papers, slides, links, and video lectures. As the title suggests, the course appears to focus on large-scale machine learning. Below is one of the lectures from the Statistics portion of the course.
A while back, Strata hosted a web conference titled Data in Motion. The slides and audio are now available online. The conference focused on unusual applications of data to things that move, such as trains, aerospace, and even car racing. The first talk, on Formula One car racing, was fascinating. I had never thought about the amount of data analysis that goes into racing.
Recently, I read an article titled Why Online Education Won’t Replace College–Yet. The article is most likely a response to the recent success of Massive Open Online Courses (MOOCs) such as those offered by Udacity and Coursera. The author, David Youngberg, an Assistant Professor of Economics, presents 5 reasons why online education won’t replace college. I disagree with his reasons, so I thought I should share more details. I will go through each of his 5 reasons.
- It’s too easy to cheat. Cheating has always been an issue in education, and I think it always will be. Students even manage to cheat in ivory tower institutions. Online colleges such as the University of Phoenix have been very successful, and cheating can easily exist in that scenario. I do think online classes make cheating easy, but I don’t really see that stopping the success of online education.
- Star students can’t shine. This is simply not true. The star students are the ones answering questions in the forums and getting the assignments done first. This is very similar to the star students in a regular college setting. The brightest students have their work done first and are frequently found helping their peers. Udacity has even hired one of the former star students.
- Employers avoid weird people. Just because a person takes an online course does not make him/her weird. Taking an online course means a person is willing to find cheaper and easier ways to solve old problems. It also means the person has the initiative to go out and complete something. All of those traits are attractive to companies. The problem here is credentials. MOOCs have not yet solved the credential problem. MOOCs don’t offer degrees or widely-acknowledged certifications yet. Companies want to hire people with degrees, not people with a piece of paper stating “I completed an online course.” I think MOOCs will quickly figure out this problem. Also, many of the Coursera and Udacity students are already college graduates. Why are they now weird for taking an online course?
- Computers can’t grade everything. Not so fast. Earlier this year, Kaggle and the Hewlett Foundation sponsored a competition to see if technology could be created to automatically grade standardized test essays. Well, the competition was a big success. See the full press release. The competition results will probably not generalize to all essays, but the technology to automatically grade papers is not that far away. Also, Coursera is experimenting with crowd-sourced grading of papers. One student grades the papers of 4 unknown classmates, then a final score is calculated by a computer. See the Peer Assessments section on the Coursera website. This technique may even be more effective than grading by a single highly-trained person.
- Money can substitute for ability. The author argued that students will pay for tutors, buy dishwashers or anything else to help get better grades. I do not think banks are going to start handing out loans for dishwashers, so students can have more time for homework. I think MOOCs will allow students to learn without building massive amounts of debt.
Now, I cannot say with certainty whether or not MOOCs will replace traditional colleges. I just do not believe the above reasons are what will determine the outcome.
On a side note, this blog is focused on material about learning to become a data scientist. I think MOOCs are going to be hugely helpful for people wishing to obtain data science skills.
See this interactive map of Kaggle Submissions. The map is a nice example of data visualization. The data is much easier to see on a map than in a data table. Nice work by Ramzi Ramey of Kaggle.
Here is a nice infographic about some challenges of big data. It covers the problems that organizations face when dealing with the “three Vs” of big data.