After numerous semesters of teaching an introductory course on machine learning, Max Welling, a professor at the University of California, Irvine, decided to compile an introductory textbook titled A First Encounter with Machine Learning (PDF link).
- BigML – A great interface. Just upload your data and it shows basic information for each column such as a histogram and mean values. See the Gallery for some examples of the final models.
- Wise.io – Just launched, but it looks to be a serious contender. It was started by a team from UC Berkeley.
- Precog – Taking a slightly different approach, Precog is both a platform and an online IDE for data science. The IDE supports Quirrel, hyped as R for big data.
- Ersatz – Currently in private beta; the team is developing a web platform for building deep neural networks.
- Predictron Labs – Cloud-based predictive analytics platform
Am I missing any startups on this list?
While not really startups, the following two links might also fit here.
In just the past month, a couple of great resources for learning Python have been created.
- Getting started with Python: Tips, Tools and Resources – If you are new to Python, this is a great place to start. It contains a brief description and links to books, tutorials, and MOOCs.
- Getting Started With Python for Data Scientists – This focuses more on tools specifically for data science.
Together, these links should provide all the resources necessary to begin doing some data science with Python.
This is a great article and list of topics to remember when working on big data projects. Here is the list.
- Gather business requirements before gathering data
- Implementing big data is a business decision, not an IT decision
- Use an agile and iterative approach to implementation
- Evaluate data requirements
- Ease skills shortage with standards and governance
- Optimize knowledge transfer with a center of excellence
- Embrace and plan your sandbox for prototype and performance
- Align with the cloud operating model
- Associate big data with enterprise data
- Embed analytics and decision-making using intelligence into operational workflow/routine
See the original article, 10 Big Data Implementation Best Practices, for details.
This post is notes from the Coursera Data Analysis Course.
Here are some R commands that might prove helpful for cleaning data.
- sub() replaces the first occurrence of a pattern in each string
- gsub() replaces all occurrences
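The difference between the two is easiest to see side by side. Here is a minimal sketch; the column names are made up for illustration, and `fixed = TRUE` is used so the dot is matched literally rather than as a regular expression wildcard.

```r
# Hypothetical column names, as they might arrive from a messy CSV header
raw_names <- c("Patient.ID.Number", "Visit.Date.Local")

# sub() replaces only the first match in each string
first_only <- sub(".", "_", raw_names, fixed = TRUE)

# gsub() replaces every match in each string
all_matches <- gsub(".", "_", raw_names, fixed = TRUE)

print(first_only)   # "Patient_ID.Number" "Visit_Date.Local"
print(all_matches)  # "Patient_ID_Number" "Visit_Date_Local"
```

Without `fixed = TRUE`, the pattern "." would match any character, which is a common surprise when cleaning names.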
Quantitative Variables in Ranges
- cut(data$col, seq(0,100, by=10)) breaks the data up by the range each value falls into; in this example, whether the observation is between 0 and 10, 10 and 20, 20 and 30, and so on
- cut2(data$col, g=6), from the Hmisc package, returns a factor variable with 6 groups
- cut2(data$col, m=25), also from Hmisc, returns a factor variable that aims for at least 25 observations in each group
- merge() combines data frames
- sort() sorts a vector
- order(data$col, na.last=T) returns the indices that put the column in sorted order, with NA values last
- data[order(data$col, na.last=T),] reorders the entire data frame by col
- melt(), in the reshape2 package, reshapes data from wide to long format
- rbind() adds more rows to a data frame
Obviously, these functions have other parameters to do a lot more. There are also a number of other helpful R functions, but these are enough to get you started. Check the R help (?functionname) for more details.
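To see how a few of these fit together, here is a small sketch using only base R. The data frames `scores` and `groups` and their columns are invented for illustration; they are not from the course.

```r
# Hypothetical data frames for illustration
scores <- data.frame(id = c(3, 1, 2, 4),
                     score = c(35, 5, NA, 72))
groups <- data.frame(id = c(1, 2, 3, 4),
                     group = c("a", "b", "a", "b"))

# cut(): bin scores into ranges (0,10], (10,20], ..., (90,100]
scores$range <- cut(scores$score, seq(0, 100, by = 10))

# merge(): join the two data frames on their shared "id" column
combined <- merge(scores, groups, by = "id")

# order(): row indices that sort by score, with NA values placed last
idx <- order(combined$score, na.last = TRUE)
sorted <- combined[idx, ]

# rbind(): append a row that has the same columns
extra <- data.frame(id = 5, score = 50,
                    range = cut(50, seq(0, 100, by = 10)),
                    group = "a")
bigger <- rbind(sorted, extra)
print(bigger)
```

One thing to watch: by default cut() builds intervals that are open on the left and closed on the right, so a value of exactly 0 would come back as NA here unless include.lowest = TRUE is passed.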
Here are a few of the recent Strata videos I would recommend.
- Video Games: The Biggest Big Data Challenge – Video Games generate Big Data
- Big Data on Small Devices: Data Science goes Mobile – How data can help build better mobile apps
- Distributed Environmental Data: On the Ground at the Data Sensing Lab – Using sensors to track people at a conference. It is hard to explain but very interesting, so just watch the video
The University of Chicago and Argonne National Labs are hosting the Data Science for Social Good Summer Fellowship 2013. The fellowship program is open to students at all levels who are interested in working on real-world social problems. The program takes place in Chicago and the application deadline is April 1, 2013, so apply soon.
About a week ago I posted a link to a free data mining textbook. Hacker News got wind of the book as well, and I am guessing a flood of traffic hit the textbook’s site. The flood happened to take the site completely down for a couple of days. It was a shame because the book is really good.
If you frequently read this blog, you will notice it has quite a number of links to free online textbooks. Each free online textbook is available a bit differently. Most are PDF downloads (either by chapter or the entire book) hosted at some person’s personal website or somewhere on a university’s website.
Here is my question. Does the web have a publishing platform for textbooks? Is there a startup working on something like this?
I am aware of Wikibooks, but I just don’t hear much about the quality of the books. As a matter of fact, I just don’t hear much about Wikibooks at all.