The University of Waikato in New Zealand will be offering a free online course titled Data Mining with Weka.
Weka is a widely used toolkit for data mining and machine learning, developed at the University of Waikato.
Don’t wait too long to sign up; the course starts September 9, 2013.
Here is a video of the instructor of the course providing a brief overview.
It is back-to-school time, and here are some papers to keep you busy this school year. All the papers are free. This list is far from exhaustive, but these are some important papers in data science and big data.
- PageRank – This is the paper that explains the algorithm behind Google search.
- MapReduce – This paper explains a programming model for processing large datasets. In particular, it is the programming model used in Hadoop.
- Google File System – Part of Hadoop is HDFS. HDFS is an open-source version of the distributed file system explained in this paper.
These are two of the papers that started the NoSQL debate. Each paper describes a different type of storage system intended to be massively scalable.
- Random Forests – One of the most popular machine learning techniques. It is heavily used in Kaggle competitions, even by the winners.
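To make the MapReduce programming model from the list above a little more concrete, here is a minimal single-machine sketch of the classic word-count example. It is not taken from the paper or from Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data science"]))
```

The point of the model is that the map and reduce functions only ever see one document or one key at a time, which is what lets a framework spread the work over a cluster.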
Are there any other papers you feel should be on the list?
The University of Washington is developing wireless devices that can operate without a battery. The devices operate by reflecting radio waves in the air. Although I can think of many uses for these devices, the article points out one in particular.
For example, sensors placed in a bridge could monitor the health of the concrete and steel, then send an alert if one of the sensors picks up a hairline crack.
After reading that, I was struck by the amount of data that could be collected. Just think of all the bridges in your city, state, or country. All this data is going to need analysis. Which alerts require immediate action? This sounds like a big data problem to me.
How can you imagine these devices being used?
Alteryx is offering the book, Big Data Analytics For Dummies, for free. If you are new to the term big data, this book provides a brief (about 40 pages) overview of the topic and what big data should be able to do for your company.
You have to register, but it is worth it for the free book.
I recently received the following information.
International School of Engineering is announcing their 3rd batch of live e-Learning certificate programs starting 4-Sep-2013 in “Engineering Big Data with R and Hadoop Ecosystem” and “Essentials of Applied Predictive Analytics” (http://goo.gl/kHckP).
These programs helped Engineers and Managers transform into Hadoop Developers/Data Scientists, get industry certifications, revolutionize their workspace and establish exciting careers.
• Taught by experts who are Carnegie Mellon, Johns Hopkins and Stanford University alumni with Fortune 50 experience
• Applied and interactive classes
• Classes ranked among the top 1% and 5% of all classes in the world in Piazza
• 1/3rd the cost of other similar programs
• 95% Success with Cloudera and EMC2
For details visit http://goo.gl/bPJEF
For any queries mail us at firstname.lastname@example.org or call us at +91 9502334561/2/3
Three of the top data scientists have recently changed jobs.
Do you ever have a regular expression you want to verify? If so, here are 3 quick sites to help you do that.
Regular Expression Editors Online
If you are unsure what regular expressions are, see Regular-Expressions.info for a tutorial.
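If you would rather check a pattern locally, a few lines of Python can do a similar job. This is only a rough sketch: the date pattern and the sample strings are made-up illustrations, and `check` is a hypothetical helper name, not part of any of the sites mentioned here.

```python
import re

# Hypothetical pattern to verify: an ISO-style date such as 2013-09-09.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check(pattern, samples):
    """Return a dict mapping each sample string to True if the whole string matches."""
    return {sample: pattern.fullmatch(sample) is not None for sample in samples}


if __name__ == "__main__":
    results = check(DATE_PATTERN, ["2013-09-09", "9-Sep-2013", "2013-9-9"])
    for sample, matched in results.items():
        print(f"{sample!r}: {'match' if matched else 'no match'}")
```

Using `fullmatch` rather than `search` means a sample only passes when the entire string fits the pattern, which is usually what you want when verifying an expression against known good and bad inputs.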
Last week I posted about Coursera’s Introduction to Recommender Systems course. I believe it is the first MOOC on the topic, but there is other material available online.
If you know of any other great recommender systems resources online, please feel free to share them.
Many of the top data scientists you will read about or hear speak have PhD degrees. Therefore, many people think a PhD is a requirement for becoming a data scientist. That is simply not true. There is a lot of work in the data science field that does not require a PhD. In fact, very little data science work actually requires one.
What is a PhD and why would a person get one? A PhD degree is a research degree that usually takes between two and five years of study beyond a master’s degree. The majority of the program will be focused on researching and expanding upon a very specific topic. A PhD student will push the edge of known human knowledge.
In daily tasks, most data scientists do not go that far and do not need a PhD. Most of the necessary skills can be obtained at the bachelor’s or master’s level. Combine that education with the amazing tools available and some experience, and being a data scientist is definitely achievable.
The reason many data scientists have PhD degrees is their curiosity and love of learning. Those are essential traits of both data scientists and PhD students. However, you can be curious and love learning without attending enough school to obtain a PhD.
None of this is to say that earning a PhD is bad. If you really love learning, thrive in the academic environment, and have the desire, then definitely go for the PhD. However, do not let the lack of a PhD stop you from doing data science.