Here is a nice, short two-minute video from edCetra Training with some good facts about big data and data analysis.
- The digital universe is 10 times the size it was in 2006
- Greater literacy and cloud computing are helping fuel big data
- 80% of companies' data is unstructured, which makes it difficult to analyze
- Employees spend 2 hours per day searching for the right information
The Coursera Probabilistic Graphical Models course officially starts today. Sign up and start learning.
Google with Incorrect Spelling
Just the other day, I was googling for “strata conference” information. I noticed that I had mistakenly typed “starta” instead of “strata”. I backspaced over the incorrect letters and fixed the spelling. Later in the day, I also noticed that I frequently swap the letters “ar” and “ra”. That got me thinking.
Does Google Know How Poorly I Spell?
Ever since Google Instant was released in 2010, Google has been able to see every keystroke I type into the Google Search box. That means Google knows when I hit the backspace key. With some data analysis, Google should be able to answer the following questions:
- What words are most commonly misspelled in Google searches? I would guess the answer would be a good indicator of the most commonly misspelled words in general.
- What words do I misspell the most often?
- How many letters get typed after the misspelling?
- What percentage of Google searches are completed without a backspace?
- Do people in certain parts of the world/country have better spelling?
- How often do people backspace over a correctly spelled word, only to then spell it incorrectly? This could be amusing. I would also like to know which words.
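A couple of these questions can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: I have no idea how Google actually stores keystrokes, so the log format (one string of key presses per search, with `"\b"` standing in for the backspace key), the function names, and the sample data are all made up.

```python
from collections import Counter

BACKSPACE = "\b"  # assumed marker for one press of the backspace key


def replay(keystrokes):
    """Replay a keystroke string; return (final_text, backspace_count)."""
    buffer = []
    backspaces = 0
    for key in keystrokes:
        if key == BACKSPACE:
            backspaces += 1
            if buffer:
                buffer.pop()  # remove the last typed character
        else:
            buffer.append(key)
    return "".join(buffer), backspaces


def share_without_backspace(sessions):
    """Fraction of searches that were typed without any backspace."""
    if not sessions:
        return 0.0
    clean = sum(1 for s in sessions if BACKSPACE not in s)
    return clean / len(sessions)


def most_corrected_queries(sessions, top_n=3):
    """Final queries that most often needed at least one backspace."""
    counts = Counter()
    for s in sessions:
        text, backspaces = replay(s)
        if backspaces:
            counts[text] += 1
    return counts.most_common(top_n)


# Made-up sample data, not real search logs.
sessions = [
    "starta" + BACKSPACE * 4 + "rata",  # my "starta" slip, fixed to "strata"
    "strata conference",
    "kaggle",
    "kagle" + BACKSPACE * 2 + "gle",    # "kagle" fixed to "kaggle"
]

print(share_without_backspace(sessions))
print(most_corrected_queries(sessions))
```

The same replay idea would extend to the other questions, for example counting how many letters were typed after the mistake, as long as the log keeps the keystrokes in order.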
Misspellings Don’t Matter
As it turns out, the misspellings don’t really matter that much. Google is smart enough to fix many spelling errors.
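To get a feel for how a typo can be mapped back to the intended word, here is a toy example using Python's standard `difflib` module. This is not how Google does it (I do not know their method). It simply picks the most similar word from a tiny vocabulary that I made up, and the `suggest` helper and the `cutoff` value are my own choices.

```python
import difflib

# Tiny made-up vocabulary; a real spell checker would use a far larger one.
VOCABULARY = ["strata", "conference", "kaggle", "statistics", "coursera"]


def suggest(word, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the most similar vocabulary word, or the input if none is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(suggest("starta"))
print(suggest("kagle"))
```

With this vocabulary, my “starta” typo comes back as “strata”. Words that are not close to anything in the list are returned unchanged.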
What other spelling questions could Google answer?
This post provides a nice, quick overview of six machine learning algorithms.
- Decision Trees
- Linear Regression
- Neural Networks
- Bayesian Networks
- Support Vector Machines (SVMs)
- Nearest Neighbor
The tagline for Kaggle is “We’re making data science a sport.” They have successfully turned data science into a competition that is both fun and yields excellent results. There is also a portion of the site dedicated to classroom use, called Kaggle in Class.
Here is how it works. A company that needs some data analyzed can contact Kaggle and host a competition. Data scientists all over the world then compete to find the best solution. The company benefits from having many algorithms and techniques applied to the same data set, far more than it could run on its own. The contestants benefit from networking, pre-cleaned data, and learning from others. It is a win-win situation. Plus, the winner gets prize money.
Currently, the large featured competition is the Heritage Health Prize. It is a $3,000,000 competition to identify individuals who will be admitted to the hospital in the next year. The contest lasts until April 2013.
This is definitely a site I want to be involved with in the future. I just wish they could make it a spectator sport as well.
Yesterday, Visual.ly launched a new way of creating infographics. It claims to be able to “create free customizable infographics in seconds.” I have not tried it yet, but it is on my to-do list.
The Coursera Natural Language Processing course officially starts today. Sign up and start learning.
Data Scientist is the hot new job for 2012. Does this job really exist? Who hires these people? Are companies currently hiring? The answers are: yes, lots of companies, and yes. I decided to spend last night looking for companies that are currently hiring data scientists. It did not take long to compile a pretty good list.
Data Scientist Job Openings
|Microsoft||Redmond, WA||Microsoft Sr. Data Scientist|
|Netflix||Los Gatos, CA||Netflix Senior Data Scientist|
|Kaggle||San Francisco, CA||Kaggle Data Scientist|
|Greenplum||San Mateo, CA||Greenplum Data Scientist|
|Last.fm||London||Last.fm Data Scientist|
|Rackspace||San Antonio, TX||Rackspace Data Scientist|
|Amazon||Seattle, WA||Amazon Data Scientist/System Architect|
|Facebook||Menlo Park, CA||Facebook Data Scientist|
|Twitter||San Francisco, CA||Twitter Data Scientist|
|LinkedIn||Mountain View, CA||LinkedIn Data Scientist|
|Cobalt/ADP||Cambridge, MA||Cobalt Data Scientist|
|eBay/PayPal||San Jose, CA||PayPal Data Scientist|
|Bunchball||San Jose, CA||Bunchball Data Scientist|
|A9||Palo Alto, CA||Principal Engineer/Data Scientist|
|Acxiom||Little Rock, AR||Acxiom Data Scientist|
|Trulia||San Francisco, CA||Trulia Data Scientist – Data Science Lab|
Do you know of any other companies hiring Data Scientists right now?
Statistics – This is a topic that could use some more attention from the online community.
I would love to see Stanford (or Coursera) offer a free online statistics course, much like the other free courses already available online.
I did find a series of YouTube videos by Daniel Judge, a professor in the East Los Angeles College Mathematics Department. The videos start at the very beginning of statistics. I have watched a couple of them, and they seem quite good. Daniel does a nice job of explaining the material. Here is the first video in the series.
Stay tuned to the blog in case other stats options appear online. Also, please leave a comment if you know of some good online statistics resources.