As part of New York City Big Data Week, a startup showcase will feature 14 startups. Each startup will give a quick demo or presentation, and then Tim O’Reilly and Fred Wilson will select 3 winners. Numerous investors and journalists will also be present. A complete list of the startups presenting is available on the Startup Showcase page.
Which ones do you think will win?
Without seeing any of the presentations, here are my 3 picks.
This might be my dark horse pick, but InfoActive has my vote. I also like Placed for location analytics, and TempoDB is very intriguing because it is simply a time-series database.
The 2012 edition of Hadoop World and Strata Conference is underway. The conference is in New York City and if you are not lucky enough to attend, then at least you can watch the live video feed.
Michael Koploy wrote 3 Secrets for Aspiring Data Scientists about what it takes to enter a career as a data scientist. He lays out 3 steps:
- Sharpen Your Scientific Saw – Hone your math and science skills
- Learn the Language of Business – Data Scientists need to explain the data in business terms
- Keep Adding to Your Technical Toolbelt – Learn all the tools you can (NoSQL, Excel, Hadoop,…)
The article is a nice read. http://blog.softwareadvice.com/articles/bi/3-career-secrets-for-data-scientists-1101712/
Think Bayes by Allen B. Downey is another free book available from Green Tea Press. Downey is a computer science professor at Olin College. The book is currently available in PDF or HTML. It is not yet complete, so it may contain some errors.
A nice short and sweet video about what a data scientist is. Josh Wills of Cloudera defines a data scientist as follows:
Person who is better at statistics than any software engineer and better at software engineering than any statistician.
I would say that definition is pretty good.
Michael Cutler, cofounder of TUMRA, gave a nice talk to the University of Oxford Computer Science Department. The following quote from his talk sums up his idea.
Given a choice between a “best guess” now, and a “marginally better” answer later, I’d take the best guess every time.
Academics often focus heavily on improving the accuracy of an algorithm, even when the resulting solution is too slow for industrial use.
reference: TUMRA Blog
This is a great write-up that covers some of the very basics of social network analysis. Some of the topics are:
- Nodal Degree
An Introduction to Social Network Analysis – Data Science Central.
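For readers new to the term, nodal degree is simply the number of connections a node has in a network. The sketch below is my own minimal illustration, not code from the linked write-up; the function name `nodal_degrees` and the small friendship network are made up for the example, and it assumes an undirected graph given as a list of pairs.

```python
from collections import defaultdict


def nodal_degrees(edges):
    """Return {node: degree} for an undirected graph given as (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # this simple sketch ignores self-loops
        # Sets mean a repeated edge is only counted once.
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


# Hypothetical friendship network, invented for illustration only.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
degrees = nodal_degrees(edges)

# The node with the highest degree is the best-connected person.
most_connected = max(degrees, key=degrees.get)
```

In this toy network, `degrees` maps each person to a count of distinct neighbors, and `most_connected` picks out the node with the most ties.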