- Spark: Cluster Computing with Working Sets
- Ten Simple Rules for Effective Statistical Practice
- Probabilistic machine learning and artificial intelligence
- A survey on platforms for big data analytics
- Artificial Neural Network Computation on Graphic Process Unit
I teach data science courses throughout the US. I enjoy asking attendees why they are in class. I get many good answers, but occasionally I get some funny ones. Here is a story with one of the more humorous answers.
While chatting with an attendee before class, I asked why he chose to attend this class. Here was his answer.
Well, my boss attended a conference and heard a talk on Big Data. Then, he came back to the office and bought Hadoop for some of our systems. Next, he heard about this training and told me to attend. As I was preparing to leave, the boss said, “Get me sum ‘dat big data”.
After a slight chuckle from both of us, I mentioned we would talk more about that in class.
While this story is somewhat humorous, it is not all that uncommon. Companies want to start using data science, but they often just do not know where to start. If you are looking for a starting point, check out this post, You Want Data Science, Now What?.
Do you have a funny “data science” or “big data” story? If so, please share in the comments.
Microsoft has recently announced a machine learning competition platform. As part of the launch, one of the first competitions is the prediction of brain signals. It has $5,000 in prizes, and submissions are accepted through June 30, 2016.
Google and Tableau have teamed up to offer a big data visualization contest. The rules are fairly simple: just create an awesome visualization using at least the GDELT data set. Finalists will receive prizes worth over $5,000, and some will even get tours of Tableau and Google facilities. The contest runs through May 16, 2016.
I was honored to be able to provide the data science introductory article for the Special Data issue of AL MAGNET magazine. The article is titled, Data Science and the Essential Terms. It provides a description of data science and an example workflow. It also points out some of the key terms in data science and what they mean. The closing describes why now is the time to learn data science.
The magazine is open-access, so you can freely read and share the article. Thank you to the AL MAGNET team for the invitation.
Yahoo just released a 1.5 TB dataset of “anonymized user interactions on the news feeds”. If you have been looking for a new dataset to analyze, this just might be it. It contains approximately 110 billion rows of data regarding user-news interactions. Happy data exploring!
Recently, I was invited to speak about data science to the research department of a regional hospital system. I thought I would share my slides.
A clarification note on one of my quotes from the presentation:
“Data Science doesn’t need big data”
I am not trying to say big data is not important. I am just saying that lots of excellent data science can be performed on data that is not big data. So, don’t wait until you have big data before you start doing some data science.
For some reason, not all the links in the presentation are working. If you want to follow the links, go to SpeakerDeck and click “Download PDF”.