Tag Archives: big data

5 Data Science Research Papers to read in Summer 2017

In the past, the blog has included 7 Important Data Science Papers and 5 More Data Science Papers. Here is another list if you are looking for something to read over the summer.

Get me Sum ‘dat Big Data

I teach data science courses throughout the US. I enjoy asking attendees why they are in class. I get many good answers, but occasionally I get some funny ones. Here is a story with one of the more humorous answers.

While chatting with an attendee before class, I asked why he chose to attend this class. Here was his answer.

Well, my boss attended a conference and heard a talk on Big Data. Then, he came back to the office and bought Hadoop for some of our systems. Next, he heard about this training and told me to attend. When preparing to leave, the boss said, “Get me sum ‘dat big data”.

After a slight chuckle from both of us, I mentioned we would talk more about that in class.

While this story is somewhat humorous, it is not all that uncommon. Companies want to start using data science, but they often just do not know where to start. If you are looking for a starting point, check out this post, You Want Data Science, Now What?.

Do you have a funny “data science” or “big data” story? If so, please share in the comments.

A Couple of Current Data Science Competitions

Decoding Brain Signals

Microsoft has recently announced a machine learning competition platform. As part of the launch, one of the first competitions is the prediction of brain signals. It has $5000 in prizes, and submissions are accepted through June 30, 2016.

Big Data Viz Challenge

Google and Tableau have teamed up to offer a big data visualization contest. The rules are fairly simple: just create an awesome visualization using at least the GDELT data set. Finalists will receive prizes worth over $5000, and some will even get tours of Tableau and Google facilities. The contest runs through May 16, 2016.

The Human Face of Big Data – Debuts Tonight

The Human Face of Big Data makes its debut tonight, 2/24/2016, on PBS (if you are in the United States). Hopefully it will be available via the PBS website fairly soon.

Data Science and the Essential Terms

I was honored to provide the introductory data science article for the Special Data issue of AL MAGNET magazine. The article is titled Data Science and the Essential Terms. It provides a description of data science and an example workflow. It also points out some of the key terms in data science and what they mean. The closing describes why now is the time to learn data science.

The magazine is open-access, so you can freely read and share the article. Thank you to the AL MAGNET team for the invitation.

Yahoo Just Released a Huge Machine Learning Dataset

Yahoo just released a 1.5 TB dataset of “anonymized user interactions on the news feeds”. If you have been looking for a new dataset to analyze, this just might be it. It contains approximately 110 billion rows of data regarding user-news interactions. Happy data exploring!

Why Data Science? – Presentation

Recently, I was invited to speak about data science to the research department of a regional hospital system. I thought I would share my slides.

A clarification on one of my quotes from the presentation:

“Data Science doesn’t need big data”

I am not trying to say big data is not important. I am just saying that lots of excellent data science can be performed on data that is not big data. So, don’t wait until you have big data before you start doing some data science.

For some reason, not all the links in the presentation are working. If you want to follow the links, go to SpeakerDeck and click “Download PDF”.

Learn Apache Spark this Summer with edX

edX has just announced a new series of Big Data courses. The series consists of 2 courses focused on Apache Spark. If you are not familiar with Spark, it is a very fast engine for large-scale data processing. It claims to perform up to 100 times faster than Hadoop. Here are the 2 courses:

  1. Introduction to Big Data with Apache Spark
  2. Scalable Machine Learning

The first course starts June 1, 2015, and lasts four weeks. The second course starts in late June and lasts five weeks.

The courses are free, but verifiable certificates can be purchased for $50 per course.

If you have been hoping to learn Spark, this might be just the opportunity you were waiting for.

Tomorrow is Data Innovation Day

Tomorrow, Jan 22, 2015, is Data Innovation Day 2015, and a free online conference will be held. A very strong group of speakers and panels is planned. The topics of the talks are:

  • Data For Public Good
  • OpenData
  • Internet of Things
  • Analytic Innovations
  • Startups

The conference runs for four hours, from 12:00 PM to 4:00 PM EST on Jan. 22, 2015. Register now to attend the free conference.

Big Data is Better Data?

Kenneth Cukier, Data Editor of The Economist and co-author of Big Data: A Revolution That Will Transform How We Live, Work, and Think, gives a wonderful talk about data. It is well worth the 15 minutes.