Tag Archives: kaggle

Go After Your Data Science Dreams – Demystify Data Science Presentation 2017

Stick around for the question-and-answer session at the end of the presentation, where I talk about starting your own blog or entering Kaggle contests.

All of the other talks were great as well. You can register for the Demystify Data Science Conference 2017 by Metis to get free access to all the other video presentations.

Recent Resources for Open Data

Recently, a number of resources for publicly available datasets have been announced.

  • Kaggle becomes the place for Open Data – I think this is big news! Kaggle just announced Kaggle Datasets, which aims to be a repository for publicly available datasets. This is great for organizations that want to release data but do not want the overhead of running an open data portal. Hopefully it will gain some traction and become an exceptional resource for open data.
  • NASA Opens Research – NASA just announced that all research papers funded by NASA will be publicly available. It appears the research articles will all be available at PubMed Central, and the data will be available at NASA’s Data Portal.
  • Google Robotics Data – Google continues to do interesting things, and this dataset is definitely one of them. It captures how robots grasp objects (Google Brain Robot Data). I am not overly familiar with this topic, so if you want to know more, see their blog post, Deep Learning for Robots.

For more open data sources, see Data Sources for Cool Data Science Projects Part 1 and Part 2.

Are you aware of any other resources that have been recently announced? If so, please leave a comment.

National Data Science Bowl

Kaggle and Booz Allen Hamilton have just launched the National Data Science Bowl, a data science competition hosted on Kaggle.

If you are interested in getting started, a tutorial is available in IPython notebook format. Best of luck!

10 Great R Packages

These slides are targeted at Kaggle competitions, but the R packages can be helpful to anyone using R for data analysis. The slides were created by Xavier Conort, a winner of multiple competitions.

Top 5 Places to Get a Data Scientist job

  1. LinkedIn – They turn data into products better than anyone else.
  2. Facebook – If you are the type of person who loves to analyze people’s lives, there is no better place.
  3. Twitter – Duh, it’s Twitter. Lots of data and lots of possibilities.
  4. Cloudera – Cloudera is a successful Hadoop-based startup. Build tools and explore huge datasets for a variety of industries.
  5. Kaggle – If optimizing algorithms and really diving into the data to get every last ounce of information is your thing, then Kaggle is it. Plus, there is nowhere else you will get to work on so many important problems in such a wide range of domains. Unfortunately, Kaggle is not currently hiring any data scientists, but they most likely will be seeking more in the future.

There are many other companies hiring data scientists. Where would you like to be a data scientist?

Top 5 Data Startups

  1. Kaggle – They make data science a sport. Enough said.
  2. DataKind – DataKind may not technically be a startup because it is a nonprofit, but they are doing cool stuff. They match nonprofit organizations with people who love to analyze data and create visualizations.
  3. Cloudera – They call themselves “The Platform for Big Data”. They are working hard to make Hadoop easier to use.
  4. Coursera – Coursera is an education startup, but with two computer science professors as founders, you can bet they are crunching a lot of data about how people learn.
  5. BigML – They are trying to make machine learning available to everyone. Machine Learning as a Service!

blog.untrod.com: Engineering Practices in Data Science

This is a great post by Chris Clark of Kaggle. It explains some of the primary differences between engineers and statisticians. Both groups have something to learn from each other.


Map of Kaggle Submissions

See this interactive map of Kaggle submissions. The map is a nice example of data visualization: the data is much easier to take in on a map than in a data table. Nice work by Ramzi Ramey of Kaggle.

How To Learn Data Science? Part 2

Yesterday, I posted about some traditional strategies to acquire data science skills. Today, I will post a nontraditional strategy.

Internet Based

There are hordes of data science resources available on the internet for free. With enough personal motivation, a person could learn all the necessary skills online for free (or cheaply). Coursera is probably a great place to start. Other good resources include Udacity, the Kaggle Wiki, and various blogs and websites.

The problem with this approach is knowing exactly what to learn. A course in machine learning is great, but data science is more than just machine learning. How do you know what to learn? It would be really nice to have a collection of data science topics and the associated online training materials.

Would this strategy work for you?

Kaggle Launches New Products

If you follow the blog, you probably know I am a big fan of Kaggle. Just last week, they announced the launch of two new products.

  1. Kaggle Recruit – In this competition, the participants are not competing for a cash prize but rather for a job interview with a specific company. Currently, Facebook is hosting the first such competition.
  2. Kaggle Prospect – In this competition, the participants are trying to come up with the best question to ask. Participants are presented with various related datasets, and the goal is to find which data science question should be asked of the data. The winner gets a small cash prize, and the winning question becomes a regular Kaggle competition.

What do you think? Are you excited to try out these new competitions?