Tag Archives: bootcamp

3 Questions with Metis: A Data Science Bootcamp

Do you need to build your skills to land that dream job as a data scientist? Metis Data Science Bootcamp in New York City offers a 12-week program to do just that. The program sounds fantastic; a few of the highlights are help landing a job, great guest speakers, no PhD requirement, and a strong curriculum. Tarlin Ray, Head of Admissions at Metis, was kind enough to answer 3 questions about the bootcamp.

If you are interested in applying, the next bootcamp starts in April and the deadline for early admission is next Monday, February 16, 2015.

Metis data science bootcamp in NYC

Can you describe a student who would be successful in the program?

If students do not have the technical skills, then they will struggle in the Metis Data Science bootcamp. Students must have some programming background and experience with statistics in order to get the most out of the bootcamp. We do not set a minimum score for what that means; rather, we use our application, coding challenge and interview to assess someone’s skill level. The technical skills are definitely a determinant of success, but they are only one piece of the equation.

Students with strong verbal and written communication skills will be very successful in the program. A student must be adept at listening to instructors, fellow students and even speakers in the program. This skill will enable the student, as a data scientist, to get to the heart of an issue and to create a question or set of questions to help uncover insights. Students should have an innate curiosity. The thirst for the answer (1+1=2) is often only half of what a data scientist is tasked to do. The other half is the search for why something is the way it is. Students should not be afraid to tap into their creative sides. The output for many data science projects is a visual representation of numbers, statistics and business insight. Students need to feel comfortable with a blank canvas and muddy data, with the goal of creating a new work of art that is digestible by a wide audience.

Lastly, a student should have a ton of grit. A successful student does not give up. They should be comfortable with the unknown, googling for answers and just figuring things out. The life of a data scientist is that of a lifelong learner, and successful students will have this quality. (For a more in-depth answer to this question, please check out a blog post written by Laurie Skelly, co-creator of the Metis Data Science Curriculum and Data Scientist at Datascope Analytics.)

What sets your program apart from other data science bootcamps?

Our bootcamp is focused on providing accessibility, practical training and creating pathways to a career as a data scientist.

Accessibility: There are bootcamps in the market that will only accept students coming out of a PhD program. We fundamentally believe that there is a much bigger population of students who have the skills and the ability to become data scientists. Over our first 2 cohorts, 44% of students held Bachelor's degrees, 36% Master's degrees, 17% PhDs and 3% professional degrees. If you have the skills, we can help you meet your data science goals.

Practical Training: The program is hyper-focused on exposing students to what it takes to design and deliver a data science project. This means teaching people how to think like a data scientist, which starts with how to ask the right questions that will drive business value. Once students know what question they’re trying to answer, only then can they use the many technologies, algorithms and tools that we teach in the bootcamp to find the data, clean the data, analyze the data, and communicate the results. In our bootcamp, students go through this design-thinking process five separate times, creating a portfolio of five distinct projects, all using real data, that they can share with prospective employers. In addition to the project-based approach, we also know that data scientists do not work alone and that collaboration is key to transitioning into a role as a data scientist. We ensure this is embedded in the 12-week curriculum.

Pathways to a Career: see next question

How does your program prepare a student for life after the bootcamp?

Pathways to a Career: The whole Metis organization focuses on the ultimate outcome: helping students secure a job. To provide guidance, Metis has a full-time Talent Placement team that works with the students even before the start of the program. In parallel with the bootcamp curriculum, the Talent Placement team runs a career curriculum to help students prepare for their job search. In addition to ongoing support, Metis is very committed to bringing in speakers from the industry to expose students to various roles and companies. Lastly, Metis holds a Career Day at the end of the program so that students can present their final projects and engage with hiring partners. The Talent Placement team maintains contact with graduates beyond the program, up until they are placed.

Data Sources for Cool Data Science Projects: Part 2 – Guest Post


I am excited for the first-ever guest posts on the Data Science 101 blog. Dr. Michael Li, Executive Director of The Data Incubator in New York City, is providing 2 great posts (see Part 1) about finding data for your next data science project.

At The Data Incubator, we run a free six-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project:

Data With a Cause:

  1. Environmental Data: Data on household energy usage is available as well as NASA Climate Data.
  2. Medical and Biological Data: You can get anything from anonymous medical records, to remote sensor readings for individuals, to genome data for 1,000 individuals.

Miscellaneous:

  1. Geo Data: Try looking at these Yelp Datasets for venues near major universities and one for major cities in the Southwest. The Foursquare API is another good source. Open Street Map has open data on venues as well.
  2. Twitter Data: In addition to Twitter's own API, you can get access to Twitter data used for sentiment analysis, Twitter network data, and social Twitter data.
  3. Games Data: Datasets for games are available, including a large dataset of Poker hands, a dataset of online Dominion games, and datasets of Chess games.
  4. Web Usage Data: Web usage data is a common dataset that companies look at to understand engagement. Available datasets include Anonymous usage data for MSNBC, Amazon purchase history (also anonymized), and Wikipedia traffic.

Metasources: these are great sources for other web pages.

  1. Stanford Network Data: http://snap.stanford.edu/index.html
  2. Every year, the ACM holds a competition for machine learning called the KDD Cup. Their data is available online.
  3. UCI maintains archives of data for machine learning.
  4. US Census Data
  5. Amazon is hosting Public Datasets on S3
  6. Kaggle hosts machine-learning challenges and many of their datasets are publicly available
  7. The cities of Chicago, New York, Washington DC, and SF maintain public data warehouses.
  8. Yahoo maintains a lot of data on its web properties, which can be obtained by writing to them.
  9. The BigML blog maintains a list of public datasets for the machine learning community.
  10. Finally, if there’s a website with data you are interested in, crawl for it!
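
As a rough illustration of the "crawl for it" suggestion above, here is a minimal Python sketch that scans a page's HTML for links that look like downloadable data files. It uses only the standard library. The class name `DataLinkParser`, the helper `find_data_links`, the file extensions, and the example URLs are all hypothetical, chosen just for this example. The HTML is an inline sample so the sketch runs offline; for a real site you would first fetch the page (for instance with `urllib.request.urlopen`) and check the site's terms of use and robots.txt before crawling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataLinkParser(HTMLParser):
    """Collect links that look like downloadable data files (hypothetical helper)."""

    def __init__(self, base_url, extensions=(".csv", ".json", ".zip")):
        super().__init__()
        self.base_url = base_url      # used to resolve relative links
        self.extensions = extensions  # assumed set of "data-like" file types
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Case-insensitive match on the file extension.
        if href and href.lower().endswith(self.extensions):
            self.links.append(urljoin(self.base_url, href))


def find_data_links(html, base_url):
    """Return absolute URLs of data-like links found in the given HTML."""
    parser = DataLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Inline sample page (made up for illustration) so the example runs offline.
sample_html = """
<html><body>
  <a href="/downloads/trips-2013.csv">Taxi trips</a>
  <a href="about.html">About</a>
  <a href="https://example.org/data/venues.JSON">Venues</a>
</body></html>
"""

links = find_data_links(sample_html, "https://example.com/data/index.html")
print(links)
```

A real crawler would also need to follow links between pages, throttle its requests, and handle errors; this sketch only covers the link-extraction step.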

While building your own project cannot replicate the experience of a fellowship at The Data Incubator (our Fellows get amazing access to hiring managers and to nonpublic data sources), we hope this will get you excited about working in data science. And when you are ready, you can apply to be a Fellow!

Got any more data sources? Let us know or leave a comment and we’ll add them to the list!

Additional Sources (added via comments since the post was published)

Data Sources for Cool Data Science Projects: Part 1 – Guest Post


I am excited for the first-ever guest posts on the Data Science 101 blog. Dr. Michael Li, Executive Director of The Data Incubator in New York City, is providing 2 great posts (see Part 2) about finding data for your next data science project.

At The Data Incubator, we run a free six-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to successful projects has been getting access to interesting data. Here are a few cool public data sources you can use for your next project:

Economic Data:

  1. Publicly Traded Market Data: Quandl is an amazing source of finance data. Google Finance and Yahoo Finance are additional good sources of data. Corporate filings with the SEC are available on EDGAR.
  2. Housing Price Data: You can use the Trulia API or the Zillow API.
  3. Lending data: You can find student loan defaults by university and the complete collection of peer-to-peer loans from Lending Club and Prosper, the two largest platforms in the space.
  4. Home mortgage data: There is data made available by the Home Mortgage Disclosure Act and there’s a lot of data from the Federal Housing Finance Agency available here.

Content Data:

  1. Review Content: You can get reviews of restaurants and physical venues from Foursquare and Yelp (see geodata). Amazon has a large repository of Product Reviews. Beer reviews from Beer Advocate can be found here. Rotten Tomatoes Movie Reviews are available from Kaggle.
  2. Web Content: Looking for web content? Wikipedia provides dumps of their articles. Common Crawl has a large corpus of the internet available. ArXiv makes all of its data available via Bulk Download from AWS S3. Want to know which URLs are malicious? There’s a dataset for that. Music data is available from the Million Song Dataset. You can analyze the Q&A patterns on sites like Stack Exchange (including Stack Overflow).
  3. Media Data: There are open annotated articles from the New York Times, the Reuters Dataset, and the GDELT project (a consolidation of many different news sources). Google Books has published NGrams for books going back past 1800.
  4. Communications Data: There’s access to public messages of the Apache Software Foundation and to communications among former Enron executives.

Government Data:

  1. Municipal Data: Crime Data is available for the City of Chicago and Washington DC. Restaurant Inspection Data is available for Chicago and New York City.
  2. Transportation Data: NYC Taxi Trips in 2013 are available courtesy of the Freedom of Information Act. There’s bikesharing data from NYC, Washington DC, and SF. There’s also Flight Delay Data from the FAA.
  3. Census Data: There is Japanese Census Data, US Census data from 2010, 2000, and 1990, and EU Census Data. From census data, the government has also derived time use data. Check out popular male / female baby names going back to the 19th century from the Social Security Administration.
  4. World Bank: they have a lot of data available on their website.
  5. Election Data: Political contribution data for the last few US elections can be downloaded from the FEC here and here. Polling data is available from Real Clear Politics.

While building your own project cannot replicate the experience of a fellowship at The Data Incubator (our Fellows get amazing access to hiring managers and to nonpublic data sources), we hope this will get you excited about working in data science. And when you are ready, you can apply to be a Fellow!

Got any more data sources? Let us know or leave a comment and we’ll add them to the list!

Data Science Bootcamps

The list of Data Science Bootcamps is now live at http://datascience.community/bootcamps

The list currently contains 11 programs. The programs range from full-time 12-week programs to part-time online training.

Data Science is one field that has definitely adopted the newer, innovative forms of learning. MOOCs are full of data science related courses and the List of Data Science Bootcamps definitely shows the variety of new techniques being used. For example, Zipfian Academy uses a 12-week immersive program to train students and work on projects together. Insight, Persontyle, and The Data Incubator focus on filling in the gaps of recent PhDs, and other programs such as Statistics.com and Leada are focusing on online programs. Leada will be an interesting program to watch in the coming months and years. The program is definitely different and could be a game-changer if it continues to grow.

Go see the full list of Data Science Bootcamps.

As always, if you know of another program that is not on the list, feel free to leave a comment.

Got a PhD? Want to spend 6 weeks in NYC to become a Data Scientist?

The Data Incubator in New York City is a new data science bootcamp.

  • The program is run by Michael Li, a Princeton PhD and former Foursquare data scientist
  • Located in New York City
  • Tuition is free
  • Help with job placement
  • Session runs from January 5, 2015 to February 13, 2015

Don’t delay! It is highly recommended to apply to The Data Incubator before the early deadline of October 7, 2014.

UC Berkeley Big Data Bootcamp

The University of California at Berkeley is hosting AMP Camp, the Big Data Bootcamp, starting today. The conference is sold out for in-person attendance, but registration is free and live streaming is available. The agenda looks good (including machine learning, parallel programming, Mesos, and hands-on exercises), so this might be a good opportunity for some learning.

Thanks to Mark Nickel for the link.