Tag Archives: data

Zipfian Academy Launches New Fellowship and Data Engineering Programs

Zipfian Academy, the company that offers the 12-week immersive training for data science, has just announced 2 new programs.

  1. Data Fellows 6-Week Fellowship – A free and intensive program to fill in your knowledge gaps and match you up with a top company. Hurry, applications are due today (June 16, 2014).
  2. Data Engineering 12-Week Immersive – A 12-week program to prepare software engineers to handle big data.

See the original announcement on the Zipfian blog or attend an upcoming virtual information session.

Last week, I had the opportunity to visit the Zipfian Academy office and sit down with the team (Ryan, Jonathan, and Katie). The programs are going well, and the team strongly believes in the immersive format. I would have to agree: graduates have a 91% placement rate and an average starting salary of $115,000. They even referred to it as a new alternative to graduate school. The future of Zipfian is exciting, as the team hinted at some plans for the coming months and years. Stay tuned to this blog or join the Zipfian mailing list for future information.

Here is a video of Ryan Orban, one of the Zipfian Academy cofounders, explaining the new programs.

Scientific Data: A new publisher of Data

Nature.com is starting a new publication titled Scientific Data. The goal is to help researchers publish and discover data. Each piece of content is called a Data Descriptor. It describes the data, explains the data collection methods, lists the columns, and states other essential information about the dataset.

Unfortunately, the site does not host any of the data. I think it will be interesting to watch how a site like this develops. The publication is currently accepting submissions.

Data Gotham 2013 Videos

If you do not live in New York City or did not attend Data Gotham, do not worry: nearly all the videos and talks are posted on the Data Gotham 2013 YouTube page.

Below is a panel discussion on data and privacy.

Beyond Alphabet Soup: 5 Guidelines For Data Sharing

Beyond Alphabet Soup: 5 Guidelines For Data Sharing is an excellent post by Markets for Good and Palantir Technologies. Below are the 5 Guidelines.

  1. Release structured raw data others can use
  2. Make your data machine-readable
  3. Make your data human-readable
  4. Use an open-data format
  5. Release responsibly and plan ahead

See the full blog post with detailed descriptions here: Beyond Alphabet Soup: 5 Guidelines For Data Sharing

Mixpanel Data Conference 2013

Mixpanel is hosting Data Driven Conference 2013 on the afternoon of July 31, 2013, at the Mixpanel Headquarters in San Francisco. It is not exactly a conference, but more of a “fireside chat” with some of the leading data technologists. The cost of attendance is $100. The 3 speakers are:

  1. Max Levchin, Co-founder of PayPal
  2. Dave Morin, Co-founder of Path
  3. Aaron Levie, Co-founder of Box

Open Data Festival

Launching in the autumn of 2013, Open Data Festival will be hosting a global data festival. The details are quite vague at this point, but they are looking for volunteers, cities, and speakers. Feel free to sign up.

The festival is being organized by the same team that organizes Big Data Week.

R Commands for Cleaning Data

These are notes from the Coursera Data Analysis course.

Here are some R commands that might prove helpful for cleaning data.

String Replacement

  • sub() replaces the first occurrence of a pattern in each string
  • gsub() replaces all occurrences of a pattern in each string
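
As a minimal sketch of the difference between the two, the snippet below cleans some column names. The `raw_names` vector is invented for illustration; only `sub()` and `gsub()` come from the notes above.

```r
# Hypothetical messy column names, invented for illustration
raw_names <- c("first_name_", "last_name_", "zip_code_")

# sub() replaces only the first match in each string
first_only <- sub("_", " ", raw_names)

# gsub() replaces every match in each string
all_matches <- gsub("_", " ", raw_names)

# The pattern is a regular expression, so "_$" matches
# only an underscore at the end of the string
trimmed <- sub("_$", "", raw_names)

print(first_only)
print(all_matches)
print(trimmed)
```

Both functions are vectorized, so they work on a whole column at once. Pass fixed=TRUE if the pattern should be treated as a literal string rather than a regular expression.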

Quantitative Variables in Ranges

  • cut(data$col, seq(0,100, by=10)) breaks the data up by the range it falls into, in this example: whether the observation is between 0 and 10, 10 and 20, 20 and 30, and so on
  • cut2(data$col, g=6) from the Hmisc package returns a factor variable with 6 quantile groups
  • cut2(data$col, m=25) from the Hmisc package returns a factor variable, aiming for at least 25 observations in each group
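
Here is a small sketch of cut() using only base R. The `scores` vector is made up for illustration. By default each interval is open on the left and closed on the right, so a value of exactly 10 lands in (0,10] and a value of exactly 0 becomes NA unless include.lowest=TRUE is set.

```r
# Hypothetical scores, invented for illustration
scores <- c(5, 10, 15, 37, 99, 100)

# Breaks at 0, 10, ..., 100 give ten intervals of the form (lower, upper]
bins <- cut(scores, breaks = seq(0, 100, by = 10))

# Count how many observations fall into each range
print(table(bins))

# A value sitting on the lowest break is dropped by default...
zero_default <- cut(0, breaks = seq(0, 100, by = 10))

# ...unless the first interval is closed on both sides
zero_included <- cut(0, breaks = seq(0, 100, by = 10), include.lowest = TRUE)
```

The result is a factor, so it can go straight into table() or be used as a grouping variable.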

Manipulating Rows/Columns

  • merge() combines data frames on shared columns
  • sort() sorts a vector
  • order(data$col, na.last=TRUE) returns the row indexes that put col in sorted order, with missing values last
  • data[order(data$col, na.last=TRUE),] reorders the entire data frame by col
  • melt() in the reshape2 package reshapes data from wide to long format
  • rbind() adds more rows to a data frame
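
The base R commands in this list can be sketched together as follows. The `people` and `scores` data frames and their columns are hypothetical, invented for illustration. melt() is left out because it needs the reshape2 package installed.

```r
# Hypothetical data frames, invented for illustration
people <- data.frame(id = c(3, 1, 2), name = c("Cy", "Al", "Bea"),
                     stringsAsFactors = FALSE)
scores <- data.frame(id = c(1, 2, 3), score = c(90, NA, 75))

# merge() joins the two data frames on the shared "id" column
combined <- merge(people, scores, by = "id")

# order() returns row indexes; na.last = TRUE puts missing scores at the end
by_score <- combined[order(combined$score, na.last = TRUE), ]

# rbind() appends rows; the new rows need the same column names
extended <- rbind(combined,
                  data.frame(id = 4, name = "Dee", score = 60,
                             stringsAsFactors = FALSE))

print(by_score)
print(extended)
```

Note that order() gives back positions rather than sorted values, which is why it is used inside the square brackets to reorder every column of the data frame at once.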

These functions have other parameters that let them do a lot more. There are also a number of other helpful R functions, but these are enough to get you started. Check the R help (?functionname) for more details.

Pizza Delivery: A video Infographic

This is a video infographic about pizza delivery in Manhattan. This is another good way to make data tell a story.

Top 5 Data Startups

  1. Kaggle: They make data science a sport, enough said.
  2. DataKind: DataKind may not technically be a startup because it is a nonprofit, but they are doing cool stuff. They match nonprofit organizations with people who love to analyze data and create visualizations.
  3. Cloudera: They call themselves “The Platform for Big Data”. They are working hard to make Hadoop easier to use.
  4. Coursera: Coursera is an education startup, but with 2 Computer Science Professors as founders, you can bet they are crunching a lot of data about how people learn.
  5. BigML: They are trying to make machine learning available to everyone. Machine Learning as a Service!

Startup Showcase – How did I do?

Yesterday, I made some predictions about the startups I thought would win at the Strata Startup Showcase. Here are the winners.

So how did I do? Well, I got one of the winners correct. I selected Placed. Hopefully videos of the demos will be available. If I find them, I will post some of them to the blog.