Learn Apache Spark this Summer with edX

edX has just announced a new series of Big Data courses. The series consists of 2 courses focused on Apache Spark. If you are not familiar with Spark, it is a very fast engine for large-scale data processing. The project claims it can run programs up to 100 times faster than Hadoop MapReduce. Here are the 2 courses:

  1. Introduction to Big Data with Apache Spark
  2. Scalable Machine Learning

The first course starts June 1, 2015, and lasts four weeks. The second course starts in late June and lasts five weeks.

The courses are free, but verified certificates can be purchased for $50 per course.

If you have been hoping to learn Spark, this might be just the opportunity you were waiting for.

Scoring A Software Development Organization With A Single Number

I just finished my PhD in the Computational Science and Statistics program at South Dakota State University. My dissertation focused on the area of software analytics, sometimes called Data-Driven Software Engineering. Specifically, how does a Software Development Organization evaluate itself? Students have a G.P.A. (Grade Point Average), but organizations do not have a similar evaluation method.

The dissertation introduces the C.R.I. (Cumulative Result Indicator) to provide a single number to evaluate the performance of a software development organization. The C.R.I. focuses on 5 primary elements of a Software Development Organization.

  1. Quality
  2. Availability
  3. Satisfaction
  4. Schedule
  5. Requirements

C.R.I. demonstrates what data needs to be collected and how that data can be used to create a score. Naturally, this solution will not work in every situation, but it does provide a consistent method for evaluation, and it is flexible enough to use only some of the elements or to add others.
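To make the idea of a single number concrete, here is a rough sketch in Python. It is not the calculation from the dissertation. It assumes each element has already been scored on a common scale and that the elements are combined with a weighted average. The function name `combined_score`, the weights, and the sample scores are all hypothetical.

```python
from typing import Dict, Optional


def combined_score(element_scores: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-element scores into one number with a weighted mean.

    Assumption (not taken from the dissertation): every score is already
    on the same scale, and any element without a weight gets weight 1.0.
    """
    if not element_scores:
        raise ValueError("at least one element score is required")
    weights = weights or {}

    weighted_sum = 0.0
    total_weight = 0.0
    for name, score in element_scores.items():
        weight = weights.get(name, 1.0)
        if weight < 0:
            raise ValueError(f"weight for {name!r} must not be negative")
        weighted_sum += weight * score
        total_weight += weight

    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    return weighted_sum / total_weight


# Hypothetical element scores, for illustration only.
scores = {
    "quality": 80.0,
    "availability": 90.0,
    "satisfaction": 70.0,
    "schedule": 60.0,
    "requirements": 100.0,
}

overall = combined_score(scores)

# Using only some of the elements, with one of them weighted more heavily.
partial = combined_score(
    {"quality": 80.0, "schedule": 60.0},
    weights={"quality": 3.0},
)
```

Because the scores are passed as a dictionary, dropping an element or adding a new one only changes the input, which mirrors the flexibility described above.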

That is the brief 1-minute overview of the dissertation. Feel free to read more of the details in the document below.

The source and data files are available on GitHub, Dissertation Scoring SDO.

You can also see results of the analysis on Sense, Scoring an SDO.

This is the first in a series of posts on Data-Driven Software Engineering. In the next few weeks, I will be posting more about the topic. Some posts will be excerpts from the dissertation, and others will be new thoughts on the topic. Stay Tuned!

The New Open Data Handbook

Originally published in 2012, the Open Data Handbook has released a second edition. The handbook is meant as a guide for organizations or individuals interested in publishing or using open data. The goal is to ensure that data is open and that it is put to use as often as possible.

The second edition now includes 3 parts.

  1. Open Data Guide – The Why?, What? and How? of open data
  2. Value Stories – Stories of how open data is making a difference
  3. Resource Library – Videos, presentations, and publications about open data

Following the theme of openness, the Open Data Handbook is open sourced on GitHub. You are free and encouraged to contribute. There is even an extensive contribution guide if you are interested.

Read the official announcement from the Open Knowledge Foundation.

Data Science Wars: R vs. Python

The great team over at DataCamp, an online site for learning R, has put together another wonderful infographic. This time, the topic is Data Science Wars (R versus Python). This has been a rather hot topic for quite some time. I even wrote about the debate back in 2013, R vs Python, The Great Debate.

DataCamp did an amazing job. Honestly, it is impressive they were able to pack so much information into a single infographic. Some of the topics covered are:

  • History
  • Who uses the language?
  • Community
  • Purpose of the language
  • Popularity
  • And way more great stuff

Enough about the description. Have a look for yourself. It is packed with great arguments for your next “R vs Python” debate.

R vs Python for data analysis

Data Science Tech Institute Visiting Faculty

The Data ScienceTech Institute (DSTI) in France is starting 2 new master’s degree programs in data science. Both programs are highly innovative and offer a strong industry focus. Classes begin in October 2015, and each program is limited to 30 students. Therefore, if you are interested, it is important to apply as soon as possible.

The other day, the faculty at DSTI was announced. I am honored to say I was selected and will serve as a visiting faculty member for portions of the program.

DSTI offers 2 master’s degree programs:

  1. Data Scientist Designer – Located in Paris, this 2-year program is part-time and focused on working professionals looking to transition or enhance skills in the data science field. The course will rotate between 2 and 3 days a week.
  2. Executive Big Data Analyst – Located in Nice along the French Riviera, this program is a more traditional intensive 16-month program targeting full-time students.

If you are in France or Europe or interested in studying in France, the programs from DSTI are definitely worth a look.
