Microsoft has recently announced a machine learning competition platform. As part of the launch, one of the first competitions involves predicting brain signals. It has $5000 in prizes, and submissions are accepted through June 30, 2016.
Google and Tableau have teamed up to offer a big data visualization contest. The rules are fairly simple: create an awesome visualization that uses at least the GDELT data set. Finalists will receive prizes worth over $5000, and some will even get tours of Tableau and Google facilities. The contest runs through May 16, 2016.
Professor Norm Matloff from the University of California, Davis has published From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science, an open textbook. It approaches statistics from a computer science perspective. Dr. Matloff has been a professor of both statistics and computer science, so he is well suited to write such a textbook. This would be a good choice of textbook for a statistics course aimed primarily at computer scientists. It uses the R programming language. The book starts by building the foundations of probability before moving on to statistics.
Don’t Start with the Data
Do start with a good question
Don’t think one person can do it all
Do build a well-rounded team
Don’t only use one tool
Do use the best tool for the job
Don’t brag about the size of your data
Do collect relevant data
Don’t ignore domain knowledge
Do consult a subject matter expert
Don’t publish a table of numbers
Do create informative charts
Don’t use just your own data
Do enhance your analysis with open data
Don’t do all the work yourself
Do partner with local universities
Don’t always build your own tools
Do use lots of open source tools
Don’t keep all your findings to yourself
Do share your analysis and results with the world!
Got any to add? Please leave a comment.
While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.
I asked a few of my data science friends and posted a question on Quora: “As a data scientist, what tips would you have for a younger version of yourself?”
What follows is a summary of the many tips.
Tips for Data Science
- Be flexible and adaptable – There is no single tool or technique that always works best.
- Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
- It’s not all about building models – Like the previous tip, you must have skills beyond just model building.
- Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
- Document what you do – This is important for others and for your future self. Here is a subtip: learn version control.
- Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
- Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
- Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
- Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most people will have forgotten much of the mathematics they learned in high school and college. Explain concepts such as correlation without relying on equations. Audiences understand visuals, so use them to explain concepts.
- Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
- Learn the basics – Once you have a firm understanding of the basics of mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
- Be a polymath – It helps to be a person with a wide range of knowledge.
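The tip that cleaning data is most of the work is easier to appreciate with a small example. The sketch below is a minimal illustration, assuming a handful of hypothetical records with inconsistent state names, date formats, and number formatting; the field names, the lookup table, and the accepted date formats are all invented for illustration. It standardizes each field and turns anything it cannot parse into None rather than guessing.

```python
# A minimal, hypothetical data-cleaning sketch. The records, field names,
# lookup table, and accepted date formats are illustrative assumptions.
from datetime import datetime

RAW_RECORDS = [
    {"state": " minnesota ", "date": "04/02/2016", "sales": "1,200"},
    {"state": "MN", "date": "2016-04-03", "sales": "950"},
    {"state": "Wisconsin", "date": "April 2, 2016", "sales": ""},
]

# Map free-text state names to two-letter codes (assumed lookup table).
STATE_CODES = {"minnesota": "MN", "mn": "MN", "wisconsin": "WI", "wi": "WI"}

# Date formats we expect to encounter (assumed; real data will have more).
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y")


def parse_date(text):
    """Try each known format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Standardize one record; missing or unrecognized values become None."""
    state = STATE_CODES.get(record["state"].strip().lower())
    sales_text = record["sales"].replace(",", "").strip()
    sales = int(sales_text) if sales_text else None
    return {"state": state, "date": parse_date(record["date"]), "sales": sales}


cleaned = [clean_record(r) for r in RAW_RECORDS]
for row in cleaned:
    print(row)
```

In practice the lookup tables and accepted formats keep growing as new quirks turn up in the data, which is one reason cleaning usually takes longer than the analysis itself.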
Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.
I frequently ask young people, particularly undergraduates, what they plan to do with their future. I am often less than enthused by the responses, which sound something like this:
- I hope to get a job doing statistics.
- I just want to work with computers.
- I want to be a data scientist.
- I just want a job.
The responses are typically vague and devoid of direction. Most involve waiting for someone else to provide guidance. You do not have to wait. You can get started today.
If you are just interested in getting a job, the rest of this post is not for you. If you want to make an impact with your data science career, the remainder of this post is for you.
Below is an explanation of numerous specialties in data science. You don’t need to learn them all. Just pick one and follow the first step. You will learn more along the way. Don’t stress about which one to pick; there is no wrong answer. Just pick one and start building.
Data Visualization
Data visualization is all about telling a story with data. Do you have a keen eye for color and design? Can you summarize complex data in a few simple charts? If you answer yes to those questions, then you just might be a good fit for data visualization.
First Step: Go to Data.gov and make an infographic
Data Science Educator
Are you the person always explaining your homework to others? This specialty might be for you. You can take a few different paths. One is the traditional university faculty approach. Another is more of a corporate training professional. The world needs both. Plus, if you are entrepreneurial, there are ample opportunities to consult as a data science educator. Businesses realize they need to know data science, and they are looking for training.
First Step: Start a video or blog with tutorials
Data Engineer
A data engineer is typically more interested in systems than in machine learning alone. Data engineers are usually strong in computer science fundamentals. They love to build things that they and others can use. A good data engineer may also spend a lot of time cleaning data.
First Step: Build a solution (hint: Cortana Intelligence Solutions)
Do you love to program? If so, you just might fall into this category. Data science has many needs for programmers. Everything from cleaning data to building data products needs programming.
First Step: Be on GitHub
Statistical Modeling (Machine Learning)
Some people just love statistical modeling and machine learning. They love to tune models and squeeze the last bit of predictive power from a data set. If you love talking about regression, trees, random forests, AUC, cross-validation, and boosting, then this specialty is most likely for you.
First Step: Enter Kaggle competitions.
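For readers new to the vocabulary above, cross-validation is easy to sketch. The example below is a minimal k-fold cross-validation in plain Python, assuming a toy data set and a one-variable least-squares line as the model; both are illustrative choices, not a method taken from any particular competition. Each fold is held out once while the model is fit on the remaining rows, and the held-out mean squared error is reported for every fold.

```python
# A minimal k-fold cross-validation sketch in plain Python.
# The toy data and the one-variable least-squares model are
# illustrative assumptions, not a recommended workflow.
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k=5):
    """Fit on k-1 folds, score on the held-out fold; return per-fold MSE."""
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx) / len(test_idx)
        errors.append(mse)
    return errors


# Toy data: y is roughly 2x + 1 plus a little Gaussian noise.
rng = random.Random(42)
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

fold_errors = cross_validate(xs, ys, k=5)
print("per-fold MSE:", [round(e, 3) for e in fold_errors])
print("mean CV MSE:", round(sum(fold_errors) / len(fold_errors), 3))
```

Libraries such as scikit-learn provide ready-made versions of this (for example, its KFold splitter), but writing the loop once by hand makes it clearer what an averaged cross-validation score actually measures.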
Data Science Manager
Being bossy does not mean you will make a good manager. The best managers know how to build strong teams and get out of the way. A manager provides help and overall direction for projects, and should have a solid understanding of how data can help shape a team’s decisions.
First Step: Organize a group to help a non-profit analyze data (Similar to what DataKind does)
Data Science Researcher
A researcher is interested in pushing the boundaries of data science. Are you interested in creating your own machine learning algorithms? Do you want to build the next great data framework? Do you think data science can achieve something no one else has thought to try? If so, being a researcher is for you.
First Step: Go to graduate school
Data Science Unicorn
A data science unicorn is someone who knows all the specialties above and more; a unicorn understands every area of data science. Being a unicorn is not attainable for everyone, but a few people have become unicorns. If you think you can be a unicorn, go for it.
First Step: Start with data visualization above
Simple: Pick a specialty and Go Make a Difference!
This post is based upon a talk I gave at Winona State University just before MUDAC. The original title was Go After Your Data Science Dreams.
I think this has been happening for some time, but Google now has an official home for the public data sets stored in BigQuery. You can:
- Access and use the data in your applications
- Request Google to host your own public data set
It will be fun to watch this site expand with more public datasets. Happy Exploration!
The 2016 Midwest Undergraduate Data Analytics Competition (MUDAC) will be held at Winona State University in Winona, Minnesota on April 2 and 3.
- What is MUDAC?
MUDAC is an intense 2-day analytics competition aimed at undergraduate students. Teams compete to solve a problem posed by an external organization.
- Who can compete?
Teams of 3 to 4 undergraduate students attending a school in Minnesota, Wisconsin, Iowa, Illinois, North Dakota, or South Dakota.
- Why attend MUDAC?
- A fun learning experience
- Friendly competition
- Meet others with similar interests
- Learn about data science/analytic careers
- Practice preparing and giving a presentation
- Cash prizes for winning
- Door prizes
The competition also includes a panel discussion with some local data professionals. I am honored to be one of those panelists.
If you attend or teach at a university in the upper Midwest and you are interested in data science, you should strongly consider bringing a team to MUDAC. I hope to see you there.
If you are looking to learn data science but do not have the time or money for a full master’s degree, Data Society might be your answer. Data Society is an online data analysis skills training program that is designed by educators and curated by data science experts. The learning experience is online and includes:
- Training Videos
- Printable step-by-step guides
- Reusable Coding Templates
- Data Sets
- Opportunities to build a portfolio
There is one other completely awesome feature of Data Society. For every membership purchased, they provide a free membership to help someone in need. Data Society is currently running a Kickstarter to build a community for learning data science. Your support would be greatly appreciated (I am not involved in the project, but I am always happy to share innovative educational opportunities in data science).
Recently, I was able to get a brief interview with Merav Yuravlivker, one of the founders of Data Society.
There are many data science learning resources on the web, how is Data Society different?
We understand that most people do want to learn these skills, but don’t feel like they have the time, the money or the background. We eliminate all those barriers to entry by providing short lessons that are taught intuitively with real data sets. It’s the first platform that’s designed with working professionals in mind. Not only do we teach our students how to analyze data, but we also have a separate track for managers that teaches them how to implement data-driven strategies in their teams, how to hire a data scientist, and how to communicate effectively with their employees. Our courses are not just videos: each course includes ready-made data analysis templates in R that decrease the time it takes to do the work, a step-by-step printable guide that can be used as a reference for every stage of the analysis, and live, dynamic forums where students can get all of their questions answered by the Data Society team as well as other students. In short, we provide everything someone needs to learn new skills in a much shorter amount of time.
What is the Kickstarter about?
Our Kickstarter campaign is about building that community around learning data science and helping others solve problems – we’ve already released the first three courses in our curriculum, and we’re excited to give our supporters an opportunity to see exactly how their contributions can make an impact. Our mission is to increase data literacy across the workforce – we know that data analysis skills are widely valued and sought-after, which is why we’re partnering with non-profits who help veterans and low-income individuals get back to work. For every membership bought off of Kickstarter, we will give one to someone who can use these skills to become more marketable and improve their life.
Is there anything else you would like to tell me about Data Society?
The most frequent compliment we get from students is that they didn’t feel intimidated to learn data analysis skills. As an educator, that is the biggest reward to me because we’re opening up possibilities for individuals who didn’t think that they had the ability to analyze data and pull insights from it. Our goal is not to turn everyone into a data scientist, but rather to give everyone the ability and confidence to get new data, look at the data they already have, ask “How can this data help me solve this problem?”, and then discover those insights that will help them make better decisions.