Occasionally a product in Microsoft Azure will go down. Luckily, Azure has a status page to tell you which servers and services are down. Here is a quick video to help you find that status page.
Here is the latest data science news for the week of April 29, 2019.
From Data Science 101
- The Go Programming Language for Data Science
- Quick Video Tutorial to Find Updates in Azure
- Two-Minute Papers: One Pixel Attack on Neural Networks
General Data Science
- How to Go into Data Science – tons of Q&A for becoming a data scientist
- Uber submits Hudi to the Apache Software Foundation – Open source project providing stream processing for big data
- 3 Awesome Visuals with Python Code – Categorical Correlation Graphs, Pairplots, and Swarmplots
- What Is the Difference Between AI and Machine Learning? – ML makes decisions based on historical data; AI tries to make smart decisions (not necessarily with data)
- Software development best practices in a deep learning environment – Deep learning adds new challenges to traditional software engineering. This article offers tips for handling them.
- Tukey, Design Thinking, and Better Questions – In short, spend some time trying to optimize the question
- Faster multiplication – Two mathematicians discovered a technique to multiply two n-digit integers in O(n log n) time
What do you think? Did I miss any big news in the data science world?
While it is not one of the popular programming languages for data science, the Go Programming Language (aka Golang) has surfaced for me a few times in the past few years as an option for data science. I decided to do some searching and draw some conclusions about whether Golang is a good choice for data science.
Popularity of Go and Data Science
As the following figure from Google Trends demonstrates, golang and data science became trendy topics at about the same time and grew at a similar rate.
This parallel growth may have created the desire to combine the two technologies.
Golang Projects for Data Science
Some internet searching will reveal a number of interesting Golang/data science projects on GitHub. Unfortunately, many of these projects had good initial traction but have dwindled in activity over the last couple of years. Below is a list of some data science related projects for Golang.
- Gopher Data – Gophers doing data analysis; no scheduled events, and the last blog post was in 2017
- Gopher Notes – Golang in Jupyter Notebooks
- Lgo – Interactive programming with Jupyter for Golang
- Gota – Data frames for Go, “The API is still in flux so use at your own risk.”
- qframe – Immutable data frames for Go, better speed than Gota but not as well documented
- GoLearn – Machine Learning for Go
- Gorgonia – Library for machine learning in Go
- Go Sklearn – Port of scikit-learn from Python; still active but with only a couple of committers; early but promising
- Gonum – Numerical library for Go, very promising and active
Golang Data Science Books
There have even been a couple of books written about the topic.
- Go Machine Learning Projects (2018) – this book uses gonum and gorgonia in the examples
- Machine Learning with Go (2017)
Thoughts from the Community
The “Go for Data Science” debate has come up numerous times over the past few years. Below is a list of some of those discussions and their key takeaways.
- Machine Learning with Go? on Reddit
“and once you know what you are going to do, implementing the training and deploying in Go is much better”
- Golang for Data Science on Reddit
“most likely it won’t be go and be one of more academia adopted languages like Python, MatLab or Julia”
- Data Science Gophers – O’Reilly Blog Post
- Moving From Python to Go – Towards Data Science blog post
not data science specific, but helpful
- Go for Data Science
by the author of Gorgonia and one of the books above; includes a great presentation slide deck
- Why we switched from Python to Go
also not data science specific
- Go vs Python 3 Benchmarks
Go performs much better than Python on benchmarks
- Can Go really be that much faster than Python? – Stack Overflow
“Go really can be that much faster than python”
Reasons to use Golang for Data Science
- Strong Developer Ecosystem
- Basic Data Science packages are available
Reasons Not to use Golang for Data Science
- Limited support from the data science community for Golang
- Significantly increased time for exploratory analysis
- Less flexibility to try other optimization and ML techniques
- The data science community has not really adopted golang
In short, Golang is not widely used for exploratory data science, but once an approach is settled, rewriting your algorithms in Golang for training and deployment might be a good idea.
Microsoft Azure has an abundance of data science capabilities (and non-data science capabilities). It can be challenging to keep up with the latest updates/releases. Luckily, Azure has a page to let you know exactly what has changed. You just need to know where to find it, and the following video will help you find that page.
LinkedIn’s 2017 report ranked Data Scientist as the second fastest-growing profession, and it is number one on 2019’s list of most promising jobs. According to research, there are three main reasons data science is rated a top job: the number of job openings is rapidly increasing and is the highest compared to other jobs, job satisfaction is extremely high, and the median annual base salary is very attractive.
Beyond these impressive ratings and the current demand, statistics suggest that the rapid growth in demand for data scientists around the globe shows no sign of slowing down.
Check out the top 5 companies to work for as a data scientist, based on employee reviews, job satisfaction ratings, and CEO approval.
#1 Dataiku
Dataiku is a top-rated computer software company founded in 2013 and headquartered in New York. The company develops collaborative data science software, and according to Glassdoor reviews, 99% of its employees would recommend working there and 100% approve of the CEO. The vast majority of employees are clearly satisfied, and the company is also a top choice for data science and machine learning positions based on annual pay.
Check out: Dataiku Careers
#2 StreamSets
StreamSets was founded in 2014 and is headquartered in San Francisco, California. The company develops a DataOps platform that lets businesses manage streaming data flows. An impressive 98% of its employees would recommend it to a friend, and 100% approve of the CEO. StreamSets is a top option for data management and integration.
Check out: StreamSets Careers
#3 1010 Data
1010 Data is headquartered in New York and has over 15 years of experience in data analytics, serving more than 850 clients across various industries. Ranked third on this list, 1010 Data is a great option, with 96% of employees recommending the company and 99% approving of the CEO.
Check out: 1010 Data Careers
#4 Reltio
Reltio was founded in 2011 and is based in Redwood Shores, California. This top-rated company is recommended by 96% of its employees and is a top choice for data management and integration. Although it ranks fourth on the list, it is still a fantastic company in which to expand your experience as a data scientist.
Check out: Reltio Careers
#5 Looker
Looker was founded in 2012 and is headquartered in Santa Cruz, California. 95% of its employees would recommend the company, and 93% approve of the CEO. Looker is a great choice for business analytics.
Check out: Looker Careers
How can you get a job as a data scientist?
A degree in Data Science, Computer Science, Mathematics, Statistics, Social Science, or Engineering, combined with knowledge of Python, R, and Hadoop, increases your chances of landing an entry-level position. Plenty of universities offer specialized data science programs, both online and on campus. Recently, online master's degrees in data science have become more popular because of the convenience they offer working professionals, especially those looking to switch careers.
Build a portfolio using real data to complete projects that can showcase your abilities as a data scientist. You could also opt for an internship to further develop your skills and knowledge as a data scientist.
The March 2019 Machine Learning Study Path has recently been updated. It contains links and resources for learning TensorFlow and scikit-learn.
If you are interested in details on the study path and how best to use the resources, there is a livestream on Sunday, March 17, on the Math for Data Science Facebook page.