
GoLang for Data Science

While it is not one of the more popular programming languages for data science, the Go programming language (aka Golang) has come up for me a few times over the past few years as an option for data science. I decided to do some research and reach a conclusion about whether Golang is a good choice for data science.

Popularity of Go and Data Science

As the following figure from Google Trends demonstrates, golang and data science became trendy topics at about the same time and grew at a similar rate.

These parallel trends may have created the desire to combine the two.

Golang Projects for Data Science

Some internet searching reveals a number of interesting Golang/data science projects on GitHub. Unfortunately, many of these projects gained good initial traction but have dwindled in activity over the last couple of years. Below is a list of some of the data science related projects for Golang.

  • Gopher Data – Gophers doing data analysis; no scheduled events, and the last blog post was in 2017
  • Gopher Notes – Golang in Jupyter Notebooks
  • Lgo – Interactive programming with Jupyter for Golang
  • Gota – Data frames for Go, “The API is still in flux so use at your own risk.”
  • qframe – Immutable data frames for Go; faster than Gota but not as well documented
  • GoLearn – Machine Learning for Go
  • Gorgonia – Library for machine learning in Go
  • Go Sklearn – Port of scikit-learn from Python; still active but with only a couple of committers; early but promising
  • Gonum – Numerical library for Go, very promising and active

Golang Data Science Books

There have even been a couple of books written about the topic.

Thoughts from the Community

Whether Go is a good fit for data science (the “Go for Data Science” debate) has come up numerous times over the past few years. Below is a listing of some of those discussions and the key takeaways.

Reasons to use Golang for Data Science

  • Performance
  • Concurrency
  • Strong Developer Ecosystem
  • Basic Data Science packages are available
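Concurrency is the easiest of these reasons to show in a few lines. The sketch below is a hypothetical example; parallelSumSquares is an illustrative name, not a library function. It splits a slice into chunks and sums the squares of each chunk in its own goroutine. It then waits on a sync.WaitGroup before combining the partial results.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSumSquares splits xs into up to `workers` chunks, sums the squares
// of each chunk in its own goroutine, and adds up the partial results.
func parallelSumSquares(xs []float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	partial := make([]float64, workers)
	chunk := (len(xs) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(xs) {
			break // fewer chunks than workers
		}
		end := start + chunk
		if end > len(xs) {
			end = len(xs)
		}
		wg.Add(1)
		go func(w int, part []float64) {
			defer wg.Done()
			s := 0.0
			for _, x := range part {
				s += x * x
			}
			partial[w] = s // each goroutine writes only its own slot, so no lock is needed
		}(w, xs[start:end])
	}
	wg.Wait()
	total := 0.0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	xs := make([]float64, 1000)
	for i := range xs {
		xs[i] = float64(i + 1)
	}
	fmt.Println(parallelSumSquares(xs, 4))
}
```

Whether this beats a plain loop depends on the data size and the number of cores. For small slices, the goroutine overhead can outweigh the gain.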

Reasons Not to use Golang for Data Science

  • Limited support from the data science community for Golang
  • Significantly increased time for exploratory analysis
  • Less flexibility to try other optimization and ML techniques
  • The data science community has not really adopted Golang

Summary

In short, Golang is not widely used for exploratory data science. Once an algorithm has been worked out in a more exploratory language, however, rewriting it in Golang to gain performance and concurrency might be a good idea.
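The following is a hypothetical example of such a rewrite: a simple least squares line fit in Go. It is a minimal sketch that assumes a single predictor and in-memory slices. fitLine is an illustrative name, not an API from Gonum or Go Sklearn.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine fits y = intercept + slope*x by ordinary least squares.
func fitLine(x, y []float64) (slope, intercept float64, err error) {
	n := len(x)
	if n != len(y) || n < 2 {
		return 0, 0, errors.New("need at least two paired observations")
	}
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/float64(n), sumY/float64(n)
	// Accumulate the centered cross-product and the sum of squares of x.
	var sxy, sxx float64
	for i := range x {
		dx := x[i] - meanX
		sxy += dx * (y[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0, 0, errors.New("x values are all identical")
	}
	slope = sxy / sxx
	intercept = meanY - slope*meanX
	return slope, intercept, nil
}

func main() {
	x := []float64{1, 2, 3, 4, 5}
	y := []float64{3, 5, 7, 9, 11} // toy data lying exactly on y = 2x + 1
	slope, intercept, err := fitLine(x, y)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("slope=%.2f intercept=%.2f\n", slope, intercept)
}
```

With this toy data, the fit recovers a slope of 2 and an intercept of 1.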

Data Science and IoT Course

Want to learn about data science and the Internet of Things (IoT)? Futuretext is about to start Data Science for Internet of Things, a course aimed at people who want to learn these topics and transition into IoT and data science careers. Here are some quick highlights about the course.

  • Starts mid-March 2016 and runs through December 2016
  • Personalized Course
  • Available Online

Below is a list of topics.

  • Data Science
  • IoT
  • Machine Learning
  • Spark
  • Data Science for IoT methodology
  • Deep Learning

Do check out Data Science for Internet of Things for more details.

5 Free Programming Languages for Data Science

  1. R: There is a package for nearly any algorithm you will ever need, and that is where R really excels. It is widely used and has a strong community. The only slight drawback (in my opinion) is the cumbersome syntax.
  2. Python: A very good language for beginning programmers. The syntax is quite readable and intuitive. With the NumPy and SciPy packages, Python has many of the tools and algorithms necessary to do data science.
  3. Octave: Octave was created to be very similar to the commercial product MATLAB. Octave is used and highly recommended in Dr. Andrew Ng’s Coursera machine learning course.
  4. Java: While I don’t read much about people using Java to quickly test new statistical models, a couple of the larger open-source data science products, such as Hadoop and Storm, are built with Java. Plus, Java has libraries for just about everything, and it has proved itself to be a fairly decent production environment.
  5. Julia: This is the newcomer on the list. Julia claims very strong performance along with built-in support for parallelism and cloud computing. I am not too familiar with Julia, but it will be interesting to see how the Julia community grows over the coming months and years. Julia currently lacks some of the libraries and algorithms that the others on the list support.

Learn Computer Science For Data Science

Many aspects of computer science are fundamental to data science. A good data scientist has to be able to transform/extract/manipulate lots of data. Computer programming is the main technique for such operations. Here are numerous resources to help you learn the fundamentals of computer science.
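As a small example of that kind of transformation, the sketch below uses Go's standard encoding/csv package to read a few sensor readings and total them per key, roughly a group-by and sum. The column names, the data, and the function name totalsByKey are all made up for illustration. The code assumes a header row and exactly two columns per record.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// totalsByKey reads CSV rows of the form key,value (after a header row)
// and returns the sum of value for each key.
func totalsByKey(data string) (map[string]float64, error) {
	r := csv.NewReader(strings.NewReader(data))
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	totals := make(map[string]float64)
	for i, rec := range records {
		if i == 0 {
			continue // skip the header row
		}
		v, err := strconv.ParseFloat(rec[1], 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", i+1, err)
		}
		totals[rec[0]] += v
	}
	return totals, nil
}

func main() {
	// Hypothetical readings, for illustration only.
	data := "sensor,reading\na,1.5\nb,2.0\na,3.5\n"
	totals, err := totalsByKey(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	keys := make([]string, 0, len(totals))
	for k := range totals {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Printf("%s: %.1f\n", k, totals[k])
	}
}
```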

Online Computer Science Courses: Introductory Level

If you are not familiar with computer programming, this list is a good place to start.

Online Computer Science Courses: More Advanced

Two More Helpful Resources

Stack Overflow is a great site for getting answers to your programming questions. It is good for beginners as well as more advanced programmers. Also, if you start writing a lot of code, GitHub is a great place to store it.