Tag Archives: Java

Deep Learning in Java

Deep learning is the hottest topic in all of data science right now. Adam Gibson, cofounder of Blix.io, has created an open source deep learning library for Java named DeepLearning4j. For those curious, DeepLearning4j is open sourced on GitHub.

Below is a video of Adam introducing deep learning and DeepLearning4j. Also, if you are interested in learning more about deep learning, here are a couple more very helpful links.

R vs Python, The Great Debate

Recently I have seen blogs/articles claiming Python is the best choice for data science and R is the new language for business. Honestly, both articles are truthful and good. Both Python and R are good. Why do we have to choose? Let’s use both.

Here is my opinion. I prefer R to Python when performing exploratory data analysis. R has so many packages for every possible statistical technique. The plots, although not beautiful by default, are quick and easy to create. However, I prefer Python when I need to pull data from an API or build a software system or website. Python is more than just a statistical analysis tool; it is a complete programming language. I might even end up using Java for a project in the near future.

There does not have to be a clear winner or one single language to use. Use the best tool for the job and get on with your data science. In the end, the world cares more about what you produced than whether you used R or Python or something else.

5 Free Programming Languages for Data Science

  1. R: There is a package for nearly any algorithm you will ever need. That is where R really excels. It is widely used and has a strong community. The only slight downside (in my opinion) is the cumbersome syntax.
  2. Python: A very good language for beginning programmers. The syntax is quite readable and intuitive. With the NumPy and SciPy packages, Python has many of the tools/algorithms necessary to do data science.
  3. Octave: Octave was created to be very similar to the commercial product, Matlab. Octave is used and highly recommended in Dr. Andrew Ng’s Coursera machine learning course.
  4. Java: While I don’t read a lot about people using Java for quickly testing new statistical models, a couple of the larger open-source data science products, Hadoop and Storm among them, are built with Java. Plus, Java does have libraries for just about everything, and it has proved itself to be a fairly decent production environment.
  5. Julia: This is the newcomer on the list. Julia claims to have really great performance along with built-in support for parallelism and cloud computing. I am not too familiar with Julia, but it will be interesting to see how the Julia community grows over the coming months and years. Julia is currently lacking some of the libraries/algorithms that the others on the list support.

Java and MongoDB Webinars

10gen, the company behind MongoDB, will be offering some free webinars this fall. This webinar series is targeted at using MongoDB with Java. 10gen has been running successful webinars for a long time, so I would highly recommend any or all of the following sessions.

| Title | Date |
| --- | --- |
| Building your first Java Application with MongoDB | Oct. 18, 2012 and Nov. 22, 2012 |
| Building Web Applications with MongoDB and Spring | Nov. 1, 2012 |
| MongoDB on the JVM | Nov. 29, 2012 |
| Simplifying Persistence for Java and MongoDB | Dec. 13, 2012 |

Neo4j and Bioinformatics Webinar

Neo Technology, the company behind the graph database Neo4j, is hosting a webinar on Thursday. Pablo Pareja from the Bio4j project will provide an overview of bioinformatics and Neo4j, as well as some applications.

Bioinformatics can be viewed as data science for biology. Bioinformatics was cool before data science was even a term.

If you are interested in learning more about bioinformatics and graph databases, then register for this webinar and start learning.