Are you ever confused by some of the terms used in big data? If so, here is a link for you. It provides a long list of big data terms, each with a brief definition.
If you follow data science news, you will know that the New York Times puts out some very good content. Well, recently they ran a section called BIG DATA 2013. There are many articles and tons of information. Happy Reading!
This infographic explains the Three V’s of big data, contains a nice list of analytical techniques, related trends and some other information.
Working with big data often means doing some cloud computing. If a public cloud like Amazon Web Services (AWS) is not an option, there are some open-source alternatives. They all offer some level of compatibility with the AWS API for both EC2 (compute) and S3 (storage).
Not too long ago, I was in an airport looking for a book to help pass the time. I was thrilled to find a book with Big Data in the title, so I took it and started reading.
The book is titled Big Data: A Revolution That Will Transform How We Live, Work, and Think. The authors are Viktor Mayer-Schönberger, a professor at Oxford, and Kenneth Cukier, data editor at The Economist.
The book is well-written and entertaining. I would like to point out two chapters that really stood out to me. Chapter 4, on correlation, provides an excellent description of why correlation is not the same as causation. It then goes on to argue that, with all the data available, correlation might be all that is needed. Here is a quote from Chapter 4.
The correlations show what, not why, but as we have seen, knowing what is often good enough.
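To make the "what, not why" point concrete, here is a minimal Python sketch of computing a correlation. It is only an illustration: the `pearson` helper and the two small data series are made up for this example and do not come from the book.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical numbers, invented purely for illustration
ice_cream_sales = [20, 25, 30, 35, 40]
sunburn_cases = [2, 3, 4, 5, 6]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation: {r:.2f}")
```

A coefficient near 1 says the two series rise together (the what). It says nothing about whether one causes the other or whether both are driven by something else, such as sunny weather (the why).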
The final chapter, Chapter 10, is about what is next with big data. It provides a look into the future of where big data will make a difference: global problems, medicine, climate change, physics, sensors, and nearly all other parts of our lives. It also mentions that big data is only going to get bigger.
Also, Chapter 5 introduced me to a new word, datafication. I am still not exactly sure what the definition is. Chapter 9 has a great discussion about privacy: people are losing control over whether information about them is collected, so they can only hold others accountable for how that information is used.
The book will not help you master machine learning algorithms (it is not intended for that). It is not a technical book. However, if you are interested in what types of questions can be answered with all your data, this book is great. I believe the book is targeted at business people who are hoping to get a grasp of all the big data talk.
This is an excellent write-up on the differences between:
- Machine Learning
- Data Mining
- Big Data
- Predictive Analytics
- Data Science
Until yesterday, I was completely new to the term work-force science. Essentially, work-force science is data analysis applied to Human Resources. It makes more sense than the old gut-feeling approach. If you want to know more, see this excellent article from the New York Times, "Big Data, Trying to Build Better Workers." The following quote sums up one of the key findings.
An applicant’s work history is not a good predictor of future results.
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.
A nice, short, two-minute video from edCetra Training with some good facts about big data and data analysis.
- The digital universe is 10 times the size it was in 2006
- Greater literacy and cloud computing are helping fuel big data
- 80% of companies' data is unstructured, which makes it difficult to analyze
- Employees spend 2 hours per day searching for the right information