
Free Big Data Virtual Meetup

The First Virtual Expo for the Big Data Community

What: Big Data eMeetup, an online conference with speakers, vendors, and attendees, all in an interactive, game-like setting.
When: July 23, 2015, all day
Where: Online, www.bigdataemeetup.com
Cost: Free

I believe this is a promising, low-cost, and fun way to host a meetup. Please join Big Data eMeetup for its innovative, inaugural launch.

The New Trend of Big Data E-Meetup – Official Announcement

The BIG DATA Expo is a unique platform that provides a B2B experience in the form of an MMORPG (massively multiplayer online role-playing game), featuring live presentations, virtual exhibition halls with virtual booths, and enhanced networking capabilities.

Who will be there? The event is dedicated to hosting key players in the BIG DATA industry, among them:

  • Engineers
  • Data Scientists
  • Developers
  • Architects
  • Managers
  • Executives
  • Students
  • Professional Services
  • Networking specialists
  • Data Analysts
  • BI Developers/Architects
  • QA
  • Data Warehouse Professionals
  • Sales
  • Technical Marketing
  • Teaching Staff
  • Performance Engineers

Why virtual?

Our unique and innovative platform will allow you to participate in a virtual worldwide event and do business matching as if you were there in person. Network with people from around the world easily, with no travel and very low costs.

Why Big Data?

The industry is characterized by prominent ongoing developments, trends, and innovations. Working in this industry requires traveling to exhibitions around the world every few weeks. We believe that “The Big Data E-Meetup” is the optimal solution for many key industry players to explore a new way to network and do business without needing to travel so much or to have endless budgets for exhibitions.

The Big Data Virtual Expo will be open to everyone and will offer free access to the presentations and to the virtual exhibition halls, where companies will showcase their products via virtual booths and have their representatives chat with attendees via video conference. The virtual expo will function just like a regular MMORPG (massively multiplayer online role-playing game) and will offer a mix of business and fun.

Entering the event is as simple as opening a browser and joining online. There are no traveling costs involved for the attendees and no booth design costs for the exhibitors. Today, in the digital age, networking and doing business is just a click away.

We are honored to invite the industry to join us at “The Big Data E-Meetup” and to experience this unique opportunity to explore, mingle, and network with the Data community – right from your own computer!

NIST defines Big Data and Data Science

The National Institute of Standards and Technology (NIST) is attempting to create standards for Big Data. It just released the NIST Big Data Interoperability Framework, a huge set of documents aimed at creating standards for everything in big data, from definitions to architectures.

Big Data Definitions

In case you are wondering (and I know you are), here are NIST’s definitions of big data and data science. The framework includes many more definitions.

Big Data consists of extensive datasets – primarily in the characteristics of volume, variety, velocity, and/or variability – that require a scalable architecture for efficient storage, manipulation, and analysis.

Data science is the empirical synthesis of actionable knowledge from raw data through the complete data lifecycle process.

Don’t like the definitions? Great, NIST would love to hear your opinions/comments. Comments are being collected until May 21, 2015.

The NIST Big Data interoperability framework is a massive work consisting of 7 volumes. All are open for comments.

  1. Definitions
  2. Taxonomies
  3. Use Case & Requirements
  4. Security and Privacy
  5. Architectures White Paper Survey
  6. Reference Architecture
  7. Standards Roadmap

The process to submit a comment appears rather old-school (hint to NIST: GitHub might be a good place to collect comments and edits), but it is not difficult.

Free Big Data Analytics Handbook

Brian Liou from Leada was kind enough to provide a guest post about their latest handbook, The Data Analytics Handbook: Big Data Edition.

Data Analytics Handbook
Have you ever wondered what the deal was behind all the hype of “big data”? Well, so did we. In 2014, data science hit peak popularity, and as graduates with degrees in statistics, business, and computer science from UC Berkeley, we found ourselves with a unique skill set that was in high demand. We recognized that, as recent graduates, our foundational knowledge was purely theoretical and we lacked industry experience; we also realized that we were not alone in this predicament.

And so, we sought out those who could supplement our knowledge, interviewing leaders, experts, and professionals: the giants in our industry. What began as a quest for the reality behind the buzzwords “big data” and “data science” quickly turned into The Data Analytics Handbook, the first educational product of our startup Leada (see www.teamleada.com). Thirty-plus interviews and four editions later, the handbook has been downloaded over 30,000 times by readers from all over the world. In these interviews, you’ll discover whether “big data” is overblown, what skills your portfolio companies should look for when hiring a data scientist, how leading “big data” and analytics companies interview, and which industries will be most impacted by the disruptive power of data science. We hope you enjoy reading these interviews as much as we enjoyed creating them!
Download all 4 handbooks at www.teamleada.com/handbook

Strata 2015

The annual Strata Conference in California is this week. The workshops have already started, but the conference does not begin until Thursday, February 19, 2015. At that time, a number of the keynote speeches will be live streamed for free. The keynotes are always great, so be sure to tune in.

3 Great Data Science Books You Can Read Now…for free

Just this week, I have become aware of 3 free online books for data science.

Data Visualization with JavaScript

If you are looking for a tutorial that teaches you how to make wonderful visualizations on the web, look no further. Data Visualization with JavaScript is a free online book for learning data visualization with JavaScript. It provides tons of examples and step-by-step instructions for creating graphs, charts, and other visualizations. Here is a quick list of the topics:

  • Graphs
  • D3.js
  • Interactive Charts
  • Geographic Plots
  • Timelines

Frontiers in Massive Data Analysis

Frontiers in Massive Data Analysis is a report about how science, business, communications, national security, and other fields need to learn to handle massive amounts of data. Whether the data has been sitting in a database for years or is just now screaming into the systems, massive data is now a problem for almost every industry. This report covers many of the topics that need to be addressed when dealing with big data. Here is a very brief overview of the topics:

  • Limitations
  • Sampling
  • Building Models from Massive Data
  • Real-time Algorithms
  • 7 Computational Giants of Massive Data Analysis

Foundations of Data Science

Foundations of Data Science is a draft of a textbook written by John Hopcroft and Ravindran Kannan. It is intended as a computer science text that emphasizes probability and statistics rather than discrete mathematics. The authors argue that knowing how to work with data is a necessary skill for the computer scientists of the future. This is clearly the most technical and academic of the 3 books, but if that is your thing, you should really enjoy browsing through it. Here are some of the topics:

  • High-Dimensional Space
  • Clustering
  • Algorithms for Massive Data Problems
  • Singular Value Decomposition
  • Graphical Models

Strata + Hadoop World 2014 Videos

John Rauser from Pinterest gives one of the more popular talks from the recent Strata + Hadoop World conference. The following quote from his talk might pique your interest enough to get you to watch the entire video. Remember, he is speaking to a room with some of the leading data scientists in the world.

Many of the people in this audience are faking it….when it comes to statistics

Many other keynotes, talks, and interviews from Strata + Hadoop World are available on the YouTube playlist.

Stanford Releases Large Network Datasets

Stanford University has just released a collection of large datasets of network data. When I say network data, I am referring to the mathematical term of networks (think of a collection of nodes and edges). Here are just a few of the possible categories.

  • Citation Networks
  • Road Networks
  • Web graphs
  • Social Networks such as Twitter
  • and many more

If you are looking to study network data, or just want some practice analyzing big data, this just might be a good place to start.
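To make “nodes and edges” concrete, here is a minimal Python sketch that loads a directed edge list and counts nodes, edges, and in-degrees. It assumes the common plain-text layout of one whitespace-separated `source target` pair per line, with `#` marking comment lines; check each dataset’s header for its actual format. The function names `load_edge_list` and `degree_summary` are hypothetical, and the sample graph is made up for illustration.

```python
from collections import defaultdict


def load_edge_list(lines):
    """Build a directed adjacency list from 'source target' lines.

    Assumption: whitespace-separated node IDs, '#' starts a comment line.
    Accepts any iterable of lines, e.g. an open file handle.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        src, dst = line.split()[:2]
        adjacency[src].add(dst)  # a set ignores repeated edges
        adjacency.setdefault(dst, set())  # count nodes with no outgoing edges
    return adjacency


def degree_summary(adjacency):
    """Return (node count, edge count, in-degree per node)."""
    in_degree = defaultdict(int)
    for targets in adjacency.values():
        for dst in targets:
            in_degree[dst] += 1
    num_edges = sum(len(targets) for targets in adjacency.values())
    return len(adjacency), num_edges, dict(in_degree)


if __name__ == "__main__":
    # Tiny, made-up sample in edge-list form (not real data).
    sample = """# Directed graph: toy example
# FromNodeId ToNodeId
1 2
1 3
2 3
3 1
4 3
"""
    adjacency = load_edge_list(sample.splitlines())
    nodes, edges, in_degree = degree_summary(adjacency)
    print(nodes, edges)  # 4 5
    # Node with the most incoming edges
    print(max(in_degree, key=in_degree.get))  # 3
```

Because `load_edge_list` takes any iterable of lines, you can pass it an open file and it will read the file one line at a time. For the very largest graphs, though, the in-memory adjacency sets themselves may become the bottleneck.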

Data Scientist vs Data Engineer

As the field of data science continues to grow and mature, it is nice to begin seeing some distinction in the roles of a data scientist. A new job title gaining popularity is the data engineer. In this post, I lay out some of the distinctions between the 2 roles.

Data Scientist vs Data Engineer Venn Diagram

Data Scientist

A data scientist is responsible for pulling insights from data. It is the data scientist’s job to pull data, create models, create data products, and tell a story. A data scientist typically interacts with customers and/or executives. A data scientist should love scrubbing a dataset for more and more understanding.

The main goal of a data scientist is to produce data products and tell the stories of the data. A data scientist would typically have stronger statistics and presentation skills than a data engineer.

Data Engineer

Data engineering is more focused on the systems that store and retrieve data. A data engineer is responsible for building and deploying storage systems that can adequately handle an organization’s needs. Sometimes the need is for fast, real-time incoming data streams. Other times it is for massive amounts of large video files. Still other times it is for a very large number of reads of the data.
In other words, a data engineer needs to build systems that can handle the 3 Vs of big data: volume, velocity, and variety.

The main goal of a data engineer is to make sure the data is properly stored and available to the data scientist and others who need access. A data engineer would typically have stronger software engineering and programming skills than a data scientist.

Conclusion

It is too early to tell if these 2 roles will ever have clearly distinct responsibilities, but it is nice to see some of the load on the mythical all-in-one data scientist being split off. Both roles are important to a properly functioning data science team.

Do you see other distinctions between the roles?

International School of Engineering Programs Beginning Soon

I recently received the following information.

International School of Engineering is announcing their 3rd batch of live e-Learning certificate programs starting 4-Sep-2013 in “Engineering Big Data with R and Hadoop Ecosystem” and “Essentials of Applied Predictive Analytics” (http://goo.gl/kHckP).

These programs helped Engineers and Managers transform into Hadoop Developers/Data Scientists, get industry certifications, revolutionize their workspace and establish exciting careers.

Highlights:

  • Taught by experts who are alumni of Carnegie Mellon, Johns Hopkins, and Stanford universities, with Fortune 50 experience
  • Applied and interactive classes
  • Classes ranked among the top 1% and 5% of all classes in the world on Piazza
  • One-third the cost of other similar programs
  • 95% success with Cloudera and EMC2

For details visit http://goo.gl/bPJEF

For any queries mail us at elearning@insofe.edu.in or call us at +91 9502334561/2/3

42 Big Data Startups – Vote for the Top 10

Startup50’s list of 42 Big Data Startups.

The voting is done, but the list contains plenty of startups working in the data science field.