Big Data – Streaming

A big data project has many architectural areas to design. Do you need to handle a large volume of data streaming into your warehouse, or can you focus mainly on processing incoming data and choosing the right data store or warehouse? This section covers the major elements we look at in an architecture, with a focus on streaming.

Data now comes from more places than ever. With sensors generating readings, and computers and people generating even more information, choosing the right streaming tool can be critical. Below are some thoughts on the pros and cons of each. Advanced inSight has experience with many of the products below, with an emphasis on Amazon Kinesis Streaming. Let us help you make the decision.

Pros and Cons of Data Streaming Tools

Flume

  • Pros: Reliable
  • Cons: Does not manage multiple streams

Kafka

  • Pros: Scalable and reliable; adopted in many cloud-based offerings
  • Cons: Setup and support are time-consuming

Amazon Kinesis Streaming

  • Pros: Setup and management tools from AWS
  • Cons: Doesn’t scale quite as well as Kafka

Azure Event Hubs

  • Pros: Setup and management tools
  • Cons: Not as mature as others

Hortonworks Data Flow

  • Pros: Powerful user interface and management capabilities
  • Cons: Just released in Q4 2015 by Hortonworks