Thursday, July 2, 2015

Analyze Azure SQL Data Warehouse with ClearStory Data

As soon as Azure SQL Data Warehouse was announced, a data loading solution (Attunity CloudBeam) was announced for it, and now a high-speed, on-the-fly data analysis solution is ready as well.

ClearStory Data pairs with the recently announced, petabyte-scale Azure SQL Data Warehouse to provide business insights on the fly, at high speed, for large data volumes. This highly scalable data analysis platform is based on Apache Spark.

Key features of this pairing:
1) Out-of-the-box fast data access and data prep – ClearStory provides highly scalable and fast, Spark-based access to data in Azure SQL Data Warehouse. Upon accessing data from these sources, ClearStory’s Data Inference Engine determines attributes in the source data to accelerate data prep and data harmonization, eliminating traditional, lengthy, complex data prep operations.
2) Rapid diagnostic, exploratory and ad-hoc analysis – Users can ask more questions, blend and harmonize new data on the fly, and uncover deeper insights. Insights evolve as new data is discovered and update as your data updates, without human intervention or further data modeling cycles. The business benefits from deeper insights, on more data, that update intraday, daily, weekly or at whatever speed is required.
3) Business consumption of insights via interactive, collaborative StoryBoards™ – Business users view StoryBoards to spot insights as data updates, capture and maintain analytic context, and collaborate in real time to speed consistent, data-driven decisions. At any point, a StoryBoard can be augmented with more data to answer a bigger question, without requiring old-style data modeling and data wrangling.
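
ClearStory's Data Inference Engine is proprietary, and the announcement does not describe how it works. As a rough illustration of what "determining attributes in the source data" can mean in the simplest case, here is a toy sketch in Python that guesses a type for each column from sample values. The function names, the sample rows and the heuristic are all hypothetical and are not taken from ClearStory or Azure.

```python
from datetime import datetime


def _parses(values, parser):
    """Return True if every value is accepted by the given parser."""
    for v in values:
        try:
            parser(v)
        except ValueError:
            return False
    return True


def infer_type(values):
    """Guess a column type from sample string values (toy heuristic)."""
    # Ignore blank cells so that missing values do not force "string".
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "unknown"
    if _parses(present, int):
        return "integer"
    if _parses(present, float):
        return "float"
    # Assumption: dates arrive in ISO form (YYYY-MM-DD).
    if _parses(present, lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def infer_schema(rows):
    """Map each column name to its inferred type."""
    columns = rows[0].keys()
    return {c: infer_type([r[c] for r in rows]) for c in columns}


# Hypothetical sample rows, as they might come back from a warehouse query.
sample_rows = [
    {"order_id": "1001", "amount": "19.99", "order_date": "2015-06-30", "region": "West"},
    {"order_id": "1002", "amount": "5", "order_date": "2015-07-01", "region": "East"},
]

schema = infer_schema(sample_rows)
print(schema)
```

A real inference engine would go far beyond this (semantic types, units, hierarchies, keys for blending data sets); the sketch only shows why inferring attributes up front can cut manual data prep.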

Read more here:

About Apache Spark, from the Apache Spark documentation:
"Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming."