Spark on Azure HDInsight (public preview) is now available!
The following components are included as part of a Spark cluster on Azure HDInsight.
- Spark 1.3.1. Comes with Spark Core, Spark SQL, Spark Streaming APIs, GraphX, and MLlib.
- Anaconda. A collection of powerful packages for Python.
- Spark Job Server, which allows you to submit JARs or Python scripts remotely.
- Zeppelin Notebook for interactive querying.
- IPython Notebook for interactive querying.
- ODBC driver for connecting to Spark clusters in HDInsight from BI tools such as Microsoft Power BI and Tableau.
Below are articles and documentation on Spark on Azure HDInsight to get you started!
Article | Link |
Overview: Apache Spark on Azure HDInsight | https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-overview/ |
Provision Apache Spark clusters in HDInsight using custom options | https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-provision-clusters/ |
Quick Start: Provision Apache Spark on HDInsight and run interactive queries using Spark SQL | |
Use BI tools with Apache Spark on Azure HDInsight | https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-use-bi-tools/ |
Spark Streaming: Process events from Azure Event Hubs with Apache Spark on HDInsight | |
Build Machine Learning applications using Apache Spark on Azure HDInsight | |
Manage resources for the Apache Spark cluster in Azure HDInsight | https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-resource-manager/ |
Spark Job Server on Azure HDInsight clusters | https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-job-server/ |