Most SSIS developers know that Integration Services can do more than just move data around. I want to spread the idea that it can also be used to provision an Azure compute environment.
This combination has several advantages:
- You can provision Azure resources in parallel (for example, you can create an HDInsight cluster and several VMs at the same time).
- Metadata about your environment can be stored in SQL Server and generated using T-SQL.
- A nice-looking user interface shows the progress of provisioning, data movement, and compute tasks.
- The SSIS package serves as visual documentation of your environment and workflows.
- Data movement, processing, and environment provisioning are orchestrated in a single place.
- Visual Studio can be used as the development environment (including TFS source control).
- You pay only for the resources you need. The final step in your package could be to remove the environment.
- No money is tied up in potentially suboptimal hardware investments. The environment is extremely flexible: you can create and test a new configuration every day.
- (Practically) infinite scalability: you can copy and paste parts of your SSIS package to provision additional resources.
Here is an example:
The basics are quite simple. The only thing you need to know is how to execute PowerShell from within an SSIS package.
For this purpose you can use the Execute Process Task. Here is an example of how to execute a single PowerShell command in Integration Services:
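The following is a minimal sketch of the Execute Process Task settings for running a single command. It assumes two things about the machine that runs the package: Windows PowerShell is at its default path, and the classic Azure PowerShell module, which provides `Get-AzureSubscription`, is installed. The command is only a placeholder for your own provisioning step. `User::PsOutput` is a hypothetical String variable that captures the command's output.

```
Executable:                            C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Arguments:                             -NoProfile -NonInteractive -ExecutionPolicy Bypass -Command "Get-AzureSubscription"
StandardOutputVariable:                User::PsOutput
FailTaskIfReturnCodeIsNotSuccessValue: True
SuccessValue:                          0
```

`-NonInteractive` prevents a prompt from hanging an unattended package. The last two settings (normally the defaults, listed here explicitly) make the task fail when powershell.exe returns a non-zero exit code, so the failure is not silently ignored.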
You can also execute an entire PowerShell script file and pass parameters to it:
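With `-File`, any named arguments that follow the script path are passed straight to the script's `param()` block. The sketch below uses a hypothetical script, `C:\Scripts\Provision-Environment.ps1`, with hypothetical parameters `-ServiceName` and `-VmName`. The actual Azure provisioning cmdlets are left as a placeholder. Setting `$ErrorActionPreference` to `Stop` and ending with an explicit `exit` code lets the Execute Process Task detect a failed run.

```powershell
# Provision-Environment.ps1 (hypothetical example script)
param(
    [Parameter(Mandatory = $true)]
    [string]$ServiceName,

    [Parameter(Mandatory = $true)]
    [string]$VmName
)

# Turn non-terminating errors into terminating ones so the catch block sees them
$ErrorActionPreference = 'Stop'

try {
    Write-Output "Provisioning VM '$VmName' in cloud service '$ServiceName'..."
    # Placeholder: call your Azure provisioning cmdlets here
    Write-Output "Done."
    exit 0   # success: matches the task's SuccessValue
}
catch {
    # Write the error to stderr; the task can capture it via StandardErrorVariable
    [Console]::Error.WriteLine($_.Exception.Message)
    exit 1   # non-zero exit code makes the Execute Process Task fail
}
```

The matching Arguments value for the task would then be:

```
-NoProfile -NonInteractive -ExecutionPolicy Bypass -File "C:\Scripts\Provision-Environment.ps1" -ServiceName "demo-svc" -VmName "sqlvm01"
```

If the environment metadata lives in SQL Server, one option is to load it into package variables (for example with an Execute SQL Task). You can then set a property expression on Arguments instead of a fixed string. The variables `User::ServiceName` and `User::VmName` are hypothetical:

```
"-NoProfile -NonInteractive -ExecutionPolicy Bypass -File \"C:\\Scripts\\Provision-Environment.ps1\" -ServiceName \"" + @[User::ServiceName] + "\" -VmName \"" + @[User::VmName] + "\""
```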
At some point your Azure credentials may expire. In that case, open a PowerShell ISE window, run the Add-AzureAccount cmdlet, and enter your subscription credentials; this should fix the execution errors. By the way, if you have an MSDN subscription, you get around $100 of Azure usage for free every month.
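The credential refresh from the ISE comes down to the two commands below. This assumes the classic (Service Management) Azure PowerShell module, which is where `Add-AzureAccount` lives. `Get-AzureSubscription` just confirms that the subscription is visible again. The sign-in is cached per Windows user, so it is probably worth running this as the same account that executes the package (for example, a SQL Server Agent proxy account); treat that as an assumption to verify in your setup.

```powershell
# Classic (Service Management) Azure PowerShell module
Add-AzureAccount        # opens a sign-in dialog; enter your subscription credentials
Get-AzureSubscription   # lists the subscriptions now available to this Windows user
```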
A cool use case for this budget is to create several SQL Server VMs with slightly different server and database settings and run a workload script against them. This lets you find out which configuration works best; it is a brute-force approach to performance tuning. The data can be collected from all VMs using SQL Server Extended Events and stored in an Azure SQL Data Warehouse. It is usually a good idea to create a Tabular model and use Power BI to explore the results.