Archive

Archive for the ‘Uncategorized’ Category

Azure Data Factory V2– Pipeline Functional Testing

November 18, 2018 Leave a comment

This article helps you to understand functional testing with pipelines in Azure Data Factory.

Note:-

This article applies to Azure Data Factory V2.

This article assumes that you have gone through Introduction to Azure Data Factory and the QuickStart Tutorial.

Credits – Thanks to ADFv2 Product Group (Anudeep Sharma, Hermine Vermishyan & Abhishek Narain ) for reviewing and all assistance with my ADFv2 and my partner in awesomeness David Kuo for testing the script

Overview

During development phase of an Azure Data Factory pipeline you have a debugging functionality in UX which you can use to debug a pipeline and find out if anything is failing due to reasons like incorrect connection string, object not present in database, wrong query, etc.

Keeping same idea in mind I have extended the functionality of debugging from pipelines’ development stage to release stage, where you can use PowerShell script to perform pipelines’ functional testing in Test or Integration or UAT environment ensuring all issues are caught at that stage instead of hitting in Production environment and in this article series will share the process of achieving same with PowerShell script and good news is that you can use same script in your environment without making any code change.

Concept

Function test with ADF can be used to validate if pipeline is working and if anything related to functionality is failing then stop release, fix issue and iterate. It’s more aligned with build verification ensuring objects like linked service, tables, storage, etc. which are required for a pipeline are successfully validated before releasing to production environment.

Scope

Testing functionality of pipelines ensuring configurations like databases, server, storage, etc. are configured correctly and pipeline is not failing due to misconfiguration like database not exist or something on same lines. Errors due to data quality, performance or anything related to data sanctity is outside the scope of functional testing.

How it Works

Let’s go in details of understanding working of functional testing with ADFv2.

First of all you need to understand how debugging works in pipeline and good news is that there is very informative video done by Gaurav Malhotra from ADF team (highly recommended if you want to get insight of pipeline debugging): Iterative development and debugging using Data Factory

As you have seen in this video that for debugging you need to run a pipeline and for running a pipeline it requires parameters defined for your pipeline and run the pipeline in debug mode with an output showing whether pipeline succeeded or failed, keeping same approach in mind I attempted to write this functionality with following flow:-

Functional Testing Flow: –

clip_image002

Now let’s look into each step in this flow:-

1. Azure Active Directory (AAD) Authentication – AAD is required for silent authentication of PowerShell script which is used during automation of testing and PowerShell script needs to access Azure Data Factory for getting list of pipelines so we need to ensure Application ID should also have access to Azure Data Factory. There are few handy articles which you can follow for creating Application ID mentioned as follow:-

a. How to create an AAD application and service principal

b. How to give access to Application ID

i. Go to Azure Data Factory in Azure Portal

ii. Under All settings select Users and click Add as shown in following figure: –

Settings: –

clip_image004

iii. After hitting Add it will open another windows which will allow you to add Application ID which you had crated in Step1, as there is need of only getting list of pipelines so we are making Application member of Reader Role as shown in following figure where Application name is FunctionalTesting and added member of Reader Role:-

clip_image006

iv. In PowerShell we will be leveraging Application ID for silent authentication and its recommended to use Key Vault for pulling secrets in this case we need a Application ID secret (password), in PowerShell I have parameterized secret field so you can either pass it as plain text or make a call to Key Vault during release process and pass along the secret. Code snippet as follow in which we are doing silent authentication using Application ID which is also knowns as Service Principal, you can refer this article for getting TenantID

clip_image008

2. Functional Testing Logic –

a. Get list of Pipeline from ADF

clip_image010

b. Check for Pipeline Exclusion

clip_image009

Example of exclusion file:-

clip_image012

c. Pipeline Test file – Following is an example of test file, in my case I have a pipeline named as pl-testValidation in data factory and that pipeline expects a parameter known as SubFolder, I need to ensure then I am creating a parameter file with respect to my pipeline which will be used a functional test file by testing logic, also need to ensure that test file name matches with the pipeline name and should have all parameters stated in a pipeline, and if this file is missing then it will be marked as failed with missing test file

Example of test file as the pipeline name is pl-testValidation so I have created test file as pl-testValidation.json (PipelineName.json) with a parameter consumed by pipeline and TestTimeout value as I want to run this test for 80 seconds, if I will miss this value then it will default to 30 seconds.

clip_image014

d. In following code if you notice we are iterating through pipelines and checking if PipelineName.json exist and also checking what is the TestTimeout value if missing then defaulting it to 30 seconds.

clip_image016

e. Running Pipeline test by using following command

clip_image018

f. Using value of $runId to get current status (refer GIT – PS script for full code)

clip_image020

g. Using value of $runId to get status (refer GIT – PS script for full code)

clip_image021

h. Fail test if it exceeds timeout setting

clip_image023

i.If you are integrated with Release Manager then it will generate exception in-case of failure resulting in failed release and give an output of failed pipelines which needs to be fixed before deploying to production environment but don’t be dis-heartened you can still code even though you are not using Release Manager by using following command.

.\AzureDatafactoryFunctionalTesting.ps1 -dataFactoryName "Your Factory Name" -resourceGroupName "Factory Resource Group Name" -azureSubscription "Subscription ID" -applicationSecret "Application Secret from Point 1" -applicationID "Application ID from Point 1" -tenantID "Tenant ID from Point 1" -excludedPipelinesFilePath ".\" -functionalTestJsonFilePath ".\"

Conclusion

As you have seen in this article I have gone through step by step of achieving functional testing with ADFv2 and in subsequent article I will be discussing more about how to integrate same with Release Manager so that you can have this testing framework as a part of Pipeline Release process to Production.

Code Location

Git – feel free to download and contribute

Advertisements
Categories: Uncategorized

ADF – Azure Data Factory Learning Path

If you are looking to learn Azure Data Factory (ADF) it’s a good path to start, as it covers various components of ADF like all kind of supported flows with code samples.

I would suggest to start with Portal as you don’t require any coding experience on same and if you are coming from DevOps or DBA background then you can follow PowerShell path, further if you are coming from pure development background then you can use APIs provided by ADF through Nugget.

Here is link of learning path:-
https://azure.microsoft.com/en-us/documentation/learning-paths/data-factory/

Following link shows how Microsoft Studios (XBOX) team is using Azure Data Factory:-
https://channel9.msdn.com/Blogs/raw-tech/Halo-Minecraft-and-More-How-Microsoft-Studios-Processes-Gaming-Event-Data

Happy Learning

Categories: Uncategorized

HDInsight: How to find Long Running Queries

January 25, 2017 Leave a comment

Coming from SQL background where you have dependency on SQL Profiler Trace for understanding what’s going on SQL Server like who is running non optimized long query, where its scanning through all partitions and yes who is using non indexed columns for filter right? So coming to HDInsight world wondering do we have something like SQL Profiler for tracking bad jobs – Yes we do, as a part of Apache Ambari Project – (as per wiki) Whole objective of Ambari is to make Hadoop management easy by providing a mechanism of managing or monitoring Hadoop  clusters with it’s easy to use  UI, in this Post I will show how to use Ambari for getting long running or resource intensive jobs.

Objective: How to find long running or resource intensive Hive query?
Steps:-

clip_image001[4]

 

  • Click Yarn > Quick Links > Active Head Node >>Resource Manager UI

clip_image002[5]

Or if you don’t want to go through the whole hassle just type https://<HDInsightClusterName>.azurehdinsight.net/yarnui/hn/cluster, in popup provide security information admin and its password – and it would open Resource Manager – it provides you all information about jobs (running/pending/finished/etc.) on HDI cluster

  • In following screenshot – you can notice multiple options for checking Jobs like in this case I have this specific job which is consuming:-
    • A – Memory – 2.2 TB
    • B – % of Queue – 64%
    • C – % of Cluster – 60%

image

  • In this case filtered job may be problem making query which is causing other jobs to wait in queue as its consuming 60% of resource and 2.2 TB of total 3.75TB allocated for this cluster. I can further look into job by clicking into job id (Job ID > Logs and Pull query from there dag) and check how’s query look like and what’s its doing under the hood – is it scanning TBs of partitions or filtering on non-partitioned fields or what.

clip_image004

 

  • Like SQL You can Kill this job from Resource Manager it self – Click on Job and on top you will find a button for killing same query, like this job is running for last 7 hours, you can Kill it by clicking Kill Application

image

 

Hope this will come handy.

Feel free to add comments, may be in next post I can take a deep dive into each log and action.

Happy Learning.

(Published on my  blogs https://karanspeaks.com/ & https://blogs.msdn.microsoft.com/)

Temporary Post Used For Theme Detection (45286bb8-790b-401a-869b-3660278f5b19 – 3bfe001a-32de-4114-a6b4-4005b770f6d7)

January 25, 2017 Leave a comment

This is a temporary post that was not deleted. Please delete this manually. (0cc580d6-e4cf-4859-8a96-2556a3c75393 – 3bfe001a-32de-4114-a6b4-4005b770f6d7)

Categories: Uncategorized

SSAS Monitoring Tool–SQL 2014 ASTrace

November 30, 2016 Leave a comment

 

Refer SSAS Monitoring Tool– ASTrace to learn more about ASTrace, this post was pending for a long time as I got tons of email and comment around SQL 2014 and 2016 ASTrace, whith this blog like to mention that I have shared a source code on git so anyone can contribute to same.

 

Code for ASTrace 2014 is tested with latest build of SQL Server 2014 but for 2016 I had tested with SQL 2016 Preview and didn’t get chance to test with RTM so in coming few days will test and update the code as well if required, but feel free to update if you like as its Git so anyone can contribute.

 

Here is Git Link – https://github.com/karanspeaks/SQL-Server-Analysis-Services-Samples

Categories: Uncategorized

Halo, Minecraft, and More! How Microsoft Studios Processes Gaming Event Data

February 18, 2016 Leave a comment
Categories: Uncategorized

Halo, Minecraft, and More! How Microsoft Studios Processes Gaming Event Data

February 18, 2016 1 comment

My first video on Channel 9, it talks about how we Handle various Events generated from Games………..

https://channel9.msdn.com/Blogs/raw-tech/Halo-Minecraft-and-More-How-Microsoft-Studios-Processes-Gaming-Event-Data/player

Categories: Uncategorized