Disclaimer: The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

ADF – Azure Data Factory Learning Path

If you are looking to learn Azure Data Factory (ADF), this learning path is a good place to start, as it covers the various components of ADF, including all kinds of supported flows, with code samples.

I would suggest starting with the Portal, as it doesn't require any coding experience. If you are coming from a DevOps or DBA background, you can follow the PowerShell path, and if you are coming from a pure development background, you can use the APIs that ADF provides through NuGet. (A small sketch of the PowerShell path follows.)
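For a flavor of the PowerShell path, here is a minimal sketch using the AzureRM cmdlets of that era; the resource group, factory name, and region below are placeholders, not values from this post:

# Minimal sketch: sign in and create an empty data factory with the AzureRM cmdlets
Login-AzureRmAccount
New-AzureRmResourceGroup -Name "<ResourceGroupName>" -Location "WestUS"
New-AzureRmDataFactory -ResourceGroupName "<ResourceGroupName>" -Name "<DataFactoryName>" -Location "WestUS"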

Here is the link to the learning path:
https://azure.microsoft.com/en-us/documentation/learning-paths/data-factory/

The following link shows how the Microsoft Studios (Xbox) team is using Azure Data Factory:
https://channel9.msdn.com/Blogs/raw-tech/Halo-Minecraft-and-More-How-Microsoft-Studios-Processes-Gaming-Event-Data

Happy Learning


HDInsight: How to Find Long-Running Queries

January 25, 2017

Coming from a SQL background, you depend on a SQL Profiler trace to understand what's going on in SQL Server: who is running a non-optimized, long query, where it's scanning through all partitions, and who is using non-indexed columns in filters. Coming to the HDInsight world, you may wonder whether we have something like SQL Profiler for tracking bad jobs. Yes, we do, as part of the Apache Ambari project. As per the wiki, the whole objective of Ambari is to make Hadoop management easy by providing a mechanism for managing and monitoring Hadoop clusters through its easy-to-use UI. In this post I will show how to use Ambari to find long-running or resource-intensive jobs.

Objective: How to find a long-running or resource-intensive Hive query?
Steps:


  • In the Ambari dashboard for your cluster, click Yarn > Quick Links > Active Head Node > Resource Manager UI.


Or, if you don't want to go through the whole hassle, just browse to https://<HDInsightClusterName>.azurehdinsight.net/yarnui/hn/cluster and provide the admin user name and password in the popup. This opens the Resource Manager UI, which gives you information about all jobs (running, pending, finished, etc.) on the HDInsight cluster. If you prefer a script, a small sketch using the Resource Manager's REST API follows.
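As a sketch of the same idea in script form (my addition here, assuming the HDInsight gateway exposes the YARN ResourceManager REST API as documented; the cluster name is a placeholder, and the field names follow the YARN REST API):

# List running YARN applications via the ResourceManager REST API
$clusterName = "<HDInsightClusterName>"   # placeholder
$cred = Get-Credential                    # cluster admin (gateway) credentials
$uri = "https://$clusterName.azurehdinsight.net/ws/v1/cluster/apps?states=RUNNING"
$apps = (Invoke-RestMethod -Uri $uri -Credential $cred).apps.app

# Show the heaviest memory consumers first
$apps | Sort-Object allocatedMB -Descending |
    Select-Object id, name, queue, allocatedMB, runningContainers |
    Format-Table -AutoSize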

  • In the following screenshot you can see multiple options for checking jobs. In this case I have one specific job which is consuming:
    • A – Memory – 2.2 TB
    • B – % of Queue – 64%
    • C – % of Cluster – 60%

[Screenshot: Resource Manager UI highlighting the job's memory, % of queue, and % of cluster usage]

  • In this case, the filtered job may be the problem-making query that is causing other jobs to wait in the queue, as it's consuming 60% of cluster resources and 2.2 TB of the 3.75 TB allocated to this cluster. I can dig further by clicking the job ID (Job ID > Logs, then pull the query from the DAG there) and check what the query looks like and what it's doing under the hood: is it scanning TBs of partitions, filtering on non-partitioned fields, and so on. A small sketch for inspecting a single application follows.
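As a companion sketch (reusing $clusterName and $cred from the earlier snippet; the application ID is a placeholder for the Id column shown in the Resource Manager UI), you can pull a single application's stats through the same REST API; the full logs, including the query text and its DAG, live in the YARN application logs themselves:

# Inspect one application's resource usage and elapsed time
$appId = "<application_id>"   # placeholder: the Id shown in the RM UI
$uri = "https://$clusterName.azurehdinsight.net/ws/v1/cluster/apps/$appId"
$app = (Invoke-RestMethod -Uri $uri -Credential $cred).app
$app | Select-Object id, user, queue, state, elapsedTime, allocatedMB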


  • As in SQL, you can kill this job from the Resource Manager itself. Click on the job, and at the top you will find a button for killing the query. For example, this job has been running for the last 7 hours; you can kill it by clicking Kill Application. (A scripted alternative is sketched after the screenshot.)

[Screenshot: Kill Application button on the application page in the Resource Manager UI]
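And a scripted version of the kill, as a sketch only (same placeholders as above; whether the HDInsight gateway permits this write operation is an assumption you should verify on your cluster):

# Kill a YARN application via the ResourceManager REST API
$appId = "<application_id>"   # placeholder
$uri = "https://$clusterName.azurehdinsight.net/ws/v1/cluster/apps/$appId/state"
Invoke-RestMethod -Uri $uri -Credential $cred -Method Put -Body '{"state":"KILLED"}' -ContentType "application/json"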

 

Hope this comes in handy.

Feel free to add comments; maybe in the next post I can take a deep dive into each log and action.

Happy Learning.

(Published on my blogs https://karanspeaks.com/ and https://blogs.msdn.microsoft.com/)


SSAS Monitoring Tool – SQL 2014 ASTrace

November 30, 2016

Refer to SSAS Monitoring Tool – ASTrace to learn more about ASTrace. This post was pending for a long time, as I got tons of emails and comments about ASTrace for SQL 2014 and 2016. With this blog post I'd like to mention that I have shared the source code on GitHub so anyone can contribute to it.

The code for ASTrace 2014 is tested with the latest build of SQL Server 2014. For 2016, I tested with the SQL 2016 preview and didn't get a chance to test with RTM, so in the coming days I will test and update the code if required. Feel free to update it yourself if you like; it's on GitHub, so anyone can contribute.

Here is the GitHub link – https://github.com/karanspeaks/SQL-Server-Analysis-Services-Samples



Halo, Minecraft, and More! How Microsoft Studios Processes Gaming Event Data

February 18, 2016

My first video on Channel 9! It talks about how we handle the various events generated from our games.

https://channel9.msdn.com/Blogs/raw-tech/Halo-Minecraft-and-More-How-Microsoft-Studios-Processes-Gaming-Event-Data/player


Azure Data Factory: Detecting and Re-Running Failed ADF Slices

November 13, 2015

Recently I came across a scenario where I needed to detect the failed slices of all datasets in an Azure Data Factory. In my case I needed to look back over the last 3 months, and the number of slices was around 600+. They had failed with a validation error because the source data wasn't present, and after a number of retries the slices were marked as failed.

In such cases it's difficult to perform a re-run from the Portal, as you need to right-click on each slice and run it explicitly.

Solution: I wrote the following PowerShell script. It detects all failed slices in a given Azure Data Factory and re-runs them with your consent.

You can use the same script not just for failed slices but for any status; you only need to change the dataset status in the slice filtering, shown in the script below. An example follows.
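For example, a hypothetical tweak (status strings vary, so check the Status values that Get-AzureRmDataFactorySlice actually returns in your factory) to target skipped slices instead:

$failedSlices = $slices | Where {$_.Status -eq 'Skipped'}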

I am also planning to write a solution that runs as a service in a worker role, automatically detects failed slices within a given window, and re-runs them.

The question can be asked: ADF already has re-run logic, so why go through the hassle of writing and running a script?

Yes, it does, but after x number of re-runs a slice is marked as failed, and the only way to run it again is through the portal or programmatically.

So, here is my contribution to ADF Community.

Pre-requisite – Azure Resource Manager PowerShell (https://azure.microsoft.com/en-us/blog/azps-1-0-pre/)
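If you don't have the cmdlets yet, one way to install them (assuming PowerShellGet, which ships with Windows PowerShell 5, per the linked announcement):

# Install and load the Azure Resource Manager cmdlets
Install-Module AzureRM
Import-Module AzureRM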

Copy the following code into a text file and save it as file.ps1.

You can also download the script and save it as a .ps1 file: https://karangulati.files.wordpress.com/2015/11/re-run-failed-slices-ps11.doc

#Begin Script

# Sign in and select the subscription that contains the data factory
Login-AzureRmAccount

$slices = @()
$failedSlices = @()

$Subscription = "Provide Subscription ID"
Select-AzureRmSubscription -SubscriptionId $Subscription

$DataFactoryName = "Provide Data Factory Name"
$resourceGroupName = "Provide Resource Group Name for Data Factory"

$startDateTime = "2015-05-01" # Start date for slices
$endDateTime = "2015-08-01"   # End date for slices

# Get dataset names in the data factory – assign a single name to $tableNames
# if you want to run for an individual dataset only
$tableNames = Get-AzureRmDataFactoryDataset -DataFactoryName $DataFactoryName -ResourceGroupName $resourceGroupName | ForEach {$_.DatasetName}

$tableNames # list dataset names

# Collect all slices for every dataset in the given time window
foreach ($tableName in $tableNames)
{
    $slices += Get-AzureRmDataFactorySlice -DataFactoryName $DataFactoryName -DatasetName $tableName -StartDateTime $startDateTime -EndDateTime $endDateTime -ResourceGroupName $resourceGroupName -ErrorAction Stop
}

# Change the status here to target slices in any other state
$failedSlices = $slices | Where {$_.Status -eq 'Failed'}

$failedSlicesCount = @($failedSlices).Count

if ($failedSlicesCount -gt 0)
{
    Write-Host "Total number of slices failed: $failedSlicesCount"
    $Prompt = Read-Host "Do you want to rerun these failed slices? (Y | N)"
    if ($Prompt -eq "Y" -or $Prompt -eq "y")
    {
        foreach ($failed in $failedSlices)
        {
            Write-Host "Rerunning slice of dataset $($failed.DatasetName) with StartDateTime $($failed.Start) and EndDateTime $($failed.End)"
            # Setting the slice back to Waiting re-queues it (and its upstream slices)
            Set-AzureRmDataFactorySliceStatus -UpdateType UpstreamInPipeline -Status Waiting -DataFactoryName $failed.DataFactoryName -DatasetName $failed.DatasetName -ResourceGroupName $resourceGroupName -StartDateTime $failed.Start -EndDateTime $failed.End
            $failed.DatasetName
        }
    }
}
else
{
    Write-Host "There are no failed slices in the given time period."
}

#End Script
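To run it, open a PowerShell prompt in the folder where you saved the script; it prompts for Azure sign-in and asks for confirmation before re-running anything. (You may first need to allow local scripts with Set-ExecutionPolicy RemoteSigned.)

.\file.ps1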
