Archive

Posts Tagged ‘HDInsight’

HDInsight: How to find Long Running Queries

January 25, 2017 Leave a comment

Coming from SQL background where you have dependency on SQL Profiler Trace for understanding what’s going on SQL Server like who is running non optimized long query, where its scanning through all partitions and yes who is using non indexed columns for filter right? So coming to HDInsight world wondering do we have something like SQL Profiler for tracking bad jobs – Yes we do, as a part of Apache Ambari Project – (as per wiki) Whole objective of Ambari is to make Hadoop management easy by providing a mechanism of managing or monitoring Hadoop  clusters with it’s easy to use  UI, in this Post I will show how to use Ambari for getting long running or resource intensive jobs.

Objective: How to find long running or resource intensive Hive query?
Steps:-

clip_image001[4]

 

  • Click Yarn > Quick Links > Active Head Node >>Resource Manager UI

clip_image002[5]

Or if you don’t want to go through the whole hassle just type https://<HDInsightClusterName>.azurehdinsight.net/yarnui/hn/cluster, in popup provide security information admin and its password – and it would open Resource Manager – it provides you all information about jobs (running/pending/finished/etc.) on HDI cluster

  • In following screenshot – you can notice multiple options for checking Jobs like in this case I have this specific job which is consuming:-
    • A – Memory – 2.2 TB
    • B – % of Queue – 64%
    • C – % of Cluster – 60%

image

  • In this case filtered job may be problem making query which is causing other jobs to wait in queue as its consuming 60% of resource and 2.2 TB of total 3.75TB allocated for this cluster. I can further look into job by clicking into job id (Job ID > Logs and Pull query from there dag) and check how’s query look like and what’s its doing under the hood – is it scanning TBs of partitions or filtering on non-partitioned fields or what.

clip_image004

 

  • Like SQL You can Kill this job from Resource Manager it self – Click on Job and on top you will find a button for killing same query, like this job is running for last 7 hours, you can Kill it by clicking Kill Application

image

 

Hope this will come handy.

Feel free to add comments, may be in next post I can take a deep dive into each log and action.

Happy Learning.

(Published on my  blogs https://karanspeaks.com/ & https://blogs.msdn.microsoft.com/)

HDInsight on Windows: Building PowerPivot Report from Hive in a Few Easy Steps

January 4, 2013 2 comments

What are we Covering:

  • Hive
    • Create table
    • Load Data
    • Query Data
  • Configure ODBC
  • Create PowerPivot
    • With DataSource as HDInsight
    • Import Data from HDInsight 
  • Create Pivot Chat using imported data

 

(1)Open Hadoop Command Line

After installing HDInsight for Windows  you will get Hadoop Command Line Shortcut on desktop

Click on Hadoop Command Line Shortcut and it will open Cmd Line for Hadoop

clip_image002

(2)Type Hive and it will change to Hive Command

clip_image004

(3)Create Tables

CREATE TABLE emp(id int,name string)ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’;

CREATE TABLE sales(id int,sales int)ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’;

clip_image005

(4)Load Data using CSV Files

LOAD DATA LOCAL INPATH ‘c:/data/emp.csv’ OVERWRITE INTO TABLE emp;

clip_image006

LOAD DATA LOCAL INPATH ‘c:/data/sales.csv’ OVERWRITE INTO TABLE sales;

clip_image007

(5) Open HDInsight Dashboard (You will find shortcut on desktop) and click on Download

clip_image010

(6)Click on the appropriate link for your Excel.

clip_image013

(7)Configure ODBC

For configuring ODBC, Click start and type in ODBC. This will bring up the Data Sources dialog. Go to the System DSN tab and click Add. Configure it with a Data Source Name, Your Host Information, and a Username. In Single Node HDInsight Installation you don’t need to provide Username but if you are using HDInsight on Azure you can provide Username .

clip_image014

clip_image016

(8) Provider Data Source Name and Default Port is 10000

clip_image018

(9)Open Excel Sheet, select PowerPivot in ribbon and click on Manage

clip_image021

(10)Click on Get External Data

clip_image025

(11)Select – Get External Data and Data Import window and Select Others(OLEDB/ODBC)

clip_image027

(12)Click Build

clip_image029

(13)Select OLE DB Provider for ODBC

clip_image030

(14)Select DSN which we have created earlier in step number 8, Select Database (In Single Node HDInsight installation you don’t need to provider User Name and Password in current release but in case of HDInsight on Azure you need to provide User Name and Password) and follow the wizard for selecting tables in our case its EMP and Sales

clip_image031

clip_image034

clip_image035

clip_image036

(15)If you Notice Relationship between Emp and Sales is absent so with one click we can create relationship

clip_image038

clip_image040

(16)Added feature like Calculated Measure name Sales and created a KPI

clip_image041

(17)Select Pivot of your Choice from PivotTable

clip_image042

(18) Finally – you are good to go with PowerPivot Report based on Hive

clip_image044

Refer

HDInsight – What is it?

Configuring: PowerPivot for Excel 2013

HDInsight – What is it?

January 4, 2013 4 comments

HDInsight is Microsoft’s Hadoop-based distribution.

HDInsight comes in two flavors:

HDInsight Server for local on-premise installation of the Hadoop distribution, this will allow to build local cluster with your own Hadoop Hive able to run Hadoop Jobs, on clip_image001top of that fully integration with Microsoft BI Stack

HDInsight on Azure Service is the easiest way to deploy, manage and scale Hadoop based solution. Current release includes:

    • HDFS and Map/Reduce
    • Pig
    • Hive
    • Sqoop

HDInsight:

Categories: HDInsight Tags: , ,