Archive for April, 2013

HDInsight Whitepapers

April 15, 2013 1 comment

Informative whitepapers covering the operation of HDInsight, including:

1) Compression in Hadoop

When using Hadoop, there are many challenges in dealing with large data sets. The goal of this document is to provide compression techniques that you can use to optimize your Hadoop jobs, and reduce bottlenecks associated with moving and processing large data sets.

In this paper, we will describe the problem of data volumes in different phases of a Hadoop job, and explain how we have used compression to mitigate these problems. We review the compression tools and techniques that are available, and report on tests of each tool. We describe how to enable compression and decompression using both command-line arguments and configuration files.

To review the document, please download the Compression in Hadoop Word document.
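As a rough illustration of the command-line approach the abstract mentions, here is a minimal sketch that builds `-D` options for enabling compression. It assumes the Hadoop 1.x property names (`mapred.compress.map.output`, `mapred.output.compress`, and their codec counterparts); newer Hadoop releases rename these. The helper `compression_args` is hypothetical and is not taken from the whitepaper.

```python
# Hypothetical helper (not from the whitepaper): builds "-D" generic options
# that turn on compression for a Hadoop 1.x MapReduce job.
# Assumption: Hadoop 1.x property names; Hadoop 2+ uses mapreduce.* names.

GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def compression_args(codec=GZIP_CODEC,
                     compress_map_output=True,
                     compress_job_output=True):
    """Return a flat list of command-line arguments such as
    ["-D", "mapred.compress.map.output=true", ...]."""
    props = {}
    if compress_map_output:
        # Compress intermediate data passed from mappers to reducers.
        props["mapred.compress.map.output"] = "true"
        props["mapred.map.output.compression.codec"] = codec
    if compress_job_output:
        # Compress the final output files written by the job.
        props["mapred.output.compress"] = "true"
        props["mapred.output.compression.codec"] = codec

    args = []
    for name, value in props.items():
        args += ["-D", f"{name}={value}"]
    return args


if __name__ == "__main__":
    # Print the options so they can be pasted into a "hadoop jar ..." command.
    print(" ".join(compression_args()))
```

The same properties can be set in the job configuration file instead. Note that `-D` options are only picked up by jobs that parse Hadoop's generic options (for example, jobs launched through `ToolRunner`).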

2) Hadoop Performance in Hyper-V

Compelling use cases from industry leaders are quickly changing Hadoop from an emerging technology to an industry standard. However, Hadoop requires considerable resources, and in the search for computing power, users are increasingly asking if it is possible to virtualize Hadoop (that is, create clusters on a virtual machine farm) to build a private cloud infrastructure.

This paper presents the results of internal benchmarks by Microsoft IT, in which the performance of a private cloud using virtual machines was compared to the same jobs running on servers dedicated to Hadoop. The goal was to determine whether Hadoop clusters hosted in Microsoft Hyper-V can be as efficient as physical clusters.

The results indicate that the performance impact of virtualization is small, and that Hadoop on Microsoft Hyper-V offers compelling performance as well as other benefits.

To review the document, please download the Performance of Hadoop on Windows in Hyper-V Environments Word document.

3) Job Optimization in Hadoop

The Map/Reduce paradigm has greatly simplified development of large-scale data processing tasks. However, when processing data at the terabyte or petabyte scale in Hadoop, jobs might run for hours or even days. Therefore, understanding how to analyze, fix, and fine-tune the performance of Map/Reduce jobs is an extremely important skill for Hadoop developers.

This paper describes the principal bottlenecks that occur in Hadoop jobs, and presents a selection of techniques for resolving each issue and mitigating performance problems on different workloads. The paper explains the interaction of disk I/O, CPU, RAM and other resources, and demonstrates with examples why efforts to tune performance should adopt a balanced approach.

It includes the results of extensive performance-tuning experiments, which showed significant differences in the speed of the same Map/Reduce job before and after tuning.

To review the document, please download the Hadoop Job Optimization Word document.

4) Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS)

With the explosion of data, the open source Apache™ Hadoop™ Framework is gaining traction thanks to its huge ecosystem that has arisen around the core functionalities of Hadoop distributed file system (HDFS™) and Hadoop Map Reduce. As of today, being able to have SQL Server working with Hadoop™ becomes increasingly important because the two are indeed complementary. For instance, while petabytes of data can be stored unstructured in Hadoop and take hours to be queried, terabytes of data can be stored in a structured way in the SQL Server platform and queried in seconds. This leads to the need to transfer data between Hadoop and SQL Server.

This white paper explores how SQL Server Integration Services (SSIS), i.e. the SQL Server Extract, Transform and Load (ETL) tool, can be used to automate Hadoop and non-Hadoop job executions, and manage data transfers between Hadoop and other sources and destinations.

To review the document, please download the Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS) Word document.

Categories: Uncategorized

SSAS Crashing Intermittently: Caused by Monitoring / AV Scans

April 8, 2013 1 comment

Problem Description:

Analysis Services is crashing intermittently and also producing mini-dumps.


The cause described here is only one of several possibilities. For a full analysis of the mini-dumps, you can engage Microsoft Customer Support Services and ask them to analyze the dumps.

In this case we found:

The SMS client (CcmExec) periodically runs a software inventory, and in doing so it scans the data folder of Analysis Services. When a processing job needs to commit, SSAS has to delete the older version of the data files. If the SMS client holds a file handle on some of the database files at that moment, SSAS is unable to delete the older version of the database, so the commit fails and SSAS crashes.

As you can see in this Process Monitor trace, CcmExec is browsing through the SQL Server folders:



We resolved the issue by adding an exclusion in the SMS client so that it does not browse or scan the SSAS folders:
– Data
– Temp
– Config
– Log


Exclude the Analysis Services folders from virus scans, file monitoring tools, the Systems Management Server client (CcmExec.exe), and any other third-party file monitoring or file backup tool.
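To check whether a scanner is still touching these folders after adding the exclusions, one option is to export a Process Monitor trace to CSV and filter it. The following is a minimal sketch, not a supported tool. It assumes the default Process Monitor CSV columns ("Process Name", "Operation", "Path") and a default SSAS 2012 install path; both are assumptions, so adjust `SSAS_FOLDERS` to your instance. The function name `scanner_hits` is hypothetical.

```python
import csv
import io
from pathlib import PureWindowsPath

# Assumption: default-instance SSAS 2012 folders; replace with your own paths.
SSAS_ROOT = r"C:\Program Files\Microsoft SQL Server\MSAS11.MSSQLSERVER\OLAP"
SSAS_FOLDERS = [SSAS_ROOT + "\\" + name
                for name in ("Data", "Temp", "Config", "Log")]


def scanner_hits(csv_text, process_name="CcmExec.exe", folders=SSAS_FOLDERS):
    """Return (operation, path) pairs for events where the given process
    touched a file inside one of the SSAS folders.

    csv_text is the content of a Process Monitor CSV export (assumed to
    contain the "Process Name", "Operation" and "Path" columns).
    """
    roots = [PureWindowsPath(folder) for folder in folders]
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Process Name"].lower() != process_name.lower():
            continue
        path = PureWindowsPath(row["Path"])
        # PureWindowsPath compares case-insensitively, like Windows itself.
        if any(root == path or root in path.parents for root in roots):
            hits.append((row["Operation"], row["Path"]))
    return hits


if __name__ == "__main__":
    # Hypothetical file name; "utf-8-sig" drops a byte-order mark if present.
    with open("procmon_export.csv", encoding="utf-8-sig", newline="") as f:
        for operation, path in scanner_hits(f.read()):
            print(operation, path)
```

An empty result for a trace captured during processing suggests that the exclusions are being honored; any hits show which files the scanner is still opening.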

For the SQL Server database engine, follow the recommendations given in this link –

Categories: Uncategorized