Friday, February 05, 2021

ADF Data flows - Minimize Cluster Warm-up Times

Azure Data Factory (ADF) Data Flows is a cool feature that implements SSIS-like data flow transformations within ADF, harnessing the power of Apache Spark. It provides a familiar interface to set up a source, a sink, and other transformations such as Joins, Derived Columns, and Filters, but with the scalability of Spark clusters, letting you process billions of rows.

If you have worked with mapping data flows, the type of data flows that implements SSIS-like transformations, you should already know that each data flow requires its own Spark cluster to run. By default, once a data flow finishes executing its job, it is deallocated from the cluster, and the next data flow needs to be allocated to an available cluster (or a new cluster has to be spun up). This is the warm-up time that you see at the beginning of each data flow execution.

This warm-up time can vary from 5 to sometimes 7 minutes. If you have sequentially connected data flows in your ADF pipeline, these warm-up times in between every data flow can add up and affect the overall execution time of the pipeline (see the screenshot below).



One way to overcome this is to connect the data flows in parallel, so that the pipeline only incurs the initial warm-up time once, as all the parallel data flows are allocated to their own clusters.

However, if that is not an option, the solution is to set the "Time to live" (TTL) setting on the Azure Integration Runtime that you assign to the data flows in the ADF pipeline. This keeps the cluster alive after a data flow finishes executing, so the next data flow can be allocated to the same warm cluster. In my experience, this brings the cluster warm-up time down to less than 3 minutes.

Something to note is that this setting cannot be modified on the default AutoResolveIntegrationRuntime. Therefore, you will need to create a new Azure Integration Runtime in the correct region and set the TTL there (see screenshot). Once it is created, select this integration runtime for each data flow in the ADF pipeline.
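If you define the integration runtime through ARM templates or the REST API rather than the portal, the TTL lives under the data flow compute properties. Below is a rough sketch of what the JSON might look like; the runtime name, region, core count, and TTL value (in minutes) are all placeholders you would adjust for your environment:

```json
{
  "name": "DataFlowIR-WithTTL",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "East US",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 8,
          "timeToLive": 10
        }
      }
    }
  }
}
```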

Hope this helps!


 


Thursday, February 01, 2018

Power BI and Slowly Changing Dimension Type 2, selecting the right member based on the date

A recent BI project had a Slowly Changing Dimension (SCD) type 2 for its customers and used Power BI as its visualization tool. The client wanted to get counts that were accurate for the dates they selected. After trying out various methods, I finally used the following approach.

I've created a sample dimension table to show how it's done. Below is how it looks:



The SQL query that selects the correct records for a given date looks like this:
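As a rough sketch, assuming the dimension tracks validity with StartDate and EndDate columns (NULL EndDate marking the current member; the table and column names here are assumptions), the point-in-time filter could look like:

```sql
-- Hypothetical dimCustomer SCD2 table:
-- CustomerKey, CustomerNumber, CustomerName, StartDate, EndDate (NULL = current)
DECLARE @AsOfDate date = '2018-01-15';

SELECT CustomerKey, CustomerNumber, CustomerName, StartDate, EndDate
FROM dbo.dimCustomer
WHERE StartDate <= @AsOfDate
  AND (EndDate IS NULL OR EndDate > @AsOfDate);
```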



The client wanted to select a date range, and Power BI should show the distinct count of members that are current as of the end date selected. Below are the steps I used to achieve this.

Note: There is a separate date dimension called dimDate that's used in the date filters.

Step 1: Create a measure that will get the max date selected


I have used two functions, MAXX and ALLSELECTED. ALLSELECTED disregards any filter context set by the dimCustomer dimension.
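A sketch of what this measure could look like, using the MAXX and ALLSELECTED functions mentioned above (the measure name and the dimDate[Date] column name are assumptions):

```dax
Max Selected Date =
MAXX (
    ALLSELECTED ( dimDate ),    -- disregard filter context coming from dimCustomer
    dimDate[Date]
)
```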

Step 2: Create another measure for the counts.

I will be using the CustomerNumber column to count the customers.

I'm using a variable here to avoid context transition, so that the max date value stays unchanged for all the customer records.
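A sketch of such a measure, with the max-date value captured in a variable as described (the measure name and the StartDate/EndDate columns are assumptions about the dimension's schema):

```dax
Current Customer Count =
VAR MaxDate = [Max Selected Date]    -- evaluated once; unaffected by context transition
RETURN
    CALCULATE (
        DISTINCTCOUNT ( dimCustomer[CustomerNumber] ),
        FILTER (
            dimCustomer,
            dimCustomer[StartDate] <= MaxDate
                && ( ISBLANK ( dimCustomer[EndDate] ) || dimCustomer[EndDate] > MaxDate )
        )
    )
```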

Step 3: Using the Date slicer and the new measure in a table


To look at the detailed records, I included the CustomerNumber column as well.

Hope this is helpful. Please let me know what you think and if there are better ways of doing this.

Cheers!

Saturday, January 27, 2018

Installing prerequisites for Master Data Services (MDS) failing with error code 0x800F081F

Recently I wanted to try out SQL Server 2017 Master Data Services (MDS) and started to install the prerequisites. MDS requires the IIS feature/role to be added before configuring the MDS databases and the web application. I was using Windows 10, and when I selected all the required components from the add features dialog and started the installation using the Windows Update option (it now defaults to downloading the required files from Windows Update), it failed with error code 0x800F081F, even though I had an active internet connection.

After searching for some time, I found a workaround: use the Windows media disk or an ISO file as the source, instead of connecting to the internet and downloading the required files.

Step 1: Create a Windows media ISO using the Windows Media Creation tool

Download the tool for Windows 10 from here:
https://www.microsoft.com/en-ca/software-download/windows10

Run the exe file and follow the instructions to create an ISO file of the currently installed Windows version.

Step 2: Mount the media ISO
Once the ISO is created, mount it.

Step 3: Change group policy to add an alternate source file path

Follow the instructions in the following article, under the section "Set Group Policy", to specify an alternate source file path that points to the mounted ISO.

https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/configure-a-windows-repair-source

The path is usually:

<drive letter of mounted ISO>:\sources\sxs

Step 4: Update the group policy
Run the following command to apply changes to the group policy,

gpupdate /force

Note: If the computer is in a domain, make sure you have access to the domain controller, usually through a VPN connection.

Step 5: Try adding the features again
This time, hopefully, you are not prompted to use Windows Update; instead, the installer locates the files in the location you specified and installs the features.

Once the features are successfully installed, don't forget to reset the policy settings.

Cheers!

Wednesday, March 12, 2014

SSDT 2012 Database Project Limitation with Change Tracking - using CHANGETABLE in inline table valued function

I recently worked on a SQL Server Data Tools (SSDT) 2012 database project that had Change Tracking enabled. There were a lot of stored procedures and functions written using the CHANGETABLE() function to retrieve changes. There were also a few inline table-valued functions written to select records from CHANGETABLE(), looking something like this:

CREATE FUNCTION [dbo].[fn_MsBuildTest] ()
RETURNS TABLE
AS RETURN
(
    SELECT ID
    FROM CHANGETABLE(CHANGES dbo.TBL1_DTL, 1) AS CT
);

It basically selects the IDs of changed rows in the dbo.TBL1_DTL table via the CHANGETABLE() function.

When I built the project within Visual Studio, it built without any errors or warnings. However, the issue came up when the project was built using TFS Build. The build failed with the following error:

"F:\Documents\Personal\Projs\DB\RecoverTest\RecoverTest\RecoverTest.sqlproj" (rebuild target) (1) ->
(SqlBuild target) ->
  F:\Documents\Personal\Projs\DB\RecoverTest\RecoverTest\dbo\Functions\fn_MsBuildTest.sql(5,9,5,9): Build error SQL71501: Computed Column: [dbo].[fn_MsBuildTest].[ID] has an unresolved reference to object [ID]. [F:\Documents\Personal\Projs\DB\RecoverTest\RecoverTest\RecoverTest.sqlproj]

It cannot identify the ID column and throws an unresolved reference error.

After some digging, we found out that the issue is with MSBuild. When the project was built locally with the MSBuild utility, on the same computer where I had successfully built it using Visual Studio 2012, the same error was thrown. So it turns out that MSBuild has a limitation building inline table-valued functions that use the CHANGETABLE() function in the select statement.

To verify this, I changed the function to a multi-statement table-valued function, and MSBuild built it successfully. Below is the function after the change:

CREATE FUNCTION [dbo].[fn_MsBuildTest] ()
RETURNS @changes TABLE (ID bigint)
AS
BEGIN
    INSERT @changes
    SELECT ID
    FROM CHANGETABLE(CHANGES dbo.TBL1_DTL, 1) AS CT;

    RETURN;
END

Based on this experience, it appears that CHANGETABLE() can be used in a multi-statement table-valued function, but not inside an inline table-valued function, when the project is built with MSBuild.


Thursday, February 06, 2014

Resolving an SSRS Windows authentication issue: the user authentication dialog pops up continuously and the user never gets authenticated

Recently one of my colleagues ran into an issue in a test environment with a scaled-out setup of SSRS. He had dedicated virtual servers for the database, SSRS, and SharePoint (SSRS was installed in SharePoint integrated mode). After the installation, he could connect to the Report Server using the default URL from the server where SSRS was installed. However, this was not possible from any of the other servers (database and SharePoint): when accessing the URL, the user authentication dialog kept popping up and the user never got authenticated, even though the user name and password were confirmed to be correct.

After hours of struggling, it turned out that the issue was with the authentication settings of SSRS in the Reporting Services configuration file (rsreportserver.config). The link below can be used as a guide.

http://technet.microsoft.com/en-us/library/cc281253.aspx
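For reference, the authentication types live in the Authentication section of rsreportserver.config. As a hedged example only (the right settings depend on your environment): when Kerberos is not configured, RSWindowsNegotiate can cause exactly this kind of repeated credential prompt, and restricting the list to NTLM is a common fix:

```xml
<Authentication>
  <AuthenticationTypes>
    <!-- Remove RSWindowsNegotiate if Kerberos is not configured,
         leaving NTLM as the Windows authentication type -->
    <RSWindowsNTLM/>
  </AuthenticationTypes>
  <EnableAuthPersistence>true</EnableAuthPersistence>
</Authentication>
```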

Sunday, February 02, 2014

SQL Server Data Tools - Resolving the self-references issue (referencing local database objects with three-part names)

I was allocated to streamline the database solution management of one of our projects, which consisted of several databases. The primary database was a Navision CRM database, and there was a data warehouse project consisting of several staging databases.
In some of the databases, there were objects such as views that referenced tables from the same database using three-part names:

[Current Database Name].[schema].[Local Table Name]

As the first step of the solution management, I had to create SSDT database projects by importing the schema of these databases. However, I came across an issue: you are required to resolve all references to objects in different databases. Since three-part names were used for local tables, SSDT treats them as external objects and requires these references to be resolved. A few articles explained that the only fix is to remove the database name from the table names, which is indeed the preferred option, but I came across the following workaround to resolve these self-references instead.

In general, to resolve external database references in SSDT database projects, you add a database reference to the project and change the code to use a SQLCMD variable in place of the database name in the three-part name. SSDT provides a set of pre-defined SQLCMD variables, such as $(DatabaseName), which resolves to the current database name.

My workaround was to replace the database names in the self-references with $(DatabaseName), so that they look like this:

[$(DatabaseName)].[schema].[Table Name]
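As an illustration (the view and table names here are hypothetical), a view would change like this:

```sql
-- Before: the three-part name makes SSDT treat the local table as external
CREATE VIEW [dbo].[vw_Example]
AS
SELECT c.CustomerID
FROM [MyDatabase].[dbo].[Customer] AS c;
GO

-- After: the SQLCMD variable resolves to the current database name at build time
CREATE VIEW [dbo].[vw_Example]
AS
SELECT c.CustomerID
FROM [$(DatabaseName)].[dbo].[Customer] AS c;
GO
```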

This resolved the self-referencing issue. However, it is always best practice to avoid three-part names for local objects!

Monday, April 22, 2013

Appending to Existing Backup Set in SQL Server

When you are taking a backup of a SQL Server database, you can append the backup to an existing backup set; this is the default option. To do this, select an existing backup file (if you are using a disk backup) as the destination. Appending to an existing backup set is also possible with tape backups. Once you select the same backup file and choose to append to the existing backup set, the new backup is appended to the end of the previous backup.

When it comes to restoring, if you select a backup file, SQL Server selects the last backup set in the backup media by default; you can see the name of the most recent backup set in the "Backup sets to restore" list. But how can we restore a different backup set? Click the "Timeline" button in the "Destination" section, right above the backup set list. This shows a timeline of the backup sets available in the selected media and lets you choose which one to restore: simply click on a backup set, or enter the date/time of the backup, and the corresponding backup set will be selected from the timeline. Once a backup set is selected, SQL Server restores it instead of the latest one.
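The same append-and-restore flow can be done in T-SQL. In this sketch, the database name, file path, and backup set names are placeholders; NOINIT appends to the media set (the default), and the FILE option picks a specific backup set by its position:

```sql
-- Append a new backup set to an existing media set (NOINIT = append)
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks.bak'
WITH NOINIT, NAME = N'AdventureWorks-Full-2';

-- List the backup sets in the file; the Position column gives the FILE number
RESTORE HEADERONLY FROM DISK = N'D:\Backups\AdventureWorks.bak';

-- Restore an earlier backup set instead of the latest one
RESTORE DATABASE AdventureWorks
FROM DISK = N'D:\Backups\AdventureWorks.bak'
WITH FILE = 1, REPLACE;
```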