Azure Data Factory Data Flows is a cool feature that brings SSIS-like data flow transformations to ADF and harnesses the power of Apache Spark. It provides a familiar interface for setting a Source and a Target and for adding transformations such as Joins, Derived Columns and Filters, with the scalability to process billions of rows on Spark clusters.
If you have worked with mapping data flows, the type of data flow that implements SSIS-like transformations, you will already know that each data flow requires its own Spark cluster to run. By default, once a data flow finishes executing its job, it is deallocated from the cluster, and the next data flow has to be allocated to an available cluster (or a new cluster has to be spun up if none is available). This is the warm-up time that you see at the beginning of each data flow execution.
This warm-up time can vary from 5 to sometimes 7 minutes. If you have sequentially connected data flows in your ADF pipeline, the warm-up before every data flow adds to the overall execution time of the pipeline. (See the screenshot below.)
One way to overcome this is to connect the data flows in parallel. Each parallel data flow is allocated to its own cluster, so the clusters all warm up at the same time and the pipeline only pays for that initial warm-up.
However, if that is not an option, the solution is to set the "Time To Live" setting on the Azure Integration Runtime that you assign to the data flows in the ADF pipeline. This makes sure the cluster stays alive after a data flow finishes executing, so the next data flow can be allocated to the same cluster. This brings the cluster warm-up time down to less than 3 minutes.
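To get a feel for what this means for a whole pipeline, here is a rough back-of-the-envelope sketch in Python. It is only a simplified model: the function names and the run times of 4, 6 and 3 minutes are made up for illustration, and the warm-up values are the approximate figures quoted above (5 minutes cold, 3 minutes as an upper bound when the cluster is kept alive).

```python
def sequential_minutes(run_minutes, cold_warmup, warm_warmup=None):
    """Estimate total minutes for data flows that run one after another.

    The first data flow always pays the cold warm-up. Later ones pay
    warm_warmup if a Time To Live keeps the cluster alive, otherwise
    they pay the cold warm-up again.
    """
    if not run_minutes:
        return 0.0
    if warm_warmup is None:
        warm_warmup = cold_warmup  # no TTL: every data flow starts cold
    warmups = cold_warmup + warm_warmup * (len(run_minutes) - 1)
    return float(warmups + sum(run_minutes))


def parallel_minutes(run_minutes, cold_warmup):
    """Estimate total minutes when all data flows start at the same time."""
    if not run_minutes:
        return 0.0
    # Every cluster warms up at once, so the longest data flow decides.
    return float(cold_warmup + max(run_minutes))


# Hypothetical run times (minutes) for three data flows.
runs = [4, 6, 3]

without_ttl = sequential_minutes(runs, cold_warmup=5)
with_ttl = sequential_minutes(runs, cold_warmup=5, warm_warmup=3)
in_parallel = parallel_minutes(runs, cold_warmup=5)

print(without_ttl, with_ttl, in_parallel)  # 28.0 24.0 11.0
```

The more data flows you chain together, the more the saving adds up, because every data flow after the first one gets the shorter warm-up.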
Something to note is that this setting cannot be modified for the default AutoResolveIntegrationRuntime. Therefore, you will need to create a new Azure Integration Runtime in the correct region and select this setting (see screenshot). Once created, select this integration runtime for each data flow in the ADF pipeline.
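If you prefer to look at this as JSON rather than through the UI, the sketch below builds what such an integration runtime definition and a data flow activity that references it might look like. Treat it as an illustration, not a definitive template: the names `DataFlowIRWithTTL`, `RunMyDataFlow` and `MyDataFlow`, the region, the core count and the 10 minute Time To Live are all assumptions I picked for the example, and you should compare the property names against the JSON that ADF generates for your own factory.

```python
import json

# Hypothetical name for the new Azure Integration Runtime.
IR_NAME = "DataFlowIRWithTTL"

# Sketch of an Azure IR definition with a Time To Live for data flows.
integration_runtime = {
    "name": IR_NAME,
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "West Europe",  # assumed region, use your own
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,
                    "timeToLive": 10,  # minutes the cluster stays alive
                },
            }
        },
    },
}

# Sketch of a data flow activity that runs on that integration runtime.
data_flow_activity = {
    "name": "RunMyDataFlow",  # hypothetical activity name
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataflow": {
            "referenceName": "MyDataFlow",  # hypothetical data flow name
            "type": "DataFlowReference",
        },
        "integrationRuntime": {
            "referenceName": IR_NAME,
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(integration_runtime, indent=2))
print(json.dumps(data_flow_activity, indent=2))
```

The important part is that every data flow activity in the pipeline points at the same integration runtime, otherwise the next data flow cannot reuse the cluster that is being kept alive.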
Hope this helps!