Wednesday, July 27, 2011

SSIS: Issue when waiting for 2 tasks completion

Here is a case in SSIS I got. We have a package that performs a first task, then do 2 parallel actions (in screenshot there are data flows, but it doesn’t matter), and then a 3rd step that must wait for 2 parallel actions ending.
The critical fact is that one of 2 parallel actions is done only if condition is true.
Apparently it must be built like that:
The issue: In case the condition is not true, the package will never continue to “Next task”!!
In fact it’s very logic: “Next task” is waiting for 2 DFs to be done, and the first one doesn’t run because condition is false. So it will never send “completed” answer!

The solution is simple:
A new conditioned path was added with the contrary of first DF condition. It’s connected to dummy task (drug a script task without editing it at all).
Data flow is also connected to dummy task with OR logical connection between 2 sources. Then this task and DF2 are connected with AND connection.
This solution is equal to this logical expression:
([Condition] AND [Data Flow 1])
(NOT [Condition])
[Data Flow 2]

BIDS: “Insufficient Memory” error on running SSIS

On one of our servers, I got this error in Visual Studio BIDS 2005 when trying to run a SSIS package I built:
“Insufficient Memory to continue the execution of the program” in a message box.
It was solved by installing Visual Studio 2005 Service Pack 1.

Monday, July 25, 2011

MDX: Rewrite query with CurrentMember for multiple selection

One of famous MDX issues is that you cannot use “CurrentMember” keyword when you have multiple selection on an attribute. Read Mosha’s article to understand the full picture.
In a few words, the solution is based on using EXISTING and EXISTS keywords.
EXISTING filters  a set of members to selected members only, it’s in fact the “currentmembers” missing keyword.
EXISTS is a function that returns members of first set that are included in the second set.
I’d like to show 2 concrete samples how to rewrite your query. You can run them on Adventure Works DW OLAP DB. Anyway it's important to notice that not every query/measure can be rewritten in this way, and sometimes the best way is to change the approach to the solution.

1)      Count days in selection
We want to count how many days are included in selected period. If 2 months are in selection, we expect to get the sum of each month values.
The “Single” version takes all descendants of current member in Date (day) level and counts them.
The “Multiple” version calls to an internal measure, similar as single version, for everyone of all selected members.

MEMBER Measures.[CountDays_Single] AS

MEMBER Measures.[CountDays_Multiple] AS
sum(existing [Date].[Calendar].[Month].MEMBERS,Measures.[CountDaysM])
MEMBER Measures.[CountDaysM] AS

SELECT {Measures.[CountDays_Single],Measures.[CountDays_Multiple]} ON COLUMNS
FROM [Adventure Works]
where ({[Date].[Calendar].[Month].&[2003]&[10],[Date].[Calendar].[Month].&[2003]&[11]})

2)      Check member is selected
In this case we want to check if a specific member is selected. In multiple version it means that we want to check if it’s one of selected members.
The solution is to check if the specific member EXISTS in the set of EXISTING members.

MEMBER Measures.[RightMemberIsSelected_Single] AS
iif( [Account].[Account].currentmember is [Account].[Account].&[48],'Yes','No')

MEMBER Measures.[RightMemberIsSelected_Multiple] AS
iif( COUNT(EXISTS ([Account].[Account].&[48],{EXISTING [Account].[Account].[All Accounts].children}))>0,'Yes','No')

SELECT {Measures.[RightMemberIsSelected_Single],Measures.[RightMemberIsSelected_Multiple]} ON COLUMNS
FROM [Adventure Works]
WHERE ({[Account].[Account].&[48],[Account].[Account].&[2]})

SSIS: Performance in loading many Excel Files to SSIS

On my last project, I tested different methods in SSIS 2008 package to load 9 small Excel files to the same target table (about 10 rows in each file).
Each file is loaded in a  simple dataflow. My objective was to find the which method will run with best performance.
I tested:
·         Parallel running of all 9 dataflows (full parallel).
·         Serial connection between all dataflows (no parallel).
·         Make each dataflow as separate package, master package runs all in parallel.
·         Organize dataflows as 2 or 3 series (half parallel).
Well, here are the results:
Runtimes of ALL of above mentioned method were the same!! All run times were in the same small range, between 2:07-2:13 min.
Every dataflow separately took about 15s, in 2 series it took 25-40s for everyone, etc.. In full parallel design, first dataflow finished after 1:30 min only..
The conclusion is clear: Excel Provider works as one-instance process, that doesn’t use more resources to run faster.
Nice to know..