I have an Excel file that I need to process three times in integration services, once for projects, once for persons and once for time tracking data.
At each step I have the excel source and I do need to do some data clean up and type conversions (same in all three steps).
Is there an easy way of creating a step that does all this and that allows me to use the output as input to the other "real" steps?
I am starting to think about importing it into SQL server in a temp table, which is by all means ok, but it would be nice if I could skip that step.
-
You could use three separate data flow tasks with a file operation task first. The File Operation would be to copy the original Excel file to a temporary area. Each of the three Data Flow tasks would start with the temp file and write to the temp file (I think they may need to write to a copy).
An issue with this is that this makes the data flows operate sequentially. This might not be an issue for your Excel file, but would be an issue for processing larger numbers of rows. In such a case, it would be better to process the three "steps" in parallel, and join the results at the final stage.
Fredrik Jansson : Thanks for the answer!! -
Hi Fredrik,
This can actually be achieved using a single data flow.
You can read the Excel data source once and then use Multicast Transformation to create copies of the data set in memory. You can then process each of your three data flow branches accordingly and can also make use of parallel processing!
See the following reference for details:
http://msdn.microsoft.com/en-us/library/ms137701(SQL.90).aspx
I hope what I have detailed is clear and understandable but please feel free to contact me directly if you require further guidance.
Cheers, John
[Added in response to comments]
With regard to your further question, you can specify the precedence/flow control of your package using more than one flow. So for example, you could use the multicast task to create three data flows however and then subsequently define your precedence flow control so that all transformation tasks in flow 1 must be completed before the transformations in flow two can begin.
Fredrik Jansson : This seems to be what I am looking for, but beeing a SSIS newbie, it got me into a followup question. If I put the flows into one Data Flow, how can I make sure that the person- and project-load is executed before the TT-data load? Or maybe with a smarter design I don't have to...Fredrik Jansson : Forgot in my first comment; thanks for the answer!!Fredrik Jansson : Sorry for being slow here... Can I set precedence in a data flow? I will have one source=>filter=>multicast=>"three flows". But if I put source=>filter=>multicast into it's own data flow task, how can I get that output flow into the subsequent data flow tasks?
0 comments:
Post a Comment