Datastage: Join or Lookup or Merge or CDC

Many times this question pops up in the mind of Datastage developers.

All the above stages can be used to do the same task. Match one set of data (say primary) with another set of data(references) and see the results. DataStage normally uses different execution plans (hmm… i should ignore my Oracle legacy when posting on Datastage). Since DataStage is not so nice as Oracle, to show its Execution plan easily, we need to fill in the gap of Optimiser and analyze our requiremets. Well I have come up with a nice table , 

Most importantly its the Primary/Reference ratio that needs to be considered not the actual counts.

Primary Source Volume

Reference Volume

Preferred Method

Little (< 5 million)

Very Huge ( > 50 million)

Sparse Lookup

Little ( < 5 million)

Little (< 5 million)

Normal Lookup

Huge (> 10 million)

Little (< 5 million)

Little (< 5 million)

Huge ( > 10 million)

Normal Lookup

Huge (> 10 million)

Huge (> 10 million)

Join

Huge (> 10 million)

Huge (> 10 million)

Merge, if you want to handle rejects in reference links.

About these ads

2 Comments

Filed under Websphere DataStage

2 responses to “Datastage: Join or Lookup or Merge or CDC

  1. smarty

    what is the unit you are talking about. I assumpt you mean 5million rows. But should we use the byte size to better illustration the breakdown in selection.

  2. cristty

    very useful!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s