Sunday, September 04, 2011

Execution orders for Stage Variables, Constraints and Column Derivations in Transformer stage


For people who work with the DataStage ETL tool, using the Transformer stage in a Server or Parallel job is very common; personally, I would say the Transformer stage appears at least once in most jobs. But are you clear about the execution order of stage variables, constraints and column derivations in the Transformer?

Today what I wish to share is about the execution order for stage variables, constraints and column derivations.

In the Transformer stage, the execution order for the three components mentioned above is: stage variables first, then constraints, then column derivations. DataStage executes them from top to bottom, and this is reflected in the layout you see when you double-click the Transformer stage: stage variables are located at the top, followed by constraints and column derivations.

Please always remember the following characteristics of each component:
  • Stage variables - Evaluated once for every input row that is processed/extracted.
  • Constraints - Can be treated as a filter condition that limits the rows/records coming from the input, based on the business rules we define. Stage variables can be used in constraints.
  • Column derivations - Used to derive or modify output values from the input, e.g. concatenating two input columns, setting a column to a constant value, etc.
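The per-row order above can be sketched outside DataStage. The following plain-shell sketch is only an illustration, not DataStage itself: the input columns (name, amount) and the business rule are made up, but the sequence mirrors the Transformer: stage variable first, then the constraint, then the derivation.

```shell
# Hypothetical sketch of Transformer per-row execution order.
# Input columns (name, amount) and the rule "svTotal > 100" are made up.
while IFS=, read -r name amount; do
  # 1. Stage variable: evaluated for every input row
  svTotal=$((amount * 2))
  # 2. Constraint: rows failing the rule never reach the derivations
  if [ "$svTotal" -gt 100 ]; then
    # 3. Column derivation: computed only for rows that pass the constraint
    echo "$name,$svTotal"
  fi
done <<'EOF'
alice,60
bob,40
EOF
```

Running this prints only `alice,120`: bob's row is dropped by the constraint, so its derivation is never computed.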

Saturday, April 02, 2011

Data Warehouse Glossary

I came across a document that contains a comprehensive data warehousing glossary, compiled by the Father of Data Warehousing, Bill Inmon. The link to the Data Warehouse Glossary is here.

Hope the document will be beneficial to everyone, especially those currently working in the Data Warehouse and Business Intelligence space. :) Feel free to share any good data warehousing and business intelligence docs/articles with me.

Sunday, February 27, 2011

Default Null in DataStage

In DataStage, the way to set a column to null in the Transformer stage differs between Server jobs and Parallel jobs.

In a Server job Transformer stage, @NULL is used.
In a Parallel job Transformer stage, SetNull() is used.
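As a sketch, a column derivation that replaces an empty string with a null might look like the following; the input column name InLink.Code is made up for illustration:

```
Parallel job:  If InLink.Code = '' Then SetNull() Else InLink.Code
Server job:    If InLink.Code = '' Then @NULL Else InLink.Code
```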

Monday, July 26, 2010

DataStage job aborted due to ulimits - Need to be able to open at least 16 files; please check your ulimits settings for number of file descriptors

I was running a job on a 64-bit AIX server with DataStage V8.1 installed. The DataStage job failed with the following error message (see the attached screenshot):

Fatal Error: Need to be able to open at least 16 files; please check your ulimits settings for number of file descriptors

Solutions: 

1. Check that the ulimit value set in dsenv is 102400 or higher. If not, change it and restart the DataStage engine with the following commands:
          • cd $DSHOME/
          • . ./dsenv
          • bin/uv admin -stop
          • bin/uv admin -start
2. Log in as root and check /etc/security/limits. Look for the values of nofiles and nofiles_hard in that file. If a value is lower than 102400, change it to 102400 for both entries, adding the entries if they are not already in the limits file.

To resolve the problem, both steps must be applied: the ulimit setting in dsenv and the root-level change in /etc/security/limits.
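Before restarting the engine, it can help to verify the current file-descriptor limit of the shell that sources dsenv. This is a minimal sketch of that check, assuming the 102400 threshold recommended above:

```shell
# Sketch: check the open-files (nofiles) limit against the recommended value.
# 102400 is the threshold from the fix steps above.
required=102400
current=$(ulimit -n)
if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
  echo "nofiles OK: $current"
else
  echo "nofiles too low: $current (need at least $required)"
fi
```

If this reports a value below 102400 even after editing dsenv, the root-level limits in /etc/security/limits are likely still capping it.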

References
http://www-01.ibm.com/support/docview.wss?uid=swg21322302
http://www-01.ibm.com/support/docview.wss?uid=swg21385849