SCD Type 6
Dimensional data that change slowly or unpredictably are captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.
All the update methods for the different SCD types can be accomplished using the SortCL program in IRI CoSort. In IRI Voracity there is a wizard to aid in creating the SCD job scripts used when running the SortCL program. Most variations use a full outer join to match records from the original data source with records in the updated source based on equating a key from each. Records in the update source that do not have a match need to be added to the master.
Here is an overview of how to update a dimensional file or table using SCD Type 6 where I am maintaining product costs. The update is accomplished by joining with respect to the field ProductCode.
Type 6 SCD
Type 6 is a hybrid that is a combination of Type 1, Type 2, and Type 3. It is called Type 6 because 1+2+3=6. Each record will have will have the following fields:
- ProductCode: This is the identifier key field.
- Cost: Current cost of the product.
- HistoricalCost: Cost that became effective on the StartDate for that record.
- StartDate: Date on which the HistoricalCost became effective.
- EndDate: Date on which the HistoricalCost for the record was no longer the current CurrentCost. If the HistoricalCost is still the current Cost, then the EndDate is 99991231.
- Current: Y if Cost is still current, N if it is not
For now let’s just look at the records for ProductCode J245. Let’s start when the earliest HistoricalCost was the current Cost. The HistoricalCost is the same as the Cost. There is then one record for the ProductCode J245 and it has the values shown below:
ProductCode | Cost | HistoricalCost | StartDate | EndDate | Current |
---|---|---|---|---|---|
J245 | 385.25 | 385.25 | 20100215 | 99991231 | Y |
When a new Cost needs to be implemented, the values in the update record are used to add a new current record. The CurrentCost in all records with the Product Code J245 will be changed to the Cost value from the update record while the HistoricalCost stays the same for the existing records. In addition, the EndDate for the old current record is changed to the StartDate for the new current record. In the records with ProductCode J245, the CurrentCost field in the updated Master file now has the below values:
ProductCode | Cost | HistoricalCost | StartDate | EndDate | Current |
---|---|---|---|---|---|
J245 | 425.25 | 425.25 | 20101001 | 99991231 | Y |
J245 | 425.25 | 385.25 | 20100215 | 20101001 | N |
Now we will update with the next new cost and get the following values for the J245 records:
ProductCode | Cost | HistoricalCost | StartDate | EndDate | Current |
---|---|---|---|---|---|
J245 | 450.50 | 450.50 | 20110430 | 99991231 | Y |
J245 | 450.50 | 425.25 | 20101001 | 20110430 | N |
J245 | 450.50 | 385.25 | 20100215 | 20101001 | N |
The final update creates records that have these values:
ProductCode | Cost | HistoricalCost | StartDate | EndDate | Current |
---|---|---|---|---|---|
J245 | 550.50 | 550.50 | 20120701 | 99991231 | Y |
J245 | 550.50 | 450.50 | 20110430 | 20120701 | N |
J245 | 550.50 | 425.25 | 20101001 | 20110430 | N |
J245 | 550.50 | 385.25 | 20100215 | 20101001 | N |
Here are the values in the master file prior to updating:
ProductCode | Cost | HistoricalCost | StartDate | EndDate | Current |
---|---|---|---|---|---|
C123 | 125.50 | 125.50 | 20110228 | 99991231 | Y |
F112 | 2365.00 | 2365.00 | 20120101 | 99991231 | Y |
G101 | 19.25 | 19.25 | 20110930 | 99991231 | Y |
G101 | 19.25 | 21.25 | 20110501 | 20110930 | N |
J245 | 450.50 | 450.50 | 20110430 | 99991231 | Y |
J245 | 450.50 | 425.25 | 20101001 | 20110430 | N |
J245 | 450.50 | 385.25 | 20100215 | 20101001 | N |
S022 | 98.75 | 98.75 | 20110515 | 99991231 | Y |
The update data will all have the same StartDate. The update.dat source contains the following values:
ProductCode | Cost | StartDate |
---|---|---|
F112 | 2425.00 | 20120701 |
J245 | 550.50 | 20120701 |
M447 | 101.75 | 20120701 |
S022 | 101.75 | 20120701 |
In IRI Workbench, there is a Voracity wizard to assist in the creation of scripts for updating Dimensional files and tables. This wizard is located in the Voracity dropdown on the navigation bar. First you pick the SCD type. Then the window where you select the sources that are used for processing the update is displayed. For Type 6, the target is normally the original master file or table.
With the next screen, you determine how the update data is mapped and how other field or column values are set. The update.Cost field will map to master6.CurrentCost and the update.StartDate field will map to master6.StartDate. The Flag Field is the field that is used to determine whether the record contains the most recent field values. In this case that field is master6.current. If it is the most current record, then the CurrentCost field and the HistoryCost field will have the same values. The Flag Positive Value is the value in master6.Current that determines that the field values are the most recent values for a ProductCode. When that is the case, the value is “Y” in our example and the Flag Negative Value is “N”. End Field contains the name of the field that holds the value used to determine when the Cost for the record is no longer effective and Master.EndDate holds that value. End Value is used as the value for End Field when the record is current.
The next screen is for defining the join performed with the master and update sources. You should note that both sources must be ordered with respect to ProductCode. If they are not, then you need to select NOT_SORTED in the dropdown for Sort Order Option under the data source that needs to be sorted.
Here is the first job script:
/INFILE=master6.dat /PROCESS=DELIMITED /ALIAS=master6 /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(CURRENTCOST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(HISTORICALCOST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"') /FIELD=(ENDDATE, TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"') /FIELD=(CURRENT, TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"') /INFILE=update.dat /PROCESS=DELIMITED /ALIAS=update /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST, TYPE=ASCII, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') /JOIN FULL_OUTER NOT_SORTED master6 update WHERE MASTER6.PRODUCTCODE == UPDATE.PRODUCTCODE /OUTFILE=master6.dat # Make changes to records that have the same ProductCode # as records in the update file /PROCESS=DELIMITED /FIELD=(MASTER6.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(MASTER6.HISTORICALCOST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"') /FIELD=(MASTER6.STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"') /FIELD=(ENDDATE_NEW, TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"', IF MASTER6.CURRENT EQ "Y" THEN UPDATE.STARTDATE ELSE MASTER6.ENDDATE) /FIELD=(CURRENT_NEW="N", TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"') /INCLUDE WHERE MASTER6.PRODUCTCODE == UPDATE.PRODUCTCODE /OUTFILE=master6.dat # Keep the records that have no updates /PROCESS=DELIMITED /FIELD=(MASTER6.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(MASTER6.CURRENTCOST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(MASTER6.HISTORICALCOST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"') /FIELD=(MASTER6.STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"') /FIELD=(MASTER6.ENDDATE, TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"') /FIELD=(MASTER6.CURRENT, TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"') /OMIT WHERE MASTER6.PRODUCTCODE == UPDATE.PRODUCTCODE /OMIT WHERE MASTER6.PRODUCTCODE EQ "" /OUTFILE=master6.dat # Add the records with new product codes /PROCESS=DELIMITED /FIELD=(UPDATE.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"') /FIELD=(UPDATE.STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"') /FIELD=(ENDDATE_NEW="99991231", TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"') /FIELD=(CURRENT_NEW="Y", TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"') /OMIT WHERE UPDATE.PRODUCTCODE EQ "" /INCLUDE WHERE UPDATE.PRODUCTCODE
To review,
- The value for CurrentCost will be the same for all records that have a common ProductCode
- StartDate is the date when the HistoricalCost became effective
- For the most current record of a ProductCode,
- the values for CurrentCost and HistoricalCost are the same
- the EndDate is 99991231
- The value in Current is Y
- For the records that are not the current record
- EndDate is the date when the next more recent Cost became effective
- The field Current has N for the value
The new master file will not be sorted because the new current master records were added to the bottom of the master file which will have values like this:
ProductCode | Cost | HistoricalCost | StartDate | EndDate | Current |
---|---|---|---|---|---|
C123 | 125.50 | 125.50 | 20110228 | 99991231 | Y |
F112 | 2425.00 | 2365.00 | 20120101 | 20120701 | N |
G101 | 19.25 | 19.25 | 20110930 | 99991231 | Y |
G101 | 19.25 | 21.25 | 20110501 | 20110930 | N |
J245 | 550.50 | 450.50 | 20110430 | 20120701 | N |
J245 | 550.50 | 425.25 | 20101001 | 20110430 | N |
J245 | 550.50 | 385.25 | 20100215 | 20101001 | N |
S022 | 101.75 | 98.75 | 20110515 | 20120701 | N |
F112 | 2425.00 | 2425.00 | 20120701 | 99991231 | Y |
J245 | 550.50 | 550.50 | 20120701 | 99991231 | Y |
M447 | 139.25 | 139.25 | 20120701 | 99991231 | Y |
S022 | 101.75 | 101.75 | 20120701 | 99991231 | Y |