In an integrated IT landscape data is often replicated on different systems. It is for example not uncommon that an order system some customer information needs from the CRM system. Online interfaces are often used to push out changes of the data to the interested (subscribed) systems, so that in theory all systems are in sync. In the real world however, it is possible that some of this replicated data gets lost, either by system errors, interfacing faults, etcetera. Checking whether or not the systems are in sync, and what data is missing (if any), can be a daunting task. In this blog post I’ll demonstrate a simple way of using Oracle Data Integrator 11g (ODI) to check which data entities are not in sync.
ODI is a great tool for batch-wise data integration. It is very powerful because the so called Knowledge Modules are fully customisable. In my (very) simple example I’ll focus on the synchronisation of HR data: employee data to be precise. The same principle can also be applied to other areas like stock comparison of course. The overall approach is easily extendible.
The basic idea of what I’ll do in ODI is the following:
The comparison is always done against the data master (that is: the system that contains the single source of truth about a data entity). The ODI Interface is build from the source table in the source system to the temporary sync result table on the target. During the Control phase, a custom Check Knowledge Module (CKM) will be used to check for differences in the master table, and will store the result in the sync result table.
In my example I’ll only send the common identifier. Therefore my sync result table (and its temporary edition) is pretty simple:
CREATE TABLE "SYNC_RESULT" ( "SRC" VARCHAR2(20 BYTE), "SRC_ID" NUMBER, "DIFF" NUMBER )
The DIFF column indicates what kind of difference it is: a “1″ indicates that the item is not in the master system, and a “0″ indicates that it is not in the source system. The Interface flow simply maps the identifier to the target table, with a fixed value for the SRC column, which indicates the originating system. This value is also used later on in the CKM.
Basically there is nothing really special here. The trick is now to use a custom CKM to check the entries in the temporary sync result table with the master table, and put the results in the actual sync result table. The CKM has two OPTONS, one is a flag whether or not to delete the old entries, and a string indicating the source system (equal to the SRC column): I called it the “SOURCE_APP” OPTION.
In my simple example, this CKM basically performs the following steps: remove the old data in the sync result table; check the new values in the temporary sync result table against the master table en report the missing entries in the source and master table as the sync results.
The master table is the well known EMPLOYEES table from the HR schema that comes out of the box with an Oracle database. The CKM steps are therefore:
“Delete the old sync results”:
DELETE FROM SYNC_RESULT WHERE SRC = <%=odiRef.getOption("SOURCE_APP")%>
“Check missing in TARG”:
INSERT INTO SYNC_RESULT SELECT <%=odiRef.getOption("SOURCE_APP")%>, S.SRC_ID, 1 FROM SYNC_RESULT_TMP S WHERE S.SRC_ID not in (SELECT EMPLOYEE_ID FROM EMPLOYEES) AND S.SRC = <%=odiRef.getOption("SOURCE_APP")%>
“Check missing in SRC”:
INSERT INTO SYNC_RESULT SELECT <%=odiRef.getOption("SOURCE_APP")%>, E.EMPLOYEE_ID, 0 FROM EMPLOYEES E WHERE E.EMPLOYEE_ID NOT IN (SELECT SRC_ID FROM SYNC_RESULT_TMP WHERE SRC = <%=odiRef.getOption("SOURCE_APP")%>)
“Remove old values in TMP result”:
DELETE FROM SYNC_RESULT_TMP WHERE src = <%=odiRef.getOption("SOURCE_APP")%>
The result of this ODI Interface is that the sync result table now contains the rows which are not in sync. These results can be by used in reports, or used in another ODI Interface(s) to sync the missing entities to the different systems by joining the results with the master table. Also, adding new systems for comparison is pretty easy: just create a new ODI Interface, and reuse the CKM. Just make sure the SOURCE_APP option and SRC column in the result table are correctly set and unique.Quickly compare datasets using ODI,