In my previous blog focusing on the hurdles of real-time, two-way data integration, I highlighted the hurdle of how to discover the data you need translated. In most cases, discovering the net-change data is all you want to translate on a real-time basis. Remember, a two-way integration means net-change data going in two directions or more (depending on the number of systems you’re integrating) at a rate determined by how many end-users there are on each of the systems being integrated. Business systems with a significant amount of end-users can create substantial volumes of net-change data.


In order to manage the volume of data flowing between systems, you must choose a net-change model to discover just the data that needs to translate. There are traditionally four basic models to choose from, one of which will discover the net-change data. They are not all equal though, so choosing will depend somewhat on the system(s) you’re working with and somewhat depend on the data model of the system(s) with which you’re working.

The four methods are as follows, in best practice order:

  1. Application Method – this method is the most reliable and system efficient.  It relies on the system, or application, to find the net-change data and produce it in a format that is easily translated and consumed by other systems.  Some of the advantages of the Application Method are; Application Net-Change
    a)  It typically can support deletes.  In other words, records deleted by end-users can be traced to the other integrated systems, and if the business rules permit, will delete those records as well.
    b)  It is more likely to produce the net-change data in the order in which it was changed.  This could become important if the net-change data is being produced in logical but separate extracts.  One example would be if the system that is producing  the net-change data, splits a sales order into a header file and a detail file.  The consuming system will likely have referential integrity rules that would not allow details to be consumed before the header.
    c)  It has, typically, very little impact on the producing system, providing an un-noticeable effect to the end-users.
    d)  Reliability is a key factor.  If the application is running, then net-change is being discovered and made consumable for other systems. 
     
  2. Tracker-Table Method – Also known as the shadow table method, the tracker-table method is somewhat invasive, but is very useful in cases where the data model does not always supply a last modified date/time stamp on each record.  The tracker-table method does just what it says, it tracks the records that have had a change made to them.  It is invasive because it requires creating objects in the database of the system.  The idea being, create a table with at least three columns (ObjectName, RecordId,ChangeType).  This is the tracking table.  Then, place triggers on the tables which you are interested in seeing the net-change.  The triggers are designed to modify the tracker-table in the following ways;
    a)  If there is an insert on the table (a new record created),  the trigger will add a record to the tracker-table with three values; 1 - the table name where the changes came from, 2 – the recordId that was created when the record was committed, 3 – an identifier of the type of action that was performed (i.e. I = insert, U = update, D = deleted).
    b)  If there is an update made on the table, the trigger will search the tracker-table to find  the ObjectName/RecordId that corresponds to the record that was updated.  When the match is made, the trigger  then adds a ‘U’ to the ChangeType column indicating that the record corresponding to the recordid in the tracker-table, has been updated..

    Now that the net-change has been identified in the tracker-table, a mechanism needs to be developed (or a third-party tool employed i.e. Scribe or Informatica), to poll the tracker table for records with an ‘I’, ‘U’ or ‘D’, then join the recordid to the base table and retrieve the record that was changed.  Once retrieved, set the ‘ChangeType’ in the tracker table back to NULL.  Then simply choose the frequency that you want to poll the tracker–table for ‘I’, ‘U’ or ‘D’. 
     
  3. Last Update Date Method – This method is non-invasive and can be as reliable as the Application and Tracker-Table methods. With this method you simply create a mechanism (or a third-party tool employed i.e. Scribe or Informatica) to poll the data-object for records that are new, or have an last update date/time stamp that is greater than the last time you polled the data-object.  There are a few other caveats to the WHERE clause but, essentially that’s all there is to it.  Here’s the kicker.  The reason that this method is 3rd in the best-practice ranking is because it is dependent on the clocks being used in the comparison, being synchronized.  If for some odd and perplexing reason one of your applications is getting it’s date/time from the desktop clock (or some other unreliable source), you would most likely end up with records falling through the cracks.  You must make sure the clocks you are using to compared and stamp are reliable.
     
  4. Comparison Method – If you must discover your net-change records, and none of the first three methods can be applied, this may be your last option.  The Comparison Method (or snapshot method) consists of making a copy of the recordset in which the net-change data resides.  Then at some point after that first copy, take another copy of the object and compare each row/column for the differences.  This is an extremely laborious method, but at times, desperate men must do desperate things.

Lastly, what to do with concurrent updates to a record.  End-user #1  in sales, opens John Smith’s contact record to add a new cell number.  At the same moment End-user #2 in finance, opens John Smith’s record to update the billing address.  With applications like Dynamics CRM, Salesforce and SharePoint, the application logic will manage the concurrent changes and will make sure both are recognized and able to be discovered as net-change, and translated.  Other than that, it’s the ‘last out wins’ method.

In my next post in this series, we’ll take a look at working the business process rules and application rules, into the integration process, and, at what frequency should the integration run.