After a recent migration, we had a number of duplicate account records in CRM. We were loading hundreds of thousands of records for performance testing, and then we wanted to continue to use the same database for integration development, so we wanted to clean up the data and get rid of duplicates to minimize errors with the integration. You can use the built-in Duplicate Detection to identify duplicate records, but this doesn't help you get rid of them - you would still have to manually merge or delete the records that the Duplicate Detection identifies.

One of the challenges is that, while it's relatively easy to locate duplicates, you don't want to delete all of them - you want to keep one of the potential duplicates. After doing some searching and testing, here's what I came up with. It's a very simple (and entirely unsupported) SQL query to identify the duplicate accounts.

UPDATE Account
SET deletionstatecode = 2
WHERE accountid IN 
 ( SELECT a.accountid 
  FROM Account a, Account b 
  WHERE a.accountid != b.accountid 
  AND a.accountnumber = b.accountnumber 
  AND a.statecode = b.statecode 
  AND a.accountid < b.accountid )

Here's how this works: The inner SELECT identifies accounts where the accountnumber and statecodes match, and, by using two aliases for the account table, it compares the ID of the accounts to each other. The UPDATE statement sets the deletionstatecode to a value of '2' which is a hard delete. The next time the deletion procedure runs on the database, it will clean out all the account records that have deletionstatecode set to 2.

Again, this is completed unsupported and should not be used in a production environment.

(NOTE: CRM 4.0 uses the Asynchronous Service to manage deletions in the database. The process runs every 24 hours. If you want to force the Asynch service to do the clean-up earlier, you can download a handy utility called the "Scale Group Job Editor" which lets you schedule when the deletion service will run.)