Frequently Asked Questions (FAQ)

 

 

 

 

 


Q

What is a duplicate?

A

The answer may surprise you. A duplicate is a database record (row) that matches another record in some way, but the two need not be completely identical. Consider the following two records:

 

 

 

ID    Name           Address            Date Added
98    Mr A Person    1 Acacia Avenue    10 Oct 2003
99    Mr A Person    1 Acacia Avenue    11 Oct 2003

 

 

 

The rows contain identical data in two of the four columns. That is, they are duplicates of each other on the 'Name' and 'Address' columns. They are not completely identical, however, because they have different values in the 'ID' and 'Date Added' columns.

In this particular example, there is a high probability that one record is indeed a duplicate of the other. On the other hand, the first record might refer to a father and the second to his son.

A set of columns used in this way ('Name' and 'Address' in this example) is called an Analysis Key, and the Wizard allows you to specify up to 256 of these. But more of that later. What this example illustrates is how logical duplicates can be 'sniffed out' by the Wizard once it has been told which columns are likely to be significant.
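The idea of matching on a chosen set of columns can be sketched in a few lines. This is only an illustration, not the Wizard's own method: it uses Python's built-in SQLite module rather than SQL Server, and the table name `people`, the column names, the function `find_duplicates` and the third row (ID 100) are all assumptions made for the example.

```python
import sqlite3

# Hypothetical table and column names. Rows 98 and 99 are the example above;
# row 100 is an invented non-duplicate added for contrast.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT, address TEXT, date_added TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (98, "Mr A Person", "1 Acacia Avenue", "2003-10-10"),
        (99, "Mr A Person", "1 Acacia Avenue", "2003-10-11"),
        (100, "Ms B Other", "2 Acacia Avenue", "2003-10-11"),
    ],
)


def find_duplicates(conn, key_columns):
    """Return one row (key values..., count) per key value shared by 2+ rows.

    key_columns plays the role of an analysis key. The names are inserted
    into the SQL text directly, so pass only trusted column names.
    """
    cols = ", ".join(key_columns)
    sql = (
        f"SELECT {cols}, COUNT(*) FROM people "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


duplicates = find_duplicates(conn, ["name", "address"])
print(duplicates)  # [('Mr A Person', '1 Acacia Avenue', 2)]
```

Using 'id' as the key instead returns nothing, because every ID is unique. The choice of key columns decides what counts as a duplicate.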

Duplicate data can be a serious problem for data users. For example, if a company mails several copies of the same letter to one individual, the individual becomes annoyed and the company ends up looking as though it doesn't know what it is doing.

 

 

Q

What causes duplicate data?

A

The following are probably among the most common reasons:

 

 

 

The merging of multiple databases into one - this can happen when two companies merge, for example.

Naive software that adds new data without checking for a match with existing data.

Data entry by inexperienced operators.

Data entry by unmotivated or uninterested operators.

Data entry errors which are genuine mistakes.
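As a minimal sketch of the second cause, the following shows what checking for a match before adding new data might look like. It again uses Python's built-in SQLite module for convenience, and the table `people`, its columns and the function `add_person` are hypothetical names, not part of the Wizard or of any particular application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)


def add_person(conn, name, address):
    """Insert a row only if no existing row matches on (name, address).

    Returns the new row's id, or None when a likely duplicate already exists.
    """
    match = conn.execute(
        "SELECT id FROM people WHERE name = ? AND address = ?",
        (name, address),
    ).fetchone()
    if match is not None:
        return None  # a match exists, so do not add a second copy
    cur = conn.execute(
        "INSERT INTO people (name, address) VALUES (?, ?)", (name, address)
    )
    return cur.lastrowid


first = add_person(conn, "Mr A Person", "1 Acacia Avenue")
second = add_person(conn, "Mr A Person", "1 Acacia Avenue")
row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
```

Software that skips the `SELECT` step and always inserts would store the same person twice. An exact-match check like this one still misses near-duplicates such as misspelled names.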

 

 

Q

Does the Wizard delete duplicate data from my database tables?

A

No, the Wizard will never modify or delete any data in your target table.

Consider this: if the Wizard has identified two records as being duplicates, how does it know which one should be deleted? There are two main reasons why automatic deletion is not performed:

Records are identified as duplicates because they have the same values in one or more columns (whichever columns you specified as 'analysis keys'). This doesn't mean the two records are identical; it means they are duplicates under the specified key. The values in the other columns are therefore likely to be crucial in deciding which record (if any) needs to be deleted.

Records will commonly be linked to records in other tables in the database, so a sensible decision on which duplicate record to delete can in general only be made by a human operator.

 

 

Q

Will the Wizard modify my database tables in any way?

A

No, as explained above, your table data is never changed.

 

 

Q

OK, does the Wizard write to my database at all then?

A

Yes, it does, but it does so away from your tables. It creates its own small admin table in the database, plus one or sometimes two result tables, and it also creates indexes on columns in those tables, as part of its normal processing. The Wizard also offers to create indexes for you on your target table, in order that it can work efficiently. Any indexes you elect to have created in this way can also be dropped after the analysis, if you so choose.

Generally speaking, you don't need to be concerned about the tables the Wizard creates or uses, because the relevant information (the analysis results) is simply presented to you as data. But you may be curious to know how they work and what they contain, and this is all covered in Help.

 

 

Q

What SQL Server™ permissions do I need to have under my login, in order to be able to use the Wizard?

A

You must have sufficient database privileges to do the following:

Create tables

Drop tables

Alter tables/columns

Create indexes

Drop indexes

To achieve this, we recommend as a minimum that your user belongs to the public, db_ddladmin and db_datareader database roles. If you are already a member of the sysadmin server role, that should cover all of these things, but you may want to confirm this with your Database Administrator (DBA).

   

Q

Basic Analysis, Advanced Analysis... What do these terms mean?

A

This is a summary:

 

 

 

A Basic analysis is where the Wizard runs duplicates processing (an analysis) against your target table, using the analysis keys plus the other settings you specified in the various Wizard pages. The result is essentially a per-key listing of duplicates found, with each key's results ordered as you specified.

An Advanced analysis (also called a Linked analysis) includes a Basic analysis as its starting point, but it then performs a sophisticated analysis of the result data, attempting to link up the results across all of the analysis keys. This not only shows you the most significant duplicates (as determined by your specifications), but most importantly it clearly illustrates the relationships that exist (and further, some that are suggested) between duplicate records.

 

 

 

You can find full details of these analyses and the tables they use in Help.
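To give a rough feel for what linking results across analysis keys can mean, here is a small sketch. It is not the Wizard's algorithm, which is described in Help. It shows one common general technique (union-find) for merging per-key duplicate groups into clusters of related records. The function `link_groups`, the key names and the record IDs other than 98 and 99 are assumptions made for the example.

```python
from collections import defaultdict


def link_groups(per_key_groups):
    """Merge duplicate groups found under several keys into linked clusters.

    per_key_groups maps a key name to a list of groups, where each group is
    a list of record IDs that are duplicates of each other under that key.
    Records that share a group under any key end up in the same cluster.
    """
    parent = {}

    def find(x):
        # Return the representative ID of the cluster containing x.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we go
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for groups in per_key_groups.values():
        for group in groups:
            for record_id in group[1:]:
                union(group[0], record_id)

    clusters = defaultdict(set)
    for record_id in parent:
        clusters[find(record_id)].add(record_id)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical per-key results: 98 and 99 match on name and address,
# while 99 and 104 match on a second, invented key.
per_key = {
    "name+address": [[98, 99]],
    "phone": [[99, 104], [200, 201]],
}
linked = link_groups(per_key)
print(linked)  # [[98, 99, 104], [200, 201]]
```

Records 98 and 104 never match each other directly, yet they land in the same cluster because each matches record 99 under a different key.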

 

 

Q

What operating systems does the Wizard run on?

A

Microsoft® Windows® 95, 98, Me, NT4 (Service Pack 6 or later), 2000, and XP

 

 

 

For answers to other, specific technical questions, please check out our Technical page.