Duplicate Record Resolution

The most effective way to eliminate duplicates from a database composed of OCLC records is to use the OCLC control number found in the 001 field. The same approach applies to any database whose records carry a unique control number.
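As a minimal sketch of control number deduping, the Python example below keeps the first record seen for each 001 value. It assumes records have already been parsed into dictionaries keyed by MARC tag. The function name, the whitespace and case normalization, and the sample control numbers are illustrative assumptions, not a description of LTI's processing.

```python
def dedupe_by_control_number(records):
    """Keep the first record seen for each normalized 001 value."""
    seen = set()
    unique = []
    for record in records:
        # Assumption: trimming whitespace and ignoring case is enough
        # to make equivalent control numbers compare equal.
        key = record.get("001", "").strip().lower()
        if not key:
            # No control number: this pass cannot judge the record, so keep it.
            unique.append(record)
            continue
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        unique.append(record)
    return unique


# Hypothetical sample data; the second record repeats the first control number.
records = [
    {"001": "ocm12345678", "245": "Moby Dick"},
    {"001": "ocm12345678 ", "245": "Moby Dick"},
    {"001": "ocm87654321", "245": "Walden"},
]
unique_records = dedupe_by_control_number(records)
```

Records without a 001 value pass through untouched so that a later pass with a different key can examine them.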

Two other control numbers, the LCCN and the ISBN, are sometimes used as deduping keys to eliminate multiple occurrences of the same record. LTI uses enhanced LCCN and ISBN keys to identify and eliminate duplicates from databases lacking unique control numbers. Each key supplements the control number with information taken from the title field and the date of publication. These enhancements are designed to reduce false matches.
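The idea of an enhanced key can be sketched as follows. The layout shown (normalized number, then the first ten characters of a normalized title, then the year) is an assumption made for illustration only. The actual format of LTI's enhanced keys is not described here, and the function names are hypothetical.

```python
import re


def normalize_title(title, length=10):
    # Lowercase, drop everything except letters and digits, then truncate.
    return re.sub(r"[^a-z0-9]", "", title.lower())[:length]


def enhanced_key(number, title, year):
    """Combine an ISBN or LCCN with title and date information.

    Hypothetical layout: NUMBER|TITLE-PREFIX|YEAR.
    """
    # Remove hyphens, spaces, and other punctuation from the number.
    cleaned = re.sub(r"[^0-9A-Za-z]", "", number).upper()
    return f"{cleaned}|{normalize_title(title)}|{year}"


# Same ISBN, same work, different punctuation: the keys match.
key_a = enhanced_key("0-14-243724-7", "Moby-Dick, or, The whale", "2003")
key_b = enhanced_key("0142437247", "Moby Dick; or, the Whale", "2003")

# Same ISBN attached to a different title: the keys differ,
# so the two records are not treated as duplicates.
key_c = enhanced_key("0-14-243724-7", "Walden", "2003")
```

In this sketch a match requires agreement on all three components, which is how the title and date guard against a number that was reused or keyed incorrectly.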

For records lacking a control number, it may be necessary to adopt a non-numeric deduping key. Non-numeric deduping relies on the creation of a composite identification key. The more sophisticated the key, the greater the probability that only duplicate records will be detected and eliminated.

LTI's non-numeric deduping key contains 52 characters. It combines fixed and variable field information, including data extracted from the title, imprint, and physical description. Non-numeric deduping involves a trade-off between precision and recall, and it is less effective than control number deduping. It is most useful for merging records created from different sources when those records have failed to match on one of the standard library control number fields.
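A fixed-width composite key of this kind might be built as in the sketch below. Only the total length of 52 characters and the general sources of data (title, imprint, physical description) come from the description above. The specific fields, their widths, the padding character, and the normalization rule are all invented for illustration and should not be read as LTI's actual key layout.

```python
import re

# Hypothetical layout: (field name, width). The widths sum to 52.
LAYOUT = [
    ("title", 30),
    ("publisher", 10),
    ("year", 4),
    ("pages", 4),
    ("size", 4),
]


def squeeze(value):
    # Lowercase and keep only letters and digits.
    return re.sub(r"[^a-z0-9]", "", str(value).lower())


def composite_key(record):
    """Build a fixed-width key; short or missing fields are padded with '_'."""
    parts = []
    for field, width in LAYOUT:
        text = squeeze(record.get(field, ""))
        parts.append(text[:width].ljust(width, "_"))
    return "".join(parts)


# Hypothetical records describing the same edition.
rec_a = {"title": "Walden; or, Life in the woods", "publisher": "Ticknor and Fields",
         "year": "1854", "pages": "357 p.", "size": "18 cm."}
rec_b = {"title": "WALDEN, or life in the woods.", "publisher": "Ticknor and Fields.",
         "year": "1854", "pages": "357 p", "size": "18 cm"}
# Same edition again, but the publisher is transcribed with an ampersand.
rec_c = dict(rec_a, publisher="Ticknor & Fields")

key_a = composite_key(rec_a)
key_b = composite_key(rec_b)
key_c = composite_key(rec_c)
```

Here `key_a` and `key_b` agree despite differences in case and punctuation, while `key_c` does not match because of the variant publisher transcription. That missed match illustrates the trade-off: a stricter key avoids false merges but overlooks some true duplicates, and a looser key does the reverse.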