KB1011
Identify duplicate records in an import source file using Excel
Summary
If an import fails due to duplicate records in the source file, the duplicate records must be identified and addressed in order to run the import successfully. This article provides guidance on how to identify and resolve those duplicates by viewing the import source file in Excel.
Details
Tables in Axiom Contract Management can only have one record per unique combination of the table’s key columns. Import utilities insert new records or update existing records based on these keys. If multiple records share the same key values, then by default, the import attempts to aggregate these records into a single row.
To perform this aggregation, numeric data columns are summed. However, certain column data types cannot be summed, such as strings (text), Booleans (true/false), and dates. If the records with duplicate keys have different string, date, or Boolean values, the import will fail with an error such as the following:
Example import error when source data contains duplicate key values
NOTE: Even though the error specifies a row number, that row number is in reference to the import temp table and not the source file, so it may not coincide with the row number in the source file.
Additionally, in some cases imports do not allow aggregation of duplicate key values—such as when the destination table for the import is a reference table, or when the option to aggregate values is disabled. If an import does not allow aggregation, then duplicate values cannot be present in the source data at all, even if it is possible to successfully aggregate the values. The option Aggregate rows on final save on the General tab determines whether aggregation is allowed when importing to a data table.
Import configured to allow aggregation of duplicate keys
Resolution
In order to run the import successfully, the rows in the source file with duplicate keys must be identified and addressed. You can use Excel for this purpose if your source file is an XLSX or CSV file, or some other delimited file format that can be loaded into Excel.
Before you begin
In order to evaluate the source data, you must know the destination table for the import and its key columns. If you already know this information, skip to the next set of steps.
-
To identify the destination table for the import, check the Mapping tab. This step assumes that you have security permission to access the import source file.
-
To identify the key columns for the destination table, open the table using Tables > TableName > Open in Spreadsheet and note the columns with the blue background. Access to this feature requires a certain level of table administration permissions.
TIP: Alternatively, you can view the key columns on the Mapping tab of the import, by clicking the down arrow on the Destination Column list. The key columns of the table have a yellow key before their name. When opening and closing the Destination Column list, take care not to accidentally change the destination column assignment.
Find and resolve duplicate records
Once the key columns have been identified, review the source file to find the rows with duplicate key values, and correct them as appropriate.
- Open the import source file in Excel.
-
To the right of the data, add a new column to concatenate the key values for each row. In this example, columns A and B are the key columns. A formula has been added to column E to concatenate these values together.
- Copy the concatenation formula down the worksheet to the last row of data.
-
To the right of the concatenation column, add another new column to count the number of occurrences of each unique concatenated value using a COUNTIF function. For example:
- Copy the COUNTIF formula down the worksheet as well. Any row with a value in this column greater than 1 indicates duplicate key values.
-
To easily view the rows with duplicate keys, apply an Excel column filter to the header row. Select the top row, then go to the Data tab and click Filter (or as a shortcut, press Ctrl+Shift+L).
- Filter the COUNTIF column to only show results greater than 1, and then sort the concatenation column in ascending order.
-
Compare the duplicate rows and look for any string (text), Boolean (true/false), or date fields that don’t match. These are the rows that need to be addressed in order to run the import successfully. In this example, the employee names do not match for the duplicate key combination of Dept 28000 and EmpID 1010.
-
Evaluate the duplicate rows to determine how to resolve the issue. In the example above, there could be several different explanations for the duplicate data:
-
Either Jon or John could be a misspelled name for the same person, and the misspelling needs to be corrected (which would allow the rows to be aggregated once the strings match).
-
Jon and John could be two different people, which means that likely the EmpID on one of the rows is incorrect and needs to be fixed (which would result in two unique rows instead of duplicate keys).
-
There could be an issue with the process that generates the import file from the source system, causing duplicate rows to be created when they should not be.
Once you identify the issue, you may be able to correct it directly in the source file, or it may need to be corrected in the source system (in which case a new source file must be generated for the import after the correction).
NOTE: This process assumes that the import is configured to aggregate duplicate rows that can be successfully aggregated. If the import does not allow aggregation, then all duplicate rows must be addressed to remove the duplicates.
-
- Re-run the import process with the corrected source file.
Outcome
Once the duplicate rows in the source file have been resolved, the import will complete successfully.
If your source file cannot be opened in Excel, or if the import source is not a file, or if the duplicate records are not present in the source and are instead being created by import transform steps, then inspecting the import temp table while running the import in development mode may be an alternate solution.
See also
Article information
Category |
Imports |
Applies To |
All versions |
Tags |
Source File, Duplicate Keys, Save Error |
Issue Number(s) |
N/A |