In this article:

Basic Properties

Edit Input

Edit Output

Edit Output with Error Records

Settings

Checked Fields

Deduplicator

Deleting Duplicates

The Delete Duplicates transformer is an object that removes duplicate records. It has one provider at its input and two consumers at its output. Duplicates are detected based on a specified index, and a condition is set to select which of the records are deleted.

For efficient duplicate deletion, the provider data should be sorted by the index. The data remains sorted after the operation.
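A minimal Python sketch shows why sorted input helps. It is not the transformer's actual implementation. It assumes that records are tuples and that the input is sorted on the same fields that are compared (the `key` function, a hypothetical parameter). Because equal records are then adjacent, each record only has to be compared with the one before it, and both outputs keep the input order.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[object, ...]


def dedup_sorted(
    records: Iterable[Record],
    key: Callable[[Record], object] = lambda r: r,
) -> Tuple[List[Record], List[Record]]:
    """Split input that is already sorted by `key` into unique records and duplicates.

    Hypothetical sketch: equal keys are adjacent in sorted input, so only the
    previous key is remembered instead of a table of every record seen so far.
    """
    unique: List[Record] = []
    duplicates: List[Record] = []
    previous_key: object = None
    has_previous = False
    for record in records:
        k = key(record)
        if has_previous and k == previous_key:
            duplicates.append(record)  # same key as the neighbor before it
        else:
            unique.append(record)  # first record with this key
        previous_key, has_previous = k, True
    return unique, duplicates


# The example data from this article, sorted on all fields before deduplication.
rows = sorted([
    (4, "Summer", 1111), (1, "Winter", 2222), (5, "Summer", 3333), (2, "Winter", 4444),
    (4, "Summer", 1111), (6, "Summer", 5555), (5, "Summer", 3333), (3, "Winter", 6666),
])
unique, duplicates = dedup_sorted(rows)
```

Since the outputs are built by appending records in input order, sorted input produces sorted outputs.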

For example, when the Delete Duplicates transformer is applied, the table below:

Key Date Value
4 Summer 1111
1 Winter 2222
5 Summer 3333
2 Winter 4444
4 Summer 1111
6 Summer 5555
5 Summer 3333
3 Winter 6666

is split into a table without duplicates:

Key Date Value
4 Summer 1111
1 Winter 2222
5 Summer 3333
2 Winter 4444
6 Summer 5555
3 Winter 6666

and a table containing the deleted duplicates:

Key Date Value
4 Summer 1111
5 Summer 3333

Thus, a record is treated as a duplicate only if the values of all its fields are equal to those of another record.
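The example above can be reproduced with a short Python sketch. It assumes that the first occurrence of a record is kept and later identical copies are sent to the second output; the function name `split_duplicates` is hypothetical. Because the example input is not sorted, the sketch remembers already seen records in a set instead of comparing neighboring records.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[int, str, int]  # (Key, Date, Value), as in the example tables


def split_duplicates(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep the first occurrence of each record; route later copies to a second list.

    A record counts as a duplicate only when all of its field values equal
    those of a record seen earlier.
    """
    seen: Set[Record] = set()
    unique: List[Record] = []
    deleted: List[Record] = []
    for record in records:
        if record in seen:
            deleted.append(record)
        else:
            seen.add(record)
            unique.append(record)
    return unique, deleted


source = [
    (4, "Summer", 1111), (1, "Winter", 2222), (5, "Summer", 3333), (2, "Winter", 4444),
    (4, "Summer", 1111), (6, "Summer", 5555), (5, "Summer", 3333), (3, "Winter", 6666),
]
unique, deleted = split_duplicates(source)
```

`unique` matches the table without duplicates, and `deleted` matches the table of deleted duplicates.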

Basic Properties

Basic properties include the object name, identifier, and comment.

Edit Input

Use the Edit Input page to set the list of fields and the link to the input.


The following parameters are available on the page:

Identifier

Link to object

Fields

NOTE. The screenshot shows the edit wizard for the Repository data consumer.

Edit Output

Use the Edit Output page to set the list of fields and the link to the output.

The following settings are available on the page:

Identifier

Link to object

Fields

NOTE. The screenshot shows the edit wizard for the Repository data provider.

Edit Output with Error Records

The Edit Output with Error Records page is used to set a link to the consumer that receives information about error records skipped by the transformer:

Specify the identifier of the error output and select an available link to the consumer object that receives information about error records.

Settings

The Settings button opens advanced settings that apply when error records occur:

Specify the maximum number of error records for which information is exported. The default value is -1, which means that information about all error records is exported.

NOTE. If there are many error records, exporting information about them may slow down ETL task execution.

If a maximum number of error records is set, select the action to perform when this number is exceeded. By default, record output is not stopped.
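The interplay of the two settings can be modeled as follows. This is a hypothetical sketch: `export_error_records` and the `stop_on_limit` flag are illustrative names, and the flag only stands in for the action chosen in the dialog, whose exact options are not listed here. The value -1 is treated as "no limit", matching the default described above.

```python
from typing import Iterable, List, Tuple

UNLIMITED = -1  # default value: export information about all error records


def export_error_records(
    errors: Iterable[str],
    max_count: int = UNLIMITED,
    stop_on_limit: bool = False,  # hypothetical stand-in for the selected action
) -> Tuple[List[str], bool]:
    """Return (exported error records, whether record output should stop).

    Once max_count records have been exported, the next error record means the
    limit is exceeded: export ends, and output stops only if stop_on_limit is set.
    """
    exported: List[str] = []
    for error in errors:
        if max_count != UNLIMITED and len(exported) >= max_count:
            return exported, stop_on_limit
        exported.append(error)
    return exported, False  # limit never exceeded
```

With the default `stop_on_limit=False`, exceeding the limit only truncates the exported list, which mirrors the default of not stopping record output.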

NOTE. The screenshot shows the edit wizard for the Split data transformer.

Checked Fields

On the Checked Fields page, set the input fields whose values are checked for duplicates.

To create a list of checked fields:

Click the Delete button to remove the selected field from the list of checked fields.

If no checked field is defined, an attempt to go to the next page brings up a confirmation dialog box.
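A Python sketch shows how checked fields narrow the comparison: the duplicate signature is built only from the checked fields, so records that differ in other fields can still be duplicates. The function name and the dictionary row format are assumptions. The sketch simply rejects an empty list, because what the transformer does after the confirmation dialog is not described here.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]  # field name -> value (assumed row format)


def split_by_checked_fields(
    rows: Iterable[Row], checked_fields: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Keep the first row for each combination of checked-field values."""
    if not checked_fields:
        raise ValueError("this sketch requires at least one checked field")
    seen = set()
    unique: List[Row] = []
    deleted: List[Row] = []
    for row in rows:
        signature = tuple(row[field] for field in checked_fields)
        if signature in seen:
            deleted.append(row)
        else:
            seen.add(signature)
            unique.append(row)
    return unique, deleted


rows = [
    {"Key": k, "Date": d, "Value": v}
    for k, d, v in [
        (4, "Summer", 1111), (1, "Winter", 2222), (5, "Summer", 3333), (2, "Winter", 4444),
        (4, "Summer", 1111), (6, "Summer", 5555), (5, "Summer", 3333), (3, "Winter", 6666),
    ]
]
# Only Date is checked, so one Summer row and one Winter row remain.
unique, deleted = split_by_checked_fields(rows, ["Date"])
```

Checking all three fields instead gives the same split as the example table at the start of this article.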

Deduplicator

On the Deduplicator page, set the condition used to select the records to be deleted.

The condition is built in the editor dialog box, which opens when the button is clicked.

The duplicate selection rule is determined by the radio button selected in the Selection Rules group:

See also:

Data Transformers