To work with the tool in Foresight Analytics Platform 10, use the new interface.

In this article:

Basic Properties

Edit Input

Edit Output

Edit Output with Error Records

Checked Fields

Deduplicator

Delete Duplicates

The Delete Duplicates transformer is an object that deletes records with duplicate field values. The object has one provider at its input and two consumers at its output. For the transformer to work, create a list of fields on the Checked Fields page; these fields are used to build unique combinations of values and to search for duplicate combinations. To speed up processing, records are sorted by the selected fields beforehand. On the Deduplicator page, set the expression that is calculated for duplicate records and the rule for selecting the single record that is sent to the data consumer.

For example, when the Delete Duplicates transformer is applied to the following table:

Key  Date    Value
4    Summer  1111
1    Winter  2222
5    Summer  3333
2    Winter  4444
4    Summer  1111
6    Summer  5555
5    Summer  3333
3    Winter  6666

the result is a table without duplicates:

Key  Date    Value
4    Summer  1111
1    Winter  2222
5    Summer  3333
2    Winter  4444
6    Summer  5555
3    Winter  6666

and a table that contains the deleted duplicates:

Key  Date    Value
4    Summer  1111
5    Summer  3333

Thus, a record is deleted as a duplicate only if the values of all its fields match those of another record.
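
The split shown in the tables above can be reproduced outside the platform with a short Python sketch. It is an illustration only, not the transformer's implementation: the Record type and the split_duplicates function are hypothetical names, and keeping the first occurrence of each record is an assumption made for this example.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One row of the example table (hypothetical type, for illustration)."""
    key: int
    date: str
    value: int


def split_duplicates(records):
    """Split records into (unique, duplicates).

    A record counts as a duplicate only when all of its fields match
    a record seen earlier. The first occurrence is kept (assumption).
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        if record in seen:
            duplicates.append(record)
        else:
            seen.add(record)
            unique.append(record)
    return unique, duplicates


source = [
    Record(4, "Summer", 1111),
    Record(1, "Winter", 2222),
    Record(5, "Summer", 3333),
    Record(2, "Winter", 4444),
    Record(4, "Summer", 1111),
    Record(6, "Summer", 5555),
    Record(5, "Summer", 3333),
    Record(3, "Winter", 6666),
]

unique, duplicates = split_duplicates(source)
print([r.key for r in unique])      # keys of the table without duplicates
print([r.key for r in duplicates])  # keys of the deleted duplicates
```

With the source table from this article, the first list holds the six records of the table without duplicates and the second holds the two deleted records with keys 4 and 5.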

Basic Properties

The basic properties are used to set the object name, identifier, and comment.

Edit Input

To set the list of fields and the link to the input, use the Edit Input page.

The following parameters are available on the page:

Identifier

Link to object

Fields

NOTE. The screenshot shows the editing wizard for the Repository data consumer.

Edit Output

Use the Edit Output page to set links to the consumer object into which data is loaded when the ETL task is executed.

NOTE. The page is common to all data connectors and transformers, except for the Split and User Algorithm transformers. Setting up the list of fields and output links is described using the Repository data source editing wizard as an example.

The following settings are available on the page:

Identifier

Link to object

Fields

Edit Output with Error Records

The Edit Output with Error Records page is used to set links to the consumer object, to which information about error records skipped by the transformer is exported.

NOTE. The page is common to all data transformers, except for the Union and User Algorithm transformers. Setting up the list of fields and output links is described using the Split data transformer editing wizard as an example.

The following settings are available on the page:

Identifier

Link to object

Fields

Advanced settings

Checked Fields

On the Checked Fields page, set the input fields whose values should be checked for duplicates.

To create a list of checked fields:

Click the Delete button to delete the selected field from the list of checked fields.

The selected fields are used to create an index, which works similarly to a relational table index. Records are also sorted by the selected fields to speed up processing. After duplicates are deleted, the records are sent to the data consumer in sorted order.

If no checked fields are set, duplicates are determined by combinations of values of all data provider fields. Records are not sorted in this case. Processing in this mode may take a long time.
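
The two modes described above can be sketched in Python as follows. This is a hedged illustration, not the platform's algorithm: the delete_duplicates function and its parameters are hypothetical, records are modeled as dictionaries, and keeping the first record of each duplicate group is an assumption (in the transformer, the selection rule is set on the Deduplicator page).

```python
from itertools import groupby
from operator import itemgetter


def delete_duplicates(records, checked_fields=None):
    """Return (kept, removed) for a list of dict records.

    With checked_fields: records are sorted by those fields, and the
    first record of each group of equal combinations is kept (assumption).
    Without checked_fields: all fields are compared and the input order
    is preserved (no sorting).
    """
    kept, removed = [], []
    if not checked_fields:
        seen = set()
        for rec in records:
            # Combination of the values of all fields, in a fixed field order.
            combo = tuple(rec[name] for name in sorted(rec))
            if combo in seen:
                removed.append(rec)
            else:
                seen.add(combo)
                kept.append(rec)
        return kept, removed

    key = itemgetter(*checked_fields)
    # sorted() is stable, so equal combinations keep their input order.
    for _, group in groupby(sorted(records, key=key), key=key):
        first, *rest = group
        kept.append(first)
        removed.extend(rest)
    return kept, removed


rows = [
    {"Key": 4, "Date": "Summer", "Value": 1111},
    {"Key": 1, "Date": "Winter", "Value": 2222},
    {"Key": 4, "Date": "Summer", "Value": 9999},
]

# Only Key is checked: the output is sorted by Key, one record per Key value.
kept_by_key, removed_by_key = delete_duplicates(rows, ["Key"])

# No checked fields: all fields are compared, so nothing is removed here.
kept_all, removed_all = delete_duplicates(rows)
```

In this sketch, checking only the Key field removes the second record with Key 4 even though its Value differs, whereas comparing all fields keeps every record because no two rows match completely.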

Deduplicator

On the Deduplicator page, set the condition that determines which records are deleted.

The condition is created in the editor window, which opens when the Setup button is clicked.

The duplicate selection rule is set with the radio buttons in the Selection Rules group:

See also:

Getting Started with the ETL Task Tool in the Web Application | Data Transformers