
Remove Duplicates

Removes duplicate rows from a data table.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Table - The input data table from which to remove duplicate rows.

Options

  • Column Names - Optional. Specifies which columns to consider when identifying duplicates. If not provided, all columns are considered.
  • Keep - Specifies which duplicate rows to keep. Options are:
    • First - Keep the first occurrence of each duplicate
    • Last - Keep the last occurrence of each duplicate
    • None - Remove all duplicates (including the first occurrence)
  • Output Type - Specifies whether to pass the table by reference or by value. Options are:
    • Pass By Reference
    • Pass By Value
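
The effect of Column Names and Keep can be illustrated with a small plain-Python sketch. It is not the node's implementation: the function `remove_duplicates` and the list-of-dicts row format are assumptions made only for this example, and cell values are assumed to be hashable.

```python
from collections import Counter


def remove_duplicates(rows, column_names=None, keep="First"):
    """Hypothetical helper mirroring the Keep semantics on a list of dict rows."""
    if keep not in ("First", "Last", "None"):
        raise ValueError(f"Invalid Keep option: {keep!r}")

    # Compare rows on the chosen columns, or on all columns when none are given.
    cols = column_names or (list(rows[0]) if rows else [])
    keys = [tuple(row[c] for c in cols) for row in rows]

    if keep == "None":
        # Drop every row whose key occurs more than once.
        counts = Counter(keys)
        return [row for row, key in zip(rows, keys) if counts[key] == 1]

    # "First" scans top-down; "Last" scans bottom-up so the last occurrence wins.
    indices = range(len(rows)) if keep == "First" else range(len(rows) - 1, -1, -1)
    seen, kept = set(), []
    for i in indices:
        if keys[i] not in seen:
            seen.add(keys[i])
            kept.append(i)
    # Restore the original row order before returning.
    return [rows[i] for i in sorted(kept)]


rows = [
    {"id": 1, "city": "Ankara"},
    {"id": 2, "city": "Izmir"},
    {"id": 3, "city": "Ankara"},
]
first = remove_duplicates(rows, ["city"], keep="First")  # keeps ids 1 and 2
last = remove_duplicates(rows, ["city"], keep="Last")    # keeps ids 2 and 3
none = remove_duplicates(rows, ["city"], keep="None")    # keeps id 2 only
all_columns = remove_duplicates(rows)                    # no full-row duplicates, keeps all 3
```

With Column Names set to `city`, rows 1 and 3 count as duplicates even though their `id` values differ. With no Column Names, the same rows are distinct because the whole row is compared.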

Output

  • Table - The resulting data table with duplicate rows removed.

How It Works

The Remove Duplicates node removes duplicate rows from a data table. When executed, the node:

  1. Validates that the input table is valid and not empty
  2. Checks if the table is a reference table and handles it appropriately
  3. Converts the data table to a pandas DataFrame
  4. Removes duplicate rows based on the specified options
  5. Converts the modified DataFrame back to the data table format
  6. Returns the updated table
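
The steps above can be sketched roughly as follows using pandas. This is an approximation under stated assumptions, not the node's source code: the table format (a dict with `columns` and `rows` keys), the function name `remove_duplicates_node`, and the `KEEP_MAP` mapping are hypothetical, and the reference-table handling in step 2 is omitted because its details are not documented here.

```python
import pandas as pd

# Assumed mapping from the node's Keep option to the `keep` argument
# of pandas' DataFrame.drop_duplicates.
KEEP_MAP = {"First": "first", "Last": "last", "None": False}


def remove_duplicates_node(table, column_names=None, keep="First"):
    # Step 1: validate the input table (hypothetical table format).
    if not isinstance(table, dict) or "columns" not in table or "rows" not in table:
        raise ValueError("Invalid table structure")
    if not table["rows"]:
        raise ValueError("Input table is empty")
    if keep not in KEEP_MAP:
        raise ValueError(f"Invalid Keep option: {keep!r}")

    # Step 3: convert the table to a pandas DataFrame, keeping column order.
    df = pd.DataFrame(table["rows"], columns=table["columns"])

    # Step 4: drop duplicates; subset=None means all columns are compared.
    df = df.drop_duplicates(subset=column_names or None, keep=KEEP_MAP[keep])

    # Steps 5-6: convert back to the table format and return it.
    return {"columns": list(df.columns), "rows": df.to_dict(orient="records")}


table = {
    "columns": ["id", "city"],
    "rows": [
        {"id": 1, "city": "Ankara"},
        {"id": 2, "city": "Izmir"},
        {"id": 3, "city": "Ankara"},
    ],
}
result = remove_duplicates_node(table, column_names=["city"], keep="Last")
```

In this sketch the output keeps the input's column order, and the surviving rows stay in their original relative order.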

Requirements

  • A valid input data table

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid input table
  • Invalid table structure
  • Invalid "Keep" option selection

Usage Notes

  • The Output Type option can be set to "Pass By Reference" for handling large tables more efficiently
  • If Column Names is not specified, duplicates are identified based on all columns
  • The Keep option determines which duplicate rows are retained:
    • "First" keeps the first occurrence and removes subsequent duplicates
    • "Last" keeps the last occurrence and removes previous duplicates
    • "None" removes all duplicate rows, including the first occurrence
  • The node preserves the original column order in the output table