Remove Duplicates
Removes duplicate rows from a data table.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Time to wait, in seconds, before executing the node.
- Delay After (sec) - Time to wait, in seconds, after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- Table - The input data table from which to remove duplicate rows.
Options
- Column Names - Optional. Specifies which columns to consider when identifying duplicates. If not provided, all columns are considered.
- Keep - Specifies which occurrence of each set of duplicate rows to keep. Options are:
  - First - Keep the first occurrence of each duplicate and remove the later ones
  - Last - Keep the last occurrence of each duplicate and remove the earlier ones
  - None - Keep no occurrence: every row that has a duplicate is removed, including the first occurrence
- Output Type - Specifies whether to pass the table by reference or by value. Options are:
  - Pass By Reference
  - Pass By Value
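The three Keep options behave like the `keep` argument of pandas `DataFrame.drop_duplicates`, which is a plausible basis for the node given the DataFrame conversion described under How It Works. The following is a minimal sketch in plain pandas, not the node's actual code; the sample data and the `KEEP_MAP` name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample table; rows 0 and 2 are identical.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "city": ["Oslo", "Rome", "Oslo", "Oslo"],
})

# Assumed mapping from the node's Keep option to pandas' `keep` argument.
KEEP_MAP = {"First": "first", "Last": "last", "None": False}

first = df.drop_duplicates(keep=KEEP_MAP["First"])
last = df.drop_duplicates(keep=KEEP_MAP["Last"])
none = df.drop_duplicates(keep=KEEP_MAP["None"])

# Surviving row indices for each option.
print(first.index.tolist())  # [0, 1, 3]
print(last.index.tolist())   # [1, 2, 3]
print(none.index.tolist())   # [1, 3]
```

With None, both copies of the duplicated row disappear, so only rows that were unique to begin with remain.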
Output
- Table - The resulting data table with duplicate rows removed.
How It Works
When executed, the node:
- Validates that the input table is well-formed and not empty
- Checks if the table is a reference table and handles it appropriately
- Converts the data table to a pandas DataFrame
- Removes duplicate rows based on the specified options
- Converts the modified DataFrame back to the data table format
- Returns the updated table
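The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the node's implementation: the table layout (a dict with `columns` and `rows`), the function name `remove_duplicates`, and the error messages are all hypothetical, and the reference-table handling step is omitted because the document does not describe it.

```python
import pandas as pd


def remove_duplicates(table, column_names=None, keep="First"):
    """Sketch of the node's steps; the table layout is an assumption."""
    # Assumed mapping from the Keep option to pandas' `keep` argument.
    keep_map = {"First": "first", "Last": "last", "None": False}
    if keep not in keep_map:
        raise ValueError(f"Invalid Keep option: {keep!r}")

    # Validate the input table (hypothetical structure).
    if not isinstance(table, dict) or "columns" not in table or "rows" not in table:
        raise ValueError("Invalid table structure")
    if not table["rows"]:
        raise ValueError("Input table is empty")

    # Convert to a DataFrame, keeping the declared column order.
    df = pd.DataFrame(table["rows"], columns=table["columns"])

    # Remove duplicates; an empty Column Names list means "all columns".
    df = df.drop_duplicates(subset=column_names or None, keep=keep_map[keep])

    # Convert back to the assumed table format.
    return {"columns": list(df.columns), "rows": df.to_dict(orient="records")}


table = {
    "columns": ["id", "email"],
    "rows": [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"},
        {"id": 1, "email": "a@example.com"},
    ],
}
result = remove_duplicates(table)
print(len(result["rows"]))  # 2
```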
Requirements
- A valid input data table
Error Handling
The node returns an error in the following cases:
- Empty or invalid input table
- Invalid table structure
- Invalid "Keep" option selection
Usage Notes
- Set Output Type to "Pass By Reference" to handle large tables more efficiently
- If Column Names is not specified, duplicates are identified based on all columns
- The Keep option determines which duplicate rows are retained:
  - "First" keeps the first occurrence and removes subsequent duplicates
  - "Last" keeps the last occurrence and removes earlier duplicates
  - "None" removes every row that has a duplicate, including the first occurrence, so only rows that were already unique remain
- The node preserves the original column order in the output table
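As an illustration of combining Column Names with Keep, the sketch below keeps only the most recent record per email address. It uses plain pandas rather than the node itself, and the column names and data are made up for the example.

```python
import pandas as pd

# Hypothetical table: two records share the same email.
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "a@example.com"],
    "plan": ["free", "free", "pro"],
    "updated": ["2024-01-01", "2024-01-05", "2024-02-01"],
})

# Equivalent of Column Names = ["email"] and Keep = "Last".
latest = df.drop_duplicates(subset=["email"], keep="last")

print(list(latest.columns))     # ['email', 'plan', 'updated']
print(latest["plan"].tolist())  # ['free', 'pro']
```

Only the `email` column is compared, yet every column is returned, in its original order.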