Upsert
Inserts new vectors or updates existing vectors in a Pinecone index. "Upsert" means the operation will insert new vectors or update existing ones if the ID already exists.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
info
If the Continue On Error property is true, errors are not caught when the project is executed, even if a Catch node is used.
Inputs
- Connection Id - Connection identifier from Connect node (optional if API Key credential is provided directly).
- Host URL - Your index's host URL. Format: https://INDEX_NAME-PROJECT_ID.svc.ENVIRONMENT.pinecone.io. You can find this in your Pinecone console under the index details.
- Ids - Array of unique string identifiers for the vectors. Must have the same length as the Values array.
- Values - Array of vector arrays (2D array of floats). Each inner array must have the same dimension as the index.
Options
- API Key - Pinecone API key credential (optional - use this instead of Connection Id if not using Connect node).
- Metadata - Array of metadata objects corresponding to each vector. Each metadata object can contain custom fields for filtering and retrieval. Optional but highly recommended.
- Name Space - Namespace to upsert vectors into. Namespaces allow you to partition vectors within a single index. Optional, defaults to the default namespace.
Output
- Response - Object containing the upsert response with the count of successfully upserted vectors.
{
"upsertedCount": 10
}
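A downstream step can compare the reported count with the number of vectors that were sent. The following is a minimal sketch; `checkUpsertResponse` is a hypothetical helper, not part of the node, and it assumes the response has the shape shown above.

```javascript
// Hypothetical helper: compares the node's Response object with the number
// of vectors that were sent. Assumes the { upsertedCount: <number> } shape.
function checkUpsertResponse(response, expectedCount) {
  const upserted =
    response && typeof response.upsertedCount === "number"
      ? response.upsertedCount
      : null; // null means the response did not contain a usable count
  return { ok: upserted === expectedCount, upserted, expected: expectedCount };
}

const result = checkUpsertResponse({ upsertedCount: 10 }, 10);
console.log(result);
```

If `ok` is false, the flow can stop or retry before any query relies on the missing vectors.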
How It Works
The Upsert node adds or updates vectors in your Pinecone index. When executed, the node:
- Validates required inputs (Host URL, Ids, Values)
- Combines Ids, Values, and optional Metadata into vector objects
- Constructs the upsert request with namespace if specified
- Sends a POST request to the index's /vectors/upsert endpoint
- Returns the number of successfully upserted vectors
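The combining step can be pictured with a short sketch. This is not the node's actual implementation, which this page does not show; `buildUpsertBody` is a hypothetical name, and the sketch only assumes the request body carries a `vectors` array of `{ id, values, metadata }` objects plus an optional `namespace`.

```javascript
// Sketch only: illustrates how Ids, Values, and optional Metadata could be
// zipped into vector objects. buildUpsertBody is a hypothetical helper.
function buildUpsertBody(ids, values, metadatas, namespace) {
  const vectors = ids.map((id, i) => {
    const vector = { id, values: values[i] };
    // Metadata is optional, so attach it only when an entry exists.
    if (metadatas && metadatas[i] !== undefined) {
      vector.metadata = metadatas[i];
    }
    return vector;
  });

  const body = { vectors };
  // The namespace is added only when one was specified.
  if (namespace) {
    body.namespace = namespace;
  }
  return body;
}

const body = buildUpsertBody(
  ["doc_1", "doc_2"],
  [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
  [{ category: "education" }, { category: "advanced" }],
  "documents"
);
console.log(JSON.stringify(body, null, 2));
```

Pairing by array position is the reason the Ids, Values, and Metadata arrays must have the same length.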
Requirements
- An existing Pinecone index in ready status
- Vector embeddings generated from your data (using OpenAI, Cohere, or other embedding models)
- IDs must be unique strings
- Vector dimensions must match the index dimension exactly
Error Handling
The node will return specific errors in the following cases:
- ErrInvalidArg - Host URL is empty
- ErrInvalidArg - Ids or Values array is empty or null
- ErrInvalidArg - Invalid Connection ID or missing API key
- ErrInternal - Response format is not valid
- ErrStatus - HTTP error from Pinecone API (dimension mismatch, invalid data, etc.)
Usage Notes
- Each ID must be unique within the namespace
- If an ID already exists, the vector and metadata are completely replaced
- All vectors in the Values array must have the same dimension
- The Ids and Values arrays must have the same length
- Metadata arrays (if provided) must also match the length of Ids/Values
- Pinecone has limits on batch size (typically 100-1000 vectors per request)
- For large datasets, split into multiple Upsert operations
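The length and uniqueness rules above can be checked before the Upsert node runs. The following is a minimal sketch; `validateUpsertInputs` is a hypothetical helper and its messages are illustrative, not the node's own error text.

```javascript
// Hypothetical pre-flight check for the rules listed above.
// Returns an array of problem descriptions; an empty array means no problem was found.
function validateUpsertInputs(ids, values, metadatas) {
  const problems = [];
  if (!Array.isArray(ids) || ids.length === 0) {
    problems.push("Ids is empty or not an array");
  }
  if (!Array.isArray(values) || values.length === 0) {
    problems.push("Values is empty or not an array");
  }
  if (problems.length > 0) return problems; // further checks need both arrays

  if (ids.length !== values.length) {
    problems.push(`Ids has ${ids.length} entries but Values has ${values.length}`);
  }
  if (metadatas != null && (!Array.isArray(metadatas) || metadatas.length !== ids.length)) {
    problems.push("Metadata must be an array with the same length as Ids");
  }
  if (ids.some((id) => typeof id !== "string")) {
    problems.push("Every ID must be a string");
  }
  if (new Set(ids).size !== ids.length) {
    problems.push("Ids contains duplicates");
  }
  return problems;
}

console.log(validateUpsertInputs(["doc_1", "doc_1"], [[0.1], [0.2]]));
```

Returning every problem at once, rather than stopping at the first, makes it easier to fix the preparation code in one pass.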
Best Practices
- Always include meaningful metadata for filtering (e.g., category, date, source)
- Use descriptive IDs that you can reference later (e.g., doc_123, product_abc)
- Batch vectors in groups of 100-200 for optimal performance
- Use namespaces to separate different data types or tenants
- Normalize vectors to unit length before upserting when using the dotproduct metric, so that scores rank results the same way cosine similarity would
- Store original text or image references in metadata for retrieval
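For the normalization advice, one common approach is L2 normalization, which scales each vector to unit length. The sketch below assumes that is the intended meaning; `normalize` is a hypothetical helper, and whether normalization is appropriate depends on your embedding model.

```javascript
// Hypothetical helper: L2-normalizes a vector (scales it to length 1).
function normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  // A zero vector has no direction, so return a copy unchanged.
  if (norm === 0) return vector.slice();
  return vector.map((x) => x / norm);
}

// Apply to every embedding before passing the array to the Values input.
const embeddingsToUpsert = [[3, 4], [0, 0], [1, 1]].map(normalize);
console.log(embeddingsToUpsert);
```

With unit-length vectors, the dot product of two vectors equals their cosine similarity.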
Metadata Best Practices
Metadata is crucial for effective vector search:
- Keep metadata fields simple and queryable
- Use consistent field names across all vectors
- Common metadata fields:
  - text - Original text content
  - source - Document or source identifier
  - category - Classification or type
  - date - Timestamp or date
  - author - Creator or owner
  - url - Link to original content
Example: Upserting Document Embeddings
Input Preparation (JavaScript)
// Assume you have documents and their embeddings
const documents = [
{ id: "doc_1", text: "Introduction to AI", category: "education" },
{ id: "doc_2", text: "Machine Learning Basics", category: "education" },
{ id: "doc_3", text: "Deep Learning Guide", category: "advanced" }
];
// Get embeddings from OpenAI or another model
const embeddings = [
[0.1, 0.2, 0.3 /* ... */], // 1536 dimensions for ada-002
[0.4, 0.5, 0.6 /* ... */],
[0.7, 0.8, 0.9 /* ... */]
];
// Prepare for Upsert node
const ids = documents.map(d => d.id);
const values = embeddings;
const metadatas = documents.map(d => ({
text: d.text,
category: d.category,
indexed_at: new Date().toISOString()
}));
// Set variables for Upsert node
msg.ids = ids;
msg.values = values;
msg.metadatas = metadatas;
Upsert Node Configuration
Connection Id: {{connection_id}}
Host URL: https://my-index-abc123.svc.us-east-1-aws.pinecone.io
Ids: {{ids}}
Values: {{values}}
Metadata: {{metadatas}}
Name Space: documents
Example: Upserting Product Embeddings
Input Preparation
const products = [
{
id: "prod_001",
name: "Wireless Headphones",
price: 99.99,
category: "electronics"
},
{
id: "prod_002",
name: "Running Shoes",
price: 79.99,
category: "sports"
}
];
// Generate embeddings from product descriptions
const embeddings = await generateEmbeddings(products.map(p => p.name));
msg.ids = products.map(p => p.id);
msg.values = embeddings;
msg.metadatas = products.map(p => ({
name: p.name,
price: p.price,
category: p.category,
in_stock: true
}));
Example: Using Namespaces
// Separate customer data by organization
msg.namespace = "org_abc123";
msg.ids = ["cust_1", "cust_2"];
msg.values = [[0.1, 0.2 /* ... */], [0.3, 0.4 /* ... */]];
msg.metadatas = [
{ name: "John Doe", tier: "premium" },
{ name: "Jane Smith", tier: "basic" }
];
Batch Processing Large Datasets
const BATCH_SIZE = 100;
for (let i = 0; i < allVectors.length; i += BATCH_SIZE) {
const batch = allVectors.slice(i, i + BATCH_SIZE);
msg.ids = batch.map(v => v.id);
msg.values = batch.map(v => v.embedding);
msg.metadatas = batch.map(v => v.metadata);
// Process batch with Upsert node
// Add delay between batches to avoid rate limits
}
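The loop above overwrites the same msg fields on every iteration, so in practice each batch has to be handed to the Upsert node before the next one is prepared. One way to make that explicit is to build all batches first and then iterate over them in the flow. The following is a sketch under that assumption; `toBatches` is a hypothetical helper, and the `id`, `embedding`, and `metadata` field names follow the example above.

```javascript
// Hypothetical helper: splits records into per-request batches that already
// have the Ids / Values / Metadata layout the Upsert node expects.
function toBatches(allVectors, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < allVectors.length; i += batchSize) {
    const batch = allVectors.slice(i, i + batchSize);
    batches.push({
      ids: batch.map((v) => v.id),
      values: batch.map((v) => v.embedding),
      metadatas: batch.map((v) => v.metadata),
    });
  }
  return batches;
}

// Example with 250 placeholder records.
const sampleVectors = Array.from({ length: 250 }, (_, i) => ({
  id: `doc_${i}`,
  embedding: [i, i + 1, i + 2],
  metadata: { position: i },
}));
const batches = toBatches(sampleVectors);
console.log(batches.map((b) => b.ids.length));
```

Each element of `batches` can then drive one Upsert execution, with a delay between executions if rate limits are a concern.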
Troubleshooting
Error: "Dimension mismatch"
- Verify your embeddings have the correct dimension
- Check that all vectors in the Values array have the same length
- Ensure the dimension matches your index configuration
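To locate the offending vectors, a quick scan of the Values array can help. The following is a minimal sketch; `findDimensionMismatches` is a hypothetical helper, and `indexDimension` is whatever dimension your index was created with.

```javascript
// Hypothetical helper: lists the positions of vectors whose length differs
// from the index dimension (or that are not arrays at all).
function findDimensionMismatches(values, indexDimension) {
  const mismatches = [];
  values.forEach((vector, position) => {
    const length = Array.isArray(vector) ? vector.length : null;
    if (length !== indexDimension) {
      mismatches.push({ position, length });
    }
  });
  return mismatches;
}

const mismatches = findDimensionMismatches(
  [[0.1, 0.2, 0.3], [0.4, 0.5], [0.7, 0.8, 0.9]],
  3
);
console.log(mismatches);
```

The reported positions line up with the Ids array, so the matching IDs identify the source records to re-embed.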
Error: "Ids cannot be empty"
- Ensure Ids array is populated before the Upsert node
- Check that the variable reference is correct (e.g., {{ids}})
Error: "Arrays length mismatch"
- Ids, Values, and Metadata arrays must all have the same length
- Verify array construction in your preparation code
Upsert succeeds but vectors not found in queries
- Check that you're querying the correct namespace
- Wait a few seconds for index to update (eventual consistency)
- Verify the vectors were actually upserted (check response count)
Rate limit errors
- Reduce batch size
- Add delays between consecutive upserts
- Upgrade your Pinecone plan for higher throughput
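When adding delays between upserts, a common pattern is exponential backoff: wait longer after each consecutive rate limit error, up to a cap. The sketch below only computes the wait times; `backoffDelays` and its default values are assumptions for illustration, not settings documented for this node or for Pinecone.

```javascript
// Hypothetical helper: returns the wait time in seconds for each retry,
// doubling from baseSec and never exceeding maxSec.
function backoffDelays(retries, baseSec = 1, maxSec = 30) {
  return Array.from({ length: retries }, (_, attempt) =>
    Math.min(baseSec * 2 ** attempt, maxSec)
  );
}

console.log(backoffDelays(6)); // 1, 2, 4, 8, 16, 30 (seconds)
```

Each value could be used as the wait before the corresponding retry, for example through the Delay Before (sec) property.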