
Upsert

Inserts new vectors into a Pinecone index or updates existing ones. "Upsert" means a vector is inserted if its ID is new and overwritten if its ID already exists.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If Continue On Error is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - Connection identifier from the Connect node (optional if an API Key credential is provided directly).
  • Host URL - Your index's host URL. Format: https://INDEX_NAME-PROJECT_ID.svc.ENVIRONMENT.pinecone.io. You can find this in your Pinecone console under the index details.
  • Ids - Array of unique string identifiers for the vectors. Must have the same length as the Values array.
  • Values - Array of vector arrays (2D array of floats). Each inner array must have the same dimension as the index.

Options

  • API Key - Pinecone API key credential (optional; use this instead of Connection Id if you are not using a Connect node).
  • Metadata - Array of metadata objects corresponding to each vector. Each metadata object can contain custom fields for filtering and retrieval. Optional but highly recommended.
  • Name Space - Namespace to upsert vectors into. Namespaces allow you to partition vectors within a single index. Optional; if left empty, vectors are written to the index's default namespace.

Output

  • Response - Object containing the upsert response with the count of successfully upserted vectors.
    {
      "upsertedCount": 10
    }

How It Works

The Upsert node adds or updates vectors in your Pinecone index. When executed, the node:

  1. Validates required inputs (Host URL, Ids, Values)
  2. Combines Ids, Values, and optional Metadata into vector objects
  3. Constructs the upsert request with namespace if specified
  4. Sends a POST request to the index's /vectors/upsert endpoint
  5. Returns the number of successfully upserted vectors
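Steps 2 and 3 above can be sketched in JavaScript. This is a minimal illustration, not the node's actual source: it assumes the request body follows Pinecone's REST upsert format ({ vectors: [{ id, values, metadata }], namespace }), and buildUpsertBody is a hypothetical helper name.

```javascript
// Combine Ids, Values and optional Metadata into Pinecone vector objects,
// adding the namespace only when one is given.
function buildUpsertBody(ids, values, metadatas, namespace) {
  const vectors = ids.map((id, i) => {
    const vector = { id, values: values[i] };
    // Metadata is optional; attach it only when an object exists at this index
    if (metadatas && metadatas[i]) {
      vector.metadata = metadatas[i];
    }
    return vector;
  });

  const body = { vectors };
  // Leaving namespace out targets the index's default namespace
  if (namespace) {
    body.namespace = namespace;
  }
  return body;
}

const body = buildUpsertBody(
  ["doc_1", "doc_2"],
  [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
  [{ category: "education" }, { category: "advanced" }],
  "documents"
);
console.log(JSON.stringify(body));
```

Pairing the arrays by index is why Ids, Values, and Metadata must all have the same length: a missing entry would silently attach the wrong values or metadata to an ID.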

Requirements

  • An existing Pinecone index in ready status
  • Vector embeddings generated from your data (using OpenAI, Cohere, or other embedding models)
  • IDs must be unique strings
  • Vector dimensions must match the index dimension exactly

Error Handling

The node will return specific errors in the following cases:

  • ErrInvalidArg - Host URL is empty
  • ErrInvalidArg - Ids or Values array is empty or null
  • ErrInvalidArg - Invalid Connection ID or missing API key
  • ErrInternal - Response format is not valid
  • ErrStatus - HTTP error from Pinecone API (dimension mismatch, invalid data, etc.)
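The request and the error cases above could look roughly like the following sketch. It assumes Pinecone's documented REST conventions (a POST to HOST_URL/vectors/upsert with the key in an Api-Key header). sendUpsert is a hypothetical function, and its ErrInvalidArg/ErrStatus/ErrInternal message prefixes only mimic the node's error names; they are not the node's real messages. The fetch function is injectable so the logic can be exercised without a live index.

```javascript
// Sketch of the upsert request plus the error cases listed above.
// fetchImpl defaults to the global fetch (Node.js 18+ or a browser);
// pass a stub to test without contacting Pinecone.
async function sendUpsert(hostUrl, apiKey, body, fetchImpl = globalThis.fetch) {
  // Like ErrInvalidArg: reject missing inputs before sending anything
  if (!hostUrl) throw new Error("ErrInvalidArg: Host URL is empty");
  if (!apiKey) throw new Error("ErrInvalidArg: missing API key");
  if (!body || !Array.isArray(body.vectors) || body.vectors.length === 0) {
    throw new Error("ErrInvalidArg: no vectors to upsert");
  }

  // Strip trailing slashes so the path separator is not doubled
  const url = hostUrl.replace(/\/+$/, "") + "/vectors/upsert";
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Api-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  // Like ErrStatus: surface the HTTP status and Pinecone's error text
  if (!res.ok) {
    const detail = await res.text();
    throw new Error(`ErrStatus: HTTP ${res.status}: ${detail}`);
  }

  // Like ErrInternal: a successful response should contain upsertedCount
  const data = await res.json();
  if (typeof data.upsertedCount !== "number") {
    throw new Error("ErrInternal: response format is not valid");
  }
  return data;
}
```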

Usage Notes

  • Each ID must be unique within the namespace
  • If an ID already exists, the vector and metadata are completely replaced
  • All vectors in the Values array must have the same dimension
  • The Ids and Values arrays must have the same length
  • Metadata arrays (if provided) must also match the length of Ids/Values
  • Pinecone limits the size of each upsert request (its documentation gives a maximum of 1,000 vectors or 2 MB per request, whichever is reached first)
  • For large datasets, split into multiple Upsert operations
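To catch these problems before the Upsert node returns an error, the inputs can be checked in a preparation step. validateUpsertInputs is a hypothetical helper, and the expected dimension is something you supply (for example, 1536 for text-embedding-ada-002 embeddings).

```javascript
// Checks the usage rules above before the Upsert node runs.
// Returns a list of problems; an empty list means the inputs look consistent.
function validateUpsertInputs(ids, values, metadatas, dimension) {
  const problems = [];
  if (!Array.isArray(ids) || ids.length === 0) problems.push("Ids is empty");
  if (!Array.isArray(values) || values.length === 0) problems.push("Values is empty");
  if (problems.length > 0) return problems;

  // Ids, Values and (optional) Metadata must line up one-to-one
  if (ids.length !== values.length) {
    problems.push(`Ids has ${ids.length} items but Values has ${values.length}`);
  }
  if (metadatas && metadatas.length !== ids.length) {
    problems.push(`Metadata has ${metadatas.length} items but Ids has ${ids.length}`);
  }

  // IDs must be unique strings within the namespace
  const seen = new Set();
  ids.forEach((id, i) => {
    if (typeof id !== "string") {
      problems.push(`Id at index ${i} is not a string`);
    } else if (seen.has(id)) {
      problems.push(`Duplicate id "${id}" at index ${i}`);
    } else {
      seen.add(id);
    }
  });

  // Every vector must match the index dimension exactly
  values.forEach((v, i) => {
    if (!Array.isArray(v) || v.length !== dimension) {
      const len = Array.isArray(v) ? v.length : "n/a";
      problems.push(`Vector at index ${i} has dimension ${len}, expected ${dimension}`);
    }
  });
  return problems;
}
```

An empty result only means the inputs follow the rules above; Pinecone can still reject a request for other reasons, such as exceeding the request size limit.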

Best Practices

  • Always include meaningful metadata for filtering (e.g., category, date, source)
  • Use descriptive IDs that you can reference later (e.g., doc_123, product_abc)
  • Batch vectors in groups of 100-200 for optimal performance
  • Use namespaces to separate different data types or tenants
  • Normalize vectors to unit length before upserting when using the dotproduct metric, so scores behave like cosine similarity
  • Store original text or image references in metadata for retrieval
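Normalization matters for the dotproduct metric because, for unit-length vectors, the dot product equals cosine similarity. A minimal sketch of L2 normalization (normalize is a hypothetical helper, not part of the node):

```javascript
// L2-normalizes a vector so its Euclidean length is 1.
function normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) {
    // A zero vector has no direction and cannot be normalized
    throw new Error("Cannot normalize a zero vector");
  }
  return vector.map((x) => x / norm);
}

const unit = normalize([3, 4]);
console.log(unit); // [ 0.6, 0.8 ]
```

Apply it to every vector in the Values array (for example, values.map(normalize)) before passing the array to the Upsert node.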

Metadata Best Practices

Metadata is crucial for effective vector search:

  • Keep metadata fields simple and queryable
  • Use consistent field names across all vectors
  • Common metadata fields:
    • text - Original text content
    • source - Document or source identifier
    • category - Classification or type
    • date - Timestamp or date
    • author - Creator or owner
    • url - Link to original content
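Pinecone's documentation lists strings, numbers, booleans, and lists of strings as supported metadata values, and it does not accept null values. The sketch below is built on that assumption: it drops null or undefined fields and rejects anything else, such as nested objects or Date instances (store dates as ISO strings or numeric timestamps instead). cleanMetadata is a hypothetical helper.

```javascript
// Prepares one metadata object for upsert, keeping only value types that
// Pinecone documents as supported: string, number, boolean, list of strings.
function cleanMetadata(metadata) {
  const cleaned = {};
  for (const [key, value] of Object.entries(metadata)) {
    // Null values are not accepted; drop the key instead
    if (value === null || value === undefined) continue;

    const isScalar = ["string", "number", "boolean"].includes(typeof value);
    const isStringList =
      Array.isArray(value) && value.every((item) => typeof item === "string");
    if (!isScalar && !isStringList) {
      throw new Error(`Unsupported metadata value for "${key}"`);
    }
    cleaned[key] = value;
  }
  return cleaned;
}
```

Running each metadata object through a function like this (metadatas.map(cleanMetadata)) also helps keep field names and types consistent across vectors.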

Example: Upserting Document Embeddings

Input Preparation (JavaScript)

// Assume you have documents and their embeddings
const documents = [
  { id: "doc_1", text: "Introduction to AI", category: "education" },
  { id: "doc_2", text: "Machine Learning Basics", category: "education" },
  { id: "doc_3", text: "Deep Learning Guide", category: "advanced" }
];

// Embeddings from OpenAI or another model, one per document, in the same order.
// Shortened to 3 values for readability; real text-embedding-ada-002 vectors
// have 1536 values, which must match the index dimension.
const embeddings = [
  [0.1, 0.2, 0.3],
  [0.4, 0.5, 0.6],
  [0.7, 0.8, 0.9]
];

// Prepare for Upsert node
const ids = documents.map(d => d.id);
const values = embeddings;
const metadatas = documents.map(d => ({
  text: d.text,
  category: d.category,
  indexed_at: new Date().toISOString()
}));

// Set variables for Upsert node
msg.ids = ids;
msg.values = values;
msg.metadatas = metadatas;

Upsert Node Configuration

Connection Id: {{connection_id}}
Host URL: https://my-index-abc123.svc.us-east-1-aws.pinecone.io
Ids: {{ids}}
Values: {{values}}
Metadata: {{metadatas}}
Name Space: documents

Example: Upserting Product Embeddings

Input Preparation

const products = [
  {
    id: "prod_001",
    name: "Wireless Headphones",
    price: 99.99,
    category: "electronics"
  },
  {
    id: "prod_002",
    name: "Running Shoes",
    price: 79.99,
    category: "sports"
  }
];

// Generate embeddings from product names.
// generateEmbeddings is a placeholder for your own embedding call; it should
// return one vector per input string, in the same order.
const embeddings = await generateEmbeddings(products.map(p => p.name));

msg.ids = products.map(p => p.id);
msg.values = embeddings;
msg.metadatas = products.map(p => ({
  name: p.name,
  price: p.price,
  category: p.category,
  in_stock: true
}));

Example: Using Namespaces

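// Separate customer data by organization
// (reference this in the Name Space option, e.g. {{namespace}})
msg.namespace = "org_abc123";
msg.ids = ["cust_1", "cust_2"];
// Shortened to 3 values for readability; use full-dimension embeddings
msg.values = [[0.1, 0.2, 0.3], [0.3, 0.4, 0.5]];
msg.metadatas = [
  { name: "John Doe", tier: "premium" },
  { name: "Jane Smith", tier: "basic" }
];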

Batch Processing Large Datasets

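// Assumes allVectors is an array of { id, embedding, metadata } objects
const BATCH_SIZE = 100;
const batches = [];

for (let i = 0; i < allVectors.length; i += BATCH_SIZE) {
  const batch = allVectors.slice(i, i + BATCH_SIZE);
  batches.push({
    ids: batch.map(v => v.id),
    values: batch.map(v => v.embedding),
    metadatas: batch.map(v => v.metadata)
  });
}

// Assigning msg.ids inside the loop would overwrite each batch before the
// Upsert node runs, so store all batches and let the flow loop over them,
// mapping each batch's ids/values/metadatas to the Upsert node inputs and
// adding a delay between batches to avoid rate limits
msg.batches = batches;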

Troubleshooting

Error: "Dimension mismatch"

  • Verify your embeddings have the correct dimension
  • Check that all vectors in the Values array have the same length
  • Ensure the dimension matches your index configuration

Error: "Ids cannot be empty"

  • Ensure Ids array is populated before the Upsert node
  • Check that the variable reference is correct (e.g., {{ids}})

Error: "Arrays length mismatch"

  • Ids, Values, and Metadata arrays must all have the same length
  • Verify array construction in your preparation code

Upsert succeeds but vectors not found in queries

  • Check that you're querying the correct namespace
  • Wait a few seconds for the index to update (eventual consistency)
  • Verify the vectors were actually upserted (check response count)

Rate limit errors

  • Reduce batch size
  • Add delays between consecutive upserts
  • Upgrade your Pinecone plan for higher throughput
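Delays between upserts can be made adaptive with exponential backoff. This sketch assumes your upsert call throws an error object whose numeric status property holds the HTTP status code; withBackoff and that error shape are assumptions, not part of the Upsert node.

```javascript
// Retries an async upsert call when it fails with HTTP 429 (rate limited),
// doubling the wait after each failed attempt. Other errors are rethrown at once.
async function withBackoff(upsertFn, maxRetries = 5, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await upsertFn();
    } catch (err) {
      const rateLimited = err && err.status === 429;
      if (!rateLimited || attempt >= maxRetries) throw err;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Doubling the wait backs off quickly under sustained throttling while still retrying promptly after a brief spike; capping maxRetries keeps a persistent limit from stalling the flow indefinitely.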