Create Index

Creates a new Pinecone index with the specified configuration, including dimension, distance metric, and pod settings.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Number of seconds to wait before executing the node.
  • Delay After (sec) - Number of seconds to wait after executing the node.
  • Continue On Error - If true, the automation continues even when this node returns an error. The default value is false.

Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - Connection identifier from the Connect node (optional if an API Key credential is provided directly).
  • Environment - Your Pinecone environment (e.g., us-east-1-aws, us-west1-gcp, eu-west1-gcp). You can find this in your Pinecone console.
  • Name - The name for the new index. Must be unique within your project.
  • Dimension - The dimension of vectors to be stored in the index. Must match the output dimension of your embedding model (e.g., 1536 for OpenAI's text-embedding-ada-002).

Options

  • API Key - Pinecone API key credential (optional; use this instead of Connection Id when you are not using the Connect node).
  • Metric - Distance metric for similarity search. Default: cosine
    • cosine - Cosine similarity (recommended for most use cases)
    • euclidean - Euclidean distance
    • dotproduct - Dot product
  • Pods - Number of pods for the index. Default: 1. Adding pods increases throughput and storage capacity.
  • Replicas - Number of replicas for high availability. Default: 1. Adding replicas improves availability.
  • Pod Type - Pod type/tier. Default: p1. Options include:
    • s1 - Storage-optimized (cost-effective for large datasets)
    • p1 - Performance-optimized (balanced performance and cost)
    • p2 - Premium performance (highest performance)
  • Source Collection - Optional. Name of a collection to create the index from (for cloning existing data).

Output

This node does not return any output variables. The index creation is confirmed by successful execution.

How It Works

The Create Index node initializes a new vector database index in Pinecone with your specified configuration. When executed, the node:

  1. Validates the required inputs (environment, name, dimension)
  2. Validates that dimension is a valid integer
  3. Converts pod and replica counts from strings to integers if provided
  4. Constructs the index configuration with all specified parameters
  5. Sends a POST request to Pinecone's controller API to create the index
  6. Returns success (202) if the index creation is initiated

Warning: Index creation is an asynchronous operation. The node returns success when creation is initiated, but the index may take several minutes to become fully ready. Use the List or Describe Indexes node to check the index status before upserting or querying.
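
The steps above can be sketched in Python. This is only an illustration, not the node's actual implementation: the function name `build_create_index_request` is hypothetical, and the controller URL pattern, the `Api-Key` header, and the JSON field names are assumptions based on Pinecone's legacy pod-based API. The sketch builds the request without sending it, so it can be inspected safely.

```python
import json


def build_create_index_request(api_key, environment, name, dimension,
                               metric="cosine", pods=None, replicas=None,
                               pod_type=None, source_collection=None):
    """Build (url, headers, body) for an index-creation POST without sending it."""
    # Assumption: legacy controller endpoint pattern; verify it for your account.
    url = f"https://controller.{environment}.pinecone.io/databases"
    headers = {"Api-Key": api_key, "Content-Type": "application/json"}

    # Required fields; dimension arrives as text and is converted to an integer.
    payload = {"name": name, "dimension": int(dimension), "metric": metric}

    # Optional counts are converted from strings only when they were provided.
    if pods not in (None, ""):
        payload["pods"] = int(pods)
    if replicas not in (None, ""):
        payload["replicas"] = int(replicas)
    if pod_type:
        payload["pod_type"] = pod_type
    if source_collection:
        payload["source_collection"] = source_collection

    return url, headers, json.dumps(payload)


url, headers, body = build_create_index_request(
    "YOUR_API_KEY", "us-east-1-aws", "documents-ada002", "1536",
    pods="1", replicas="1", pod_type="p1",
)
print(url)
print(body)
```

Keeping request construction separate from sending makes the configuration easy to review before any index is created.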

Requirements

  • A valid Pinecone account and API key
  • Sufficient quota in your Pinecone plan for the requested index configuration
  • Unique index name within your project

Error Handling

The node will return specific errors in the following cases:

  • ErrInvalidArg - Environment, Name, or Dimension is empty
  • ErrInvalidArg - Dimension, Pods, or Replicas is not a valid number
  • ErrInvalidArg - Invalid Connection ID or missing API key
  • ErrStatus - HTTP error from Pinecone API (quota exceeded, invalid parameters, etc.)
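
The argument checks listed above can be approximated before running the node. The following is a minimal sketch, assuming the inputs arrive as strings; `InvalidArgError`, `parse_count`, and `validate_inputs` are hypothetical names that stand in for the node's ErrInvalidArg behavior and are not part of the node itself.

```python
class InvalidArgError(ValueError):
    """Hypothetical stand-in for the node's ErrInvalidArg."""


def parse_count(label, value):
    """Convert a textual number to int, or raise InvalidArgError."""
    try:
        return int(str(value).strip())
    except ValueError:
        raise InvalidArgError(f"{label} is not a valid number: {value!r}") from None


def validate_inputs(environment, name, dimension, pods="", replicas=""):
    """Check required fields, then parse the numeric ones."""
    required = (("Environment", environment), ("Name", name), ("Dimension", dimension))
    for label, value in required:
        if value is None or str(value).strip() == "":
            raise InvalidArgError(f"{label} is empty")

    parsed = {"dimension": parse_count("Dimension", dimension)}
    # Pods and Replicas are optional; parse them only when provided.
    if pods is not None and str(pods).strip():
        parsed["pods"] = parse_count("Pods", pods)
    if replicas is not None and str(replicas).strip():
        parsed["replicas"] = parse_count("Replicas", replicas)
    return parsed


print(validate_inputs("us-east-1-aws", "documents-ada002", "1536", pods="2"))
```

Errors reported by the Pinecone API itself (ErrStatus) cannot be predicted this way; they only appear once the request is sent.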

Usage Notes

  • The dimension must exactly match your embedding model's output dimension
  • Choose the metric based on your use case:
    • Cosine - Best for most text embeddings (handles different vector magnitudes)
    • Euclidean - Good when magnitude matters (e.g., image embeddings)
    • Dotproduct - Fast when vectors are pre-normalized
  • Index names must be lowercase and can contain letters, numbers, and hyphens
  • You cannot change the dimension or metric after index creation
  • Index creation typically takes 2-5 minutes to complete
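
To make the metric choice concrete, the sketch below computes all three measures in plain Python for two vectors that point in the same direction but differ in magnitude. The helper names are illustrative only; Pinecone computes these scores server-side.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Euclidean (straight-line) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

print("cosine:    ", cosine(a, b))
print("euclidean: ", euclidean(a, b))
print("dotproduct:", dot(a, b))
print("dot of unit vectors:", dot(normalize(a), normalize(b)))
```

Cosine treats the two vectors as a perfect match because it ignores magnitude, while Euclidean distance and the raw dot product both reflect the difference in length. After normalization, the dot product equals the cosine similarity, which is why dotproduct suits pre-normalized vectors.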

Best Practices

  • Use descriptive index names (e.g., product-embeddings-ada002)
  • Start with 1 pod and 1 replica, then scale based on performance needs
  • Document your embedding model and dimension for future reference
  • Consider using s1 pods for large datasets to reduce costs
  • Use replicas (2+) for production workloads requiring high availability
  • Wait for the index to be ready before upserting data (check with the List/Describe Indexes node)
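
The wait-until-ready practice can be expressed as a small polling loop. This is a sketch under stated assumptions: `describe` is a hypothetical callable that you supply (for example, a wrapper around the Describe Indexes node or an API call), and it is assumed to return a dictionary shaped like `{"status": {"ready": bool}}`.

```python
import time


def wait_until_ready(describe, name, timeout=600, interval=10,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll describe(name) until the index reports ready or the timeout expires."""
    deadline = clock() + timeout
    while True:
        info = describe(name)
        # Assumed response shape: {"status": {"ready": True or False}}
        if info.get("status", {}).get("ready"):
            return info
        if clock() >= deadline:
            raise TimeoutError(f"index {name!r} not ready after {timeout} seconds")
        sleep(interval)


# Usage with a stand-in describe function that becomes ready on the third call.
calls = {"count": 0}


def fake_describe(name):
    calls["count"] += 1
    return {"name": name, "status": {"ready": calls["count"] >= 3}}


result = wait_until_ready(fake_describe, "documents-ada002", timeout=30, interval=0)
print(result["status"], "after", calls["count"], "checks")
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.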

Common Dimension Values

  • OpenAI text-embedding-ada-002: 1536
  • OpenAI text-embedding-3-small: 1536
  • OpenAI text-embedding-3-large: 3072
  • Sentence Transformers (all-MiniLM-L6-v2): 384
  • Cohere embed-english-v3.0: 1024
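
Because the dimension cannot be changed after creation, it can help to check it against the model before creating the index. The sketch below encodes the values from the list above as a lookup table; `KNOWN_DIMENSIONS` and `matches_model` are hypothetical names, and the table covers only the models listed here.

```python
# Dimension values copied from the list above.
KNOWN_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
    "embed-english-v3.0": 1024,
}


def matches_model(model, dimension):
    """Return True if dimension equals the known output size of the model."""
    if model not in KNOWN_DIMENSIONS:
        raise KeyError(f"unknown model: {model}")
    return KNOWN_DIMENSIONS[model] == int(dimension)


print(matches_model("text-embedding-ada-002", "1536"))
print(matches_model("all-MiniLM-L6-v2", 1536))
```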

Example Configuration

Basic Index for OpenAI Embeddings

Environment: us-east-1-aws
Name: documents-ada002
Dimension: 1536
Metric: cosine
Pods: 1
Replicas: 1
Pod Type: p1

High-Performance Production Index

Environment: us-east-1-aws
Name: production-search
Dimension: 1536
Metric: cosine
Pods: 3
Replicas: 2
Pod Type: p2

Cost-Optimized Large Dataset

Environment: us-west1-gcp
Name: archive-embeddings
Dimension: 384
Metric: cosine
Pods: 2
Replicas: 1
Pod Type: s1

Troubleshooting

Error: "Dimension is not valid"

  • Ensure dimension is a positive integer
  • Verify it matches your embedding model's output

Error: "Index name already exists"

  • Choose a different name or delete the existing index first
  • Index names must be unique within your project

Error: "Quota exceeded"

  • Check your Pinecone plan limits
  • Upgrade your plan or delete unused indexes

Index takes too long to create

  • This is normal, especially for larger configurations
  • Large indexes (many pods) can take 5-10 minutes
  • Check status with List/Describe Indexes node