InferenceScale

Unleash the Power of Billion-Scale Inference

Generate embeddings or LLM outputs in hours, not weeks. Scale your AI without limits.

How InferenceScale works

  1. Upload your data or provide an S3 link
  2. Choose your open-source model and task
  3. InferenceScale processes your data at scale
  4. Monitor progress in real time
  5. Get a results URL or download directly
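
In code, steps 2 through 5 look like this with the Python SDK (the quickstart below takes the S3 path for step 1):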


from inferencescale import Client, Model
import time

client = Client("YOUR_API_KEY")

# Kick off a batch embedding job over one column of a Parquet file in S3
job = client.create_job(
    model=Model.SENTENCE_TRANSFORMER_ALL_MPNET_BASE_V2,
    input_file="s3://your-bucket/data.parquet",
    input_column="text_to_process",
    task="embedding"
)

while not job.is_complete():
    status = job.get_status()
    print(f"Progress: {status.progress}%")
    time.sleep(60)  # Poll every 60 seconds

results_url = job.get_results_url()
print(f"Embeddings available at: {results_url}")
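
The exact shape of the results file isn't shown above, so here's a minimal sketch of pulling the output down with pandas, assuming the URL points to a Parquet file that mirrors your input rows plus an added embedding column:

import pandas as pd

# Assumption: the results file is Parquet, with the original
# "text_to_process" column plus an added "embedding" column;
# the real output schema may differ.
results = pd.read_parquet(results_url)  # pandas reads HTTP/S3 URLs via fsspec

print(results.columns.tolist())
print(len(results["embedding"].iloc[0]))  # all-mpnet-base-v2 vectors are 768-dim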

Why We Built This Product

We built InferenceScale because, as AI engineers, we found it took days to embed or process our datasets with LLMs and embedding models, and that was with our own GPUs. We wished there was a faster way. Here are some use cases where you may need to embed or process large amounts of data.

Vector Database Population

Populate vector databases with your own data efficiently. Enhance search and retrieval with high-quality embeddings.

340+ developer hours saved
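
Once a job's output is downloaded, loading it into a vector database is a short script. Here's a minimal sketch using Qdrant, assuming the results were saved to results.parquet with text_to_process and embedding columns (that schema is an assumption, not documented above):

import pandas as pd
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Assumption: results.parquet has "text_to_process" and "embedding" columns
df = pd.read_parquet("results.parquet")

qdrant = QdrantClient(":memory:")  # swap in your real Qdrant URL
qdrant.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),  # all-mpnet-base-v2 is 768-dim
)
qdrant.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=i,
            vector=[float(x) for x in row["embedding"]],
            payload={"text": row["text_to_process"]},
        )
        for i, row in df.iterrows()
    ],
)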

Synthetic Dataset Creation

Classify existing data and generate synthetic datasets for a wide range of applications. Improve model training with high-quality, diverse data.

120+ developer hours saved
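
Only the embedding task appears in the quickstart above, so the following is a hypothetical sketch of what a classification job could look like; the model name, task value, and labels argument are illustrative guesses, not confirmed SDK surface:

from inferencescale import Client, Model

client = Client("YOUR_API_KEY")

# Hypothetical: the model enum, "classification" task, and "labels"
# parameter are assumptions; only create_job is shown in the quickstart.
job = client.create_job(
    model=Model.LLAMA_3_8B_INSTRUCT,
    input_file="s3://your-bucket/raw_text.parquet",
    input_column="text_to_process",
    task="classification",
    labels=["question", "instruction", "statement"],
)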

Ecommerce Description Enhancement

Improve the quality of ecommerce product descriptions with AI. Enhance the customer experience and increase sales with better product information.

200+ developer hours saved
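
In the same hypothetical vein, a description-rewriting job might look like the sketch below; the "generation" task and the prompt_template parameter are assumptions layered onto the create_job call shown earlier:

from inferencescale import Client, Model

client = Client("YOUR_API_KEY")

# Hypothetical: the "generation" task and prompt_template parameter are
# assumptions; substitute whatever the real SDK exposes.
job = client.create_job(
    model=Model.LLAMA_3_8B_INSTRUCT,  # assumed model enum
    input_file="s3://your-bucket/products.parquet",
    input_column="raw_description",
    task="generation",
    prompt_template=(
        "Rewrite this product description to be clear, benefit-focused, "
        "and under 80 words:\n\n{raw_description}"
    ),
)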

Join our alpha program

Be among the first to scale your AI inference

We're onboarding a limited number of alpha users. Get early access to InferenceScale and help shape the future of large-scale AI processing.