LlamaIndex
Optimize your RAG pipeline costs with Promptly
Integrate Promptly with LlamaIndex to reduce LLM costs in RAG applications. Especially effective for long-context queries where context pruning and caching deliver maximum savings.
Setup Guide
Install LlamaIndex
Install the LlamaIndex OpenAI LLM integration.
pip install llama-index-llms-openai
Configure the LLM
Point LlamaIndex's OpenAI LLM to Promptly. All index queries, summarization, and agent calls are automatically optimized.
from llama_index.llms.openai import OpenAI
llm = OpenAI(
    model="gpt-4o",
    api_key="sk-promptly-...",
    api_base="https://api.getpromptly.in/v1",
)
Use in a RAG Pipeline
RAG queries with large retrieved contexts benefit the most from Promptly's optimization - context pruning trims irrelevant chunks, and semantic caching catches repeated queries.
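To see why pruning pays off, here is a toy sketch of the idea: score each retrieved chunk by word overlap with the query and keep only the top scorers, so fewer tokens reach the LLM. The overlap scoring is a stand-in for illustration only, not Promptly's actual pruning algorithm.

```python
# Toy sketch of context pruning: rank retrieved chunks by word overlap
# with the query and keep only the best ones, so fewer tokens reach the
# LLM. Illustration only; not Promptly's actual pruning algorithm.
import re

STOPWORDS = {"what", "are", "the", "is", "a", "of", "in"}

def words(text: str) -> set[str]:
    """Lowercase word set, minus common stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def prune_chunks(query: str, chunks: list[str], keep: int = 2) -> list[str]:
    """Keep the `keep` chunks that share the most words with the query."""
    query_words = words(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & words(c)), reverse=True)
    return ranked[:keep]

chunks = [
    "Key findings: revenue grew 40% year over year.",
    "The office relocated to a new building in March.",
    "Findings also show churn dropped below 2%.",
]
# The off-topic office chunk is dropped; the two findings chunks survive.
print(prune_chunks("What are the key findings?", chunks))
```

The LlamaIndex pipeline itself needs no changes: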
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What are the key findings?")
Why Use Promptly with LlamaIndex?
- Massive savings on RAG queries (context pruning reduces large contexts)
- Semantic caching catches repeated knowledge base queries
- Smart routing sends simple extraction tasks to cheaper models
- Works with all LlamaIndex index types and query engines
- Per-query cost tracking in Promptly analytics
Start optimizing LlamaIndex costs
Sign up, grab your API key, and change your base URL. Under 2 minutes.
Other Integrations
Promptly Python SDK
Official Python SDK - the fastest way to get started
Promptly Node.js SDK
Official Node.js SDK - TypeScript-first, zero config
LangChain
Use Promptly as your LLM backend in LangChain
Vercel AI SDK
Optimize AI SDK streaming responses with Promptly
OpenAI Python SDK
Drop-in optimization for the official OpenAI Python library