Unified LLM Client

Provider-agnostic TypeScript client for Anthropic, OpenAI, and Google Gemini with shared message types, streaming, conversations, cost tracking, and pluggable session storage.

Features

  • One LLMClient surface for Anthropic, OpenAI, and Gemini
  • Canonical request/response types, including tools and multimodal parts
  • OpenAI requests go through the stateless Responses API, while library-owned conversation state remains the source of truth
  • defineTool() helper for typed tool definitions
  • Non-streaming and streaming completions with explicit stream.cancel()
  • Conversation state with running token and cost totals
  • Automatic tool execution in conversations, including streaming pause/execute/resume
  • Context trimming via sliding window or summarisation strategies
  • Session persistence with InMemorySessionStore, PostgresSessionStore, and RedisSessionStore
  • Automatic Postgres session persistence when DATABASE_URL is present
  • Built-in framework-agnostic Session API handler with Request/Response endpoints
  • Model routing, fallback chains, weighted A/B routing, and usage logging
  • Live provider model discovery via client.models.listRemote({ provider })
  • Google Embedding 2 support through client.embed()
  • OpenAI batch speech support through client.speak() and client.transcribe()
  • Optional retrieval helpers via unified-llm-client/retrieval
  • Budget breach policies: throw, warn, or skip
  • Usage aggregation export as JSON or CSV
  • Edge-safe core imports with Node-only Postgres features loaded lazily
  • LLMClient.mock() for deterministic tests

Install

Use As A Library From GitHub

Install the package in another project directly from GitHub:

pnpm add github:07rjain/LLMlibrary

or:

pnpm add git+https://github.com/07rjain/LLMlibrary.git

The package runs prepare during Git installs, so the consumer project gets a built dist output automatically.

Develop Locally

pnpm install

Create a local environment file from the example:

cp .env.example .env

Environment

The library reads provider keys from environment variables when they are not passed directly.

ANTHROPIC_API_KEY=
OPENAI_API_KEY=
OPENAI_ORG_ID=
OPENAI_PROJECT_ID=
GEMINI_API_KEY=
DATABASE_URL=

If DATABASE_URL is set, LLMClient will automatically use PostgresSessionStore.fromEnv() for conversation() calls unless you pass an explicit sessionStore.
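
The precedence described above can be pictured with a small sketch. This is not the library's code: chooseSessionStore and StoreChoice are hypothetical names, and the 'none' result is only a placeholder, since what LLMClient does when neither an explicit store nor DATABASE_URL is present is not stated here.

```typescript
// Illustrative only: names and return values are assumptions, not library API.
type StoreChoice = 'explicit' | 'postgres-from-env' | 'none';

function chooseSessionStore(
  options: { sessionStore?: unknown },
  env: Record<string, string | undefined>,
): StoreChoice {
  // An explicitly passed store always wins.
  if (options.sessionStore !== undefined) return 'explicit';
  // Otherwise a non-empty DATABASE_URL selects the Postgres store.
  if (env['DATABASE_URL']) return 'postgres-from-env';
  // Placeholder: the real fallback behaviour is not documented in this README.
  return 'none';
}

console.log(chooseSessionStore({}, { DATABASE_URL: 'postgres://localhost/app' }));
console.log(chooseSessionStore({ sessionStore: {} }, { DATABASE_URL: 'postgres://localhost/app' }));
```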

Quick Start

import { LLMClient } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

const response = await client.complete({
  messages: [{ content: 'Say hello in one sentence.', role: 'user' }],
});

console.log(response.text);
console.log(response.usage.costUSD);

Use response.usage.costUSD for arithmetic, alerts, and persistence. response.usage.cost is the pre-formatted display string.
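
Because costUSD is numeric, it can be summed and compared directly. The sketch below is illustrative: UsageLike is a hypothetical stand-in that assumes only the two fields described above, and the helper names and sample values are made up.

```typescript
// Hypothetical stand-in for the usage object; only the two fields
// described above are assumed.
interface UsageLike {
  cost: string; // pre-formatted display string
  costUSD: number; // numeric value, safe for arithmetic
}

// Sum numeric costs across several responses.
function totalCostUSD(usages: UsageLike[]): number {
  return usages.reduce((sum, usage) => sum + usage.costUSD, 0);
}

// True when accumulated spend has reached or passed the budget.
function isOverBudget(usages: UsageLike[], budgetUSD: number): boolean {
  return totalCostUSD(usages) >= budgetUSD;
}

const sampleUsages: UsageLike[] = [
  { cost: '$0.0025', costUSD: 0.0025 },
  { cost: '$0.0100', costUSD: 0.01 },
];

console.log(totalCostUSD(sampleUsages).toFixed(4)); // "0.0125"
console.log(isOverBudget(sampleUsages, 0.01)); // true
```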

Conversations

import { LLMClient, SlidingWindowStrategy } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

const conversation = await client.conversation({
  contextManager: new SlidingWindowStrategy({
    maxMessages: 12,
    maxTokens: 16_000,
  }),
  sessionId: 'customer-support-1',
  system: 'You are concise and operational.',
});

await conversation.send('Summarise the last user issue.');
console.log(conversation.totals);
console.log(conversation.toMarkdown());
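
To make the two limits concrete, here is a minimal sketch of one way a sliding window could trim history. It is not SlidingWindowStrategy itself: ChatMessage, estimateTokens, and slidingWindow are hypothetical, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```typescript
// Hypothetical message shape, for illustration only.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Rough heuristic (about 4 characters per token); an assumption, not a tokenizer.
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

// Keep the newest messages that fit within both limits, in original order.
function slidingWindow(
  messages: ChatMessage[],
  limits: { maxMessages: number; maxTokens: number },
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let tokens = 0;
  // Walk from newest to oldest so recent messages are considered first.
  for (const message of messages.slice().reverse()) {
    const cost = estimateTokens(message);
    if (kept.length >= limits.maxMessages || tokens + cost > limits.maxTokens) break;
    kept.unshift(message);
    tokens += cost;
  }
  return kept;
}
```

Whichever limit is hit first ends the window, so a few long messages can trim history well before maxMessages is reached.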

Streaming

const stream = client.stream({
  messages: [{ content: 'Stream one sentence.', role: 'user' }],
});

for await (const chunk of stream) {
  if (chunk.type === 'text-delta') {
    process.stdout.write(chunk.delta);
  }
}

// Or cancel explicitly if the caller navigates away.
stream.cancel(new Error('Request no longer needed.'));
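
If the caller wants the full text rather than incremental output, the deltas can be accumulated. The helper below is a sketch over any async iterable: StreamChunk assumes only the 'text-delta' variant shown above (the 'done' variant is a placeholder for other chunk types), and fakeStream is a hypothetical stand-in for a real client.stream() result.

```typescript
// Hypothetical chunk shape; only 'text-delta' is taken from the example above.
type StreamChunk =
  | { type: 'text-delta'; delta: string }
  | { type: 'done' };

// Accumulate text deltas from any async iterable of chunks.
async function collectText(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    if (chunk.type === 'text-delta') {
      text += chunk.delta;
    }
  }
  return text;
}

// Stand-in stream for local testing without a provider call.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { type: 'text-delta', delta: 'Hello' };
  yield { type: 'text-delta', delta: ', world.' };
  yield { type: 'done' };
}

collectText(fakeStream()).then((text) => console.log(text)); // "Hello, world."
```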

Usage Export

const csv = await client.exportUsage('csv', {
  tenantId: 'tenant-1',
});

console.log(csv);

Speech usage is tracked separately because the units are different from text tokens:

const speechCsv = await client.exportSpeechUsage('csv', {
  tenantId: 'tenant-1',
});

Speech

OpenAI batch text-to-speech and speech-to-text are available as explicit APIs:

const speech = await client.speak({
  input: 'Your appointment is confirmed for 10 AM.',
  model: 'gpt-4o-mini-tts',
  voice: 'alloy',
  format: 'mp3',
  estimatedOutputSeconds: 4,
});

const transcript = await client.transcribe({
  input: {
    data: audioBase64,
    filename: 'call.wav',
    mediaType: 'audio/wav',
  },
  inputAudioSeconds: 42,
  model: 'gpt-4o-mini-transcribe',
});

console.log(speech.audio); // Uint8Array
console.log(transcript.text);
console.log(speech.usage?.costUSD);

Use usage.costUSD for arithmetic and billing. usage.cost is a formatted display string. Speech audio and transcripts are not stored by the library; keep storage and retention in your application layer.

Summarisation Strategy

SummarisationStrategy accepts a summarizer() callback. In production, point that callback at a cheaper model or internal summarisation service.

import { LLMClient, SummarisationStrategy } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

const conversation = await client.conversation({
  contextManager: new SummarisationStrategy({
    keepLastMessages: 2,
    maxMessages: 10,
    summarizer: async (messages) => {
      const summary = await client.complete({
        messages: [
          {
            content: `Summarise this conversation history:\n${JSON.stringify(messages)}`,
            role: 'user',
          },
        ],
        model: 'gpt-4o-mini',
      });

      return summary.text;
    },
  }),
});
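
The interaction between maxMessages and keepLastMessages can be sketched as follows. This is an assumption about the general technique, not SummarisationStrategy's actual behaviour: compactHistory and HistoryMessage are hypothetical, and placing the summary in a system message is one possible choice.

```typescript
// Hypothetical message shape, for illustration only.
interface HistoryMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Once history exceeds maxMessages, replace everything except the last
// keepLastMessages entries with a single summary message.
async function compactHistory(
  messages: HistoryMessage[],
  options: {
    keepLastMessages: number;
    maxMessages: number;
    summarizer: (older: HistoryMessage[]) => Promise<string>;
  },
): Promise<HistoryMessage[]> {
  if (messages.length <= options.maxMessages) return messages;
  const cut = Math.max(0, messages.length - options.keepLastMessages);
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);
  const summary = await options.summarizer(older);
  return [
    { role: 'system', content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```

The summarizer is only invoked when the limit is exceeded, which keeps the cheaper-model call off the hot path for short conversations.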

Session Stores

Postgres

import { LLMClient, PostgresSessionStore } from 'unified-llm-client';

const client = new LLMClient({
  defaultModel: 'gpt-4o',
  sessionStore: PostgresSessionStore.fromEnv(),
});

Redis

RedisSessionStore is bring-your-own-client. Pass any Redis client that implements get(), set(), del(), and either scanIterator() or keys().

import { LLMClient, RedisSessionStore } from 'unified-llm-client';

const sessionStore = new RedisSessionStore({
  client: redisClient,
  ttlSeconds: 3600,
});

const client = new LLMClient({
  defaultModel: 'gpt-4o',
  sessionStore,
});
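
For unit tests, a tiny in-memory object with the listed methods may be enough to stand in for redisClient. The sketch below is a guess at a compatible shape: the exact signatures RedisSessionStore expects (for example, how TTL options are passed to set()) are not specified here, so FakeRedisClient is hypothetical and ignores expiry entirely.

```typescript
// Hypothetical minimal client built from the method names listed above.
// Signatures are assumptions; TTL handling is intentionally omitted.
class FakeRedisClient {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<'OK'> {
    this.data.set(key, value);
    return 'OK';
  }

  // Returns the number of keys removed, as Redis DEL does.
  async del(key: string): Promise<number> {
    return this.data.delete(key) ? 1 : 0;
  }

  // Supports only an exact key or a trailing '*' wildcard.
  async keys(pattern: string): Promise<string[]> {
    const isPrefix = pattern.endsWith('*');
    const prefix = isPrefix ? pattern.slice(0, -1) : pattern;
    return Array.from(this.data.keys()).filter((key) =>
      isPrefix ? key.startsWith(prefix) : key === prefix,
    );
  }
}
```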

Session API

The package also exports a framework-agnostic session API handler. It accepts standard Request objects and returns standard Response objects, so it can be mounted in Express, Fastify, Hono, Next.js route handlers, Cloudflare Workers, or plain Node HTTP adapters.

import { LLMClient, PostgresSessionStore, createSessionApi } from 'unified-llm-client';

const store = PostgresSessionStore.fromEnv();
const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
  sessionStore: store,
});

const sessionApi = createSessionApi({
  client,
  sessionStore: store,
});

const response = await sessionApi.handle(
  new Request('https://example.test/sessions', {
    body: JSON.stringify({ sessionId: 'demo-session', system: 'Be concise.' }),
    headers: { 'content-type': 'application/json' },
    method: 'POST',
  }),
);

Supported endpoints include:

  • POST /sessions
  • POST /sessions/{id}/message
  • GET /sessions/{id}
  • GET /sessions/{id}/messages
  • DELETE /sessions/{id}
  • POST /sessions/{id}/compact
  • POST /sessions/{id}/fork
  • GET /sessions

For the full endpoint contract and the OpenAI Responses-style mapping notes, see SESSION_API.md.
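
Because the handler speaks standard Request and Response, mounting it under a path prefix can be a thin wrapper. The sketch below assumes only a handle(request) method as used above; mountUnder, SessionRequestHandler, and echoHandler are hypothetical names, and the stub handler stands in for a real sessionApi.

```typescript
// Hypothetical handler shape: anything exposing handle(Request) => Promise<Response>.
interface SessionRequestHandler {
  handle(request: Request): Promise<Response>;
}

// Strip a mount prefix (for example '/api/llm') before delegating, so the
// handler sees paths such as '/sessions/{id}'.
function mountUnder(prefix: string, handler: SessionRequestHandler) {
  return async (request: Request): Promise<Response> => {
    const url = new URL(request.url);
    if (!url.pathname.startsWith(`${prefix}/`)) {
      return new Response('Not found', { status: 404 });
    }
    url.pathname = url.pathname.slice(prefix.length);
    // GET and HEAD requests must not carry a body.
    const hasBody = request.method !== 'GET' && request.method !== 'HEAD';
    const forwarded = new Request(url.toString(), {
      method: request.method,
      headers: request.headers,
      body: hasBody ? await request.text() : undefined,
    });
    return handler.handle(forwarded);
  };
}

// Stub handler that echoes the method and path it received.
const echoHandler: SessionRequestHandler = {
  async handle(request) {
    const { pathname } = new URL(request.url);
    return new Response(`${request.method} ${pathname}`);
  },
};
```

The returned function has the fetch-handler shape, so it can be used wherever a runtime expects a function from Request to Response.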

Remote Model Discovery

Use client.models.listRemote({ provider }) when you want the provider's current live model list instead of the checked-in local registry.

const googleModels = await client.models.listRemote({
  provider: 'google',
});

console.log(googleModels[0]?.id);
console.log(googleModels[0]?.supportedActions);

listRemote() is discovery-only. It does not auto-register those models into the local routing registry, so complete() and stream() still require either a known built-in model or a manual client.models.register(...) step.

Embeddings

Google Embedding 2 is the current embeddings surface for v1.

import { LLMClient } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultEmbeddingModel: 'gemini-embedding-2',
});

const embedding = await client.embed({
  input: 'Refunds are available for 30 days after purchase.',
  purpose: 'retrieval_document',
  providerOptions: {
    google: {
      title: 'Refund Policy',
    },
  },
});

console.log(embedding.embeddings[0]?.values.length);
console.log(embedding.usage?.inputTokens);

client.embed() is separate from complete() and conversation(). Embedding and generation can use different providers in the same application flow.
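
Once two texts have been embedded, their values arrays can be compared in application code. Cosine similarity is a common choice; the function below is a generic sketch and is not part of the library.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error('Vectors must be non-empty and the same length.');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, index) => {
    const other = b[index] ?? 0;
    dot += value * other;
    normA += value * value;
    normB += other * other;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```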

Retrieval Helpers

The package also ships optional app-layer retrieval helpers. They do not hide retrieval inside LLMClient; they help you compose retrieval before generation.

import { LLMClient } from 'unified-llm-client';
import {
  chunkText,
  createDenseRetriever,
  createInMemoryKnowledgeStore,
  createPostgresKnowledgeStore,
  formatRetrievedContext,
} from 'unified-llm-client/retrieval';
import { cleanText, stripHtml } from 'unified-llm-client/chunking';

const client = LLMClient.fromEnv({
  defaultEmbeddingModel: 'gemini-embedding-2',
  defaultModel: 'gpt-4o',
});

const knowledgeStore = createPostgresKnowledgeStore({
  connectionString: process.env.DATABASE_URL,
});

await knowledgeStore.ensureSchema();

const retriever = createDenseRetriever({
  embed: client,
  embedding: {
    model: 'gemini-embedding-2',
  },
  store: knowledgeStore,
});

const results = await retriever.search({
  filter: {
    botId: 'bot-1',
    embeddingProfileId: 'profile-2026-04-24',
    knowledgeSpaceId: 'kb-support',
    tenantId: 'tenant-1',
  },
  query: 'What is the refund window?',
  topK: 4,
});

const context = formatRetrievedContext(results, {
  maxResults: 4,
  maxTokens: 900,
});

const answer = await client.complete({
  messages: [
    {
      content: `Question: What is the refund window?\n\n${context.text}`,
      role: 'user',
    },
  ],
});

Chunking helpers are also available from the separate unified-llm-client/chunking subpath:

const cleaned = cleanText(stripHtml('<h1>Refund Policy</h1><p>Refunds last 30 days.</p>'));
const chunks = chunkText(cleaned, {
  chunkSize: 900,
  overlap: 120,
});
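
To illustrate what chunkSize and overlap mean, here is a minimal character-based chunker. It is a sketch of the general technique only: chunkByCharacters is hypothetical, and the library's chunkText() may count tokens or respect sentence boundaries instead.

```typescript
// Character-based sliding chunker, for illustration only.
// Each chunk starts (chunkSize - overlap) characters after the previous one.
function chunkByCharacters(
  text: string,
  options: { chunkSize: number; overlap: number },
): string[] {
  const { chunkSize, overlap } = options;
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize.');
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkByCharacters('abcdefghij', { chunkSize: 4, overlap: 1 }));
// [ 'abcd', 'defg', 'ghij' ]
```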

The retrieval module currently includes:

  • KnowledgeStore
  • Retriever
  • chunkText()
  • cleanText()
  • stripHtml()
  • createDenseRetriever()
  • createHybridRetriever()
  • createInMemoryKnowledgeStore()
  • createPostgresKnowledgeStore()
  • InMemoryKnowledgeStore
  • PostgresKnowledgeStore
  • createPgvectorHnswIndexSql()
  • mergeRetrievalCandidates()
  • formatRetrievedContext()

createDenseRetriever() and createHybridRetriever() now also accept optional rerank hooks, and PostgresKnowledgeStore now exposes active-profile and reindex helpers such as activateEmbeddingProfile(), getActiveEmbeddingProfile(), listKnowledgeSources(), and markKnowledgeSourcesNeedingReindex().

activateEmbeddingProfile() now throws a clear runtime error when the target knowledge space is missing or when the embedding profile does not belong to that scoped space. It no longer fails silently on scope mismatches.

When you use PostgresKnowledgeStore, search requests must stay fully scoped. Pass tenantId, botId, knowledgeSpaceId, and embeddingProfileId, and use the same embedding profile for chunk ingestion and live query embedding. The retrieval helpers intentionally do not take over chunking, ingestion queues, provider-managed reranking services, or automatic retrieval inside complete() / conversation().
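
An application-side guard can catch partially scoped filters before they reach the store. The sketch below uses the four field names from the paragraph above; RetrievalScope and assertFullyScoped are hypothetical helpers, not exports of the retrieval module.

```typescript
// The four scope fields named above; the interface name is hypothetical.
interface RetrievalScope {
  tenantId: string;
  botId: string;
  knowledgeSpaceId: string;
  embeddingProfileId: string;
}

const requiredScopeKeys = [
  'tenantId',
  'botId',
  'knowledgeSpaceId',
  'embeddingProfileId',
] as const;

// Throw early when any scope field is missing or empty.
function assertFullyScoped(filter: Partial<RetrievalScope>): RetrievalScope {
  const missing = requiredScopeKeys.filter((key) => !filter[key]);
  if (missing.length > 0) {
    throw new Error(`Missing retrieval scope fields: ${missing.join(', ')}`);
  }
  return filter as RetrievalScope;
}
```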

For local demos, tests, or single-process apps that do not need Postgres yet, you can swap in the in-memory store:

const knowledgeStore = createInMemoryKnowledgeStore();

InMemoryKnowledgeStore keeps chunks and vectors in process memory, supports the same retriever-facing search interface, and mirrors the main upsert helpers. It is useful for local development and examples, but it is not durable and should not replace PostgresKnowledgeStore for production retrieval.

formatRetrievedContext() also supports explicit score display modes so users do not misread raw retrieval scores as probabilities:

const context = formatRetrievedContext(results, {
  includeScores: true,
  scoreDisplay: 'relative_top_1',
});

  • scoreDisplay: 'raw' prints labels such as raw dense similarity, raw lexical relevance, or raw fused rank score
  • scoreDisplay: 'relative_top_1' prints a display-only score normalized against the top shown result and clearly marks it as not a probability
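
As a rough picture of what normalizing against the top shown result means, the sketch below divides each score by the highest one. It assumes non-negative scores and is not the library's formatter; ScoredResult and relativeToTop are hypothetical names.

```typescript
// Hypothetical result shape, for illustration only.
interface ScoredResult {
  id: string;
  score: number;
}

// Express each score as a fraction of the top score. The output is a
// display aid, not a probability. Assumes non-negative scores.
function relativeToTop(
  results: ScoredResult[],
): Array<ScoredResult & { relative: number }> {
  const top = results.reduce((max, result) => Math.max(max, result.score), 0);
  return results.map((result) => ({
    ...result,
    relative: top > 0 ? result.score / top : 0,
  }));
}

console.log(relativeToTop([
  { id: 'a', score: 0.8 },
  { id: 'b', score: 0.4 },
]).map((result) => result.relative)); // [ 1, 0.5 ]
```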

Runtime Support

  • Edge/browser-safe core surface: LLMClient, Conversation, routing, in-memory storage, utilities, and SessionApi
  • Node-only persistence: PostgresSessionStore and PostgresUsageLogger
  • Runtime safety probe: pnpm edgecheck

Prompt Caching Status

  • OpenAI automatic prompt caching works on supported models, and request-side hints are exposed via providerOptions.openai.promptCaching.
  • Anthropic block-level and top-level cache_control are exposed for cacheable content and tool definitions.
  • Gemini implicit caching benefits supported models automatically, and explicit cache usage is exposed via providerOptions.google.promptCaching.cachedContent plus client.googleCaches.
  • Implementation planning lives in docs/PROMPT_CACHING_REPORT.md and the active task tracker lives in prompt_caching_todo.md.

Prompt Caching Examples

OpenAI request hints:

const openaiResponse = await client.complete({
  model: 'gpt-4o',
  messages: [{ content: 'Summarize the support FAQ.', role: 'user' }],
  providerOptions: {
    openai: {
      promptCaching: {
        key: 'support-faq-v1',
        retention: '24h',
      },
    },
  },
});

Anthropic block and tool cache control:

const anthropicResponse = await client.complete({
  model: 'claude-sonnet-4-6',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'document',
          url: 'https://example.com/policy.pdf',
          mediaType: 'application/pdf',
          cacheControl: { type: 'ephemeral', ttl: '1h' },
        },
        {
          type: 'text',
          text: 'Answer using the cached policy document.',
        },
      ],
    },
  ],
  providerOptions: {
    anthropic: {
      cacheControl: { type: 'ephemeral' },
    },
  },
  tools: [
    {
      name: 'lookup_policy',
      description: 'Look up policy details',
      cacheControl: { type: 'ephemeral' },
      parameters: {
        type: 'object',
        properties: {
          topic: { type: 'string' },
        },
        required: ['topic'],
      },
    },
  ],
});

Gemini explicit cache lifecycle plus reuse:

const cache = await client.googleCaches.create({
  model: 'gemini-2.5-flash',
  displayName: 'Support FAQ',
  messages: [{ content: 'Refunds are available for 30 days.', role: 'user' }],
  ttl: '3600s',
});

const geminiResponse = await client.complete({
  model: 'gemini-2.5-flash',
  messages: [{ content: 'What is the refund window?', role: 'user' }],
  providerOptions: {
    google: {
      promptCaching: {
        cachedContent: cache.name,
      },
    },
  },
});

Gemini cache names are returned in the provider format cachedContents/{id} and can be passed back directly as cachedContent. Cache creation accepts the normal library model id such as gemini-2.5-flash; the Gemini adapter normalizes it to models/{model} for the cache API. Per-request generation cost includes cached-read discounts when cachedContentTokenCount is returned, but it does not include cache creation or persistence cost.
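
The two name formats mentioned above can be illustrated with small string helpers. These are sketches of the described conventions, not the adapter's code; toCacheModelName and isCachedContentName are hypothetical.

```typescript
// Prefix a plain model id with 'models/' unless it already has it.
function toCacheModelName(model: string): string {
  return model.startsWith('models/') ? model : `models/${model}`;
}

// Check that a value looks like 'cachedContents/{id}'.
function isCachedContentName(value: string): boolean {
  return /^cachedContents\/[^/]+$/.test(value);
}

console.log(toCacheModelName('gemini-2.5-flash')); // "models/gemini-2.5-flash"
console.log(isCachedContentName('cachedContents/abc123')); // true
```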

Quality And Performance

pnpm sizecheck
pnpm depcheck
pnpm edgecheck
pnpm bench:complete
pnpm bench:first-token
pnpm bench:memory
pnpm bench:concurrency
pnpm pricecheck

Optional live-provider smoke tests stay opt-in:

LIVE_TESTS=1 pnpm test:live
LIVE_TESTS=1 pnpm test:embeddings:live
pnpm test:prompt-caching:live

Testing

pnpm typecheck
pnpm lint
pnpm test
pnpm build
