Image Chat Models

Multimodal Models

This guide lists the multimodal models available through the Merlin API and shows how to send them text and image input from Node.js.

Available Models

Below are the multimodal model IDs you can use with the Merlin API, with a brief description of each:

Model ID	Description	Provider	Pricing
`gpt-4-vision-preview`	A preview version of GPT-4 designed for vision tasks.	OpenAI	Pricing
`gemini-pro-vision`	Google's multimodal model.	Google
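
If you want to catch a mistyped model ID before sending a request, a small lookup like the one below may help. It is only a sketch: the model IDs and providers are copied from the table above, while `VISION_MODELS` and `assertVisionModel` are hypothetical names that are not part of `merlin-node`.

```javascript
// Model IDs and providers copied from the table above.
// VISION_MODELS and assertVisionModel are illustrative names, not part of merlin-node.
const VISION_MODELS = new Map([
  ["gpt-4-vision-preview", { provider: "OpenAI" }],
  ["gemini-pro-vision", { provider: "Google" }],
]);

// Returns the model ID unchanged if it is listed, otherwise throws.
function assertVisionModel(modelId) {
  if (!VISION_MODELS.has(modelId)) {
    const known = Array.from(VISION_MODELS.keys()).join(", ");
    throw new Error(`Unknown vision model "${modelId}". Expected one of: ${known}`);
  }
  return modelId;
}
```

The returned value can be passed straight to the `model` field of a request, for example `model: assertVisionModel("gemini-pro-vision")`.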

Interacting with Multimodal Models

To interact with these models from Node.js, you can use the following sample code, which sends a text prompt and an image URL in a single user message:

import { Merlin } from "merlin-node";

const apiKey = "<YOUR_MERLIN_API_KEY>"; // Replace with your API key from Merlin
const merlin = new Merlin({ merlinConfig: { apiKey } });

async function createCompletion() {
  try {
    const completion = await merlin.chat.completions.create({
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "What’s in this image?",
            },
            {
              type: "image_url",
              image_url: {
                url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
              },
            },
          ],
        },
      ],
      model: "gpt-4-vision-preview", // Adjust model as needed
    });

    console.log(completion.choices[0].message.content);
  } catch (error) {
    console.error("Error creating completion:", error);
  }
}

createCompletion();
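
The nested `content` array in the sample is easy to get wrong by hand, so it can be convenient to build it with a helper. The sketch below reproduces the same message shape as the sample above. `buildImageMessage` is a hypothetical helper, not part of `merlin-node`, and whether a given model accepts more than one image per message is an assumption you should verify before relying on it.

```javascript
// Hypothetical helper (not part of merlin-node): builds a user message with
// one text part followed by one image_url part per URL, matching the shape
// used in the sample above.
function buildImageMessage(prompt, imageUrls) {
  if (typeof prompt !== "string" || prompt.length === 0) {
    throw new TypeError("prompt must be a non-empty string");
  }
  if (!Array.isArray(imageUrls) || imageUrls.length === 0) {
    throw new TypeError("imageUrls must be a non-empty array of URLs");
  }
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      // Assumption: multiple image parts in one message are accepted.
      ...imageUrls.map((url) => ({ type: "image_url", image_url: { url } })),
    ],
  };
}
```

With this helper, the `messages` field of the request could be written as `messages: [buildImageMessage("What is in this image?", [imageUrl])]`, where `imageUrl` is the address of the picture you want described.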
