Merged
30 changes: 30 additions & 0 deletions README.md
@@ -404,6 +404,7 @@ Experimental support for TTS. Today the following providers are available:
* Google (default)
* macOS say command
* Elevenlabs
* OpenAI

It will use the one you configure in settings.json. If you define settings for multiple TTS services, there is no guarantee which one it will choose!

@@ -653,6 +654,35 @@ Full:
}
```

#### OpenAI

This REQUIRES a registered API key from OpenAI! See https://platform.openai.com/docs/overview

You need to add this to a file called settings.json (create if it doesn't exist), like this:

```
{
"openaiKey": "sk-12822720jhskjhs9879879879"
}
```

Replace the key above (it is made up) with the API key you received after registering.

Action is:

/[Room name]/say/[phrase][/[language_code]][/[announce volume]]
/sayall/[phrase][/[language_code]][/[announce volume]]

Example:

/Office/say/Hello, dinner is ready
/Office/say/Hej, maten är klar/sv-se
/sayall/Hello, dinner is ready
/Office/say/Hello, dinner is ready/90
/Office/say/Hej, maten är klar/sv-se/90

The language code doesn't matter as OpenAI will determine the language from the text. This may not always be correct but the probability increases with longer texts.
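
Phrases often contain spaces, commas, or non-ASCII characters, so it can help to percent-encode each path segment when calling the action from a script. The following is a minimal sketch, not part of this project: `buildSayUrl` is a hypothetical helper, the base URL is an assumption (use whatever host and port your server listens on), and it assumes the server decodes percent-encoded path segments.

```javascript
// Hypothetical helper (not part of this project): builds a say/sayall URL.
// baseUrl is an assumption; replace it with the address of your own server.
function buildSayUrl(baseUrl, room, phrase, language, volume) {
  // With a room name the action is /[Room name]/say/..., without one it is /sayall/...
  const segments = room ? [room, 'say', phrase] : ['sayall', phrase];
  // Language code and announce volume are both optional trailing segments.
  if (language) segments.push(language);
  if (volume !== undefined) segments.push(String(volume));
  // Encode every segment so spaces, commas and characters like "ä" survive.
  return baseUrl.replace(/\/+$/, '') + '/' + segments.map(encodeURIComponent).join('/');
}

console.log(buildSayUrl('http://localhost:5005', 'Office', 'Hej, maten är klar', 'sv-se', 90));
console.log(buildSayUrl('http://localhost:5005', null, 'Hello, dinner is ready'));
```

Since OpenAI ignores the language code, the `language` argument can be left as `undefined` and only the volume passed.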

#### Google (default if no other has been configured)

Does not require any API keys. Please note that Google has been known in the past to change the requirements for its Text-to-Speech API, and this may stop working in the future. There are also limits on how many requests you are allowed to make in a given time period.
83 changes: 83 additions & 0 deletions lib/tts-providers/openai.js
@@ -0,0 +1,83 @@
'use strict';
const crypto = require('crypto');
const fs = require('fs');
const http = require('http');

Copilot AI Mar 5, 2026

The http module is imported on this line but is never used anywhere in openai.js. Only https is used. This unused import should be removed to avoid confusion.

Suggested change
const http = require('http');

Copilot uses AI. Check for mistakes.
const https = require('https');
const path = require('path');
const fileDuration = require('../helpers/file-duration');
const settings = require('../../settings');
const logger = require('sonos-discovery/lib/helpers/logger');

function openai(phrase, language, voice = 'alloy', model = 'tts-1') {

Copilot AI Mar 5, 2026

The voice and model parameters have default values ('alloy' and 'tts-1'), but there is no way for users to override them. The provider is called by try-download-tts.js with only (phrase, language), so these parameters will always use their defaults. Other providers read override values from settings (e.g., settings.aws.name for AWS Polly). Consider reading voice and model overrides from settings (e.g., settings.openaiVoice, settings.openaiModel) so users can customize them without code changes, and document this in the README.
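
One way this could look is sketched below. The setting names `openaiVoice` and `openaiModel` are the hypothetical ones floated in this comment, not existing settings, and `resolveOpenAiOptions` is an illustrative helper rather than code from the PR. Explicit arguments win, then settings, then the current defaults.

```javascript
// Sketch only: openaiVoice / openaiModel are hypothetical setting names,
// and resolveOpenAiOptions is an illustrative helper, not part of the PR.
function resolveOpenAiOptions(settings, voice, model) {
  return {
    // Precedence: explicit argument, then settings.json, then built-in default.
    voice: voice || settings.openaiVoice || 'alloy',
    model: model || settings.openaiModel || 'tts-1'
  };
}

// With only a voice override in settings, the model falls back to its default.
console.log(resolveOpenAiOptions({ openaiVoice: 'nova' }));
```

For this to work with the existing call site, the `voice` and `model` parameters of `openai()` would need to lose their defaults (or the lookup would need to happen before they are applied), since default parameter values would otherwise always take precedence over settings.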

Copilot AI Mar 5, 2026

The openai function is missing a guard at the top to check whether settings.openaiKey is configured. Every other TTS provider (e.g., aws-polly.js, voicerss.js, elevenlabs.js) returns Promise.resolve() immediately if its required settings key is absent, allowing the provider chain in try-download-tts.js to fall through to the next provider. Without this guard, the openai function will always attempt to call the OpenAI API, regardless of whether an API key has been configured, resulting in failed requests with an authorization error for every TTS invocation when no key is set.

Suggested change
function openai(phrase, language, voice = 'alloy', model = 'tts-1') {
function openai(phrase, language, voice = 'alloy', model = 'tts-1') {
if (!settings.openaiKey) {
logger.warn('OpenAI TTS disabled: settings.openaiKey is not configured');
return Promise.resolve();
}

if (!language) {
language = 'en';
}

// Construct a filesystem neutral filename
const phraseHash = crypto.createHash('sha1').update(phrase).digest('hex');
const filename = `openai-${phraseHash}-${language}.mp3`;

Copilot AI Mar 5, 2026

The cache filename includes language (e.g., openai-${phraseHash}-${language}.mp3), but as documented in the README, the language code has no effect on the OpenAI TTS output — OpenAI determines the language from the text content itself. Including language in the cache key means the same phrase will be cached multiple times when called with different language codes (or without a language code vs. with 'en'), wasting disk space and defeating the cache. The filename should only include the phrase hash, voice, and model, as those are the actual parameters that affect the output.

Suggested change
const filename = `openai-${phraseHash}-${language}.mp3`;
const filename = `openai-${phraseHash}-${voice}-${model}.mp3`;

const filepath = path.resolve(settings.webroot, 'tts', filename);

const expectedUri = `/tts/${filename}`;
try {
fs.accessSync(filepath, fs.R_OK);
return fileDuration(filepath)
.then((duration) => {
return {
duration,
uri: expectedUri
};
});
} catch (err) {
logger.info(`announce file for phrase "${phrase}" does not seem to exist, downloading from OpenAI TTS`);
}

return new Promise((resolve, reject) => {
const postData = JSON.stringify({
model: model,
input: phrase,
voice: voice
});
const options = {
hostname: 'api.openai.com',
path: '/v1/audio/speech',
method: 'POST',
headers: {
'Authorization': `Bearer ${settings.openaiKey}`,
'Content-Type': 'application/json',
'Content-Length': postData.length

Copilot AI Mar 5, 2026

Content-Length is set using postData.length, which returns the number of JavaScript characters (UTF-16 code units), not the byte length of the UTF-8 encoded string. When phrase contains multi-byte characters (e.g., accented characters, CJK characters, emoji), postData.length will under-count the actual byte size sent on the wire. This should use Buffer.byteLength(postData) to get the correct byte count.

Suggested change
'Content-Length': postData.length
'Content-Length': Buffer.byteLength(postData)
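
The difference is easy to reproduce in isolation. The sketch below builds the same kind of request body with one of the README's Swedish example phrases and compares the two measurements. The variable names are illustrative only.

```javascript
// Same shape as the request body in openai.js, using a README example phrase.
const postData = JSON.stringify({
  model: 'tts-1',
  input: 'Hej, maten är klar',
  voice: 'alloy'
});

// String length counts UTF-16 code units; "ä" counts as 1.
const charCount = postData.length;
// Buffer.byteLength counts UTF-8 bytes by default; "ä" takes 2.
const byteCount = Buffer.byteLength(postData);

console.log({ charCount, byteCount, difference: byteCount - charCount });
```

Here the byte count exceeds the character count by one, because `ä` is a single UTF-16 code unit but two UTF-8 bytes. With a `Content-Length` that is too small, the request body no longer matches the declared length, so the request is likely to fail.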

}
};

const req = https.request(options, (res) => {
if (res.statusCode >= 200 && res.statusCode < 300) {
const file = fs.createWriteStream(filepath);
res.pipe(file);
file.on('finish', function () {
file.end();
resolve(expectedUri);
});
} else {
reject(new Error(`Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}`));

Copilot AI Mar 5, 2026

When the response status code is not in the 2xx range, the response body is never consumed (neither read nor destroyed). This can cause the underlying TCP socket to remain open and stall, potentially preventing further requests. The response stream should be consumed and drained (e.g., by calling res.resume()) before rejecting the promise, and ideally the error body from OpenAI should be included in the rejection error for better diagnostics.

Suggested change
reject(new Error(`Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}`));
const chunks = [];
res.on('data', (chunk) => {
chunks.push(chunk);
});
res.on('end', () => {
const body = chunks.length ? Buffer.concat(chunks).toString('utf8') : '';
const message = `Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}` +
(body ? `, body: ${body}` : '');
reject(new Error(message));
});
res.on('error', (err) => {
// Ensure the stream is not left hanging on error
res.resume();
reject(err);
});

}
});

req.on('error', (err) => {
reject(err);
});

req.write(postData);
req.end();
})
.then(() => {
return fileDuration(filepath);
})
.then((duration) => {
return {
duration,
uri: expectedUri
};
});
}

module.exports = openai;