diff --git a/README.md b/README.md index f4320c54..c10ba0c3 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,47 @@ +### Change log [2025-11-30 12:16:49] +1. Item Updated: `histogram_data_drift` (from version: `1.0.0` to `1.0.0`) +2. Item Updated: `openai_proxy_app` (from version: `1.0.0` to `1.0.0`) +3. Item Updated: `count_events` (from version: `1.0.0` to `1.0.0`) +4. Item Updated: `evidently_iris` (from version: `1.0.0` to `1.0.0`) + +### Change log [2025-11-30 12:16:40] +1. Item Updated: `test_classifier` (from version: `1.1.0` to `1.1.0`) +2. Item Updated: `sklearn_classifier` (from version: `1.2.0` to `1.2.0`) +3. Item Updated: `model_server_tester` (from version: `1.1.0` to `1.1.0`) +4. Item Updated: `azureml_serving` (from version: `1.1.0` to `1.1.0`) +5. Item Updated: `describe_dask` (from version: `1.2.0` to `1.2.0`) +6. Item Updated: `batch_inference` (from version: `1.8.0` to `1.8.0`) +7. Item Updated: `v2_model_server` (from version: `1.2.0` to `1.2.0`) +8. Item Updated: `gen_class_data` (from version: `1.3.0` to `1.3.0`) +9. Item Updated: `send_email` (from version: `1.2.0` to `1.2.0`) +10. Item Updated: `tf2_serving` (from version: `1.1.0` to `1.1.0`) +11. Item Updated: `aggregate` (from version: `1.4.0` to `1.4.0`) +12. Item Updated: `open_archive` (from version: `1.2.0` to `1.2.0`) +13. Item Updated: `describe` (from version: `1.4.0` to `1.4.0`) +14. Item Updated: `v2_model_tester` (from version: `1.1.0` to `1.1.0`) +15. Item Updated: `text_to_audio_generator` (from version: `1.3.0` to `1.3.0`) +16. Item Updated: `pii_recognizer` (from version: `0.4.0` to `0.4.0`) +17. Item Updated: `github_utils` (from version: `1.1.0` to `1.1.0`) +18. Item Updated: `sklearn_classifier_dask` (from version: `1.1.1` to `1.1.1`) +19. Item Updated: `azureml_utils` (from version: `1.4.0` to `1.4.0`) +20. Item Updated: `question_answering` (from version: `0.5.0` to `0.5.0`) +21. Item Updated: `structured_data_generator` (from version: `1.6.0` to `1.6.0`) +22. Item Updated: `arc_to_parquet` (from version: `1.5.0` to `1.5.0`) +23. Item Updated: `silero_vad` (from version: `1.4.0` to `1.4.0`) +24. Item Updated: `load_dataset` (from version: `1.2.0` to `1.2.0`) +25. Item Updated: `auto_trainer` (from version: `1.8.0` to `1.8.0`) +26. Item Updated: `feature_selection` (from version: `1.6.0` to `1.6.0`) +27. Item Updated: `translate` (from version: `0.3.0` to `0.3.0`) +28. Item Updated: `describe_spark` (from version: `1.1.0` to `1.1.0`) +29. Item Updated: `pyannote_audio` (from version: `1.3.0` to `1.3.0`) +30. Item Updated: `onnx_utils` (from version: `1.3.0` to `1.3.0`) +31. Item Updated: `batch_inference_v2` (from version: `2.6.0` to `2.6.0`) +32. Item Updated: `transcribe` (from version: `1.2.0` to `1.2.0`) +33. Item Updated: `model_server` (from version: `1.2.0` to `1.2.0`) +34. Item Updated: `mlflow_utils` (from version: `1.1.0` to `1.1.0`) +35. Item Updated: `noise_reduction` (from version: `1.1.0` to `1.1.0`) +36. Item Updated: `hugging_face_serving` (from version: `1.1.0` to `1.1.0`) + ### Change log [2025-11-26 11:49:13] 1. Item Updated: `histogram_data_drift` (from version: `1.0.0` to `1.0.0`) 2. 
Item Updated: `openai_proxy_app` (from version: `1.0.0` to `1.0.0`) diff --git a/functions/development/noise_reduction/1.1.0/static/documentation.html b/functions/development/noise_reduction/1.1.0/static/documentation.html index a772e518..c798c0e2 100644 --- a/functions/development/noise_reduction/1.1.0/static/documentation.html +++ b/functions/development/noise_reduction/1.1.0/static/documentation.html @@ -165,32 +165,7 @@
Bases: ReduceNoiseBase
Remove noise from the audio. Implement the noise reduction algorithm here.
data – The audio data to clean.
Returns: The cleaned audio.
Load the audio from a file.
file – The file to load the audio from.
Returns: A tuple of the audio data and the sample rate.
Save the audio to a file.
audio – The audio to save.
target_path – The target path to save the audio to.
Bases: ReduceNoiseBase
Remove noise from the audio. Implement the noise reduction algorithm here.
data – The audio data to clean.
Returns: The cleaned audio.
Load the audio from a file.
file – The file to load the audio from.
Returns: A tuple of the audio data and the sample rate.
Save the audio to a file.
audio – The audio to save.
target_path – The target path to save the audio to.
Bases: object
Base class for noise reduction. This class is meant to be inherited by specific noise reduction algorithms. You must implement the following methods:
- clean_audio: The method that cleans the audio; the noise reduction algorithm is implemented here.
- save_audio: The method that saves the audio to a file.
- load_audio: The method that loads the audio from a file.
After implementing the above methods, you can use the reduce_noise method to reduce noise from audio files, as in the sketch below.
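To make this contract concrete, here is a minimal sketch of a subclass, assuming ReduceNoiseBase can be imported from the function's source file and using the noisereduce and soundfile packages for the algorithm and I/O (both are illustrative choices, not part of this module):

    import noisereduce
    import soundfile as sf

    from noise_reduction import ReduceNoiseBase  # assumed import path

    class SpectralGateReducer(ReduceNoiseBase):
        """Hypothetical subclass using noisereduce's spectral gating."""

        def load_audio(self, file: str):
            # Return a tuple of the audio data and the sample rate.
            data, self._sample_rate = sf.read(file)
            return data, self._sample_rate

        def clean_audio(self, data):
            # The noise reduction algorithm is implemented here.
            # Assumes reduce_noise calls load_audio first (an assumption about the base class).
            return noisereduce.reduce_noise(y=data, sr=self._sample_rate)

        def save_audio(self, audio, target_path: str):
            sf.write(target_path, audio, self._sample_rate)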
Remove noise from the audio. Implement the noise reduction algorithm here.
data – The audio data to clean.
Returns: The cleaned audio.
Load the audio from a file.
file – The file to load the audio from.
Returns: A tuple of the audio data and the sample rate.
Reduce noise from the given audio file.
audio_file – The audio file to reduce noise from.
Returns: A tuple of:
- A boolean indicating whether an error occurred.
- A tuple of: the audio file name, and the target path in case of success / the error message in case of failure.
Remove silence sections from the audio.
audio – The audio to remove silence from.
Returns: The audio without silence.
Save the audio to a file.
audio – The audio to save.
target_path – The target path to save the audio to.
Reduce noise from an audio file or a directory of audio files. The audio files must be in .wav format. The cleaned audio files will be saved in the target_directory. For information about the noise reduction algorithm, see timsainb/noisereduce. Note that the saved files are in .wav format even if the original files are in another format.
audio_source – Path to an audio file or a directory of audio files.
target_directory – Path to the directory in which to save the cleaned audio files.
sample_rate – Number of samples per second in the audio file. Pass None to keep the original sample rate.
duration – Duration of the audio file to clean, in seconds. Pass None to keep the original duration.
channel – The channel to clean. Pass the number of the channel to clean, or None to clean all channels.
silence_threshold – The threshold for removing silence from the audio, in dB. If None, no silence removal is performed.
use_multiprocessing – Number of processes to use for cleaning the audio files. If 0, no multiprocessing is used.
verbose – Verbosity level. If True, display a progress bar.
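A hypothetical direct call, assuming reduce_noise is importable from the function's source file (all paths and values are placeholders):

    from noise_reduction import reduce_noise  # assumed import path

    reduce_noise(
        audio_source="calls/",        # a .wav file or a directory of .wav files
        target_directory="cleaned/",  # cleaned files are written here
        sample_rate=16000,            # None keeps the original rate
        duration=None,                # None keeps the original duration
        channel=None,                 # None cleans all channels
        silence_threshold=None,       # None skips silence removal
        use_multiprocessing=0,        # 0 disables multiprocessing
        verbose=True,                 # show a progress bar
    )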
Reduce noise from audio files using DeepFilterNet. For more information about the noise reduction algorithm, see Rikorose/DeepFilterNet. Note that the saved files are in .wav format even if the original files are in another format.
audio_source – Path to an audio file or a directory of audio files.
target_directory – Path to the target directory in which to save the cleaned audio files.
pad – Whether to pad the audio file with zeros before cleaning.
atten_lim_db – Maximum attenuation, in dB.
silence_threshold – The threshold for removing silence from the audio, in dB. If None, no silence removal is performed.
use_multiprocessing – Number of processes to use for cleaning the audio files. If 0, no multiprocessing is used.
verbose – Verbosity level. If True, display a progress bar and logs.
kwargs – Additional arguments to pass to torchaudio.load(). For more information, see: https://pytorch.org/audio/stable/generated/torchaudio.load.html
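The handlers can also be run as an MLRun hub function; a sketch, assuming the hub item and handler names match this page:

    import mlrun

    fn = mlrun.import_function("hub://noise_reduction")
    fn.run(
        handler="reduce_noise_dfn",
        params={
            "audio_source": "calls/",
            "target_directory": "cleaned/",
            "atten_lim_db": 30,  # cap the attenuation at 30 dB
        },
        local=True,  # run in the local environment
    )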
Perform speech diarization on the given audio files using pyannote-audio (pyannote/pyannote-audio). The end result is a dictionary with the file names as keys and their diarizations as values. A diarization is a list of tuples: (start, end, speaker_label).
To use the pyannote.audio models you must pass a Huggingface token and get access to the required models. The token can be passed in one of the following ways:
- Use the access_token parameter.
- Set an environment variable named "HUGGING_FACE_HUB_TOKEN".
- If using MLRun, pass it as a secret named "HUGGING_FACE_HUB_TOKEN".
To get access to the models on Huggingface, visit their pages. For example, to use the default diarization model set in this function ("pyannote/speaker-diarization-3.0"), you need access to two models (see the model's Huggingface page for the links).
Note: To control the recognized speakers in the diarization output you can choose one of the following methods:
- For a known number of speakers, you may set speaker labels via the speakers_labels parameter; they will be used in order of speaking in the audio (the first person speaking gets the first label in the list). In addition, you can diarize per channel (by setting the separate_by_channels parameter to True). Each label will be assigned to a specific channel by order (the first label to channel 0, the second label to channel 1, and so on). Note that this will increase runtime.
- For an unknown number of speakers, you can set the speaker_prefix parameter to add a prefix to each speaker number. You can also help the diarization by setting the expected speaker range via the minimum_speakers and maximum_speakers parameters.
data_path – A directory of audio files, a single file, or a list of files to diarize.
model_name – One of the official diarization model names (referred to as diarization pipelines) on the pyannote.audio Huggingface page. Default: "pyannote/speaker-diarization-3.0".
access_token – An access token to pass for using the pyannote.audio models. If not provided, the function will look for the environment variable "HUGGING_FACE_HUB_TOKEN". If MLRun is available, it will look for a secret named "HUGGING_FACE_HUB_TOKEN".
device – Device to load the model on. Can be one of {"cuda", "cpu"}. The default prefers "cuda" if available.
speakers_labels – Labels to use for the recognized speakers. Default: numeric labels (0, 1, …).
separate_by_channels – If each speaker speaks in a separate channel, you can diarize each channel and combine the results into a single diarization. Each label set in the speakers_labels parameter will be assigned to a specific channel by order.
speaker_prefix – A prefix to add to the speaker labels. This parameter is ignored if speakers_labels is not None. Default: "speaker".
minimum_speakers – The minimum expected number of speakers in the audio files. This parameter is ignored if speakers_labels is not None.
maximum_speakers – The maximum expected number of speakers in the audio files. This parameter is ignored if speakers_labels is not None.
verbose – Whether to present progress bar logs and errors. Default: True.
Returns: A tuple of:
- The speech diarization dictionary.
- A dictionary of errored files that were not diarized.
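A hypothetical call, assuming the handler is named diarize and is importable from the function's source, with the Huggingface token already set via the HUGGING_FACE_HUB_TOKEN environment variable:

    from pyannote_audio import diarize  # assumed import path and handler name

    speech_diarization, errors = diarize(
        data_path="calls/",
        model_name="pyannote/speaker-diarization-3.0",
        speakers_labels=["Agent", "Client"],  # known two-speaker calls
        separate_by_channels=True,            # one speaker per channel
        verbose=True,
    )
    # speech_diarization, e.g.:
    # {"call_1.wav": [(0.0, 2.3, "Agent"), (2.3, 4.1, "Client"), ...]}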
Bases: QuestionHandler
A static class holding all the possible poll question configuration option keys.
Bases: object
A class for handling question answering for poll-type questions. These questions are answered by asking the same question multiple times and choosing the most common answer or the average answer.
The number of times to ask the same question.
The strategy to use for choosing the answer from the poll.
Bases: Enum
An enumeration.
The average answer strategy.
The most common answer strategy.
Calculate the average answer for a given list of answers.
Perform the strategy.
Calculate the most common answer for a given list of answers.
Answer questions with the given text files' contents as context, using a pretrained LLM model in the given pipeline.
Bases: object
A class for handling question answering for a given question type. This class is used as a base class for all question types, and as the default question type (regular question answering without any special handling).
Bases: object
Answer questions with the given text files' contents as context, using a pretrained LLM model in the given pipeline.
Bases: object
Answer questions with the given text files' contents as context, using a pretrained LLM model. For each text file, the following prompt is built:
start of text_wrapper
<text file content>
end of text_wrapper

start of questions_wrapper
1. <questions[0]>
2. <questions[1]>
…
n. <questions[n-1]>
end of questions_wrapper
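For illustration, this is how the prompt described above would be assembled (the wrapper strings are hypothetical values for the text_wrapper and questions_wrapper parameters):

    text_wrapper = "Given the following text:\n{}"
    questions_wrapper = "Answer the questions below:\n{}"

    file_text = "MLRun is an open-source MLOps orchestration framework."
    questions = ["What is MLRun?", "Is it open source?"]

    # Number the questions 1. ... n. and fill both wrappers.
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    prompt = text_wrapper.format(file_text) + "\n" + questions_wrapper.format(numbered)
    print(prompt)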
data_path – A path to a directory of text files, or a path to a single text file, to ask questions about.
model_name – The pre-trained model name from the Huggingface hub to use for answering questions.
questions – The questions to ask: a list of lists of questions to ask per text file, divided into question groups. The groups can be determined by size (in order to avoid large inputs to the LLM) or by questioning method (regular or poll-like questioning).
device_map – A map to use for loading the model on multiple devices.
model_kwargs – Keyword arguments to pass when loading the model using HuggingFace's transformers.AutoModelForCausalLM.from_pretrained function.
auto_gptq_exllama_max_input_length – For AutoGPTQ models, sets and extends the model's input buffer size.
tokenizer_name – The tokenizer name from the Huggingface hub to use. If not given, the model name will be used.
tokenizer_kwargs – Keyword arguments to pass when loading the tokenizer using HuggingFace's transformers.AutoTokenizer.from_pretrained function.
text_wrapper – A wrapper for the file's text, added at the start of the prompt. Must have a placeholder ('{}') for the text of the file.
questions_wrapper – A wrapper for the questions, added after the text wrapper in the prompt template. Must have a placeholder ('{}') for the questions.
generation_config – HuggingFace's GenerationConfig keyword arguments to pass to the generate method.
questions_config – A dictionary, or list of dictionaries, containing specific ways to answer questions (using a poll, for example). Each dictionary in the list applies to the corresponding question group and determines the question-asking method for that group.
batch_size – Batch size for inference.
questions_columns – Columns to use for the returned dataframe.
verbose – Whether to present progress bar logs and errors. Default: True.
Returns: A tuple of:
- A dataframe dataset of the question answers.
- A dictionary of errored files that were not inferred or were not answered properly.
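A hypothetical call, assuming answer_questions is importable from the function's source (the model name and questions are placeholders, given as a single question group):

    from question_answering import answer_questions  # assumed import path

    answers_df, errors = answer_questions(
        data_path="transcripts/",
        model_name="mistralai/Mistral-7B-Instruct-v0.2",
        questions=["What product was discussed?", "Was the issue resolved?"],
        text_wrapper="Given the following call transcript:\n{}",
        questions_wrapper="Answer the following questions:\n{}",
        batch_size=1,
        verbose=True,
    )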
Bases: object
A base class for a task to complete after VAD.
Get the audio file of the task.
Returns: The audio file of the task.
Do the task on the given speech timestamps. The base task will simply save the speech timestamps as the result.
speech_timestamps – The speech timestamps to do the task on, as outputted from the VAD.
Get the result of the task: a tuple of the audio file name and the result.
Returns: The result of the task.
Convert the task to a tuple to reconstruct it later (used for multiprocessing, to pass in a queue).
Returns: The converted task.
Bases: BaseTask
A speech diarization task. The task will diarize the VAD speech timestamps into speakers.
Do the task on the given speech timestamps. The task will diarize the VAD speech timestamps into speakers.
speech_timestamps – The speech timestamps per channel to do the task on, as outputted from the VAD.
Convert the task to a tuple to reconstruct it later (used for multiprocessing, to pass in a queue).
Returns: The converted task.
Bases: object
A task creator to create the different tasks to run after the VAD.
Create a task with the given audio file.
audio_file – The audio file to assign to the task.
Returns: The created task.
Create a task from a tuple of the audio file name and the task kwargs.
task_tuple – The task tuple to create the task from.
Returns: The created task.
Bases: object
A voice activity detection wrapper for the silero VAD model (snakers4/silero-vad).
Infer the audio through the VAD model and return the speech timestamps.
audio_file – The audio file to infer.
Returns: The speech timestamps in the audio, as a list of timestamps where each timestamp is a dictionary with the following keys:
"start": The start sample index of the speech in the audio.
"end": The end sample index of the speech in the audio.
If per_channel is True, a list of timestamps per channel will be returned.
Load the VAD model.
force_reload – Whether to force reloading the model even if it was already loaded. Default is True.
Perform voice activity detection on the given audio files using the silero VAD model (snakers4/silero-vad). The end result is a dictionary with the file names as keys and their VAD timestamp dictionaries as values.
For example:
{
    "file_1.wav": [
        {"start": 0, "end": 16000},
        {"start": 16000, "end": 32000},
        {"start": 32000, "end": 48000},
        ...
    ],
    "file_2.wav": [
        {"start": 0, "end": 16000},
        {"start": 16000, "end": 32000},
        {"start": 32000, "end": 48000},
        ...
    ],
    ...
}
data_path – The path to the audio files to run VAD on. Can be a path to a single file, a path to a directory, or a list of file paths.
use_onnx – Whether to use ONNX for inference. Default is True.
force_onnx_cpu – Whether to force ONNX to use the CPU for inference. Default is True.
threshold – Speech threshold. Silero VAD outputs speech probabilities for each audio chunk; probabilities above this value are considered speech. It is better to tune this parameter for each dataset separately, but a "lazy" 0.5 is pretty good for most datasets.
sampling_rate – Currently, silero VAD models support 8000 and 16000 sample rates.
min_speech_duration_ms – Final speech chunks shorter than min_speech_duration_ms are thrown out.
max_speech_duration_s – Maximum duration of speech chunks, in seconds. Chunks longer than max_speech_duration_s will be split at the timestamp of the last silence that lasts more than 100 ms (if any) to prevent aggressive cutting; otherwise, they will be split aggressively just before max_speech_duration_s.
min_silence_duration_ms – At the end of each speech chunk, wait for min_silence_duration_ms before separating it.
window_size_samples – Audio chunks of window_size_samples size are fed to the silero VAD model. WARNING! Silero VAD models were trained using 512, 1024, and 1536 samples for the 16000 sample rate and 256, 512, and 768 samples for the 8000 sample rate. Values other than these may affect model performance!
speech_pad_ms – Final speech chunks are padded by speech_pad_ms on each side.
return_seconds – Whether to return timestamps in seconds. False means timestamps are returned in samples (default: False).
per_channel – Whether to return timestamps per channel (default: False). This will run VAD on each channel separately and return a list of timestamps per channel.
use_multiprocessing – The number of workers to use for multiprocessing. If 0, no multiprocessing is used. Default is 0.
verbose – Verbosity.
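A hypothetical call, assuming the handler is named detect_voice and is importable from the function's source:

    from silero_vad import detect_voice  # assumed import path and handler name

    vad_timestamps = detect_voice(
        data_path="calls/",
        use_onnx=True,
        threshold=0.5,         # the "lazy" default noted above
        sampling_rate=16000,
        return_seconds=False,  # timestamps in samples
        per_channel=False,
        verbose=True,
    )
    # e.g. {"call_1.wav": [{"start": 0, "end": 16000}, ...], ...}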
Perform speech diarization on the given audio files using the silero VAD model (snakers4/silero-vad). The speech diarization is performed per channel, so that each channel in the audio belongs to a different speaker. The end result is a dictionary with the file names as keys and their diarizations as values. A diarization is a list of tuples: (start, end, speaker_label).
For example:
{
    "file_1.wav": [
        (0.0, 1.0, "speaker_0"),
        (1.0, 2.0, "speaker_1"),
        (2.0, 3.0, "speaker_0"),
        ...
    ],
    "file_2.wav": [
        (0.0, 1.0, "speaker_0"),
        (1.0, 2.0, "speaker_1"),
        (2.0, 3.0, "speaker_0"),
        ...
    ],
    ...
}
data_path – The path to the audio files to diarize. Can be a path to a single file, a path to a directory, or a list of file paths.
use_onnx – Whether to use ONNX for inference. Default is True.
force_onnx_cpu – Whether to force ONNX to use the CPU for inference. Default is True.
threshold – Speech threshold. Silero VAD outputs speech probabilities for each audio chunk; probabilities above this value are considered speech. It is better to tune this parameter for each dataset separately, but a "lazy" 0.5 is pretty good for most datasets.
sampling_rate – Currently, silero VAD models support 8000 and 16000 sample rates.
min_speech_duration_ms – Final speech chunks shorter than min_speech_duration_ms are thrown out.
max_speech_duration_s – Maximum duration of speech chunks, in seconds. Chunks longer than max_speech_duration_s will be split at the timestamp of the last silence that lasts more than 100 ms (if any) to prevent aggressive cutting; otherwise, they will be split aggressively just before max_speech_duration_s.
min_silence_duration_ms – At the end of each speech chunk, wait for min_silence_duration_ms before separating it.
window_size_samples – Audio chunks of window_size_samples size are fed to the silero VAD model. WARNING! Silero VAD models were trained using 512, 1024, and 1536 samples for the 16000 sample rate and 256, 512, and 768 samples for the 8000 sample rate. Values other than these may affect model performance!
speech_pad_ms – Final speech chunks are padded by speech_pad_ms on each side.
speaker_labels – The speaker labels to use for the diarization. If not given, the speakers will be named "speaker_0", "speaker_1", etc.
use_multiprocessing – The number of workers to use for multiprocessing. If 0, no multiprocessing is used. Default is 0.
verbose – Verbosity.
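A hypothetical call of the per-channel diarization handler, assuming it is named diarize and that each speaker occupies a separate channel, as required above:

    from silero_vad import diarize  # assumed import path and handler name

    diarization = diarize(
        data_path="calls/",
        speaker_labels=["Agent", "Client"],  # channel 0 and channel 1
        sampling_rate=16000,
        verbose=True,
    )
    # e.g. {"call_1.wav": [(0.0, 1.0, "Agent"), (1.0, 2.0, "Client"), ...], ...}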
Bases: object
A task to write the transcription to a file.
Try to perform the task, storing an error if one occurred.
Get the result of the task. If the task failed, the error will be returned; otherwise, the result will be the text file name.
Returns: The task's result.
Check if the task failed.
Returns: Whether the task failed.
Convert the task to a tuple to reconstruct it later (used for multiprocessing, to pass in a queue).
Returns: The converted task.
Bases: object
A batch processor to process batches of transcriptions. The batch processor creates tasks and is meant to work alongside the transcriber. It can be used with a multiprocessing queue or run the tasks directly using the associated methods.
Perform the tasks. Should be used if no multiprocessing queue is given to a transcriber.
Get the results of the tasks. The stored results are then cleared.
Returns: The results of the tasks.
Get the tasks to perform.
Returns: The tasks to perform.
Process a batch of transcriptions. Tasks related to the given batch will be created and stored in the batch processor.
batch – The batch of transcriptions to process.
Bases: BatchProcessor
A batch processor to process batches of transcriptions per channel. The batch processor creates tasks with the selected number of channels and is meant to work alongside the transcriber. It can be used with a multiprocessing queue or run the tasks directly using the associated methods.
Process a batch of transcriptions. Tasks related to the given batch will be created and stored in the batch processor.
batch – The batch of transcriptions to process.
Bases: BatchProcessor
A batch processor to process batches of transcriptions with respect to a given speech diarization. The batch processor creates tasks and is meant to work alongside the transcriber. It can be used with a multiprocessing queue or run the tasks directly using the associated methods.
Process a batch of transcriptions. Tasks related to the given batch will be created and stored in the batch processor.
batch – The batch of transcriptions to process.
Bases: BaseTask
A task to write the transcription to a file with respect to a given per-channel speech diarization.
Try to perform the task, storing an error if one occurred.
Convert the task to a tuple to reconstruct it later (used for multiprocessing, to pass in a queue).
Returns: The converted task.
Get the transcription output channels.
Returns: The transcription output channels.
Bases: BaseTask
A task to write the transcription to a file with respect to a given speech diarization.
Convert the task to a tuple to reconstruct it later (used for multiprocessing, to pass in a queue).
Returns: The converted task.
Bases: object
A transcription wrapper for Huggingface's ASR pipeline (https://huggingface.co/transformers/main_classes/pipelines.html#transformers.AutomaticSpeechRecognitionPipeline), for use with OpenAI's Whisper models (https://huggingface.co/openai).
Load the transcriber. Must be called before transcribing.
Transcribe the given audio files. The transcriptions will be sent to a queue or a batch processor for further processing, such as writing to text files. If no queue or batch processor is given, the transcription outputs from the pipeline will be returned; otherwise, None is returned.
audio_files – The audio files to transcribe.
batch_processor – A batch processor.
batches_queue – A multiprocessing queue to put the batches in.
verbose – Whether to show a progress bar. Default is False.
Returns: The transcription outputs from the pipeline if no queue or batch processor is given; otherwise, None.
Transcribe audio files into text files and collect additional data. The end result is a directory of transcribed text files and a dataframe containing the following columns:
audio_file – The audio file path.
transcription_file – The transcribed text file name in the output directory.
The transcription is based on Huggingface's ASR pipeline (https://huggingface.co/transformers/main_classes/pipelines.html#transformers.AutomaticSpeechRecognitionPipeline) and is tested with OpenAI's Whisper models (https://huggingface.co/openai).
If one of the speaker diarization parameters is given (either speech_diarization or speech_diarize_per_channel), the transcription will be written in a conversation format, where each speaker is written on a separate line:
speaker_1: text
speaker_2: text
speaker_1: text
...
data_path – A directory of audio files, a single file, or a list of files to transcribe.
output_directory – Path to a directory in which to save all transcribed audio files. If not given, the transcribed files will be saved in a temporary directory.
model_name – The model name to use. Should be one of OpenAI's Whisper models for best results (for example "tiny", "base", "large", etc.). See here for more information: https://huggingface.co/openai?search_models=whisper.
device – The device to use for inference. If not given, a GPU will be used if available.
use_flash_attention_2 – Whether to use the Flash Attention 2 implementation. It can be used only with one of the following GPUs: Nvidia H series and Nvidia A series. T4 support will be available soon.
Note: If both use_flash_attention_2 and use_better_transformers are None, the optimization will be chosen automatically according to the available resources.
use_better_transformers – Whether to use the Better Transformers library to further optimize the model. Should be used for all use cases that do not support Flash Attention 2.
Note: If both use_flash_attention_2 and use_better_transformers are None, the optimization will be chosen automatically according to the available resources.
assistant_model – The assistant model name to use for inference. Note that the optimizations (Flash Attention 2 and Better Transformers) will be applied to the assistant as well. Should be a model from Huggingface's distil-whisper (see here for more information: huggingface/distil-whisper).
Note: Currently an assistant model is only usable with a batch size of 1.
max_new_tokens – The maximum number of new tokens to generate. This is used to limit the generation length. Default is 128 tokens.
chunk_length_s – The chunk length to split the audio into, in seconds. Default is 30 seconds.
batch_size – The batch size to use for inference. Default is 2.
spoken_language – Tell Whisper which language is spoken. If None, it will try to detect the language.
translate_to_english – Whether to translate the transcriptions to English.
speech_diarization – A speech diarization dictionary with the file names to transcribe as keys and their diarizations as values. The diarization is a list of tuples: (start, end, speaker). An example of a diarization dictionary:

{
    "audio_file_name": [
        {
            "start": 0.0,
            "end": 2.0,
            "speaker": "Agent",
        },
        {
            "start": 2.0,
            "end": 4.0,
            "speaker": "Client",
        },
    ]
}

Note: The diarization must cover the entire duration of the audio file (as long as Whisper is predicting words up until then).
speech_diarize_per_channel – Perform speech diarization per channel. Each speaker is expected to belong to a separate channel in the audio. Note: this will make the transcription slower, as each channel will be transcribed separately. If a speech diarization is passed (via the speech_diarization parameter), this parameter is ignored.
speaker_labels – A list of speaker labels, by channel order, to use for writing the transcription with respect to per-channel speech diarization. This won't be used together with a given speech diarization (via the speech_diarization parameter).
use_multiprocessing – Whether to use multiprocessing to transcribe the audio files. Can be either a boolean value or an integer. If True, the default number of workers (3) will be used: 1 for transcription, 1 for batch processing, and 1 for task completion (such as speech diarization and writing to files). To control the number of task-completion workers, an integer can be provided to specify the number of workers. If False, a single process will be used. Default is False.
verbose – Whether to print the progress of the transcription. Default is False.
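A hypothetical call, assuming the handler is named transcribe and is importable from the function's source (the model and path values are placeholders):

    from transcribe import transcribe  # assumed import path and handler name

    transcribe(
        data_path="calls/",
        output_directory="transcriptions/",
        model_name="openai/whisper-base",  # a Whisper model from the Huggingface hub
        chunk_length_s=30,
        batch_size=2,
        spoken_language="en",  # skip language detection
        verbose=True,
    )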
Translate text files using a transformer model from Huggingface's hub, according to the given source and target languages (or using a directly provided model name). The end result is a directory of translated text files and a dataframe containing the following columns:
text_file – The text file path.
translation_file – The translated text file name in the output directory.
data_path – A directory of text files, a single file, or a list of files to translate.
output_directory – Directory where the translated files will be saved.
model_name – The name of a model to load. If None, the model name is constructed from the source and target language parameters.
source_language – The source language code (e.g., 'en' for English).
target_language – The target language code (e.g., 'en' for English).
model_kwargs – Keyword arguments to pass for loading the model in HuggingFace's pipeline function.
device – The device index for transformers. The default prefers cuda if available.
batch_size – The number of batches to use in translation. The files are translated one by one, but the sentences can be batched.
translation_kwargs – Additional keyword arguments to pass to the transformers.TranslationPipeline when doing the translation inference. Note that the batch size is added automatically.
verbose – Whether to present progress bar logs and errors. Default: True.
Returns: A tuple of:
- The path to the output directory.
- A dataframe dataset of the translated file names.
- A dictionary of errored files that were not translated.
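A hypothetical call, assuming the handler is named translate and is importable from the function's source. With model_name=None, the model name is constructed from the language codes (how it is constructed is not stated on this page):

    from translate import translate  # assumed import path and handler name

    output_dir, translations_df, errors = translate(
        data_path="transcriptions/",
        output_directory="translations/",
        model_name=None,       # construct the model name from the language codes
        source_language="es",
        target_language="en",
        batch_size=8,
        verbose=True,
    )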