Introducing Whisper-TikTok 🤖🎥

Star History

Introduction

Discover Whisper-TikTok, an innovative AI-powered tool that leverages the prowess of Edge TTS, OpenAI-Whisper, and FFMPEG to craft captivating TikTok videos. Harnessing the capabilities of OpenAI's Whisper model, Whisper-TikTok effortlessly generates an accurate transcription from provided audio files, laying the foundation for the creation of mesmerizing TikTok videos through the utilization of FFMPEG. Additionally, the program seamlessly integrates the Microsoft Edge Cloud Text-to-Speech (TTS) API to lend a vibrant voiceover to the video. Opting for Microsoft Edge Cloud TTS API's voiceover is a deliberate choice, as it delivers a remarkably natural and authentic auditory experience, setting it apart from the often monotonous and artificial voiceovers prevalent in numerous TikTok videos.

Demo Video

https://github.com/MatteoFasulo/Whisper-TikTok/assets/74818541/68e25504-c305-4144-bd39-c9acc218c3a4

Installation 🛠️

Whisper-TikTok has been tested in Windows 10, Windows 11 and Ubuntu 24.04 systems equipped with Python versions 3.11, and 3.12.

If you want to run Whisper-TikTok locally, you can clone the repository using the following command:

git clone https://github.com/MatteoFasulo/Whisper-TikTok.git

Install the required dependencies using pip:

pip install -r requirements.txt

However, we encourage the adoption of astral uv to install the required dependencies. If you are using uv, you can install the dependencies with the following command:

uv pip install -e . --torch-backend=auto

Note: The --torch-backend=auto allows automatic selection of the appropriate PyTorch backend. More info can be found in the uv documentation.

FFMPEG Installation

Binaries for FFMPEG are not included in the repository and must be installed separately. Make sure to have FFMPEG installed and accessible in your system's PATH. For convenience, here are the installation instructions for various package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (<https://brew.sh/>)
brew install ffmpeg

# on Windows using Chocolatey (<https://chocolatey.org/>)
choco install ffmpeg

# on Windows using Scoop (<https://scoop.sh/>)
scoop install ffmpeg

Command-Line

To run the program from the command-line, execute the following command within your terminal:

whisper_tiktok --help

which will provide you with a list of available commands.

CLI Options

Whisper-TikTok supports many command-line options to customize the generated TikTok video. Just to name a few, you can choose the Whisper model to use, the TTS voice, subtitle format, subtitle position, font size, font color, and many more.

To browse all available options, run the following command:

whisper_tiktok create --help

If you use the --random_voice option, please specify both --gender and --language arguments. Whisper model will auto-detect the language of the audio file and use the corresponding model.

Usage Examples

Generate a TikTok video using a specific TTS voice:

whisper_tiktok create --tts en-US-EricNeural

Use a custom YouTube video as the background video:

whisper_tiktok create --background-url https://www.youtube.com/watch?v=dQw4w9WgXcQ

Modify the font color of the subtitles:

whisper_tiktok create --font_color FFF000

Generate a TikTok video with a random TTS voice:

whisper_tiktok create --random_voice --gender Male --language en-US

List all available voices:

whisper_tiktok list-voices

you will find a list of available voices together with some information about each voice, such as the tone, style, and suitable scenarios.

Additional Resources

Code of Conduct

Please review our Code of Conduct before contributing to Whisper-TikTok.

Contributing

We welcome contributions from the community! Please see our Contributing Guidelines for more information.

Acknowledgments

We'd like to give a huge thanks to @rany2 for their edge-tts package, which made it possible to use the Microsoft Edge Cloud TTS API with Whisper-TikTok.
We also acknowledge the contributions of the Whisper model by @OpenAI for robust speech recognition via large-scale weak supervision
Also @jianfch for the stable-ts package, which made it possible to use the OpenAI Whisper model with Whisper-TikTok in a stable manner with font color and subtitle format options.

License

Whisper-TikTok is licensed under the Apache License, Version 2.0.