namastexlabs/whisperx-api

WhisperX API Wrapper

An API wrapper for the WhisperX library

Overview

This is a FastAPI application that exposes an endpoint for transcribing audio and video files with the whisperx command. It accepts multiple audio and video formats and runs transcription, alignment, and speaker diarization on each uploaded file.
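As a rough illustration of how the endpoint's parameters could map onto the whisperx command line, here is a hypothetical helper (`build_whisperx_command` is not part of this repository). The flags used (`--model`, `--language`, `--output_dir`, `--diarize`, `--min_speakers`, `--max_speakers`, `--hf_token`) are accepted by the whisperx CLI, but the exact invocation inside api_whisperx.py may differ. Defaults mirror those documented for POST /whisperx-transcribe/.

```python
from typing import List, Optional


def build_whisperx_command(
    media_path: str,
    output_dir: str,
    lang: str = "pt",
    model: str = "large-v2",
    min_speakers: int = 1,
    max_speakers: int = 2,
    hf_token: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the whisperx CLI (hypothetical helper)."""
    if min_speakers < 1 or max_speakers < min_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    cmd = [
        "whisperx", media_path,
        "--model", model,
        "--language", lang,
        "--output_dir", output_dir,
        "--diarize",
        "--min_speakers", str(min_speakers),
        "--max_speakers", str(max_speakers),
    ]
    if hf_token:
        # The diarization models are gated on Hugging Face, so whisperx needs a token.
        cmd += ["--hf_token", hf_token]
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues with uploaded file names.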

Features

  • User Authentication with JWT
  • Support for multiple audio and video formats
  • Diarization support
  • Customizable language and model settings

Requirements

  • whisperx
  • Python 3.8+
  • FastAPI
  • ffmpeg
  • SQLite
  • PyJWT
  • python-dotenv

Install WhisperX by following the instructions in its official repository. You can then install the remaining dependencies using the requirements.txt file:

pip install -r requirements.txt

Environment Variables

Create a .env file in the project root directory and add the following variables:

SECRET_KEY=your_secret_key
MASTER_KEY=your_master_key
HUGGING_FACE_TOKEN=your_hugging_face_token
API_PORT=11300
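Since dotenv is listed as a requirement, the variables are presumably loaded from .env at startup. A minimal sketch of such a loader, assuming python-dotenv's `load_dotenv()`; the `Settings` class and `load_settings` function are hypothetical and not taken from api_whisperx.py.

```python
import os
from dataclasses import dataclass

try:
    # python-dotenv provides load_dotenv(); fall back to plain os.environ if absent.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


@dataclass(frozen=True)
class Settings:
    secret_key: str
    master_key: str
    hugging_face_token: str
    api_port: int


def load_settings() -> Settings:
    if load_dotenv is not None:
        # Searches for a .env file and loads it without overriding existing variables.
        load_dotenv()
    required = ("SECRET_KEY", "MASTER_KEY", "HUGGING_FACE_TOKEN")
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        secret_key=os.environ["SECRET_KEY"],
        master_key=os.environ["MASTER_KEY"],
        hugging_face_token=os.environ["HUGGING_FACE_TOKEN"],
        # Falls back to the port used in the example above.
        api_port=int(os.getenv("API_PORT", "11300")),
    )
```

Failing fast on missing keys surfaces a misconfigured .env at startup rather than at the first authenticated request.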

Database Setup

User information is stored in SQLite. The database is created automatically when the application starts.
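This README does not document the schema or how passwords are stored. As an illustration only, a users table with salted PBKDF2 password hashes could look like this; the table layout, the hashing scheme, and all function names are assumptions, not the repository's actual implementation.

```python
import hashlib
import hmac
import os
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) users table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS users (
               username TEXT PRIMARY KEY,
               password_hash BLOB NOT NULL,
               salt BLOB NOT NULL
           )"""
    )
    conn.commit()


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 with a per-user salt; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)


def add_user(conn: sqlite3.Connection, username: str, password: str) -> None:
    salt = os.urandom(16)
    # Raises sqlite3.IntegrityError if the username already exists.
    conn.execute(
        "INSERT INTO users (username, password_hash, salt) VALUES (?, ?, ?)",
        (username, hash_password(password, salt), salt),
    )
    conn.commit()


def check_password(conn: sqlite3.Connection, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT password_hash, salt FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    stored, salt = row
    return hmac.compare_digest(stored, hash_password(password, salt))
```

In a real deployment the connection would point at a file, e.g. `sqlite3.connect("users.db")`, so accounts survive restarts.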

Running the Application

Run the application using:

python api_whisperx.py

Replace main with the name of your Python file if it's not main.py.

API Endpoints

POST /auth

Authenticate a user and return a JWT token.

  • username: The username of the user.
  • password: The password of the user.
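A client-side sketch of calling POST /auth with Python's standard library. It assumes the credentials are sent as form fields and that the server listens on localhost at the API_PORT from the .env example; the response format is not documented here, so `send` simply returns the decoded JSON body. Both helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server runs locally with API_PORT=11300.
BASE_URL = "http://localhost:11300"


def build_auth_request(username: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /auth request, assuming form-encoded credentials."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/auth",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `send(build_auth_request("alice", "s3cret"))` returns the parsed response; check api_whisperx.py for the name of the field that holds the JWT.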

POST /create_user

Create a new user.

  • username: Desired username.
  • password: Desired password.
  • master_key: Master key for authorized user creation.
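The master key gates who may create accounts. A hypothetical server-side check (the function name is illustrative, not from this repository) that compares the submitted key against MASTER_KEY in constant time with `hmac.compare_digest`, so response timing does not reveal how much of the key matched:

```python
import hmac
import os
from typing import Optional


def master_key_is_valid(provided: str, expected: Optional[str] = None) -> bool:
    """Return True only if `provided` matches the configured master key."""
    if expected is None:
        expected = os.getenv("MASTER_KEY")
    if not expected:
        # Refuse all user creation when no master key is configured.
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())
```

Rejecting requests when MASTER_KEY is unset avoids accidentally allowing anyone to create users on a misconfigured server.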

POST /whisperx-transcribe/

Transcribe an uploaded audio or video file.

  • file: The audio or video file to transcribe.
  • lang: Language for transcription (default is "pt").
  • model: Model to use for transcription (default is "large-v2").
  • min_speakers: Minimum number of speakers for diarization (default is 1).
  • max_speakers: Maximum number of speakers for diarization (default is 2).
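A client-side sketch of uploading a file to POST /whisperx-transcribe/ using only the standard library. It assumes the parameters above are sent as multipart form fields and that the JWT from POST /auth goes in a Bearer Authorization header; neither detail is stated in this README, and both helper names are hypothetical.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",  # trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(
    filename: str,
    content: bytes,
    token: str,
    base_url: str = "http://localhost:11300",
    lang: str = "pt",
    model: str = "large-v2",
    min_speakers: int = 1,
    max_speakers: int = 2,
) -> urllib.request.Request:
    """Build a POST /whisperx-transcribe/ request (field layout is an assumption)."""
    fields = {
        "lang": lang,
        "model": model,
        "min_speakers": str(min_speakers),
        "max_speakers": str(max_speakers),
    }
    body, content_type = encode_multipart(fields, "file", filename, content)
    return urllib.request.Request(
        f"{base_url}/whisperx-transcribe/",
        data=body,
        headers={"Content-Type": content_type, "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Given a token, the request can be sent with `urllib.request.urlopen(build_transcribe_request("meeting.mp4", Path("meeting.mp4").read_bytes(), token))`, where `Path` comes from `pathlib`.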

Logging

The application has built-in logging that reports each processing step as it runs and any errors that occur.
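A minimal sketch of step-by-step logging with Python's logging module, in the spirit described above. The logger name and the `run_step` helper are hypothetical; the application's actual log messages and format may differ.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("whisperx_api")  # hypothetical logger name


def configure_logging(level: int = logging.INFO) -> None:
    """Send log records to stderr with timestamps and severity."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def run_step(name: str, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Log the start and end of one processing step, or its failure with a traceback."""
    logger.info("starting %s", name)
    try:
        result = func(*args, **kwargs)
    except Exception:
        logger.exception("%s failed", name)
        raise
    logger.info("finished %s", name)
    return result
```

Wrapping transcription, alignment, and diarization each in `run_step` makes it clear from the log which stage a failure came from.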