A2E

Train TTS Model of The User's Voice (Voice Clone)

Global Server
https://video.a2e.ai
POST
/api/v1/userVoice/training

Send a POST request to this endpoint to start training a model of the user's voice. Once training completes, the model enables TTS in the user's voice.

English and Chinese are the best optimized. Other languages, such as Arabic, Japanese, Korean, Thai, French, and Spanish, are supported, but with lower quality. The only supported training mode is "best". Requirements:

  • The voice file must be in mp3, wav, or m4a format, with a total duration of at least 15 seconds and at most 60 seconds.

  • The MIME type served with your audio URL must be set correctly (e.g. audio/wav, audio/mpeg). The file type is determined from the Content-Type header of the URL response, not from the URL's suffix. Object storage services from major cloud providers (e.g. AWS S3) usually set the MIME type automatically.

  • The URL must not contain spaces.

  • Redirects are not allowed: the URL must not return a 3xx status code. A common cause is providing an http link that the server then redirects to https.

  • Voice quality matters more than audio length. We recommend uploading high-quality audio in wav format.

  • Audio content: a single speaker with clear vocals and consistent volume, no background noise (such as air conditioners or street noise), and no long silences.

  • Time to finish: training usually finishes within 1 minute.
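The requirements above can be checked on the client before calling the endpoint. The following is a minimal Python sketch, not part of the A2E API: the helper names (`url_problems`, `head_no_redirect`, `wav_duration_ok`) are hypothetical, the set of accepted MIME types is an assumption, and the duration check covers wav files only, using the 15 to 60 second range from the list above.

```python
import http.client
import io
import wave
from urllib.parse import quote, urlsplit

# Assumption: MIME types commonly served for mp3, wav and m4a files.
# The exact set A2E accepts is not listed on this page.
ACCEPTED_MIME_TYPES = {
    "audio/mpeg", "audio/mp3",
    "audio/wav", "audio/x-wav", "audio/wave",
    "audio/mp4", "audio/x-m4a", "audio/m4a",
}
MIN_SECONDS, MAX_SECONDS = 15, 60


def url_problems(url: str, status: int, content_type: str) -> list[str]:
    """Check a voice URL against the rules above, given the status code and
    Content-Type returned by a HEAD request that did NOT follow redirects."""
    problems = []
    if " " in url:
        problems.append("URL contains a space")
    if 300 <= status < 400:
        problems.append(f"URL redirects (HTTP {status})")
    # Drop parameters such as "; charset=..." before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime not in ACCEPTED_MIME_TYPES:
        problems.append(f"unexpected Content-Type: {mime or '<missing>'}")
    return problems


def head_no_redirect(url: str) -> tuple[int, str]:
    """Send a HEAD request and return (status, Content-Type).
    http.client never follows redirects, so a 3xx status is reported as-is."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    # Percent-encode non-ASCII path characters; keep existing escapes intact.
    target = quote(parts.path or "/", safe="/%")
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type", "")
    finally:
        conn.close()


def wav_duration_ok(data: bytes) -> bool:
    """Return True if WAV audio lasts between 15 and 60 seconds (inclusive)."""
    with wave.open(io.BytesIO(data)) as wav:
        seconds = wav.getnframes() / wav.getframerate()
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

A typical flow is `url_problems(url, *head_no_redirect(url))`; an empty list means the URL passed every check in this sketch.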

Once training starts, the task goes through two phases: (1) processing and (2) completed. The result is usually ready within 2 minutes, at which point the status is "completed".
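Because training is asynchronous, a client typically polls until the status reaches "completed". A minimal sketch, assuming a caller-supplied `fetch_status` function (hypothetical) that looks up `current_status` for the saved `_id`; the lookup endpoint is not described on this page. Only the sent, processing, and completed statuses are documented here, so any other outcome is caught only by the timeout.

```python
import time
from typing import Callable

TERMINAL_STATUS = "completed"


def wait_for_voice_clone(
    fetch_status: Callable[[], str],
    interval: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns "completed" or the timeout expires.

    fetch_status is a hypothetical, caller-supplied function that returns the
    current_status of the voice clone record identified by its _id.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status == TERMINAL_STATUS:
            return status
        if waited >= timeout:
            raise TimeoutError(f"voice clone still '{status}' after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Injecting `sleep` keeps the loop testable without real waiting; the default 300-second timeout leaves headroom over the 2-minute estimate above.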

Once the voice clone is complete, you can submit TTS text in multiple languages.
Audio files support mp3, m4a, wav, and mp4 formats, with a total duration of at least 30 seconds.

Request

Authorization
Provide your bearer token in the Authorization header when making requests to protected resources.
Example:
Authorization: Bearer ********************
Header Params
x-lang
enum<string> 
required
The language of your input voice. Only zh-CN and en-US are accepted. Unless you are cloning a Chinese voice, set it to en-US.
Allowed values:
en-US, zh-CN
Example:
en-US
Body Params application/json
name
string 
required
The name of your voice clone. Any string you will recognize later.
voice_urls
array[string]
required
URLs of the audio files used for training. Each URL must meet the requirements listed above.
train_mode
string 
required
Must be best, the only mode currently available; it supports cloning large models in multiple languages. Make sure x-lang in the request header matches your language.
gender
string 
required
The speaker's gender: female or male. It does not currently affect the result but is reserved for a next-generation algorithm.
Example
{
  "name": "your voice clone name",
  "voice_urls": [
    "https://abc.com/123.wav"
  ],
  "train_mode": "best",
  "gender": "female"
}
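The body parameters above can be assembled and validated in one place. A sketch under stated assumptions: `build_training_request` is a hypothetical helper, `YOUR_API_TOKEN` is a placeholder for your bearer token, and the checks mirror only the constraints documented on this page.

```python
import json

ALLOWED_LANGS = {"en-US", "zh-CN"}      # values allowed for the x-lang header
ALLOWED_GENDERS = {"female", "male"}


def build_training_request(name, voice_urls, gender, lang="en-US",
                           token="YOUR_API_TOKEN"):
    """Return (headers, json_body) for POST /api/v1/userVoice/training."""
    if lang not in ALLOWED_LANGS:
        raise ValueError(f"x-lang must be one of {sorted(ALLOWED_LANGS)}")
    if gender not in ALLOWED_GENDERS:
        raise ValueError("gender must be 'female' or 'male'")
    if not voice_urls:
        raise ValueError("voice_urls must contain at least one URL")
    headers = {
        "Authorization": f"Bearer {token}",
        "x-lang": lang,
        "Content-Type": "application/json",
    }
    body = {
        "name": name,
        "voice_urls": list(voice_urls),
        "train_mode": "best",  # the only supported mode
        "gender": gender,
    }
    return headers, json.dumps(body)
```

`train_mode` is hard-coded because "best" is the only accepted value, so callers cannot send an unsupported mode by mistake.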

Request samples

Shell
JavaScript
Java
Swift
Go
PHP
Python
HTTP
C
C#
Objective-C
Ruby
OCaml
Dart
R
Request Request Example
Shell
JavaScript
Java
Swift
curl --location --request POST 'https://video.a2e.ai/api/v1/userVoice/training' \
--header 'Authorization: Bearer YOUR_API_TOKEN' \
--header 'x-lang: en-US' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "your voice clone name",
    "voice_urls": [
        "https://abc.com/123.wav"
    ],
    "train_mode": "best",
    "gender": "female"
}'
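For clients not using the shell, the same request can be built with Python's standard library. A minimal sketch: `make_training_request` and `submit` are hypothetical helper names, and `YOUR_API_TOKEN` stands in for a real token.

```python
import json
import urllib.request

BASE_URL = "https://video.a2e.ai"


def make_training_request(token: str, body: dict,
                          lang: str = "en-US") -> urllib.request.Request:
    """Build the same POST request as the curl sample (not yet sent)."""
    return urllib.request.Request(
        BASE_URL + "/api/v1/userVoice/training",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "x-lang": lang,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Call `submit(make_training_request(token, body))` with a real token and the body from the example above; `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.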

Responses

🟢 200 training
application/json
Body
code
integer 
required
data
object 
required
name
string 
required
voice_urls
array[string]
required
current_status
string 
required
Status of the training task, e.g. sent, processing, completed.
The result is usually ready within 1 minute, at which point the status is "completed".
train_mode
string 
required
_id
string 
required
The ID of the voice clone record. Save this ID; you need it to query the record later.
gender
string 
required
createdAt
string 
required
updatedAt
string 
required
Example
{
  "code": 0,
  "data": {
    "_id": "67bc2c2cc0f5208c812f9438",
    "name": "your voice clone name",
    "voice_urls": [
      "https://dh24as48lv9ce.cloudfront.net/adam2eve/beta/user_voice_clone/2b3b0881-e9e9-4bc2-943f-c2127b4b4961/11月13日.MP3"
    ],
    "train_mode": "best",
    "gender": "female",
    "lang": "en-US",
    "current_status": "sent",
    "createdAt": "2025-02-24T08:22:04.285Z",
    "updatedAt": "2025-02-24T08:22:04.285Z"
  },
  "trace_id": "0f4de019-e41d-4dac-a070-f7bf38722786"
}
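The `_id` in the response is what you need for later queries. A sketch that extracts it along with `current_status`; `parse_training_response` is a hypothetical helper, and treating any nonzero `code` as an error is an assumption based on the example, which shows `code: 0` on success.

```python
import json


def parse_training_response(text: str) -> tuple[str, str]:
    """Return (_id, current_status) from a training response.

    Assumption: code == 0 means success (as in the example response);
    any other code is treated as an error.
    """
    payload = json.loads(text)
    if payload.get("code") != 0:
        raise RuntimeError(
            f"training request failed: code={payload.get('code')}, "
            f"trace_id={payload.get('trace_id')}"
        )
    data = payload["data"]
    return data["_id"], data["current_status"]
```

Including `trace_id` in the error message keeps the identifier the server returned with each response available for troubleshooting.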