[Code record 0929] Python speech-to-text with a graphical file picker, preliminary version. The results are saved to a txt file.

This is probably the most dependable speech-to-text code example on CSDN.
Baidu's own example needs only a dozen or so lines of code to transcribe audio, but using it in practice is frustrating enough to make you complain endlessly. The code in this article therefore adds a lot around it. The only thing you need to change is the credentials: create your own (ordinary) Baidu account and get your own keys.

Baidu Intelligent Cloud Description:

# Fill in the API_KEY and SECRET_KEY of an application that has the "audio file transcription" interface enabled in the Baidu console
These services have to be activated in Baidu Intelligent Cloud, where they are free for half a year.
Baidu Intelligent Cloud link: https://console.bce.baidu.com/ai/#/ai/speech/overview/index
The values below are only an example. After claiming the free resources you can create an application; replace the values in the code with your own, otherwise the code won't work.

APP_ID = '24919015'
API_KEY = 'LXGs4gvpc81lSrzp94k4MIwm'
SECRET_KEY = 'H2Xzh2ltiQcedKsilcbI2apTZQppO4yw'
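
Keys pasted directly into a script are easy to leak when the code is shared, so one option is to read them from environment variables instead. The sketch below is only an illustration: the variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY and the helper load_baidu_credentials are made up for this example and are not part of the Baidu SDK.

```python
import os

# Hypothetical environment variable names, chosen for this example only
ENV_NAMES = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def load_baidu_credentials(env=os.environ):
    """Return (app_id, api_key, secret_key), or raise if any value is missing."""
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in ENV_NAMES)
```

With the three variables set, the result can be unpacked as APP_ID, API_KEY, SECRET_KEY = load_baidu_credentials() before the AipSpeech client is created.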

Operating environment description:

Python 3
Libraries to install: Baidu AIP and pydub
Audio helper tool to install: ffmpeg (see an installation tutorial)
Some other audio-related operations are also covered
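
Before running the full script, it can help to confirm that these pieces are actually present. The following is a minimal sketch, not part of the original code: missing_requirements is a hypothetical helper that looks for the two modules the script imports (aip and pydub) and for an ffmpeg executable on the PATH, using only the standard library.

```python
import importlib.util
import shutil


def missing_requirements():
    """Return human-readable names of the requirements that were not found."""
    missing = []
    # "aip" and "pydub" are the top-level modules the script imports
    for module in ("aip", "pydub"):
        if importlib.util.find_spec(module) is None:
            missing.append(f"Python module '{module}'")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg executable")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("Everything found" if not problems else "Missing: " + ", ".join(problems))
```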

Code flow description:

1. Select the audio file to convert to text. Any audio file that plays when you double-click it on your computer is supported.
2. Split the audio file into segments of a fixed duration. The code saves one wav file for every 30 seconds of audio.
3. Convert the segmented audio files to pcm format one by one with ffmpeg.
4. Send the pcm files to Baidu to obtain the transcription results.
5. Write the results to a local txt file.

(The process is this complicated because the audio has to be converted to pcm first: files in other formats always came back as gibberish.)
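
To make the splitting and pcm steps concrete, the arithmetic can be sketched without touching any audio. This is an illustration only: chunk_bounds and expected_pcm_bytes are hypothetical helpers, not pydub or ffmpeg functions. They assume the 30-second segment length and the 16 kHz, 16-bit, mono pcm settings used in the code of this article, and that the last segment simply holds whatever audio is left.

```python
import math

SAMPLE_RATE = 16000  # Hz, matches "-ar 16000" in the ffmpeg command
SAMPLE_WIDTH = 2     # bytes per sample, matches pcm_s16le (16-bit)
CHANNELS = 1         # mono, matches "-ac 1"


def chunk_bounds(total_ms, size_ms=30 * 1000):
    """Return (start_ms, end_ms) pairs covering total_ms in steps of size_ms."""
    count = math.ceil(total_ms / size_ms)
    return [(i * size_ms, min((i + 1) * size_ms, total_ms)) for i in range(count)]


def expected_pcm_bytes(duration_ms):
    """Size in bytes of raw pcm audio of the given duration."""
    return duration_ms * SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS // 1000


bounds = chunk_bounds(75_000)  # a 75-second recording
print(bounds)  # [(0, 30000), (30000, 60000), (60000, 75000)]
print([expected_pcm_bytes(end - start) for start, end in bounds])  # [960000, 960000, 480000]
```

A pcm file whose size is far from the expected value is a hint that the ffmpeg conversion of that segment did not work.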

Full code:

from aip import AipSpeech
from pydub import AudioSegment
from pydub.utils import make_chunks
import os
import tkinter
from tkinter import filedialog

# Open a file-selection dialog so the user can pick the audio file to transcribe
root = tkinter.Tk()
root.withdraw()
# Get the selected file
Filepath = filedialog.askopenfilename(title='Please select the audio file to transcribe and click Open')

if not Filepath:
    # Nothing was selected, so stop here instead of failing later on an empty path
    raise SystemExit("You clicked Cancel, so no file was selected. Please run the program again and select a file.")
print(f'File selection succeeded => {Filepath}\n')

APP_ID = '24919015'
# Fill in the API_KEY and SECRET_KEY of an application that has the "audio file transcription" interface enabled in the Baidu console
API_KEY = 'LXGs4gvpc81lSrzp94k4MIwm'
SECRET_KEY = 'H2Xzh2ltiQcedKsilcbI2apTZQppO4yw'
# Create the Baidu AI speech client from the credentials above
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
# Helper function that reads an audio file as bytes
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

"""Split the local file into segments of a fixed duration, convert them to pcm format, then transcribe them"""

# Load the local file; the file extension is passed to pydub as the format
local_audio = AudioSegment.from_file(Filepath, Filepath.split(".")[-1])
size = 30*1000  # Length of each segment in milliseconds. 1 second = 1000 milliseconds
chunks = make_chunks(local_audio, size)  # Cut the audio into segments
wav_file_name = []
pcm_file_name = []
orial_dir_address = os.path.dirname(Filepath)
print("Original directory:", orial_dir_address)
file_sep0 = '\\' if '\\' in Filepath else '/'
save_file_name = os.path.splitext(Filepath)[0].split(file_sep0)[-1]

save_file_path = orial_dir_address + file_sep0 + save_file_name
if not os.path.exists(save_file_path):
    print(f"Creating folder {save_file_path}; the audio segments will be stored here")
    os.mkdir(save_file_path)
else:
    print(save_file_path, "already exists")

print("The original file is being split into wav segments")
for i, chunk in enumerate(chunks):
    # i is the index and chunk is the audio segment
    chunk_name = f"_chunk{i}.wav"
    # Build the output path and save the segment
    file_path = save_file_path + file_sep0 + chunk_name
    chunk.export(file_path, format="wav")
    wav_file_name.append(file_path)
    # Swap only the extension, so dots elsewhere in the path are left alone
    pcm_file_name.append(os.path.splitext(file_path)[0] + ".pcm")
print("All wav_file_name", wav_file_name)
print("All pcm_file_name", pcm_file_name)
print("Splitting finished, starting conversion to pcm format")

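for x in range(len(wav_file_name)):
    # Quote both paths so that spaces in folder or file names do not break the command.
    # -loglevel is a global option, so it is placed before the input file.
    status = os.system(f'ffmpeg -y -loglevel quiet -i "{wav_file_name[x]}" -acodec pcm_s16le -f s16le -ac 1 -ar 16000 "{pcm_file_name[x]}"')
    if status == 0:
        print(f"{x+1}/{len(wav_file_name)} pcm conversion succeeded", pcm_file_name[x])
    else:
        print(f"{x+1}/{len(wav_file_name)} pcm conversion FAILED", pcm_file_name[x])
print("pcm conversion finished, starting to send transcription requests to Baidu")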
# Recognize the local pcm files
content = []
error_content = []
for index, path in enumerate(pcm_file_name):
    try:
        # dev_pid 1537 selects Baidu's Mandarin recognition model
        results = client.asr(get_file_content(path), 'pcm', 16000, {'dev_pid': 1537})
        print(f"{index+1}/{len(pcm_file_name)}", results["result"])
        content.append("".join(results["result"]))
    except Exception as e:
        # A reply without a "result" key (for example a failed request) also ends up here
        print(path, "An error occurred for this segment!")
        error_content.append(path)
        print(e)
print("Baidu transcription finished, starting to write the local file")
result_path = f"{save_file_path}_transcription_results.txt"
# Write as UTF-8 so the recognized text is saved correctly whatever the system default encoding is
with open(result_path, 'w', encoding='utf-8') as f:
    for con in content:
        f.write(con + '\n')
print(f"Your transcription results are stored in the file: {result_path}")
if len(error_content) > 0:
    print("These audio segments could not be transcribed, please fill them in manually:", error_content)
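
The try/except in the loop above treats every failure the same way. As a hedged sketch, the reply could be inspected explicitly instead. It assumes the reply is a dict carrying err_no, err_msg and, on success, a result list, which is my understanding of the Baidu speech API reply rather than something confirmed in this article. extract_text is a hypothetical helper, and the two sample dicts are made up for illustration, not real replies.

```python
def extract_text(response):
    """Return the recognized text, or raise ValueError with the error details."""
    # Assumption: err_no is 0 (or absent) on success and non-zero on failure
    err_no = response.get("err_no", 0)
    if err_no != 0 or "result" not in response:
        raise ValueError(
            f"Recognition failed: err_no={err_no}, err_msg={response.get('err_msg')}"
        )
    return "".join(response["result"])


# Made-up replies, for illustration only
ok_reply = {"err_no": 0, "err_msg": "success.", "result": ["hello"]}
bad_reply = {"err_no": 3301, "err_msg": "speech quality error."}

print(extract_text(ok_reply))
try:
    extract_text(bad_reply)
except ValueError as err:
    print(err)
```

Keeping the error number next to the path of the failed segment makes it easier to decide whether a segment is worth resending or has to be filled in by hand.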

Results of a run

Start the program and pick the audio file in the dialog, which saves you the trouble of changing the file name in the code each time. Then just wait for the results.


Tags: Python AI

Posted on Wed, 29 Sep 2021 15:16:01 -0400 by mjax