Sharing an extraction script #11

Open
HuanLinOTO opened this issue Jan 23, 2023 · 4 comments

Comments

@HuanLinOTO

HuanLinOTO commented Jan 23, 2023

const { appendFileSync, writeFileSync, existsSync, createReadStream } = require("fs");
const readline = require('readline');
const cliProgress = require('cli-progress')
// Load the dialogue index (a large JSON file, so this can take a while)
console.log("Reading file...");
const result = require("./resultv33.1.json")
console.log("Done reading");

var chars = ['标贝'];

var isready = {'标贝': 0};

const check = (char)=>{
    // Return the numeric speaker ID for `char`, registering it on first use
    if(isready[char] == undefined) {
        isready[char] = chars.push(char)-1;
    }
    return isready[char];
}

const npclist = ["派蒙","流浪者","珐露珊","莱依拉","纳西妲","妮露","坎蒂丝","赛诺","多莉","提纳里","柯莱","鹿野院平藏","久岐忍","夜兰","空","荧","神里绫人","八重神子","云堇","申鹤","荒泷一斗","五郎","优菈","阿贝多","托马","胡桃","达达利亚","雷电将军","珊瑚宫心海","埃洛伊","宵宫","神里绫华","枫原万叶","温迪","刻晴","莫娜","可莉","琴","迪卢克","七七","魈","钟离","甘雨","早柚","九条裟罗","凝光","菲谢尔","班尼特","丽莎","行秋","迪奥娜","安柏","重云","雷泽","芭芭拉","罗莎莉亚","香菱","凯亚","北斗","诺艾尔","砂糖","辛焱","烟绯"]

const bar1 = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
writeFileSync("genshin_train.txt",'')
writeFileSync("genshin_val.txt",'')
// Start the progress bar: total = number of entries in the index, start value = 0
bar1.start(Object.keys(result).length, 0);
var num = 0;
for (const key in result) {
    num++;
    bar1.update(num)
    const item = result[key];
    if(npclist.indexOf(item.npcName) == -1) continue;
    if(item.npcName == undefined || item.npcName.indexOf("#") != -1 || item.npcName.indexOf("test") != -1 || item.npcName.indexOf("?") != -1) continue;
    if(item.language != "CHS") continue;
    if(item.type == "Card") continue;
    if(item.text == undefined) continue;
    if(!existsSync(`V33_Merged_Chinese_Wav/Merged_Chinese_Wav${item.fileName.replace("Chinese","").replaceAll("\\","/")}`)) continue;
    // Send roughly 1 in 25 lines (4%) to the validation set
    if(Math.random()*100%25>>>0 == 1)
        appendFileSync("./genshin_val.txt",`wavs${item.fileName.replace("Chinese","").replaceAll("\\","/")}|${check(item.npcName)}|${item.text.replace(/<.*?>/g,'').replace(/{.*}/gm,'').replaceAll("\\n",'').replaceAll("#",'')}\n`);
    else
        appendFileSync("./genshin_train.txt",`wavs${item.fileName.replace("Chinese","").replaceAll("\\","/")}|${check(item.npcName)}|${item.text.replace(/<.*?>/g,'').replace(/{.*}/gm,'').replaceAll("\\n",'').replaceAll("#",'')}\n`);
}

bar1.stop()
console.log("标贝");
const readInterface = readline.createInterface({
    input: createReadStream('biaobei_train.txt'),
});

readInterface.on('line', function(line) {
    // console.log(line);
    // Send roughly 1 in 25 lines (4%) to the validation set
    if(Math.random()*100%25>>>0 == 1)
        appendFileSync("./genshin_val.txt",line.replace("|","|0|")+"\n");
    else
        appendFileSync("./genshin_train.txt",line.replace("|","|0|")+"\n");
});

writeFileSync("chars.txt",JSON.stringify(chars)+"\n"+JSON.stringify(isready))

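By default the script uses the Biaobei (标贝) dataset as speaker 0 to support the other speakers. If you don't need Biaobei, delete lines 47-59 of the script (everything from console.log("标贝") through the end of the readInterface.on('line', ...) handler) and change lines 9-11 (the chars and isready declarations) to: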

var chars = [];

var isready = {};
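
To make the speaker 0 convention concrete, here is a minimal sketch of what the script does to each Biaobei line. It assumes that biaobei_train.txt uses a `path|text` layout, which is inferred from the `line.replace("|","|0|")` call rather than stated in the post. The helper name `tagAsSpeakerZero` and the sample line are made up for illustration.

```javascript
// tagAsSpeakerZero is a hypothetical name for the inline expression
// line.replace("|", "|0|") used in the script.
function tagAsSpeakerZero(line) {
  // With a string pattern, replace() only touches the first match,
  // so any later '|' inside the text is left alone.
  return line.replace("|", "|0|");
}

// Made-up line in the assumed `path|text` layout.
console.log(tagAsSpeakerZero("biaobei/000001.wav|这是一个例子。"));
// biaobei/000001.wav|0|这是一个例子。
```

The result matches the `path|speaker_id|text` layout that the script writes for the game voice lines, which is why both sources can share the same train and val files.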

Line 21 of the script (npclist) is the list of speakers to extract; edit it as you like.
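
Note that speaker IDs are not taken from a name's position in npclist. The script's check() function hands them out in the order speakers are first encountered in the JSON file, and chars.txt records the resulting mapping. A minimal sketch of that behavior follows; `createSpeakerIndex` is a hypothetical helper that uses a Map instead of the script's chars/isready pair.

```javascript
// Sketch of the ID assignment performed by check() in the script.
// createSpeakerIndex is a hypothetical helper, not part of the original script.
function createSpeakerIndex(initialSpeakers = []) {
  const names = [...initialSpeakers];                     // ID -> name (the script's `chars`)
  const ids = new Map(names.map((name, i) => [name, i])); // name -> ID (the script's `isready`)
  return {
    names,
    idOf(name) {
      if (!ids.has(name)) {
        // push() returns the new array length, so length - 1 is the new ID.
        ids.set(name, names.push(name) - 1);
      }
      return ids.get(name);
    },
  };
}

const index = createSpeakerIndex(["标贝"]); // Biaobei is pre-registered as speaker 0
console.log(index.idOf("派蒙")); // 1
console.log(index.idOf("胡桃")); // 2
console.log(index.idOf("派蒙")); // 1 (already registered)
console.log(index.names);
```

Because the IDs depend on encounter order, a different JSON file or a different npclist may produce different numbers, so keep the chars.txt that belongs to each generated dataset.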

Requires Node.js 16 or later. Before running, install the dependency:

npm i cli-progress
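
For reference, each line the script writes to genshin_train.txt and genshin_val.txt has the form `wav_path|speaker_id|text`. The sketch below isolates the path and text cleanup from the script's template string so it can be tried on a single record. `cleanText`, `toWavPath`, and `formatLine` are hypothetical helper names, and the sample record is made up; real entries come from the JSON index.

```javascript
// Hypothetical helpers that mirror the replace() chain used in the script.
function cleanText(text) {
  return text
    .replace(/<.*?>/g, "")  // drop markup tags such as <color=...> and </color>
    .replace(/{.*}/gm, "")  // drop {placeholder} spans (greedy, exactly as in the script)
    .replaceAll("\\n", "")  // drop literal backslash-n sequences
    .replaceAll("#", "");   // drop '#' markers
}

function toWavPath(fileName) {
  // Strip the "Chinese" prefix and normalize Windows separators.
  return "wavs" + fileName.replace("Chinese", "").replaceAll("\\", "/");
}

function formatLine(item, speakerId) {
  return `${toWavPath(item.fileName)}|${speakerId}|${cleanText(item.text)}`;
}

// Made-up record for illustration only.
const sample = {
  fileName: "Chinese\\VO_example\\vo_example_01.wav",
  text: "#<color=#00E1FFFF>你好</color>,{NICKNAME}旅行者。\\n",
};
console.log(formatLine(sample, 1));
// wavs/VO_example/vo_example_01.wav|1|你好,旅行者。
```

One thing the sketch makes visible: because `{.*}` is greedy, a line with two placeholders loses everything between the first `{` and the last `}`. If that is not what you want, a non-greedy `/{.*?}/g` would be a possible alternative, though that is a change from the posted script.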
@HuanLinOTO
Author

While I'm at it, here is the Python script for normalizing the audio format:

import os

from pydub import AudioSegment

# Target sample rate in Hz. The original post left this as a placeholder;
# 22050 is only an example value, so set it to whatever your training config expects.
SAMPLE_RATE = 22050
# Change this to the directory that holds the extracted audio.
AUDIO_DIR = "path/to/audio/"


def convert(src, dst):
    """Convert one file to mono WAV at SAMPLE_RATE."""
    song = AudioSegment.from_file(src)
    song = song.set_channels(1)
    song = song.set_frame_rate(SAMPLE_RATE)
    song.export(dst, format="wav")


for root, dirs, files in os.walk(AUDIO_DIR):
    for file in files:
        if file.endswith(".wav"):
            path = os.path.join(root, file)
            print(path)
            # Each file is converted in place, overwriting the original.
            convert(path, path)

@yz1392946854

Thanks for the script. Could you explain what the "标贝" (Biaobei) dataset is?

@HuanLinOTO
Author

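Quoting @yz1392946854: Thanks for the script. Could you explain what the "标贝" (Biaobei) dataset is?

It is a free, open-source, annotated speech dataset released by the company Biaobei (标贝).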

@zhamao114514

Hey, how do I actually use the script?
