
Error during fine-tuning #55

@MohammadNazari98

Description


I am trying to fine-tune a Persian TTS model starting from a pretrained checkpoint.
During training I hit a FileNotFoundError for the phoneme cache .npy file, and right after that a PermissionError is raised when the script tries to delete the experiment folder.
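Both failures can be probed before launching the trainer. A minimal diagnostic sketch (the cache directory follows the config in the script below; the long filename is an illustrative stand-in for the real hashed name, not the actual one):

```python
import os

# Phoneme cache location as configured in the training script below.
cache_dir = os.path.join("train_output", "phoneme_cache")

# np.save() does not create missing parent directories, so a missing
# cache_dir alone is enough to trigger FileNotFoundError on save.
print("cache dir exists:", os.path.isdir(cache_dir))

# On Windows, absolute paths longer than 260 characters also raise
# FileNotFoundError unless long-path support is enabled. The hashed
# cache filenames are long; this stand-in mimics their length.
fake_name = "I3dhdnNc" + "x" * 220 + "_phoneme.npy"
full_path = os.path.abspath(os.path.join(cache_dir, fake_name))
print("path length:", len(full_path), "(Windows MAX_PATH default: 260)")
```

If either check fails, that points at the missing directory or the path-length limit rather than at the trainer itself.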


Environment

  • Model: Persian TTS Female VITS
  • Checkpoint: best_model_30824.pth (phoneme-based)
  • Python: 3.10
  • Torch: 2.x
  • OS: Windows 10
  • GPU: 1× NVIDIA
  • Dataset: about 100 short Persian audio clips

Steps to reproduce

  1. Set use_phonemes=True in the config
  2. Load the pretrained model
  3. Start fine-tuning on my own dataset

Code

import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version:", torch.version.cuda)
    print("GPU count:", torch.cuda.device_count())
else:
    print("No GPU available. Training will run on CPU.")

CUDA available: True
CUDA version: 11.8
GPU count: 1
code='''import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig, CharactersConfig
from TTS.config.shared_configs import BaseAudioConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsAudioConfig
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = os.path.dirname(os.path.abspath(__file__))
dataset_config = BaseDatasetConfig(
    formatter="mozilla", meta_file_train="metadata_female.csv", path="data" 
)



audio_config = BaseAudioConfig(
    sample_rate=22050,
    do_trim_silence=False,
    resample=False,
    mel_fmin=0,
    mel_fmax=None 
)
character_config = CharactersConfig(
    characters='ءابتثجحخدذرزسشصضطظعغفقلمنهويِپچژکگیآأؤإئ',
    punctuations='!(),-.:;? ̠،؛؟‌<>',
    phonemes='ˈˌːˑpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟaegiouwyɪʊ̩æɑɔəɚɛɝɨ̃ʉʌʍ0123456789"#$%*+/=ABCDEFGHIJKLMNOPRSTUVWXYZ[]^_{}',
    pad="<PAD>",
    eos="<EOS>",
    bos="<BOS>",
    blank="<BLNK>",
    characters_class="TTS.tts.utils.text.characters.IPAPhonemes",
)
config = VitsConfig(
    audio=audio_config,
    run_name="vits_fa_female",
    batch_size=4,
    eval_batch_size=2,
    batch_group_size=5,
    num_loader_workers=0,
    num_eval_loader_workers=2,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=200,
    save_step=500,
    text_cleaner="basic_cleaners",
    use_phonemes=True,
    phoneme_language="fa",
    characters=character_config,
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    compute_input_seq_cache=True,
    print_step=250,
    print_eval=False,
    mixed_precision=False,
    test_sentences=[
        ["سلطان محمود در زمستانی سخت به طلخک گفت "],
        ["کارل و لرل کارها رو رله کردن "],
        ["مردی نزد بقالی آمد و گفت پیاز هم ده تا دهان بدان خو شبوی سازم."],
        ["سه سیر سرشیر سه شیشه شیر! "],
        ["از مال خود پاره ای گوشت بستان و زیره بایی معطّر بساز"],
        ["لورل روی ریل راه میرفت "],
        ["یکی اسبی به عاریت خواست"]
    ],
    output_path=output_path,
    datasets=[dataset_config],
)

# INITIALIZE THE AUDIO PROCESSOR
# Audio processor is used for feature extraction and audio I/O.
# It mainly serves to the dataloader and the training loggers.
ap = AudioProcessor.init_from_config(config)

# INITIALIZE THE TOKENIZER
# Tokenizer is used to convert text to sequences of token IDs.
# config is updated with the default characters if not defined in the config.
tokenizer, config = TTSTokenizer.init_from_config(config)

# LOAD DATA SAMPLES
# Each sample is a list of ```[text, audio_file_path, speaker_name]```
# You can define your custom sample loader returning the list of samples.
# Or define your custom formatter and pass it to the `load_tts_samples`.
# Check `TTS.tts.datasets.load_tts_samples` for more details.
train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    eval_split_max_size=config.eval_split_max_size,
    eval_split_size=0.1,
)

# init model
model = Vits(config, ap, tokenizer, speaker_manager=None)

# init the trainer and 🚀
trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()'''
import os

os.makedirs("train_output", exist_ok=True)  # make sure the target folder exists
with open("train_output/train_vits.py", "w", encoding="utf-8") as f:
    f.write(code)
import torch
torch.cuda.empty_cache()
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
!python "train_output/train_vits.py" \
--restore_path "models/female/best_model_30824.pth" \
--coqpit.run_name "vits-female-finetune" 

Errors

> Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:1.0
 | > pitch_fmax:640.0
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:45
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 | > Found 111 files in D:\tts\data


> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: fa
		| > phoneme backend: espeak
| > Number of instances : 100
 | > Preprocessing samples
 | > Max text length: 3
 | > Min text length: 1
 | > Avg text length: 2.02
 | 
 | > Max audio length: 92546.0
 | > Min audio length: 28389.5
 | > Avg audio length: 51679.07
 | > Num. instances discarded samples: 0
 | > Batch group size: 20.
D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\librosa\core\intervals.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_filename
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
 > Training Environment:
 | > Backend: Torch
 | > Mixed precision: False
 | > Precision: float32
 | > Current device: 0
 | > Num. of GPUs: 1
 | > Num. of CPUs: 8
 | > Num. of Torch Threads: 4
 | > Torch seed: 54321
 | > Torch CUDNN: True
 | > Torch CUDNN deterministic: False
 | > Torch CUDNN benchmark: False
 | > Torch TF32 MatMul: False
 > Start Tensorboard: tensorboard --logdir=D:\tts\train_output\vits-female-finetune-February-19-2026_11+54AM-0000000
 > Restoring from best_model_30824.pth ...
 > Restoring Model...
 > Restoring Optimizer...
 > Model restored from step 30824
D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\trainer\trainer.py:561: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.scaler = torch.cuda.amp.GradScaler()

 > Model has 83063980 parameters

 > EPOCH: 0/200
 --> D:\tts\train_output\vits-female-finetune-February-19-2026_11+54AM-0000000

 > TRAINING (2026-02-19 11:54:41) 
D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\torch\functional.py:730: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\SpectralOps.cpp:880.)
  return _VF.stft(  # type: ignore[attr-defined]
D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\models\vits.py:1273: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with autocast(enabled=False):  # use float32 for the criterion
D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\models\vits.py:1284: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with autocast(enabled=False):
D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\models\vits.py:1311: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with autocast(enabled=False):  # use float32 for the criterion
Traceback (most recent call last):
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\datasets\dataset.py", line 617, in compute_or_load
    ids = np.load(cache_path)
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\numpy\lib\npyio.py", line 407, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\tts\\train_output\\phoneme_cache\\I3dhdnNc2b7Ysdiv2KfYsti0INi12K3bjNitINiv2KfYr9mH4oCM2YfYpyDZhduM4oCM2KrZiNin2YbYryDYp9mE2q_ZiNmH2KfbjNuMINix2Kcg2KLYtNqp2KfYsSDaqdmG2K8g2qnZhyDYr9ixINmG2q_Yp9mHINin2YjZhCDZgtin2KjZhCDZhdi02KfZh9iv2Ycg2YbbjNiz2KrZhtiv_phoneme.npy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\trainer\trainer.py", line 1833, in fit
    self._fit()
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\trainer\trainer.py", line 1785, in _fit
    self.train_epoch()
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\trainer\trainer.py", line 1503, in train_epoch
    for cur_step, batch in enumerate(self.train_loader):
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\torch\utils\data\dataloader.py", line 733, in __next__
    data = self._next_data()
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\torch\utils\data\dataloader.py", line 789, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\models\vits.py", line 273, in __getitem__
    token_ids = self.get_token_ids(idx, item["text"])
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\datasets\dataset.py", line 240, in get_token_ids
    token_ids = self.get_phonemes(idx, text)["token_ids"]
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\datasets\dataset.py", line 217, in get_phonemes
    out_dict = self.phoneme_dataset[idx]
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\datasets\dataset.py", line 602, in __getitem__
    ids = self.compute_or_load(string2filename(item["audio_unique_name"]), item["text"], item["language"])
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\TTS\tts\datasets\dataset.py", line 620, in compute_or_load
    np.save(cache_path, ids)
  File "<__array_function__ internals>", line 180, in save
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\numpy\lib\npyio.py", line 515, in save
    file_ctx = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\tts\\train_output\\phoneme_cache\\I3dhdnNc2b7Ysdiv2KfYsti0INi12K3bjNitINiv2KfYr9mH4oCM2YfYpyDZhduM4oCM2KrZiNin2YbYryDYp9mE2q_ZiNmH2KfbjNuMINix2Kcg2KLYtNqp2KfYsSDaqdmG2K8g2qnZhyDYr9ixINmG2q_Yp9mHINin2YjZhCDZgtin2KjZhCDZhdi02KfZh9iv2Ycg2YbbjNiz2KrZhtiv_phoneme.npy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\tts\train_output\train_vits.py", line 106, in <module>
    trainer.fit()
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\trainer\trainer.py", line 1860, in fit
    remove_experiment_folder(self.output_path)
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\trainer\generic_utils.py", line 77, in remove_experiment_folder
    fs.rm(experiment_path, recursive=True)
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\site-packages\fsspec\implementations\local.py", line 202, in rm
    shutil.rmtree(p)
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\shutil.py", line 750, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\shutil.py", line 620, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "D:\project\python\AI\DL\Persian_tts_voicetext\envtts\lib\shutil.py", line 618, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:/tts/train_output/vits-female-finetune-February-19-2026_11+54AM-0000000\\trainer_0_log.txt'

Expected behavior

  1. The phoneme cache file should be created automatically if it does not exist.
  2. The experiment folder should be removed without a permission error.
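Until the cache folder is created automatically, a pre-flight workaround is to create it before launching training. A minimal sketch, assuming the same output_path layout as the script above:

```python
import os

output_path = "train_output"  # assumed to match the script's output_path
phoneme_cache = os.path.join(output_path, "phoneme_cache")

# np.save() does not create parent directories, so create the cache
# folder up front; exist_ok=True makes this safe to re-run.
os.makedirs(phoneme_cache, exist_ok=True)
print("phoneme cache ready:", os.path.isdir(phoneme_cache))
```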

Question

What is the recommended way to fix the phoneme cache error, and how can the PermissionError be avoided on Windows?
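From the traceback, the PermissionError happens because trainer_0_log.txt is still held open when remove_experiment_folder calls shutil.rmtree. For cleaning up such a folder manually on Windows, one hedged workaround is a retrying delete; rmtree_retry below is an illustrative helper, not part of the trainer API:

```python
import os
import shutil
import stat
import time

def rmtree_retry(path, attempts=3, delay=0.5):
    """Delete a directory tree, retrying entries Windows reports as locked."""
    def _onerror(func, p, exc_info):
        # Clear a read-only bit if that was the problem, then retry once.
        os.chmod(p, stat.S_IWRITE)
        func(p)

    for i in range(attempts):
        try:
            shutil.rmtree(path, onerror=_onerror)
            return
        except PermissionError:
            if i == attempts - 1:
                raise
            time.sleep(delay)  # give the holder time to release the file handle
```

Within the same process, closing the open log handlers first (e.g. `logging.shutdown()`) before any cleanup is usually the cleaner fix.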
