Article Type: Research Article
Article Title (English)
Author (English)
This paper provides a methodological overview of the stages involved in designing and constructing a speech corpus, with particular emphasis on the decisions and challenges that arise when collecting standardized speech data for experimental phonetics. The discussion draws on the author’s experience developing a speech corpus as part of a doctoral dissertation. Although the original corpus is bilingual, containing Persian and German data, the present article focuses exclusively on the Persian component. The paper describes the key stages of corpus construction, including the design and selection of speech elicitation tasks, speaker sampling, recording environment and conditions, data organization and naming conventions, and procedures for speech preprocessing and segmentation. It argues that corpus construction is not merely a technical or operational task, but a sequence of deliberate methodological decisions that directly affect data quality, the interpretability of results, and the reproducibility of analyses. In addition, the paper discusses the challenges of applying automatic segmentation methods to Persian speech, highlighting the limitations of automatic speech recognition and romanization tools in low-resource language contexts. It further argues that systematically documenting methodological decisions is itself an essential component of corpus construction, as such documentation promotes transparency, enables critical evaluation of results, and enhances the long-term scientific usability of the data.
Keywords (English)