Namaeh Farhangistan

An Investigation of the Advantages and Disadvantages of Using Internet-Based Sources in a Lexicographic Corpus

Document Type: Original Article

Abstract
This article investigates the use of the internet as a partial corpus for lexicographic projects. To this end, examples are selected from The Comprehensive Persian Dictionary to show that, although a large linguistic corpus was already available to the compilers, they had no choice but to use internet-based citations because of their proximity to everyday language. Having shown the significance of such data, the author then draws on Tarp & Fuertes-Olivera (2016) to explore not only the advantages and disadvantages of internet evidence but also the active role compilers can play in using such resources. The author subsequently examines the strategies employed in The Comprehensive Persian Dictionary and provides example entries toward a SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis.
Keywords

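References
Khatibi, Abolfazl (1386 SH), "The Comprehensive Dictionary of the Persian Language, the corpus in Persian lexicography, and the computerized language corpus", Farhangnevisi, No. 1, pp. 4-67. [in Persian]
The Comprehensive Dictionary of the Persian Language (1392 and 1395 SH), under the supervision of Ali-Ashraf Sadeghi, Academy of Persian Language and Literature, Tehran. [in Persian]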
De Groc, C. (2011), “Babouk: Focused web crawling for corpus compilation and automatic terminology extraction”, 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Vol. 1, 497-498. IEEE.
Fuertes-Olivera, P. A. (2012), “Lexicography and the Internet as a (Re-) source”, Lexicographica, 28(1), 49-70.
Gudmann, H. R. (2014), Betydningshuller i Spanske Ordbøger. En Undersøgelse af Betydningsenheder i Spanske Monolingvale Almene Receptionsordbøger, M.A. Thesis. Aarhus: Aarhus University, Department of Business Communication.
Kilgarriff, A. (1997), “I don’t believe in word senses”, Computers and the Humanities, 31(2), 91-113.
Kilgarriff, A., and Grefenstette, G. (2003), “Introduction to the Special Issue on the Web as Corpus”, Computational Linguistics, 29(3), 333-347.
Tarp, S. T., and Fuertes-Olivera, P. A. (2016), “Advantages and disadvantages in the use of internet as a corpus: The case of the online dictionaries of Spanish Valladolid-UVa”, Lexikos, 26(1), 273-295.
Zgusta, L. (1989), “Probable future developments in lexicography”, in Hausmann, F. J. et al. (Eds.), 1991, 3157-3167.