SOFTWIN announces the start of the project Romanian Language Phonetic Analysis: Study and applications (AFLR), contract: 332/2014, Submission Code: PN-II-PT-PCCA-2013-4-1451.

SOFTWIN is one of the Romanian institutions working in Natural Language Processing. The project Romanian Language Phonetic Analysis: Study and applications (AFLR) aims to develop several products of scientific and commercial value, building on the contributions made so far by SOFTWIN and its partners (linguistic knowledge bases, linguistic tools, linguistic applications): a Phonetic Study of the Romanian Language, starting from a significant set of linguistic knowledge formalized in the GRAALAN metalanguage (about 100 000 lemmas, 12 500 000 analytical inflected forms, 1 250 000 synthetic inflected forms, etc.); a Romanian Morphological and Phonetic Dictionary; a Phonetic Dictionary of Romanian Syllables; and a Speech Recognition Application for the Romanian Language.

The project will demonstrate the feasibility of a formalized approach to Romanian phonetics over a large lexicon, covering approx. 90 000-100 000 lemmas out of 120 000 forms (170 000 lemmas + variants), starting from the Explanatory Dictionary of the Romanian Language (Dictionarul explicativ al limbii române), the Small Academy Dictionary (Micul Dictionar al Academiei – 4 volumes), and the Thesaurus Dictionary of the Romanian Language (Dictionarul Tezaur al Limbii Române – 19 volumes).

Project objectives

  1. Updating, completing, and enriching the linguistic knowledge bases for Romanian so that they cover approx. 90 000 – 100 000 words. Besides the phonetic data itself, this involves the rules used to build the phonetic databases (i.e. phonetic rules and syllabification rules), as well as the lexicon and the inflection rules.
  2. Compiling the first morphological and phonetic dictionary to cover a large number of entries: approx. 90 000-100 000 lemmas and their corresponding paradigms (i.e. single-word forms – approx. 1 250 000 distinct forms corresponding to approx. 2 500 000 inflection situations – and multi-word forms – approx. 12 500 000 distinct forms corresponding to approx. 18 750 000 inflection situations).
  3. Developing an application that will generate all single-word and multi-word forms, together with their corresponding phonetic transcriptions.
  4. Developing a phonetic dictionary of the syllables of the Romanian language, and of the corresponding words.
  5. Designing an Automatic Speech Recognition application for Romanian, starting from the phonetic analysis of the syllables that make up Romanian words.

