Competition in science: *stela.manova@univie.ac.at* has been hacked!
I currently work on the relationship between Large Language Models (LLMs) and Linguistic Theory (LT). In LT, at least since Aronoff's (1976) Word Formation in Generative Grammar, it has been claimed that morphemes are not associated with semantics (see also a-morphous morphology, word-based morphology and paradigm-based morphology); recent research in psycholinguistics likewise provides evidence for morphemes that have form but no meaning: for the human parser, -er in corn-er is a morpheme, although corner is not derived from corn. This is entirely in accord with what goes on in LLMs. The major difference between LLMs and LT is that linguists love making things abstract and hierarchical, while in LLMs things are concrete and linear (OK, at an abstract level everything is possible, i.e. all analyses work perfectly :) ; to better understand the LLM perspective, think of representations in terms of, e.g., a binary code). I also do research on scientific text writing and on programming with ChatGPT. For coding experiments, I use Python 3 and tasks from sites such as CodeSignal and LeetCode. Don't hesitate to contact me if you are interested in any of these issues!
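The form-without-meaning point is easy to see in an LLM's own subword vocabulary. Here is a minimal sketch in Python, assuming the tiktoken library is installed (pip install tiktoken); the exact splits are frequency-driven and model-dependent, so they need not align with morpheme boundaries at all.

```python
# Minimal sketch: how an LLM tokenizer segments words into subword
# units that have form but not necessarily meaning. Assumes tiktoken
# is installed; the splits below depend on the chosen vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a vocabulary used by recent OpenAI models

for word in ["corner", "corn", "farmer", "unhappiness"]:
    token_ids = enc.encode(word)
    # Recover the character string of each subword unit.
    pieces = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
              for t in token_ids]
    # The pieces are chosen by frequency statistics, not by semantics:
    # a unit like "er" can surface whether or not the word contains
    # the agentive suffix -er.
    print(f"{word!r} -> {pieces}")
```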
A related project is Language Modeling with N-Grams: The End-to-end N-Gram Model (EteNGraM), a paper that introduces EteNGraM, a toy model for NLP. EteNGraM operates only with bigrams and trigrams, yet appears more efficient than current syntactic models. If you are a linguist, you should try writing texts with n-grams, i.e. based solely on the frequency of occurrence of sequences of word forms, e.g. with the multilingual Google Books Ngram Viewer (see the sketch below). The paper ChatGPT, n-grams and the power of subword units: The future of research in morphology tackles the question of how humans and machines process language; it can be accessed here.
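To get a feel for what writing text based solely on the frequency of word-form sequences means, here is a minimal bigram generator in the same spirit; it is an illustration only, not the EteNGraM implementation described in the paper, and the tiny corpus is invented for the example.

```python
# Toy bigram text generation: the next word is chosen purely from
# frequencies of attested two-word sequences, with no syntax or
# semantics involved. Illustrative sketch, not EteNGraM itself.
import random
from collections import defaultdict

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog .").split()

# Count how often each word form follows each other word form.
bigram_counts = defaultdict(lambda: defaultdict(int))
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def generate(start, length=8):
    """Generate text by repeatedly sampling the next word
    in proportion to its bigram frequency."""
    words = [start]
    for _ in range(length - 1):
        followers = bigram_counts[words[-1]]
        if not followers:  # no attested continuation
            break
        nxt = random.choices(list(followers),
                             weights=list(followers.values()))[0]
        words.append(nxt)
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the rug . the"
```

The same idea scales up directly: replace the toy corpus with real frequency counts (for instance, the kind of data behind the Google Books Ngram Viewer) and extend the keys from single words to word pairs to obtain a trigram model.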
Since 2020, I have also been the main organizer of a series of workshops titled Dissecting Morphological Theory: Diminutivization. The workshops are held in conjunction with different international conferences; you can access the workshop-series website here. The most recent output, the book Diminutives across Languages, Theoretical Frameworks and Linguistic Domains (De Gruyter Mouton, Dec. 2023), can be read online here.