Computer-Assisted Language Learning and the Revolution in Computational Linguistics
Computer-Assisted Language Learning (CALL) is the field concerned with the use of computer tools in second language acquisition. Somewhat surprisingly, perhaps, this field has never been closely related to Computational Linguistics (CL). Until recently, the two fields were almost completely detached. Despite occasional attempts to apply techniques of Natural Language Processing (NLP) to the recognition of errors, NLP in CALL has long remained in a very small minority position, while CALL was hardly, if at all, recognized as a part of CL. In this contribution, I intend to show how CL could remain largely irrelevant to CALL for such a long time and why there is a good prospect that this will change in the near future. Section 1 describes the situation of CL before the revolution. In section 2, the crisis leading to the revolution in CL is outlined. The revolution itself is the topic of section 3. The implications for the field are then sketched in section 4. Finally, section 5 summarizes the conclusions.
2 Computational Linguistics as Natural Language Understanding
CL is almost as old as the first working computer. In fact, at a time when computer science was still in its infancy, Weaver (1955) had already proposed the use of computers for translation, thus initiating research in Machine Translation (MT). Weaver considered two approaches to MT, one based on linguistic analysis and the other on information theory. Neither of these could be implemented at the time of Weaver’s proposal. Information theory had been more or less fully developed by Shannon (1948), but its application to MT required computational power of a magnitude that would not be available for several decades. Linguistic analysis appeared more promising, because it can be performed with considerably less computational power, but the theoretical elements necessary for its successful application were still missing. Thus much work in early CL was devoted to developing the basic mechanisms required for linguistic analysis.
One of the first types of knowledge to be developed concerns the computational properties of formalisms to be used in the description of languages. In response to this requirement, the theory of formal grammars was developed, mainly in the course of the 1950s. Noam Chomsky played an active role in systematizing and extending this knowledge and Chomsky (1963) provides an early, fairly comprehensive overview of the properties of grammars consisting of rewrite rules of the general type as in (1).
(1) α → β
In this approach, a formal description of a language consists of a set of rules in which α and β in (1) are replaced by strings of symbols. When designed properly, such a system of rules is able to generate sentences. If we consider a language as a set of sentences, we can see the grammar as a definition of the language. Different types of grammar impose different conditions on α and β. Thus, if in all rules of a grammar β is not shorter than α, it can always be determined by a finite procedure whether a given sentence belongs to the language defined by the grammar or not. For Context-Free Grammars (CFGs), in which α in (1) is a single symbol in each rule, the structure can be represented as a tree diagram.
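The idea that a grammar defines a language as a set of sentences can be illustrated with a minimal sketch. The toy grammar below is hypothetical and not taken from any of the works cited; every rule has a single symbol on its left-hand side, so it is context-free, and it is deliberately non-recursive so that the set of sentences it generates is finite and can be listed in full. The names GRAMMAR, generate and LANGUAGE are assumptions made for the illustration.

```python
from itertools import product

# Hypothetical toy grammar: each left-hand side is a single symbol (context-free).
# Symbols that do not appear as keys are treated as words of the language.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}

def generate(symbol):
    """Return every word sequence derivable from `symbol`.

    Only safe for non-recursive grammars such as the one above;
    a recursive grammar would generate infinitely many sentences.
    """
    if symbol not in GRAMMAR:
        # A word: nothing left to rewrite.
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand every symbol of the right-hand side and combine the alternatives.
        for parts in product(*(generate(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

# The language defined by the grammar, seen as a set of sentences.
LANGUAGE = {" ".join(words) for words in generate("S")}

print(len(LANGUAGE))
print("the dog sees the cat" in LANGUAGE)
print("cat the sleeps" in LANGUAGE)
```

Because the grammar is finite and non-recursive, membership can be checked simply by listing the language. For realistic grammars, which are recursive, this is no longer possible and a dedicated procedure is needed to decide membership.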
The next step on the road to linguistic analysis in CL was the development of parsers. A parser is an algorithm to determine for a given sentence x and a grammar G whether G can generate x and which structure(s) G assigns to x. Ground-breaking work in this area was done in the 1960s with the development of the chart parser (cf. Varile 1983 for an overview), Earley’s (1970) efficient parser for CFGs, and the more powerful Augmented Transition Networks of Woods (1970).
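The task of a parser as described above can be made concrete with a small sketch. The code below is a naive top-down, backtracking parser, not the chart parser, Earley's algorithm, or an Augmented Transition Network; it is far less efficient than any of these and would loop on left-recursive rules. The grammar G and the function names parse and analyses are hypothetical and serve only as an illustration.

```python
# Hypothetical toy grammar G; keys are nonterminals, anything else is a word.
# It contains no left recursion, which this naive strategy could not handle.
G = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["with", "NP"]],
    "N": [["dog"], ["man"], ["telescope"]],
    "V": [["sees"]],
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in G:
        # A word: it must match the next word of the input.
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in G[symbol]:
        # Partial analyses: (children found so far, input position reached).
        partials = [([], start)]
        for child in rhs:
            partials = [
                (kids + [tree], end)
                for kids, pos in partials
                for tree, end in parse(child, words, pos)
            ]
        for kids, end in partials:
            yield (symbol, kids), end

def analyses(sentence):
    """Return all structures G assigns to the sentence.

    An empty list means that G cannot generate the sentence.
    """
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

print(len(analyses("the dog sees the man")))
print(len(analyses("the man sees the dog with the telescope")))
print(len(analyses("dog the sees")))
```

The second sentence receives two structures, because the prepositional phrase can attach either to the verb phrase or to the noun phrase. This corresponds to the "structure(s)" in the definition: a parser has to return every analysis the grammar allows, not only decide whether there is one.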
With a grammar formalism and a number of parsing algorithms in place, the only missing link to successful linguistic analysis was the description of the relevant languages. As it turned out, however, this problem was more recalcitrant than the other two. Chomsky developed a theory of grammar using formal rules of the type in (1), but his theory is less congenial to CL than may appear at first sight. Chomskyan linguistics has often been considered as based on a concept of language as a set of sentences, and some remarks by Chomsky (1957) can be taken to support this view. At least from the early 1960s onwards, however, Chomsky has consistently and explicitly rejected such a view in favour of language as a knowledge component in the speaker’s mind. Chomsky (1988) gives an accessible explanation and justification of the assumptions underlying this general approach and the type of linguistic theory it leads to.
Given this approach to language, there is no convergence in goals between Chomskyan linguistics and CL. Whereas the former is interested in describing and explaining a human being’s knowledge of language, the latter is interested in processing the products of language use on a computer. An example of this divergence is the reaction to the realization that transformational rules of the type used in Chomsky (1965) are excessively powerful. This excessive power appears both in language acquisition on the basis of input sentences and in language processing leading to the understanding of sentences and utterances. In Chomskyan linguistics it was not the processing complexity but only the learnability requirement of the grammar which drove the restriction of transformations. Chomsky’s linguistic theory continued to involve movement operations defined over nodes in a tree structure. In analysis, this requires the ‘undoing’ of movement, which is a computationally complex operation. Processing complexity of grammars produced in the Chomskyan framework has remained a major problem for their computational implementation, but this does not and need not inconvenience Chomskyan linguists. From the perspective of Chomskyan linguistics, as language is a typically human property, it is quite plausible that the human mind is structured so as to facilitate processing of the type necessary for human language. A computer does not have this structure.
From the 1970s onwards, a number of alternative linguistic theories have been developed with computational implementation in mind. At present, the most influential ones are Lexical-Functional Grammar (LFG, cf. Bresnan 2001) and Head-Driven Phrase Structure Grammar (HPSG, cf. Pollard/Sag 1994). They still use rewrite rules of type (1) to some extent, but their actual formal basis is the unification of feature structures. Feature structures can be seen as sets of attribute-value pairs describing individual nodes in a tree structure. The formal device of feature structures and the operations on them were developed in full only in the 1980s. An early overview is Shieber (1986). By applying operations such as unification to feature structures, movement of nodes in a tree can be dispensed with. This is important for CL, because operations of this type are much more computer-friendly than undoing movement.
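Unification of feature structures can be sketched in a few lines. The version below is a simplification under stated assumptions: feature structures are represented as nested Python dictionaries of attribute-value pairs, atomic values are strings, and structure sharing (reentrancy), which plays an important role in LFG and HPSG, is left out. The function name unify and the attribute names cat, agr, num and per are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures.

    Returns the merged structure, or None if the two structures
    contain conflicting values. Neither input is modified.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attr, value in b.items():
            if attr in result:
                # Both structures specify this attribute: the values must unify.
                merged = unify(result[attr], value)
                if merged is None:
                    return None
                result[attr] = merged
            else:
                # Only one structure specifies this attribute: take it over.
                result[attr] = value
        return result
    # Atomic values (or an atom and a structure) unify only if they are identical.
    return a if a == b else None

# A noun phrase and the requirements a verb imposes on its subject.
noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
subject_of_sleeps = {"agr": {"num": "sg", "per": "3"}}
subject_of_sleep = {"agr": {"num": "pl"}}

print(unify(noun_phrase, subject_of_sleeps))
print(unify(noun_phrase, subject_of_sleep))
```

In the first call the information from both structures is combined into one; in the second the values for num conflict and unification fails. The operation only merges and compares attribute-value pairs, which gives an impression of why it is easier to implement than the undoing of movement in a tree.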
Given this historical development, it is understandable why for a long time research in CL, a significant part of which was at least in name devoted to MT, largely coincided with research in natural language analysis, i.e. parsing techniques and formal linguistic description. Work on different applications (e.g. MT, dialogue systems, text summarization) did not lead to major divisions in the CL research community, because in all such applications analysis was considered as the logical first step. This attitude is reflected in Kay’s (1973) proposal of a modular system of natural language understanding, the parts of which could be connected in different ways depending on the requirements of the application.
If major divisions in the CL research community could not be identified on the basis of different applications, one might wonder whether there was any other source of major divisions. Most of the discussions in CL turned on issues such as the choice of linguistic theory, formalism, and parsing strategy. Although in the perception of people working in the field, different positions on these issues led to a division into competing currents of research, they should not be confused with major divisions in the field. All of these currents were basically geared towards the same task and their success could be compared directly. This contrasts with the situati