Linguistics Papers
Intro to Linguistics · Prof. Jan Osborn · Fall 2004
Talking to Machines
Abstract
In this paper I attempt to examine the budding linguistic relationship between human beings and computers through phonetically predictive text messaging, word processing functions (mainly grammar checking), and search engines. Each of these technologies requires human language as input and attempts to analyze and interpret that language for output. I surveyed 45 college students, heavy users of these technologies, to determine how often they were using them, how satisfied they were with them, and how accurately the technologies performed their jobs. I found that although the students were generally satisfied with most of them, there was a general desire for improvement before a natural linguistic user interface with computers could be achieved.
Introduction
Personal computing is more accessible now than it has ever been, especially in industrialized nations, yet there is still a significant knowledge barrier that keeps us from using a computer as easily as we use our brains. Computers are complicated, and many people do not feel motivated to learn how to take advantage of all those bells and whistles. In order to get around this roadblock, and allow everyone easy access to the immense benefits of personal computing, researchers all over the world are attempting to teach computers to understand us on levels that previously only another human being could.
It is commonly assumed, both in and out of the linguistic community, that human beings are the only entities capable of using language as we understand it. However, more and more linguists are being recruited to teach human language to another kind of intelligence. As they further their understanding of how we pass meaning between one another, they turn around and teach what they know to computers. Eventually they hope to make computer interfaces that can converse in natural human language with users.
It's happening slowly in many different areas. The natural starting point is for computers to learn how to better understand the deeper meanings in our written text, since that is already how we communicate with them. For example, researchers from Cornell University are working hard on a computer program that can tell how objective or subjective a given block of text is by weeding out neutral sentences, and counting the frequency of emotionally laden sentences that are left behind. This will allow them to create search engines that can actually sift through volumes of information at a semantic or contextual level.
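To make that idea concrete, here is a minimal sketch of how such subjective/neutral filtering might work. The cue-word list, threshold, and scoring below are my own illustrative assumptions, not the Cornell researchers' actual lexicon or algorithm.

```python
# Toy subjectivity filter: score each sentence by how many emotionally laden
# cue words it contains, weed out the neutral ones, and report the proportion
# of sentences that remain. The word list and threshold are invented for
# illustration only.

SUBJECTIVE_WORDS = {"love", "hate", "terrible", "wonderful", "awful",
                    "amazing", "boring", "beautiful", "worst", "best"}

def sentence_score(sentence: str) -> int:
    """Count the subjective cue words that appear in a sentence."""
    words = {w.strip(".,!?;:").lower() for w in sentence.split()}
    return len(words & SUBJECTIVE_WORDS)

def subjectivity(text: str, threshold: int = 1) -> float:
    """Fraction of sentences that are at least minimally subjective."""
    sentences = [s for s in text.split(".") if s.strip()]
    kept = [s for s in sentences if sentence_score(s) >= threshold]
    return len(kept) / len(sentences) if sentences else 0.0

print(subjectivity("The report came out Tuesday. It is a terrible, boring read."))
# -> 0.5: the neutral sentence is weeded out, the emotional one is kept
```

A real system would, of course, need a far richer notion of emotional language than a fixed word list, but the weed-and-count structure is the same.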
Elsewhere, researchers at IBM and Rensselaer Polytechnic Institute are working on a program called Brutus.1 that will be able to write fictional narratives comparable to those of any human writer. Brutus.1's capabilities are not yet that advanced, but it is able to generate short paragraphs of fiction that are surprisingly good, as I will demonstrate in my research.
As amazing as these ideas are, if the systems these engineers dream up are not easy and intuitive, then people are not going to use them. Some might say that we can barely keep up with the technology that we already have. This raises a few questions. There are computer programs that are made to interpret human language at a rudimentary level. How are people using them? Are they satisfied with their experience, or do they get frustrated when the computer cannot understand what they want the way another human would?
Research Methodology
To try to answer these questions, I surveyed 45 college students. I chose this population because the 18-24 age group is generally considered to be more comfortable with the basic language technologies that exist today: mainly predictive text messaging, word processing functions like spell and grammar check, and internet search engines. College students, in particular, incorporate these into their everyday lives for academic and social reasons.
For each of the technologies I asked students to rate their level of use, satisfaction, and openness to technological upgrades. I also asked them to rate the computer's performance in the three distinct tasks. I recorded responses by gender; however, I did not attempt to get an equal number of responses from males and females.
Findings
Predictive text messaging is an algorithm that tries to guess what a cell phone user is trying to type, in order to lessen the number of keystrokes required to type a message on a telephone keypad. I found that a majority of the people I surveyed, even those who own cell phones, do not use predictive text messaging. The most often cited reason among cell phone owners was that they did not feel like learning how to use the system. The people who did use predictive text messaging, however, reported generally high levels of satisfaction on their part and accuracy on the cell phone's part.
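As a rough illustration of how such a system can work, here is a small Python sketch of keypad-style word prediction. The tiny dictionary and its frequency counts are invented for the example; a real phone ships with a much larger ranked lexicon.

```python
# Minimal sketch of keypad predictive text: each word maps to the digit
# sequence of its letters, and typing that sequence looks up every matching
# word, most frequent first. Dictionary and frequencies are illustrative only.

KEYPAD = {c: d for d, letters in {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}.items() for c in letters}

# word -> assumed usage frequency (higher = suggested first)
DICTIONARY = {"home": 50, "good": 40, "gone": 30, "hoof": 5, "hello": 60}

def to_digits(word: str) -> str:
    """Convert a word to the keypad digits a user would press."""
    return "".join(KEYPAD[c] for c in word.lower())

# index: digit sequence -> candidate words, best guess first
INDEX: dict[str, list[str]] = {}
for word in sorted(DICTIONARY, key=DICTIONARY.get, reverse=True):
    INDEX.setdefault(to_digits(word), []).append(word)

def predict(keys: str) -> list[str]:
    """Return candidate words for a sequence of key presses."""
    return INDEX.get(keys, [])

print(predict("4663"))  # -> ['home', 'good', 'gone', 'hoof']
```

The same four key presses yield several candidates, which is exactly why users have to learn how to cycle through and confirm the phone's guesses.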
In comparison to predictive text messaging, word processing and search engine use was almost universal among the study participants, with only four people saying that they "never use a spell or grammar check" and one person saying he "never uses a search engine". People seemed the most satisfied with the performance of word processors, but search engines were not far behind. People also seemed to harbor less resentment against the shortcomings of these technologies compared to predictive text messaging, with nearly all of the participants claiming they would be "very open" to improvement. A vast majority of them also said that these technologies only made mistakes in interpreting them occasionally.
Conclusion
My findings surprised me in some ways. To be honest, I expected young people to be more frustrated with the performance of word processors and search engines, and to feel more positively about text messaging on cell phones. The opposite appears to be the case. This may be just as well, though, since predictive text messaging deals with language at a phonetic level, whereas search engines and grammar checks deal with it at a syntactic and semantic level. It is more important that technology succeed at these levels if the dream of a natural linguistic user interface is to become a reality. The fact that people who do use predictive text messaging seem quite happy with it, however, implies that it is only a learning curve that keeps people from using it, not an actual flaw in the technology.
Overall, the results point to a general restlessness in the generation that is poised to reap the most benefits from technologies that process natural language in the next decades. Although participants reported moderately high levels of satisfaction, they were more eager for improvements than anything else. This shows a willingness to keep learning and growing with linguistic technology, rather than just getting frustrated and giving up.
On a final note, there is one area where computers may progress faster than one might intuitively expect: fiction writing. The last question of my survey asked participants to compare a passage from J.D. Salinger's The Catcher in the Rye to a passage written by the computer program I mentioned earlier, Brutus.1. Thirty-eight of the 45 respondents chose the passage written by Brutus.1 as the one that was "written the best".
Syntax
What fascinates me the most about syntax is the structure that it provides to language, which I used to think of as amorphous. It also illuminates a lot of the problems found in the teaching of language rules to children. The "subject/predicate" model works in its own way, but does not actively reflect the natural way in which people use language. Phrase structure grammar is a system I personally find effortless to learn, most likely because it is a more accurate reflection of the way my mind works to construct meaning.
The conventional wisdom about language seems to be the view that languages are just utterly and arbitrarily different from one another. Phrase structure grammar, however, shows that all human language works in basically the same manner. Language takes a finite set of parts and creates an infinite number of expressions with them using the basic structure of the phrase. Because of this quality, Pinker refers to language as a discrete combinatorial system.
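To see how a finite set of rules can yield an unbounded set of sentences, here is a small Python sketch of a toy phrase structure grammar. The rules and vocabulary are my own invented examples, not Pinker's; the point is only the mechanism.

```python
import random

# A toy phrase structure grammar: a finite set of rules and words that can
# generate an unbounded number of sentences, because a noun phrase (NP) can
# contain a prepositional phrase (PP), which contains another NP, and so on.
# The rules and vocabulary are invented for illustration.

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "N", "PP"]],
    "VP":  [["V", "NP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["linguist"], ["computer"], ["sentence"]],
    "V":   [["parses"], ["generates"]],
    "P":   [["near"], ["inside"]],
}

def expand(symbol: str) -> list[str]:
    """Rewrite a symbol until only words remain."""
    if symbol not in GRAMMAR:          # already a word
        return [symbol]
    rule = random.choice(GRAMMAR[symbol])
    words = []
    for part in rule:
        words.extend(expand(part))
    return words

print(" ".join(expand("S")))
# e.g. "the linguist near a computer parses the sentence"
```

Because the NP rule can reintroduce a PP, which reintroduces an NP, these eight rules can in principle produce sentences of any length, which is the "infinite use of finite media" in miniature.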
I had heard of the concept of a discrete combinatorial system before, but after learning how phrase structure grammar works I found a lot of symmetry between it and the rest of reality. Other examples of discrete combinatorial systems kept springing to mind, like our bodies. They use basically the same set of stem cells and can turn them into (as far as we can discern) an infinite number of different types of cells for many different uses. The DNA within cells, in turn, uses four different nucleotides to make the infinite variety of proteins found in every living organism on the planet. The molecules within DNA, moving deeper, use a finite set of atoms to form the infinite number of possible compounds and chemicals that scientists can imagine and create. This sort of symmetry is the ultimate form of proof of any concept for me, and it reflects the fact that our brains follow the same sorts of rules as the rest of the universe.
Phrase structure also opens up a possibility I had never really considered before, that machines could be programmed to communicate with us as naturally as we communicate with each other. "The infinite use of finite media distinguishes the human brain from virtually all the artificial language devices we commonly come across, like pull-string dolls, cars that nag you to close the door, and cheery voice mail instruction…all of which use a fixed list of prefabricated sentences." Notice that Pinker says "virtually all" and "commonly come across." This implies that there are artificial language devices approaching the complexity necessary to model language.
Kyoto University's Graduate School of Informatics is one example of the work being done in this area. One of its programs involves linking linguists, software programmers, and other specialists "to establish an academic discipline and technologies of handling and understanding language by machines." Another example is the Persona project at Microsoft Research, which hopes to produce the technologies required to create conversational assistants, or lifelike animated characters that interact with the user in a natural spoken dialogue.
If and when scientists succeed in teaching machines the basic syntax of human language, the implications for society seem profound. Language barriers will no longer be a roadblock to information. Anyone, including illiterates and people who have little to no computer experience, will have access to the vast amounts of knowledge currently stored in the world's computers. Perhaps the most profound implication is for the machines themselves. If we could make machines that can use language as naturally as we do, they might even be able to convince us that they are people too.
Works Cited: Pinker, Steven. The Language Instinct. New York: William Morrow and Company, 1994.
Stingy With Words
The concept in morphology that most stands out to me is linguist Steven Pinker's suggestion that people are "stingy with words, and profligate with meanings". This statement rings true with me on a number of levels because it connects with other linguistic concepts in a way that supports them, and allows me to extend the ideas of morphology into other areas.
Pinker says that few meanings have more than one word assigned to them, whereas words can easily take on a number of additional meanings. Even synonyms differ slightly in their definitions. This strongly implies that meanings or concepts are the more important of the two. Rather than maintain word-purity, humans innately try to preserve the clarity of ideas, which makes logical sense when one's goal is communication.
This idea also supports Pinker's language-as-instinct philosophy in a very concrete way, through the "arbitrariness of the sign". Since it is the meaning that is important, it does not matter how you choose to express it as long as everyone agrees on that particular method. Discovering the meanings of words when they are arbitrarily assigned is a very prickly problem that gets easier when you apply the "stingy with words" philosophy. Pinker's response to the "gavagai problem" shows that children actually expect this characteristic of language and have an innate resistance to learning synonyms. This in turn helps them to learn new words more efficiently by simplifying the possibilities. This is further proof that language is not a human product or creation, but rather something that arose spontaneously without our knowledge or consent.
The idea of concepts being sacred also shatters a personal view of mine. I used to believe very strongly that one's language defines the boundaries of one's thought processes. When I started to learn Japanese, I realized that people with different languages were not just saying things differently; they were thinking differently. That was not as obvious in a language like Spanish, which I had also studied. While I still think there is a strong link between language and the "shape" of one's mind, the idea of meanings being more important than words overtly conflicts with this notion. Pinker's idea of "mentalese" works better at the individual level. Culturally, however, the conforming effects of language on the individual consciousness are undeniable.
Perhaps the ultimate example of our tendency to be stingy with words but profligate with meanings is our naming system. People use mostly the same birth names over and over again, with certain names obviously being added and dropped over time by the whims of cultural taste. A new meaning for one of these particular words is created for every person whose name we bother to learn. Each is an individual "listeme" for that entity in our little mentalese dictionary. This is fundamental stuff. The sounds we pick for our own names reflect the very dominance that our ideas have over our language.
Works Cited: Pinker, Steven. The Language Instinct. New York: William Morrow and Company, 1994.
Semantics
I. Examples of How Language Means
- An invitation that listed "Newport Casual" as the preferred attire.
- "The Family Circus" cartoon for 9/12/02 that said, "September Mourn."
- A Cover Girl mascara advertisement that reads, "dramateyes."
- A holiday card with a frog hanging over the door that reads, "mistletoad."
- A photo of a boy standing in a shopping cart and fishing at Echo Park. The headline: "Fish a la Cart."
- A "Gaytionary" with definitions for "Closet Case" (n. one who is secretly gay), "Bicurious" (adj. questioning heterosexuality), "Fag Hag" (n. straight woman who likes gay men), and "Gaytionary" (n. dictionary of gay terms).
- A Los Angeles Times headline that reads, "The Inside Dope on '420' Buzz."
II. Semantic Continuums
A visual format that shows how words mean in relation to other words:
Hate → despise → dislike → enjoy → like → cherish → Love
Dry → damp → moist → wet → soggy → drenched → soaked
III. "Languages are stingy with words and profligate with meanings"
- Homonyms: identical forms with different meanings; spelled and pronounced the same (pen/pen)
- Homophones: forms which sound the same but are spelled differently (tier/tear)
- Homographs: forms which sound different but are spelled the same; a written signifier with two signifieds (tear/tear, identical only graphically)
IV. Meaning is Economical
- Polysemy: an economy in the number of signs used; a single word can be used in multiple senses (screen)
- Antanaclasis: "the heart of the matter is often the matter of the heart"
- Zeugma: associating a verb with two or more words: "open the window; open your heart"
With polysemy, the speaker is aware they are using the same word with different meanings. With homonymy, the speaker is aware they are using different words whether they sound the same or not.
V. Synonyms
Several signifiers correspond to the same signified. Few if any true synonyms exist; there are always shades of meaning.
VI. Connotation
Example: "feminists" can be taken as positive or negative depending on context and speaker.