Words take on different meanings depending on how they are used in a sentence.
A word sense is one of the meanings of a word. Human language is ambiguous,
so many words can be interpreted in multiple ways depending on the context in
which they occur. Word sense disambiguation (WSD) is the ability to computationally
identify the meaning of words in context. WSD is considered an AI-complete
problem, that is, a task whose solution is at least as hard as the most difficult
problems in artificial intelligence.
WSD can be viewed as a classification task: word senses are the classes, and an
automatic classification method is used to assign each occurrence of a word to one or
more classes based on the evidence from the context and from external knowledge
sources. WSD relies heavily on knowledge: knowledge sources provide the data that are
essential for associating senses with words.
The assessment of WSD systems is discussed in the context of the
Senseval/SemEval campaigns, which aim at the objective evaluation of systems participating
in several different disambiguation tasks. Here, some of the knowledge sources used in
WSD, the different approaches to WSD (supervised, unsupervised, and knowledge-based),
and the evaluation of WSD systems are discussed. The applications of WSD are also described.
One of the first problems that any natural language processing (NLP) system encounters
is lexical ambiguity, syntactic or semantic. Part-of-speech taggers resolve a word's syntactic
ambiguity with high levels of accuracy. The problem of resolving semantic ambiguity is
generally known as word sense disambiguation (WSD) and has proven to be more difficult
than syntactic disambiguation.
Human language is ambiguous, so many words can be interpreted in multiple ways
depending on the context in which they occur. Unfortunately, identifying the specific
meaning that a word assumes in context is only apparently simple. While most of the
time humans do not even think about the ambiguities of language, machines need to process
unstructured textual information and transform it into data structures that must be analyzed
in order to determine the underlying meaning. The computational identification of meaning for
words in context is called word sense disambiguation (WSD).
Words have multiple meanings depending on the context in which they are used in a sentence.
A word sense is one of the meanings of a word. Word sense disambiguation (WSD) is the ability
to identify the meaning of words in context in a computational manner. WSD is considered an
AI-complete problem, that is, a problem that can be solved only by first resolving all the
difficult problems in artificial intelligence, such as the Turing Test.
Consider the following two sentences:
(a) I can hear bass sounds.
(b) They like grilled bass.
The occurrences of the word bass in the two sentences clearly denote different meanings:
low-frequency tones and a type of fish, respectively. Here, WSD assigns the correct
meaning to the word bass in each of the two sentences above:
(a) I can hear bass / low-frequency tone sounds.
(b) They like grilled bass / fish.
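The bass example can be reproduced with a minimal sketch of the simplified Lesk algorithm, one well-known knowledge-based technique chosen here for illustration (not necessarily the method used by the systems discussed in this section): each candidate sense is scored by how many content words its dictionary gloss shares with the sentence. The two-sense inventory, the glosses, and the stopword list are all invented; a real system would take its glosses from a dictionary.

```python
# Hypothetical two-sense inventory for "bass"; the glosses are invented, not taken from a real dictionary.
SENSES = {
    "bass": {
        "bass_tone": "low frequency tones or sounds in music that you can hear",
        "bass_fish": "a type of fish that is caught and grilled or cooked to eat",
    }
}

# Small hypothetical stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "i", "can", "they", "of", "or", "in", "to",
             "that", "is", "and", "you"}

def content_words(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def simplified_lesk(word, sentence):
    """Return the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense

for s in ["I can hear bass sounds.", "They like grilled bass."]:
    print(s, "->", simplified_lesk("bass", s))
# I can hear bass sounds. -> bass_tone
# They like grilled bass. -> bass_fish
```

Sentence (a) shares "hear" and "sounds" with the tone gloss, while sentence (b) shares "grilled" with the fish gloss; with richer glosses and larger contexts the same overlap idea scales to a full dictionary.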
WSD is one of the central challenges in NLP. Many NLP tasks require disambiguation:
WSD is needed in machine translation, information retrieval, information extraction, etc.
WSD is typically configured as an intermediate task, either as a stand-alone module or
properly integrated into an application.
WSD task description:
Word sense disambiguation is the ability to computationally determine which sense of a
word is activated by its use in a particular context. WSD is usually performed on one or more
texts. If we disregard punctuation, we can view a text $T$ as a sequence of words
$(w_1, w_2, \ldots, w_n)$, and we can formally describe WSD as the task of assigning the
appropriate senses to all or some of the words in $T$. That is, WSD identifies a mapping $A$
from words to senses such that $A(i) \subseteq \mathrm{Senses}_D(w_i)$, where
$\mathrm{Senses}_D(w_i)$ is the set of senses encoded in the dictionary $D$ for word $w_i$,
and $A(i)$ is the subset of the senses of $w_i$ that are appropriate in the context $T$.
The mapping $A$ can assign more than one sense to each word $w_i \in T$, although typically
only the most appropriate sense is selected, that is, $|A(i)| = 1$.
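The formal definition translates almost directly into code. The sketch below assumes a toy dictionary D (all words and sense labels are hypothetical) and represents the mapping A as a Python dict from word positions to sets of senses; it checks the constraint A(i) ⊆ Senses_D(w_i) and the usual single-sense condition |A(i)| = 1. Positions are 0-based here, whereas the text numbers words from w_1.

```python
from typing import Dict, List, Set

# Toy dictionary D: each word maps to Senses_D(w). All entries are hypothetical.
D: Dict[str, Set[str]] = {
    "they": {"they_pron"},
    "like": {"like_enjoy", "like_similar"},
    "grilled": {"grilled_cooked"},
    "bass": {"bass_tone", "bass_fish"},
}

def senses(word: str) -> Set[str]:
    """Senses_D(w): the senses D encodes for word w (empty if w is not in D)."""
    return D.get(word, set())

def is_valid_mapping(text: List[str], A: Dict[int, Set[str]]) -> bool:
    """True if every annotated position i satisfies A(i) ⊆ Senses_D(w_i)."""
    return all(0 <= i < len(text) and A[i] <= senses(text[i]) for i in A)

def is_single_sense(A: Dict[int, Set[str]]) -> bool:
    """True if every annotated word received exactly one sense, |A(i)| = 1."""
    return all(len(s) == 1 for s in A.values())

# T = (w_1, ..., w_n), stored 0-based; A annotates only some of its words.
T = ["they", "like", "grilled", "bass"]
A = {1: {"like_enjoy"}, 3: {"bass_fish"}}
print(is_valid_mapping(T, A), is_single_sense(A))  # True True

# A sense not encoded in D for "bass" violates the subset constraint.
bad = {3: {"bass_fish", "bass_instrument"}}
print(is_valid_mapping(T, bad), is_single_sense(bad))  # False False
```

Annotating only positions 1 and 3 reflects the "all or some of the words" wording of the definition; an all-words system would simply fill in every position.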
WSD can be viewed as a classification task where word senses are the classes, and an
automatic classification method is used to assign each occurrence of a word to one or more
classes based on the evidence from the context and from external knowledge sources. The task of
WSD involves two steps: (1) the determination of all the different senses for every word in the text
and (2) the assignment of each occurrence of a word to the appropriate sense. The assignment
relies on two major sources of information: the context of the ambiguous word and
external knowledge sources.
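The two steps can be sketched as follows, under clearly stated assumptions: the sense inventory and its per-sense frequency counts are invented, and step 2 uses a most-frequent-sense classifier purely as a placeholder for the context-aware classification methods the section refers to. Monosemous words are assigned directly, since they have only one candidate class.

```python
from typing import Callable, Dict, List

# Hypothetical sense inventory with invented corpus frequencies per sense,
# standing in for an external knowledge source.
INVENTORY: Dict[str, Dict[str, int]] = {
    "they": {"they_pron": 50},
    "like": {"like_enjoy": 30, "like_similar": 12},
    "grilled": {"grilled_cooked": 5},
    "bass": {"bass_tone": 8, "bass_fish": 6},
}

# A classifier takes (word, candidate senses, context) and returns one sense.
Classifier = Callable[[str, List[str], List[str]], str]

def determine_senses(text: List[str]) -> Dict[int, List[str]]:
    """Step 1: list the candidate senses of every word found in the inventory."""
    return {i: list(INVENTORY[w]) for i, w in enumerate(text) if w in INVENTORY}

def most_frequent_sense(word: str, candidates: List[str], context: List[str]) -> str:
    """Placeholder classifier: ignores the context and returns the most frequent sense."""
    return max(candidates, key=lambda s: INVENTORY[word][s])

def assign_senses(text: List[str], classify: Classifier = most_frequent_sense) -> Dict[int, str]:
    """Step 2: assign each occurrence to exactly one sense (its class)."""
    assignment: Dict[int, str] = {}
    for i, candidates in determine_senses(text).items():
        if len(candidates) == 1:  # monosemous word: nothing to disambiguate
            assignment[i] = candidates[0]
        else:  # ambiguous word: pass the candidates and context to the classifier
            assignment[i] = classify(text[i], candidates, text)
    return assignment

print(assign_senses(["they", "like", "grilled", "bass"]))
# {0: 'they_pron', 1: 'like_enjoy', 2: 'grilled_cooked', 3: 'bass_tone'}
```

Because the placeholder ignores context, it labels bass as bass_tone even in "they like grilled bass", which illustrates why the assignment step needs evidence from the context as well as from external knowledge.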
There are two variants of the generic WSD task:
1) Lexical sample, where a system is required to disambiguate a restricted set of target
words, usually occurring one per sentence. Supervised systems are typically employed
in this setting, as they can be trained using a number of hand-labeled instances
(training set) and then applied to classify a set of unlabeled examples (test set).
2) All-words WSD, where systems are expected to disambiguate all open-class
words in a text (i.e., nouns, verbs, adjectives, etc.). This task requires wide-coverage
systems. Consequently, purely supervised systems can potentially suffer from the
problem of data sparseness, as it is unlikely that a training set of adequate size is
available which covers the full lexicon of the language of interest. On the other
hand, other approaches, such as knowledge-lean systems, rely on full-coverage
knowledge resources, whose availability must be assured.
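The lexical sample setting can be illustrated with a small supervised classifier. The sketch below trains a Naive Bayes model (one common supervised technique, chosen here for brevity) on six invented hand-labeled instances of the target word bass, uses the surrounding words as features with add-one smoothing, and then labels two unseen test sentences. The training sentences and sense labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Invented hand-labeled training instances for the target word "bass".
TRAIN: List[Tuple[str, str]] = [
    ("he plays bass guitar in a band", "bass_tone"),
    ("turn up the bass on the speakers", "bass_tone"),
    ("the bass line of the song is loud", "bass_tone"),
    ("we caught a striped bass in the lake", "bass_fish"),
    ("grilled bass with lemon for dinner", "bass_fish"),
    ("the fisherman sold fresh bass", "bass_fish"),
]

def features(sentence: str, target: str = "bass") -> List[str]:
    """Bag-of-words context: every lowercased token except the target word."""
    return [w for w in sentence.lower().split() if w != target]

def train_naive_bayes(data: List[Tuple[str, str]]):
    """Count sense priors and per-sense word frequencies from labeled instances."""
    priors = Counter(label for _, label in data)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence, label in data:
        word_counts[label].update(features(sentence))
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab

def classify_nb(sentence: str, model) -> str:
    """Return argmax_s log P(s) + sum_w log P(w | s), with add-one smoothing."""
    priors, word_counts, vocab = model
    total = sum(priors.values())
    best_sense, best_score = "", -math.inf
    for sense, count in priors.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for w in features(sentence):
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train_naive_bayes(TRAIN)  # training set
for test in ["she plays bass in a loud band", "grilled bass for dinner"]:  # test set
    print(test, "->", classify_nb(test, model))
# she plays bass in a loud band -> bass_tone
# grilled bass for dinner -> bass_fish
```

The same code hints at the data sparseness problem of all-words WSD: a test sentence whose context words never occur in the training set, such as "I can hear bass sounds", gives the model no real evidence, so its decision falls back on the priors and the smoothing. An all-words system would need labeled data like this for every ambiguous word in the lexicon.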
WSD involves four main elements:
1) Selection of word senses
2) Use of external knowledge sources
3) Representation of context
4) Selection of an automatic classification method
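Of the four elements, the representation of context is the easiest to show in isolation. One common choice, sketched below as a possibility rather than the scheme any particular system uses, turns the neighborhood of the target word into a sparse feature dictionary that mixes bag-of-words features (words within a fixed window, order ignored) with local collocation features (the exact word at each nearby offset). The function name, the default window size, and the feature-name format are all assumptions.

```python
from typing import Dict, List

def context_features(tokens: List[str], i: int, window: int = 3) -> Dict[str, int]:
    """Represent the context of tokens[i] as a sparse feature dict (hypothetical format).

    - bag-of-words features: each word within +/- window positions, order ignored
    - local collocation features: the exact word at offsets -2, -1, +1, +2
    """
    feats: Dict[str, int] = {}
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:  # skip the target word itself
            key = "bow=" + tokens[j]
            feats[key] = feats.get(key, 0) + 1
    for off in (-2, -1, 1, 2):
        j = i + off
        if 0 <= j < len(tokens):  # ignore offsets that fall outside the sentence
            feats[f"w[{off:+d}]={tokens[j]}"] = 1
    return feats

sentence = "they like grilled bass".split()
print(context_features(sentence, sentence.index("bass")))
# {'bow=they': 1, 'bow=like': 1, 'bow=grilled': 1, 'w[-2]=like': 1, 'w[-1]=grilled': 1}
```

A dictionary like this is the kind of input an automatic classification method (the fourth element) would consume, for example after conversion to a numeric vector; the sense inventory and knowledge sources (the first two elements) determine which classes it chooses among.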