An introduction to Conceptual Speech Recognition
Conceptual Speech Recognition (CSR) was invented in the
early 2000s by Philippe Roy, the founder of Conceptual Speech Technologies, LLC. After more
than three years of research and development, Conceptual Speech Recognition is
ready for deployment.
Conceptual Speech Recognition brings a new dimension to the
analysis of speech. We think the formal definition of Conceptual Speech Recognition that
best summarizes the technology reads as follows:
“Conceptual Speech Recognition is a method that uses phonetic, statistical,
syntactic and conceptual analysis to extract a concept from speech.”
Conceptual Speech Recognition is based on the premise that
the purpose of speech is to convey concepts.
Conceptual Speech Recognition is the result of a deep and accurate
understanding of human speech. Indeed, Conceptual Speech Recognition and human
speech comprehension share the same method and goal. They both seek to identify the
concepts conveyed by speech, and use words as well as syntax only as building blocks
to get to concepts. As a direct result of this approach, users are no longer
restricted to speaking commands using pre-defined words, and can issue commands
in their own words so long as the concept is conveyed. The slogan of Conceptual
Speech Technologies, LLC tries to convey this reality:
“It’s not how you say it, it’s what you mean!”
Many studies demonstrate that syntax is in fact transient
in human speech. A listener quickly discards the syntactic structure, whereas the
concept constructed from it is held more permanently. In other words,
human brains are fashioned for concepts, and use syntax (which includes words)
mainly as a tool to convey or recognize concepts.
Communicating words is obviously not the purpose of
speech. If a human says to another “Blue Thrown River gone”, there is no real
communication between the two. The speaker didn’t affect the listener with
anything useful. The research community came to this understanding a long time
ago, but then most of them missed an important step by not looking beyond
syntax. If a human says to another “Red Rounded House Eats a Squared Ship”,
there is still no valid communication between them even though the syntactic
structure of that sequence of words is valid. This is an expression of the fact
that to be syntactically valid is a necessity in speech, but it is not
sufficient.
What is the reaction of a person who hears “Red Rounded
House Eats a Squared Ship”? The reaction can essentially be presumed to be the
following: “That doesn’t make sense!” Translation: “I cannot extract a concept
from what I just heard”. And that is correct, since the cognitive process
actually involves looking to build a syntactic structure that makes sense so
that it can be integrated in the web of already existing concepts in the
listener’s brain.
Conceptual Speech Recognition goes beyond syntactic
analysis. It follows the same steps and has the same goal as human-to-human
successful communications; that being to extract a valid concept from a
communication.
What is a concept?
A concept is a relationship between at least one Object of Knowledge and at least one action,
one state or one other concept.
There are consequently different types of concepts.
· A state concept: (“The water is cold”). That concept
associates a state (“cold”) to an Object of Knowledge (“water”).
· An action concept (“Move the red block to the right”).
That concept describes a desired or executed action related to an Object of
Knowledge.
· A relationship concept (“The red circle is bigger than
the blue square”). That concept relates two or more Objects of Knowledge with
some differentiation or categorization.
· A recursive concept (“The fact that I did not know
about that technology before troubles me”). That concept has an entire concept
as an Object of Knowledge.
· A complex concept. A concept that combines several of the preceding
concept types.
Most importantly, a concept does not hold any syntax-related
content. The concept of “the pen that writes with blue ink” is the same as
the concept of “the blue pen” even though they were formulated using
different words and syntactic structures.
How does Conceptual Speech Recognition differ from
grammar-based Speech Recognition?
Conceptual Speech Recognition uses different technologies
and achieves objectives that are different from grammar-based Speech
Recognition systems. First, Conceptual Speech Recognition does not force the
speaker to utter commands in a predetermined fashion. In grammar-based Speech
Recognition, recognition is based on sound matching: if the system
handles a command like “Connect me to technical support”, a speaker cannot
instead say “I wish that you could transfer me to tech support”.
In Conceptual Speech Recognition, a speaker may use any
sequence of words he or she wishes (as long as they are defined conceptually
in the system), in any order, as long as those words
sequenced together generate a valid concept for that system. Valid concepts are
defined within each Conceptual Speech Recognition system. For example, a speaker calling an airline
response system is expected to ask about flight information (valid concepts)
and is not expected to ask questions about a bank account (not valid
concepts in an airline response system). We refer to this feature as the
“conceptual domain”.
Furthermore, with Conceptual Speech Recognition systems, a
speaker may state a request or command that contains more than one concept, and
the system is capable of giving a response for each concept conveyed. As an
example, a speaker may say something like “Give me my account information and
then transfer me to tech support”. This is not possible in current
state-of-the-art grammar-based systems.
An introduction to Conceptual Speech Commander and CLUE
Conceptual Language Understanding Engine (CLUE) is a product of Conceptual Speech Technologies, LLC (CST) that analyzes text input in order to extract and react to concepts conveyed. CLUE is based on Conceptual Speech Recognition technologies for which there are United States and international patents pending. CST’s other product from which CLUE is derived is Conceptual Speech Commander (CSC), which analyzes speech input to extract and react to concepts conveyed.
Overview of the Key Processes in Conceptual Speech Recognition
To this day, manipulating concepts and syntactic
structures has been out of the ordinary for any computer-related field. Such
manipulation required special languages and techniques in order to
become an efficient means to solve real-life problems. Conceptual Speech
Technologies, LLC had to solve that problem.
Following is a representation and corresponding comments
of key elements in Conceptual Speech Recognition.
Figure 1. Key elements of Conceptual Speech Recognition.
Here are some comments related to Figure 1:
1. Sound is the medium of speech. Sound
is not speech. In that sense sound recognition alone is not the best tool for
performing speech recognition since it involves only analyzing the medium of
speech, and not speech as a whole.
2. Sound analysis is a process that uses phonetics to produce potentially spoken
words from sound exclusively. Sound analysis is not, however, a purely phonetic
process. If sound analysis were a phonetic process, an English listener would
be able to create potentially spoken Chinese words from a Chinese utterance
simply because he is equipped to do it in English. Also, since words are
actually syntactic elements, it becomes obvious that mapping sounds to words is
a syntactic process that only uses phonetics to reach its transient goal.
3. The set of potentially spoken words produced from sound analysis
exclusively is large, and the actual spoken words are somewhere in that large
set of potentially spoken words. Sound analysis, left alone, generates ambiguity,
as represented by the wider circle of Figure 1.
4. Following the sound analysis process is a sequence of other processes used
to disambiguate the result produced by sound analysis.
5. The syntactic analysis process isolates a subset of sequences of potentially
spoken words that are syntactically valid, and the actual spoken words are within that
subset. Syntactic analysis, left alone, cannot identify the correct sequence
of spoken words from the sound analysis results, since to be syntactically
correct is not sufficient in speech.
6. The conceptual analysis process further isolates a subset of sequences of
potentially spoken words that are not only syntactically valid,
but that also form a sequence that generates a valid concept.
7. More than one valid concept can be formed through conceptual analysis
(as represented by the smallest circle). Nevertheless, the identification of
the most probable conveyed concept (as represented by the point in the middle
of the smallest circle) is the objective of speech and is the goal of Conceptual Speech Recognition.
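The narrowing described in comments 3 to 7 can be pictured as successive filters over candidate word sequences. The following Python sketch is only an illustration of that idea: the candidate lists, the two toy checks and the scoring function are hypothetical stand-ins, not the rules used by Conceptual Speech Recognition.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Words = Tuple[str, ...]


def disambiguate(
    candidates: Sequence[Words],
    is_syntactically_plausible: Callable[[Words], bool],
    forms_valid_concept: Callable[[Words], bool],
    score: Callable[[Words], float],
) -> Tuple[List[Words], List[Words], Optional[Words]]:
    """Narrow candidate word sequences in two steps, then pick the best one."""
    # Step 1: keep sequences that pass the syntactic check.
    syntactic = [c for c in candidates if is_syntactically_plausible(c)]
    # Step 2: of those, keep sequences that also yield a valid concept.
    conceptual = [c for c in syntactic if forms_valid_concept(c)]
    # Step 3: choose the most probable remaining sequence, if any is left.
    best = max(conceptual, key=score) if conceptual else None
    return syntactic, conceptual, best


# Hypothetical output of sound analysis for one utterance.
CANDIDATES: List[Words] = [
    ("has", "flight", "six", "hundred", "been", "delayed"),
    ("has", "fright", "six", "hundred", "been", "delayed"),
    ("as", "flight", "sticks", "hundred", "bean", "delayed"),
]
AUXILIARIES = {"has", "is", "was"}   # toy stand-in for syntactic rules
DOMAIN_NOUNS = {"flight"}            # toy stand-in for a conceptual domain


def toy_syntax(words: Words) -> bool:
    return words[0] in AUXILIARIES


def toy_concept(words: Words) -> bool:
    return any(w in DOMAIN_NOUNS for w in words)


def toy_score(words: Words) -> float:
    known = AUXILIARIES | DOMAIN_NOUNS
    return float(sum(1 for w in words if w in known))


syntactic, conceptual, best = disambiguate(
    CANDIDATES, toy_syntax, toy_concept, toy_score
)
```

Each stage can only shrink the set it receives, which mirrors the nested circles of Figure 1.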
Figure 2. Input and output of Conceptual Speech Recognition processes.
The Sound Analysis process
The Sound Analysis process analyzes the audio to build a
list of potentially spoken words with associated parts of speech based
exclusively on two elements: 1) phonetic rules, and 2) the dictionary that
associates phoneme streams with spellings and atom parts of speech. This
process generates a list containing numerous sequences of potentially spoken
words. The actual sequence of spoken words is somewhere within this list. The
potentially spoken words are kept in an ordered list that also identifies word
boundaries in the analyzed audio input.
Note for this pre-release: In early Conceptual Speech Recognition
developments, phoneme recognition was intended to generate a phoneme stream
with associated probabilities, followed by a phoneme stream analysis process
that generates the potentially spoken words list. Although that approach
succeeded, our research led us to conclude that using a Viterbi algorithm
produced vastly improved results. The Viterbi algorithm is run on sound samples
with a bias toward words that have a conceptual definition, so that potentially spoken
words that are defined in the conceptual domain are detected and well
positioned in the sound sample.
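For readers unfamiliar with Viterbi decoding, here is a textbook version over a toy hidden Markov model, written in Python. It is not the decoder used in Conceptual Speech Recognition. In particular, the additive log-score `bias` per state is our assumption about what biasing toward conceptually defined words could look like, and the states, observations and probabilities are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    log_start: Dict[str, float],
    log_trans: Dict[str, Dict[str, float]],
    log_emit: Dict[str, Dict[str, float]],
    bias: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Return the most probable state path (log domain, optional state bias)."""
    bias = bias or {}
    # best[t][s]: best log-score of any path that ends in state s at time t.
    best = [{
        s: log_start[s] + log_emit[s][observations[0]] + bias.get(s, 0.0)
        for s in states
    }]
    back: List[Dict[str, str]] = [{}]
    for obs in observations[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            # Pick the predecessor that maximizes the score of reaching s.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = (best[-1][prev] + log_trans[prev][s]
                         + log_emit[s][obs] + bias.get(s, 0.0))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model: two competing words, one ambiguous acoustic symbol "x".
STATES = ["flight", "fright"]
OBSERVATIONS = ["x", "x"]
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {p: {s: math.log(0.5) for s in STATES} for p in STATES}
LOG_EMIT = {"flight": {"x": math.log(0.4)}, "fright": {"x": math.log(0.6)}}

unbiased = viterbi(OBSERVATIONS, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
biased = viterbi(OBSERVATIONS, STATES, LOG_START, LOG_TRANS, LOG_EMIT,
                 bias={"flight": math.log(2.0)})
```

In this toy setup the acoustics alone slightly favor "fright", while the bias toward the word assumed to be in the conceptual domain tips the decision to "flight".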
The Syntactic Analysis process
Syntactic Analysis is not limited to identifying the subset
of sequences of words that have a high potential of being syntactically valid.
It also builds a Syntactic Hierarchy that will later be referred to in the
Conceptual Analysis process.
Figure 3. Syntactic Hierarchy under “has flight six hundred been delayed”
Syntactic Hierarchy: The structured syntactic organization based on a
parent – child – sibling relationship of nodes that are associated with each node.
Node: A word or sequence of words that is generated by sound analysis
(potentially spoken words) or derived through a Syntax Transform Script. Each
node holds a spelling, a part of speech, and a Syntactic Hierarchy.
Part of speech: The syntax-related nature of a node. A part of speech can either be an atom
part of speech or a derived part of speech. An atom part of speech is
associated with a potentially spoken word recognized by the Sound Analysis
process that refers to a pronunciation/spelling/part of speech association from
the dictionary (like NOUN, VERB, ADJECTIVE, etc). A derived part of speech is
associated with a node generated by a Syntax Transform Script (like SENTENCE,
VERB_PHRASE, etc).
Syntax Transform Script: The set of rules used to transform atom and derived
parts of speech into other derived parts of speech.
In order to perform syntactic analysis, a new scripting
language was required. That scripting language had to represent, in a compact
and accurate fashion, the rules used to isolate the sequences of words that
have a high potential of being syntactically valid. Conceptual Speech
Technologies, LLC invented a language called Syntax Transform Scripts for this
purpose. As part of every Conceptual Speech product, a set of base syntactic
rules is compiled into Syntax Transform Scripts and made available. Those
rules refer to spellings and atom parts of speech of potentially spoken words
recognized during the Sound Analysis process. Some atom parts of speech
associated with words are stored in a dictionary.
Parts of speech can also be added and associated with existing or new
words from Dictionary Explorer.
Syntax Transform Scripts create derived parts of speech.
As an example, the dictionary does not hold any words or sequences of words of
the SENTENCE part of speech type. The SENTENCE part of speech type that uses an
entire utterance is of interest to us since it delimits a completed thread of
thought to be analyzed. Indeed, when conceptual analysis is performed, it
requires a complete thread of thought in order to be able to respond to it.
So, how is a SENTENCE part of speech constructed from atom
parts of speech like VERB, NOUN and other atom parts of speech? The Syntax
Transform Script is a scripting language that creates new words and sequences
of words, called nodes, while assigning them a corresponding part of speech
which itself can be used as a basis to build even more complex sequences like a
SENTENCE part of speech node.
The syntax of Syntax Transform Scripts is as follows:
- Between ‘[‘ and ‘]’ is a node description. This
description may relate to the part of speech associated with the node or its
spelling.
- [ADJECTIVE] represents a node
with an associated part of speech ADJECTIVE.
- [“have”] refers to nodes that
have the spelling ‘have’ regardless of their part of speech.
- [VERB & “is” | “was”] refers
to nodes that have a spelling of ‘is’ or ‘was’ and that are a VERB part of
speech.
- [VERB & “*ed”] refers to
nodes that have a spelling that ends with the letters ‘ed’ and that are a VERB
part of speech (as an example the VERB ‘derived’).
- Between ‘(‘ and ‘)’ are optional sequences, which may be present or absent
without preventing the sequencing from succeeding.
- ([ADVERB])[ADJECTIVE] represents
a sequencing of a potential ADVERB followed by a mandatory ADJECTIVE. The
sequence ‘more transparent’ and the sequence ‘transparent’ would both succeed
in this case.
- The assignment operator is ‘->’. After the ‘->’
characters is the part of speech of the new node to create for succeeding
sequences.
- ([ADVERB])[ADJECTIVE] -> ADJECTIVE_PHRASE
That preceding Syntax Transform
Script line states that each ADVERB node (optional) followed by an ADJECTIVE
node (mandatory) is transformed into a new node with the associated part of speech
ADJECTIVE_PHRASE and a spelling that is the concatenation of the matched
spellings.
- Prior to the ‘:’ character on a Syntax Transform Script
line is the line’s name.
- ADJECTIVE PHRASE CONSTRUCTION 1: ([ADVERB])[ADJECTIVE] ->
ADJECTIVE_PHRASE
That preceding Syntax Transform
Script line is named ‘ADJECTIVE PHRASE CONSTRUCTION 1’.
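To make the notation above concrete, here is a small Python sketch that parses one rule line and applies it to a list of nodes. It is a reading aid under stated assumptions, not the product's implementation: the names `Node`, `Element`, `Rule`, `parse_rule` and `apply_rule` are hypothetical, spellings are assumed to be joined with a single space, straight double quotes are assumed as in Figure 4, and ‘*’ is treated as a shell-style wildcard.

```python
import re
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    spelling: str
    pos: str  # part of speech


class Element(NamedTuple):
    optional: bool          # True when written between '(' and ')'
    parts_of_speech: frozenset
    spellings: Tuple[str, ...]


class Rule(NamedTuple):
    name: str
    elements: Tuple[Element, ...]
    target: str


ELEMENT_RE = re.compile(r'(\()?\[([^\]]*)\](\))?')


def parse_rule(line: str) -> Rule:
    """Parse 'NAME: pattern -> TARGET' into a Rule."""
    name, rest = line.split(":", 1)
    pattern, target = rest.rsplit("->", 1)
    elements = []
    for m in ELEMENT_RE.finditer(pattern):
        body = m.group(2)
        if "&" in body:                       # [VERB & "is" | "was"]
            pos_text, spell_text = body.split("&", 1)
        elif '"' in body:                     # ["have"]
            pos_text, spell_text = "", body
        else:                                 # [NOUN | PLURAL]
            pos_text, spell_text = body, ""
        pos = frozenset(p.strip() for p in pos_text.split("|") if p.strip())
        spellings = tuple(re.findall(r'"([^"]*)"', spell_text))
        elements.append(Element(m.group(1) is not None, pos, spellings))
    return Rule(name.strip(), tuple(elements), target.strip())


def element_matches(el: Element, node: Node) -> bool:
    pos_ok = not el.parts_of_speech or node.pos in el.parts_of_speech
    spell_ok = not el.spellings or any(
        fnmatchcase(node.spelling, s) for s in el.spellings)
    return pos_ok and spell_ok


def match_at(rule: Rule, nodes: List[Node], start: int) -> Optional[int]:
    """Return the end index of a match starting at `start`, else None."""
    def walk(e: int, i: int) -> Optional[int]:
        if e == len(rule.elements):
            return i
        el = rule.elements[e]
        # Prefer consuming a node; fall back to skipping optional elements.
        if i < len(nodes) and element_matches(el, nodes[i]):
            end = walk(e + 1, i + 1)
            if end is not None:
                return end
        return walk(e + 1, i) if el.optional else None
    return walk(0, start)


def apply_rule(rule: Rule, nodes: List[Node], start: int) -> Optional[Node]:
    """Build the derived node for a match at `start`, or None."""
    end = match_at(rule, nodes, start)
    if end is None or end == start:
        return None
    spelling = " ".join(n.spelling for n in nodes[start:end])
    return Node(spelling, rule.target)


adjective_rule = parse_rule(
    'ADJECTIVE PHRASE CONSTRUCTION 1: ([ADVERB])[ADJECTIVE] -> ADJECTIVE_PHRASE')
gerundive_rule = parse_rule('GERUNDIVE ED: [VERB & "*ed"] -> GERUNDIVE_VERB')
words = [Node("more", "ADVERB"), Node("transparent", "ADJECTIVE")]
```

With these definitions, `apply_rule(adjective_rule, words, 0)` yields an ADJECTIVE_PHRASE node spelled "more transparent", and starting at index 1 yields one spelled "transparent", matching the two cases described for `([ADVERB])[ADJECTIVE]`.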
A Syntax Transform Script that holds rules related to the
English language is part of Conceptual Speech Commander and CLUE. A programmer can
modify these rules as needed for the context that has to
be covered. As a reference, the rules to date are as follows:
|
Figure 4. Base Syntax Transform Script
ADJECTIVE PHRASE CONSTRUCTION 1: ([ADVERB])[ADJECTIVE] -> ADJECTIVE_PHRASE
ADJECTIVE PHRASE CONSTRUCTION 2: [ADJECTIVE_PHRASE][CONJUNCTION & "and" | "or"][ADJECTIVE_PHRASE] -> ADJECTIVE_PHRASE
VERB CONSTRUCTION 1: [VERB & "is" | "was" | "will" | "have" | "has" | "to" | "will be" | "have been" | "has been" | "to be" | "will have been" | "be"][VERB] -> VERB
GERUNDIVE ING: [VERB & "*ing"] -> GERUNDIVE_VERB
GERUNDIVE ED: [VERB & "*ed"] -> GERUNDIVE_VERB
PLAIN NOUN PHRASE CONSTRUCTION: ([DEFINITE_ARTICLE | INDEFINITE_ARTICLE])([ORDINAL_NUMBER])([CARDINAL_NUMBER])([ADJECTIVE_PHRASE])[NOUN | PLURAL | PROPER_NOUN | TIME | DATE | PRONOUN] -> NOUN_PHRASE
NOUN PHRASE UNION: [NOUN_PHRASE][CONJUNCTION & "and" | "or" | "until" | "before" | "since"][NOUN_PHRASE] -> NOUN_PHRASE
PREPOSITION PHRASE CONSTRUCTION 1: [PREPOSITION][NOUN_PHRASE] -> PREPOSITION_PHRASE
PREPOSITION PHRASE CONSTRUCTION 2: [PREPOSITION_PHRASE][PREPOSITION_PHRASE] -> PREPOSITION_PHRASE
VERB PHRASE CONSTRUCTION 1: [VERB][NOUN_PHRASE]([PREPOSITION_PHRASE]) -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 2: [VERB][PREPOSITION_PHRASE] -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 3: [ADJECTIVE_PHRASE][PREPOSITION][VERB] -> VERB_PHRASE
GERUNDIVE PHRASE CONSTRUCTION: [GERUNDIVE_VERB]([NOUN_PHRASE])([VERB_PHRASE])([ADVERB]) -> GERUNDIVE_PHRASE
NOUN PHRASE CONST WITH GERUNDIVE: [NOUN_PHRASE][GERUNDIVE_PHRASE]([GERUNDIVE_PHRASE])([GERUNDIVE_PHRASE]) -> NOUN_PHRASE
PREPOSITION PHRASE CONSTRUCTION 3: [PREPOSITION][GERUNDIVE_PHRASE] -> PREPOSITION_PHRASE
RESTRICTIVE RELATIVE CLAUSE: [WH_PRONOUN & "who" | "that" | "where" | "when"][VERB_PHRASE] -> REL_CLAUSE
NOUN PHRASE WITH REL_CLAUSE: [NOUN_PHRASE][REL_CLAUSE] -> NOUN_PHRASE
VERB PHRASE WITH REL_CLAUSE: [VERB_PHRASE][REL_CLAUSE] -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 4: [VERB][NOUN_PHRASE][REL_CLAUSE]([PREPOSITION_PHRASE]) -> VERB_PHRASE
WH_PRONOUN CONSTRUCTION 1: [WH_PRONOUN][CONJUNCTION & "and" | "or"][WH_PRONOUN] -> WH_PRONOUN
VERB PHRASE CONSTRUCTION 5: [VERB][NOUN_PHRASE][GERUNDIVE_PHRASE]([GERUNDIVE_PHRASE])([GERUNDIVE_PHRASE])([PREPOSITION_PHRASE]) -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 6: [VERB][NOUN_PHRASE][ADJECTIVE_PHRASE] -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 7: [VERB][NOUN_PHRASE][VERB] -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 8: [VERB_PHRASE][NOUN_PHRASE][GERUNDIVE_PHRASE]([GERUNDIVE_PHRASE])([GERUNDIVE_PHRASE])([PREPOSITION_PHRASE]) -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 9: [VERB_PHRASE]([NOUN_PHRASE])[ADJECTIVE_PHRASE] -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 10: [WH_PRONOUN][VERB_PHRASE] -> VERB_PHRASE
VERB PHRASE CONSTRUCTION 11: [VERB_PHRASE][NOUN_PHRASE]([PREPOSITION_PHRASE | GERUNDIVE_PHRASE]) -> VERB_PHRASE
WH_NP CONSTRUCTION 1: [WH_PRONOUN][NOUN_PHRASE] -> WH_NP
WH_NP CONSTRUCTION 2: [WH_PRONOUN][ADJECTIVE]([ADVERB]) -> WH_NP
WH_NP CONSTRUCTION 3: [WH_PRONOUN][ADVERB][ADJECTIVE] -> WH_NP
WH_NP CONSTRUCTION 4: [WH_NP][CONJUNCTION & "and" | "or"][WH_NP | WH_PRONOUN] -> WH_NP
SENTENCE CONSTRUCTION 1: ([WH_NP])[VERB_PHRASE]([PREPOSITION & "at" | "in" | "of" | "on" | "for" | "into" | "from"]) -> SENTENCE
SENTENCE CONSTRUCTION 2: ([WH_NP])([AUX])[NOUN_PHRASE][VERB_PHRASE | VERB]([PREPOSITION & "at" | "in" | "of" | "on" | "for"])([ADVERB]) -> SENTENCE
SENTENCE CONSTRUCTION 3: [NOUN_PHRASE]([PREPOSITION & "at" | "in" | "of" | "on" | "for" | "into" | "from"])[WH_NP] -> SENTENCE
SENTENCE CONSTRUCTION 4: [SENTENCE]([CONJUNCTION & "and" | "or" | "if"])[SENTENCE] -> SENTENCE
|
When executing a Syntax Transform Script, only sequences
of potentially spoken words that respect the rules stated AND that also
respect pronunciation boundaries (as stored during the Sound Analysis
process) will be successfully transformed into new nodes.
IMPORTANT: As a
final note related to Syntax Transform Scripts, it is important to keep in mind
that by changing those rules, one can increase or decrease the performance of a
Conceptual Speech Recognition system significantly. Indeed, syntactic analysis
can be the performance bottleneck of Conceptual Speech Recognition. Syntactic
Analysis must be dealt with carefully, since it is fairly easy to produce rules
that are correct but that end up requiring so much processing time that
their use becomes impractical. The purpose of Syntactic Analysis is not to
identify sequences of potentially spoken words that are syntactically correct,
but rather to identify sequences of potentially spoken words that have a high
potential of being syntactically correct. Syntactic Analysis works closely with
Conceptual Analysis, and both processes working together result in identifying
spoken concepts. It is often easier and more efficient to create conceptual
rules in Predicate Builder Scripts that will exclude node sequences than it is
to create syntactic rules that will stay generic enough to be reused while
being specific enough to exclude undesired cases.
The Conceptual Analysis Process
Once a highly probable valid syntactic sequence of
potentially spoken words has been detected, Conceptual Analysis gets to work. As
stated earlier, Syntactic Analysis does not only identify the sequences
of potentially spoken words that have a high potential of being syntactically
valid; it also generates a Syntactic Hierarchy associated with each sequence (as
seen in Figure 3). Each Syntactic Hierarchy generated by the Syntactic Analysis
process is the input for Conceptual Analysis.
Constructs, Predicates and Predicate Builder Scripts are
widely used in Conceptual Analysis. To better understand Conceptual Analysis,
they first need to be defined, and some of their behavior also needs to be
explored.
Predicate Builder Script:
A script used by the Predicate Builder Parser in order to generate Predicates.
Predicate Builder Scripts are written in the Predicate Builder Language.
Predicate Builder Script example. The
Predicate Builder Script of the NOUN “arrival time” in AirlineSystemData.
IFNOT (OBJECT;NULL)
  CLEARCONSTRUCT(<temp.result>)
  IFNOT(WORKINGPREDICATEADDRESS;NULL)
    SETEVALCONSTRUCT(<temp.result>) {REPLACEFILLER(WORKINGPREDICATEADDRESS;<MOOD [CLASS:INTEROGATIVE] [QUERY:?]>;<AIRLINEPOSTANALYSIS [OPERATION:REPORT [VALUE:ARRIVALTIME] [OBJECT:>OBJECT<]]>)}
  ENDIF
  IF (TEMP.RESULT;NULL)
    SETEVALCONSTRUCT(<temp.result>) {<MOOD [CLASS:INTEROGATIVE] [QUERY:AIRLINEPOSTANALYSIS [OPERATION:REPORT [VALUE:ARRIVALTIME] [OBJECT:>OBJECT<]]]>}
  ENDIF
  TEMP.RESULT
  CLEARCONSTRUCT(<temp.result>)
ENDIF
Construct: A content holder that may be parsed through a Predicate Builder Parser so that
its content can be transformed based on the content of other constructs.
Predicate Builder Parser: A process that evaluates Predicate Builder Scripts
and generates a Predicate as a result. In Conceptual Speech Commander and CLUE,
multiple instances of Predicate Builder Parser can coexist.
A Predicate Builder Script is text that is tokenized and
then interpreted as Conceptual Speech Commander attempts to build Predicates.
Anything between the characters ‘<’ and ‘>’ is simply copied, and anything
outside is interpreted. If the content is not between the ‘<’ and ‘>’
characters, the parser expects the content to be a construct name (which will be
replaced by its parsed associated content).
To define a construct, the Dictionary Explorer can be
used, or the SETCONSTRUCT and SETEVALCONSTRUCT built-in constructs can be used
in a Predicate Builder Script:
SETCONSTRUCT(<CONSTRUCT_NAME_1>) {<Non-interpreted content>}
SETEVALCONSTRUCT(<CONSTRUCT_NAME_2>) {<This construct contains >CONSTRUCT_NAME_1}
Upon evaluation of the second line, the construct CONSTRUCT_NAME_2
shall then hold the content <This construct contains Non-interpreted content>.
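The copy-versus-substitute behavior of the two lines above can be imitated in a few lines of Python. This is a sketch of the idea only: the functions `set_construct`, `set_eval_construct`, `evaluate` and `show` are hypothetical names, and storing content without its outer ‘<’ and ‘>’ is an assumption made so that the result matches the content stated above. The real Predicate Builder Parser handles far more than this.

```python
import re
from typing import Dict

# A token is either a literal between '<' and '>' or a bare construct name.
TOKEN_RE = re.compile(r'<([^<>]*)>|([^<>\s]+)')


def evaluate(body: str, constructs: Dict[str, str]) -> str:
    """Copy bracketed literals and replace bare names by stored content."""
    parts = []
    for literal, name in TOKEN_RE.findall(body):
        if name:
            # Outside '<' and '>': treated as a construct name.
            parts.append(constructs[name])
        else:
            # Between '<' and '>': copied as is.
            parts.append(literal)
    return "".join(parts)


def set_construct(constructs: Dict[str, str], name: str, body: str) -> None:
    """SETCONSTRUCT-like: store the body without evaluating it."""
    text = body.strip()
    if text.startswith("<") and text.endswith(">"):
        text = text[1:-1]
    constructs[name] = text


def set_eval_construct(constructs: Dict[str, str], name: str, body: str) -> None:
    """SETEVALCONSTRUCT-like: evaluate the body, then store the result."""
    constructs[name] = evaluate(body, constructs)


def show(constructs: Dict[str, str], name: str) -> str:
    """Render stored content in the bracketed form used in this document."""
    return "<" + constructs[name] + ">"


constructs: Dict[str, str] = {}
set_construct(constructs, "CONSTRUCT_NAME_1", "<Non-interpreted content>")
set_eval_construct(constructs, "CONSTRUCT_NAME_2",
                   "<This construct contains >CONSTRUCT_NAME_1")
result = show(constructs, "CONSTRUCT_NAME_2")
```

In this sketch an undefined construct name raises a `KeyError`; how the actual parser treats that case is not stated here.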
There are three types of constructs: built-in constructs,
global constructs, and local constructs.
- Built-in constructs:
These are constructs like ‘IF’, ‘ELSE’ and ‘ENDIF’ that are hard-coded in
Conceptual Speech Commander and CLUE; they are effectively part of the Predicate Builder
Language.
- Global constructs:
Constructs that are defined by the user and that start with a ‘.’ character.
These are accessible to all Predicate Builder Parser instances.
- Local constructs:
Constructs that are defined by the user and that do not start with a ‘.’
character. These are specific to a given Parser instance.
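The scoping rule for user-defined constructs can be sketched as follows in Python. The class `ConstructStore` and its methods are hypothetical, and the construct names and contents are invented; the sketch only illustrates that names starting with ‘.’ land in a table shared by every parser instance while other names stay private to one instance.

```python
from typing import Dict, Optional


class ConstructStore:
    """Construct storage for one parser instance (illustrative only)."""

    # Shared by every instance: holds constructs whose name starts with '.'.
    _globals: Dict[str, str] = {}

    def __init__(self) -> None:
        # Private to this instance: holds all other user-defined constructs.
        self._locals: Dict[str, str] = {}

    def _table(self, name: str) -> Dict[str, str]:
        return ConstructStore._globals if name.startswith(".") else self._locals

    def set(self, name: str, content: str) -> None:
        self._table(name)[name] = content

    def get(self, name: str) -> Optional[str]:
        return self._table(name).get(name)


parser_a = ConstructStore()
parser_b = ConstructStore()
parser_a.set(".DOMAIN", "AIRLINE")      # global: visible to both parsers
parser_a.set("RESULT", "ARRIVALTIME")   # local: visible to parser_a only
```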
Conceptual Analysis:
The process by which the medium and syntactic aspects of language are excluded in
order to expose its concept in a standardized representation named a conceptual
representation.
Conceptual Analysis is actually an expression of
Conceptual Dependency (CD). Conceptual Dependency was invented in the 1970s
under the vision and leadership of Roger C. Schank[1].
Conceptual Analysis in Conceptual Speech Recognition is based on CD, but it is
not CD. Many books were written on CD. As a reference, the following books can
be consulted to get a wider view on the subject.
Schank, Roger C. and Colby, Kenneth M., Computer Models of Thought and Language, W.H. Freeman and Company, San Francisco, 1973, ISBN 0-7167-0834-5.
Riesbeck, Christopher K. and Schank, Roger C., Inside Case-Based Reasoning, Lawrence Erlbaum Associates, New Jersey, 1989, ISBN 0-89859-767-6.
Riesbeck, Christopher K. and Schank, Roger C., Inside Computer Understanding, Lawrence Erlbaum Associates, New Jersey, 1981, ISBN 0-89859-071-X.
Dyer, Michael G., In-Depth Understanding, The MIT Press, Massachusetts, 1986, ISBN 0-262-04073-5.
Whether or not you are familiar with Conceptual
Dependency, the following should be of interest to you to introduce the
mechanisms of Conceptual Analysis in Conceptual Speech Commander and CLUE.
Words are often thought to be associated with meanings. That is
not entirely accurate. Words are in fact associated with the rules necessary to
build meanings, provided that they are put in relationship with other words.
For example, if someone says “build”, nothing can be understood unless there is a
context that makes the communication obvious, or the spoken word is completed
with a gesture that fills the void left by that partial verbal communication.
So, “build”, like all other words, does not
have a concept associated with it. On the other hand, if “Build a paper plane”
is spoken, a complete concept can be induced by such communication. So, putting
the words ‘build’, ‘a’, ‘paper’ and ‘plane’ in relationship results in a valid
concept being produced. But what does the “build a paper plane” concept
contain?
- The concept conveyed is an order.
- The concept conveyed originates from the speaker, who asks
the listener(s) to build something.
- That something to build is a flying object made of
paper material.
So, each word used to produce that concept needs to hold
rules in order to produce the concept provided that they were put in
relationship together in a single utterance. In Conceptual Speech Recognition,
those rules are called Predicate Builder Scripts (PBS). Predicate Builder
Scripts shall be explained, but first we need to understand the nature of
Predicates.
A Predicate is an expression of the type:
PRIMITIVE [ROLE1: FILLER1] [ROLE2: FILLER2] ... [ROLEn: FILLERn]
For a Predicate to be valid there must be at least one
role-filler pair. The order of role-filler pairs in the Predicate is
irrelevant. The primitive used is limited to the reduced set of atom actions that can be
performed in the context, or to predetermined state primitives. Fillers may be a
value, a Predicate, or a construct.
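The validity rules above (at least one role-filler pair, order irrelevant, fillers that may themselves be Predicates) map naturally onto a small data structure. The Python class below is a hypothetical illustration, not the internal representation used by Conceptual Speech Commander; construct fillers are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

Filler = Union[str, "Predicate"]


@dataclass
class Predicate:
    """A primitive with role-filler pairs; pair order does not matter."""
    primitive: str
    roles: Dict[str, Filler] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.roles:
            raise ValueError("a Predicate needs at least one role-filler pair")

    def __str__(self) -> str:
        pairs = "".join(f"[{role}:{filler}]" for role, filler in self.roles.items())
        return f"{self.primitive}{pairs}"


car_a = Predicate("PP", {"CLASS": "VEHICLE", "TYPE": "CAR"})
car_b = Predicate("PP", {"TYPE": "CAR", "CLASS": "VEHICLE"})

# A filler can itself be a Predicate.
order = Predicate("MOOD", {
    "CLASS": "ORDER",
    "ORIGIN": Predicate("PP", {"CLASS": "HUMAN", "ID": "SPEAKER"}),
})
```

Because the generated equality compares the `roles` dictionaries, `car_a == car_b` holds even though the pairs were written in a different order.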
In CD, but not in Conceptual Speech Recognition’s
Conceptual Analysis, there is a limited set of primitives that are expected.
This set of primitives is interesting since it can be used as a guideline for
the early conceptual design required in the development of a Conceptual Speech
Recognition system.
Physical actions:
PROPEL: application of physical force to an object.
MOVE: movement of a body part by its owner.
INGEST: ingestion of an object by an animal (e.g. eat).
EXPEL: expulsion of something from the body of an animal (e.g. cry).
GRASP: grasping of an object by an actor.
Actions, which might be characterized through the resulting state changes:
PTRANS: transfer of the physical location of an object.
ATRANS: transfer of an abstract relationship (e.g. ownership).
Actions, which might be seen as an instrument for other actions:
SPEAK: production of sounds.
ATTEND: focusing of a sense organ towards a stimulus.
Mental actions:
MTRANS: transfer of mental information.
MBUILD: building new information out of old (e.g. decide).
Any other action:
DO: non-primitive action described through role-filler pairs.
A key state primitive in CD is PP. PP stands for Picture
Producer. Any physical object that can produce a picture in someone’s mind is
a Picture Producer (PP). A person, “John” as an example, can be a PP if he is
intended to be described as a physical object. We all know that “John” also has
some dreams, aspirations, and a couple of defects, and that he has a family and
friends. So, “John” is more than a physical entity. But, as an example,
depending on the system that is built, “John” could be reduced to a useful
representation for the context analyzed as follows:
[CLASS: HUMAN]
[ID: 8745356]
[NAME: John Walker]
[ADDRESS: 51 Future Avenue, Utopia City, CA 98456]
[TELEPHONE: 555-1212]
In CD, as well as during Conceptual Analysis in Conceptual
Speech Recognition, Predicates are used to hold conceptual representations. An
utterance like “Build a paper plane” could result in the following Predicate:
MOOD [CLASS: ORDER]
[ORIGIN: PP [CLASS: HUMAN] [ID: SPEAKER]]
[DESTINATION: PP [CLASS: HUMAN] [ID: LISTENER]]
[OBJECT: CONSTRUCT [OBJECT: PP [CLASS: PLANE] [MATERIAL: PP [CLASS: PAPER]]]]
Note that many valid Predicates could represent that
concept; there is not one solution that fits all. In the preceding Predicate,
the CONSTRUCT primitive was used. The CONSTRUCT primitive is not a CD
primitive. But, for the purpose of the development of a Conceptual Speech
Recognition system, we may choose for CONSTRUCT to become a primitive.
Depending on the context of the analysis, a deeper conceptual representation
could be attained (going as far as using exclusively Schank’s primitives if desired). But, as far as
conceptual analysis in the context of Conceptual Speech Recognition is
concerned, reducing primitives to atom primitives of our choosing in that context
only is sufficient. The most important thing in Conceptual Analysis is not to
be right, but to be consistent. That is, an observer could not come into the
construction of a Conceptual Speech Recognition system late in the process,
look at a Predicate and simply state that it is incorrect. That is irrelevant.
What truly is relevant is that Predicate Builder Scripts be generic enough for
the language to be used relatively loosely, while generating Predicates that
are well handled by the Post-Analysis process which generates a corresponding
and expected response.
Predicates are interesting since many useful operations
can be performed on them. As an example, a car could be described as:
PP [CLASS:VEHICLE]
[TYPE:CAR]
And a “blue car” as:
PP [CLASS: VEHICLE]
[TYPE:CAR]
[COLOR:BLUE]
If someone asks “is a blue car a car?”, that can be
translated to a simple Predicate Builder Script operation in Conceptual Speech
Commander:
ISEQUAL(<PP[CLASS:VEHICLE][TYPE:CAR][COLOR:BLUE]>;<PP[CLASS:VEHICLE][TYPE:CAR]>)
The ISEQUAL built-in construct is one of many extremely
powerful built-in constructs. In this case, the ISEQUAL Predicate Builder
Script operation would return OK (meaning that it succeeded). The behavior of
the ISEQUAL Predicate Builder Script operation is as follows. The first
predicate passed to it may have more role-filler pairs than the second one; the
operation returns OK if the primitive is the same and every role-filler pair of
the second predicate is also present in the first. If the two predicates were
swapped, indirectly asking “is a car a
blue car?”, the ISEQUAL Predicate Builder Script operation would not return OK
(meaning that it failed) since a car is not necessarily blue.
But there is even more power to a simple Predicate Builder
Script command like ISEQUAL. Constructs can be used in the ISEQUAL Predicate
Builder Script command (as well as many others). A useful ISEQUAL command using
constructs could read as follows:
ISEQUAL(<PP[CLASS:VEHICLE][TYPE:CAR][COLOR:BLUE]>;
<PP[CLASS:VEHICLE][TYPE:CAR][COLOR:>CONSTRUCT_COLOR<]>)
That could be translated to “is a blue car a car that has a color defined?”, in which case
the ISEQUAL Predicate Builder Script operation would return OK (meaning that it
succeeded), and upon return the construct named CONSTRUCT_COLOR would hold the
value “BLUE”. The operation therefore not only detects that the Predicate has a
COLOR role, but also reports that BLUE is the filler associated with it.
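The ISEQUAL behavior described above, including the capture of a filler into a construct, can be approximated in Python as follows. This is an interpretation of the prose, not the actual built-in: `Pred`, `Var` and `is_equal` are hypothetical names, `Var` stands in for a construct written between ‘>’ and ‘<’, and bindings are returned in a plain dictionary.

```python
from typing import Dict, NamedTuple


class Var(NamedTuple):
    """Placeholder for a construct name used as a filler in the pattern."""
    name: str


class Pred(NamedTuple):
    primitive: str
    roles: dict


def is_equal(first: Pred, second: Pred, bindings: Dict[str, object]) -> bool:
    """True if `first` has the same primitive and every pair of `second`."""
    if first.primitive != second.primitive:
        return False
    captured: Dict[str, object] = {}
    for role, wanted in second.roles.items():
        if role not in first.roles:
            return False
        actual = first.roles[role]
        if isinstance(wanted, Var):
            # A construct in the pattern captures whatever filler is present.
            captured[wanted.name] = actual
        elif isinstance(wanted, Pred) and isinstance(actual, Pred):
            if not is_equal(actual, wanted, captured):
                return False
        elif wanted != actual:
            return False
    # Only publish captured fillers when the whole comparison succeeded.
    bindings.update(captured)
    return True


blue_car = Pred("PP", {"CLASS": "VEHICLE", "TYPE": "CAR", "COLOR": "BLUE"})
car = Pred("PP", {"CLASS": "VEHICLE", "TYPE": "CAR"})
car_with_color = Pred("PP", {"CLASS": "VEHICLE", "TYPE": "CAR",
                             "COLOR": Var("CONSTRUCT_COLOR")})

found: Dict[str, object] = {}
has_color = is_equal(blue_car, car_with_color, found)
```

Here `is_equal(blue_car, car, {})` succeeds while `is_equal(car, blue_car, {})` fails, which mirrors the "is a blue car a car?" versus "is a car a blue car?" asymmetry.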
The HASPREDICATE built-in construct is as powerful as ISEQUAL, and behaves in a
similar fashion except that it can detect the targeted Predicate (second
parameter) deep inside the given Predicate (first parameter). As an example,
the statement “John remembered that he gave his blue car to Paul” could have
the following Predicate as a conceptual representation:
MTRANS [ACTOR:
PP[CLASS:HUMAN][ID:JOHN]]
[MOBJECT: ATRANS[OBJECT:
PP [CLASS: VEHICLE]
[TYPE:CAR]
[COLOR:BLUE]]
[ORIGIN: PP [CLASS:HUMAN][ID:JOHN]]
[DESTINATION: PP [CLASS:HUMAN][ID:PAUL]]
[TIME:
PAST]]
[FROM: LTM[2][TO: CP]
[TIME: PAST]
We can put this Predicate in the construct named ‘TRANSACTION’ for later
processing. Someone may ask “Does the fact that John remembered that he gave
his blue car to Paul have anything to do with a car?” Such a question could be
translated into the following Predicate Builder Script line:
HASPREDICATE(TRANSACTION;<PP[CLASS:VEHICLE][TYPE:CAR]>)
In that case, the HASPREDICATE operation returns OK, since the predicate
PP[CLASS:VEHICLE][TYPE:CAR] can indeed be found in the construct TRANSACTION.
Once again, constructs make the operation far more useful. Assuming that we
are looking for a transaction of any type, the following operation could be
invoked:
HASPREDICATE(TRANSACTION;<ATRANS[OBJECT:>OBJECT_CONSTRUCT<]
[ORIGIN:>ORIGIN_CONSTRUCT<][DESTINATION:>DESTINATION_CONSTRUCT<]>)
Such a statement could be translated to something like “Does the fact that
John remembered that he gave his blue car to Paul have anything to do with an
object changing possession?” In that case, the result is OK (meaning a
success) and, after evaluation, the construct OBJECT_CONSTRUCT holds the
content <PP[CLASS:VEHICLE][TYPE:CAR][COLOR:BLUE]>, the construct
ORIGIN_CONSTRUCT holds the content <PP[CLASS:HUMAN][ID:JOHN]> and the
construct DESTINATION_CONSTRUCT holds the content <PP[CLASS:HUMAN][ID:PAUL]>.
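The deep search described for HASPREDICATE can be sketched as a recursive
walk. The Python below is an illustration under stated assumptions, not the
engine's code: Predicates are `(primitive, roles)` tuples, and `Construct`,
`match` and `has_predicate` are hypothetical names.

```python
class Construct:
    """Hypothetical placeholder filler that captures whatever it is matched against."""

    def __init__(self, name):
        self.name = name


def match(candidate, pattern, captured):
    """Same-level test: same primitive and every pattern role present in candidate."""
    if candidate[0] != pattern[0]:
        return False
    for role, filler in pattern[1].items():
        if role not in candidate[1]:
            return False
        if isinstance(filler, Construct):
            captured[filler.name] = candidate[1][role]
        elif candidate[1][role] != filler:
            return False
    return True


def has_predicate(given, target, constructs):
    """Look for `target` in `given` itself, then in every nested Predicate filler."""
    captured = {}
    if match(given, target, captured):
        constructs.update(captured)
        return True
    return any(
        isinstance(filler, tuple) and has_predicate(filler, target, constructs)
        for filler in given[1].values()
    )


john = ("PP", {"CLASS": "HUMAN", "ID": "JOHN"})
paul = ("PP", {"CLASS": "HUMAN", "ID": "PAUL"})
blue_car = ("PP", {"CLASS": "VEHICLE", "TYPE": "CAR", "COLOR": "BLUE"})
transaction = ("MTRANS", {
    "ACTOR": john,
    "MOBJECT": ("ATRANS", {"OBJECT": blue_car, "ORIGIN": john,
                           "DESTINATION": paul, "TIME": "PAST"}),
    "FROM": "LTM", "TO": "CP", "TIME": "PAST",
})

constructs = {}
pattern = ("ATRANS", {"OBJECT": Construct("OBJECT_CONSTRUCT"),
                      "ORIGIN": Construct("ORIGIN_CONSTRUCT"),
                      "DESTINATION": Construct("DESTINATION_CONSTRUCT")})
found = has_predicate(transaction, pattern, constructs)
```

In this sketch the ATRANS is found one level below the MTRANS, and the three
constructs end up holding the blue car, John and Paul respectively.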
Combined with Predicate Builder Scripts, Predicates make it possible to test,
search and decompose conceptual representations with a handful of operations.
For those who are experts in CD, it is important to note
that Conceptual Speech Technologies, LLC is aware that CD has well-known
limitations. More specifically, three major limitations are often associated
with CD:
- Reducing every conceptual representation to primitives
is extremely difficult.
- The choice of primitives to represent every possible
concept is a source of debate more than a source of consensus.
- Efforts related to frames (acquired knowledge) and
scripting (flow of concepts) in CD have not succeeded in producing strong
results yet.
To overcome these limitations, Conceptual Analysis in
Conceptual Speech Recognition is built around the following principles:
- There is no need to reduce conceptual representations
to primitives.
- If reducing concepts to primitives is the programmer’s preference, the set
of primitives used is open. Consequently, the mostly academic debate is
avoided and any set of primitives may be used.
- Frames and scripting are not used by Conceptual Speech
Recognition. The approach only requires a limited conceptual understanding in
order to achieve its purpose. Although frames and scripting may be useful in
the context of dictation and transcription, they are not required for command
and control environments like the one in Conceptual Speech Commander and CLUE.
Now that
Constructs, Predicates and Predicate Builder Scripts have been defined and some
of their behavior explored, it is possible to get deeper into the mechanism of
Conceptual Analysis.
For each
Syntactic Hierarchy that was successfully constructed in the Syntactic Analysis
process, Conceptual Analysis is called, and the SENTENCE part of speech node
goes through Conceptual Analysis. This process continues until all possible
Syntactic Hierarchies have been processed.
A SENTENCE part of speech node may include other SENTENCE part of speech
nodes as descendants in its Syntactic Hierarchy. Conceptual Speech Commander
and CLUE will always parse the lowest SENTENCE part of speech node from the
Syntactic Hierarchy. For example, take the sentence “Has flight 600 been
delayed and how late is it?” The top-level SENTENCE part of speech node is
actually built from a SENTENCE part of speech node, the CONJUNCTION “and”, and
another SENTENCE part of speech node. Accordingly, Conceptual Speech Commander
and CLUE start parsing from the first SENTENCE part of speech node, the one
prior to the conjunction “and”.
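The selection of the lowest SENTENCE nodes can be pictured with a small tree
walk. This Python sketch is illustrative only: the `Node` class and the
`lowest_sentences` function are hypothetical, and the tree is reduced to the
three children named in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical Syntactic Hierarchy node."""
    part_of_speech: str
    text: str
    children: list = field(default_factory=list)


def lowest_sentences(node):
    """Return, left to right, the SENTENCE nodes that contain no other SENTENCE node."""
    found = []
    for child in node.children:
        found.extend(lowest_sentences(child))
    if not found and node.part_of_speech == "SENTENCE":
        found.append(node)
    return found


root = Node("SENTENCE", "has flight 600 been delayed and how late is it", [
    Node("SENTENCE", "has flight 600 been delayed"),
    Node("CONJUNCTION", "and"),
    Node("SENTENCE", "how late is it"),
])

order = [node.text for node in lowest_sentences(root)]
```

With this tree, `order` lists “has flight 600 been delayed” first and “how
late is it” second, and the enclosing SENTENCE node is not returned.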
The Working Predicate is a key concept in understanding Conceptual Analysis.
It is the unique Predicate that is built during Predicate Builder parsing. It
starts with no value (NULL) and is then augmented, word by word, until it
contains the Predicate describing the complete SENTENCE part of speech node,
once all the words’ Predicate Builder Scripts have been parsed. The Working
Predicate is kept and maintained in the local construct WORKINGPREDICATE.
The first thing to do when parsing a SENTENCE part of speech node is to
identify its Object of Knowledge. The Object of Knowledge is kept in the local
construct OBJECT and refers to the element that the SENTENCE part of speech
node is about. As an example, for the SENTENCE part of speech node “has flight
600 been delayed”, the Object of Knowledge is “flight 600”. What is being
talked about? What object is the inquiry about? Those are the questions that
help in identifying what should become the Object of Knowledge. Objects of
Knowledge are generally NOUN_PHRASE nodes that have no GERUNDIVE_PHRASE or
REL_CLAUSE in their Syntactic Hierarchy.
Here are some examples of Objects of Knowledge for
SENTENCE part of speech nodes.
| SENTENCE part of speech node | Object of Knowledge |
| Has flight 600 been delayed? | “flight 600” |
| Are flight 600 and 122 late? | “flight 600 and 122” |
| Tell me how long before flight 600 will arrive. | “me” is the Object of Knowledge for the node “tell me” and “flight 600” is the Object of Knowledge for the node “how long before flight 600 will arrive”. |
Consequently, in order to identify the Object of Knowledge for the SENTENCE
part of speech node being parsed, the Predicate Builder Script of each
NOUN_PHRASE part of speech node under that SENTENCE node that has neither a
GERUNDIVE_PHRASE nor a REL_CLAUSE part of speech node in its hierarchy is
invoked with the OBJECT construct set to NULL (indicating that no Object of
Knowledge has been identified yet) and with the WORKINGPREDICATE construct
also set to NULL.
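The filtering of NOUN_PHRASE candidates can be sketched as follows. The Python
below is a hedged illustration: `Node`, `has_descendant` and
`object_candidates` are hypothetical names, and the trees are simplified
versions of what Syntactic Analysis would actually produce.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical Syntactic Hierarchy node."""
    part_of_speech: str
    text: str
    children: list = field(default_factory=list)


EXCLUDED = {"GERUNDIVE_PHRASE", "REL_CLAUSE"}


def has_descendant(node, labels):
    """True when any node below `node` has one of the given parts of speech."""
    return any(
        child.part_of_speech in labels or has_descendant(child, labels)
        for child in node.children
    )


def object_candidates(sentence):
    """Collect NOUN_PHRASE nodes free of GERUNDIVE_PHRASE and REL_CLAUSE descendants."""
    candidates = []
    for child in sentence.children:
        if child.part_of_speech == "NOUN_PHRASE" and not has_descendant(child, EXCLUDED):
            candidates.append(child)
        else:
            candidates.extend(object_candidates(child))
    return candidates


sentence = Node("SENTENCE", "has flight 600 been delayed", [
    Node("VERB", "has"),
    Node("NOUN_PHRASE", "flight 600", [Node("NOUN", "flight 600")]),
    Node("VERB", "been"),
    Node("ADJECTIVE_PHRASE", "delayed"),
])
candidates = [node.text for node in object_candidates(sentence)]
```

Each candidate returned this way would then have its Predicate Builder Script
invoked with OBJECT and WORKINGPREDICATE both empty.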
As an example, the Predicate Builder Script of the PRONOUN “me” can then be
defined as follows:
<PP [CLASS:HUMAN] [ID:SPEAKER]>
This identifies the Object of Knowledge for the cases where the word “me” is
the NOUN_PHRASE under the SENTENCE part of speech node to parse.
The Predicate Builder Script of the node “me” is
relatively simple, but things get more complex with nodes like “flight 600”.
Just think about all the different ways that someone may identify a flight (as
described in the Syntax Transform Script of the AirlineSystemData module):
- Flight 600
- American Airlines flight 600
- AA 600
- AA number 600
- AA Flight number 600
- AA flight six zero zero
- AA six zero zero
And that is just to name a few. The rules needed to build FLIGHT part of
speech nodes are stored in the Syntax Transform Script of the
AirlineSystemData module:
([NOUN & "flight" | "flights"])[AIRLINECODE]([NOUN & "flight" | "flights"])([NOUN & "number" | "numbers"])[CARDINAL_NUMBER] -> FLIGHT
([<AIRLINE NAME>:AIRLINE])[NOUN & "flight" | "flights"]([NOUN & "number" | "numbers"])[<FLIGHT NUMBER>:CARDINAL_NUMBER] -> FLIGHT
[<AIRLINE NAME>:AIRLINE]([NOUN & "number" | "numbers"])[<FLIGHT NUMBER>:CARDINAL_NUMBER] -> FLIGHT
[<AIRLINE NAME>:AIRLINECODE]([NOUN & "number" | "numbers"])[<FLIGHT NUMBER>:CARDINAL_NUMBER] -> FLIGHT
[FLIGHT] -> NOUN
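To give a feel for what these rules accept, the spoken variants listed earlier
can be approximated with a regular expression. This Python sketch is not the
Syntax Transform Script engine and does not cover every rule (for instance, it
ignores a “flight” noun placed before the airline code). The `AIRLINES` table,
`FLIGHT_RE` and `parse_flight` are hypothetical names introduced for the
illustration.

```python
import re

# Hypothetical lookup tables; a real module would hold many more airlines.
AIRLINES = {"american airlines": "AA", "aa": "AA"}
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

_AIRLINE = "|".join(re.escape(name) for name in AIRLINES)
_DIGIT = "|".join(DIGITS)

FLIGHT_RE = re.compile(
    rf"^(?:(?P<airline>{_AIRLINE})\s+)?"      # optional airline name or code
    rf"(?P<flight>flights?\s+)?"              # optional noun "flight"
    rf"(?:numbers?\s+)?"                      # optional noun "number"
    rf"(?P<number>\d+|(?:{_DIGIT})(?:\s+(?:{_DIGIT}))*)$",
    re.IGNORECASE,
)


def parse_flight(text):
    """Return (airline_code_or_None, flight_number) or None when nothing matches."""
    found = FLIGHT_RE.match(text.strip())
    if found is None:
        return None
    if found.group("airline") is None and found.group("flight") is None:
        return None  # a bare number is not a flight reference
    raw = found.group("number")
    number = raw if raw.isdigit() else "".join(DIGITS[w] for w in raw.lower().split())
    airline = found.group("airline")
    return (AIRLINES[airline.lower()] if airline else None, number)
```

For example, `parse_flight("AA six zero zero")` and
`parse_flight("American Airlines flight 600")` both yield the airline code
`AA` with flight number `600`, while `parse_flight("Flight 600")` yields the
same number with no airline.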
Although we know for a fact that NOUN_PHRASE part of speech nodes like
“flight 600” are the Object of Knowledge, we cannot have a different Predicate
Builder Script for “flight 600”, “AA 600”, “AA six zero zero” and so on (also
remember that “flight 122” is valid too, i.e. other flight numbers are
possible). It is critical for Conceptual Speech Commander and CLUE to handle a
FLIGHT part of speech node efficiently and in a generic fashion. That is done
through Auto-Triggered Predicate Builder Scripts. A Predicate Builder Script
can be associated with a Part of Speech so that each time a node of that type
has to be analyzed, that Auto-Triggered Predicate Builder Script is parsed.
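The idea of attaching one script to a Part of Speech, rather than to each
spelling, resembles a dispatch table. The following Python sketch is only an
analogy: `AUTO_TRIGGERED`, `auto_triggered`, `build_flight` and `analyze` are
hypothetical names, and the handler simply takes the last token as the flight
number.

```python
AUTO_TRIGGERED = {}  # part of speech -> handler, filled in by the decorator below


def auto_triggered(part_of_speech):
    """Register a handler that runs for every node of the given part of speech."""
    def register(handler):
        AUTO_TRIGGERED[part_of_speech] = handler
        return handler
    return register


@auto_triggered("FLIGHT")
def build_flight(node_text):
    """One generic handler for every FLIGHT node, whatever words were spoken."""
    number = node_text.split()[-1]  # simplification: last token is the number
    return ("PP", {"CLASS": "VEHICLE", "TYPE": "AIRPLANE", "NUMBER": number})


def analyze(part_of_speech, node_text):
    """Run the auto-triggered handler for this part of speech, if one exists."""
    handler = AUTO_TRIGGERED.get(part_of_speech)
    return handler(node_text) if handler else None


flight_600 = analyze("FLIGHT", "flight 600")
flight_122 = analyze("FLIGHT", "AA 122")
```

Both calls go through the same handler, which is the point of the mechanism:
no per-spelling script is needed for “flight 600”, “AA 122” or any other
flight.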
For our case, the Auto-Triggered Predicate Builder Script for the FLIGHT part
of speech reads as follows (there is no need to fully understand every script
line; the following Auto-Triggered Predicate Builder Script is provided as a
reference only):
CLEARCONSTRUCT(<script.keeppacket>)
CLEARCONSTRUCT(<script.number>)
CLEARCONSTRUCT(<script.workingpredicate>)
SETEVALCONSTRUCT(<script.keeppacket>) {GETCURRENTNODE}
IF (PARENTNODE;<OK>)
  IF(PARTOFSPEECH;<NOUN>)
    IF(PREVIOUSNODE;<OK>)
      IF(PARTOFSPEECH;<CARDINAL_NUMBER>)
        REJECTCONCEPTUALANALYSIS
        SETEVALCONSTRUCT(<script.workingpredicate>) {<REJECT>}
      ENDIF
    ENDIF
  ENDIF
ENDIF
SETCURRENTNODE(SCRIPT.KEEPPACKET)
IFNOT (SCRIPT.WORKINGPREDICATE;<REJECT>)
  IF(FINDNODE(<SAMENODE>;<SAMENODELEVELORLOWER>;<[CARDINAL_NUMBER]>);<OK>)
    IF(GETASSOCIATEDVALUE;NULL)
      SETEVALCONSTRUCT(<script.number>) {GETSPELLING}
    ELSE
      SETEVALCONSTRUCT(<script.number>) {GETASSOCIATEDVALUE}
    ENDIF
  ENDIF
  SETCURRENTNODE(SCRIPT.KEEPPACKET)
  SETCONSTRUCT(<script.spokencompany>) {<NONE>}
  IF(FINDNODE(<SAMENODE>;<LOWERNODELEVEL>;<[AIRLINE]>);<OK>)
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(PARSENODE;<NUMBER>;SCRIPT.NUMBER)}
    EXTRACTFILLER(SCRIPT.WORKINGPREDICATE;<PP [CLASS:VEHICLE] [TYPE:AIRPLANE] [COMPANY:>SCRIPT.SPOKENCOMPANY<]>)
  ELSE
    IF(FINDNODE(<SAMENODE>;<LOWERNODELEVEL>;<[AIRLINECODE]>);<OK>)
      SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(PARSENODE;<NUMBER>;SCRIPT.NUMBER)}
      EXTRACTFILLER(SCRIPT.WORKINGPREDICATE;<PP [CLASS:VEHICLE] [TYPE:AIRPLANE] [COMPANY:>SCRIPT.SPOKENCOMPANY<]>)
    ELSE
      SETEVALCONSTRUCT(<script.workingpredicate>) {<PP [CLASS:VEHICLE] [TYPE:AIRPLANE] [COMPANY:?] [NUMBER:>SCRIPT.NUMBER<]>}
    ENDIF
  ENDIF
  .AIRDBACCESS(SCRIPT.NUMBER)
  IF(AIRDB&DEFINED;<TRUE>)
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<ORIGIN>;AIRDB&ORIGIN)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<DESTINATION>;AIRDB&DESTINATION)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<STATUS>;AIRDB&STATUS)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<DEPARTURETIME>;AIRDB&DEPARTURETIME)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<ARRIVALTIME>;AIRDB&ARRIVALTIME)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<INITIALDEPARTURETIME>;AIRDB&INITIALDEPARTURETIME)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<INITIALARRIVALTIME>;AIRDB&INITIALARRIVALTIME)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<DEPARTUREGATE>;AIRDB&DEPARTUREGATE)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<ARRIVALGATE>;AIRDB&ARRIVALGATE)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<COMPANY>;AIRDB&COMPANY)}
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<SPOKENCOMPANY>;SCRIPT.SPOKENCOMPANY)}
  ELSE
    SETEVALCONSTRUCT(<script.workingpredicate>) {SETROLEFILLERPAIR(SCRIPT.WORKINGPREDICATE;<DEFINED>;<FALSE>)}
  ENDIF
  CLEARCONSTRUCT(<script.spokencompany>)
  SCRIPT.WORKINGPREDICATE
  SETCURRENTNODE(SCRIPT.KEEPPACKET)
  IF(FINDNODE(<UPONLY>;<UPPERNODELEVEL>;<[NOUN]>);<OK>)
    CLONENODEANALYSISRESULT(SCRIPT.KEEPPACKET)
  ENDIF
ENDIF
SETCURRENTNODE(SCRIPT.KEEPPACKET)
CLEARCONSTRUCT(<script.workingpredicate>)
CLEARCONSTRUCT(<script.keeppacket>)
CLEARCONSTRUCT(<script.number>)
The Predicate Builder Script scans the current node’s Syntactic Hierarchy for
a CARDINAL_NUMBER node (like the “600” under “flight 600”), and also scans for
an AIRLINE or AIRLINECODE node (like “AA” under “AA flight six zero zero”).
Once it has both elements, the second one being optional, it queries the
database of flights in order to generate a Predicate describing the FLIGHT
part of speech node. For “flight 600” it generates the following Predicate as
the Object of Knowledge:
PP [CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]
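The database step can be summarized in a few lines. The Python below is a
simplified sketch, not a translation of the script: `FLIGHT_DB` is a
hypothetical in-memory stand-in for the flight database (holding a subset of
the roles from the example), and `flight_predicate` is an invented name.

```python
# Hypothetical in-memory stand-in for the flight database; values mirror the example.
FLIGHT_DB = {
    "600": {"COMPANY": "UA", "ORIGIN": "JFK", "DESTINATION": "DFW",
            "STATUS": "ARRIVED", "DEPARTURETIME": "8:59", "ARRIVALTIME": "14:32"},
}


def flight_predicate(number, spoken_company=None, db=FLIGHT_DB):
    """Build a (primitive, roles) Predicate for a flight number.

    Known flights receive the database roles; unknown flights are flagged
    with DEFINED:FALSE, loosely following the ELSE branch of the script.
    """
    roles = {"CLASS": "VEHICLE", "TYPE": "AIRPLANE", "NUMBER": number}
    record = db.get(number)
    if record is None:
        roles["COMPANY"] = spoken_company or "?"
        roles["DEFINED"] = "FALSE"
    else:
        roles.update(record)
        roles["SPOKENCOMPANY"] = spoken_company or "NONE"
    return ("PP", roles)


known = flight_predicate("600")
unknown = flight_predicate("999", spoken_company="AA")
```

Enriching the Predicate at this stage means later scripts can answer
questions about origin, status or gates without querying the database again.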
Once an Object of Knowledge has been successfully identified, the remainder
of the SENTENCE part of speech node is parsed. The identified Object of
Knowledge resides in the OBJECT local construct. Should no Object of Knowledge
be identified for a SENTENCE part of speech node, the parsing aborts and
another Syntactic Hierarchy is put through the Conceptual Analysis process.
By referring to the Predicate Builder Scripts provided in the example module
AirlineSystemData, we can inspect the behavior of Conceptual Analysis parsing
in detail. For the successful parse of the SENTENCE “has flight 600 been
delayed”, the parsing proceeded as follows:
1. The parser identified “flight 600” as the Object of
Knowledge.
OBJECT: flights 600 - NOUN_PHRASE
{PP
[CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]}
2. The parser processes every other node below the
SENTENCE node.
WORKINGPREDICATE after parsing of
“has” (Part of Speech: VERB). The Working Predicate, at this point, states that
an interrogation is being made about ‘flight 600’ without giving any details
about the nature of the interrogation.
{MOOD
[CLASS:INTEROGATIVE]
[OBJECT:
{PP
[CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]}]
[QUERY:?]
[ASSUMPTION_ON_TIME_OF_EVENT:< Mon Mar 08 15:43:07 2004]}
WORKINGPREDICATE after parsing of
“been” (Part of Speech: VERB). The parsing of the node ‘been’ did not transform
the Working Predicate. In Conceptual Analysis, nodes like ‘been’, ‘is’ and
‘was’ are often referred to as unifiers. They unite logical parts together like
in ‘A is B’ or ‘A was B’. The only effect that such a node may have on Working
Predicate is to give a sense of time related to the statement (‘A was B’ would
give a past time effect to the same representation of ‘A is B’).
{MOOD
[CLASS:INTEROGATIVE]
[OBJECT:
{PP
[CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]}]
[QUERY:?]
[ASSUMPTION_ON_TIME_OF_EVENT:< Mon Mar 08 15:43:07 2004]}
WORKINGPREDICATE after parsing of
“delayed” (Part of Speech: ADJECTIVE and ADJECTIVE_PHRASE). The node ‘delayed’
transforms the general inquiry about ‘flight 600’ into a specific inquiry
related to the ‘DELTASTATUS’ (meaning how late the flight is, without
specifying whether the inquiry is about the arrival or the departure).
{MOOD
[CLASS:INTEROGATIVE]
[OBJECT:
{PP
[CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]}]
[QUERY:
{AIRLINEPOSTANALYSIS
[OPERATION:
{REPORT
[VALUE:DELTASTATUS]
[OBJECT:
{PP
[CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]}]}]}]
[ASSUMPTION_ON_TIME_OF_EVENT:< Mon Mar 08 15:43:07 2004]}
WORKINGPREDICATE after parsing of
“has flight 600 been delayed” (Part of Speech: VERB_PHRASE and SENTENCE). Those
nodes also do not transform the conceptual representation (Predicate) as they
are brought higher to the SENTENCE node in the Syntactic Hierarchy.
{MOOD
[CLASS:INTEROGATIVE]
[OBJECT:
{PP
[CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]}]
[QUERY:
{AIRLINEPOSTANALYSIS
[OPERATION:
{REPORT
[VALUE:DELTASTATUS]
[OBJECT:
{PP
[CLASS:VEHICLE]
[TYPE:AIRPLANE]
[COMPANY:UA]
[NUMBER:600]
[ORIGIN:JFK]
[DESTINATION:DFW]
[STATUS:ARRIVED]
[DEPARTURETIME:8:59]
[ARRIVALTIME:14:32]
[INITIALDEPARTURETIME:8:52]
[INITIALARRIVALTIME:14:20]
[DEPARTUREGATE:B 21]
[ARRIVALGATE:B 2]
[SPOKENCOMPANY:NONE]}]}]}]
[ASSUMPTION_ON_TIME_OF_EVENT:< Mon Mar 08 15:43:07 2004]}
The previous Predicate then becomes the conceptual
representation of the SENTENCE “has flight 600 been delayed”.
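The way the Working Predicate evolves word by word can be condensed into a
short sketch. This Python is illustrative only: the per-word functions and
`build_working_predicate` are hypothetical, Predicates are modeled as
`(primitive, roles)` tuples, and the ASSUMPTION_ON_TIME_OF_EVENT role is left
out for brevity.

```python
def parse_has(working, obj):
    """'has' opens an interrogation about the Object of Knowledge."""
    return ("MOOD", {"CLASS": "INTEROGATIVE", "OBJECT": obj, "QUERY": "?"})


def parse_been(working, obj):
    """'been' is a unifier: it leaves the Working Predicate unchanged here."""
    return working


def parse_delayed(working, obj):
    """'delayed' turns the open query into a DELTASTATUS report request."""
    primitive, roles = working
    query = ("AIRLINEPOSTANALYSIS", {
        "OPERATION": ("REPORT", {"VALUE": "DELTASTATUS", "OBJECT": obj}),
    })
    return (primitive, {**roles, "QUERY": query})


WORD_SCRIPTS = {"has": parse_has, "been": parse_been, "delayed": parse_delayed}


def build_working_predicate(words, obj):
    """Start from None (NULL) and let each word's script transform the predicate."""
    working = None
    for word in words:
        working = WORD_SCRIPTS[word](working, obj)
    return working


flight = ("PP", {"CLASS": "VEHICLE", "TYPE": "AIRPLANE", "NUMBER": "600"})
partial = build_working_predicate(["has", "been"], flight)
final = build_working_predicate(["has", "been", "delayed"], flight)
```

Here `partial` still carries the open query `?`, while `final` carries the
REPORT operation, mirroring the trace above.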
The Post-Analysis Process
Once a conceptual representation (Predicate) has been
successfully calculated for an inquiry, a response needs to be formulated
through the Post-Analysis process. Keep in mind that a response generated by
the Post-Analysis process is not limited to a verbal response. It may be a
verbal response, a physical action, an electric signal (turning the lights on
or off), a database query or update, or a combination of several of these.
Understanding a question does not guarantee being able to answer it, and a
Conceptual Speech Recognition system has the same limitation as a human in
this respect. Consequently, the Post-Analysis process tries to formulate a
response Predicate from an inquiry Predicate, with no requirement that it
succeed. If the Post-Analysis process fails to generate a response Predicate,
the system loops and Syntactic Analysis is invoked again in order to provide
another Syntactic Hierarchy to analyze conceptually. For the AirlineSystemData
example module provided with Conceptual Speech Commander, the Post-Analysis
process is written in C++ in the module DLL (AirlineSystemData.dll).
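The retry loop described above can be outlined as follows. This is a minimal
Python sketch with invented names (`respond`, `conceptual_analysis`,
`post_analysis`); the stand-in functions only simulate failures and do not
reflect how the shipped C++ module works.

```python
def respond(hierarchies, conceptual_analysis, post_analysis):
    """Try each Syntactic Hierarchy in turn until one yields a response."""
    for hierarchy in hierarchies:
        predicate = conceptual_analysis(hierarchy)
        if predicate is None:
            continue  # no conceptual representation: try the next hierarchy
        response = post_analysis(predicate)
        if response is not None:
            return response  # first hierarchy that can actually be answered
    return None  # understood or not, nothing could be answered


# Hypothetical stand-ins used only to exercise the loop.
def conceptual_analysis(hierarchy):
    return None if hierarchy == "bad parse" else {"inquiry": hierarchy}


def post_analysis(predicate):
    if predicate["inquiry"] == "unanswerable parse":
        return None
    return "response to " + predicate["inquiry"]


result = respond(["bad parse", "unanswerable parse", "good parse"],
                 conceptual_analysis, post_analysis)
```

In this run the first hierarchy fails Conceptual Analysis, the second fails
Post-Analysis, and the third produces the response.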
Customizing Conceptual Speech Commander and CLUE
Most of the customization of Conceptual Speech Commander and CLUE is
done within modules. A module is a set of logical units grouped together to
achieve one defined purpose. There are currently four types of modules:
- The Base Module.
- Interaction modules.
- Functionality modules.
- Audio processing modules.
The Base Module
The Base module contains logic that is intended to be generic
to all Interaction modules. As an example, the Base module has a Syntax
Transform Script used to process the English language. Such content does not
need to be duplicated in each and every Interaction module since the Base
module’s content is always available to all Interaction modules. There is one
Base module in Conceptual Speech Commander and CLUE. It contains the following:
- Words: including spellings and part of speech
association and Predicate Builder Script definitions.
- Constructs.
- Parts of Speech: including Auto-Triggered and Load-Time
Predicate Builder Scripts.
- Syntax Transform Scripts.
Interaction Modules
The Interaction modules are used to manage the
user-interaction logic and maintenance environment.
The Interaction module contains the following components:
- Words: including spellings and part of speech
association and Predicate Builder Script definitions.
- Constructs.
- Parts of Speech: including Auto-Triggered and Load-Time
Predicate Builder Scripts.
- Test Cases.
- Syntax Transform Scripts.
- Custom Pronunciations.
- An optional DLL containing the C/C++ handling of the
module.
The Interaction module is used to define the vocabulary and
Predicate Builder Scripts that will drive a conversation between Conceptual
Speech Commander and a user. Test Cases are also useful for the maintenance of
the module logic in order to ensure that a module is functional. The
AirlineSystemData module shipped with Conceptual Speech Commander and CLUE is
a fully functional example of an Interaction module.
Functionality Modules
The Functionality module adds functionality to all modules
by handling custom constructs that are processed like built-in constructs when
the module is loaded. The Functionality module is composed of a single
DLL. The SAPITTS module shipped with
Conceptual Speech Commander and CLUE is a fully functional example of a functionality
module. The SAPITTS functionality module handles calls like SAPITTS.SPEAK so
that a synthesized voice can be heard through a computer’s speaker.
Audio Processing Modules
The Audio Processing module is used to process or
emulate audio analysis so that a list of potentially spoken words can be
generated. The Audio Processing module is composed of a single DLL. Audio
Processing modules are relevant only in a Conceptual Speech Commander
environment. CLUE, as stated earlier, is limited to processing textual
information.
Conceptual Speech Commander and CLUE Components
Conceptual Speech Commander and CLUE both have two components:
- the Dictionary Explorer
- the Console
Conceptual Speech Commander and CLUE are also shipped with
documentation and interface files to access the Conceptual Speech engine’s kernel.
The Dictionary Explorer
The Dictionary Explorer is used to manage the modules’ content and their
settings. Its user interface allows a system developer to code Interaction
modules efficiently and to manage all elements of Conceptual Speech
Recognition (spellings, pronunciations, Parts of Speech, Predicate Builder
Scripts, etc.).
The Console
The Console is a multi-session application used to monitor the Conceptual
Speech Recognition engine. It can also initiate and analyze the test cases
associated with each loaded Interaction module. It can display the session
information related to a recognized or mis-recognized utterance in detailed
mode so that a diagnosis can be reached and the system improved over time.
[1] Roger C. Schank: One of the world's leading Artificial Intelligence
researchers, Dr. Schank is the author of more than 125 articles and
publications. His books include Dynamic Memory: A Theory of Learning in
Computers and People; Tell Me a Story: A New Look at Real and Artificial
Memory; The Connoisseur's Guide to the Mind; and Engines for Education.
[2] LTM, in conceptual dependency theory, refers to the location that stores
memory in one’s mind. CP, in conceptual dependency theory, refers to the
central processor of one’s mind. A conceptual representation that MTRANS from
LTM to CP is the conceptual representation of remembering something.