Internet Draft                                                  B. Wyld
Document: draft-ietf-speechsc-protocol-eval-01.txt               Editor
Expires: April 2003                                            Eloquant
Version 01                                                November 2002


                      SPEECHSC Protocol Evaluation


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document is the Protocol Evaluation Document for the SPEECHSC
   Working Group. Section 3 provides a summary of the individual
   protocol comparisons (given in Sections 4 and following) against the
   SPEECHSC requirements [1].

Table of Contents

   1. Overview
   2. Protocol Proposals
   3. Protocol Evaluation Summaries
      3.1. Protocol X
   4. Protocol "BEEP" Compliance Evaluation (Jerry Carter)
      4.1. General notes
      4.2. Analysis of General Requirements
      4.3. Analysis of TTS requirements
      4.4. Analysis of ASR requirements
      4.5. Analysis of Speaker Identification and Verification
           Requirements
      4.6. Analysis of Duplexing and Parallel Operation Requirements
      4.7. Analysis of additional considerations (non-normative)
      4.8. Analysis of Security considerations
   5. Protocol "SIP" Compliance Evaluation (Rajiv Dharmadhikari)
      5.1. Introduction
      5.2. Analysis of General Requirements
      5.3. Analysis of TTS requirements
      5.4. Analysis of ASR requirements
      5.5. Analysis of Speaker Identification and Verification
           Requirements
      5.6. Analysis of Duplexing and Parallel Operation Requirements

Wyld                      Expires - April 2003                 [Page 1]

SPEECHSC Protocol Evaluation Template                     November 2002

      5.7. Analysis of additional considerations (non-normative)
      5.8. Analysis of Security considerations
      5.9. Other Criteria
   6. Protocol "RTSP" Compliance Evaluation (Brian Wyld)
      6.1. General Introduction
      6.2. Analysis of General Requirements
      6.3. Analysis of TTS requirements
      6.4. Analysis of ASR requirements
      6.5. Analysis of Speaker Identification and Verification
           Requirements
      6.6. Analysis of Duplexing and Parallel Operation Requirements
      6.7. Analysis of additional considerations (non-normative)
      6.8. Analysis of Security considerations
   7. Protocol "MRCP" Compliance Evaluation (Sarvi Shanmugham)
      7.1. General
      7.2. Analysis of General Requirements
      7.3. Analysis of TTS requirements
      7.4. Analysis of ASR requirements
      7.5. Analysis of Speaker Identification and Verification
           Requirements
      7.6. Analysis of Duplexing and Parallel Operation Requirements
      7.7. Analysis of additional considerations (non-normative)
      7.8. Analysis of Security considerations
   8. Protocol "Web Services" Compliance Evaluation (Stephane H. Maes)
      8.1. General Notes
      8.2. Analysis of General Requirements
      8.3. Analysis of TTS requirements
      8.4. Analysis of ASR requirements
      8.5. Analysis of Speaker Identification and Verification
           Requirements
      8.6. Analysis of Duplexing and Parallel Operation Requirements
      8.7. Analysis of additional considerations (non-normative)
      8.8. Analysis of Security considerations
   9. Security Considerations
   10. References

1. Overview

   This document provides the template for the content of the SPEECHSC
   Protocol Evaluation document. This section contains an overview of
   the process. Section 2 lists the proposed protocols submitted to the
   WG. Section 3 summarizes the proposed protocols against the
   requirements and framework. Sections 4 and following provide the
   individual protocol comparisons against the SPEECHSC requirements
   [1].

2. Protocol Proposals

   This section lists the existing protocols submitted to the SPEECHSC
   WG for consideration by the deadline.

   1. BEEP
   2. SIP
   3. RTSP
   4. MRCP (initial submission)
   5. Web Services

   Each protocol section contains a review of the protocol's level of
   compliance with each of the SPEECHSC requirements [1], as derived
   from the proposed protocol documents. The following key identifies
   the level of compliance of each individual protocol:

   T  = Total Compliance. Meets the requirement fully.
   P  = Partial Compliance. Meets some aspect of the requirement.
   P+ = Compliance possible. Could meet the requirement with "natural"
        evolution of the protocol.
   F  = Failed Compliance. Does not meet the requirement.

3. Protocol Evaluation Summaries

   To make this document complete on its own, this section contains a
   summary of each of the protocol comparisons. The comparisons should
   explicitly reference the requirements as defined in [1].

   TBD: This section will be completed once agreement is reached on the
   specific protocol analysis sections.

3.1. Protocol X

3.1.1. Protocol X Architectural Model as compared to the SPEECHSC
       Architectural Framework

   This section would contain a description of the key aspects of the
   architectural model of protocol X and a comparison of this model
   against the SPEECHSC framework, highlighting the pros and cons of
   the applicability of protocol X to SPEECHSC.

3.1.2. SPEECHSC requirements met by protocol X

   This section contains a description of the SPEECHSC requirements met
   by protocol X.

3.1.3. SPEECHSC requirements partially met by protocol X

   This section contains a description of the SPEECHSC requirements
   partially met by protocol X. [Note: ideally, there would be few or
   NONE of these, provided the SPEECHSC requirements have been defined
   at the appropriate level.]

3.1.4. SPEECHSC requirements that can be met by protocol X with
       "natural" evolutions

   This section contains a description of the SPEECHSC requirements not
   currently met by protocol X but that could be met assuming some
   evolutions. [Note: the definition of "natural" evolution will no
   doubt be for discussion.]
3.1.5. SPEECHSC requirements NOT met by protocol X

   This section contains a description of the SPEECHSC requirements
   that are NOT met by protocol X. [Note: an important aspect to
   highlight in the individual protocol comparisons would be the work
   required to extend protocol X to be able to support this
   requirement.]

4. Protocol "BEEP" Compliance Evaluation (Jerry Carter)

4.1. General notes

   The BEEP protocol provides a general framework for establishing
   connections, defining new channels, negotiating security, and
   performing user authentication. Protocols built on BEEP must define
   a profile detailing how connections are established and must define
   a set of messages to be delivered using BEEP. The protocol is
   peer-to-peer, although client-server style requests could easily be
   handled.

   The following sub-sections compare each individual requirement
   against the protocol.

4.2. Analysis of General Requirements

4.2.1. Reuse existing protocols [5.1]

   T: BEEP is a published protocol, RFC 3080
   (http://www.ietf.org/rfc/rfc3080.txt).

4.2.2. Maintain Existing Protocol Integrity [5.2]

   P: BEEP assumes that protocols, such as SpeechSC, will add messages.
   Supporting multiple clients using TCP may require some effort.

4.2.3. Avoid Duplicating Existing Protocols [5.3]

   T: Building SpeechSC over BEEP would allow the specification to
   focus on managing the ASR, media server, and SI/SV resources and the
   possible interactions between them. The operations for establishing
   connections and defining new channels would be handled by BEEP.

4.2.4. Protocol efficiency [5.4]

   P+: BEEP imposes a small overhead (roughly 40 bytes per message). It
   provides a mechanism for supporting multiple communication channels
   over a single port. If grouping of requests is desired, this would
   need to be handled by grouping the SpeechSC messages.

4.2.5. Explicit invocation of services [5.5]

   T: Though it is primarily a peer-to-peer protocol, BEEP may act as a
   traditional client-server protocol.

4.2.6. Server Location and Load Balancing [5.6]

   P+: This functionality is not provided by BEEP. It would need to be
   added as an extension.

4.2.7. Simultaneous services [5.7]

   T: Multiple channels providing different services are possible.
   Each service is simply a message type which is passed to the server
   using BEEP.

4.2.8. Multiple media sessions [5.8]

   F: BEEP assumes a 1:1 connection model using TCP/IP.

4.3. Analysis of TTS requirements

4.3.1. Requesting Text Playback [6.1]

   P+: A media playback resource would be defined as a new BEEP
   profile. To initiate a media playback session, the client would need
   to ask for this profile as part of the message which opens a new
   channel.

4.3.2. Text Formats [6.2]

   T: BEEP allows arbitrary data blocks of octets to be passed.
   Messages must specify a length and must end with a specific sequence
   of characters. Plain text, SSML, URIs, or other content could be
   passed within BEEP messages.

4.3.3. Plain text [6.2.1]

   T: See 6.2.

4.3.4. SSML [6.2.2]

   T: See 6.2.

4.3.5. Text in Control Channel [6.2.3]

   T: See 6.2.

4.3.6. Document Type Indication [6.2.4]

   T: BEEP provides for 'Content-Type' specification in the message
   header. This may be used to determine the media type of the message.

4.3.7. Control Channel [6.3]

   T: A second channel might be created to pass control information.
   The BEEP protocol dictates that the requests for each channel must
   be handled in the order in which they are received. Separate
   channels operate independently and may service requests with
   different priorities (though the BEEP specification does not provide
   a mechanism for assigning priorities).

4.3.8. Playback Controls [6.4]

   T: See 6.3. These messages would presumably be transmitted over the
   control channel.

4.3.9. Session Parameters [6.5]

   T: See 6.2. Session parameters are presumably content delivered
   within a message.

4.3.10. Speech Markers [6.6]

   T: Speech markers might be delivered on a separate channel (as in
   6.3). Alternatively, since BEEP is a peer-to-peer protocol, events
   might be sent on the channel used to receive requests.

4.4. Analysis of ASR requirements

4.4.1. Requesting Automatic Speech Recognition [7.1]

   P+: An ASR resource would be defined as a new BEEP profile. To
   initiate an ASR session, the client would need to ask for this
   profile as part of the message which opens a new channel.

4.4.2. XML [7.2]

   T: See response for 6.2.

4.4.3. Grammar Specification [7.3.1]

   T: See response for 6.2.

4.4.4. Explicit Indication of Grammar Format [7.3.2]

   T: See response for 6.2.4.

4.4.5. Grammar Sharing [7.3.3]

   P+: Channels in BEEP are independent. If content is to be shared,
   this would need to be added, preferably using messages to define
   what content is to be shared and a corresponding identity. The
   retrieval and actual sharing would need to be implemented by the
   server.

4.4.6. Session Parameters [7.4]

   T: See response for 6.2. Session parameters are presumably content
   delivered within a message.

4.4.7. Input Capture [7.5]

   P+: BEEP does not provide a mechanism for storing requests. Storing
   of input could be added by the server.

4.5. Analysis of Speaker Identification and Verification Requirements

4.5.1. Requesting SI/SV [8.1]

   P+: An SI/SV resource would be defined as a new BEEP profile. To
   initiate an SI/SV session, the client would need to ask for this
   profile as part of the message which opens a new channel.

4.5.2. Identifiers for SI/SV [8.2]

   T: This could be handled in the 'Content-Type' or by other means.
   Supported formats might be negotiated in a fashion similar to
   security negotiation.

4.5.3. State for multiple utterances [8.3]

   P+: BEEP has a concept of sequence. State would need to be added by
   the client. This should be easy if a channel is used for multiple
   utterances.

4.5.4. Input Capture [8.4]

   P+: BEEP does not provide a mechanism for storing requests. Storing
   of input could be added by the server.

4.5.5. SI/SV functional extensibility [8.5]

   T: Extensions could be added as additional messages.

4.6. Analysis of Duplexing and Parallel Operation Requirements

4.6.1. Duplexing and Parallel Operation Requirements [9]

   P+: Parallel operations may be obtained using multiple channels. A
   message on one channel could potentially interrupt activity
   happening on the second. BEEP is very flexible, allowing the server
   to implement whatever behavior is desired.

4.6.2. Full Duplex operation [9.1.1]

   T: BEEP is a peer-to-peer protocol allowing full duplex
   communication on a single channel or parallel communication on
   multiple channels.

4.6.3. Multiple services in parallel [9.1.2]

   P+: Multiple services may be run on separate channels. Merging or
   T-ing of RTP must be implemented by the server.

4.6.4. Combination of services

   TBD

4.7. Analysis of additional considerations (non-normative)

   TBD

4.8. Analysis of Security considerations

4.8.1. Security Considerations [11]

   P+: BEEP offers a mechanism for managing security and user
   authentication. SpeechSC requires managing multiple data streams,
   and some form of unified authentication/security might be a goal. If
   so, BEEP security should be revisited with this in mind.

5. Protocol "SIP" Compliance Evaluation (Rajiv Dharmadhikari)

5.1. Introduction

   SIP is a protocol for initiating, modifying, and terminating
   multimedia sessions. The protocol is an IETF standard and its
   specification can be found in [2]. The following sections provide a
   general statement on the applicability of SIP as the control
   protocol for SPEECHSC.
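   To make the style of invocation discussed in this section concrete,
   the following sketch shows how a SPEECHSC client might address a
   TTS resource through a service-specific SIP URI. The message is
   illustrative only: the URI convention (sip:tts@...), host names,
   tags, and header values are assumptions made for this example and
   are not defined by SIP or by SPEECHSC.

      INVITE sip:tts@speech-server.example.com SIP/2.0
      Via: SIP/2.0/TCP client.example.com;branch=z9hG4bK74bf9
      Max-Forwards: 70
      From: <sip:client@example.com>;tag=9fxced76sl
      To: <sip:tts@speech-server.example.com>
      Call-ID: 3848276298@client.example.com
      CSeq: 1 INVITE
      Contact: <sip:client@example.com;transport=tcp>
      Content-Type: application/sdp
      Content-Length: ...

   The SDP body (elided here) would describe the RTP stream on which
   the synthesized audio is to be delivered, as discussed under the
   multiple media sessions requirement below.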
5.1.1. SIP General Applicability

   SIP is a mature, well-understood, and frequently used session
   establishment protocol. It has gone through multiple revisions in
   the IETF standards process. A number of commercial and public domain
   implementations of SIP are available. Because of its close
   resemblance to HTTP, and because it is a text-based protocol, a
   large number of SIP application developers are available.

5.1.2. SIP Use in VoIP environments

   SIP is already being used to establish and redirect RTP streams
   between various endpoints. SPEECHSC requires a protocol for
   controlling ASR, TTS, and SV resources. When these resources are
   deployed in a VoIP network that requires them to process media
   carried in RTP, SIP is already used in many deployments. Rather than
   inventing a new control protocol and introducing the operational
   aspects of a new protocol, SIP can be reused for controlling
   SPEECHSC resources.

5.2. Analysis of General Requirements

5.2.1. Reuse existing protocols [5.1]

   T: SIP is an existing, widely used, and mature protocol defined in
   [2].

5.2.2. Maintain Existing Protocol Integrity [5.2]

   T: Existing SIP methods and header fields will not be changed when
   SIP is used to control SPEECHSC resources. If extensions are
   required, SIP allows carriage of a custom payload in the body. This
   payload is understood only by the UAs and does not impact protocol
   integrity.

5.2.3. Avoid Duplicating Existing Protocols [5.3]

   T: Many of the requirements for SPEECHSC operation can easily be
   satisfied by SIP, e.g. establishing RTP streams or redirecting them.
   Without SIP, a new SPEECHSC protocol would have to duplicate much of
   this session management functionality.

5.2.4. Protocol efficiency [5.4]

   T: SIP is a lightweight protocol when run over TCP or UDP. It
   leverages the efficiency of the TCP and UDP protocols, which have
   been in use for over 20 years.

5.2.5. Explicit invocation of services [5.5]

   T: The SIP URI mechanism allows invocation of different services.

5.2.6. Server Location and Load Balancing [5.6]

   P+: SIP employs standard DNS name resolution for locating resources.
   SIP itself does not provide load balancing features.
   Application-level load balancers can be used to load balance SIP
   requests.

5.2.7. Simultaneous services [5.7]

   T: SIP allows simultaneous invocation of different services. SIP
   allows forking or splitting the same media stream to different
   endpoints as defined in [2].

5.2.8. Multiple media sessions [5.8]

   T: SIP uses SDP to describe RTP stream characteristics. This allows
   control of the direction of an RTP stream, such as bi-directional or
   uni-directional. SIP allows a UA to establish sessions with multiple
   UAs for the same session.

5.3. Analysis of TTS requirements

5.3.1. Requesting Text Playback [6.1]

   P+: SIP does not provide primitives for text playback. A new SIP URI
   can be defined to invoke TTS features.

5.3.2. Text Formats [6.2]

   T: The SIP message body can carry TTS text or a reference to the TTS
   text.

5.3.3. Plain text [6.2.1]

   T: See 5.3.2.

5.3.4. SSML [6.2.2]

   T: See 5.3.2.

5.3.5. Text in Control Channel [6.2.3]

   T: See 5.3.2.

5.3.6. Document Type Indication [6.2.4]

   T: See 5.3.2.

5.3.7. Control Channel [6.3]

   T: A separate SIP URI can be defined for the control channel.

5.3.8. Playback Controls [6.4]

   P+: See 5.3.1.

5.3.9. Session Parameters [6.5]

   P+: Session parameters can be passed in the body part of a SIP
   message or can be modified using mid-call SIP methods.

5.3.10. Speech Markers [6.6]

   T: Speech markers can be delivered using a separate control channel.
   See 5.3.7.

5.4. Analysis of ASR requirements

5.4.1. Requesting Automatic Speech Recognition [7.1]

   T: A new SIP URI can be defined to address ASR services and request
   ASR resources.

5.4.2. XML [7.2]

   T: The SIP message body can carry XML.

5.4.3. Grammar Specification [7.3.1]

   P+: With proper definition, the SIP message body can carry XML
   grammars or references to XML grammars.

5.4.4. Explicit Indication of Grammar Format [7.3.2]

   P+: See 5.4.3.

5.4.5. Grammar Sharing [7.3.3]

   TBD

5.4.6. Session Parameters [7.4]

   T: Session parameters can be passed in the body part of a SIP
   message or can be modified using mid-call SIP methods.

5.4.7. Input Capture [7.5]

   T: The SIP SUBSCRIBE/NOTIFY mechanism can be used to initiate input
   capture and receive the result.

5.5. Analysis of Speaker Identification and Verification Requirements

5.5.1. Requesting SI/SV [8.1]

   T: A new SIP URI can be defined for SI/SV.

5.5.2. Identifiers for SI/SV [8.2]

   T: See 5.5.1.

5.5.3. State for multiple utterances [8.3]

   TBD

5.5.4. Input Capture [8.4]

   T: The SIP SUBSCRIBE/NOTIFY mechanism can be used to initiate input
   capture and receive the result.

5.5.5. SI/SV functional extensibility [8.5]

   T: SIP can easily be extended by adding new methods and header
   fields without impacting the operation of existing methods. The new
   methods and headers will only be understood by the SPEECHSC
   entities. If this is not acceptable, the SIP message body of an
   existing primitive can be used to define new semantics understood
   only by SPEECHSC entities.

5.6. Analysis of Duplexing and Parallel Operation Requirements

5.6.1. Duplexing and Parallel Operation Requirements [9]

   T: A SPEECHSC resource is a SIP UA that can handle session requests
   from different UAs.

5.6.2. Full Duplex operation [9.1.1]

   T: Each SIP UA consists of a UAC and a UAS. This allows for full
   duplex operation.

5.6.3. Multiple services in parallel [9.1.2]

   T: SIP allows simultaneous invocation of different services. SIP
   allows forking or splitting the same media stream to different
   endpoints as defined in [2].

5.6.4. Combination of services

   T: See 5.6.3. A SIP UA can invoke different services and combine the
   results.

5.7. Analysis of additional considerations (non-normative)

   TBD

5.8. Analysis of Security considerations

5.8.1. Security Considerations [11]

   T: SIP employs various authentication schemes that are widely used
   in IP-based protocols.

5.9. Other Criteria

   The following criteria were also defined by the evaluator of SIP.

5.9.1. Ability to establish a session between SPEECHSC client and
       SPEECHSC resource

   T: A SIP User Agent can establish a session with another SIP User
   Agent.

5.9.2. Ability to terminate a session by either SPEECHSC client or
       SPEECHSC resource

   T: A SIP User Agent can terminate a session with another SIP User
   Agent.

5.9.3. Support reliable sequencing and delivery between SPEECHSC
       client and SPEECHSC resource

   P: SIP can be run over TCP or UDP. When run over TCP, this
   requirement is easily satisfied. When run over UDP, the SIP User
   Agent is required to implement logic to ensure reliable sequencing
   and delivery.

5.9.4. Ability for SPEECHSC client to coordinate SPEECHSC resources on
       different machines for a single session

   T: A SPEECHSC client can use SIP to establish SIP sessions with
   different machines.

5.9.5. Ability for SPEECHSC resource to handle multiple SPEECHSC
       clients

   T: A SPEECHSC resource is a SIP UA that can handle session requests
   from different UAs.

5.9.6. The SPEECHSC resource should be able to generate asynchronous
       events or unsolicited messages

   T: SIP allows asynchronous events or unsolicited messages to be
   generated using the SUBSCRIBE/NOTIFY mechanism.

5.9.7. The SPEECHSC client and resource should have the ability to
       authenticate each other

   T: SIP employs various authentication schemes that are widely used
   in IP-based protocols.

5.9.8. Ability to determine success or failure from both the SPEECHSC
       client and SPEECHSC resource side

   T: The protocol has the following response codes: 200 for success;
   3xx, 4xx, and 5xx for failure.

5.9.9. Support for versioning between SPEECHSC client and SPEECHSC
       resource

   P+: This will require an additional header or element in the body of
   the SIP message for versioning. The current version field is
   intended for the SIP protocol version.

6. Protocol "RTSP" Compliance Evaluation (Brian Wyld)

6.1. General Introduction

   RTSP is an existing protocol, oriented towards audio playback and
   recording. As such, it has support for RTP session control, with SDP
   used for session description, and a message set allowing operation
   as a player/recorder with audio "VCR" controls.

   To use RTSP for SPEECHSC, two options present themselves:

   - New "operations" (like PLAY) would be defined with their own
     specific state machines after a session is created. Note that this
     could be seen as the merge of the MRCP proposed protocol
     operations into the RTSP protocol directly.

   - The existing operations could be 'extended' to describe SPEECHSC
     operations, e.g. PLAY would be applicable both for playback of an
     audio file and for synthesis of a TTS text.

   In the second case, new headers would be defined to allow definition
   of the text or of its location. The current PLAY state machine is
   exactly as required for TTS operation. Although by analogy RECORD
   could initiate an ASR session, with headers giving the grammar
   source or references, its state machine does not seem as compatible,
   still less so for SV/SI. Defining new messages/state machines would
   be better for these operations.

   The following analysis assumes that TTS operation is merged with the
   current PLAY semantics, while ASR/SV/SI create new messages and
   state operations.

   The following sub-sections compare each individual requirement
   against the protocol.

6.2. Analysis of General Requirements

6.2.1. Reuse existing protocols [5.1]

   T: RTSP/RTP/SDP would be reused.

6.2.2. Maintain Existing Protocol Integrity [5.2]

   T: The extensions to RTSP to allow SPEECHSC use would be in the
   spirit of the protocol, and would not break existing servers or
   clients.

6.2.3. Avoid Duplicating Existing Protocols [5.3]

   T: Using RTSP would not recreate it.

6.2.4. Protocol efficiency [5.4]

   T: RTSP is a text-based protocol, but is relatively succinct as
   messages are specific to their operation.

6.2.5. Explicit invocation of services [5.5]

   T: RTSP service invocation is sufficient.

6.2.6. Server Location and Load Balancing [5.6]

   F: RTSP does not address this topic; however, it can be used with
   other IETF protocols such as SLP or UDDI to do so.

6.2.7. Simultaneous services [5.7]

   T: RTSP allows simultaneous invocation of services on the same or
   different control channels.

6.2.8. Multiple media sessions [5.8]

   T: RTSP allows multiple media sessions.

6.3. Analysis of TTS requirements

6.3.1. Requesting Text Playback [6.1]

   P+: By extension of the RTSP PLAY message semantics.

6.3.2. Text Formats [6.2]

   P+: Text can be defined as all text types.

6.3.3. Plain text [6.2.1]

   T: Plain text may be carried directly in the message payload.

6.3.4. SSML [6.2.2]

   T: Text may be in any format.

6.3.5. Text in Control Channel [6.2.3]

   T: Text may be attached to the control messages.

6.3.6. Document Type Indication [6.2.4]

   T: Via the Content-Type header.

6.3.7. Control Channel [6.3]

   T: RTSP sessions may use a private or shared TCP connection.

6.3.8. Playback Controls [6.4]

   T: RTSP defines playback control messages and a state machine.

6.3.9. Session Parameters [6.5]

   T: RTSP defines operations for session parameter control.

6.3.10. Speech Markers [6.6]

   P+: Markers may be inserted in the text, but providing the required
   asynchronous events when a marker is synthesized will require the
   use of specific ANNOUNCE-type messages for server-to-client
   notification.

6.4. Analysis of ASR requirements

6.4.1. Requesting Automatic Speech Recognition [7.1]

   P+: By addition of a message and state machine.

6.4.2. XML [7.2]

   P+: Text can be defined as all text types.

6.4.3. Grammar Specification [7.3.1]

   P+: Text can be defined as all text types.

6.4.4. Explicit Indication of Grammar Format [7.3.2]

   T: Via the Content-Type headers.

6.4.5. Grammar Sharing [7.3.3]

   F: TBD

6.4.6. Session Parameters [7.4]

   T: RTSP defines operations for session parameter control.

6.4.7. Input Capture [7.5]

   P+: By addition of a header to the initiation message.

6.5. Analysis of Speaker Identification and Verification Requirements

6.5.1. Requesting SI/SV [8.1]

   P+: By addition of a message and state machine.

6.5.2. Identifiers for SI/SV [8.2]

   P+: By addition of specific headers.

6.5.3. State for multiple utterances [8.3]

   TBD

6.5.4. Input Capture [8.4]

   P+: By addition of a header to the initiation message.

6.5.5. SI/SV functional extensibility [8.5]

   TBD

6.6. Analysis of Duplexing and Parallel Operation Requirements

6.6.1. Duplexing and Parallel Operation Requirements [9]

   T: RTSP allows session setup that should fulfill these requirements.

6.6.2. Full Duplex operation [9.1.1]

   T: RTSP can create a full duplex session.

6.6.3. Multiple services in parallel [9.1.2]

   T: RTSP can request multiple operations of the same type on the same
   session.

6.6.4. Combination of services

   T: RTSP can request multiple operations of different types on the
   same session.

6.7. Analysis of additional considerations (non-normative)

   TBD

6.8. Analysis of Security considerations

6.8.1. Security Considerations [11]

   F: RTSP provides no specific security functionality at all, but
   depends on other IETF security protocols (as it uses TCP) to
   pre-validate and protect the sessions.

7. Protocol "MRCP" Compliance Evaluation (Sarvi Shanmugham)

7.1. General

7.1.1. MRCP Framework and General Applicability

   The overall MRCP framework, the components involved, and their
   distribution and relationship to each other meet the framework
   specified by SPEECHSC. The primary advantage of MRCP is that it is a
   text-based protocol designed to meet most of the requirements of
   SPEECHSC pertaining to speech recognition and text-to-speech. Though
   Speaker Recognition (SR) and Speaker Verification (SV) are not
   supported in its current form, MRCP was explicitly designed to be
   extensible for such needs. The core MRCP definition only deals with
   the control of the ASR or TTS resource and the commands and
   responses needed to achieve it.

   There are multiple interoperable implementations of MRCP, and hence
   it is a proven technology. It leverages existing W3C XML standards
   for the exchange of data between the client and the server resource.
   For example, it uses the W3C XML grammar format (GRXML) along with
   W3C semantic attachments and the Natural Language Semantics Markup
   Language to exchange data with the speech recognition resource. The
   W3C Speech Synthesis Markup Language is used when dealing with
   text-to-speech engines.

   MRCP was designed to work as a tunneled protocol, over RTSP or SIP.
   Hence it depends on the carrier protocol to establish a control path
   and a media path between the client and the ASR or TTS server
   resource, and it gets most of the security and media pipe management
   operations for free. Once these are established, MRCP commands and
   responses are tunneled over the carrier protocol, controlling the
   ASR or TTS resource on the server.

7.1.2. MRCP can be evolved

   Though MRCP directly meets many of the needs of SPEECHSC, the fact
   that it is a tunneled protocol prevents its independent operation.
   Furthermore, the tunneled design is also less efficient.
But these can be addressed and the core MRCP messages can be evolved to either become standalone protocol by itself or extensions to an existing protocol such as SIP or RTSP. To make this a standalone protocol and allow MRCP to operate by itself, new session and media management messages need to be defined to allow it to operate independently. To evolve MRCP as extensions to SIP or RTSP would also be relatively simple since it is also a text based protocol with message format and headers very similar to them. In this protocol evaluation, the compliance evaluates MRCP from the perspective of evolution in one of these forms. The following sub-sections compare each individual requirement against the protocol. 7.2. Analysis of General Requirements 7.2.1. Reuse existing protocols [5.1] T: If RTSP or SIP is extended with MRCP, it will be reusing an existing protocol. If MRCP was extended to become a standalone protocol, it wouldnÆt. 7.2.2. Maintain Existing Protocol Integrity [5.2] T: If RTSP or SIP is extended with MRCP, it can be done in a way that maintains the integrity of the protocol. 7.2.3. Avoid Duplicating Existing Protocols [5.3] T: If RTSP or SIP is extended with MRCP, it would be meet this requirement. If MRCP was extended to become a standalone protocol, it wouldnÆt. 7.2.4. Protocol efficiency [5.4] P: MRCP as it exists is sub-optimal due to its tunneling. On the other hand, if it were extended as standalone protocol or with RTSP or SIP, it would meet this requirement. 7.2.5. Explicit invocation of services [5.5] T: Wyld Expires û April 2003 [Page 16] SPEECHSC Protocol Evaluation Template November 2002 7.2.6. Server Location and Load Balancing [5.6] T:If MRCP were extended as a standalone protocol it could designed to meet this requirement. If MRCP was used to extend SIP or RTSP, it would automatically inherit their capability this area and hence would meet this requirement. 7.2.7. 
7.2.7. Simultaneous services [5.7]
T: MRCP as it exists already meets this requirement. It would continue to do so even if extended in either of the ways suggested here.

7.2.8. Multiple media sessions [5.8]
P+: MRCP today shares a single two-way media pipe between the ASR and TTS resources. Since it supports only these two resources, and they handle media in different directions, it does not currently meet this requirement completely. However, if SI or SV were added to MRCP, whether as it exists, as a standalone protocol, or as an extension to RTSP/SIP, it could be designed to meet this requirement.

7.3. Analysis of TTS requirements

7.3.1. Requesting Text Playback [6.1]
T: MRCP has the SPEAK method, with which the client requests the TTS resource to play back text as an audio stream.

7.3.2. Text Formats [6.2]
T: When the client requests the TTS resource to play back a text stream, it can provide the content in the following formats and through the following mechanisms:
1. Plain text.
2. The W3C XML-based Speech Synthesis Markup Language (SSML).
3. The content to be spoken can be provided by value, directly through the control path.
4. Passing the content by reference is also supported. This is achieved by placing an audio tag inside the SSML markup text; the referenced URL is then fetched and played on the RTP stream in sequence with the rest of the text, according to the SSML specification.
Whether the client sends plain text, SSML, or another format of speech text, the content is labeled with a MIME type. Hence the server knows what format the speech content is coded in and does not have to deduce it from the content.

7.3.3. Plain text [6.2.1]
T: see above.

7.3.4. SSML [6.2.2]
T: see above.

7.3.5. Text in Control Channel [6.2.3]
T: see above.

7.3.6. Document Type Indication [6.2.4]
T: see above.
7.3.7. Control Channel [6.3]
T: In MRCP, the Reset-Audio-Channel header defined for the ASR resource allows the recognizer to re-initialize the audio characteristics it has learned so far. This allows a recognizer resource to be used for multiple recognition sessions. It can be used for short single-utterance recognitions as well, by applying the Reset-Audio-Channel header to every recognition. The performance may not be as good in that case, due to the lack of learned line characteristics, but this is a recognizer issue.

7.3.8. Playback Controls [6.4]
T: MRCP supports the CONTROL method, whose Jump-Target header can be used to jump in time or to an exact or relative location. Jumping by paragraphs, sentences, or words, and to specific markers embedded in the speech content, is supported. The CONTROL method can also be used with the Voice and Prosody parameters, derived from SSML, to adjust the speed of the speech or to increase or decrease the volume. MRCP also supports the PAUSE and RESUME methods, which pause or resume a current SPEAK request.

7.3.9. Session Parameters [6.5]
T: As mentioned in the previous section, MRCP supports voice and prosody parameters that are directly derived from the W3C SSML specification. These headers can be sent using the SET-PARAMS method and applied as defaults for the entire session. They can also be supplied in SPEAK requests, to apply to that usage only, or in the CONTROL message, to change the parameters of an active SPEAK request.

7.3.10. Speech Markers [6.6]
T: Specifying speech markers in the content is supported through SSML. The CONTROL message can then be used to jump to specific marker points in the text. Also, when the TTS resource reaches specific markers in the text, the server generates the SPEECH-MARKER event to the client.

7.4. Analysis of ASR requirements
7.4.1. Requesting Automatic Speech Recognition [7.1]
T: The client uses the RECOGNIZE method in MRCP to request the recognition resource to process the audio stream in the pipe. The RECOGNIZE method also specifies the parameters and grammars that the recognizer should match against.

7.4.2. XML [7.2]
T: As with the TTS resource, the MRCP ASR resource uses XML data to exchange information between the client and the recognition resource. It supports the W3C GRXML format for passing grammars from the client to the server. When the server has finished recognizing, it uses the W3C Natural Language Semantic Markup Language (NLSML) to pass the results back to the client. Other grammar formats are supported as well, provided the server allows them; this is possible because the data is packaged using MIME types, so the format is always specified.

7.4.3. Grammar Specification [7.3.1]
P+: MRCP supports specifying the grammar both by value and by reference. The RECOGNIZE method can carry grammar content and/or a URI referring to the grammar content. Since MRCP supports referring to a grammar, the referenced grammar could be located on the server itself. With respect to the sharing of grammars, the grammars defined and compiled through the DEFINE-GRAMMAR primitive are not sharable across sessions on the same server. This needs to be addressed to meet this set of requirements in full.

7.4.4. Explicit Indication of Grammar Format [7.3.2]
P+: see above.

7.4.5. Grammar Sharing [7.3.3]
TBD

7.4.6. Session Parameters [7.4]
T: This requirement as defined is already fully met, since MRCP is the standard referenced for compliance.

7.4.7. Input Capture [7.5]
T: This is achieved by setting the Waveform-URL header in the RECOGNIZE method. This tells the server to record the audio of the recognition; the server returns a URI to the client in the completion event, which can be used to retrieve or play back the audio.
7.5. Analysis of Speaker Identification and Verification Requirements

7.5.1. Requesting SI/SV [8.1]
F: not supported.

7.5.2. Identifiers for SI/SV [8.2]
F: not supported.

7.5.3. State for multiple utterances [8.3]
F: not supported.

7.5.4. Input Capture [8.4]
F: not supported.

7.5.5. SI/SV functional extensibility [8.5]
F: not supported.

7.6. Analysis of Duplexing and Parallel Operation Requirements

7.6.1. Duplexing and Parallel Operation Requirements [9]

7.6.2. Full Duplex operation [9.1.1]
T: MRCP supports having the ASR and TTS engines on the same server. In this mode they use a single full-duplex audio stream both to recognize audio and to generate a TTS voice stream.

7.6.3. Multiple services in parallel [9.1.2]
F: MRCP does not support SI or SV.

7.6.4. Combination of services
F: MRCP assumes the recognizer to be a combined unit and does not support separating out sub-components such as semantic analyzers.

7.7. Analysis of additional considerations (non-normative)
T: Since MRCP works with SIP or RTSP, it uses SDP for session description and can work with other transport schemes such as ATM. It uses standard codecs and can work with specialized codecs such as DSR.

7.8. Analysis of Security considerations

7.8.1. Security Considerations [11]
T: Since MRCP works with SIP or RTSP, it inherits the security considerations of SIP or RTSP.

8. Protocol "Web Services" Compliance Evaluation (Stephane H. Maes)

8.1. General Notes:

Speech engines (speech recognition, speaker recognition, speech synthesis, recorders and playback, NL parsers, and any other speech processing engines, e.g. speech detection, barge-in detection, etc.), as well as audio sub-systems (audio input and output sub-systems), can be considered as web services that can be described and asynchronously programmed via WSDL (on top of SOAP), combined in a flow described via WSFL, discovered via UDDI, and asynchronously controlled via SOAP, which also enables asynchronous exchanges between the engines. This solution has the advantage of providing flexibility, scalability, and extensibility while reusing an existing framework that fits the evolution of the web: web services and XML protocols [WS1].

According to the web services framework, speech engines (audio sub-systems, engines, speech processors) can be defined as web services characterized by an interface consisting of some of the following ports:
- "control in" port(s): sets the engine context, i.e. all the settings required for a speech engine to run. It may include addresses where to get or send the streamed audio or results.
- "control out" port(s): produces the non-audio engine output (i.e. results and events). It may also involve some session control exchanges.
- "audio in" port(s): receives streamed input data.
- "audio out" port(s): produces streamed output data.

Audio sub-systems can also be treated as web services that can produce streamed data or play incoming streamed data, as specified by the control parameters.

The "control in" and "control out" messages can be out-of-band, or can be sent or received interleaved with the "audio in" or "audio out" data. This can be determined in the context (setup) of the web services.

Speech engines and audio sub-systems are pre-programmed as web services and composed into more advanced services. Once programmed by the application or controller, audio sub-systems and engines await an incoming event (an established audio session, etc.) to execute the speech processing that they have been programmed to do, and send the results as programmed.
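As a non-normative sketch of this ports model, the following fragment builds a minimal SOAP-style "control in" message that pre-programs a hypothetical ASR engine with a grammar and a destination for its results. The control namespace, element names, and URIs are invented for illustration; no such syntax has been defined for SPEECHSC.

```python
# Non-normative sketch: a SOAP-style "control in" message pre-programming
# an ASR engine exposed as a web service. The control namespace, element
# names, and URIs are invented for illustration only.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
CTRL_NS = "urn:example:speechsc-control"   # hypothetical namespace

envelope = ET.Element("{%s}Envelope" % SOAP_NS)
body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)

# Set the engine context: grammar to match and where to send results,
# so the engine can run unattended once the audio session is established.
ctl = ET.SubElement(body, "{%s}SetEngineContext" % CTRL_NS)
ET.SubElement(ctl, "{%s}Grammar" % CTRL_NS).set(
    "uri", "http://example.com/grammars/date.grxml")
ET.SubElement(ctl, "{%s}ResultSink" % CTRL_NS).set(
    "uri", "http://example.com/nl-parser")

message = ET.tostring(envelope, encoding="unicode")
```

The point of the sketch is the asynchronous pre-programming: the message is delivered once on the "control in" port, and the engine then waits for audio and routes its results without further round trips.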
Speech engines as web services are typically programmed to handle a particular speech processing task completely, including the handling of possible errors. For example, a speech engine may be programmed to perform recognition of the next incoming utterance with a particular grammar, to send the result to an NL parser, and to contact a particular error recovery process if particular errors occur.

The following sub-sections compare each individual requirement against the protocol.

8.2. Analysis of General Requirements

8.2.1. Reuse existing protocols [5.1]
T: Web services are a class of protocols (a framework) widely studied and developed across numerous standards bodies such as W3C, OASIS, WS-I, Liberty, and Parlay, and adapted to the deployment issues of numerous environments at the IETF, OMA, 3GPP, 3GPP2, JCP, etc. As an entry point, we recommend consulting the work at W3C [WS1].

8.2.2. Maintain Existing Protocol Integrity [5.2]
T: Web services form an XML-based framework that is by definition extensible to support the appropriate syntax and semantics. Web services are bound to underlying transport protocols; numerous such bindings have been specified, and others are in development. By handling SPEECHSC at the level of the web services framework, integrity is maintained for:
- the underlying transport protocols to which the web services are bound (e.g. SOAP);
- the web services framework itself.
This does not prevent introducing bindings to new protocols if needed. For example, binding to SIP or BEEP could be advantageous for mobile deployments.

8.2.3. Avoid Duplicating Existing Protocols [5.3]
T: By definition, the web services framework can be specified to remote-control any web service. The specified syntax can be limited to avoid duplicating remote control functionality offered by other protocols.
At the same time, the extensibility inherent in the framework guarantees that it is possible to specify (as a standard) or define (application-specific) remote control for other entities beyond the current scope of SPEECHSC. In that context, and with a view to unifying the remote control framework exposed to an application developer or a system integrator, it may be of interest to provide remote control syntax for special entities such as prompt players.

8.2.4. Protocol efficiency [5.4]
P+ to P: Web services are by definition more verbose protocols; hence, at this stage, this does not qualify for a T mark. However, work is in progress (e.g. at OMA and JCP) to optimize the exchanges to handle:
- clients with limited resources;
- constrained bandwidth.
These efforts rely on protocol compression and optimization, caching, and gateways. As such, the protocols qualify as P+. In addition, based on the qualification of efficiency provided in [WS8], the web services framework proposed for SPEECHSC and described in [WS1] relies on known efficient techniques:
- asynchronous pre-programming of the engines as web services, to reduce exchanges and avoid race conditions;
- the possibility of piggybacking on response messages if transported over optimized protocols such as SIP or BEEP;
- state caching in the engines, which are considered stand-alone, pre-packaged, pre-programmed engines;
- etc.

8.2.5. Explicit invocation of services [5.5]
T: Web services are typically used in a client-server environment; solutions also exist for peer-to-peer (service-to-service) use. Web services have been designed to support clients and servers at least one of which is operating directly on behalf of the user requesting the service. In addition, ongoing work at OMA and JCP addresses some of these issues in mobile environments with the introduction of possible web service gateways.
8.2.6. Server Location and Load Balancing [5.6]
T: Web services are widely developed for e-business applications. Numerous tools and mechanisms have been provided for service discovery and advertisement. In addition, numerous offerings provide routing and load balancing capabilities as part of the web application server used to deploy the web service. Note that web services do not specify server location or load balancing, but they are deployed on systems that provide such functionality. As web services are expected to be widely used in the future and central to most e-business offerings, such tools can be expected to become even more pervasive and efficient.

8.2.7. Simultaneous services [5.7]
Web services allow control (interface) and composition of web services at will (e.g. via WSFL).

8.2.8. Multiple media sessions [5.8]
T: The proposed framework does not presuppose how many ports or streams are associated with an engine. Different inbound and outbound streams can be used at will.

8.3. Analysis of TTS requirements

8.3.1. Requesting Text Playback [6.1]
T: (supported; the syntax is to be defined, which is consistent with the web services framework) As described, TTS engines can be pre-programmed as web services to perform TTS on incoming text. This is simply a matter of agreeing on the control syntax to do so. The text to play back can be part of the control instructions transmitted in SOAP to the TTS engine.

8.3.2. Text Formats [6.2]
T: The exchanged format for text can be any MIME type, including plain text.

8.3.3. Plain text [6.2.1]
T: The exchanged format for text can be any MIME type, including plain text.

8.3.4. SSML [6.2.2]
T: The exchanged format for text can be any MIME type, including XML and hence SSML.

8.3.5. Text in Control Channel [6.2.3]
TBD

8.3.6. Document Type Indication [6.2.4]
T: SOAP and the web services framework built on SOAP rely on XML and MIME types to identify media types.
This is at the core of data exchange in SOAP.

8.3.7. Control Channel [6.3]
T: As proposed above, SOAP [WS2] and WSDL [WS3] support the remote control of the web services (engines or media processing entities).

8.3.8. Playback Controls [6.4]
T: (supported; the syntax is to be defined, which is consistent with the web services framework) This is simply a matter of agreeing on the control syntax, as part of the control instructions transmitted in SOAP to the TTS engine.

8.3.9. Session Parameters [6.5]
T: Session parameters are presumably content delivered as part of the control instructions transmitted in SOAP to the TTS engine.

8.3.10. Speech Markers [6.6]
T: Speech markers are presumably content delivered as part of the control instructions transmitted in SOAP to the TTS engine. [Editor's note: how are the marker events returned to the client end?]

8.4. Analysis of ASR requirements

8.4.1. Requesting Automatic Speech Recognition [7.1]
T: (supported; the syntax is to be defined, which is consistent with the web services framework) As described, ASR engines can be pre-programmed as web services to perform speech recognition on incoming audio. This is simply a matter of agreeing on the control syntax to do so. The instructions and parameters (including data files such as grammars) can be part of the control instructions transmitted in SOAP to the ASR engine. Results can be part of the web service messaging as supported by the web services framework.

8.4.2. XML [7.2]
T: The exchanged format for messages can be any MIME type, including XML, and hence XML for controlling the ASR.

8.4.3. Grammar Specification [7.3.1]
T: Grammar specification can be part of the messages that control the ASR. This includes any MIME type: XML for passing grammars by value, other MIME formats including binary, and URIs for passing grammars by reference.
8.4.4. Explicit Indication of Grammar Format [7.3.2]
T: SOAP and the web services framework built on SOAP rely on XML and MIME types to identify media types. This is at the core of data exchange in SOAP.

8.4.5. Grammar Sharing [7.3.3]
T: The framework described supports pre-programming of the engines per utterance, per session, or in an unlimited manner. In this way grammar sharing can easily be achieved and controlled by an external controller, application, etc.

8.4.6. Session Parameters [7.4]
T: Session parameters are presumably content delivered as part of the control instructions transmitted in SOAP to the ASR engine.

8.4.7. Input Capture [7.5]
T: (supported; the syntax is to be defined, which is consistent with the web services framework) As described, ASR engines can be pre-programmed as web services to perform speech recognition on incoming audio. This is simply a matter of agreeing on the control syntax to do so. The instructions and parameters (including data files such as grammars) can be part of the control instructions transmitted in SOAP to the ASR engine. This can include the syntax and instructions to capture the audio.

8.5. Analysis of Speaker Identification and Verification Requirements

8.5.1. Requesting SI/SV [8.1]
T: (supported; the syntax is to be defined, which is consistent with the web services framework) As described, SI or SV engines can be pre-programmed as web services to perform speaker recognition on incoming audio. This is simply a matter of agreeing on the control syntax to do so. The instructions and parameters (including data files such as voice prints) can be part of the control instructions transmitted in SOAP to the SI or SV engine. Results can be part of the web service messaging as supported by the web services framework.

8.5.2. Identifiers for SI/SV [8.2]
T: This can be part of the control message.
8.5.3. State for multiple utterances [8.3]
T: This can be achieved by appropriately programming the SI or SV engine across multiple utterances. This is simply a matter of agreeing on the control syntax to do so. The framework described in section 1 supports spanning multiple utterances.

8.5.4. Input Capture [8.4]
T: (supported; the syntax is to be defined, which is consistent with the web services framework) As described, SI or SV engines can be pre-programmed as web services to perform speaker recognition on incoming audio. This is simply a matter of agreeing on the control syntax to do so. The instructions and parameters (including data files such as grammars) can be part of the control instructions transmitted in SOAP to the SI or SV engine. This can include the syntax and instructions to capture the audio.

8.5.5. SI/SV functional extensibility [8.5]
T: By definition, the web services framework and XML are extensible to new functionality and describe how extensibility is achieved.

8.6. Analysis of Duplexing and Parallel Operation Requirements

8.6.1. Duplexing and Parallel Operation Requirements [9]
T: As explained, web services allow control (interface) and composition of web services at will (e.g. via WSFL). Also, the framework does not presuppose how many ports or streams are associated with an engine. Different inbound and outbound streams can be used at will, in full duplex, or even between engines, as supported by WSFL [WS4] and WSXL [WS7].

8.6.2. Full Duplex operation [9.1.1]
T:

8.6.3. Multiple services in parallel [9.1.2]
T:

8.6.4. Combination of services
T: As explained, web services allow control (interface) and composition of web services at will (e.g. via WSFL) into complex parallel, serial, or coordinated combinations, as supported by WSFL [WS4] and WSXL [WS7].
8.7. Analysis of additional considerations (non-normative)

The proposed framework supports:
- the use of SDP to describe sessions and streams for the streamed channels;
- time stamps, which could be transmitted as part of the control messages at the web service level or in band (e.g. with a dynamic payload switch or within the payload);
- any encoding scheme. This is illustrated by the work on SRF (Speech Recognition Framework) driven at 3GPP, which supports conventional and DSR-optimized codecs and the possible exchange of speech meta-information, e.g. data that may be required to facilitate and enhance the server-side processing of the input speech and to facilitate dialog management in an automated voice service. These may include keypad events overriding spoken input, notification that the UE is in hands-free mode, and client-side collected information (speech/no-speech, barge-in);
- VCR controls, via SOAP over SIP or BEEP supporting the framework described in section 1;
- real-time messaging between engine and control (e.g. via SOAP or XML events). The framework also supports exchange between engines (same process; see also WSXL [WS7]).

Although non-normative, the web services framework described probably deserves marks of P+ to T.

8.8. Analysis of Security considerations

8.8.1. Security Considerations [11]
Web services are evolving to provide security, authentication, encryption, trust management, and privacy. Details can be found, for example, in [WS9] and are explained in [WS10]; this is now an OASIS activity [WS11]. This framework would enable SPEECHSC to employ the security mechanisms provided by WS-Security for the remote control aspects. Exchanged media can rely on security mechanisms at the transport/streaming level. The web services framework described probably deserves marks of P+ to T.
9. Security Considerations

Security considerations for the SPEECHSC protocol are covered by the comparison against the specific security requirements in the SPEECHSC requirements document [1].

10. References

[1] E. Burger, "Requirements for Distributed Control of ASR, SR and TTS Resources", draft-ietf-speechsc-reqts-00.txt, July 31, 2002.

[2] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. (Obsoletes RFC 2543)

[WS1] W3C Web Services, http://www.w3c.org/2002/ws/

[WS2] Simple Object Access Protocol (SOAP), http://www.w3c.org/2002/ws/

[WS3] "Web Services Description Language (WSDL) 1.1", W3C Note, 15 March 2001, http://www.w3.org/TR/wsdl

[WS4] F. Leymann, "Web Services Flow Language (WSFL 1.0)", May 2001, http://www-4.ibm.com/software/solutions/webservices/pdf/WSFL.pdf

[WS5] UDDI, http://www.uddi.org/specification.html

[WS6] W3C Voice Activity, http://www.w3c.org/Voice/

[WS7] WSXL (Web Service eXperience Language), submitted to OASIS WSIA and WSRP.

[WS8] "Requirements for Distributed Control of ASR, SI/SV and TTS Resources", draft-ietf-speechsc-reqts-01.txt

[WS9] "Security in a Web Services World: A Proposed Architecture and Roadmap", Version 1.0, April 7, 2002, http://www.verisign.com/wss/wss.pdf

[WS10] Kapil Apshankar, "WS-Security: Security for Web Services", http://www.webservicesarchitect.com/content/articles/apshankar04.asp

[WS11] OASIS Web Services Security TC, http://www.oasis-open.org/committees/wss/

Author's Address

Brian Wyld
Eloquant SA
ZA Malvaisin
Le Versoud, France
Phone: +33 476 77 46 92
Email: brian.wyld@eloquant.com

Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.