[wg11] FW: What is a syntax?

Ed Barkmeyer edbark at nist.gov
Thu Sep 6 18:57:42 EDT 2007


Radack, Gerald wrote:

> The Committee Draft of ISO 8000-110 contained the following in its scope 
> statement.
> 
> The following are within the scope of this part of ISO 8000:
> 
> • syntax requirements;
> 
> • semantic encoding requirements;
> 
> • requirements regarding conformance to customer needs.

Wow!  This is a collection of highly ambiguous bullets.

> The following ballot comment was raised:
> 
> What is meant by syntax? Is it a data model? Not defined in the 
> definitions of Part 2
> 
> I would appreciate any comments on the attached document on syntax and 
> semantics.

My first reaction to Annex B is that the thinking is upside down.
More accurately, it presents the details of exchange as seen by a
decoder, as distinct from the details of exchange as seen by a
person with information to convey.

That said, we can go to the question that was asked:  "What is a
syntax?"

A syntax is a component of the definition of a language.  The
syntax is a formal specification for the constructs of the
language that are to be considered valid "sentences" in the
language.  The sentence constructs are typically compositions of
other constructs, each of which has its own specification for a
valid formation.  The "base elements" of a language are said to
be "terminals" -- things that the language assumes to be
available, such as letters or phonemes, and the syntax does not
provide rules for valid forms of these.  It is possible that
another language defines those "base elements" as its "sentences"
and provides syntax for the construction of those "sentences" in
terms of a "more fundamental" set of "base elements", such as
bits or wave forms.  For example, the base elements of EXPRESS
are characters, and EXPRESS uses ISO 10646 to specify the binary
representations of those elements.
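
A minimal sketch of the idea, assuming a hypothetical toy grammar
(the grammar and the function names are invented for illustration
and come from no standard).  The "sentence" is specified as a
composition of other constructs, and the letters are terminals for
which the grammar gives no formation rule:

```python
import string

# Hypothetical toy grammar, for illustration only:
#   sentence ::= greeting " " name
#   greeting ::= "hello" | "goodbye"
#   name     ::= letter letter*
# "letter" is a terminal: the grammar assumes letters are available
# and provides no rule for forming one.
TERMINAL_LETTERS = set(string.ascii_letters)


def parse_greeting(text, pos):
    """Return the position just after a greeting, or None if none starts at pos."""
    for word in ("hello", "goodbye"):
        if text.startswith(word, pos):
            return pos + len(word)
    return None


def parse_name(text, pos):
    """Return the position just after a run of one or more letters, or None."""
    end = pos
    while end < len(text) and text[end] in TERMINAL_LETTERS:
        end += 1
    return end if end > pos else None


def is_sentence(text):
    """Top-down check: a sentence is a greeting, one space, then a name."""
    pos = parse_greeting(text, 0)
    if pos is None or not text.startswith(" ", pos):
        return False
    pos = parse_name(text, pos + 1)
    return pos == len(text)
```

Here is_sentence("hello Ed") holds, while is_sentence("hello") does
not, because the name construct is missing.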

The "semantics" of a language assigns a "meaning" to each base
element and defines the "meaning" of an intermediate
(non-terminal) construct as a new concept that uses the meanings
of the parts of the construct.  In this way, the meaning of each
"sentence" in the language is derived from the meanings of its
parts.  (A slightly different view is that meanings are assigned
only to non-terminal constructs.  In such a case, there are
constructs whose parts are all base elements, and they are said
to be "lexical constructs" -- the lowest level construct to which
meaning is assigned.  For example, EXPRESS does not assign
meaning to characters (the base elements); it assigns meaning to
keywords, identifiers, quoted strings, and numbers (the lexical
elements).)

So "syntax" is usually specified top-down, and "semantics" is
specified bottom-up.
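
The bottom-up half can be sketched the same way, again assuming a
hypothetical toy language (sums of numbers) rather than anything
defined in EXPRESS.  Characters are grouped into lexical elements,
meaning is assigned to those elements, and the meaning of the
larger construct is built from the meanings of its parts:

```python
import re

# Hypothetical toy language:  sum ::= number ("+" number)*
TOKEN = re.compile(r"\s*(\d+|\+)")


def lex(text):
    """Group characters (base elements) into lexical elements."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def meaning_of_number(token):
    # Meaning is assigned to the lexical element, not to its digits.
    return int(token)


def meaning_of_sum(tokens):
    """Check the token pattern, then compose the value from the parts."""
    if len(tokens) % 2 == 0:
        raise ValueError("a sum is numbers separated by '+'")
    total = 0
    for index, token in enumerate(tokens):
        if index % 2 == 0:
            if not token.isdigit():
                raise ValueError(f"expected a number, got {token!r}")
            total += meaning_of_number(token)
        elif token != "+":
            raise ValueError(f"expected '+', got {token!r}")
    return total
```

With these definitions, meaning_of_sum(lex("12 + 30")) yields 42.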

In a language that is used to define messages, the "sentences"
are the valid messages, and they involve non-terminal constructs
that are the "components" of the message.

"Message designers" think of the "components" being pre-defined
and "assembled" into messages that convey particular bodies of
information.  In reality, the activity assumes a reference model
of all the useful information about the "business entities" of
the enterprise and selects those information units that are
relevant to the purpose of the message.

The EDI approach is to specify that reference model as a set
of components that are data structures representing the
business entities, each of which "contains" a set of optional
elements that represent all possible useful information about the
entity, including relationships to other entities (as contained
elements).  So the selection process becomes syntactic
incorporation of the corresponding component and "redeclaring"
each 'optional element' as present, absent or optional in the
message.
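
A rough sketch of that redeclaration, with every name below
(the party component, its elements, the invoice message) invented
for illustration and taken from no actual EDI standard.  The
reference component lists all elements as optional, and the
message definition redeclares each one:

```python
from enum import Enum


class Usage(Enum):
    PRESENT = "present"    # element must appear in the message
    ABSENT = "absent"      # element must not appear in the message
    OPTIONAL = "optional"  # element may appear in the message


# Hypothetical reference component: all elements optional at this level.
PARTY_COMPONENT = {"name", "tax_id", "address", "contact"}

# Hypothetical message definition: incorporates the component and
# redeclares each of its optional elements.
INVOICE_BUYER = {
    "name": Usage.PRESENT,
    "tax_id": Usage.PRESENT,
    "address": Usage.OPTIONAL,
    "contact": Usage.ABSENT,
}


def check(instance, component, redeclaration):
    """Return a list of violations of the redeclared usage rules."""
    problems = []
    for element in sorted(set(instance) - component):
        problems.append(f"{element}: not defined in component")
    for element in sorted(component):
        # Elements that are not redeclared stay optional.
        usage = redeclaration.get(element, Usage.OPTIONAL)
        if usage is Usage.PRESENT and element not in instance:
            problems.append(f"{element}: required but missing")
        elif usage is Usage.ABSENT and element in instance:
            problems.append(f"{element}: not allowed in this message")
    return problems
```

Note that the check looks only at which elements occur; it never
consults what a "tax_id" means.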

The STEP approach is to specify the reference entities and their
properties in a separate model and a different language
(EXPRESS).  And it defines the IRM entities to be empty, and
defines constructs by which attributes can be added to the base
entities to form Application Objects.  But the representation
language is a separate language, with its own "sentences", e.g.
"messages", and its own terminal elements.  So a "mapping" is
needed to relate the EXPRESS constructs to the representation
constructs.  And that mapping is specified in a third language.
And all three of those languages have syntax and semantics.  Note
that Part 21 defines the representation language in clause 7 and
then the mapping rules in English.  Part 28 refers to the XML
specifications that define the representation language; it
defines only the mapping, also in English.

Most messaging languages use characters as the base elements, and
appeal to Unicode/ISO 10646 to define the corresponding binary
representation.  But ISO 8825 defines the transformation from
ASN.1 message specifications directly to bits.  That is as far
down as the software needs to understand, but there are further
layers that define how those "logical bits" get converted to
signals in the physical medium of exchange.
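
The two routes to bits can be sketched side by side.  This is an
illustration under stated assumptions, not a conforming encoder:
utf8_bits and ber_integer are hypothetical helper names, UTF-8 is
chosen as one encoding form of Unicode/ISO 10646, and the integer
encoder follows the basic tag-length-content pattern for an ASN.1
INTEGER with short-form lengths only:

```python
def utf8_bits(char):
    """Character -> bits, using UTF-8 as the encoding form."""
    return " ".join(f"{byte:08b}" for byte in char.encode("utf-8"))


def ber_integer(value):
    """Encode an integer as tag (0x02), length, and content octets.

    Sketch only: long-form lengths are not handled.
    """
    magnitude = value if value >= 0 else ~value
    # Smallest number of octets that holds the two's-complement value.
    length = magnitude.bit_length() // 8 + 1
    if length > 127:
        raise ValueError("long-form lengths are omitted from this sketch")
    return bytes([0x02, length]) + value.to_bytes(length, "big", signed=True)
```

For instance, utf8_bits("A") gives "01000001", and
ber_integer(128).hex() gives "02020080" (the leading zero octet
keeps the value from being read as negative).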

All of this is to say that one doesn't speak of "syntax" and
"semantics" in a vacuum.  Syntax and semantics are aspects of a
"language", and multiple languages are often involved in message
definition.

So "syntax requirements" (which should read: "syntax
specification") is in scope if ISO 8000 defines a language, and
not otherwise.  (Note that defining an XML schema is, in fact, 
defining a new XML language, in which the schema specifies some 
syntax rules, the XML Schema Recommendation defines other syntax 
rules, and the XML Recommendation defines still other syntax 
rules.  One can see that as defining three tiers of syntax, or as 
defining one syntax that incorporates syntax rules defined in 
other standards.)

"semantic encoding requirements" is an inscrutable term. I assume
it refers to "specification of mappings between the information
model (in the modeling language) and the representation forms (in
the representation language)".  But there is nothing "semantic"
about that mapping -- it is specified in terms of the constructs
of the two languages.  And the semantics of the representation
language is usually so low level that there is no relationship to
the semantics of the information model.  For example, there is no
semantic relationship between EXPRESS "entity" -- a collection of
information about a single object -- and XML "element" -- a
character string with proper demarcation that conveys one or more
data units, except that we use the latter to represent the
former.  And as observed above, the EDI mechanism for this
mapping is a set of (syntactic) redeclarations of the optional
elements.  The intended semantics (of the message and the 
entities it references) guides the mapping, but the mapping 
itself is purely syntactic.
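
To make the last point concrete, here is a small sketch of such a
mapping.  The rule used (entity name becomes the element name,
each attribute becomes a child element) is an assumption chosen
for illustration and is not the Part 28 mapping; the "pump" entity
and its attributes are likewise invented:

```python
import xml.etree.ElementTree as ET


def entity_to_element(entity_name, attributes):
    """Map one entity instance to one XML element, construct for construct."""
    element = ET.Element(entity_name)
    for attr_name, value in attributes.items():
        # The rule never looks at what the attribute means, only at
        # its name and the character form of its value.
        child = ET.SubElement(element, attr_name)
        child.text = str(value)
    return element


pump = entity_to_element("pump", {"tag": "P-101", "flow_rate": 12.5})
xml_text = ET.tostring(pump, encoding="unicode")
# xml_text == "<pump><tag>P-101</tag><flow_rate>12.5</flow_rate></pump>"
```

The function would produce equally well-formed output for an
entity whose attributes made no business sense, which is the sense
in which the mapping is syntactic.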

(My usual Lecture #306 kind of contribution.  Take it for what 
it's worth.)

-Ed

-- 
Edward J. Barkmeyer                        Email: edbark at nist.gov
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694

"The opinions expressed above do not reflect consensus of NIST,
  and have not been reviewed by any Government authority."




