[wg11] Re: Conflict between the goals of EXPRESS modelers and AP developers

Fri Jun 18 19:17:59 EDT 2004

David Price wrote:

> Secondly, it misses the point about what technology is nice to have
> available vs. what needs to be an ISO standard.  

I am still struggling with that question: What needs to be in the standard?

So I started from this question: What is the motivation for Part 28 in 
the first place?

(1) There is a need for a standard XML representation of data modeled in 
EXPRESS in ISO standards.  An XML representation is useful for the 
following reasons:
  - the standards-conforming data elements can be processed using XML 
encode/decode libraries available in your Java, VBasic, etc., 
programming toolkits;
  - the standards-conforming documents can be displayed by browsers with 
style sheets and other specialized XML tools;
  - the standards-conforming documents can be transformed to other XML 
forms using XSLT;
  - the standards-conforming data sets can be embedded in e-business 
transaction messages and other XML documents, without requiring 
special-purpose decoders.

Using XML provides access to more and better tools, books and training, 
and a large body of educated programmers and analysts.

All of these results accrue from defining a mapping from EXPRESS models 
to XML formation rules, rather like Part 21.  There is no direct need 
for either DTD models or XML Schema models.

But programmers expect to see a DTD or XML schema model to tell them 
what the standards-conforming data elements will look like.  They don't 
expect to see an EXPRESS model and a set of rules for rendering the 
corresponding XML.  So the use of toolkits, style sheets, and XSLT is 
greatly improved by providing a DTD or XML schema model for the data. 
And DTDs are not adequate for defining elegant models, and they are out 
of favor in the perennial IT popularity contests.  So:

(2) There is a need for an XML Schema specification of the data elements 
and document constructs for a data set that corresponds to an EXPRESS model.

(3) There is a need to validate exchange data sets against the standard.
In general, there is no automated technology that can do this.  But 
there are several possible interpretations of this statement that can be 
automated:
  - Validate that the received data can be converted into the internal 
model of the target application, using the provided XML schema for the 
structure of the data and the programmer's understanding of the semantic 
intent (derived from EXPRESS models and text, tutorials, etc.).
  - Validate that the received data set satisfies as many of the EXPRESS 
rules as your EXPRESS toolkit can actually implement (which is most but 
not all), without any notion of what the data means.
  - Validate that the received data set satisfies the rules stated in 
the XML schema, without any notion of what the data means.
  - Validate that the received data set is well-formed XML.
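To make the weakest of these levels concrete, here is a minimal sketch 
of the well-formedness check, using only the Python standard library. 
The function name and the sample documents are made up for illustration; 
nothing here comes from Part 28.  Note that passing this check says 
nothing about the XML schema, the EXPRESS rules, or what the data means.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise.

    This tests XML syntax only: no schema, no EXPRESS rules, no meaning.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Hypothetical sample data, not taken from any STEP schema.
good = "<product id='p1'><name>bolt</name></product>"
bad = "<product id='p1'><name>bolt</product>"  # <name> is never closed

print(is_well_formed(good), is_well_formed(bad))
```

The schema-based and EXPRESS-based levels need a validating XML parser 
or an EXPRESS toolkit respectively, which the standard library does not 
provide.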

Application tools need to do the first.  This is what users need. 
Application tool vendors may use tools that do the EXPRESS-based 
validation, as a means of debugging their own output routines, and as a 
means of minimizing the confusion on input when something ugly is 
encountered.

Testbeds and other "meta-work" facilities specific to SC4 work do the 
EXPRESS-based validation without regard for the meaning, primarily as a 
means of increasing vendor and user confidence in using the standards 
and user confidence in the conformance of the tools to the intent of the 
standards.  (But they can't test conformance to intent this way -- they 
can only test "syntactic conformance".)

Much of the target audience for the XML representation does not read 
EXPRESS or standards written in it, and they will not do EXPRESS-based 
validation, ever.  They still need to do the first.  And the supporting 
tools for debugging, testbeds, etc., for them might reasonably be 
expected to do the XML schema validation, since there are available 
off-the-shelf tools that do that.  So:

(4) It is desirable to have an XML schema that captures the readily 
testable constraints stated in the EXPRESS schema.  It is not necessary 
to have this, but it has value.  This level of "validation" cannot be 
equivalent to the EXPRESS-based validation:
  - All supertype/subtype constraints and most WHERE clauses cannot be 
stated in XML schema at all.  Those that can require considerable 
analysis and interpretation to be mapped to XML schema equivalents.
  - Most simple EXPRESS constraints -- bounds on string length, constant 
bounds on the sizes of aggregate values -- can be stated in XML schema.
  - UNIQUE rules can be stated in XML schema.
  - Referential constraints that arise from the XML schema 
representation of the EXPRESS notion of "object identity" can be stated 
in XML schema.

Many of these constraints use features of XML schema that are not 
commonly used in e-business transactions and other XML exchange 
standards.  They are therefore unfamiliar to many programmers and 
modelers who are otherwise "XML literate", and they have less reliable 
support in the off-the-shelf tool kits.  And the tools that will 
validate against EXPRESS models don't need them. Therefore:

(5) It is desirable to have an option, or a conformance class, that 
deletes from the derived XML schema all the constraints that use unusual 
features of XML schema:  complex type restriction, unique, key and keyref.
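As a rough sketch of what such an option might amount to, the following 
Python removes the identity constraints (unique, key, keyref) from an 
XML schema document and leaves the element structure alone.  The sample 
schema and all names in it are hypothetical, and the sketch assumes the 
schema uses the "xs" prefix.  It does not attempt to undo complex type 
restriction, which would need real analysis of the type hierarchy.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
IDENTITY = {f"{{{XS}}}{name}" for name in ("unique", "key", "keyref")}

def strip_identity_constraints(xsd_text):
    """Return the schema text with xs:unique, xs:key, xs:keyref removed."""
    # Assumption: the input uses the "xs" prefix.  Registering it keeps
    # QName-valued attributes such as type="xs:string" meaningful on output.
    ET.register_namespace("xs", XS)
    root = ET.fromstring(xsd_text)
    # Snapshot the traversal first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in IDENTITY:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

# Hypothetical schema: a list of parts whose numbers must be unique.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="parts">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="part" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="number" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:unique name="part_number_unique">
      <xs:selector xpath="part"/>
      <xs:field xpath="@number"/>
    </xs:unique>
  </xs:element>
</xs:schema>"""

stripped = strip_identity_constraints(SAMPLE)
print(stripped)
```

Both versions of the schema describe exactly the same XML structures; 
the stripped one just checks less.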

(6) From an entirely different starting point, it is desirable to be 
able to use the XML schema models, with documentation or tutorials, as a 
means of introducing SC4 standard models to the much larger community 
that is unfamiliar with EXPRESS.  This would require the XML schema 
models to be in some sense "natural" to folks who are familiar with 
hierarchical XML models.  So, unlike Part 21:
  - the XML schema must permit entity instances to be "contained within" 
other entity instances, as well as, or perhaps instead of, pointed to by 
"instance name".  But since EXPRESS contains no hints for which things 
should be contained and which should be pointed to, the best the XML 
schema can do is to allow both.
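
To illustrate what "allow both" costs the reader of the data, here is a 
small Python sketch in which an instance may appear either inline or as 
a pointer to an instance defined elsewhere.  The element names and the 
"id" and "ref" attributes are assumptions made up for this example, not 
a statement of what Part 28 prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical data set: one person is pointed to, the other is contained.
DOC = """<data>
  <person id="i1"><name>Ada</name></person>
  <organization id="i2">
    <name>Acme</name>
    <employee><person ref="i1"/></employee>
    <employee><person><name>Bob</name></person></employee>
  </organization>
</data>"""

def resolve(elem, by_id):
    """Return elem itself if it is inline, or its target if it is a reference."""
    ref = elem.get("ref")
    return by_id[ref] if ref is not None else elem

root = ET.fromstring(DOC)
# Index every identified instance so that references can be followed.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

org = by_id["i2"]
names = [resolve(p, by_id).findtext("name")
         for p in org.findall("employee/person")]
print(names)
```

Every consumer has to carry something like resolve(), because the 
schema cannot tell it which form a given sender will choose.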

Because EXPRESS schemas in STEP standards are built around an 
architecture that wasn't trying for elegance in the view presented to 
the end-user or the implementor, the straightforward mapping to XML 
schema will produce somewhat ugly XML schemas, and they will be hard for 
XML literati to understand and use.  So:

(7) It is desirable to be able to modify or re-organize the EXPRESS 
schema to produce a more "accessible" XML schema version for this larger 
target community.

But this means that the rote mapping of the EXPRESS schema to the XML 
schema will not be the standard one, or not the only standard one.  Still, 
the standard should allow all users of a given EXPRESS-based standard to 
use a common XML schema to reach the larger audience, and more 
importantly, the standard should make it clear to the programmer exactly 
what the output data must look like and what the input data may look 
like.  So:

(8) It is required that Part 28 define one standard mapping that 
determines a single XML schema as the standard representation of a given 
EXPRESS schema.  (With respect to item (5) above, it is possible that 
there are two versions, both of which define exactly the same XML 
structures, but one of them also states additional XML schema validation 
constraints.)

This makes it necessary to provide wish-list item (7) OUTSIDE OF Part 
28, possibly using EXPRESS-X beforehand, or XSLT after the mapping.

Regrettably, the team of experts that I worked with did not share the 
opinion that (8) was a requirement.  I understand that many of them 
strongly believe in (6) and (7) and are willing to sacrifice (8) to 
achieve that.  I believe that (8) is the requirement, and (7) is merely 
"desirable", and that where sacrifice is necessary it must go the other 
way.  But in any case, I believe that SC4 needs to make a decision as to 
whether Rule (8) is a mandatory requirement.

If (8) is a requirement, the current draft is not even close to meeting 
the objectives of Part 28!  The draft allows thousands of XML schemas 
that produce radically different XML data organizations to all be 
conformant with Part 28 for one given EXPRESS schema!  If (8) isn't a 
requirement, then the current document is probably very close, and the 
remaining issue is what we do about problem (5).

I'm sure everyone has his own chain of reasoning.  The above is mine.
But IMNSHO, every chain of reasoning must still come to answering the 
question:  Is it a requirement that a given EXPRESS schema produce one 
given conforming XML schema per Part 28 (and therefore one conforming 
XML data structure) or not?

Please, please answer that question in your NB comments!  And please 
realize that any answer other than "YES, it is a requirement" will be 
interpreted as No, because that is the mindset of the developers!

-Ed

P.S. I would also observe that some of the configuration directives and 
design decisions cannot be excused by any of the above logic, not even 
choosing (7) over (8).

-- 
Edward J. Barkmeyer                        Email: edbark at nist.gov
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8264                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8264                FAX: +1 301-975-4694

"The opinions expressed above do not reflect consensus of NIST,
  and have not been reviewed by any Government authority."