[wg11] Part 28 features we could lose

Tue Jun 8 17:20:34 EDT 2004

I have a perhaps naive question:

We know that DTDs are not powerful enough and that XML Schema is too
complex. Then there is RDF, and maybe there are other ways to specify
XML formats that I don't know about. (I'm not even sure whether
"XML format" is an acceptable term.)

Do we really need to use one of these XML format specifications at all?
Can't we simply provide our own EXPRESS-based definition of how to use
XML elements and attributes, ignoring all the complications that come
with DTD/XML Schema/RDF?

As I understand it, XML itself only requires that documents be
well-formed; we could define what this means for EXPRESS.
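
To make the idea concrete, here is a minimal sketch in Python. It only
illustrates the split between "well-formed" (checked by any XML parser,
with no DTD or schema) and rules we would define ourselves. The element
names and the ENTITY_ATTRIBUTES table are hypothetical; they stand in
for whatever an EXPRESS-based definition would actually prescribe.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for rules derived from an EXPRESS schema:
# entity name -> names of its explicit attributes.
ENTITY_ATTRIBUTES = {"point": ["x", "y"]}


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def missing_attributes(text: str) -> list:
    """List the expected child elements that the root element lacks."""
    root = ET.fromstring(text)
    expected = ENTITY_ATTRIBUTES.get(root.tag, [])
    present = {child.tag for child in root}
    return [name for name in expected if name not in present]


# Made-up instance data for illustration only.
GOOD = '<point id="i1"><x>1.0</x><y>2.0</y></point>'
BROKEN = '<point id="i1"><x>1.0</y></point>'
INCOMPLETE = '<point id="i1"><x>1.0</x></point>'
```

The first function is all that XML itself demands; the second is the
kind of check that would have to come from our own definition.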

One argument for this (if it is possible at all) is that none of the
current approaches to mapping EXPRESS to XML provides the validation
that EXPRESS schemas require, and what is some "validation" worth if
it is not complete?

Lothar

-- 
// Lothar Klein, LKSoftWare GmbH
// Steinweg 1, 36093 Kuenzell, Germany
// +49 661 933933-0, Fax: -2
// mailto:lothar.klein at lksoft.com   http://www.lksoft.com

Tuesday, June 8, 2004, 9:49:56 PM, you wrote:
> I offer the following as a guide to what one might want to do to 
> simplify Part 28.

> Not all features of the EXPRESS language need to be captured in the XML
> schemas.  I would divide the EXPRESS features into a "required" group,
> an "at most optional" group, and an "impossible" group:

> Required:
>   - entities and attributes
>   - SUBTYPE and simple inheritance of attributes
>   - multiple inheritance
>   - ANDOR subtype overlaps
>   - entity-valued attributes "by reference" (by identifier)
>   - entity-valued attributes "by value"
>   - primitive types
>   - aggregation data types
>   - ENUMERATION types
>   - SELECT types
>   - defined data types

> At most optional:
>   - ABSTRACT entity
>   - references to entity instances not in the data set
>   - INVERSE attributes
>   - DERIVEd attributes (value only)
>   - attribute redeclaration
>   - UNIQUE rules
>   - restrictions of STRING and BINARY data types (FIXED and max length)
>   - constraints on the sizes of aggregate values
>   - ordering and uniqueness in aggregates (SET/BAG vs. LIST)
>   - ARRAY OF OPTIONAL
>   - "references" to values that are not entity instances
>   - specialization rules in defined data types

> Impossible:
>   - USE/REFERENCE (you can't import the XML schema for the interfaced
>     EXPRESS schema selectively, and in general, you can't subtype any of
>     its entity data types or extend EXTENSIBLE types)
>   - substitution of specializations
>   - WHERE clauses
>   - expressions and FUNCTIONs
>   - SUBTYPE CONSTRAINTs and SUPERTYPE clauses
>   - EXTENSIBLE SELECT (only the extended SELECT can be mapped)
>   - EXTENSIBLE ENUMERATION (only the extended ENUM can be mapped)
>   - attribute RENAME (only the new name is available)

> To do the Required list elegantly requires all of the following features
> of XML schema:
> - XML elements
>    (for EXPRESS entities and attributes, and for some instances of
>    non-entity data types)
> - XML attributes
>    (for instance ids and other "metadata")
> - XML data types
>    (for EXPRESS data types, nearly 1-to-1)
> - sequence particles
>    (for most structures)
> - choice particles
>    (for SELECT types)
> - extensions of simpleTypes (complex types with simple content)
>    (add XML attributes to BINARY and aggregate data types)
> - restrictions of simple types
>    (ENUMERATION, and empty restrictions for other defined data types)
> - extensions of complex types
>    (attributes of subtypes)
> - restrictions of complex types with simple content
>    (only for defined data types whose underlying type is another defined
>    data type and whose fundamental type is BINARY or certain aggregates)
> - restrictions of complex types with complex content
>    (only for defined data types whose underlying type is another defined
>    data type and whose fundamental type is certain aggregates)
> - substitution groups
>    (subtypes)
> - XML identifiers
>    (for "entity instance names" and references to them)
> - nillable
>    (for aggregates of entities to include references)

> Support for "multiple inheritance" = SUBTYPE OF (a, b), requires 
> construction of XML data types that do not show inheritance 
> relationships at all for the type that has multiple inheritance and all
> of its supertypes.  (In 6.6, there are four rules for this.  Always
> choosing inheritance="true" eliminates only the first rule.  Choosing
> inheritance="false" eliminates all the rules, but it also effectively
> turns all supertypes into SELECT types in the mapping to XML schema.)

> Support for ANDOR requires declaration of "partial entity instance" 
> elements distinct from the entity elements, and inclusion of an 
> "external mapping" element as an option for the content of an attribute
> whose value could be an ANDOR.  (Not currently in the CD, but needed.)

> Optional XML features:
> - abstract XML data types
>    (to support ABSTRACT, but not 1-to-1)
> - nested extensions and nested restrictions
>    (to support redeclaration, and constraints on aggregation sizes,
>    string lengths)
> - keys
>    (to support UNIQUE rules and reference validation)
> - keyrefs with simple target XPaths
>    (to support reference validation)
> - keyrefs with compound target XPaths
>    (to support reference validation with multiple inheritance or ANDOR)
> - block
>    (to prevent schema extension)

> Support for attribute redeclaration involves cascading complex type 
> restrictions in XML schema and several different rules (see clause 6.6.4).

> Support for ARRAY OF OPTIONAL requires a different structure from the
> ARRAY structure for primitive types, and either representation of the
> subscripts of the elements present or "nil" representation of the 
> elements absent.
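
The two alternatives for ARRAY OF OPTIONAL can be sketched in Python as
follows. This is only an illustration of the trade-off, not the
structure the CD defines: the element names "array" and "item" and the
attributes "size" and "pos" are hypothetical. The xsi:nil attribute is
the standard XML Schema instance attribute.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"


def encode_with_nil(values):
    """Every slot gets an element; absent slots are marked xsi:nil."""
    array = ET.Element("array")
    for value in values:
        item = ET.SubElement(array, "item")
        if value is None:
            item.set(NIL, "true")
        else:
            item.text = str(value)
    return array


def decode_with_nil(array):
    """Rebuild the slot list, mapping nil elements back to None."""
    return [None if item.get(NIL) == "true" else item.text for item in array]


def encode_with_subscripts(values, lower=1):
    """Only present slots get an element, tagged with its subscript."""
    array = ET.Element("array", {"size": str(len(values))})
    for index, value in enumerate(values, start=lower):
        if value is not None:
            item = ET.SubElement(array, "item", {"pos": str(index)})
            item.text = str(value)
    return array


def decode_with_subscripts(array, lower=1):
    """Rebuild the slot list from the subscripts of the present elements."""
    values = [None] * int(array.get("size"))
    for item in array:
        values[int(item.get("pos")) - lower] = item.text
    return values


# Made-up data: a three-slot array whose second element is absent.
DATA = ["a", None, "c"]
```

Either form round-trips the same array; the subscript form is smaller
for sparse arrays but needs the extra "size" and "pos" information.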

> Distinction between LIST and any other aggregate can only be "supported"
> by XML attributes that may be inspected by the recipient.  There is no
> XML schema equivalent of ARRAY, SET or BAG.
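
What "inspected by the recipient" could mean in practice is sketched
below in Python. The XML content is an ordered sequence in every case,
so the recipient has to read an attribute and apply the ordering and
uniqueness semantics itself. The attribute name "kind" and the element
names are hypothetical, not taken from the CD.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def read_aggregate(elem):
    """Interpret child values according to a hypothetical 'kind' attribute."""
    kind = elem.get("kind", "list")
    items = [child.text for child in elem]
    if kind in ("list", "array"):
        # Order is significant; keep the document order.
        return items
    if kind == "set":
        # Uniqueness is not enforced by the XML structure; check it here.
        if len(set(items)) != len(items):
            raise ValueError("duplicate members in SET")
        return frozenset(items)
    if kind == "bag":
        # Order is irrelevant, duplicates are allowed.
        return Counter(items)
    raise ValueError(f"unknown aggregate kind: {kind}")


# Made-up instance fragments for illustration only.
AS_LIST = ET.fromstring('<agg kind="list"><v>b</v><v>a</v></agg>')
AS_SET = ET.fromstring('<agg kind="set"><v>b</v><v>a</v></agg>')
AS_BAG = ET.fromstring('<agg kind="bag"><v>a</v><v>a</v><v>b</v></agg>')
BAD_SET = ET.fromstring('<agg kind="set"><v>a</v><v>a</v></agg>')
```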

> Support for aggregate bounds is easy for most instances of many 
> aggregation data types, and slightly uglier for certain types, but a
> general solution requires a new XML data type for every different 
> occurrence of an anonymous aggregate in the EXPRESS schema.

> Optional features of the CD that are not derived from EXPRESS concepts:

> - Support for entity references out of the data set requires declaration
>   of "proxy elements" distinct from the entity elements.

> - Support for referenceable instances that are not entity instances
>   requires instance elements that have the XML attributes of entity
>   instance elements for all data types.

> Configuration features of the CD designed to revise or augment the 
> EXPRESS schema:
> - invert
> - entity name=
> - attribute name= map=
> - type name= map=
> - aggregate name=
> - exp-attribute="no-tag", exp-attribute="entity-tag"
> - tag-source and tag-values
> - exp-type
> - contain, use-id
> - notation
> - naming-convention
> - tagless for aggregates of STRING and BINARY

> Configuration features of the CD derived from the refusal of the 
> technical experts to make an engineering decision:
> - sparse: standardize one of "true" or "false", delete the other
> - flatten: standardize one of "true" or "false", delete the others
> - tagless: standardize the current default (true for simple values,
>   false for others), delete the options
> - exp-attribute="attribute-tag","double-tag","attribute-content":
>   standardize the current default (attribute-tag for simple values,
>   double-tag for others), delete the options

> (Note the number of subclauses and decision tables in 6.3.2, 7.8 and
> other places that would be completely eliminated by simply making these
> decisions.)

> -Ed

> P.S. I'm not really trying to make recommendations.  I'd like to focus
> the discussion of the complexity of the XML schemas on explicit issues.