XML DTD reference

Author: Razvan MIHAIU
razvan_rem@rem_mihaiu.name (please remove '_rem' and 'rem_')
From: www.mihaiu.name
Date: 20/06/2005
  1. <!ELEMENT>
    • this declaration defines an XML element type name and its permissible content (sub-elements and text);
    • the name of an element must be a *legal* XML name:
      • Unicode letters and digits;
      • four punctuation marks: ".", "-", "_", ":";
      • colons ":" should only be used in an XML name as a namespace delimiter;
      • the first character of an XML name must be a Unicode letter, a colon (":") or an underscore ("_");
    • sample usage:
      • this element type can contain any element that is declared in the DTD, as well as text
        <!ELEMENT elementName ANY>
      • this element cannot have content, but it can have attributes
        <!ELEMENT elementName EMPTY>
      • this element can contain only text (it cannot have any child elements)
        <!ELEMENT elementName (#PCDATA)>
      • this element can have children but it cannot contain text (with the exception of whitespace)
        1. element's children are specified using a sequence list thus they must appear in the specified order
          <!ELEMENT elementName (child1, child2)>
        2. element's children are specified using a mutually exclusive choice list
          <!ELEMENT elementName (child1 | child2)>
      • mixed content model: this element can contain both children and text, but you cannot specify the order or the number of its children; #PCDATA must be listed first
        <!ELEMENT elementName2 (#PCDATA | Grade | elementName)*>
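    The content models above can be combined in one small document. The following is only a sketch: every element name in it (book, title, id, isbn, sku, cover, note, em, extra) is invented for illustration, and the ISBN is a placeholder.

    ```xml
    <!DOCTYPE book [
      <!-- sequence list: the children must appear exactly in this order -->
      <!ELEMENT book   (title, id, cover, note, extra)>
      <!-- text only -->
      <!ELEMENT title  (#PCDATA)>
      <!ELEMENT isbn   (#PCDATA)>
      <!ELEMENT sku    (#PCDATA)>
      <!ELEMENT em     (#PCDATA)>
      <!-- choice list: exactly one of the two children -->
      <!ELEMENT id     (isbn | sku)>
      <!-- no content allowed -->
      <!ELEMENT cover  EMPTY>
      <!-- mixed content: text and em elements in any order, any number of times -->
      <!ELEMENT note   (#PCDATA | em)*>
      <!-- text and any declared element -->
      <!ELEMENT extra  ANY>
    ]>
    <book>
      <title>DTD basics</title>
      <id><isbn>0-000-00000-0</isbn></id>
      <cover/>
      <note>Read the <em>entity</em> chapter first.</note>
      <extra>free text and <em>declared</em> elements</extra>
    </book>
    ```

    Swapping title and id, or putting both isbn and sku inside id, would make the document invalid.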
    • Cardinality operators:
      • ? - "0 or 1"
      • * - "0 to n"
      • + - "1 to n"

    • Cardinality operators applied to each list type:

      For choice lists

      • "?" at most one element from the choice list may appear; it is legal for no element to appear
        <!ELEMENT elementName (child1 | child2)?>
      • "*" any of the elements from the choice list can appear in any order and in any number
        <!ELEMENT elementName (child1 | child2)*>
      • "+" one or more elements from the choice list must appear, in any order
        <!ELEMENT elementName (child1 | child2)+>

      For sequence lists

      • "?" the whole sequence can appear 0 or 1 times; a partial sequence is not allowed
        <!ELEMENT elementName (child1, child2)?>
      • "*" the whole sequence can appear 0 to n times; a partial sequence is not allowed
        <!ELEMENT elementName (child1, child2)*>
      • "+" the whole sequence must appear at least once; a partial sequence is not allowed
        <!ELEMENT elementName (child1, child2)+>
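    The operators can also be attached to a single child or to a nested group, not only to the whole list. A short sketch, with element names (order, customer, item, comment, date, time) made up for this example:

    ```xml
    <!DOCTYPE order [
      <!-- one customer, one or more items, an optional comment,
           then zero or more (date, time) pairs -->
      <!ELEMENT order    (customer, item+, comment?, (date, time)*)>
      <!ELEMENT customer (#PCDATA)>
      <!ELEMENT item     (#PCDATA)>
      <!ELEMENT comment  (#PCDATA)>
      <!ELEMENT date     (#PCDATA)>
      <!ELEMENT time     (#PCDATA)>
    ]>
    <order>
      <customer>ACME</customer>
      <item>bolt</item>
      <item>nut</item>
      <date>2005-06-20</date>
      <time>15:00</time>
    </order>
    ```

    Here the comment is simply left out. A date without a following time would be rejected, because the nested sequence may only repeat as a whole.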
  2. <!ENTITY>
    • this declaration is used to define replaceable content;
    • it allows XML documents to reference parsed and unparsed entities; it also allows DTDs to reference parsed entities (this kind of entity is called a *parameter* entity);
    • a parsed entity is defined in the DTD in either the internal subset (internal means in the XML file itself) or the external subset (in a separate DTD file);
    • if an entity is declared as parsed and it *is* referenced, then its replacement text must be well-formed XML;
    • sample usage:
      • internal parsed entity
        <!ENTITY name "replacement_text">
      • external parsed entity
        <!ENTITY name SYSTEM "location">
        <!ENTITY name PUBLIC "identifier" "location">


        If the external parsed entity is not encoded in UTF-8 or UTF-16, it must begin with a text declaration on its first line that informs the parser which encoding is used:

        <?xml version="1.x" encoding="Big5"?>
      • external unparsed entity; such an entity is always external; an unparsed entity is always associated with a notation:
        <!ENTITY name SYSTEM "location" NDATA notation_type>
        <!ENTITY name PUBLIC "ident" "loc" NDATA notation_type>

        - the "notation_type" must match a name in a <!NOTATION> declaration;

        - the NDATA keyword is used to differentiate between external parsed and external unparsed entities.

        - it is *illegal* to have recursive reference declarations:

        <!ENTITY self_ref "&self_ref;">
        <!ENTITY ref_a "&ref_b;">
        <!ENTITY ref_b "&ref_a;">
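    To show how the two kinds of general entity are used from a document, here is a minimal sketch. The names report, logo, company, logoFile and png, the file "logo.png" and the notation identifier "image/png" are all assumptions made for the example.

    ```xml
    <!DOCTYPE report [
      <!ELEMENT report (#PCDATA)>
      <!ATTLIST report logo ENTITY #IMPLIED>

      <!-- internal parsed entity: the replacement text is given inline -->
      <!ENTITY company "Example Corp.">

      <!-- external unparsed entity: needs NDATA and a declared notation -->
      <!NOTATION png SYSTEM "image/png">
      <!ENTITY logoFile SYSTEM "logo.png" NDATA png>
    ]>
    <!-- the parser expands the company entity in the text below;
         logoFile is only named in an ENTITY-typed attribute -->
    <report logo="logoFile">Prepared by &company;</report>
    ```

    An unparsed entity is never written as &logoFile; in content. The parser only passes its name and notation on to the application.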
      • parameter entities are used exclusively in DTDs and must always be parsed entities;

        - the format of an internal/external *parameter* entity is (this entity can be declared in the internal or the external DTD subset):

        <!ENTITY % name "replacement_text">
        <!ENTITY % name SYSTEM "location">
        <!ENTITY % name PUBLIC "identifier" "location">

        - can be used to include other DTDs in the current DTD:

        <!ENTITY % AnotherDTD SYSTEM "SomeFile.dtd">
        %AnotherDTD;
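    Another common use of an internal parameter entity is to share a piece of a content model between several declarations. The sketch below is meant as the content of an external DTD file; the file name and the names inline, para, title, em and code are hypothetical.

    ```xml
    <!-- inline.dtd: an external DTD subset (hypothetical file name) -->

    <!-- the parameter entity holds a fragment of a content model -->
    <!ENTITY % inline "#PCDATA | em | code">

    <!-- in the external subset a parameter-entity reference may be used
         inside a declaration -->
    <!ELEMENT para  (%inline;)*>
    <!ELEMENT title (%inline;)*>
    <!ELEMENT em    (#PCDATA)>
    <!ELEMENT code  (#PCDATA)>
    ```

    In the internal subset of a document, a parameter-entity reference may only appear between declarations, not inside one, so this pattern belongs in a separate DTD file.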
      • character references have the following formats:

        &#NNNN; (decimal code point, up to 1114111)

        &#xHHHH; (hexadecimal code point, up to 10FFFF; note the "x" prefix)

        - example: &#169; == &#xA9; (this is the copyright '©' character)

        - there are 5 built-in entity references defined in XML:

        • &amp; ampersand (&)
        • &lt; less than (<)
        • &gt; greater than (>)
        • &apos; apostrophe (')
        • &quot; quote (")
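    A small sketch of character references and the built-in entities inside a document. The element name notice and the attribute name title are invented for the example.

    ```xml
    <!DOCTYPE notice [
      <!ELEMENT notice (#PCDATA)>
      <!ATTLIST notice title CDATA #IMPLIED>
    ]>
    <!-- the decimal and the hexadecimal reference both yield the copyright
         sign; the five built-in entities escape markup characters -->
    <notice title="&quot;Terms&quot; &amp; conditions">
      &#169; 2005 and &#xA9; 2005: if a &lt; b &amp;&amp; b &gt; c, it&apos;s fine
    </notice>
    ```

    Note the terminating semicolon on every reference. Without it the document is not well-formed.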
  3. <!ATTLIST>

    - defines the attributes of an XML element (permissible and default values);

    Attribute definitions:

    • the attribute *must* be present in the XML document (is required)
      <!ATTLIST AnElement an_attribute CDATA #REQUIRED>
    • the attribute is optional:
      <!ATTLIST AnElement an_attribute CDATA #IMPLIED>
    • the attribute is optional, but if it appears it must have a certain predefined value:
      <!ATTLIST AnElement an_attribute CDATA #FIXED "value">
    • the attribute is optional and it has a default value; a validating parser will supply the default value if the attribute is not specified in the respective element:
      <!ATTLIST AnElement an_attribute CDATA "value">
    • the attribute is optional but it can only have values from a predefined list:
      <!ATTLIST Test6 an_attribute (value1 | value2) #IMPLIED>
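    The five kinds of attribute definition can be seen side by side in one declaration. This is a sketch under assumed names: catalog, product, sku, note, currency, unit and status do not come from the text above.

    ```xml
    <!DOCTYPE catalog [
      <!ELEMENT catalog (product+)>
      <!ELEMENT product EMPTY>
      <!ATTLIST product
        sku      CDATA        #REQUIRED
        note     CDATA        #IMPLIED
        currency CDATA        #FIXED "EUR"
        unit     CDATA        "piece"
        status   (new | used) "new">
    ]>
    <catalog>
      <!-- only the required attribute is given; a validating parser reports
           currency="EUR", unit="piece" and status="new" as if they were written -->
      <product sku="A37"/>
      <!-- explicit values override the defaults; currency may only be "EUR" -->
      <product sku="B100" note="last one" currency="EUR" unit="box" status="used"/>
    </catalog>
    ```

    Leaving out sku, writing currency="USD", or writing status="broken" would each be a validity error.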

    Attribute types: (there are 10 types)

    1. CDATA - a CDATA attribute value cannot reference external entities and cannot contain an unescaped "<" sign; the less-than sign must be encoded as "&lt;"; for an example of a CDATA attribute see above;
    2. Enumerated values - all the enumerated values must be composed of NameChars; for an example see above;
    3. ID - a unique identifier in the whole document instance (regardless of the element type):
      <!ATTLIST Test6 an_attribute ID #IMPLIED>
      <!ATTLIST Test6 an_attribute ID #REQUIRED>
    4. IDREF/IDREFS - the value of such an attribute must be a legal XML name and must match an ID in the same document instance:
      <!ATTLIST Test5
        ID   ID     #IMPLIED
        Ref  IDREFS #REQUIRED
        Ref2 IDREF  #IMPLIED
      >
      <!ELEMENT Test5 EMPTY>
      <!-- this element has an optional ID, a required IDREFS and an optional IDREF;
           the IDREFS attribute has a single value that points to the same element -->
      <Test5 ID="abc" Ref="abc"/>

    5. NMTOKEN/NMTOKENS - a name token, or a whitespace-separated list of name tokens:

      - the only real difference between NMTOKEN and CDATA is that the former does not allow whitespace or most punctuation characters;

      - NMTOKEN/NMTOKENS only allow NameChar characters;

      <!ATTLIST Test5
        Year      NMTOKEN  #IMPLIED
        Values    NMTOKENS #REQUIRED
        TimeStamp NMTOKEN  #FIXED "15:00"
        Parts     NMTOKENS "A37 B100 C90"
      >
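    A matching instance for NMTOKEN/NMTOKENS attributes could look like the sketch below. It reuses the Test5, Year and Values names from the declaration above; the attribute values themselves are made up.

    ```xml
    <!DOCTYPE Test5 [
      <!ELEMENT Test5 EMPTY>
      <!ATTLIST Test5
        Year   NMTOKEN  #IMPLIED
        Values NMTOKENS #REQUIRED>
    ]>
    <!-- "2005" is a legal NMTOKEN although it starts with a digit;
         Values holds three tokens separated by spaces -->
    <Test5 Year="2005" Values="A37 B-100 c_90"/>
    ```

    Year="20 05" would be invalid, because a single NMTOKEN cannot contain whitespace.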

    6. ENTITY/ENTITIES:

      - the values of such attributes must match the names of *unparsed* entities declared in the DTD;

      <!-- DTD -->
      <!ELEMENT Test5 EMPTY>
      <!ATTLIST Test5
        Img1 ENTITY #REQUIRED
        Img2 ENTITY #FIXED "Toto1"
        Img3 ENTITY #IMPLIED
        Img4 ENTITY "Toto1"
      >
      <!ENTITY Toto1 PUBLIC "id" "loc" NDATA NotNo500>
      <!NOTATION NotNo500 PUBLIC "ident" "loc">
      <!-- XML -->
      <Test5 Img1="Toto1"/>

    7. NOTATION:

      - must point to a notation that is explicitly defined in the DTD;
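    A NOTATION-typed attribute lists the notation names it accepts, and each of them must be declared. A minimal sketch, assuming the names picture, format, gif and png and the identifiers "image/gif" and "image/png" purely for illustration:

    ```xml
    <!DOCTYPE picture [
      <!NOTATION gif SYSTEM "image/gif">
      <!NOTATION png SYSTEM "image/png">
      <!-- a NOTATION attribute may not be declared on an EMPTY element -->
      <!ELEMENT picture (#PCDATA)>
      <!-- the value must be one of the listed, declared notation names -->
      <!ATTLIST picture format NOTATION (gif | png) #REQUIRED>
    ]>
    <picture format="png">(encoded image data)</picture>
    ```

    The attribute tells the application how to interpret the content of the element. The parser itself does nothing with the notation beyond reporting it.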

  4. <!NOTATION>

    - this declaration is used to describe non-XML data; it is a hint to the application about handling unparsable data;

    <!NOTATION name SYSTEM "location">
    <!NOTATION name PUBLIC "identifier" "location">
  5. conditional sections: IGNORE & INCLUDE directives;
    <![INCLUDE [
      <!ELEMENT Test7 EMPTY>
    ]]>
    <![IGNORE [
      <!ELEMENT Test8 EMPTY>
    ]]>

    - conditional sections are allowed only in the external DTD subset; parameter entities are used to switch a section between INCLUDE and IGNORE:

    <!ENTITY % TestCondition "INCLUDE">
    <![%TestCondition; [
      <!ELEMENT Test9 EMPTY>
    ]]>
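    A typical use of this mechanism is to let each document choose between two variants of a DTD. The sketch below shows two files in one listing; the file name draft.dtd and the names draft, final, memo, title, body and comments are hypothetical.

    ```xml
    <!-- file 1: draft.dtd, the external DTD subset -->
    <!ENTITY % draft "IGNORE">
    <!ENTITY % final "INCLUDE">

    <![%draft;[
      <!-- active only when a document sets draft to INCLUDE -->
      <!ELEMENT memo (title, body, comments)>
    ]]>
    <![%final;[
      <!ELEMENT memo (title, body)>
    ]]>
    <!ELEMENT title    (#PCDATA)>
    <!ELEMENT body     (#PCDATA)>
    <!ELEMENT comments (#PCDATA)>

    <!-- file 2: a document that switches to the draft variant -->
    <!DOCTYPE memo SYSTEM "draft.dtd" [
      <!-- the internal subset is read first, so these declarations
           take precedence over the ones in draft.dtd -->
      <!ENTITY % draft "INCLUDE">
      <!ENTITY % final "IGNORE">
    ]>
    <memo><title>t</title><body>b</body><comments>c</comments></memo>
    ```

    A document that does not redeclare the two entities gets the final variant, in which the comments child is not allowed inside memo.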

Best regards,

Razvan Mihaiu © 2000 - 2017