The new version is improved in many ways. Some general improvements are: significantly better conformance to the XML spec, cleaner internal architecture, many bug fixes, and faster speed.

Except for a couple of very obscure cases (mostly related to 'standalone' mode), this version should be quite compliant. We have more than a thousand tests, some collected from various public sources and some IBM generated, which are used to do regression testing. The C++ parser is now passing all but a handful of them.
 
This version has many bug fixes relative to XML4C version 2.x. Some of these were reported by users and some were brought up by the conformance testing.
 
Much work was done to speed up this version. Some of the new features, such as namespaces and conformance checks, ate into some of these gains, but overall the new version is significantly faster than previous versions, even while doing more.
 
The sample programs no longer use any of the unsupported util/xxx classes. They only existed to allow us to write portable samples. But, since we feel that the wide character APIs are supported on a lot of platforms these days, it was decided to go ahead and just write the samples in terms of these. If your system does not support these APIs, you will not be able to build and run the samples. On some platforms, these APIs might be optional packages or require runtime updates or some such action.

More samples have been added, highlighting some of the new functionality introduced in the new code base, and the existing ones have been cleaned up as well. The new samples are:

   PParse     - Demonstrates 'progressive parse' (see below)
   StdInParse - Demonstrates use of the standard input source
   EnumVal    - Shows how to enumerate the markup decls in a DTD Validator
 
In the XML4C 2.x code base, there were the following parser classes (in the src/parsers/ source directory): NonValidatingSAXParser, ValidatingSAXParser, NonValidatingDOMParser, and ValidatingDOMParser. The non-validating ones were the base classes, and the validating ones just derived from them and turned on the validation. This was deemed a little bit overblown, considering the tiny amount of code required to turn on validation and the fact that it makes people use a pointer to the parser in most cases (if they needed to support either validating or non-validating versions.) The new code base just has SAXParser and DOMParser classes. These are capable of handling both validating and non-validating modes, according to the state of a flag that you can set on them. For instance, here is a code snippet that shows this in action:

void ParseThis(const XMLCh* const fileToParse,
               const bool validate)
{
  //
  // Create a SAXParser. It can now just be
  // created by value on the stack if we want
  // to parse something within this scope.
  //
  SAXParser myParser;

  // Tell it whether to validate or not
  myParser.setDoValidation(validate);

  // Parse and catch exceptions...
  try
  {
    myParser.parse(fileToParse);
  }
  catch (const XMLException& toCatch)
  {
    // Report or handle the error here
  }
}
We feel that this is a simpler architecture, and that it makes things easier for you. In the above example, for instance, the parser will be cleaned up for you automatically upon exit since you don't have to allocate it anymore.
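
The DOMParser is used the same way. Here is a minimal sketch along the same lines, assuming it exposes the same setDoValidation() flag and a getDocument() accessor for retrieving the resulting tree (the header paths are inferred from the src/ directory layout described in these notes; check your install):

#include <parsers/DOMParser.hpp>
#include <dom/DOM.hpp>
#include <util/XMLException.hpp>

void BuildTree(const XMLCh* const fileToParse,
               const bool validate)
{
  // Like the SAXParser, the DOMParser can live on the stack
  DOMParser myParser;

  // Same validation flag as in the SAX example above
  myParser.setDoValidation(validate);

  try
  {
    myParser.parse(fileToParse);

    // Assumption: the parsed tree is available via getDocument()
    DOM_Document doc = myParser.getDocument();
    // (walk or query the tree here)
  }
  catch (const XMLException& toCatch)
  {
    // Report or handle the error here
  }
}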
 
Experimental early support for some parts of the DOM Level 2 specification has been added. This addresses some of the shortcomings in our DOM implementation, such as the lack of a simple, standard mechanism for tree traversal.
 
The new parser classes support, in addition to the parse() method, two new parsing methods, parseFirst() and parseNext(). These are designed to support 'progressive parsing', so that you don't have to depend upon throwing an exception to terminate the parsing operation. Calling parseFirst() will cause the DTD (or in the future, Schema) to be parsed (both internal and external subsets) and any pre-content, i.e. everything up to but not including the root element. Each subsequent call to parseNext() will cause one more piece of markup to be parsed, and spit out from the core scanning code to the parser (and hence either on to you if using SAX, or into the DOM tree if using DOM.) You can quit the parse at any time by just not calling parseNext() anymore and breaking out of the loop. When you call parseNext() and the end of the root element is the next piece of markup, the parser will continue on to the end of the file and return false, to let you know that the parse is done. So a typical progressive parse loop will look like this:

// Create a progressive scan token
XMLPScanToken token;

if (!parser.parseFirst(xmlFile, token))
{
  cerr << "parseFirst() failed" << endl;
  return 1;
}

//
// We started ok, so let's call parseNext()
// until we find what we want or hit the end.
//
bool gotMore = true;
while (gotMore && !handler.getDone())
  gotMore = parser.parseNext(token);
In this case, our event handler object (named 'handler', surprisingly enough) is watching for some criteria and will return a status from its getDone() method. Since the handler sees the SAX events coming out of the SAXParser, it can tell when it finds what it wants. So we loop until we get no more data or our handler indicates that it saw what it wanted to see.

When doing non-progressive parses, the parser can easily know when the parse is complete and ensure that any used resources are cleaned up. Even in the case of a fatal parsing error, it can clean up all per-parse resources. However, when progressive parsing is done, the client code doing the parse loop might choose to stop the parse before the end of the primary file is reached. In such cases, the parser will not know that the parse has ended, so any resources will not be reclaimed until the parser is destroyed or another parse is started. This might not seem like such a bad thing; however, in this case, the files and sockets which were opened in order to parse the referenced XML entities will remain open. This could cause serious problems. Therefore, you should destroy the parser instance in such cases, or immediately start another parse. In a future release, a reset method will be provided to do this more cleanly.

Also note that you must create a scan token and pass it back in on each call. This ensures that things don't get done out of sequence. When you call parseFirst() or parse(), any previous scan tokens are invalidated and will cause an error if used again. This prevents incorrect mixed use of the two different parsing schemes or incorrect calls to parseNext().
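
For illustration, such a handler might look like the following sketch. The class name and its 'done as soon as the first element arrives' criterion are invented for this example; only the getDone() accessor, which the loop above polls, comes from the code shown there. It assumes the handler derives from the SAX HandlerBase class:

#include <sax/HandlerBase.hpp>

class FirstElementHandler : public HandlerBase
{
public:
  FirstElementHandler() : fDone(false) {}

  // Called for each start tag the scanner spits out. A real handler
  // would test for whatever criteria it actually cares about before
  // setting the flag.
  void startElement(const XMLCh* const name, AttributeList& attrs)
  {
    fDone = true;
  }

  // The progressive parse loop polls this to decide whether to keep
  // calling parseNext()
  bool getDone() const { return fDone; }

private:
  bool fDone;
};

You would install an instance of it as the parser's document handler before calling parseFirst().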
 
 
 
The system now supports loadable message text, instead of having it hard coded into the program. The current drop still just supports English, but it can now support other languages. Anyone interested in contributing any translations should contact us. This would be an extremely useful service.

In order to support the local message loading services, we have created a pretty flexible framework for supporting loadable text. Firstly, there is now an XML file, in the src/NLS/ directory, which contains all of the error messages. There is a simple program, in the Tools/NLSXlat/ directory, which can spit out that text in various formats. It currently supports a simple 'in memory' format (i.e. an array of strings), the Win32 resource format, and the message catalog format. The 'in memory' format is intended for very simple installations or for use when porting to a new platform (since you can use it until you get your own local message loading support done.)

In the src/util/ directory, there is now an XMLMsgLoader class. This is an abstraction from which any number of message loading services can be derived. Your platform driver file can create whichever type of message loader it wants to use on that platform. We currently have versions for the in memory format, the Win32 resource format, and the message catalog format. An ICU one is present but not implemented yet. Some of the platforms can support multiple message loaders, in which case a #define token is used to control which one is used. You can set this in your build projects to control the message loader type used.

Both the Java and C++ parsers emit the same messages for a given XML error, since they are taken from the same message file.
 
In a preliminary move to support Schemas, and to make them first class citizens just like DTDs, the system has been reworked internally to make validators completely pluggable. So now the DTD validator code is under the src/validators/DTD/ directory, with a future Schema validator probably going into its own directory under src/validators/. The core scanner architecture now works completely in terms of the framework/XMLValidator abstract interface and knows almost nothing about DTDs or Schemas. For now, if you don't pass in a validator to the parsers, they will just create a DTDValidator. This means that, theoretically, you could write your own validator. But we would not encourage this for a while, until the semantics of the XMLValidator interface are completely worked out and proven to handle DTDs and Schemas cleanly.
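
As a very rough sketch of what plugging in a validator could look like, the fragment below hands a DTDValidator to the parser explicitly. The constructor-adoption mechanism and the default DTDValidator constructor shown here are assumptions, not documented API, and the header path simply follows the src/validators/DTD/ layout mentioned above:

#include <parsers/SAXParser.hpp>
#include <validators/DTD/DTDValidator.hpp>

void ParseWithExplicitValidator(const XMLCh* const fileToParse)
{
  // Assumption: the parser accepts a validator at construction time
  // and adopts it (i.e. owns and deletes it). If none is given, the
  // parser just creates a DTDValidator itself, as described above.
  SAXParser myParser(new DTDValidator);
  myParser.parse(fileToParse);
}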
 
Another abstract framework, added in the src/util/ directory, supports pluggable transcoding services. The XMLTransService class is an abstract API that can be derived from to support any desired transcoding service. XMLTranscoder is the abstract API for a particular instance of a transcoder for a particular encoding. The platform driver file decides what specific type of transcoder to use, which allows each platform to use its native transcoding services, or the ICU service if desired.

Implementations are provided for the Win32 native services, the ICU services, and the iconv services available on many Unix platforms. The Win32 version only provides native code page services, so it can only handle XML files in the intrinsically supported encodings: ASCII, UTF-8, UTF-16 (big/little endian), UCS-4 (big/little endian), the EBCDIC code pages IBM037 and IBM1140, ISO-8859-1 (aka Latin-1), and Windows-1252. The ICU version provides all of the encodings that ICU supports. The iconv version will support the encodings supported by the local system. You can use the transcoders we provide, or create your own if you feel ours are insufficient in some way, or if your platform requires an implementation that we do not provide.
 
 