|  
 | | 
 
 | |  |  |  |  |  |  Building Xerces-C++ with ICU using bundled Perl scripts on Windows |  |  |  |  |  | 
 |  |  | As mentioned earlier, Xerces-C++ can be built in stand-alone mode using
        native encoding support, or with ICU, which adds support for more than 180
        different encodings. ICU stands for International Components for Unicode and is an
        open source distribution from IBM. You can get
        the ICU libraries from
        IBM's developerWorks site
        or go to the ICU
        download page
        directly. |  | Important: Please remember that ICU and
        Xerces-C++ must be built with the same compiler,
        preferably with the same version. You cannot, for example,
        build ICU with a threaded version of the xlC compiler and
        build Xerces-C++ with a non-threaded one. | 
 There are two options for building Xerces-C++ with ICU. One is to use the
        MSDEV GUI environment, and the other is to invoke the compiler from the
        command line. Using the GUI environment requires editing the project files.
        Here, we describe only the second option, which uses the
        Perl script 'packageBinaries.pl'.

        Prerequisites:
          Perl 5.004 or higher
          Cygwin tools or MKS Toolkit
          zip.exe

        Extract the Xerces-C++ source files from the .zip archive using WinZip, say
        into the root directory of an arbitrary drive x:. This should create a directory like
        'x:\xerces-c-src1_5_2'.

        Extract the ICU files, using WinZip, into the root directory of the disk
        where you installed the Xerces-C++ sources. After extraction, there
        should be a new directory 'x:\icu' which contains all the ICU
        source files.

        Start a command prompt to get a new shell window. Make sure you have
        perl, the cygwin tools (uname, rm, cp, ...), and zip.exe somewhere in the
        path. Next, set up the environment for MSVC using
        'VCVARS32.BAT' or a similar file. Then, at the prompt,
        enter: |  |  |  |  |  | set XERCESCROOT=x:\xerces-c-src1_5_2
set ICUROOT=x:\icu
cd x:\xerces-c-src1_5_2\scripts
perl packageBinaries.pl -s x:\xerces-c-src1_5_2 -o x:\temp\xerces-c1_5_2-win32 -t icu |  |  |  |  |  | 
(Match the source directory to your system; the target directory can be
        anything you want.) If everything is set up correctly, you should see a
        binary drop created in the target directory specified above. This script
        builds both ICU and Xerces-C++ and copies the files relevant to the binary
        drop to the target directory. For a description of the available options, you can enter: | 
 
 
 
 | |  |  |  |  |  |  I wish to port Xerces to my favourite platform. Do you have any suggestions? |  |  |  |  |  | 
 |  |  | All platform-dependent code in Xerces has been
      isolated to a couple of files, which should ease the porting
      effort. Here are the basic steps that should be followed to
      port Xerces.

        The directory src/util/Platforms contains the
        platform-sensitive files, while src/util/Compilers contains
        all files that are sensitive to the development environment. Each operating
        system has a file of its own, and each development environment
        has another one of its own too.

        As an example, the Win32 platform has a
        Win32Defs.hpp file
        and the Visual C++ environment has a VCPPDefs.hpp file.
        These files set up certain define tokens, typedefs,
        constants, etc. that drive the rest of the code to
        do the right thing for that platform and development
        environment. AIX/CSet have their own AIXDefs.hpp and CSetDefs.hpp files, and so on. You should create new
        versions of these files for your platform and environment
        and follow the comments in them to set up your own.
        The comments in the Win32 and Visual C++ files are probably
        the best to follow, since that is where the main
        development is done.

        Next, edit the file XercesDefs.hpp, which is where all
        of the fundamental definitions are brought into the system. You will
        see conditional sections in there where the above
        per-platform and per-environment headers are brought in.
        Add the new ones for your platform under the appropriate
        conditionals.

        Now edit AutoSense.hpp. Here we set canonical Xerces
        internal #define tokens which indicate the platform and
        compiler. These definitions are based on known platform
        and compiler defines.
        AutoSense.hpp is included in XercesDefs.hpp, and the
        canonical platform and compiler settings thus defined
        cause the corresponding platform and compiler headers to be
        included at compilation.

        This file can be a little tricky to decipher, so be
        careful. If you are using, say, another compiler on Win32,
        it will probably use similar tokens, so the platform
        will be picked up by what is already there.

        Once this is done, you will need to implement a
        version of the platform utilities for your platform.
        Each operating system has a file which implements some
        methods of the XMLPlatformUtils class, specific to that
        operating system. These are not terribly complex, so it
        should not be a lot of work. The Win32 version is called
        Win32PlatformUtils.cpp, the AIX version is AIXPlatformUtils.cpp, and so on. Create one for your
        platform, with the correct name, and empty out all of the
        implementation so that just the empty shells of the
        methods are there (with dummy returns where needed to make
        the compiler happy). Once you have done that, you can start
        to get it to build without any real implementation.

        Once you have the system building, start
        implementing your own platform utilities methods. Follow
        the comments in the Win32 version as to what they do. The
        comments will be improved in subsequent versions, but they
        should be fairly obvious now. Once you have these
        implementations done, you should be able to start
        debugging the system using the demo programs.

        Other concerns are:

          Does ICU compile on your platform? If not, then you will need to
          create a transcoder implementation that uses your local transcoding
          services. The Iconv transcoder should work for you, though perhaps
          with some modifications.

          What message loader will you use? To get started, you can use the
          "in memory" one, which is very simple and easy. Then, once you get
          going, you may want to adapt the message catalog message loader, or
          write one of your own that uses local services.

        That is the work required in a nutshell! | 
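
The porting steps above can be pictured with a small, self-contained sketch. It is not taken from the Xerces sources: the MYPORT_* tokens, the myportPathSeparator constant, and the MyPlatformUtilsStub class with its methods are all hypothetical names chosen for illustration. The sketch assumes the general pattern described above: an AutoSense-style block maps predefined platform and compiler macros to canonical tokens, a Defs-style block uses those tokens to select per-platform definitions, and a platform utilities class starts out as empty shells with dummy returns so that the system builds before any real implementation exists.

```cpp
// Sketch only: every MYPORT_* token and every name in MyPlatformUtilsStub
// is hypothetical and does not come from the Xerces-C++ sources.
#include <cassert>
#include <string>

// Step 1, AutoSense-style: translate predefined platform and compiler
// macros into canonical tokens that the rest of the code can test.
#if defined(_WIN32)
  #define MYPORT_WIN32 1
  #define MYPORT_PLATFORM_NAME "win32"
#elif defined(_AIX)
  #define MYPORT_AIX 1
  #define MYPORT_PLATFORM_NAME "aix"
#elif defined(__linux__)
  #define MYPORT_LINUX 1
  #define MYPORT_PLATFORM_NAME "linux"
#else
  #define MYPORT_UNKNOWN_PLATFORM 1
  #define MYPORT_PLATFORM_NAME "unknown"
#endif

// Check __clang__ before __GNUC__, because Clang defines both.
#if defined(_MSC_VER)
  #define MYPORT_COMPILER_NAME "msvc"
#elif defined(__clang__)
  #define MYPORT_COMPILER_NAME "clang"
#elif defined(__GNUC__)
  #define MYPORT_COMPILER_NAME "gcc"
#else
  #define MYPORT_COMPILER_NAME "unknown"
#endif

// Step 2, Defs-style: per-platform definitions selected by canonical token.
#if defined(MYPORT_WIN32)
  static const char myportPathSeparator = '\\';
#else
  static const char myportPathSeparator = '/';
#endif

// Step 3: platform utilities as empty shells with dummy returns, so the
// build succeeds before any real implementation is written.
struct MyPlatformUtilsStub {
    static unsigned long currentMillis() { return 0; }            // dummy return
    static bool fileExists(const std::string&) { return false; }  // dummy return
};

// Reports which canonical platform and compiler tokens were selected.
std::string describeBuild() {
    return std::string(MYPORT_PLATFORM_NAME) + "/" + MYPORT_COMPILER_NAME;
}

// Confirms that the stubs compile, link, and return their dummy values.
void smokeTestStubs() {
    assert(MyPlatformUtilsStub::currentMillis() == 0);
    assert(!MyPlatformUtilsStub::fileExists("anything"));
}
```

Calling describeBuild() from a small test program is a quick way to confirm that the conditionals pick up a new platform or compiler before the stubs are replaced with real implementations one method at a time.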
 
 | |  |  |  |  |  |  What should I define XMLCh to be? |  |  |  |  |  | 
 |  |  | XMLCh should be defined to be a type suitable for holding a
           UTF-16 encoded (16-bit) value, usually an unsigned short.

           All XML data is handled within Xerces-C++ as strings of
           XMLCh characters. Regardless of the size of the
           type chosen, the data stored in variables of type XMLCh
           will always be UTF-16 encoded values.

           Unlike XMLCh, the encoding
           of wchar_t is platform dependent. Sometimes it is UTF-16
           (AIX, Windows), sometimes UCS-4 (Solaris,
           Linux), and sometimes it is not based on Unicode at all
           (HP/UX, AS/400, System 390).

           Some earlier releases of Xerces-C++ defined XMLCh to be the
           same type as wchar_t on most platforms, with the goal of making
           it possible to pass XMLCh strings to library or system functions
           that were expecting wchar_t parameters. This approach has
           been abandoned because of:

              Portability problems with any code that assumes that
              the types of XMLCh and wchar_t are compatible.

              Excessive memory usage, especially in the DOM, on
              platforms with 32-bit wchar_t.

              UTF-16 encoded XMLCh is not always compatible with
              UCS-4 encoded wchar_t on Solaris and Linux. The
              problem occurs with Unicode characters with values
              greater than 64K (above 0xFFFF); in UCS-4 the value is stored as
              a single 32-bit quantity. With UTF-16, the value
              is stored as a "surrogate pair" of two 16-bit
              values. Even with XMLCh equated to wchar_t, Xerces would
              still create the UTF-16 encoded surrogate pairs, which
              are illegal in UCS-4 encoded wchar_t strings.
                | 
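
The surrogate pair issue described above can be made concrete with a short sketch. It is an illustration of the standard UTF-16 encoding rule, not code from Xerces-C++: the type Utf16Unit is a hypothetical stand-in for a 16-bit XMLCh, and appendUtf16 and isSurrogate are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 16-bit XMLCh; not the actual Xerces-C++ typedef.
typedef std::uint16_t Utf16Unit;

// True if the value lies in the surrogate range 0xD800..0xDFFF. Such values
// are not valid characters on their own in a UCS-4 string.
bool isSurrogate(std::uint32_t value) {
    return value >= 0xD800 && value <= 0xDFFF;
}

// Appends the UTF-16 encoding of one Unicode code point to `out`.
// Returns the number of 16-bit units written (1 or 2), or 0 if the
// code point is out of range or is itself a surrogate value.
std::size_t appendUtf16(std::uint32_t codePoint, std::vector<Utf16Unit>& out) {
    if (codePoint > 0x10FFFF || isSurrogate(codePoint)) {
        return 0;
    }
    if (codePoint < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<Utf16Unit>(codePoint));
        return 1;
    }
    // Above 0xFFFF: split the 20 remaining bits into a high and low surrogate.
    const std::uint32_t offset = codePoint - 0x10000;
    out.push_back(static_cast<Utf16Unit>(0xD800 + (offset >> 10)));
    out.push_back(static_cast<Utf16Unit>(0xDC00 + (offset & 0x3FF)));
    return 2;
}
```

For example, the code point U+1D11E is a single 32-bit value in UCS-4, but appendUtf16 emits the two units 0xD834 and 0xDD1E. If those two units were copied one by one into a 32-bit wchar_t string, each element would hold a surrogate value, which is the incompatibility described above.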
 
 
 | 
 |