Purging Composite Instances: Oracle SOA Suite 11g


To purge all instances in Oracle SOA Suite 11g (11.1.1.4), run the following PL/SQL block. Note that the underlying table structure differs slightly between 11.1.1.2 and 11.1.1.4, so these instructions are specific to 11.1.1.4 and may not work on older or newer versions.
sqlplus dev_soainfra/welcome1@orcl

DECLARE
  FILTER INSTANCE_FILTER := INSTANCE_FILTER();
  DELETED_INSTANCES NUMBER;
BEGIN
  -- Match every instance created from 2001-01-01 up to now
  FILTER.MIN_CREATED_DATE := TO_TIMESTAMP('2001-01-01','YYYY-MM-DD');
  FILTER.MAX_CREATED_DATE := SYSDATE;
  -- Delete up to 40,000,000 matching instances
  DELETED_INSTANCES := FABRIC.DELETE_ALL(FILTER, 40000000, true);
END;
/
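
If you want to schedule this purge rather than run it by hand, the same anonymous block can be submitted over JDBC. Here is a minimal sketch, assuming the Oracle JDBC driver is on the classpath; the connection URL and host are assumptions, and the credentials mirror the sqlplus example above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SoaPurgeJob {
    public static void main(String[] args) throws Exception {
        // URL/host are placeholders; credentials match the sqlplus example above
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//soadbhost:1521/orcl", "dev_soainfra", "welcome1");
        try {
            Statement stmt = conn.createStatement();
            // Same anonymous PL/SQL block as above
            stmt.execute("DECLARE\n"
                    + "  FILTER INSTANCE_FILTER := INSTANCE_FILTER();\n"
                    + "  DELETED_INSTANCES NUMBER;\n"
                    + "BEGIN\n"
                    + "  FILTER.MIN_CREATED_DATE := TO_TIMESTAMP('2001-01-01','YYYY-MM-DD');\n"
                    + "  FILTER.MAX_CREATED_DATE := SYSDATE;\n"
                    + "  DELETED_INSTANCES := FABRIC.DELETE_ALL(FILTER, 40000000, true);\n"
                    + "END;");
            stmt.close();
        } finally {
            conn.close();
        }
    }
}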

Applicable Versions:

· Oracle SOA Suite 11g (11.1.1.4)

Performance Tuning: Oracle SOA Suite 11g


When installed out of the box, Oracle SOA Suite 11g performs adequately for a development environment. In this post, I describe the key areas to focus on to improve performance.

I have conducted light and heavy load tests on both synchronous and asynchronous Mediator services (not BPEL yet), and my findings and recommendations are documented below.
 

In summary, here are my findings:

· Upgrading to PS3 addresses memory instability issues
· Size your JVM appropriately
· Moving from Sun JDK to JRockit results in a 32% performance improvement
· Increasing the Mediator worker threads results in a 30% performance improvement for async services
· Changing the audit level from Development to Production results in a 46% performance improvement
· Changing the audit level from Production to Off results in a further 61% performance improvement
· Tuning the audit configuration makes the Production and Off audit levels perform equally
· Implementing parallel processing of routing rules may improve Mediator performance by anywhere from 4% to 509%
· I recommend parallel garbage collection, but have no statistics to back that up


1. Apply the PS3 patchset

I cannot stress this enough: upgrade to SOA Suite PS3 (11.1.1.4). This resolves an enormous number of memory issues and will save you a lot of pain.

2. Determine the size needed for your Java Heap Space

If you are running AIA Foundation Pack, then realistically you will need to set your Java Heap Space to between 6GB and 10GB, depending on the number of EBOs you are loading and the number of composites you have. If you are running just SOA Suite, then you could manage with 2GB to 4GB. To size it, proceed as follows:

1. Start up your SOA server, and wait until all composites are loaded
2. Log in to the WebLogic Admin Console (at http://soaserver:7001/console)
3. Navigate to Home --> Servers --> soa_server1 --> Monitoring --> Performance
4. Click on "Garbage Collect"
5. Inspect the Heap Free Current value
6. Though every environment is different, increase your heap space until at least 2 GB is free

The Java Heap Space is configured in $MW_HOME/user_projects/domains/soa_domain/bin/setSOADomainEnv.sh as follows:
PORT_MEM_ARGS="-Xms6144m -Xmx6144m ...
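
As a sanity check, the same numbers the console shows can be read from the Runtime API. A minimal sketch follows; it reports on whichever JVM it runs in, so it is only meaningful if executed inside the server JVM (e.g. from a deployed JSP), not as a standalone program:

public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // freeMemory/totalMemory correspond to the console's "Heap Free Current"
        // and current heap size; maxMemory reflects -Xmx
        System.out.println("Heap free:  " + rt.freeMemory() / mb + " MB");
        System.out.println("Heap total: " + rt.totalMemory() / mb + " MB");
        System.out.println("Heap max:   " + rt.maxMemory() / mb + " MB");
    }
}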

3. Set your PermSize (for Sun JDK only)

I recommend a min and max of either 1GB or 1.5GB for your PermSize.

The PermSize is configured in $MW_HOME/user_projects/domains/soa_domain/bin/setSOADomainEnv.sh as follows:
PORT_MEM_ARGS="... -XX:PermSize=1024m -XX:MaxPermSize=1024m ...

4. Set your Nursery Size (for JRockit only)

Generally speaking, the Nursery Size should be around 30% of your Java Heap Space; for example, a 6GB heap calls for a nursery of roughly 2GB.

The Nursery Size is configured in $MW_HOME/user_projects/domains/soa_domain/bin/setSOADomainEnv.sh as follows:
PORT_MEM_ARGS="... -Xns2048m ...

5. Use parallel garbage collection (for Sun JDK only)

Garbage collection (and some other recommended settings) is configured in $MW_HOME/user_projects/domains/soa_domain/bin/setSOADomainEnv.sh as follows:
PORT_MEM_ARGS="... -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:-TraceClassUnloading ...

6. Use JRockit instead of Sun JDK

My performance testing has shown an average 32% performance improvement when using JRockit versus Sun JDK.

7. Modify and tune the Audit Level

Adjusting and tuning the Audit Level is necessary; as the numbers below show, it has a major impact on performance.

1. Log in to the Fusion Middleware Control (at http://soaserver:7001/em)
2. Navigate to Farm_soa_domain --> SOA --> (right-click on) soa-infra --> SOA Administration --> Common Properties --> More SOA Infra Advanced Configuration Properties...
3. Click on "Audit Config"
4. Set the following values and click on "Apply" afterwards:

AuditConfig/compositeInstanceStateEnabled = false    <-- you must set this for it to work!
AuditConfig/level = Production
AuditConfig/policies/Element_0/isActive = false
AuditConfig/policies/Element_0/name = Immediate
AuditConfig/policies/Element_1/isActive = true
AuditConfig/policies/Element_1/name = Deferred
AuditConfig/policies/Element_1/properties/Element_0/name = maxThreads
AuditConfig/policies/Element_1/properties/Element_0/value = 10
AuditConfig/policies/Element_1/properties/Element_1/name = flushDelay
AuditConfig/policies/Element_1/properties/Element_1/value = 5000
AuditConfig/policies/Element_1/properties/Element_2/name = batchSize
AuditConfig/policies/Element_1/properties/Element_2/value = 100
My performance tests have shown that changing from Development to Production results in an average of 46% performance improvement (without tuning the audit settings).

My performance tests have shown that changing from Production to Off results in an average 61% performance improvement (without tuning the audit settings).

My performance tests have shown that if you apply the audit tuning described above, both Production and Off perform the same, effectively giving you the 61% improvement while keeping the Production audit level.

8. Modify Mediator Service Engine properties (for Mediator only)

Manipulating the threads will have a direct impact on the ability of the server to process asynchronous Mediator services.

1. Log in to the Fusion Middleware Control (at http://soaserver:7001/em)
2. Navigate to Farm_soa_domain --> SOA --> (right-click on) soa-infra --> SOA Administration --> Mediator Properties
3. Edit the settings as follows:

Metrics Level:                      Disabled
Parallel Worker Threads:            12       <-- experiment with increasing this value
Parallel Maximum Rows Retrieved:    600      <-- set to 50x the above setting
Parallel Locker Thread Sleep:       1        <-- reduces waits for parallel routing rules
My performance tests have shown that changing from 4 worker threads and 200 max rows to 12 worker threads and 600 max rows results in an average 30% performance improvement for asynchronous Mediator services and 0% for synchronous Mediator services.

9. Reduce soa-infra Log Levels

Unless required, reduce the SOA Suite Log Levels to error.

1. Log in to the Fusion Middleware Control (at http://soaserver:7001/em)
2. Navigate to Farm_soa_domain --> SOA --> (right-click on) soa-infra --> Logs --> Log Configuration
3. Set all Log Levels to ERROR:1 (SEVERE)
4. Repeat this for all soa_server managed servers in your cluster
 

My performance tests have shown that reducing this from NOTIFICATION:1 (INFO) to ERROR:1 (SEVERE) results in an average 7% performance improvement.

10. Perform Database Tuning

These are the database parameters I used in a large-scale implementation. Remember, however, that they are based on that database's AWR reports, so you should not apply them blindly; they are here for your reference and resulted in some improvement in our environment.
AUDIT_TRAIL            = NONE
DB_FILES               = 1024
DISK_ASYNCH_IO         = TRUE 
FILESYSTEMIO_OPTIONS   = SETALL
JOB_QUEUE_PROCESSES    = 10
LOG_BUFFER             = 209715200 
NLS_SORT               = BINARY
SESSION_CACHED_CURSORS = 500
PROCESSES              = 1500
SESSION_MAX_OPEN_FILES = 50
UNDO_MANAGEMENT        = AUTO
PLSQL_CODE_TYPE        = NATIVE
MEMORY_TARGET          = 0
SGA_TARGET             = 6g 
PGA_AGGREGATE_TARGET   = 2g 
TRACE_ENABLED          = FALSE

11. Modify BPEL Process Manager Properties (for BPEL only)

Manipulating the threads will have a direct impact on the ability of the server to process asynchronous BPEL processes.

1. Log in to the Fusion Middleware Control (at http://soaserver:7001/em)
2. Navigate to Farm_soa_domain --> SOA --> (right-click on) soa-infra --> SOA Administration --> BPEL Properties
3. Edit the settings as follows:

Dispatcher System Threads:    10          <-- increase from the default value of 2 to at least 10
Dispatcher Invoke Threads:    20          <-- depends on whether your targets can handle the load
Dispatcher Engine Threads:    30          <-- should not exceed the sum of the two values above
Payload Validation:           Disabled
Honestly, these numbers will vary based on the types of process designs you have.


Applicable Versions:

· Oracle SOA Suite 11g (11.1.1.4)

Oracle SOA - BPEL Master & Detail Co-ordination Using Signals



Introduction:

A BPEL process can communicate with another BPEL process just as it can with any web service, since BPEL processes expose web service interfaces to the world (or at least to their fellow components in the same composite application). When one process (the master, in this discussion) calls another (the detail process, for the purposes of this article), it can have several types of interaction with, and dependency on, that other process:
  • it is not interested at all in the detail process: its call was a one-way, fire-and-forget invocation
  • it is interested in the response and will wait for it before continuing (synchronous calls always do this; asynchronous calls can have some activity going on while the detail process is churning away)
  • it is interested in the fact that the detail process has reached a certain stage, but does not actually need a response (it wants a signal but no data)
The Signal and ReceiveSignal activities are Oracle extensions to BPEL (they only work on the Oracle BPEL engine) that help us implement the third scenario.

As part of the Invoke activity from one BPEL process to another, we can specify that the called process should be considered a Detail process (and therefore the calling process the Master process). When we have established this Master-Detail relationship, we can next create a Signal-ReceiveSignal connection between the two. These connections can be created in both directions: the Master sends a signal to the Detail (and the Detail waits to receive the signal), and vice versa the Detail process sends a signal that the Master is waiting for. Unfortunately, as we will see in this article, we cannot have multiple such interactions between a Master-Detail pair.

Typical use cases for the signal pattern are situations where a master process can only proceed when detail processes have completed or at least reached a certain state (the master process should only send the email to the customer when the detail process has handed the order to the shipping department), or where a master process calls a detail process to start processing and then needs to do some additional work before the detail process(es) can continue to their next step (the master process asks the detail to start gathering quotes from car rental companies, then continues to establish the credit-worthiness of the customer, and when that has been taken care of indicates to the detail process that it may continue processing).

Note: there is nothing Signal and ReceiveSignal can do that we cannot also achieve using asynchronous, correlation-driven calls. However, when we can achieve our goals using signaling, it is usually much easier to implement and lighter-weight to execute than the full-blown correlation-based solution.

Implementing BPEL Master-and-Detail processes:

The requirement in our case is that the Master process invokes the detail process to have it start its work. The Master then performs some additional work, and when that is done it signals the detail process to indicate that it may perform the next step of its work. The master then has to wait for the completion of that work in the detail process. The detail process informs the master that it is done, and the master can continue processing.

We can implement this interaction using Signal and ReceiveSignal.

[figure: the Master process, with the Invoke, Signal and ReceiveSignal activities described below]
Step 1: The Invoke activity has the checkbox Invoke As Detail checked. This results in a correlation being created in the BPEL engine between the Master process instance that makes the call and the Detail process instance that is being called:

<invoke partnerLink="DetailProcess.detailprocess_client"
        portType="ns1:DetailProcess" operation="process"
        bpelx:invokeAsDetail="true"/>

Step 2: The Signal from the Master to the associated Detail instance, allowing the Detail process to continue processing. In the master this signal follows a 30-second Wait activity (for="'PT30S'"); the signal element looks like this (the name and label values are illustrative, as the original snippet was lost):

<bpelx:signal name="Signal_Go_Ahead_To_Detail"
              label="go_ahead"
              to="details"/>

Step 3: The ReceiveSignal [from the Detail process] in the Master: the master will at that point sit and wait until the signal is received from the correlated Detail process instance.

In this example, the DoImportantMasterStuffThatShouldBeCompleteBeforeDetailProceeds activity is a Wait activity that lasts for 30 seconds. The Detail process is implemented as shown in this figure:
[figure: the Detail process design]
In this case, the 1 indicates the initial creation of the Detail process instance – associated with a specific master instance. The detail process is a one-way service – no response is sent to the master process that invokes it.

The detail process starts with a first activity, ExecuteFirstStagesOfDetailProcessThatIsIndependentOfMaster.

Then, at step 2, it reaches a stage where it must have a signal from its master before it may continue processing. The ReceiveSignal activity is configured to listen for a signal from the master process; while it is waiting, processing stops in the detail process. The element looks like this (the element name is illustrative):

<bpelx:receiveSignal name="ReceiveSignal_From_Master"
                     label="primary"
                     from="master"/>

When the signal has been received, the detail process continues with DoYourDetailThings, a wait activity in this demo that lasts for 10 seconds. When the DoYourDetailThings step is complete, the detail process sends out a signal to its master.

At step 3 it performs Signal_to_master_that_detail_is_done and continues processing. When we run the Master process, we also get an instance of the detail process. The message flow trace in the SOA console looks something like this:
[figure: message flow trace of the Master and Detail instances]

What is not supported?

I would like to be able to have the master send multiple signals to the same detail process, and vice versa: for example, to have the master clear the detail process for phase 1, phase 2 and phase 3 of what it does for the master, and something similar in the other direction, where the master waits for the detail process to reach certain stages in its execution. That would allow a somewhat sophisticated handshake between master and detail, like the one shown in the next picture. Unfortunately, this does not work.

We cannot have a second signal sent to a detail process – only one Signal (and ReceiveSignal) can be associated with each Invoke of a detail process.
I tried to implement the requirement that the Master process invokes the detail, then performs some additional work and then signals the detail that it may perform stage one. Then the master has to wait for the completion of stage one in the detail process. (So far it all works fine).
These next two additional signal exchanges are not allowed: the master doing some additional work and subsequently clearing the detail process for the execution of its stage 3, and the master next waiting for a signal from the detail that it has completed stage 2.

Resources:
Download the JDeveloper 11g application: MasterDetail.zip.
SOA Suite Developer's Guide – Chapter 15




XML Parser Performance Analysis in Java: JAXB vs StAX vs Woodstox


Introduction

Over the last couple of weeks I have been working on how to deal with large amounts of XML data in a resource-friendly way, considering performance and other factors. The main problem I was looking to solve was how to process large XML files in chunks while at the same time providing upstream/downstream systems with some data to process.
Of course, for a long time we have been using JAXB; the main advantage of JAXB is the quick time-to-market: if one possesses an XML schema, there are tools out there to auto-generate the corresponding Java domain model classes (Eclipse Indigo, the Maven JAXB plugins and Ant tasks, to name a few). The JAXB API then offers a Marshaller and an Unmarshaller to write/read XML data, mapping the Java domain model.
JAXB as a solution has the disadvantage that it keeps the whole objectification of the XML schema in memory, so the obvious question was: "How would our infrastructure cope with large XML files (in my case with more than 10,000 elements) if we were to use JAXB?" I could simply have produced a large XML file, written a client for it, and found out about the memory consumption.
As one probably knows, there are mainly two approaches to processing XML data in Java: DOM and SAX. With DOM, the XML document is represented in memory as a tree; DOM is useful if one needs cherry-pick access to the tree nodes or if one needs to write brief XML documents. On the other side of the spectrum there is SAX, an event-driven technology where the whole document is parsed one XML element at a time, and for each significant XML event (such as START_DOCUMENT, START_ELEMENT, END_ELEMENT, etc.) a callback is "pushed" to a Java client, which then deals with it. Since SAX does not bring the whole document into memory but applies a cursor-like approach to XML processing, it does not consume huge amounts of memory. The drawback of SAX is that it processes the whole document from start to finish; this is not necessarily what one wants for large XML documents. In my scenario, for instance, I would like to pass XML elements to downstream systems as they become available, but perhaps only 100 elements at a time, implementing some sort of pagination solution. DOM seems too demanding from a memory-consumption point of view, whereas SAX seems too coarse-grained for my needs.
I remembered reading something about StAX, a Java technology offering a middle ground: the capability to pull XML elements (as opposed to having them pushed, as with SAX) while remaining RAM-friendly. I looked into the technology and decided that StAX was probably the compromise I was looking for; however, I wanted to keep the easy programming model offered by JAXB for manipulating the data in the XML elements, so I really needed a combination of the two. While investigating StAX, I came across Woodstox; this open source project promises to be a faster XML parser than many others, so I decided to include it in the benchmark as well. I now had all the elements for a performance analysis giving me memory consumption and processing speed metrics when processing large XML documents.
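
To make the push vs. pull distinction concrete, here is a minimal StAX cursor loop (a sketch of the programming model only, not part of the benchmark code shown later):

import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxPullExample {
    public static void main(String[] args) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileReader(args[0]));
        // The client pulls events forward; nothing is buffered beyond the cursor
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                System.out.println("Element: " + r.getLocalName());
            }
        }
        r.close();
    }
}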
The Performance Analysis plan
In order to create a benchmark (performance analysis) I needed to do the following:
  • Create an XML schema defining my domain model. This would be the input for JAXB to create the Java domain model
  • Create four large XML files representing the model, with 1,000 / 10,000 / 100,000 / 1,000,000 elements respectively
  • Have a pure JAXB client which would unmarshal the large XML files completely in memory
  • Have a StAX/JAXB client which would combine the low memory consumption of SAX-style technologies with the ease of the programming model offered by JAXB
  • Have a Woodstox/JAXB client with the same characteristics as the StAX/JAXB client (in a few words, I just wanted to change the underlying parser and see whether I could obtain any performance boost)
  • Record both memory consumption and speed of processing (e.g. how quickly each solution would make XML chunks available in memory as JAXB domain model classes)
  • Make the results available graphically since, as we know, a picture is worth a thousand words.

The Domain Model XML Schema

<?xml version="1.0" encoding="UTF-8"?>
<!-- Reconstructed schema: the original markup was stripped by the blog
     formatting. Namespace and field names are illustrative; the model is
     persons with names, addresses and an "active" flag, as described below. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://com.poc.xmlparser.example/large-file"
           xmlns:tns="http://com.poc.xmlparser.example/large-file"
           elementFormDefault="qualified">

    <xs:complexType name="personType">
        <xs:sequence>
            <xs:element name="firstName" type="xs:string"/>
            <xs:element name="lastName" type="xs:string"/>
            <xs:element name="address1" type="xs:string"/>
            <xs:element name="address2" type="xs:string"/>
            <xs:element name="city" type="xs:string"/>
            <xs:element name="county" type="xs:string"/>
            <xs:element name="postCode" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="active" type="xs:boolean"/>
    </xs:complexType>

    <xs:complexType name="personsType">
        <xs:sequence>
            <xs:element name="person" type="tns:personType"
                        minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:element name="persons" type="tns:personsType"/>

</xs:schema>
I decided on a relatively simple domain model, with XML elements representing people, with their names and addresses. I also wanted to record whether a person was active.


JAXB to create the Java model

I wanted to use Maven for the ease it brings to build systems. This is the POM I defined for this little benchmark program:


<?xml version="1.0" encoding="UTF-8"?>
<!-- Reconstructed POM: the blog formatting stripped the original tags, so the
     element names below are inferred around the surviving values. The boolean
     flags (extension, removeOldOutput, verbose, addClasspath) are best guesses. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>com.poc.xmlparser.example</groupId>
    <artifactId>large-xml-parser</artifactId>
    <version>1.0</version>
    <packaging>jar</packaging>

    <name>large-xml-parser</name>
    <url>http://www.jemos.co.uk</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.jvnet.jaxb2.maven2</groupId>
                <artifactId>maven-jaxb2-plugin</artifactId>
                <version>0.7.5</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>generate</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <schemaDirectory>${basedir}/src/main/resources</schemaDirectory>
                    <schemaIncludes>
                        <include>**/*.xsd</include>
                    </schemaIncludes>
                    <extension>true</extension>
                    <args>
                        <arg>-enableIntrospection</arg>
                        <arg>-XtoString</arg>
                        <arg>-Xequals</arg>
                        <arg>-XhashCode</arg>
                    </args>
                    <removeOldOutput>true</removeOldOutput>
                    <verbose>true</verbose>
                    <plugins>
                        <plugin>
                            <groupId>org.jvnet.jaxb2_commons</groupId>
                            <artifactId>jaxb2-basics</artifactId>
                            <version>0.6.1</version>
                        </plugin>
                    </plugins>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.3.1</version>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <mainClass>com.poc.xmlparser.tests.xml.XmlPullBenchmarker</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.2</version>
                <configuration>
                    <outputDirectory>${project.build.directory}/site/downloads</outputDirectory>
                    <descriptors>
                        <descriptor>src/main/assembly/project.xml</descriptor>
                        <descriptor>src/main/assembly/bin.xml</descriptor>
                    </descriptors>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.5</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>uk.co.jemos.podam</groupId>
            <artifactId>podam</artifactId>
            <version>2.3.11.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.0.1</version>
        </dependency>
        <dependency>
            <groupId>com.sun.xml.bind</groupId>
            <artifactId>jaxb-impl</artifactId>
            <version>2.1.3</version>
        </dependency>
        <dependency>
            <groupId>org.jvnet.jaxb2_commons</groupId>
            <artifactId>jaxb2-basics-runtime</artifactId>
            <version>0.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.codehaus.woodstox</groupId>
            <artifactId>stax2-api</artifactId>
            <version>3.0.3</version>
        </dependency>
    </dependencies>

</project>

Just a few things to note about this pom.xml:
  • I use Java 6, since starting from this version Java contains all the XML libraries for JAXB, DOM, SAX and StAX.
  • To auto-generate the domain model classes from the XSD schema, I used the excellent maven-jaxb2-plugin, which allows, amongst other things, obtaining POJOs with toString, equals and hashCode support.
The POM also declares the jar plug-in, to create an executable jar for the benchmark program, and the assembly plug-in to distribute an executable version of it. The source code for the analysis is attached to this post, so if you want to build and run it yourself, just unzip the project file, open a command line and run:
$ mvn clean install assembly:assembly
This command will place the *-bin.* files into the folder target/site/downloads. To run the benchmark program, use the following (-Dcreate.xml=true generates the XML files; don't pass it if you already have these files, e.g. after the first run):
$ java -jar -Dcreate.xml=true large-xml-parser-1.0.jar
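
For reference, the flag is just a standard system property; the entry point presumably does something along these lines (a sketch; createXmlFiles() is a hypothetical name for the generation step shown in the next section):

public static void main(String[] args) throws Exception {
    XmlPullBenchmarker main = new XmlPullBenchmarker();
    // -Dcreate.xml=true --> generate the large XML files first
    if (Boolean.getBoolean("create.xml")) {
        main.createXmlFiles();   // hypothetical method wrapping the marshalling code below
    }
    // ... then run the benchmark loop shown in "The Java main loop" ...
}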

Test Data Creation

To successfully run this program I needed test data, for which I used PODAM, a Java tool that auto-fills POJOs and JavaBeans with data. The code is as simple as:

JAXBContext context = JAXBContext
        .newInstance("example.xmlparser.poc.com.large_file");

Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");

// PersonsType is the JAXB-generated container for <person> elements
PersonsType personsType = new ObjectFactory().createPersonsType();
List<PersonType> persons = personsType.getPerson();

// PODAM fills each PersonType with random data
PodamFactory factory = new PodamFactoryImpl();
for (int i = 0; i < nbrElements; i++) {
    persons.add(factory.manufacturePojo(PersonType.class));
}

JAXBElement<PersonsType> toWrite = new ObjectFactory()
        .createPersons(personsType);

File file = new File(fileName);
BufferedOutputStream bos = new BufferedOutputStream(
        new FileOutputStream(file), 4096);

try {
    marshaller.marshal(toWrite, bos);
    bos.flush();
} finally {
    IOUtils.closeQuietly(bos);
}




The XmlPullBenchmarker generates four large XML files under ~/xml-benchmark:
  • large-person-1000.xml (approx 300K)
  • large-person-10000.xml (approx 3M)
  • large-person-100000.xml (approx 30M)
  • large-person-1000000.xml (approx 300M)
Each file looks like the following (tag names follow the reconstructed schema above; the data values are from the original run):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<persons xmlns="http://com.poc.xmlparser.example/large-file">
    <person active="true">
        <firstName>oEfblFDpgh</firstName>
        <lastName>HKOOm6SdqG</lastName>
        <address1>_q7o2rOY7g</address1>
        <address2>2uoBmuwHCp</address2>
        <city>zkXRzrnBR3</city>
        <county>JFFgyz3p06</county>
        <postCode>ejAFKehS2P</postCode>
    </person>
    <person active="true">
        <firstName>ha2qkY3An3</firstName>
        <lastName>Ql8Jb4D6Tu</lastName>
        <address1>2wEio0AEDE</address1>
        <address2>nahAfgIiWS</address2>
        <city>5c7s8jirqY</city>
        <county>kr1rA7jLbH</county>
        <postCode>QyPtAomZBI</postCode>
    </person>

    [..etc]
</persons>

Each file contains 1,000 / 10,000 / 100,000 / 1,000,000 <person> elements respectively.

The Execution environments

This benchmark program is run on different platforms / environments to analyze the results:
  • Ubuntu 10, 32-bit, running as a virtual machine on Windows XP, with a Core 2 Duo P8400 @ 2.26GHz and 4GB RAM, of which 2GB dedicated to the VM. JVM: 1.6.0_25, HotSpot
  • Windows XP, hosting the above VM, therefore with the same processor. JVM: 1.6.0_24, HotSpot
  • Ubuntu 10, 32-bit, 2GB RAM, dual core. JVM: 1.6.0_24, OpenJDK


Strategy for XML unmarshalling

For the XML unmarshalling, three different strategies are used:
  • Pure JAXB
  • StAX + JAXB
  • Woodstox + JAXB

Pure JAXB unmarshalling

The code to unmarshal large XML files using JAXB is below:

private void readLargeFileWithJaxb(File file, int nbrRecords)
            throws Exception {

        JAXBContext ucontext = JAXBContext
                .newInstance("example.xmlparser.poc.com.large_file");
        Unmarshaller unmarshaller = ucontext.createUnmarshaller();

        BufferedInputStream bis = new BufferedInputStream(new FileInputStream(
                file));

        long start = System.currentTimeMillis();
        long memstart = Runtime.getRuntime().freeMemory();
        long memend = 0L;

        try {
            JAXBElement<PersonsType> root = (JAXBElement<PersonsType>) unmarshaller
                    .unmarshal(bis);

            // touch the unmarshalled data to make sure it is all in memory
            root.getValue().getPerson().size();

            memend = Runtime.getRuntime().freeMemory();

            long end = System.currentTimeMillis();

            LOG.info("JAXB (" + nbrRecords + "): - Total Memory used: "
                    + (memstart - memend));

            LOG.info("JAXB (" + nbrRecords + "): Time taken in ms: "
                    + (end - start));

        } finally {
            IOUtils.closeQuietly(bis);
        }

    }



I also accessed the size of the underlying PersonType collection to "touch" the in-memory data. Debugging the application showed that all 10,000 elements were indeed available in memory after this line of code.


JAXB + StAX

For StAX, an XMLStreamReader is used to iterate through all <person> elements, passing each in turn to JAXB to unmarshal it into a PersonType domain model object. The code follows:

    private void readLargeXmlWithStax(File file, int nbrRecords)
            throws FactoryConfigurationError, XMLStreamException,
            FileNotFoundException, JAXBException {

        // set up a StAX reader
        XMLInputFactory xmlif = XMLInputFactory.newInstance();
        XMLStreamReader xmlr = xmlif
                .createXMLStreamReader(new FileReader(file));

        JAXBContext ucontext = JAXBContext.newInstance(PersonType.class);

        Unmarshaller unmarshaller = ucontext.createUnmarshaller();

        long start = System.currentTimeMillis();
        long memstart = Runtime.getRuntime().freeMemory();
        long memend = 0L;

        try {
            xmlr.nextTag();
            xmlr.require(XMLStreamConstants.START_ELEMENT, null, "persons");

            xmlr.nextTag();
            while (xmlr.getEventType() == XMLStreamConstants.START_ELEMENT) {

                JAXBElement<PersonType> pt = unmarshaller.unmarshal(xmlr,
                        PersonType.class);

                if (xmlr.getEventType() == XMLStreamConstants.CHARACTERS) {
                    xmlr.next();
                }
            }

            memend = Runtime.getRuntime().freeMemory();

            long end = System.currentTimeMillis();

            LOG.info("STax - (" + nbrRecords + "): - Total memory used: "
                    + (memstart - memend));

            LOG.info("STax - (" + nbrRecords + "): Time taken in ms: "
                    + (end - start));

        } finally {
            xmlr.close();
        }

    }



Note that this time, when creating the context, I had to specify that it was for the PersonType object, and when invoking the JAXB unmarshalling I also had to pass the desired return type:
JAXBElement<PersonType> pt = unmarshaller.unmarshal(xmlr,
                        PersonType.class);
Note that I don't do anything with the object; I just create it, to keep the benchmark as truthful as possible by not introducing any unnecessary steps.
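
Since the whole point of pulling was pagination, here is a sketch of how the loop above could be adapted to hand out batches of 100 PersonType objects as they are unmarshalled. It reuses the xmlr and unmarshaller variables from the code above (plus java.util.List/ArrayList); downstream is a hypothetical consumer, not part of the original code:

        List<PersonType> batch = new ArrayList<PersonType>(100);
        while (xmlr.getEventType() == XMLStreamConstants.START_ELEMENT) {
            JAXBElement<PersonType> pt = unmarshaller.unmarshal(xmlr,
                    PersonType.class);
            batch.add(pt.getValue());
            if (batch.size() == 100) {
                downstream.process(batch);   // hypothetical downstream consumer
                batch.clear();
            }
            if (xmlr.getEventType() == XMLStreamConstants.CHARACTERS) {
                xmlr.next();
            }
        }
        if (!batch.isEmpty()) {
            downstream.process(batch);       // flush the final partial batch
        }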


JAXB + Woodstox

For Woodstox, the approach is very similar to the one used with StAX. In fact, Woodstox provides a StAX2-compatible API, so all I had to do was provide the correct factory and... bang! I had Woodstox working under the covers.

    private void readLargeXmlWithFasterStax(File file, int nbrRecords)
            throws FactoryConfigurationError, XMLStreamException,
            FileNotFoundException, JAXBException {

        // set up a Woodstox reader
        XMLInputFactory xmlif = XMLInputFactory2.newInstance();
        XMLStreamReader xmlr = xmlif
                .createXMLStreamReader(new FileReader(file));

        JAXBContext ucontext = JAXBContext.newInstance(PersonType.class);

        Unmarshaller unmarshaller = ucontext.createUnmarshaller();

        long start = System.currentTimeMillis();
        long memstart = Runtime.getRuntime().freeMemory();
        long memend = 0L;

        try {
            xmlr.nextTag();
            xmlr.require(XMLStreamConstants.START_ELEMENT, null, "persons");

            xmlr.nextTag();
            while (xmlr.getEventType() == XMLStreamConstants.START_ELEMENT) {

                JAXBElement<PersonType> pt = unmarshaller.unmarshal(xmlr,
                        PersonType.class);

                if (xmlr.getEventType() == XMLStreamConstants.CHARACTERS) {
                    xmlr.next();
                }
            }

            memend = Runtime.getRuntime().freeMemory();

            long end = System.currentTimeMillis();

            LOG.info("Woodstox - (" + nbrRecords + "): Total memory used: "
                    + (memstart - memend));

            LOG.info("Woodstox - (" + nbrRecords + "): Time taken in ms: "
                    + (end - start));

        } finally {
            xmlr.close();
        }

    }



In the above code I obtain the factory through the StAX2 XMLInputFactory2; with Woodstox on the classpath, this resolves to the Woodstox implementation.
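
Note that XMLInputFactory2.newInstance() still goes through the standard StAX factory lookup, so Woodstox is only selected because it is on the classpath. If you want to pin the implementation explicitly, you can instantiate the Woodstox factory directly; this fragment assumes the woodstox-core jar is present:

import org.codehaus.stax2.XMLInputFactory2;
import com.ctc.wstx.stax.WstxInputFactory;

        // bypass the service lookup and use Woodstox directly
        XMLInputFactory2 xmlif = new WstxInputFactory();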

The Java main loop

Once all the required files have been generated and are in place (achieved by passing -Dcreate.xml=true, as mentioned above), the main method performs the following:

            System.gc();
            System.gc();

            for (int i = 0; i < 10; i++) {

                main.readLargeFileWithJaxb(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-1000.xml"), 1000);
                main.readLargeFileWithJaxb(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-10000.xml"), 10000);
                main.readLargeFileWithJaxb(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-100000.xml"),
                        100000);
                main.readLargeFileWithJaxb(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-1000000.xml"),
                        1000000);

                main.readLargeXmlWithStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-1000.xml"), 1000);
                main.readLargeXmlWithStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-10000.xml"), 10000);
                main.readLargeXmlWithStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-100000.xml"),
                        100000);
                main.readLargeXmlWithStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-1000000.xml"),
                        1000000);

                main.readLargeXmlWithFasterStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-1000.xml"), 1000);
                main.readLargeXmlWithFasterStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-10000.xml"), 10000);
                main.readLargeXmlWithFasterStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-100000.xml"),
                        100000);
                main.readLargeXmlWithFasterStax(new File(OUTPUT_FOLDER
                        + File.separatorChar + "large-person-1000000.xml"),
                        1000000);
            }



First, it invites the GC to run, although as we all know this is at the discretion of the GC thread. It then executes each strategy ten times, to normalize RAM and CPU consumption. The final figures are collected by averaging the results of the ten runs.


The Performance Analysis benchmark results

Below are the diagrams showing memory consumption across the different execution environments when unmarshalling the files with 1,000 / 10,000 / 100,000 / 1,000,000 elements respectively.
You will probably notice that memory consumption for the StAX-related strategies often shows a negative value. This means that there was more free memory after unmarshalling all elements than there was at the beginning of the unmarshalling loop; this, in turn, suggests that the GC ran a lot more with StAX than with JAXB. This is logical if one thinks about it: since with StAX we don't keep all objects in memory, there are more objects available for garbage collection. In this particular case I believe the PersonType object created in the while loop becomes eligible for GC, enters the young generation area and then gets reclaimed. This, however, should have minimal impact on performance, since we know that reclaiming objects from the young generation space is done very efficiently.
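
Incidentally, a way to make these memory figures less noisy is to sample used memory (totalMemory minus freeMemory) and request a collection before each sample. A sketch, reusing the LOG from the benchmark code and bearing in mind that System.gc() is only a hint:

        Runtime rt = Runtime.getRuntime();
        System.gc();
        long usedBefore = rt.totalMemory() - rt.freeMemory();
        // ... run one unmarshalling strategy here ...
        System.gc();
        long usedAfter = rt.totalMemory() - rt.freeMemory();
        LOG.info("Approx. retained bytes: " + (usedAfter - usedBefore));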

Performance in the Windows Environment:

[charts: memory consumption and processing time for the JAXB, StAX and Woodstox strategies]
Performance in the Ubuntu Environment:

[charts: memory consumption and processing time for the JAXB, StAX and Woodstox strategies]

Conclusion

From the above analysis, the results in all three environments, although with some differences, tell the same story:
  • If you are looking for performance (e.g. XML unmarshalling speed), choose JAXB
  • If you are looking for low memory usage (and are ready to sacrifice some speed), then use StAX.
Based on the above test scenario (and my personal needs), my opinion is that I wouldn't go for Woodstox; I'd choose either JAXB (if I needed processing speed and could afford the RAM) or StAX (if I didn't need top speed and was low on infrastructure resources). Both of these technologies are Java standards and part of the JDK starting from Java 6.

Reference

Source Code: Download large-xml-parser-1.0-project.zip

Executable: Download large-xml-parser-1.0-bin.zip

Data files: Download Jaxb vs Stax vs Woodstox.zip