
Re: performance testing of schemas

From: <noah_mendelsohn@us.ibm.com>
Date: Thu, 8 Dec 2005 17:32:52 -0500
To: Bryan Rasmussen <brs@itst.dk>
Cc: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Message-ID: <OFBB37570B.62970ADA-ON852570D1.007B4E0B-852570D1.007C5B52@lotus.com>

Bryan Rasmussen writes:

> I was wondering if anyone has done any comparative performance testing 
> of schema validation in various processors. 

No, but I can give you some theoretical answers based both on experience 
and intuition. I think the results are going to depend a lot on the 
particular processor.  Almost all the features you list can be very well 
optimized, but doing so is not always easy.  For example, substitution 
groups can turn into an ordinary choice once you find all the schema 
documents.  For namespaces and xsi:type, it's difficult to avoid some 
backtracking, but there's a lot you can do if you try hard.  The problem 
is that it's often the mere possibility that you might use these 
constructs that makes things slow.  For example, it's much easier to build 
a fast parser that doesn't know how to do namespaces; a namespace-aware 
parser may be slower even if your particular instance doesn't use them. 
Of course, there's no limit to the goofy ways people might code a 
particular processor, so you really have to test.
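
To make the substitution group point concrete, here is a minimal sketch of how a processor might flatten a substitution group into an ordinary choice once all the schema documents have been read. The class and method names are hypothetical and not taken from any real processor, and the sketch ignores abstract heads and the block/final attributes, which a real implementation would have to honor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: records substitutionGroup declarations and expands
// a head element into the full set of names a plain choice would list.
final class SubstitutionGroups {
    // head element name -> elements that declared substitutionGroup="head"
    private final Map<String, List<String>> directMembers = new HashMap<>();

    // Called once per element declaration that names a substitution group head.
    void declare(String member, String head) {
        directMembers.computeIfAbsent(head, k -> new ArrayList<>()).add(member);
    }

    // Returns the head plus every direct and transitive member.
    Set<String> expand(String head) {
        Set<String> out = new LinkedHashSet<>();
        collect(head, out);
        return out;
    }

    private void collect(String name, Set<String> out) {
        if (!out.add(name)) {
            return; // already visited; also guards against cyclic declarations
        }
        for (String member : directMembers.getOrDefault(name, Collections.<String>emptyList())) {
            collect(member, out);
        }
    }
}
```

Because the expansion happens once, after schema loading, validation itself only ever sees the resulting choice.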

Features like include/import/redefine are mostly handled as the schema 
documents are read in.  As Henry said, good processors will be capable of 
caching the result of such composition or compiling the resulting schema. 
In such cases, they shouldn't cost you anything on the second and 
subsequent validations.
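
As one illustration of compiling once and validating many times, the sketch below uses the JDK's javax.xml.validation API, where a Schema object is the compiled form and a fresh Validator is created per validation. The class name, helper methods, and the tiny inline schema are assumptions made for the example; whether a given processor really caches the composed include/import/redefine result behind this API is implementation dependent.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical wrapper: pay the schema compilation cost once, then reuse it.
final class CompiledSchemaReuse {
    // Tiny example schema (an assumption for the sketch).
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Parses and compiles the schema; this is the expensive, one-time step.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema did not compile", e);
        }
    }

    // Validates one instance against the already compiled schema.
    // A Schema can be shared; a Validator is cheap to create and not thread-safe.
    static boolean isValid(Schema schema, String xml) {
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

A caller would invoke compile once at startup and then call isValid for every incoming document, so only the first use pays for reading and composing the schema documents.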

> effect of size of schema

Really tough to say or to benchmark well.  Most of the algorithms are 
inherently independent of the size of the overall schema, but you can lose 
locality when things get big.  If your processor cache suddenly won't hold 
the code or data structures, performance can fall off in ways that are 
hard to predict.  Similarly, in a language like Java, there might be a 
question as to whether a given implementation creates its objects 
dynamically or statically, or whether you're somehow getting extra garbage 
collection (e.g. because you created so many static objects for the schema 
that all the other dynamic stuff you're doing triggers GC more often).  So, 
you'd not only have to test different processors, you'd want to do it on 
lots of different hardware, vary the memory sizes, try different Java 
JITs, fiddle with GC and heap size parameters, etc.  I wouldn't expect a 
simple, stable curve relating performance to schema size or complexity 
that would apply across a large variety of cases.

As Henry says, compiling or composing the schema documents is in any case 
high overhead and should be considered separately.
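
A rough sketch of what measuring the two costs separately could look like, again using the JDK's javax.xml.validation API. The synthetic schema generator, the class name, and the warm-up and run counts are all assumptions for illustration. Timing with System.nanoTime in a single process is crude, and for the reasons given above (JIT, GC, cache effects) the numbers should be treated as indicative only.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Hypothetical harness: times schema compilation and per-instance
// validation separately for a synthetic schema of a given size.
final class SchemaSizeTiming {
    // Builds a schema with n global element declarations e0..e(n-1); n must be >= 1.
    static String syntheticSchema(int n) {
        StringBuilder sb =
            new StringBuilder("<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>");
        for (int i = 0; i < n; i++) {
            sb.append("<xs:element name='e").append(i).append("' type='xs:int'/>");
        }
        return sb.append("</xs:schema>").toString();
    }

    private static void validateOnce(Schema schema, String xml) throws Exception {
        schema.newValidator().validate(new StreamSource(new StringReader(xml)));
    }

    // Returns { compileNanos, averageValidateNanos }; runs must be >= 1.
    static long[] measure(int n, int warmup, int runs) {
        try {
            String xsd = syntheticSchema(n);
            String xml = "<e0>42</e0>";

            // One-time cost: reading and compiling the schema.
            long t0 = System.nanoTime();
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            long compileNanos = System.nanoTime() - t0;

            // Warm-up iterations give the JIT a chance to compile hot paths.
            for (int i = 0; i < warmup; i++) {
                validateOnce(schema, xml);
            }

            // Steady-state cost: validation against the compiled schema.
            long t1 = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                validateOnce(schema, xml);
            }
            long averageValidateNanos = (System.nanoTime() - t1) / runs;

            return new long[] { compileNanos, averageValidateNanos };
        } catch (Exception e) {
            throw new IllegalStateException("measurement failed", e);
        }
    }
}
```

Calling measure for a range of n (say 10, 100, 1000) and comparing the two numbers per size keeps the compilation overhead from being mixed into the validation figures; a dedicated benchmark harness would be a better choice for anything beyond a quick look.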

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








Bryan Rasmussen <brs@itst.dk>
Sent by: xmlschema-dev-request@w3.org
12/08/05 04:41 AM
 
        To:     "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        performance testing of schemas



Hey
I was wondering if anyone has done any comparative performance testing of
schema validation in various processors. 

Off-hand the metrics that I suppose would be interesting are:
effect of multiple namespaces on performance,
effect of number of includes/imports/redefines
effect of using substitution groups
effect of xsi:type
effect of size of schema
effect of number of constructs - elements/complexTypes 

How much does reuse of types affect performance?

Enumeration lists. 

Test results for any of these items would be really good to know. 
Received on Thursday, 8 December 2005 22:33:04 GMT
