
Re: regex help

From: Xan Gregg <xan.gregg@jmp.com>
Date: Wed, 3 Jan 2007 15:45:39 -0500
Message-Id: <F34E0C65-891D-4C78-A642-37CDC7DD6918@jmp.com>
Cc: "'Tsao, Scott'" <scott.tsao@boeing.com>, <xmlschema-dev@w3.org>
To: "Michael Kay" <mike@saxonica.com>

I think Michael's list option won't work because the quoted string
items can contain whitespace, and a list type splits its value into
items at whitespace.  The all-in-one pattern requires whitespace after
every item, including the last, so I offer the following derivative,
which doesn't:

   <xs:pattern value='("[^"]*"(\s+"[^"]*")*)?'/>

The final '?' is to allow the empty list (no items). Remove it if  
that is not desired.
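
As a rough check of the behaviour described above, here is a minimal
Python sketch. It is not an XML Schema validator: it assumes that
re.fullmatch is a fair stand-in for the implicit anchoring of the
pattern facet, and that Python's \s (which is broader than the schema
\s) behaves the same on these ASCII samples. The names TRAILING_WS,
NO_TRAILING_WS and is_valid are made up for illustration.

```python
import re

# XSD patterns must match the whole value, so re.fullmatch is used here.
# Assumption: Python's \s is close enough to XSD's \s for ASCII input.
TRAILING_WS = re.compile(r'(("[^"]*")\s+)*')             # pattern from the quoted message
NO_TRAILING_WS = re.compile(r'("[^"]*"(\s+"[^"]*")*)?')  # derivative offered above


def is_valid(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


# The example from the original question, including its line break.
sample = '"abc" "de f" "123_456"\n"foo bar" "etc."'

print(is_valid(NO_TRAILING_WS, sample))     # True
print(is_valid(TRAILING_WS, sample))        # False: nothing follows the last item
print(is_valid(TRAILING_WS, sample + " "))  # True: trailing whitespace supplied
print(is_valid(NO_TRAILING_WS, ""))         # True: the final '?' permits the empty list

# A list type tokenizes on whitespace, which breaks items such as "de f" apart.
tokens = sample.split()
print(tokens[1:3])                          # ['"de', 'f"']
```

The last two lines only imitate the whitespace splitting a list type
performs; they show why items containing spaces cannot each match a
quoted-string item type.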

xan

On Jan 3, 2007, at 2:57 PM, Michael Kay wrote:

>
> Looks to me something like
>
> <xs:simpleType name="quotedString">
>   <xs:restriction base="xs:string">
>     <xs:pattern value='".*"'/>
>   </xs:restriction>
> </xs:simpleType>
>
> <xs:simpleType name="listOfQuotedStrings">
>   <xs:list itemType="quotedString"/>
> </xs:simpleType>
>
> or if you don't want to use a list type,
>
> <xs:simpleType name="listOfQuotedStrings">
>   <xs:restriction base="xs:string">
>     <xs:pattern value='(("[^"]*")\s+)*'/>
>   </xs:restriction>
> </xs:simpleType>
>
> ...
>
>> -----Original Message-----
>> From: xmlschema-dev-request@w3.org
>>
>> ...
>>
>> I'm trying to design a W3C XML Schema type description for an
>> element containing an arbitrary number of quoted strings
>> separated by arbitrary whitespace.  The contents of the
>> quoted items are themselves limited to alphanumerics,
>> whitespace, and common punctuation characters, excluding
>> embedded quote characters.  (The double quote here is chosen
>> as an arbitrary delimiter and has no special significance.)
>>
>> Example:
>> "abc" "de f" "123_456"
>> "foo bar" "etc."
>>
>> I'm not aware of a "built-in" XML Schema type that can
>> support this representation directly.  It also appears that
>> the W3C XML Schema "pattern"
>> facet (allowing the specification of a regular expression for a type
>> format) does not support the "non-greedy" quantifier syntax,
>> e.g., "*?", "+?" that is common in many regular expression engines.
>>
>> Can anyone suggest a regex to define this format without the
>> non-greedy quantifiers, or perhaps an XML Schema
>> representation that can handle this format directly?
>>
>
>
Received on Wednesday, 3 January 2007 20:48:00 GMT
