[IWE] Gruesome Perl Regular Expression Problem

Ben Tilly iwe@warhead.org.uk
Thu, 1 Dec 2005 07:03:19 -0800


On 12/1/05, Peter Whysall <peter.whysall@serco.com> wrote:
> Here's a record from my data that I want to process:
>
> ! Site
> A1M/9545A,3,S060,L,N,N,N,N, \
> 0,  ! Additional Sites \
> 1,  ! Adjacent Sites \
> M1/5191A,OAJ, \
> 2,  ! Downstream Sites \
> A1M/9553A,NDV,0, \
> M1/5200A,ODV,0, \
> N,  ! Partial Closures Prohibited \
> 1,  ! Upstream Sites \
> A1M/9537A,8,1,3,1,1,ORD,2,2,ORD,3,3,ORD \
>
> As you can see, it's a rather obscure counted format.

Repeat after me, "Regular expressions are for matching, not parsing."
This looks like a parsing problem.  For parsing problems, use the
regular expression engine only to extract pieces: individual tokens if
you need the whole format, or bigger chunks if you just need specific
information.  Then handle any complex formatting logic in code.
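
A minimal sketch of the token route, assuming the layout can be read off
the one sample record: 8 header fields, 2 fields per adjacent site, and
3 fields per downstream site. Those widths are guesses from the sample,
not a specification, and the helper names tokenize and
flag_from_tokens are made up for illustration.

```perl
use strict;
use warnings;

# Sample record copied from the original message.
my $sample = <<'END';
! Site
A1M/9545A,3,S060,L,N,N,N,N, \
0,  ! Additional Sites \
1,  ! Adjacent Sites \
M1/5191A,OAJ, \
2,  ! Downstream Sites \
A1M/9553A,NDV,0, \
M1/5200A,ODV,0, \
N,  ! Partial Closures Prohibited \
1,  ! Upstream Sites \
A1M/9537A,8,1,3,1,1,ORD,2,2,ORD,3,3,ORD \
END

# Hypothetical helper: use regexes only to strip noise and cut tokens.
sub tokenize {
    my ($record) = @_;
    my @tokens;
    for my $line (split /\n/, $record) {
        $line =~ s/\s*\\\s*$//;   # drop the trailing continuation backslash
        $line =~ s/\s*!.*$//;     # drop "! comment" text
        push @tokens, split /\s*,\s*/, $line;
    }
    return @tokens;
}

# Hypothetical helper: the counted-format logic lives in plain code.
sub flag_from_tokens {
    my @t = @_;
    splice @t, 0, 8;                  # ASSUMPTION: 8 header fields
    my $additional = shift @t;
    # The sample has no additional sites, so their width is unknown.
    die "additional-site layout unknown\n" if $additional;
    my $adjacent = shift @t;
    splice @t, 0, 2 * $adjacent;      # ASSUMPTION: 2 fields per adjacent site
    my $downstream = shift @t;
    splice @t, 0, 3 * $downstream;    # ASSUMPTION: 3 fields per downstream site
    return shift @t;                  # the Partial Closures Prohibited flag
}

my $flag = flag_from_tokens(tokenize($sample));
print "partial closures prohibited: $flag\n";
```

The regexes here never need to know about counts; they only clean lines
and split on commas, and the counting is ordinary list handling.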

> I need to produce a report of which of these records has the partial
> closures prohibited flag set to Y.
>
> I'm cool with multiline and extended regexes, and I've got the matching for
> the individual bits down pat.
>
> What I'm struggling to find in the documentation is how to extract the
> number of sites then match that number of sites before matching the next bit
> of the record.

Do a match in scalar context with the /g flag to extract the number of
sites, then match again using the \G assertion and the /g flag, which
resumes where the previous match ended, to match that many sites.
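
A rough sketch of that two-step match, run against the downstream-sites
part of the sample record. The three-field shape of a downstream site
line and the function name closure_flag are assumptions made for
illustration. The /c flag is added so that a failed match does not reset
pos().

```perl
use strict;
use warnings;

# Sample record copied from the original message.
my $record = <<'END';
A1M/9545A,3,S060,L,N,N,N,N, \
0,  ! Additional Sites \
1,  ! Adjacent Sites \
M1/5191A,OAJ, \
2,  ! Downstream Sites \
A1M/9553A,NDV,0, \
M1/5200A,ODV,0, \
N,  ! Partial Closures Prohibited \
1,  ! Upstream Sites \
A1M/9537A,8,1,3,1,1,ORD,2,2,ORD,3,3,ORD \
END

# Hypothetical function: returns the flag that follows the downstream sites.
sub closure_flag {
    my ($rec) = @_;

    # Step 1: scalar-context /g match; pos($rec) is left just past the count line.
    $rec =~ /^(\d+),\s*! Downstream Sites \\\n/mg
        or return;
    my $count = $1;

    # Step 2: \G pins each match to pos($rec), so exactly $count site
    # lines are consumed. ASSUMPTION: each site line has three fields.
    for my $i (1 .. $count) {
        $rec =~ /\G[^,\s]+,[^,\s]+,\d+, \\\n/gc
            or die "expected $count downstream sites, failed at site $i\n";
    }

    # Step 3: whatever comes next is the flag.
    return $rec =~ /\G([YN]),/gc ? $1 : undef;
}

my $flag = closure_flag($record);
print "partial closures prohibited: $flag\n";
```

The count never appears inside a pattern; it only drives an ordinary
loop, and \G keeps each match from skipping ahead.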

> How do I match the digit then use the value of the digit to match a number
> of things?

You can't.

Well actually you can with embedded code assertions, but you really
don't want to go there.

That's why you should use the other approach.
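
For the curious, here is roughly what the embedded-code route looks
like, using Perl's postponed subexpression (??{ ... }) to build a
repeat count from $1 while the match is running. It is shown only to
illustrate why the two-step approach is easier to live with. The input
is a flattened, comment-free fragment invented for this sketch, the
three-field site shape is an assumption, and the function name
flag_one_regex is made up.

```perl
use strict;
use warnings;

# Hypothetical function: one regex that counts, then matches that many sites.
sub flag_one_regex {
    my ($text) = @_;
    # (??{ ... }) runs Perl code mid-match and uses the returned string
    # as a pattern; here it repeats a three-field site $1 times.
    if ($text =~ /^(\d+),((??{ "(?:[^,]+,[^,]+,[^,]+,){$1}" }))([YN]),/) {
        return $3;
    }
    return;
}

# ASSUMPTION: downstream-sites fragment with comments and continuations removed.
my $fragment = "2,A1M/9553A,NDV,0,M1/5200A,ODV,0,N,";
my $flag = flag_one_regex($fragment);
print "partial closures prohibited: $flag\n";
```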

Cheers,
Ben