Extended MASK functionality with MASK$(<...>)

I often need to do more complex analyses of strings than MASK offers, so I have written a function that provides extended mask functionality. I am attaching it here because it may help others with similar tasks.
However, please be aware that it has not yet been used in production code, so it should be considered only ‘alpha’ tested.
Here’s the program header info:
(Note: I had to convert (<…>) to (…) below because otherwise this message editor truncates all parameters but the first. Also, the extension .NS7 is not allowed, so I renamed the attached file to .TXT.)


* Function : MASK$(string,mask,[set],[start],[length])
*
* Action   : Applies a variable MASK test with extended mask chars
*
* Returns  : TRUE or FALSE   (L)
*
* Parameters: String          (A)
*
*             Mask            (A1-250)
*
*             Set             (A1-250)       Optional
*
*             Start           (Numeric)      Optional
*
*             Length          (Numeric)      Optional
*
*  String : Any string
*
*           Max length of data within the string variable is 2Gb
*
*  Mask  :  Any Natural variable mask characters
*           plus extended characters as follows:
*
*     @  =  Any letter or digit, but not space
*     ^  =  Space
*     +  =  Tab
*     |  =  Any of the delimiters Tab, Pipe, Comma or Semi-Colon.
*     -  =  Any of the date format delimiters Dash, Slash or Period.
*     {} =  Hex string eg {0D0A09} = carriage return, line feed, tab
*     ~  =  Any number (or none) of the next mask char. See Note!
*           Can be followed by m[:n] where m and n 
*           are the min and max number of the char to find:
*              ~0x is the same as ~x
*              ~1x is at least one of the mask character x
*              ~2:5x scans for 2 to 5 instances of the character x
*           Note that this test finds the longest match it can.
*           Therefore, without a max, the subsequent mask test char
*           (after x) must not be a subset of x, as this will fail,
*           since all the string x's will have been passed by the ~x.
*     ¬  =  Any one char which is NOT the next mask char. See Note!
*           Can be used with ~[m:n], eg ~1:2¬x
*           (but not directly before the ~)
*     [] =  Optional characters, to be accepted if possible.
*           Note that preceding mask chars are tested independently,
*           whereas subsequent are tested together with the optional,
*           so YY[YY] means YY, optionally followed by another YY,
*           whereas [YY]YY means either YY or YYYY.
*
*   Date elements are tested as one date 
*   until a date element is repeated.
*   Thus 'dd-mm...yy-mm-dd' means 
*   'dd-mm...yy' followed by '-mm-dd'
*
*   Quote literals with single quotes (apostrophe), bearing in mind
*   that two adjacent single quotes always represent a single quote.
*         e.g. '~"AB"' = any 'A', then a 'B'
*         '~"A""B"' = any 'A', then a single quote, then a 'B'
*         With TQ off use '' in place of " in the examples above.
*
*  Note!   With ~ and ¬ and in Set (below), 
*    you may only use mask chars which represent a single char
*    and which are not a wildcard or range.
*    For example A,N,S,P,^,+,@ and 'x' are valid;
*    likewise the same with a leading ¬
*    But .,?,*,%,~,DD,JJJ,MM,YY,0,0-0,[] and / are not valid
*    The one exception is that ¬/ is allowed at the end of the mask.
*    Results for invalid settings are unpredictable.
*
*  Set    : An optional set of mask chars,
*           represented by $ in the mask.
*           For example, the set @^ is equivalent to C.
*           $ must not be used in the mask unless Set is specified.
*           See Note above.
*
*  Start  : In  - Only look for a match from position Start
*           Out - The start position in the WHOLE string 
*                 at which the match was found,
*                 not counting a leading * or % or ~ match portion.
*
*  Length:  In  - Only test Length characters of the string
*           Out - How long the first piece of string was
*                 that matched the mask,
*                 not counting a leading * or % or ~ match portion.
*                 Note. The max matched length may exceed
*                 the string's length if the mask after that point
*                 allows blanks.
*
* Mask Examples:
*
*     ¬A       Any character except a letter.
*     ~@       Any letters or digits.
*     ~¬C      Any chars which are not letters or digits or space.
*     ~^~$     Any spaces, then any chars which are in Set.
*     ¬$       Not a Set char, 
*                 e.g. if Set = '+-*/=' then any char but +-*/=
*     ~1¬@     One or more chars which are not letters or digits.
*     *A¬/     String contains a letter which is not at the end.
*     *P{0D}P  String contains a carriage return 
*              surrounded by printable characters.
*
* Call Examples:
*
*     1. IF MASK$( #String,'*|*¬|/' )
*
*           is TRUE if #String contains a delimiter,
*                      and does not end with a delimiter.
*
*     2. IF MASK$( #String,'~1:5$/','"AEIOUaeiou"' )
*
*           is TRUE if #String consists of 1 to 5 vowels only.
*
*     3. IF MASK$( #String,'*DD-MM-[YY]YY',,50,20 )
*
*           is TRUE if SUBSTRING(#String,50,20) 
*                      contains a date as dd,mm,yy or yyyy
*                      separated by dashes, slashes or periods.
*
*     4. RESET #S #L
*        IF MASK$( #String,'*U*","N',,#S,#L )
*
*           is TRUE if #String contains an upper case letter
*                   followed later by a comma & digit.
*                   #S returns the position of U in the string
*                   and #L returns the length
*                   from U to first ,N inclusive.
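
The longest-match note on ~ is the easiest rule in the header to trip over, so here is a small Python sketch of it. It is not the attached Natural code, only a hypothetical model that assumes ~ consumes the longest run it can and never gives characters back, as the note describes. The names CLASSES, greedy_run and match_tilde_then are made up for this illustration, and only four mask chars are modelled.

```python
# Hypothetical model of the "~" longest-match rule; NOT the MASK$ source.
# Assumption: "~" consumes the longest run it can and never backs off.

CLASSES = {
    "A": str.isalpha,                # Natural mask char A: a letter
    "N": str.isdigit,                # Natural mask char N: a digit
    "@": lambda c: c.isalnum(),      # extended: letter or digit, not space
    "^": lambda c: c == " ",         # extended: space
}


def greedy_run(text, pos, mask_char, min_n=0, max_n=None):
    """Consume the longest run of mask_char starting at pos.

    Returns the position after the run, or None if fewer than
    min_n characters matched.
    """
    test = CLASSES[mask_char]
    end = pos
    while (end < len(text) and test(text[end])
           and (max_n is None or end - pos < max_n)):
        end += 1
    return end if end - pos >= min_n else None


def match_tilde_then(text, run_char, next_char, min_n=0, max_n=None):
    """Model the mask '~[m:n]<run_char><next_char>' tested from position 0."""
    pos = greedy_run(text, 0, run_char, min_n, max_n)
    if pos is None or pos >= len(text):
        return False                 # run too short, or nothing left to test
    return CLASSES[next_char](text[pos])
```

Under these assumptions, `match_tilde_then("abc1", "@", "N")` fails because @ also covers digits, so the run swallows the `1` and nothing is left for N. Giving the run a maximum, as in `match_tilde_then("abc1", "@", "N", max_n=3)`, leaves the digit for N and the test succeeds.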

MASK.TXT (27 KB)

… sounds interesting. I’ll try it out next time I need it.

BTW: I think Software AG should implement some kind of regular expressions. The MASK option is not as powerful as regex in other languages.
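
For comparison, here is roughly how call examples 2 and 4 from the header might look as regular expressions in Python. This is only a sketch: the equivalences are my assumption and have not been checked against MASK$, the helper names only_vowels and upper_then_comma_digit are invented, and trailing blanks are stripped on the assumption that the trailing / in a mask ignores them.

```python
import re

# Assumed regex counterparts of two masks from the header (not verified
# against MASK$ itself).
VOWELS_1_TO_5 = re.compile(r"[AEIOUaeiou]{1,5}")                  # '~1:5$/' with Set = vowels
UPPER_THEN_COMMA_DIGIT = re.compile(r"[A-Z].*?,[0-9]", re.DOTALL)  # '*U*","N'


def only_vowels(text):
    """True if text, ignoring trailing blanks, is 1 to 5 vowels and nothing else."""
    return VOWELS_1_TO_5.fullmatch(text.rstrip(" ")) is not None


def upper_then_comma_digit(text):
    """Return (start, length) of the first match, 1-based like #S and #L, or None."""
    m = UPPER_THEN_COMMA_DIGIT.search(text)
    if m is None:
        return None
    return m.start() + 1, m.end() - m.start()
```

The non-greedy `.*?` is there so that the match stops at the first comma and digit, which is how the header describes #L for example 4.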

Maybe the checkbox “Disable HTML in this post” helps…