How to remove trailing characters?

Martin_Breiner · July 1, 2019, 6:53pm

I want to remove all trailing “_” from an alphanumeric field defined as A30.

It contains, for example, “_HELLOWORL_D__”. The rest of the field will of course be filled by Natural with trailing blanks.
The expected result is “_HELLOWORL_D” with trailing blanks.

But how to accomplish that?

With EXAMINE I can search backward and delete the characters, but how do I tell EXAMINE to stop as soon as any character other than “_” has been found?
EXAMINE DIRECTION BACKWARD #FIELD-NAME FOR '_' DELETE

With EXAMINE I can also use the pattern method, but how do I tell Natural that the “_” in the pattern is the character to search for, given that “_” in a pattern indicates a single position that is not to be examined?
EXAMINE #FIELD-NAME FOR PATTERN '*_' DELETE

Does anybody have ideas on that?
Thanks!

Jerome_LeBlanc · July 1, 2019, 7:06pm

There may be a more elegant approach, but you could always use brute force:


DEFINE DATA LOCAL                        
1 FIELD (A30)  INIT <'H_LLO_WOR_D_____'> 
1 REDEFINE FIELD                         
  2 FIELD-A (A1/30)                      
1 #IX (I4)                               
END-DEFINE                               
*                                        
WRITE '=' FIELD                          
FOR.                                     
FOR #IX = 30 TO 1 STEP -1                
  DECIDE ON FIRST VALUE OF FIELD-A(#IX)  
    VALUE '_'   RESET FIELD-A(#IX)       
    VALUE ' '   IGNORE                   
    NONE VALUE  ESCAPE BOTTOM (FOR.)     
  END-DECIDE                             
END-FOR                                  
WRITE '=' FIELD                          
END

Ralph_Zbrog · July 1, 2019, 11:01pm

I will presume that Jerome’s logic is acceptable to Martin. That is, "_HELLOWORL_D_ __" (note the embedded space in the trailing underscores) is correctly translated into "_HELLOWORL_D".

The solution is simple once you have a list of "good" characters - all values excluding space and underscore. Then, examine backwards for the first good character in the string, and reset all bytes following that position.

DEFINE DATA LOCAL
1 #TEXT (A10)      INIT <'_del_x_ _'>
1 #P (I4)
1 #GOOD (A256)     INIT <H'000102030405060708090A0B0C0D0E0F'
                       - H'101112131415161718191A1B1C1D1E1F'
                       - H'202122232425262728292A2B2C2D2E2F'
                       - H'303132333435363738393A3B3C3D3E3F'
                       - H'404142434445464748494A4B4C4D4E4F'
                       - H'505152535455565758595A5B5C5D5E5F'
                       - H'606162636465666768696A6B6C6D6E6F'
                       - H'707172737475767778797A7B7C7D7E7F'
                       - H'808182838485868788898A8B8C8D8E8F'
                       - H'909192939495969798999A9B9C9D9E9F'
                       - H'A0A1A2A3A4A5A6A7A8A9AAABACADAEAF'
                       - H'B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF'
                       - H'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF'
                       - H'D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF'
                       - H'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF'
                       - H'F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'
                        >
1 REDEFINE #GOOD
  2 #G (B1/0:255)
END-DEFINE
EXAMINE #GOOD FOR ' ' DELETE /* SPACE      is not a good character
EXAMINE #GOOD FOR '_' DELETE /* Underscore is not a good character
*
EXAMINE DIRECTION BACKWARD
        #TEXT FOR #G (0:253) /* Exclude blanks at the end of #GOOD
        GIVING POSITION #P
ADD 1 TO #P
MOVE ' ' TO SUBSTRING (#TEXT, #P)
WRITE 'after:' #TEXT
END

If “_HELLOWORL_D_” (with a trailing underscore due to the embedded space) is the correct solution, remove line 24 of the listing (the EXAMINE #GOOD FOR ' ' DELETE statement) to make SPACE a good character.

Jerome_LeBlanc · July 1, 2019, 11:35pm

I knew Ralph would have a slick way to do it. Now if Steve Robinson will chime in on performance!

Steve_Robinson · July 7, 2019, 11:14pm

Been a bit hectic here; we just moved 30 miles north to escape urban blight, so response a bit late.

If Martin’s post is a complete description of the problem, I think the most efficient solution would be akin to Jerome’s solution.

Something like:

FOR.
FOR #IX = 30 TO 1 STEP -1
  IF FIELD-A(#IX) EQ '_'
    RESET FIELD-A(#IX)
  ELSE
    ESCAPE BOTTOM (FOR.)
  END-IF
END-FOR
WRITE '=' FIELD

In other words, if it is an underscore, change it to a blank. As soon as you see a non-underscore, stop.

Helmut_Spichtinger · July 8, 2019, 3:07pm

Another solution. The expression IF #FIELD = MASK (*'_'/) checks whether the last character in the character string is an underscore.

DEFINE DATA LOCAL
1 #FIELD (A30) INIT <'H_LLO_WOR_D _X ___ __'>
END-DEFINE
*
WRITE 'before:' #FIELD
REP.
REPEAT
  IF #FIELD = MASK (*'_'/)
    EXAMINE DIRECTION BACKWARD #FIELD FOR '_' DELETE FIRST
  ELSE
    ESCAPE BOTTOM (REP.)
  END-IF
END-REPEAT
WRITE 'after :' #FIELD
END

Output:
before: H_LLO_WOR_D _X ___ __
after : H_LLO_WOR_D _X

Ralph_Zbrog · July 8, 2019, 9:39pm

When very few underscores are present, my version is not the fastest (although pretty close), but with a reasonable number of trailing underscores, my CPU usage is half that of the other techniques. Here are the results (in CPU seconds) with 100k iterations.

Page     1                                                   07/08/19  12:19:31
 
Text: 12345678901234567890123456789012345678901234567890
 Ralph: 000.07
 Steve: 000.04
Helmut: 000.07
 
Text: 1234567890123456789012345678901234567890123456789_
 Ralph: 000.12
 Steve: 000.10
Helmut: 000.15
 
Text: 1234567890123456789012345_________________________
 Ralph: 000.73
 Steve: 001.35
Helmut: 002.33
 
Text: 1_________________________________________________
 Ralph: 001.29
 Steve: 002.61
Helmut: 004.17
 
Text: __________________________________________________
 Ralph: 001.34
 Steve: 002.68
Helmut: 004.21

As you can see in the code, I reduced the table of valid characters to those found on a US keyboard. Your mileage may vary.

DEFINE DATA LOCAL
1 #MT (I4)         CONST <5>
1 #TL (I4)         CONST <50>
1 #TEXT (A50/#MT)  INIT <'12345678901234567890123456789012345678901234567890'
                        ,'1234567890123456789012345678901234567890123456789_'
                        ,'1234567890123456789012345_________________________'
                        ,'1_________________________________________________'
                        ,'__________________________________________________'
                        >
                   (HD='Text')
1 REDEFINE #TEXT
  2 #OCC (#MT)
    3 #T (A1/#TL)
1 #MG (I4)         CONST <93>
1 #GOOD (A93)                          /* excludes '_'
                   INIT <'abcdefghijklmnopqrstuvwxyz'
                        -"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        -'0123456789'
                        -'`~!@#$%^&*()-=+[{]}\|''";:,<.>/?'
                        >
1 REDEFINE #GOOD
  2 #G (B1/#MG)
1 #CPU (I4)
1 #L (I4)          CONST <100000>
1 #FOR (I4)
1 #C (I4)
1 #I (I4)
1 #J (I4)
1 #K (I4)
1 #P (I4)
END-DEFINE
FORMAT PS=30
ASSIGN #CPU = *CPU-TIME
FOR #I = 1 #L
  RESET INITIAL #TEXT (*)
END-FOR
ASSIGN #FOR = *CPU-TIME - #CPU
*
FOR #I = 1 #MT
  WRITE '=' #TEXT (#I)
  /*
  /*                                   Ralph
  ASSIGN #CPU = *CPU-TIME
  FOR #J = 1 #L
    RESET INITIAL #TEXT (*)
    EXAMINE DIRECTION BACKWARD
            #TEXT (#I) FOR #G (*)
            GIVING POSITION #P
    IF  #P <> #TL
      THEN
        ADD 1 TO #P
        MOVE ' ' TO SUBSTRING (#TEXT (#I), #P)
    END-IF
  END-FOR
  ASSIGN #C = *CPU-TIME - #CPU - #FOR
  WRITE ' Ralph:' #C (EM=999'.'99)
  /*
  /*                                   Steve
  ASSIGN #CPU = *CPU-TIME
  FOR #J = 1 #L
    RESET INITIAL #TEXT (*)
    F. 
    FOR #K = #TL 1 -1
      IF  #T (#I, #K) = '_'
          RESET #T (#I, #K)
        ELSE 
          ESCAPE BOTTOM (F.) 
      END-IF
    END-FOR
  END-FOR
  ASSIGN #C = *CPU-TIME - #CPU - #FOR
  WRITE ' Steve:' #C (EM=999'.'99)
  /*
  /*                                   Helmut
  ASSIGN #CPU = *CPU-TIME
  FOR #J = 1 #L
    RESET INITIAL #TEXT (*)
    R. 
    REPEAT 
      IF  #TEXT (#I) = MASK (*'_'/)
          EXAMINE DIRECTION BACKWARD #TEXT (#I) FOR '_' DELETE FIRST 
        ELSE 
          ESCAPE BOTTOM (R.) 
      END-IF 
    END-REPEAT
  END-FOR
  ASSIGN #C = *CPU-TIME - #CPU - #FOR
  WRITE 'Helmut:' #C (EM=999'.'99)
  SKIP 1
END-FOR
END

Steve_Robinson · July 9, 2019, 3:23am

The performance comparison shown above is hardly valid. Somehow, “my code” (which has no two-dimensional arrays) has two uses of a two-dimensional array (lines 64 and 65 of the listing, the IF and RESET on #T (#I, #K)), while Ralph’s code has one use of a two-dimensional array. It is not surprising that when the data causes many executions of lines 64 and 65, Ralph’s code performs more efficiently.
Also, the original posting is for an A30 field, not an A50 field.
When I get a chance (see my earlier posting about being in the middle of a move), I will rewrite the performance comparison and post the results here.

Ralph_Zbrog · July 9, 2019, 9:02am

Ralph & Helmut examine a string while Steve redefines the string as an array to interrogate individual bytes.

I added an array of strings to compare different counts of underscores. This might skew the results a bit, but even without the additional array, RalphCPU < SteveCPU < HelmutCPU when there are more than a handful of underscores.

Martin_Breiner · July 9, 2019, 10:22am

Thanks for all the good replies. I was inspired by the solution of Jerome and will be using this.

system · July 9, 2019, 11:55am

Variation of Ralph’s solution (include only "good" characters in scan table, no DELETE required)

    DEFINE DATA LOCAL  
    1 #TEXT (A10)      INIT <'_del_x_ _'>  
    1 #P (I4)  
    1 #GOOD (A62)     INIT <'abcdefghijklmnopqrstuvwxyz'
                           - 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                           - '0123456789'
                           >
    1 REDEFINE #GOOD  
      2 #G (B1/1:62)  
    END-DEFINE  
    *  
    EXAMINE DIRECTION BACKWARD  
            #TEXT FOR #G (*)
            ABSOLUTE
            GIVING POSITION #P  
    ADD 1 TO #P  
    MOVE ' ' TO SUBSTRING (#TEXT, #P)  
    WRITE 'after:' #TEXT  
    END
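
One caveat with this variation, offered as an untested assumption rather than something verified in the thread: if the last byte of #TEXT is itself a good character, #P comes back equal to the field length, and ADD 1 then pushes the SUBSTRING start past the end of the field. Ralph's benchmark program guards against this with IF #P <> #TL, and the same guard could be applied here roughly as follows. The constant #LEN is a name introduced for illustration only.

```natural
DEFINE DATA LOCAL
1 #LEN (I4)        CONST <10>            /* hypothetical name: length of #TEXT
1 #TEXT (A10)      INIT <'_del_x_ _'>
1 #P (I4)
1 #GOOD (A62)      INIT <'abcdefghijklmnopqrstuvwxyz'
                        - 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                        - '0123456789'
                        >
1 REDEFINE #GOOD
  2 #G (B1/1:62)
END-DEFINE
*
EXAMINE DIRECTION BACKWARD
        #TEXT FOR #G (*)
        ABSOLUTE
        GIVING POSITION #P
IF #P < #LEN                             /* only blank out if bytes follow the last good one
  ADD 1 TO #P
  MOVE ' ' TO SUBSTRING (#TEXT, #P)
END-IF
WRITE 'after:' #TEXT
END
```

If no good character is found at all, #P stays 0 and the whole field is blanked, which matches the behaviour of the unguarded version.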

George_Cooper · July 10, 2019, 1:33am

I’m glad Martin found a solution to his liking. I doubt it is the most efficient, but it is straightforward coding, easy to understand, and gets the job done.

I think what this type of problem cries out for is an EXAMINE FOR LEADING, or EXAMINE FOR … DELETE/REPLACE LEADING, like the COBOL EXAMINE statement has always had. I’ve always missed having the “LEADING” option, which could also function as a “TRAILING” option when combined with EXAMINE BACKWARD. The NATURAL developers added the “FIRST” option a while back, but it’s not quite the same. As Helmut’s excellent example illustrates, use of DELETE FIRST for this problem requires an extra MASK (*'_'/) test to determine whether the '_' is actually a trailing character before using an EXAMINE to delete it. (Cool - I had forgotten that the / in the MASK checks for a trailing pattern.) Also, a loop is required to delete or replace multiple '_'s. An EXAMINE with a “LEADING” option could do it all with just one statement. (Hint, hint, Wolfgang.)

I’ve been trying to test the efficiency of the first 2 versions, Jerome’s & Ralph’s, and my version, which is similar to Helmut’s, on my home PC with NaturalONE CE. Using the original (A30) “_HELLOWORL_D__” test string and 10,000 iterations, my & Helmut’s version came in first (CPU = .03 sec), Ralph’s second (.06 sec), and Jerome’s (& Steve’s?) third (.17 sec). I need to change Ralph’s version to the smaller #GOOD array, since the NatONE Profiler showed most of the CPU time spent on the EXAMINE statement running through the #GOOD (A256) array. I expected Ralph’s to be the fastest of all, and I’m not sure why his test results for Helmut’s version are so slow. More testing to come, just for grins: I will bump my loop count up to 100,000 for better resolution, and maybe follow Ralph’s logic to subtract out the FOR loop CPU time.

I can understand Steve’s objection to the 2nd subscript added to the character array method, probably causing extra CPU costs. Maybe Ralph’s code could be changed to rename the #TEXT (A50/5) to #TEXT-ARY (A50/5), add a 1 dimension #TEXT (A50), redefined by #T(A1/50) . Instead of RESET INITIAL #TEXT (*) at the top of each loop, just move the next occurrence, #TEXT-ARY (#J) to #TEXT.
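
A rough sketch of that restructuring, applied only to the byte-by-byte (Steve) loop, might look like this. It is an untested reading of the suggestion above, not code from the thread, and it leaves out the CPU timing. #TEXT-ARY is the suggested new name; the sketch indexes it with #I, the counter that selects the test string in Ralph's program, rather than #J.

```natural
DEFINE DATA LOCAL
1 #MT (I4)             CONST <5>
1 #TL (I4)             CONST <50>
1 #TEXT-ARY (A50/#MT)  INIT <'12345678901234567890123456789012345678901234567890'
                            ,'1234567890123456789012345678901234567890123456789_'
                            ,'1234567890123456789012345_________________________'
                            ,'1_________________________________________________'
                            ,'__________________________________________________'
                            >
1 #TEXT (A50)                            /* one-dimensional work field
1 REDEFINE #TEXT
  2 #T (A1/#TL)                          /* single subscript instead of two
1 #I (I4)
1 #K (I4)
END-DEFINE
*
FOR #I = 1 #MT
  MOVE #TEXT-ARY (#I) TO #TEXT           /* takes the place of RESET INITIAL #TEXT (*)
  F.
  FOR #K = #TL 1 -1
    IF  #T (#K) = '_'
        RESET #T (#K)
      ELSE
        ESCAPE BOTTOM (F.)
    END-IF
  END-FOR
  WRITE '=' #TEXT
END-FOR
END
```

In a timing run, the MOVE would sit at the top of the inner iteration loop so that each technique starts from a fresh copy of the test string.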

Hope Steve’s move goes well, and he can recuperate somewhat before getting back to this. I’m curious why he prefers indexing through a redefined character array for this situation. I thought that in the past he had almost always shown the EXAMINE working on a string to be the most efficient, CPU-wise.

Cheers to all,
George

Steve_Robinson · July 10, 2019, 11:54am

Hi George;

Yes, I have always advocated EXAMINE strings over other approaches, when doing things like ascertaining the number of occurrences in a long (usually more than 10 characters) string. Exceptions, which always seem to rear their head, are when you can exit a FOR loop after just a few iterations. Based on the original post, I thought that was what Martin was dealing with.

Helmut_Spichtinger · July 10, 2019, 4:09pm

If Martin’s post is a complete description of the problem, I have two other solutions:

Solution #1:


DEFINE DATA LOCAL                                                
1 #FIELD         (A30)  INIT <'_HELLOWORL_D_                 '>  
1 #POS           (I4)                                            
1 #LENGTH        (I4)                                            
END-DEFINE                                                       
*                                                                
WRITE 'before:' #FIELD                                           
IF #FIELD = MASK (*'_'/)                                         
  EXAMINE #FIELD FOR '_' WITH DELIMITERS                         
    GIVING POSITION IN #POS                                      
    GIVING LENGTH   IN #LENGTH                                   
* WRITE '=' #POS '=' #LENGTH                                     
  IF #POS > 1                                                    
    #LENGTH := #POS - 1                                          
  END-IF                                                         
  COMPRESS ' ' TO SUBSTRING(#FIELD,#LENGTH)                      
END-IF                                                           
WRITE 'after :' #FIELD                                           
END

Solution #2:


DEFINE DATA LOCAL                                                    
1 #FIELD         (A30)  INIT <'_HELLOWORL_D_________         '>      
END-DEFINE                                                           
*                                                                    
WRITE 'before:' #FIELD                                               
IF #FIELD = MASK (*'_'/)                                             
  EXAMINE DIRECTION BACKWARD #FIELD FOR '_' WITH DELIMITERS '_'      
    REPLACE ' '                                                      
/* one underscore will always remain at the end of the string        
  EXAMINE DIRECTION BACKWARD #FIELD FOR '_' REPLACE FIRST WITH ' '   
END-IF                                                               
WRITE 'after :' #FIELD                                               
END

Jerome_LeBlanc · July 10, 2019, 4:51pm

Can you stand one more solution? I can do it in two lines of code. No promises about efficiency.


DEFINE DATA LOCAL                                   
1 #FIELD (A30)  INIT <'H_LLO_WOR_D_____'>            
1 #ARRAY (A30/30)                                   
END-DEFINE                                          
*                                                   
WRITE '#FIELD:' #FIELD                              
*
SEPARATE #FIELD INTO #ARRAY (*) WITH DELIMITER '_'
COMPRESS #ARRAY (*) INTO #FIELD WITH DELIMITER '_'
*
WRITE '#FIELD:' #FIELD
*                            
END

Ralph_Zbrog · July 10, 2019, 7:41pm

Nice, but I can’t imagine Martin not having an example with multiple embedded underscores.

For example, the O__W in the input below contains 2 underscores, which are reduced to 1.

#FIELD: H_LLO__WOR_D_____
#FIELD: H_LLO_WOR_D

And I would expect this not to perform too well.

Helmut_Spichtinger · July 11, 2019, 9:44am

:-)

sagi_achituv1 · August 26, 2019, 12:22pm

I would just point you to the TRIM function, which lets you suppress leading or trailing blanks.
In combination with an EXAMINE statement to replace the '_' character with a blank, you may have a simple solution.

The TRIM function is relatively new, so you may have missed it.

Sagi

Ralph_Zbrog · August 28, 2019, 5:52pm

More likely that no one mentioned *TRIM because it applies only to DYNAMIC variables, which were not involved in this thread.

sagi_achituv1 · August 29, 2019, 10:45am

*TRIM is not limited to DYNAMIC variables; it can be used for static variables as well.
See https://techcommunity.softwareag.com/ecosystem/documentation/natural/nat911mf/func/func_trim.htm?hi=trim+trimmed
