Variable Delimited dataset

I have a flat file which is comma delimited .The length of the file is so huge that it cannot be directly fed in to single variable and has to be read through a group variable so i cannot use SEPARATE command.This is the first concern to tackel that i used COBOL .
The next issue is that being a comma delimited file in one the fields the client is sending a comma as a part of the field due to whicih the that field is gettin splitted up in two parts: i could declare a group variable for that field if that comma was consistent but it is NOT. They are not agreeing to change their layout and ask to recitfy the data on coders end.Also that field i enclosed in " that is case1 :,“field,field”, case2: ," field field", hence the scenario is not consistent.Also from there end the reply is to use the " as an escape character.
How to run a precode processing for this file to remove the , between the field ? And no client lienancy expected believe me i tried to level best :cry:

Education, education, education.

First, if you define a variable as DYNAMIC one variable should suffice. Yes, you can SEPARATE a DYNAMIC variable.

Next problem. Is this the only field that has the quotes? If so, you could first do a SEPARATE with delimiter ‘"’, or just use a Hex delimiter for the quote.

This would give you three “pieces” as output. The string before the field, the field itself, and the string after the field.

Now do a SEPARATE on the first and third pieces. Move the second piece into the field. Presto, you are done.

steve

Hi Aseem,

You can proceed with Steve’s suggestion. Below are my few cents:

I assume you run latest Natural version. If so,

  1. You can define a single alphanumeric variable (max length 1GB now if I am not wrong) to read records from the file. This shouldn’t be an issue.
  2. Then you can use SEPARATE … INTO … WITH DELIMITER
  3. Then COMPRESS the two (where you expect unnecessary delimiters in input) back into single.

This should resolve your issue.

Hope this helps!.

Hi Ats;

Your procedure will not work.

Step 1 is fine.

Step 2 & 3, if I read the problem correctly, will not work.

As Aseem wrote, there is no indicator as to whether there is:

case1 :,“field,field”, or case2: ," field field",

If Aseem does your Step 2, then he must do a Step 2A to search for quotes. That is why I asked (no answer as yet) whether the “unusual field” is the only one that can have a quote mark.

If not, then there must be some criteria to allow this field to by identified.

If yes, either before the SEPARATE with a comma delimiter (my code) there must be a resolution of the unusual field, or, after the SEPARATE (your code) there must be a resolution of the field.

steve

What do you think about this?


0010 DEFINE DATA LOCAL
0020 1 #A      (A) DYNAMIC
0030 1 #POS1   (I4)
0040 1 #LEN    (I4)
0050 1 #FIELDS (A50/100)
0060 END-DEFINE
0070 *
0080 MOVE 'abc,def,ghi,"hkl, mno",pqr,stu,"vwx","yz,abc"'   TO #A
0090 *
0091 PRINT '=' #A
0092 *
0100 REPEAT
0110   EXAMINE #A FOR '"' GIVING POSITION #POS1
0120   IF #POS1 = 0
0130     ESCAPE BOTTOM
0140   END-IF
0150   EXAMINE SUBSTR (#A,#POS1) FOR '"' DELETE FIRST
0160   EXAMINE SUBSTR (#A,#POS1) FOR '"' GIVING POSITION #LEN
0170   EXAMINE SUBSTR (#A,#POS1,#LEN) FOR ',' REPLACE WITH H'FF'
0180   EXAMINE SUBSTR (#A,#POS1) FOR '"' DELETE FIRST
0190 END-REPEAT
0200 *
0210 SEPARATE #A INTO #FIELDS (*) WITH DELIMITERS ','
0220 EXAMINE #FIELDS (*) FOR H'FF' REPLACE WITH ','
0230 *
0240 DISPLAY (ES=ON) #FIELDS (*)
0250 *
0260 END

I acknowledge that Aseem posted this question under the category of Natural on Mainframes, but CSV file processing should be left to the PC platform. To handle all the strange combinations of commas, semi-colons, double quotes, and other special characters, he’ll spend much too much time creating a mainframe program that uses much too much CPU time.

A simpler solution (and less costly when you consider the programming effort), is to purchase a copy of Natural for Windows.

With the Configuration Utility, define the WORK file as CSV with a comma delimiter (semi-colon is the default). The Natural program is very simple:

DEFINE DATA LOCAL 
1 #W 
  2 #A (A3) 
  2 #B (A3) 
  2 #C (A3) 
  2 #D (A3) 
  2 #E (A3) 
END-DEFINE 
DEFINE WORK FILE 1 'C:\csv.txt' TYPE 'CSV' 
READ WORK 1 #W 
  DISPLAY #W 
END-WORK 
END

Here’s the WORK file:

a,b,"c","d,D",e 
001,002,003,004,"X,Y"

Here’s the output:

Page     1                                                   03/23/11  12:14:51 
  
        #W 
  
#A  #B  #C  #D  #E 
--- --- --- --- --- 
  
a   b   c   d,D e 
001 002 003 004 X,Y

No DYNAMIC variables, no SEPARATEs, no EXAMINEs, no concern for embedded commas. Let Natural do all the hard work!

Hi Jerome & Ats,

We will have to wait for Aseem to clarify the problem.

Aseem,

We need a clarification of the problem:

Is there only one field that will have any quotes?

Is there always a field with two quotes?

If there is ,“field1,field2”, what do you want? Is it ,field1 field2, so that the value for one entry is field1 field2?

If there is ,“field1 field2”, what do you want? The same as above? (just a value for the entry of field1 field2?

steve

Ats: the task has to be performed on the mainframe :frowning: wish some other option was there.
Steve : Appreciate the immeadiate response the clarification is as follows:
the records in the dataset are , delimited so whenever the , are encountered they need to splitted up however following
scenarios are present for the records , so the record may appear in one of the following layout:
Please note the vertical layout is to show how a record appers they are not separate records just different fields of a single record.
record1: a1,a2,a3,a4,“a5,a6”,a7,a8,“a9”,a10
output:
a1
a2
a3
a4
a5 a6
a7
a8
a9
a10

record2: a1,a2,“a3”,a4,“a5 a6”,a7,a8,“a9”,a10
output:
a1
a2
a3
a4
a5 a6
a7
a8
a9
a10

the reason i have put a3 in " is because we know the field in which " can come but the occurence is not consistent for that
also if , is comin in one of the above mentioned field it would be enclosed in " like if a5 and a6 are to come in a single field in the
output record then they can come as “a5,a6” or “a5 a6”. The feedback from the client is that whenever is a non delimiter ,
is encountered it would be enclosed in a " and those " has to be used as a escape character for the , that means these , inside the " are to be ignored.
Hope the above clarifications help . Appreciate the help but keep em comin :slight_smile:

To add to “a3” field concept i will state the actual scenario:
the work file has name feilds for the perosn :
like : First Name, last name, full name
Full name is always in " that is "full name " but it can come in two scenarios: either "full name " or “full,name”
for the first name it will appear as follows:
First,last,“full”
but if the name is having a middle name then it would be shown as:
"first middle " ,last,“full name”.
Also i know the fields which may or maynot have the " shown

I had a feeling you were dealing with names.

I assumed that you would want to preserve the comma in the field that is within quotes. From your example, it appears you do not want the comma within your fields. In that case, you can change the examine at line 0170 above to REPLACE WITH ’ ’ and remove the examine at line 220. This code seems to work fine for both of your examples.


0010 DEFINE DATA LOCAL      
0020 1 #A      (A) DYNAMIC  
0030 1 #POS1   (I4)         
0040 1 #LEN    (I4)         
0050 1 #FIELDS (A50/100)    
0060 END-DEFINE             
0070 *                      
0080 MOVE 'a1,a2,a3,a4,"a5,a6",a7,a8,"a9",a10'              TO #A      
0090 *                                                                 
0100 PRINT 'Input String:' #A     // 'Parsed Data:'                    
0110 *                                                                 
0120 REPEAT                                                            
0130   EXAMINE #A FOR '"' GIVING POSITION #POS1                        
0140   IF #POS1 = 0                                                    
0150     ESCAPE BOTTOM                                                 
0160   END-IF                                                          
0170   EXAMINE SUBSTR (#A,#POS1) FOR '"' DELETE FIRST                  
0180   EXAMINE SUBSTR (#A,#POS1) FOR '"' GIVING POSITION #LEN          
0190   EXAMINE SUBSTR (#A,#POS1,#LEN) FOR ',' REPLACE WITH ' '         
0200   EXAMINE SUBSTR (#A,#POS1) FOR '"' DELETE FIRST                  
0210 END-REPEAT                                                        
0220 *                                                                 
0230 SEPARATE #A INTO #FIELDS (*) WITH DELIMITERS ','                  
0240 *                                                                 
0250 DISPLAY NOHDR (ES=ON) #FIELDS (*)                                       
0260 *                                                                 
0270 END                                                               

Output:


Input String: a1,a2,a3,a4,"a5,a6",a7,a8,"a9",a10 
                                                 
Parsed Data:                                     
a1                                               
a2                                               
a3                                               
a4                                               
a5 a6                                            
a7                                               
a8                                               
a9                                               
a10                                              

Input String: a1,a2,"a3",a4,"a5 a6",a7,a8,"a9",a10  
                                                    
Parsed Data:                                        
a1                                                  
a2                                                  
a3                                                  
a4                                                  
a5 a6                                               
a7                                                  
a8                                                  
a9                                                  
a10                                                 

Aseem: Even Natural 4 on Mainframe supports Alphanumeric variables defining with max length of 1073741824 i.e. 1GB and shouldn’t be a problem. Yes, you can define it as DYNAMIC too (can save space :idea: ) as Steve & Jerome suggested.

Yes, I agree with Jerome too, to use EXAMINE and achieve the result, it works fine in your cases.

Ralph’s solution is the great way but I understand that your client is running on Mainframe. I read in your post that

so it will be too hard :?: for you to get Natural for windows :roll:

BTW folks, defining work file with csv type is supported on Unix too, on MF it’s just FORMATTED/UNFORMATTED, are we expecting to have other types on MF as well in next version or SM?