Reading a tsv file

Hello all,

I have a tsv(tab separated value) file as input to my program.
Is there any way in Natural to read a tsv file.
It would be very great if anybody can help me in this regard.

Thanks in advance

Kalyani

I’m not sure if the help you seek is reading the records into your Natural program or parsing the data once it is read.

The following assumes that the file contains end of line characters that will allow you to read it one record at a time. If you are saving it from a program like Excel, that should be the case.

You will need to read the file using a text string that is longer than the longest record on your file (including tabs). Then separate the string into pieces using the tab character as you separator. Here is a sample of using separate:

DEFINE DATA LOCAL
1 #STRING (A40) INIT <'asdf~1234~~4321~fdsa'>
1 #GROUP
  2 FIRST (A10)  
  2 SECOND (A5)
  2 THIRD  (A5) 
  2 FOURTH (A5)
*
END-DEFINE
*
SEPARATE #STRING
  LEFT JUSTIFIED INTO #GROUP
  IGNORE                  /* #group only has 4 fields, discard the rest of #string
  WITH DELIMITERS '~' 
*
DISPLAY #STRING #GROUP
*
END

For clarity, I have replaced the tab character with ‘~’. You will need to find out the hexadecimal code for TAB on your system. H’05’ works for us. So…

WITH DELIMITERS '~'

would be…

WITH DELIMITERS H'05'

instead.

The fields within #GROUP must be longer than the value that will be assigned to them our you will get an error. So if your first field on your file is a name, and names can be up to 50 characters long, FIELD1 must be at least (A60). If you don’t know what the longest value is, make all of your fields (A250) and then you can convert them to whatever length you need.

I haven’t tried tab-delimited, but I have processed character-delimited files on the mainframe. To use that technique here, I converted all the tab characters (h’05’) to colons using the ISPF editor.

Then I use an INPUT statement to separate the fields.

DEFINE DATA LOCAL                                                         
1 #A (A20/10)                                                             
1 #N1 (N7.2)                                                              
1 #N2 (N7.2)                                                              
1 #N3 (N7.2)                                                              
END-DEFINE                                                                
*                                                                         
REPEAT                                                                    
  INPUT (AD=A)                                                            
        #A (*)                                                            
  IF  NOT #A (4) IS (N7.2)              /* Header                         
    THEN                                                                  
      ESCAPE TOP                                                          
  END-IF                                                                  
  EXAMINE #A (*) FOR H'7F' DELETE       /* Remove double-quotes           
  EXAMINE #A (*) FOR ','   DELETE       /* Remove commas                  
  IF  #A (4) <> ' '               /* Convert specific fields to Numeric   
    THEN                                                                  
      ASSIGN #N1 = VAL (#A (4))                                           
    ELSE                                                                  
      RESET  #N1                                                 
  END-IF                                                         
  DISPLAY (2)                           /* Other than screen     
          #A (1)                                                 
          #A (2)                                                 
          #A (3)                                                 
          #N1                                                    
END-REPEAT                                                       
END                                                              

You may need more logic to handle headers and blank lines.

In the JCL, the delimited file is presented as input as follows.

//CMSYNIN DD *   
%Q   
LOGON xxx   
GLOBALS IM=D ID=:   
TABDEL   
/*   
//       DD DSN=CHAR.DELIMITD,DISP=SHR   
//       DD *   
%/   
FIN   
/*   
//

NOTE: I have NOT benchmarked INPUT vs SEPARATE.

You can read the tab delimited file into a dynamic variable, then use the SEPARATE as Jerome suggests. The #GROUP elements could also be dynamic variables, allowing them to be of any length, then assigned (eg MOVE BY NAME) to another group with fixed lengths, if need be.

Something like:


** read tab delimited file using dynamic vars   
** 
DEFINE DATA  
LOCAL            
01 #STRING(A) DYNAMIC    
01 #GROUP     
  2 FIRST (A10)                   
  2 SECOND (A5)   
  2 THIRD  (A5)     
  2 FOURTH (A5)    
*                          
01 #LEN (I4) INIT<0> 
01 #I (I4)                  
END-DEFINE               
*                             
** if you know the max, skip the first read loop
READ WORK FILE 5 #STRING GIVING LENGTH #I       
  IF #I > #LEN          
    #LEN := #I           
END-WORK               
CLOSE WORK FILE 5  
EXPAND DYNAMIC #STRING TO #LEN  
MOVE ALL ' ' TO #STRING UNTIL #LEN 
*       
READ WORK FILE 5  #STRING  
  SEPARATE #STRING              
    LEFT JUSTIFIED INTO #GROUP
    IGNORE /* can also save and parse overflow, if needed    
    WITH DELIMITER H'05' /* tab char
  DISPLAY #GROUP                               
END-WORK     
END 

Thanks everyone for the quick response :smiley: .
I will try out the separate command option with tab delimiter
and will let you all know.

Thanks again

Kalyani

Since #STRING is DYNAMIC, the first loop in Doug’s code (to find the max) is not necessary. Simply remove the first READ loop and the program still runs fine.

steve

Thanks everyone. Separate command with delimiter H’05’ worked perfectly for me.

Kalyani

What version of Natural do you need to code the DYNAMIC variable? I can’t seem to get the program to compile the way it is written in the sample program. It gives me the following error: “NAT0169 Error in EM, HD, PM definition or in format/length.”

Assuming I can’t use the DYNAMIC option, is there another way I can read a tab delimited file with 2800 bytes record length?

Thank you.
James Martinez
email: jmartinez@baxglobal.com
phone: 714-442-7568
BAX Global

Natural 4 and above.

DYNAMIC variables are available with Natural 4.x on the mainframe and 6.x on Open Systems.

Trying to make it even more dynamic

I was experimenting with dynamic variables and x-arrays and am trying to take the challenge one step further. I want to read a workfile (csv in my case, could be tsv as well by choosing another separator character) whose number of rows and number of columns I do not know beforehand into an x-array of dynamic alpha-fields. The program I wrote (see below) seems to work ok, but there is a major drawback: The #INPUT-Variable I need cannot be dynamic: the compiler tells me that this is only possible with type ‘unformatted’ (< corrected) and type ‘portable’. (But in that case I would get the whole file into the dynamic #INPUT-String like in Douglas’ suggestion which I could not use here because I want to preserve the column / row dimension of the input.)

Does anyone have a suggestion to that?

* excel into x-array with dynamic alpha-fields
DEFINE DATA LOCAL
1 #ARRAY1
  2 #ROWS (1:*)
    3 #COLS (1:*)
      4 #FIELD (A) DYNAMIC
1 #ROW       (I4)
1 #COL       (I4)
1 #ROW-MAX   (I4) INIT <1>
1 #COL-MAX   (I4) INIT <1>
1 #INPUT     (A555)  
1 #I         (I4)
END-DEFINE
*
DEFINE WORK FILE 2 'C:\some_workfile_here'
*
* * first read to determine no. of columns
READ WORK 2 ONCE #INPUT 
EXAMINE #INPUT FOR ';' GIVING NUMBER IN #COL-MAX
ADD 1 TO #COL-MAX     
WRITE 'number of columns in work file (=EXCEL-sheet):' #COL-MAX
*
* * materialize the x-array
EXPAND ARRAY #ROWS TO (1:#ROW-MAX)
EXPAND ARRAY #COLS TO (*,1:#COL-MAX)
*
READ WORK 2 #INPUT
  SEPARATE #INPUT INTO #ROWS(#ROW-MAX) WITH DELIMITER ';'
  ADD 1 TO #ROW-MAX
* * materialize the rows-dimension of x-array with every read
  EXPAND ARRAY #ROWS TO (1:#ROW-MAX)
END-WORK
*
PRINT '=' #ROW-MAX / 
'=' #COL-MAX
*
#ROW := 3
#COL := 2
SKIP 2
PRINT 'Field' #ROW #COL ':' #FIELD(#ROW, #COL)
*
END

Here is some code which shows one way to go about this. I have a CSV file which has three columns and two rows.

DEFINE DATA LOCAL
1 #A (A) DYNAMIC
1 #B (A70)
1 #POS (I2)
1 #NUM (I2)
1 #NUM2 (I2)
END-DEFINE
*
DEFINE WORK FILE 1 ‘i:/playdim1.csv’ TYPE ‘UNFORMATTED’
*
READ WORK FILE 1 ONCE #A
*
PRINT #A //
EXAMINE #A FOR H’0D0A’ GIVING NUMBER #NUM2 GIVING POSITION #POS
EXAMINE SUBSTRING (#A,1,#POS) FOR ‘,’ GIVING NUMBER #NUM
ADD 1 TO #NUM
WRITE ‘=’ #NUM 5X ‘=’ #NUM2 //
*
DEFINE WORK FILE 1 ‘i:/playdim1.txt’
*
READ WORK FILE 1 #B
WRITE #B
END-WORK
*
END

And the output:

Page 1 07-08-18 15:36:00

"Qwe11","Qwe12","Qwe13"
"Zxc21","Zxc22","Zxc23"

#NUM: 3 #NUM2: 2

“Qwe11”,“Qwe12”,“Qwe13”
“Zxc21”,“Zxc22”,“Zxc23”

In other words, you start off by treating the workfile as unformatted, so you can read it into a DYNAMIC variable.

You EXAMINE the dynamic variable to ascertain the number of rows and columns.

Then you can EXPAND rows and columns as you did above. For simplicity, I just read them into a single alpha variable.

The key point is that with a different DEFINE WORK FILE, you can solve the problem you described above.

steve

Great! I wasn’t aware that I could examine the dynamic variable for columns AND rows: in other words I wasn’t aware that the CRLF (viz. H’0D0A’ ) was still there, even with type ‘unformatted’ and the workfile in one long string.

If reading the unformatted file into a dynamic variable is a problem, just read one character at a time. This is just another alternative (it wil probably be a little slower than reading bigger chuncks).