Hello all,
I have a TSV (tab-separated value) file as input to my program.
Is there any way in Natural to read a TSV file?
It would be great if anybody could help me in this regard.
Thanks in advance
Kalyani
I’m not sure if the help you seek is reading the records into your Natural program or parsing the data once it is read.
The following assumes that the file contains end-of-line characters that will allow you to read it one record at a time. If you are saving it from a program like Excel, that should be the case.
You will need to read the file into a text string that is longer than the longest record in your file (including tabs). Then separate the string into pieces using the tab character as your separator. Here is a sample of using SEPARATE:
DEFINE DATA LOCAL
1 #STRING (A40) INIT <'asdf~1234~~4321~fdsa'>
1 #GROUP
2 FIRST (A10)
2 SECOND (A5)
2 THIRD (A5)
2 FOURTH (A5)
*
END-DEFINE
*
SEPARATE #STRING
LEFT JUSTIFIED INTO #GROUP
IGNORE /* #group only has 4 fields, discard the rest of #string
WITH DELIMITERS '~'
*
DISPLAY #STRING #GROUP
*
END
For clarity, I have replaced the tab character with ‘~’. You will need to find out the hexadecimal code for TAB on your system. H’05’ works for us. So…
WITH DELIMITERS '~'
would be…
WITH DELIMITERS H'05'
instead.
The fields within #GROUP must be at least as long as the values that will be assigned to them, or you will get an error. So if the first field in your file is a name, and names can be up to 50 characters long, FIRST must be at least (A50). If you don’t know what the longest value is, make all of your fields (A250) and then you can convert them to whatever length you need.
I haven’t tried tab-delimited files, but I have processed character-delimited files on the mainframe. To use that technique here, I converted all the tab characters (H'05') to colons using the ISPF editor.
Then I use an INPUT statement to separate the fields.
DEFINE DATA LOCAL
1 #A  (A20/10)
1 #N1 (N7.2)
1 #N2 (N7.2)
1 #N3 (N7.2)
END-DEFINE
*
REPEAT
  INPUT (AD=A)
        #A (*)
  IF NOT #A (4) IS (N7.2)           /* Header
  THEN
    ESCAPE TOP
  END-IF
  EXAMINE #A (*) FOR H'7F' DELETE   /* Remove double-quotes
  EXAMINE #A (*) FOR ',' DELETE     /* Remove commas
  IF #A (4) <> ' '                  /* Convert specific fields to Numeric
  THEN
    ASSIGN #N1 = VAL (#A (4))
  ELSE
    RESET #N1
  END-IF
  DISPLAY (2)                       /* Other than screen
          #A (1)
          #A (2)
          #A (3)
          #N1
END-REPEAT
END
You may need more logic to handle headers and blank lines.
In the JCL, the delimited file is presented as input as follows:
//CMSYNIN DD *
%Q
LOGON xxx
GLOBALS IM=D ID=:
TABDEL
/*
// DD DSN=CHAR.DELIMITD,DISP=SHR
// DD *
%/
FIN
/*
//
NOTE: I have NOT benchmarked INPUT vs SEPARATE.
You can read the tab-delimited file into a dynamic variable, then use SEPARATE as Jerome suggests. The #GROUP elements could also be dynamic variables, allowing them to be of any length, then assigned (e.g. MOVE BY NAME) to another group with fixed lengths, if need be.
Something like:
** read tab delimited file using dynamic vars
**
DEFINE DATA
LOCAL
01 #STRING(A) DYNAMIC
01 #GROUP
2 FIRST (A10)
2 SECOND (A5)
2 THIRD (A5)
2 FOURTH (A5)
*
01 #LEN (I4) INIT<0>
01 #I (I4)
END-DEFINE
*
** if you know the max, skip the first read loop
READ WORK FILE 5 #STRING GIVING LENGTH #I
  IF #I > #LEN
    #LEN := #I
  END-IF
END-WORK
CLOSE WORK FILE 5
EXPAND DYNAMIC #STRING TO #LEN
MOVE ALL ' ' TO #STRING UNTIL #LEN
*
READ WORK FILE 5 #STRING
  SEPARATE #STRING
    LEFT JUSTIFIED INTO #GROUP
    IGNORE /* can also save and parse overflow, if needed
    WITH DELIMITERS H'05' /* tab char
  DISPLAY #GROUP
END-WORK
END
Thanks everyone for the quick response.
I will try the SEPARATE command with the tab delimiter
and will let you all know.
Thanks again
Kalyani
Since #STRING is DYNAMIC, the first loop in Doug’s code (to find the max) is not necessary. Simply remove the first READ loop and the program still runs fine.
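For illustration, the trimmed-down program might look something like this (an untested sketch, keeping Doug’s work file number 5 and the H'05' tab delimiter; the dynamic #STRING is resized automatically by each READ WORK FILE):

```
* read tab-delimited file, letting the dynamic variable resize itself
DEFINE DATA LOCAL
1 #STRING (A) DYNAMIC
1 #GROUP
  2 FIRST  (A10)
  2 SECOND (A5)
  2 THIRD  (A5)
  2 FOURTH (A5)
END-DEFINE
*
READ WORK FILE 5 #STRING
  SEPARATE #STRING
    LEFT JUSTIFIED INTO #GROUP
    IGNORE                  /* discard anything beyond the 4 fields
    WITH DELIMITERS H'05'   /* tab char
  DISPLAY #GROUP
END-WORK
END
```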
steve
Thanks everyone. The SEPARATE command with delimiter H'05' worked perfectly for me.
Kalyani
What version of Natural do you need to code the DYNAMIC variable? I can’t seem to get the program to compile the way it is written in the sample program. It gives me the following error: “NAT0169 Error in EM, HD, PM definition or in format/length.”
Assuming I can’t use the DYNAMIC option, is there another way I can read a tab-delimited file with a 2800-byte record length?
Thank you.
James Martinez
email: jmartinez@baxglobal.com
phone: 714-442-7568
BAX Global
Natural 4 and above.
DYNAMIC variables are available with Natural 4.x on the mainframe and 6.x on Open Systems.
Trying to make it even more dynamic
I was experimenting with dynamic variables and X-arrays and am trying to take the challenge one step further. I want to read a work file (CSV in my case; it could be TSV as well, by choosing another separator character) whose number of rows and columns I do not know beforehand into an X-array of dynamic alpha fields.
The program I wrote (see below) seems to work OK, but there is a major drawback: the #INPUT variable I need cannot be dynamic. The compiler tells me that this is only possible with work files of type ‘unformatted’ or ‘portable’. (But in that case I would get the whole file into the dynamic #INPUT string, as in Douglas’ suggestion, which I could not use here because I want to preserve the column/row dimensions of the input.)
Does anyone have a suggestion to that?
* excel into x-array with dynamic alpha-fields
DEFINE DATA LOCAL
1 #ARRAY1
2 #ROWS (1:*)
3 #COLS (1:*)
4 #FIELD (A) DYNAMIC
1 #ROW (I4)
1 #COL (I4)
1 #ROW-MAX (I4) INIT <1>
1 #COL-MAX (I4) INIT <1>
1 #INPUT (A555)
1 #I (I4)
END-DEFINE
*
DEFINE WORK FILE 2 'C:\some_workfile_here'
*
* * first read to determine no. of columns
READ WORK FILE 2 ONCE #INPUT
EXAMINE #INPUT FOR ';' GIVING NUMBER IN #COL-MAX
ADD 1 TO #COL-MAX
WRITE 'number of columns in work file (=EXCEL-sheet):' #COL-MAX
*
* * materialize the x-array
EXPAND ARRAY #ROWS TO (1:#ROW-MAX)
EXPAND ARRAY #COLS TO (*,1:#COL-MAX)
*
READ WORK FILE 2 #INPUT
  SEPARATE #INPUT INTO #ROWS(#ROW-MAX) WITH DELIMITERS ';'
  ADD 1 TO #ROW-MAX
* * materialize the rows-dimension of x-array with every read
  EXPAND ARRAY #ROWS TO (1:#ROW-MAX)
END-WORK
SUBTRACT 1 FROM #ROW-MAX /* the loop leaves the count one too high
*
PRINT '=' #ROW-MAX /
      '=' #COL-MAX
*
#ROW := 3
#COL := 2
SKIP 2
PRINT 'Field' #ROW #COL ':' #FIELD(#ROW, #COL)
*
END
Here is some code which shows one way to go about this. I have a CSV file which has three columns and two rows.
DEFINE DATA LOCAL
1 #A    (A) DYNAMIC
1 #B    (A70)
1 #POS  (I2)
1 #NUM  (I2)
1 #NUM2 (I2)
END-DEFINE
*
DEFINE WORK FILE 1 'i:/playdim1.csv' TYPE 'UNFORMATTED'
*
READ WORK FILE 1 ONCE #A
*
PRINT #A //
EXAMINE #A FOR H'0D0A' GIVING NUMBER #NUM2 GIVING POSITION #POS
EXAMINE SUBSTRING (#A,1,#POS) FOR ',' GIVING NUMBER #NUM
ADD 1 TO #NUM
WRITE '=' #NUM 5X '=' #NUM2 //
*
DEFINE WORK FILE 1 'i:/playdim1.txt'
*
READ WORK FILE 1 #B
  WRITE #B
END-WORK
*
END
And the output:
Page 1 07-08-18 15:36:00
"Qwe11","Qwe12","Qwe13"
"Zxc21","Zxc22","Zxc23"
#NUM: 3 #NUM2: 2
"Qwe11","Qwe12","Qwe13"
"Zxc21","Zxc22","Zxc23"
In other words, you start off by treating the workfile as unformatted, so you can read it into a DYNAMIC variable.
You EXAMINE the dynamic variable to ascertain the number of rows and columns.
Then you can EXPAND rows and columns as you did above. For simplicity, I just read them into a single alpha variable.
The key point is that with a different DEFINE WORK FILE, you can solve the problem you described above.
steve
Great! I wasn’t aware that I could EXAMINE the dynamic variable for columns AND rows: in other words, I wasn’t aware that the CRLF (viz. H'0D0A') was still there, even with type ‘unformatted’ and the whole workfile in one long string.
If reading the unformatted file into a dynamic variable is a problem, just read one character at a time. This is just another alternative (it will probably be a little slower than reading bigger chunks).
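A minimal sketch of that character-at-a-time approach (untested; it assumes the work file is defined with TYPE 'UNFORMATTED' so each READ delivers a single byte, that H'05' is the tab character on your system, and the file name is just a placeholder):

```
* read a tab-delimited file one character at a time
DEFINE DATA LOCAL
1 #CHAR  (A1)
1 #FIELD (A) DYNAMIC
END-DEFINE
*
DEFINE WORK FILE 1 'input.tsv' TYPE 'UNFORMATTED' /* hypothetical name
*
READ WORK FILE 1 #CHAR
  IF #CHAR = H'05'               /* tab: current field is complete
    WRITE #FIELD
    RESET #FIELD
  ELSE                           /* otherwise append the byte to the field
    COMPRESS #FIELD #CHAR INTO #FIELD LEAVING NO SPACE
  END-IF
END-WORK
IF #FIELD NE ' '                 /* flush the last field, if any
  WRITE #FIELD
END-IF
END
```

You would still need end-of-line handling (e.g. also testing #CHAR against H'0D' and H'0A') to keep the row structure, but the loop shows the basic idea.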