I need to read and ‘separate’ a CSV format file into 55 fields. My problem is that the length of each record in the file can be up to 858 characters in length.
When I try something like the following, there is no problem.
1 #CSV-INPUT-REC (A253)
.
.
READ WORK FILE RECORD #CSV-INPUT-REC
.
.
SEPARATE #CSV-INPUT-REC INTO #A#B etc.
However NAT 3.1.5 on Z/OS does not allow the variable to be defined any larger and SEPARATE does not appear to work with group fields or arrays. For obvious reasons I would prefer to use the SEPARATE statement rather than a manual process.
Does anyone have any suggestions? Thanks in advance.
if the CSV file is simple, i.e. strings not surrounded by double quotes, a combination of manual and automatic separation is possible. Obviously you already have defined the whole record into portions of A253 (or similar) fields. Separate each of these portions into its fields. Then combine the last field of one portion with the first field of the prior portion and you will get what you want. You have to catch the special cases that the last character of a portion is the delimiter character or blank.
If the CSV is complex, i.e. strings may be sourrounded by double quotes, this is more complex. It makes no difference if you are in NAT3 or NAT4. You have to delete the double quotes and to keep delimiters within double quotes. So you have to transfer byte by byte.
As long as SAG does not add a TYPE CSV [[WITH] [DELIMITER] delim] to the DEFINE WORK FILE, this is still a manual doing.
I guess you have to do it completely manually. Because with your logic, you don’t recognize if the last field of a portion is complete or not.
A suitalbe solution would be to redefine each portion with (A1/1:253).
Otherwise you are forced to work with a separate-loop and a remainder. And this is not very fast, I guess…
DEFINE DATA LOCAL
1 C_ANZ-FIELDS(I2) CONST <100> /* Max number of Fieles expected
1 #CSV-FIELD(A50/C_ANZ-FIELDS) /* Max length of a Field is 50 Byte!
1 #INFILE(A100/10) /* 1000 Bytes to read
1 #CSV-INDEX(I2) /* Pointer to the current CSV-Field
1 #WORK-INDEX(I2) /* Pointer to the #infile()
1 #A250(A250) / Workfield
1 I(I2) /* Workinteger
END-DEFINE
READ WORK 1 #INFILE(*)
PERFORM S-SEPARATE2CSV
FOR I = 1 TO 50
IF #CSV-FIELD(I) NE ´ ´
DISPLAY I #CSV-FIELD(I)
END-IF
END-FOR
END-WORK
DEFINE S-SEPARATE2CSV /*
RESET #WORK-INDEX #CSV-INDEX #A250
REPEAT
EXAMINE #A250 FOR ´;´ GIVING NUMBER I /* Count delimiters
IF I < 2 AND #WORK-INDEX < 10 /* only one or less
ADD 1 TO #WORK-INDEX /* take next #infile
COMPRESS #A250#INFILE(#WORK-INDEX) TO #A250 LEAVING NO SPACE
END-IF
ADD 1 TO #CSV-INDEX
SEPARATE #A250 INTO #CSV-FIELD(#CSV-INDEX)
REMAINDER #A250
WITH DELIMITER ´;´
WHILE #A250 NE ´ ´
END-REPEAT
*
END-SUBROUTINE
*
END
Now do a separate #a into #array1() giving number #num1
separate #B into #array2() giving number #num2
separate #c into #array3() giving number #num3
separate #d into #array4() giving number #num4
Now consider the “interface” between #array1 and #array2.
two possibilities
either an “exact split” between #a and #b (delimiter at #a position 250
or #b position 1)
not
If 1) - you are done there
If 2) - you must “concatenate” #a (#num1) and #b (1)
For your information I have requested that the file be provided to me with fixed length fields rather than CSV format.
I have discovered that it is possible that " characters are present in the CSV file which complicates things more than I would like.
The processing becomes more complex when attempting to rejoin the fields that may have been inadvertently separated (E.g. fields containing embedded blanks) and also when separating the array variables into the actual variables and I really want to keep it as simple as possible.
Example
The following simulation of what may occur when joining the last occurrence of the first array with the first occurrence of the second array.
DEFINE DATA
LOCAL
1 #A(A50/2)
1 #B(A50/2)
1 #C(A50)
END-DEFINE
* #A(1) := ‘BLAH BLAH BLAH’ #A(2) := ‘JOH’ #B(1) := ‘N SMITH’
*
COMPRESS #A(2) #B(1) INTO #C
WRITE #C
END
Without checking the original field how can you know whether the second field contained ‘JOHN SMITH’ or ‘JOH N SMITH’ or ‘JOH N SMITH’? No doubt it can be done but the whole thing is becoming quite messy.
Anyway the providers of the CSV file have agreed to change it to a fixed length file with fixed length fields so I dont have to worry about it, but I guess someone might need to one day.
When you have finished this thread, could you please post the working code sample you’ve agreed on to the Natural Open Source forum so we can find it in the future?
I don’t agree. Even with Nat4 you can’t separate CSV files, where text fields are sourrounded by double quotes (and text fields may contain the delimiter character).
And Dave has got such a file:
But I agree if you only have CSV-like files, where fields are separated by the delimiter character, text fields are not surrounded by double quotes and text fields do not contain the delimiter character.