'Separating' a 'long' record CSV file

I need to read and 'separate' a CSV-format file into 55 fields. My problem is that each record in the file can be up to 858 characters long.

When I try something like the following, there is no problem.

1 #CSV-INPUT-REC (A253)
.
.
READ WORK FILE RECORD #CSV-INPUT-REC
.
.
SEPARATE #CSV-INPUT-REC INTO #A #B etc.

However, NAT 3.1.5 on z/OS does not allow the variable to be defined any larger, and SEPARATE does not appear to work with group fields or arrays. For obvious reasons I would prefer to use the SEPARATE statement rather than a manual process.

Does anyone have any suggestions? Thanks in advance.

Dave,

if the CSV file is simple, i.e. strings are not surrounded by double quotes, a combination of manual and automatic separation is possible. Obviously you have already defined the whole record as portions of A253 (or similar) fields. Separate each of these portions into its fields. Then combine the last field of one portion with the first field of the prior portion and you will get what you want. You have to catch the special case where the last character of a portion is the delimiter character or a blank.
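
A minimal sketch of that idea (untested; the field names, the array sizes, and the ';' delimiter are assumptions of mine, for a simple two-portion record without quotes):

DEFINE DATA LOCAL
1 #P1 (A253)                 /* first portion of the record
1 REDEFINE #P1
  2 #P1-CHAR (A1/1:253)
1 #P2 (A253)                 /* second portion of the record
1 #F1 (A50/1:128)            /* fields found in portion 1
1 #F2 (A50/1:128)            /* fields found in portion 2
1 #N1 (I2)
1 #N2 (I2)
1 #JOINED (A100)
END-DEFINE
*
SEPARATE #P1 INTO #F1(*) WITH DELIMITER ';' GIVING NUMBER #N1
SEPARATE #P2 INTO #F2(*) WITH DELIMITER ';' GIVING NUMBER #N2
*
* If portion 1 ends neither on the delimiter nor on a blank, its last
* field and the first field of portion 2 are two halves of one field.
IF #P1-CHAR(253) NE ';' AND #P1-CHAR(253) NE ' '
  COMPRESS #F1(#N1) #F2(1) INTO #JOINED LEAVING NO SPACE
END-IF
WRITE #JOINED
END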

If the CSV is complex, i.e. strings may be surrounded by double quotes, things get more involved. It makes no difference whether you are on NAT3 or NAT4: you have to remove the double quotes but keep delimiters that appear inside double quotes, so you have to transfer the record byte by byte.

As long as SAG does not add a TYPE CSV [[WITH] [DELIMITER] delim] clause to DEFINE WORK FILE, this remains a manual job.

I guess you have to do it completely manually, because with your logic you cannot tell whether the last field of a portion is complete or not.
A suitable solution would be to redefine each portion as (A1/1:253).
Otherwise you are forced to work with a SEPARATE loop and a remainder, and that is not very fast, I guess…
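
For illustration, such a redefinition might look like this (a minimal sketch; the names and the test value are invented here), scanning one portion byte by byte for the delimiter:

DEFINE DATA LOCAL
1 #PORTION (A253)
1 REDEFINE #PORTION
  2 #PORTION-BYTE (A1/1:253)   /* one occurrence per character
1 #I (I2)
END-DEFINE
*
MOVE 'AAA;BBB;CCC' TO #PORTION
FOR #I = 1 TO 253
  IF #PORTION-BYTE(#I) = ';'
    WRITE 'delimiter at position' #I
  END-IF
END-FOR
END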

By the way: Does the following definition work?

define data local
01 #a1_array (A1/1:1000)
end-define

Yep!
In NAT 3.1.5 you can define arrays up to 32 KB.
Finn

DEFINE DATA LOCAL
01

Ok, I swapped “first” and “last”.

The statement in question was SEPARATE and SEPARATE does not work on arrays!

Will this Code Work?

DEFINE DATA LOCAL
1 C_ANZ-FIELDS (I2) CONST <100>   /* Max number of fields expected
1 #CSV-FIELD (A50/C_ANZ-FIELDS)   /* Max length of a field is 50 bytes!
1 #INFILE (A100/10)               /* 1000 bytes to read
1 #CSV-INDEX (I2)                 /* Pointer to the current CSV field
1 #WORK-INDEX (I2)                /* Pointer to the current #INFILE()
1 #A250 (A250)                    /* Work field
1 I (I2)                          /* Work integer
END-DEFINE
READ WORK FILE 1 #INFILE(*)
  PERFORM S-SEPARATE2CSV
  FOR I = 1 TO C_ANZ-FIELDS
    IF #CSV-FIELD(I) NE ' '
      DISPLAY I #CSV-FIELD(I)
    END-IF
  END-FOR
END-WORK
*
DEFINE SUBROUTINE S-SEPARATE2CSV
*
RESET
  #WORK-INDEX
  #CSV-INDEX
  #A250
  #CSV-FIELD(*)                            /* clear fields from the previous record
REPEAT
  EXAMINE #A250 FOR ';' GIVING NUMBER I    /* Count delimiters
  IF I < 2 AND #WORK-INDEX < 10            /* only one delimiter or less left
    ADD 1 TO #WORK-INDEX                   /* take next #INFILE portion
    COMPRESS #A250 #INFILE(#WORK-INDEX) INTO #A250 LEAVING NO SPACE
  END-IF
  ADD 1 TO #CSV-INDEX
  SEPARATE #A250 INTO #CSV-FIELD(#CSV-INDEX)
    REMAINDER #A250
    WITH DELIMITER ';'
  WHILE #A250 NE ' '
END-REPEAT
*
END-SUBROUTINE
*
END

Hi Dave;

Am busy today so do not have the time to play with the code. :cry:

However, perhaps someone else has the time to play today.

READ WORK FILE RECORD #GROUP

where 1 #GROUP
2 #a (a250)
2 #b (a250)
2 #C (a250)
2 #d (a250)

Now do a
separate #a into #array1(*) giving number #num1
separate #b into #array2(*) giving number #num2
separate #c into #array3(*) giving number #num3
separate #d into #array4(*) giving number #num4

Now consider the “interface” between #array1 and #array2.

two possibilities

  1. either an “exact split” between #a and #b (delimiter at #a position 250
    or #b position 1)

  2. not

If 1) - you are done there
If 2) - you must “concatenate” #a (#num1) and #b (1)

continue for the other two interfaces.

steve

Thanks everyone for your contributions. It seems to be the best solution possible given the scenario.

I assume that none of these workarounds are necessary when using NAT4?

For your information I have requested that the file be provided to me with fixed length fields rather than CSV format.

I have discovered that it is possible that " characters are present in the CSV file, which complicates things more than I would like.

The processing becomes more complex when attempting to rejoin fields that may have been inadvertently split (e.g. fields containing embedded blanks), and also when moving the array occurrences into the actual variables, and I really want to keep it as simple as possible.

Example
The following simulates what may occur when joining the last occurrence of the first array with the first occurrence of the second array.

DEFINE DATA
LOCAL
1 #A(A50/2)
1 #B(A50/2)
1 #C(A50)
END-DEFINE
*
#A(1) := 'BLAH BLAH BLAH'
#A(2) := 'JOH'
#B(1) := 'N SMITH'
*
COMPRESS #A(2) #B(1) INTO #C
WRITE #C
END

Without checking the original field, how can you know whether the second field contained 'JOHN SMITH', 'JOH N SMITH', or 'JOH  N SMITH' (with two blanks)? No doubt it can be done, but the whole thing is becoming quite messy.

Anyway, the providers of the CSV file have agreed to change it to a fixed length file with fixed length fields, so I don't have to worry about it, but I guess someone might need to one day.

Thanks
Dave

affirmative :wink:


Hi guys,

When you have finished this thread, could you please post the working code sample you’ve agreed on to the Natural Open Source forum so we can find it in the future?

Thanks,
Steve

Hi Wolfgang,

I don't agree. Even with NAT4 you can't separate CSV files where text fields are surrounded by double quotes (and text fields may contain the delimiter character).

And Dave has got such a file.

But I agree if you only have CSV-like files, where fields are separated by the delimiter character, text fields are not surrounded by double quotes, and text fields do not contain the delimiter character.


No! Just try:

...
INIT<'ABCDEF;"quoted;string";LMNOP'> 
...

@Jerome LeBlanc

EXAMINE #CSV-FIELD FOR '""' REPLACE WITH '~'

From my point of view this code can cause problems. Imagine a field containing only a double quote. In CSV this would be:

field1;"""";field2

Here the value of the middle field is a single " character, so replacing every "" pair loses the distinction between the delimiting quotes and the data. So I think it's better to write a proper parser without using SEPARATE.
For example:
For example:

define data local
*
01 #max_length (I4) const <100>
01 #delimiter  (A1) const <';'>
01 #quote      (A1) const <'"'>
*
01 #csvline (A100) init <'"""test""";;"test;test";"""";"test""test";'>
01 redefine #csvline
  02 #csv1  (A1/1:#max_length)
*
01 #i4 (I4)
*
01 #a (A100)
01 redefine #a
  02 #a1  (A1/1:#max_length)
*
01 #a_index       (I4)
*
01 #fields        (A100/1:#max_length)
01 #field_index   (I4)
*
01 #field_open         (L)
01 #last_delimiter_pos (I4)
01 #skip_next_quote    (L)
*
end-define
*
for #i4 = 1 to #max_length
*
  decide on first value of #csv1(#i4)
*
     value #delimiter
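        /* inside an open quoted field the delimiter is data;
        /* otherwise it terminates the current field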
       if #field_open
         add 1 to #a_index
         move #csv1(#i4) to #a1(#a_index)
       else
         move #i4 to #last_delimiter_pos
         add 1 to #field_index
         move #a to #fields(#field_index)
         reset #a #a_index
       end-if
*
     value #quote
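        /* a quote right after a delimiter (or at line start) opens a field;
        /* a doubled quote inside a field stands for one literal quote;
        /* any other quote closes the current field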
       if #skip_next_quote
         reset #skip_next_quote
       else
         if #last_delimiter_pos = #i4 - 1
           move true to #field_open
         else
           if #i4 = #max_length or #csv1(#i4 + 1) ne #quote
             move false to #field_open
           else
             move true to #skip_next_quote
             add 1 to #a_index
             move #csv1(#i4) to #a1(#a_index)
           end-if
         end-if
       end-if
*
     none value
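        /* any other character is appended to the current field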
       add 1 to #a_index
       move #csv1(#i4) to #a1(#a_index)
*
  end-decide
*
end-for
*
for #i4 = 1 to #field_index
  write #i4 #fields(#i4) (AL=50)
end-for
*
end
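
Note that a field is only moved to #fields() when the following delimiter is found, so this sketch relies on the line ending with the delimiter (the INIT value above does); for arbitrary input you would have to flush #a into #fields() once more after the FOR loop.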

2: it is not correct, it should read quoted;string (without the quotes) after CSV parsing.
