'Separating' a 'long' record CSV file

I need to read and 'separate' a CSV-format file into 55 fields. My problem is that each record in the file can be up to 858 characters long.

When I try something like the following, there is no problem.

1 #CSV-INPUT-REC (A253)
.
.
READ WORK FILE RECORD #CSV-INPUT-REC
.
.
SEPARATE #CSV-INPUT-REC INTO #A #B etc.

However, NAT 3.1.5 on z/OS does not allow the variable to be defined any larger, and SEPARATE does not appear to work with group fields or arrays. For obvious reasons I would prefer to use the SEPARATE statement rather than a manual process.

Does anyone have any suggestions? Thanks in advance.

Dave,

if the CSV file is simple, i.e. strings are not surrounded by double quotes, a combination of manual and automatic separation is possible. Obviously you have already defined the whole record as portions of A253 (or similar) fields. Separate each of these portions into its fields. Then combine the last field of one portion with the first field of the prior portion and you will get what you want. You have to catch the special case where the last character of a portion is the delimiter character or a blank.
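
A minimal sketch of that idea (untested; the field names, the array sizes, and the ';' delimiter are assumptions of mine, for a simple two-portion record without quotes):

DEFINE DATA LOCAL
1 #P1 (A253)                 /* first portion of the record
1 REDEFINE #P1
  2 #P1-CHAR (A1/1:253)
1 #P2 (A253)                 /* second portion of the record
1 #F1 (A50/1:128)            /* fields found in portion 1
1 #F2 (A50/1:128)            /* fields found in portion 2
1 #N1 (I2)
1 #N2 (I2)
1 #JOINED (A100)
END-DEFINE
*
SEPARATE #P1 INTO #F1(*) WITH DELIMITER ';' GIVING NUMBER #N1
SEPARATE #P2 INTO #F2(*) WITH DELIMITER ';' GIVING NUMBER #N2
*
* If portion 1 ends neither on the delimiter nor on a blank, its last
* field and the first field of portion 2 are two halves of one field.
IF #P1-CHAR(253) NE ';' AND #P1-CHAR(253) NE ' '
  COMPRESS #F1(#N1) #F2(1) INTO #JOINED LEAVING NO SPACE
END-IF
WRITE #JOINED
END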

If the CSV is complex, i.e. strings may be surrounded by double quotes, things get more involved. It makes no difference whether you are on NAT3 or NAT4: you have to remove the double quotes but keep delimiters that appear inside double quotes, so you have to transfer the record byte by byte.

As long as SAG does not add a TYPE CSV [[WITH] [DELIMITER] delim] clause to DEFINE WORK FILE, this remains a manual job.

I guess you have to do it completely manually, because with your logic you cannot tell whether the last field of a portion is complete or not.
A suitable solution would be to redefine each portion as (A1/1:253).
Otherwise you are forced to work with a SEPARATE loop and a remainder, and that is not very fast, I guess…
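
For illustration, such a redefinition might look like this (a minimal sketch; the names and the test value are invented here), scanning one portion byte by byte for the delimiter:

DEFINE DATA LOCAL
1 #PORTION (A253)
1 REDEFINE #PORTION
  2 #PORTION-BYTE (A1/1:253)   /* one occurrence per character
1 #I (I2)
END-DEFINE
*
MOVE 'AAA;BBB;CCC' TO #PORTION
FOR #I = 1 TO 253
  IF #PORTION-BYTE(#I) = ';'
    WRITE 'delimiter at position' #I
  END-IF
END-FOR
END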

By the way: Does the following definition work?

define data local
01 #a1_array (A1/1:1000)
end-define

Yep!
In NAT 3.1.5 you can define arrays up to 32 KB.
Finn

DEFINE DATA LOCAL
01

Ok, I swapped “first” and “last”.

The statement in question was SEPARATE and SEPARATE does not work on arrays!

Will this Code Work?

DEFINE DATA LOCAL
1 C_ANZ-FIELDS (I2) CONST <100>   /* Max number of fields expected
1 #CSV-FIELD (A50/C_ANZ-FIELDS)   /* Max length of a field is 50 bytes!
1 #INFILE (A100/10)               /* 1000 bytes to read
1 #CSV-INDEX (I2)                 /* Pointer to the current CSV field
1 #WORK-INDEX (I2)                /* Pointer to the current #INFILE()
1 #A250 (A250)                    /* Work field
1 I (I2)                          /* Work integer
END-DEFINE
READ WORK FILE 1 #INFILE(*)
  PERFORM S-SEPARATE2CSV
  FOR I = 1 TO C_ANZ-FIELDS
    IF #CSV-FIELD(I) NE ' '
      DISPLAY I #CSV-FIELD(I)
    END-IF
  END-FOR
END-WORK
*
DEFINE SUBROUTINE S-SEPARATE2CSV
*
RESET
  #WORK-INDEX
  #CSV-INDEX
  #A250
  #CSV-FIELD(*)                            /* clear fields from the previous record
REPEAT
  EXAMINE #A250 FOR ';' GIVING NUMBER I    /* Count delimiters
  IF I < 2 AND #WORK-INDEX < 10            /* only one delimiter or less left
    ADD 1 TO #WORK-INDEX                   /* take next #INFILE portion
    COMPRESS #A250 #INFILE(#WORK-INDEX) INTO #A250 LEAVING NO SPACE
  END-IF
  ADD 1 TO #CSV-INDEX
  SEPARATE #A250 INTO #CSV-FIELD(#CSV-INDEX)
    REMAINDER #A250
    WITH DELIMITER ';'
  WHILE #A250 NE ' '
END-REPEAT
*
END-SUBROUTINE
*
END

Hi Dave;

Am busy today so do not have the time to play with the code. :cry:

However, perhaps someone else has the time to play today.

READ WORK FILE RECORD #GROUP

where 1 #GROUP
2 #a (a250)
2 #b (a250)
2 #C (a250)
2 #d (a250)

Now do a
separate #a into #array1(*) giving number #num1
separate #b into #array2(*) giving number #num2
separate #c into #array3(*) giving number #num3
separate #d into #array4(*) giving number #num4

Now consider the “interface” between #array1 and #array2.

two possibilities

  1. either an “exact split” between #a and #b (delimiter at #a position 250
    or #b position 1)

  2. not

If 1) - you are done there
If 2) - you must “concatenate” #a (#num1) and #b (1)

continue for the other two interfaces.

steve

Thanks everyone for your contributions. It seems to be the best solution possible given the scenario.

I assume that none of these workarounds are necessary when using NAT4?

For your information I have requested that the file be provided to me with fixed length fields rather than CSV format.

I have discovered that it is possible that " characters are present in the CSV file, which complicates things more than I would like.

The processing becomes more complex when attempting to rejoin fields that may have been inadvertently split (e.g. fields containing embedded blanks), and also when moving the array occurrences into the actual variables, and I really want to keep it as simple as possible.

Example
The following simulates what may occur when joining the last occurrence of the first array with the first occurrence of the second array.

DEFINE DATA
LOCAL
1 #A(A50/2)
1 #B(A50/2)
1 #C(A50)
END-DEFINE
*
#A(1) := 'BLAH BLAH BLAH'
#A(2) := 'JOH'
#B(1) := 'N SMITH'
*
COMPRESS #A(2) #B(1) INTO #C
WRITE #C
END

Without checking the original field, how can you know whether the second field contained 'JOHN SMITH', 'JOH N SMITH', or 'JOH  N SMITH' (with two blanks)? No doubt it can be done, but the whole thing is becoming quite messy.

Anyway, the providers of the CSV file have agreed to change it to a fixed length file with fixed length fields, so I don't have to worry about it, but I guess someone might need to one day.

Thanks
Dave

affirmative :wink:


Hi guys,

When you have finished this thread, could you please post the working code sample you’ve agreed on to the Natural Open Source forum so we can find it in the future?

Thanks,
Steve

Hi Wolfgang,

I don't agree. Even with NAT4 you can't separate CSV files where text fields are surrounded by double quotes (and text fields may contain the delimiter character).

And Dave has got such a file.

But I agree if you only have CSV-like files, where fields are separated by the delimiter character, text fields are not surrounded by double quotes, and text fields do not contain the delimiter character.


No! Just try:

...
INIT<'ABCDEF;"quoted;string";LMNOP'> 
...

@Jerome LeBlanc

EXAMINE #CSV-FIELD FOR '""' REPLACE WITH '~'

From my point of view this code can cause problems. Imagine a field containing only a double quote. In CSV this would be:

field1;"""";field2

Here the value of the middle field is a single " character, so replacing every "" pair loses the distinction between the delimiting quotes and the data. So I think it's better to write a proper parser without using SEPARATE.
For example:
For example:

define data local
*
01 #max_length (I4) const <100>
01 #delimiter  (A1) const <';'>
01 #quote      (A1) const <'"'>
*
01 #csvline (A100) init <'"""test""";;"test;test";"""";"test""test";'>
01 redefine #csvline
  02 #csv1  (A1/1:#max_length)
*
01 #i4 (I4)
*
01 #a (A100)
01 redefine #a
  02 #a1  (A1/1:#max_length)
*
01 #a_index       (I4)
*
01 #fields        (A100/1:#max_length)
01 #field_index   (I4)
*
01 #field_open         (L)
01 #last_delimiter_pos (I4)
01 #skip_next_quote    (L)
*
end-define
*
for #i4 = 1 to #max_length
*
  decide on first value of #csv1(#i4)
*
     value #delimiter
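        /* inside an open quoted field the delimiter is data;
        /* otherwise it terminates the current field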
       if #field_open
         add 1 to #a_index
         move #csv1(#i4) to #a1(#a_index)
       else
         move #i4 to #last_delimiter_pos
         add 1 to #field_index
         move #a to #fields(#field_index)
         reset #a #a_index
       end-if
*
     value #quote
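        /* a quote right after a delimiter (or at line start) opens a field;
        /* a doubled quote inside a field stands for one literal quote;
        /* any other quote closes the current field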
       if #skip_next_quote
         reset #skip_next_quote
       else
         if #last_delimiter_pos = #i4 - 1
           move true to #field_open
         else
           if #i4 = #max_length or #csv1(#i4 + 1) ne #quote
             move false to #field_open
           else
             move true to #skip_next_quote
             add 1 to #a_index
             move #csv1(#i4) to #a1(#a_index)
           end-if
         end-if
       end-if
*
     none value
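        /* any other character is appended to the current field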
       add 1 to #a_index
       move #csv1(#i4) to #a1(#a_index)
*
  end-decide
*
end-for
*
for #i4 = 1 to #field_index
  write #i4 #fields(#i4) (AL=50)
end-for
*
end
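
Note that a field is only moved to #fields() when the following delimiter is found, so this sketch relies on the line ending with the delimiter (the INIT value above does); for arbitrary input you would have to flush #a into #fields() once more after the FOR loop.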

2: it is not correct, it should read quoted;string (without the quotes) after CSV parsing.
