Is Natural's sorting algorithm stable?

Hello all!

I just want to know whether Natural’s sorting algorithm (i.e. END-ALL … SORT … END-SORT) is stable or not. I guess it is. But I didn’t find any hint in the documentation…

Thanks

Matthias

Hi Mattias,
I have no clue about “Open Systems”, but on mainframe it looks rather as “unstable”, and this is why:

This is the INPUT data:

00000001 THIS IS REC1
00000002 THIS IS REC2
00000005 THIS IS REC3
00000002 THIS IS REC4
00000003 THIS IS REC5
00000004 THIS IS REC6
00000002 THIS IS REC7
00000006 THIS IS REC8
00000007 THIS IS REC9

This is my program:
DEFINE DATA LOCAL
1 #RECORD (A40)
1 REDEFINE #RECORD
2 #KEY (A08)
2 #DATA (A32)
END-DEFINE
READ WORK FILE 1 RECORD #RECORD
END-ALL
SORT #KEY USING #DATA
WRITE (1) #RECORD
END-SORT
END

And finally the output:
00000001 THIS IS REC1
00000002 THIS IS REC2
00000002 THIS IS REC7
00000002 THIS IS REC4
00000003 THIS IS REC5
00000004 THIS IS REC6
00000005 THIS IS REC3
00000006 THIS IS REC8
00000007 THIS IS REC9

As I may see, my REC7 comes out AFTER REC4; since they have the same key (00000002),
I excepted to get REC4 BEFORE REC7; so, the internal SORT on NATURAL mainframe is unstable. I suspect it may be parameterized
by I can not confirm it (?) Am I doing something wrong?

By the way, Matthias, in our shop the internal SORT is rather prohibited. We should use the EXTERNAL SORT utility, and this is what it (SYNCSORT) states about the matter:
“…The EQUALS parameter insures that the original order of equal-keyed records is preserved.
These records will be in the same order in the output file as they were in the
input file. NOEQUALS, the default, specifies that equal-keyed records may not be written
in their original input order.”

Well, it looks like not any sorting algorithm is “stable by default”.

Regards,
NK

Nikolay, it looks as you only sort by the first eight characters ("#KEY (A08)" your message got a smiley here :)).
And this is correct as far as I see. I think there is no guarantee for the order of the records before sorting is kept if they got same sort value.

Well, that`s why I think that “our” (on our shop mainframe, I mean) NATURAL internal sorting algorithm is “unstable”…

And the key is the first height chars of course :slight_smile:

On OpenSystems, I have not heard of stability issues, but I have heard of throughput limitations when dealing with very large amounts of data. Unlike the mainframe, there are many sort utilities available and a number of them resolve the throughput issue.

As to Nikolay’s mainframe tangent …

. I know of no stability issues with Natural’s mainframe/internal sort.
. It was designed for ease of use, so many parameter values are set on our behalf.
. It was designed for speed, so “NOEQUALS” is set to limit CPU usage.
. It was not designed to replace 3rd-party sort utilities. (Personally, I use the SORT verb for on-line sorting, and DFSORT or SyncSORT for batch.) Most shops set the Natural SORT parameter to invoke their external sort utility when executing in batch mode.

Based on the composition of the program and its reference to WORK FILE 1, it is likely that the test was run in batch. Therefore it also is likely that an external sort utility was invoked, and it generated the “unstable” output.

You can specify “EQUALS” (and other sort utility parameters) via Natural’s SORT parameter; Natural will pass the parms to the 3rd-party sort utility.

Thank you for your sample. It tried it out on my solaris environment:

define data local
01 #a21(a21/1:10) init <
'00000001 THIS IS REC1',
'00000002 THIS IS REC2',
'00000005 THIS IS REC3',
'00000002 THIS IS REC4',
'00000003 THIS IS REC5',
'00000004 THIS IS REC6',
'00000002 THIS IS REC7',
'00000006 THIS IS REC8',
'00000007 THIS IS REC9'>
1 #i1 (i1)
end-define
for #i1 = 1 to 9
  ignore
end-all
sort by #a21(#i1) using #i1
  display #a21(#i1)
end-sort
end

output:

        #A21
---------------------

00000001 THIS IS REC1
00000002 THIS IS REC2
00000002 THIS IS REC4
00000002 THIS IS REC7
00000003 THIS IS REC5
00000004 THIS IS REC6
00000005 THIS IS REC3
00000006 THIS IS REC8
00000007 THIS IS REC9

So it looks like it is stable… But I’m not sure.

Hi Matthias,

You’re sorting on the full 21 characters aren’t you?

I am at a loss as to why if you choose to only make your sort key the first 8 characters why one would consider the sort routine unstable when the characters 9 and after are returned in no particular order.

Seems perfectly fine to me.

Unstable would mean if you sort on the first 8 character and the results show that the first 8 characters are not sorted properly.

Oops. You’re right. That doesn’t make sense :oops:

That’s it:

define data local
01 #a21(a21/1:10) init <
'00000001 THIS IS REC1',
'00000002 THIS IS REC2',
'00000005 THIS IS REC3',
'00000002 THIS IS REC4',
'00000003 THIS IS REC5',
'00000004 THIS IS REC6',
'00000002 THIS IS REC7',
'00000006 THIS IS REC8',
'00000007 THIS IS REC9'>
1 #a8 (A8)
1 #i1 (i1)
end-define
for #i1 = 1 to 9
  #a8 := #a21(#i1)
end-all
sort by #a8       using #i1
  display #a21(#i1)
end-sort
end

And that’s unstable:

        #A21
---------------------

00000001 THIS IS REC1
00000002 THIS IS REC2
00000002 THIS IS REC7
00000002 THIS IS REC4
00000003 THIS IS REC5
00000004 THIS IS REC6
00000005 THIS IS REC3
00000006 THIS IS REC8
00000007 THIS IS REC9

What I mean: http://en.wikipedia.org/wiki/Sorting_algorithm#Stability

“Unstable would mean if you sort on the first 8 character and the results show that the first 8 characters are not sorted properly”

If your records are NOT sorted properly, I think you should rather conclude that there`s a bug in your SORT routine :slight_smile:

Matthias’ reply helped me to understand… it’s not unstable in the English-word sort of way. It’s “unstable” in a geeky redefinition of the word as it applies to classifying sort algorithms.

So then, the answer is “yes, the internal Natural sort routine is ‘unstable’ as it is not coded to care what order the records are in beyond the sort key, and no assurances are provided to keep the original order within the SORT order intact”.

I would have to assume that Software AG’s position long ago was why make the sort routine inefficient by executing more commands to make the non-sorted part of the record retain some kind of order? If it’s important to you, then include it in the sort key.

Natural is all about speed and efficiency, yes?

"Based on the composition of the program and its reference to WORK FILE 1, it is likely that the test was run in batch. Therefore it also is likely that an external sort utility was invoked, and it generated the “unstable” output. "

Ralph was absolutely right about it! Well, I should admit my example was non-convincing at all since the EXTERNAL SORT was called with NOEQUALS option (by default). I tried the example provided by Matthias in ONLINE MODE on our mainframe, and the result was the same.
Bottom line: we can NOT state that the INTERNAL sort is “unstable”; on the given example the INTERNAL NATURAL sort proved to being stable :slight_smile:
Any objections? :slight_smile: