# Sorting Workfiles

Hi all,

How do you sort your workfiles? Do you use Natural's internal sort, or an external sort program? And if the latter, how do you handle negative numeric fields, binary, floating point and so on?

Greetings Sascha

A few years ago, I had to sort 1.7 million records by an ASCII string. I solved this with the Unix sort, because it was significantly faster.

Generally, negative numbers and decimal points are no problem for the Unix command sort -n. But numeric fields in Natural's own formats could be a problem. If necessary, it would be best to write your own sort routine (e.g. in Perl) for that case.

http://www.perlfect.com/articles/sorting.shtml

We are using SyncSort on HP-UX. We use it both as an external sort and as the internal sort with Natural.

No problems with “negative numeric fields, binary, floating point and so on”.

I asked because I don’t know much about Natural workfiles on Unix. But back to the sort question: AFAIK sort, like grep and many other Unix command-line tools, works line-based. What if I use the following program:

``````
define data
local
01 #workfilestructure
  02 #binary-first (b4)
  02 #text         (a80)
  02 #nummeric     (n14.7)
  02 #floating     (f8)
  02 #anothertext  (a8)
end-define
#binary-first = H'0A'
#text         = 'Hi Community'
#nummeric     = -12345.789
#floating     = 0.000001
#anothertext  = '01234567'
write work 01 #workfilestructure
end
``````

Viewed with hexer, the resulting workfile looks like this:

``````
00000000:  0a 20 20 20 48 69 20 43  6f 6d 6d 75 6e 69 74 79  .   Hi Community
00000010:  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
00000020:  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
00000030:  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
00000040:  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
00000050:  20 20 20 20 30 30 30 30  30 30 30 30 30 31 32 33      000000000123
00000060:  34 35 37 38 39 30 30 30  70 8d ed b5 a0 f7 c6 b0  45789000p.......
00000070:  3e 30 31 32 33 34 35 36  37 0a -- -- -- -- -- --  >01234567.------
``````

The output of hexdump -vC is:

``````
00000000  0a 20 20 20 48 69 20 43  6f 6d 6d 75 6e 69 74 79  |.   Hi Community|
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000020  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000050  20 20 20 20 30 30 30 30  30 30 30 30 30 31 32 33  |    000000000123|
00000060  34 35 37 38 39 30 30 30  70 8d ed b5 a0 f7 c6 b0  |45789000p.......|
``````
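To make the dump easier to read, here is a small sketch (in Python, chosen only for illustration) that rebuilds the record from the bytes in the hexer dump and splits it into the five fields. The field widths come from the define data block. The little-endian byte order of the F8 field and the reading of the trailing 0x70 as "digit 0 with a minus sign" are assumptions inferred from this single dump, not documented Natural behaviour. It also shows the line-based problem: the first byte of #binary-first is 0x0A, so a tool that splits on newlines cuts the record in two.

```python
import struct
from decimal import Decimal

# Rebuild the 122-byte record exactly as it appears in the hex dump.
record = (
    b"\x0a   "                             # B4: H'0A' followed by three blanks
    + b"Hi Community".ljust(80)            # A80, blank padded
    + b"00000000012345789000p"             # N14.7: 21 digit bytes, last one is 0x70
    + bytes.fromhex("8dedb5a0f7c6b03e")    # F8: 8 bytes, assumed little-endian double
    + b"01234567"                          # A8
    + b"\n"                                # record terminator written to the workfile
)


def decode_n(raw: bytes, decimals: int) -> Decimal:
    """Decode an unpacked numeric field.

    Assumption from the dump: a last byte in 0x70..0x79 means a negative
    value, with the low nibble holding the final digit.
    """
    last = raw[-1]
    negative = 0x70 <= last <= 0x79
    if negative:
        last -= 0x40                       # 0x70 -> 0x30 ('0')
    digits = (raw[:-1] + bytes([last])).decode("ascii")
    value = Decimal(int(digits)).scaleb(-decimals)
    return -value if negative else value


# "<" = little-endian without alignment; "x" skips the trailing newline.
binary_first, text, numeric_raw, floating, another = struct.unpack(
    "<4s80s21sd8sx", record
)
numeric = decode_n(numeric_raw, 7)

print(binary_first, text.rstrip(), numeric, floating, another)

# What a line-based tool would see: an empty line, then a 120-byte line.
lines = record.split(b"\n")[:-1]
print([len(line) for line in lines])
```

The overpunched sign byte is the reason sort -n cannot be trusted on such a field: the digits are plain ASCII, but the minus sign is hidden in the last byte.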

I almost forgot about your sorting problem. Here is an example of a sort routine in Perl. To keep it simple, I used only 15 bytes per record, containing three fields.

``````
#!/usr/bin/perl
use strict;

$/ = \15;               # treat every 15 bytes as one record

binmode STDIN;          # read and write raw bytes
binmode STDOUT;

my @lines = (<STDIN>);  # read whole standard input into an array

for (sort mysort (@lines)) {   # run through the sorted array
    print;                     # write each record to standard output
}

sub mysort {            # comparison routine (compares $a with $b)
    my @fields_a = unpack "A5A5A5", $a;   # split record into single fields
    my @fields_b = unpack "A5A5A5", $b;

    $fields_a[0] cmp $fields_b[0]         # compare field 0 as ASCII
        ||                                # if field 0 is equal
    $fields_a[1] <=> $fields_b[1]         # compare field 1 numerically
        ||                                # if field 1 is also equal
    $fields_b[2] cmp $fields_a[2];        # compare field 2 descending
}
``````

• No problem with newline characters, because records are read by length and not by line.
• You can adapt the comparison routine exactly to your special needs.
• The Perl function “unpack” can interpret binary formats such as integer and float.
• Perl is available on almost every platform. On Linux it comes with the standard installation.
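The same fixed-length approach carries over to binary keys. As a rough sketch (again in Python, only for illustration), the following sorts 12-byte records by a 4-byte integer and then by an 8-byte float. The record layout, the little-endian byte order and the sample values are all made up for the example and are not taken from a real workfile.

```python
import struct

# Hypothetical layout: 4-byte signed integer followed by an 8-byte double,
# little-endian, no padding (12 bytes per record).
RECORD = struct.Struct("<id")

# Sample data; the integer 10 is stored as 0a 00 00 00, so the data
# contains a newline byte that would confuse a line-based sort.
values = [(3, 0.5), (-7, 2.25), (3, -1.0), (10, 0.0)]
data = b"".join(RECORD.pack(number, fraction) for number, fraction in values)

# Cut the byte stream into records by length, never by line.
records = [data[pos:pos + RECORD.size] for pos in range(0, len(data), RECORD.size)]

# Sort on the decoded fields: integer first, then float.
records.sort(key=RECORD.unpack)

sorted_values = [RECORD.unpack(rec) for rec in records]
print(sorted_values)
```

Because the sort key is the decoded tuple, negative integers and negative floats order correctly, which a bytewise comparison of the raw fields would not guarantee.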