While taking up a friend's challenge to write a program that counts the words in a text, I think I found a bug.
If you can, please run this program and check whether you get the same WRITE output I did.
Sorry for the messy, unaligned code; it was written quickly.
If this is not a bug and I'm being stupid, please tell me. I promise I won't be mad.
DEFINE DATA
LOCAL
1 PL (A15/10000)
1 CPL (N5/10000)
*
1 TRY (A15)
1 ARQ
2 LINHA (A250)
*
1 X (N5)
1 QUEBRA (A1/10) INIT<' ',',','.',':',';','!','?','/','(',')'>
1 QBR (A1)
1 I (N5)
1 IX (N5)
1 #I (N5)
1 #POSI (N5)
1 #POSF (N5)
1 #POSQ (N5)
1 #POS (N5)
1 #LRES (A250)
1 #CLIN (N5)
END-DEFINE
X := 1
**READ WORK FILE 1 ARQ
LINHA :=
'O CONTROLE POPULACIONAL é FREQUENTEMENTE UM MODO DE MUDAR DE ASSUNTO,'
-'SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL E AS TAXAS'
-' SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL.'
-' POPULACIONAL TENTANDO CONTAR O VALOR DOS NúMEROS. '
REPEAT
RESET #POSI #POSQ #POSF
FOR I 1 10
MOVE QUEBRA(I) TO QBR
EXAMINE LINHA FOR QBR GIVING POSITION #POSQ
IF (#POSQ LT #POSI AND #POSQ NE 0) OR #POSI = 0
MOVE #POSQ TO #POSI
END-IF
END-FOR
IF #POSI GT 1
#POSF := #POSI - 1
ELSE
#POSF := 1
END-IF
RESET TRY
MOVE SUBSTRING (LINHA,1,#POSF) TO TRY
IF TRY EQ QUEBRA(*)
IGNORE
ELSE
RESET IX
EXAMINE FULL PL(1:X) FOR TRY GIVING INDEX IX
IF IX NE 0 AND TRY NE PL(IX) /* BUG HERE. Examine is locating different words and assuming they're the same.
WRITE '=' TRY '=' PL(IX) /* I've tried EXAMINE and EXAMINE FULL... same result.
END-IF
IF IX NE 0 AND TRY EQ PL(IX)
ADD 1 TO CPL(IX)
ELSE
MOVE SUBSTRING (LINHA,1,#POSF) TO PL(X)
ADD 1 TO CPL(X)
END-IF
END-IF
#POS := #POSI + 1
#CLIN := 250 - #POSI
MOVE SUBSTRING (LINHA,#POS,#CLIN) TO LINHA
IF LINHA EQ ' '
ESCAPE BOTTOM
END-IF
ADD 1 TO X
END-REPEAT
FOR X 1 500
IF PL(X) NE ' '
WRITE '=' X '=' PL(X) '=' CPL(X)
END-IF
END-FOR
END
DEFINE DATA LOCAL
1 LINHA (A250)
1 #ARRAY (A30/1:200)
1 #NUMBER (N5)
END-DEFINE
*
INCLUDE AASETC
LINHA :=
'O CONTROLE POPULACIONAL é FREQUENTEMENTE UM MODO DE MUDAR DE ASSUNTO,'
-'SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL E AS TAXAS'
-' SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL.'
-' POPULACIONAL TENTANDO CONTAR O VALOR DOS NúMEROS. '
*
EXAMINE LINHA FOR ', O' REPLACE WITH ',O'
EXAMINE LINHA FOR FULL '. ' REPLACE WITH '.'
SEPARATE LINHA INTO #ARRAY (*) WITH DELIMITER ', .' GIVING NUMBER #NUMBER
WRITE '=' #NUMBER
WRITE #ARRAY (1:45)
END
Page 1 16-02-01 17:05:25
#NUMBER: 40
O CONTROLE
POPULACIONAL é
FREQUENTEMENTE UM
MODO DE
MUDAR DE
ASSUNTO SE
VOCê OLHA
OS NúMEROS
O MAIOR
CRESCIMENTO POPULACIONAL
E AS
TAXAS SE
VOCê OLHA
OS NúMEROS
O MAIOR
CRESCIMENTO POPULACIONAL
POPULACIONAL TENTANDO
CONTAR O
VALOR DOS
NúMEROS
I would like a runnable version of the original program to try to determine what is wrong with the EXAMINE; in the meantime, here is a program that counts the number of unique words.
If your challenge involves cash, I want a cut.
DEFINE DATA LOCAL
1 #TXT (A) DYNAMIC
1 #WORDS (A15/10000)
1 #WORD (A15)
1 #W (I4)
1 #C (I4)
1 #I (I4)
END-DEFINE
FORMAT PS=50
ASSIGN #TXT =
'O CONTROLE POPULACIONAL é FREQUENTEMENTE UM MODO DE MUDAR DE ASSUNTO,'
-'SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL E AS TAXAS'
-' SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL.'
-' POPULACIONAL TENTANDO CONTAR O VALOR DOS NúMEROS. '
*
SEPARATE #TXT LEFT JUSTIFIED INTO #WORDS (*)
WITH DELIMITERS ' ,.:;!?/()'
NUMBER #W
FOR #I = 1 #W
ASSIGN #WORD = #WORDS (#I)
IF #WORD = ' '
THEN
ESCAPE TOP
END-IF
END-ALL
SORT #WORD
USING KEY
AT START OF DATA
RESET #W
END-START
AT BREAK OF #WORD
ASSIGN #C = COUNT (#WORD)
DISPLAY OLD (#WORD)
#C
ADD 1 TO #W
END-BREAK
AT END OF DATA
WRITE
/ ' Total words:' T*#C COUNT (#WORD) (NL=10)
/ ' Unique words:' T*#C #W
END-ENDDATA
END-SORT
END
Page 1 02/01/16 17:51:13
#WORD #C
--------------- -----------
AS 1
ASSUNTO 1
CONTAR 1
CONTROLE 1
CRESCIMENTO 2
DE 2
DOS 1
E 1
FREQUENTEMENTE 1
MAIOR 2
MODO 1
MUDAR 1
NúMEROS 3
O 4
OLHA 2
OS 2
POPULACIONAL 4
SE 2
TAXAS 1
TENTANDO 1
UM 1
VALOR 1
VOCê 2
é 1
Total words: 39
Unique words: 24
Hi Steve. Thanks for this, but the challenge was to count how many times each word appears in a text, taking all kinds of punctuation into account.
The main challenge was to read a file, as you can see on line 24, and check its words.
In the end the program works perfectly, but I'm kind of bothered by this "bug" I think I found.
Hi Ralph. Thanks for this. I had never used SEPARATE before... To be honest, I had no idea this statement existed.
I'm familiar with the help utility, but in my daily work I never paid attention to this one.
This code is much, much better than mine. Thanks very much.
As for the challenge, my program works: I worked around the bug with the "AND TRY EQ PL(IX)" test on line 54.
The challenge was to use all the programming languages I know and see which gives the smallest code.
Three friends wrote it in Java and C#, and two in C++.
I wrote it in C# and Natural.
I started the program using an EXAMINE with STARTING FROM, but I can't use that here because of NAT0599 reason 10.
I could mess with COMPOPT, but the last time I did it the application admins weren’t very friendly :oops:
Edit:
One question about the SEPARATE statement.
You set it to look for delimiters. If I put in an ellipsis, would that work too?
My first tests used a different text, with all kinds of punctuation. Later I switched to a smaller text because everything was working except for the EXAMINE I mentioned.
There is a reason both Ralph and I used SEPARATE rather than EXAMINE. The SEPARATE code, in addition to being more compact than the EXAMINE code, is FAR more efficient.
If you use the help facility and refer to the SEPARATE statement, you will see that you can indeed use an ellipsis in the SEPARATE as a delimiter.
Since you are competing for “smallest code”, I combined my code and Ralph’s to minimize code; you might want to “play” with the following:
DEFINE DATA LOCAL
1 LINHA (A250)
1 #ARRAY (A30/1:200)
1 #NUMBER (I2)
1 #LOOP (I2)
1 #UNIQUE (I2)
1 #VALUE (A30)
END-DEFINE
*
LINHA :=
'O CONTROLE POPULACIONAL é FREQUENTEMENTE UM MODO DE MUDAR DE ASSUNTO,'
-'SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL E AS TAXAS'
-' SE VOCê OLHA OS NúMEROS, O MAIOR CRESCIMENTO POPULACIONAL.'
-' POPULACIONAL TENTANDO CONTAR O VALOR DOS NúMEROS. '
*
SEPARATE LINHA LEFT JUSTIFIED INTO #ARRAY (*) WITH DELIMITER ', .' GIVING NUMBER #NUMBER
*
IF #ARRAY (#NUMBER) = ' '
SUBTRACT 1 FROM #NUMBER /* TAKES CARE OF FINAL DELIMITER
END-IF
*
FOR #LOOP = 1 TO #NUMBER
MOVE #ARRAY (#LOOP) TO #VALUE
END-ALL
SORT BY #VALUE USING KEY
AT BREAK OF #VALUE
DISPLAY 10T 'WORD' OLD (#VALUE) 'OCCURENCES' COUNT (#VALUE)
ADD 1 TO #UNIQUE
END-BREAK
AT END OF DATA
WRITE / 10T 'TOTAL WORDS:' COUNT (#VALUE) / 10T 'UNIQUE WORDS: ' #UNIQUE
END-ENDDATA
END-SORT
END
Page 1 16-02-02 07:57:42
WORD OCCURENCES
------------------------------ ----------
AS 1
ASSUNTO 1
CONTAR 1
CONTROLE 1
CRESCIMENTO 2
DE 2
DOS 1
E 1
FREQUENTEMENTE 1
MAIOR 2
MODO 1
MUDAR 1
NúMEROS 3
O 4
OLHA 2
OS 2
POPULACIONAL 4
SE 2
TAXAS 1
Page 2 16-02-02 07:57:42
WORD OCCURENCES
------------------------------ ----------
TENTANDO 1
UM 1
VALOR 1
VOCê 2
é 1
TOTAL WORDS: 39
UNIQUE WORDS: 24
Could you post a list of all the punctuation characters you are concerned with?
I defined the punctuation in my QUEBRA variable: <' ',',','.',':',';','!','?','/','(',')'>
I solved the ellipsis issue with this huge block, hahahahaha:
IF #POSI GT 1 /* check the position of the first punctuation character
#POSF := #POSI - 1 /* if position > 1, the word ends just before it
ELSE
#POSF := 1 /* punctuation in the first byte could be an ellipsis, !?
** or whatever else
END-IF
RESET TRY
MOVE SUBSTRING (LINHA,1,#POSF) TO TRY /* move it to TRY to check whether it is punctuation
IF TRY EQ QUEBRA(*) /* if it is, ignore it
IGNORE
SEPARATE treats consecutive delimiters, such as an ellipsis or a quoted question (?"), as having null words between them. That is, an ellipsis is treated as period, blank word, period, blank word, period. That is why my program tests for a blank word within the FOR loop. You could also remove the AT START block by replacing #W in the ADD and WRITE statements with a new variable.
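A minimal, untested sketch of that behaviour, assuming a short hypothetical sample text that contains an ellipsis (the variables #TXT, #PARTS, #N, #I and #REAL are illustrative, not from any program in this thread). It separates on blank, period and question mark, then counts only the non-blank occurrences, which is the same blank-word test used in the FOR loop of the unique-word program above. #N should include the empty occurrences, while #REAL should count only the actual words.

```natural
DEFINE DATA LOCAL
1 #TXT   (A50)
1 #PARTS (A15/20)
1 #N     (I4)
1 #I     (I4)
1 #REAL  (I4)
END-DEFINE
*
* Hypothetical sample text containing an ellipsis
#TXT := 'WAIT... WHAT?'
* Consecutive delimiters leave blank occurrences in #PARTS
SEPARATE #TXT LEFT JUSTIFIED INTO #PARTS (*)
  WITH DELIMITERS ' .?'
  GIVING NUMBER #N
* Count only the occurrences that actually hold a word
FOR #I = 1 TO #N
  IF #PARTS (#I) NE ' '
    ADD 1 TO #REAL
  END-IF
END-FOR
WRITE '=' #N '=' #REAL
END
```

The target array is sized generously (20 occurrences) because every run of delimiters adds occurrences; a real text would need a much larger array, as the programs above use.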
As for the EXAMINE bug, it isn't one. When TRY contains E or AS, those values are found within CONTROLE and ASSUNTO, respectively, because the trailing blanks in TRY are ignored unless FULL is specified for it. You need a second FULL, as in
EXAMINE FULL PL(1:X) FOR FULL TRY GIVING INDEX IX
As long as PL and TRY are the same length, you need only one FULL.
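The following untested sketch contrasts the two forms side by side. It assumes a hypothetical three-element array #PL; the names #PL, #TRY, #IX1 and #IX2 are illustrative and not from the original program. Following the explanation above, #IX1 should point at the first element that merely contains an E (CONTROLE), while #IX2, which uses FULL on both operands as in the corrected statement, should point only at the element that is exactly E.

```natural
DEFINE DATA LOCAL
* Hypothetical sample data: all elements have the same length as #TRY
1 #PL  (A15/3) INIT <'CONTROLE','ASSUNTO','E'>
1 #TRY (A15)
1 #IX1 (N5)
1 #IX2 (N5)
END-DEFINE
*
#TRY := 'E'
* Without FULL on #TRY, its trailing blanks are ignored,
* so 'E' can be found as part of a longer word
EXAMINE FULL #PL(*) FOR #TRY GIVING INDEX #IX1
* With FULL on both operands, all 15 bytes (trailing blanks
* included) are compared, so only a whole-word match is found
EXAMINE FULL #PL(*) FOR FULL #TRY GIVING INDEX #IX2
WRITE '=' #IX1 '=' #IX2
END
```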
That's another thing I had never seen.
I thought the first FULL told the statement to search the full array for complete words in PL(*) matching the word in TRY, not for parts of the words in PL...
This thread is a real lesson to me. Thank you and Steve very much.
Out of curiosity, where do you guys use Natural?
I work for an oil and gas company and have worked for a telecom and a bank, and I NEVER saw these commands used the way you have shown me.
Here I even asked our specialists for help with the "bug", and they told me it was strange and shouldn't behave that way; it was probably a bug. That's why I came here.
One of the great things about Natural is how easy it is to learn.
One of the worst things about Natural is how easy it is to learn.
Both are true. Natural is so easy to learn that many programmers learn the basics and then stop learning, thus missing out on the fantastic development power that exists within the language.
Both Ralph and I are long-time Natural educators. That means we strive to learn all about Natural's capabilities and how to employ them most effectively. Ralph has also worked with the State of California for quite a few years. I have consulted with a financial organization for several years.
If you follow the link that Ralph provided, you will see (on page 4) an article entitled “Deleting null array occurrences”. This provides two approaches to “compressing” an array by removing null occurrences, and, a timing comparison which shows the EXAMINE to be quite a bit faster than the COMPRESS / SEPARATE approach.
This is somewhat counterintuitive, but both approaches are a lot faster than testing each array member individually for blank. Since the two approaches operate on strings rather than individual array occurrences, the performance differences grow as the number of array occurrences increases.
I used the timing comparison from the article and added a test for a blank. I also increased the number of array occurrences from 10 to 100. Keep in mind that the functionality you are looking for could involve several hundred, if not thousands, of words (array occurrences).
Here is the timing comparison. The EXAMINE really outperforms the other two approaches.