Problems using wm.tn.enumerate:nth to get paginated data from Trading Networks

I am using wm.tn.query services to export data from Trading Networks. The suggested way to do this is to obtain an ‘enumeration identifier’ from wm.tn.query:documentQuery service, then retrieve and process records one ‘page’ at a time using wm.tn.enumerate:nth. However, this process seems to be buggy. Any suggestions?

Here’s the skeleton form of my code, which has become quite complex. There are two REPEAT loops - the outer loop processes the pages one by one, the inner loop keeps re-reading the current page if the next page isn’t ready, and also decides when to exit the inner and outer loops.

INVOKE wm.tn.query:createDocumentQuery
INVOKE wm.tn.query:documentQuery (Return enumeration identifier to access paged query results)
SET N=1
REPEAT (OUTER_LOOP)
    INVOKE wm.tn.enumerate:nth (Read page N)
    <<Process data>>
    REPEAT (INNER_LOOP) (Only move ahead when next page is ready)
        IF N < pagesRead (the next page is ready)
            N=N+1
            EXIT INNER_LOOP
        DEFAULT: (re-read page N)
            Sleep 3 seconds
            INVOKE wm.tn.enumerate:nth (Re-read page N to refresh outputs like pagesRead, threadRunning, etc)
    IF N = pagesRead && threadRunning="false" && totalRows != -1 (the last page has been proceessed -- end flow)
        EXIT OUTER_LOOP
           

Processing larger datasets returns inconsistent results in consecutive runs (differing number of records). Also, changing the pageSize parameter in the wm.tn.query:documentQuery call changes the number of records returned. These are sure indicators something is wrong with the pagination process.

I am not seeing errors or exceptions. I increased the TN tn.query.maxrows parameter to 500000 to ensure TN doesn’t truncate results.

In comparison to the pagination approach, if I run a similar program that sets disablePaging=true in wm.tn.query:documentQuery, that program returns an increased number of records and does so reliably (record counts do not vary). However, this model is not suitable for large data export volumes.

Any suggestions?


PS: The code in this post is a updated version of the code in my older post here:
Can Trading Networks export document delivery data? - #7 by Sonam_Chauhan

*PPS: the outputs of wm.tn.enumerate:nth also don’t always behave as documented. For example, this set of logged outputs observed below contradict this documentation: "When the query completes, (threadRunning is false), the service provides the total number of rows. ". Here, threadRunning is false but totalRows is still -1. *
```
resultCount=161
rowsRead=0
pagesRead=1
threadRunning=false
totalPages=%totalPages% <unsubstituted, i.e. null>
totalRows=-1

This a total shot in the dark and grasping at straws, but maybe there is variable collision of some sort? Pipeline litter may be interfering at some step? Perhaps using pipeline scope techniques might help.

Edit: Odd that creating the query does not provide a way to control record order. Or describe what the default (or only?) order is. Consistent order of records is usually key for paging to work as expected. But I’m likely missing something.

Thanks @reamon - even your shots in the dark usually are uncannily on target.
I don’t think it’s pipeline related because of inconsistent record loss. But I think you are onto something here:

Edit: Odd that creating the query does not provide a way to control record order. Or describe what the default (or only?) order is. Consistent order of records is usually key for paging to work as expected. But I’m likely missing something.

SQL generated for B2B TN transaction queries in 10.1 does do ... ORDER BY d.DocTimestamp DESC to order the results by timestamp.

There seem to be a few different ways to do paging:

The latter two options are ‘modern’. But I’m using TN 10.1, which ran on Oracle 11g originally. So it’s likely only the first (ROWNUM based) pagination method applies to my environment. Regarding this method, the AskTom article states:
One important thing about using this pagination query is that the ORDER BY statement should order by something unique. If what you are ordering by is not unique, you should add something to the end of the ORDER BY to make it so.

Basically, if the identifier used for ordering isn’t unique and has duplicate values, the query cannot paginate deterministically— the same sort order is not assured from run to run of the query.

Timestamps in the MWS transaction query are certainly not unique. So I wonder if this is causing the problems I’m seeing.

These are just musings at the moment. I plan to open an SI.

I’ve opened a support incident SI-518490 on SAG and plan to update their advice back here.

I suspect the root cause may be the record order issue you mentioned @reamon - thanks for your help!

I think I’ve fixed it. I had a bug where the inner loop (REPEAT LOOP_FOR_CURRENT_PAGE) was set to EXIT on FAILURE (the default) instead of EXIT on SUCCESS (an infinite loop). As a result, the inner REPEAT loop that’s meant to ‘spin’ on the current page was exiting on initial execution and not looping at all.

The updated pseudocode for this service is below. The service is quite complex because it has two nested REPEAT loops. These deal with the fact that document results must be obtained page by page but the service can only proceed to the next page of a result set when the page is read by the database query thread (i.e. service execution should not ‘overtake’ the database query thread). The outer REPEAT loop processes the pages one by one. The inner REPEAT loop ‘spins’ on the current page, re-reading it until the next page is ready. The inner loop also exits both the inner and outer loops as necessary.

SERVICE PSEUDOCODE

INVOKE custom service to create document query with dates
INVOKE wm.tn.query:documentQuery (Return enumeration identifier to access paged query results)
SET pageNum=1 (page counter)
REPEAT LOOP_FOR_ALL_PAGES (Outer loop to read results page by page)
    INVOKE wm.tn.enumerate:nth (Read pageNum)
    <<Process data for export>>
    REPEAT LOOP_FOR_CURRENT_PAGE (Inner loop to only move ahead when next page is ready, or exit loops)
        IF pageNum < pagesRead (the next page is ready, increment pageNum and exit inner loop)
            pageNum =pageNum +1
            EXIT LOOP_FOR_CURRENT_PAGE (exit inner loop)
        IF retry count for LOOP_FOR_CURRENT_PAGE >=20
            Log failure and terminate flow
        IF pageNum =pagesRead && threadRunning="false" && totalRows !=-1 (empirically determined conditions to flag if last page has been processed)
            EXIT LOOP_FOR_ALL_PAGES (exit outer loop) 
        DEFAULT: (re-read page N)
            Sleep 3 seconds
            INVOKE wm.tn.enumerate:nth (Re-read current page to refresh outputs like pagesRead, threadRunning, etc)

SAG SI-518490 is still open to investigate difficulties developing this service. Specifically, the outputs of wm.tn.enumerate:nth also don’t always behave as documented, as noted above. Hence the specific conditions that flag the last page has been processed (exiting the inner loop) were empirically determined and implemented (refer to the service pseudocode above).

Now, running the service with different page sizes returns the same number of results.

1 Like

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.