Sunday, May 10, 2020

Recursive calls in Oracle Integration Flows (Scenario: Paginated API calls for large Data Sets)

A number of use-cases can be implemented cleanly using a recursive approach. This post is not to debate the pros and cons of recursion versus looping but provides a simple approach to achieve this.
For scenarios such as the ones listed below, and possibly more, this approach is quite efficient, concise, maintainable, and most importantly, it is highly scalable. It also leaves a smaller runtime footprint with a smaller execution time per instance than a looping flow instance. This also makes error handling easier as I will describe later. 

  • Polling (continuously monitoring an FTP location, a database, or an API output)
  • Paginated API's (when the target system exposes an API with a paginated* interface such as the eBay findProducts operation)
  • Retryable flows

Paginated Calls 

Many software systems store large sets of data. For instance, a vendor might have hundreds of thousands of products and product-prices listed on EBay or an eCommerce store. They might have millions of transactions in their PayPal account. 
There might occasionally be legitimate scenarios to fetch all of this data. The software system can provide various interfaces to export such data in bulk. Such a bulk data export interface could be raw data files, access to a database, but also a standard API like a Rest Service. 

For reasons of performance and good practice, such Rest API's (typically a GET operation) would limit the amount of data returned in any one call by using a pagination strategy. An eCommerce store could list hundreds of thousands of products for instance, and it wouldn't be appropriate to return such a large data set in a single call. A pagination strategy would involve setting parameters like pageNumber and pageSize

In the eBay example of findProducts API, we see two parameters called PageNumber and MaxEntries. It is worth looking at the descriptions of these fields on the API documentation - for instance, if the MaxEntries value is not specified, it defaults to 1, i.e. only one product is returned in the API response. MaxEntries in the eBay API also has an upper limit, which means, if the number of products is very large, it would involve multiple calls while keeping track of which pages have already been fetched. 

Another example of a paginated API's is the Oracle Commerce Cloud's (OCC) listProducts and similar API's. listProducts exposes two request parameters called offset and limit that serve to achieve paginated data extraction through the API. 

Example Scenario

For our example, we will define a simple scenario that requires us to fetch ALL the data from such an API using an Oracle Integration Cloud (OIC) flow. There are of course many ways to achieve this, a loop, in combination with some state management being the most straightforward choice. In this article my focus is on the recursive approach. 

We call our flow the OICFetchPaginagedData_Flow and it will exposes an asynchronous interface (this is important). It takes the pageNumber as one of its input parameters (pageSize can be kept configurable in an OIC Lookup, or it can also be provided as a parameter)

Execution Steps

Apart from parameter validations, error handling, logging, the flow performs these main steps: 

1) Call the target service (like the eBay findProducts API or Oracle Commerce Cloud (OCC) listProducts API) using pageNumber as parameter. 
For eBay, this would be mapped to the PageNumber parameter whereas for OCC, the value pageNumber multiplied by pageSize would be mapped to the offset parameter. 

2) Switch condition:
     a. IF number of results is zero, THEN terminate the flow
     b. IF number of results is greater than zero, THEN
            call  OICFetchPaginatedData_Flow again
     using connection OICFetchPaginatedData_Invoke
            with parameter pageNumber = pageNumber + 1

Technical implementation

My test implementation for this post consists of these technical components:

  1. OICFetchPaginatedData.wsdl Abstract wsdl that defines the interface for our OIC flow
  2. OICFetchPaginatedData_Trigger: OIC SOAP trigger connection that uses the wsdl above
  3. OICFetchPaginatedData__CONCRETE.wsdlConcrete wsdl which has the actual address of the OIC instance where our OIC flow would be deployed. This address must be set before deployment in the service address element of the wsdl - change the host portion only. 
  4. OICFetchPaginatedData_Invoke: OIC SOAP invoke connection that uses the _CONCRETE wsdl
  5. OICFETCHPAGINATED_FLOW_01.00.0000.iar: This is the Oracle Integration flow called OICFetchPaginatedData_Flow. Importing this iar into the OIC design time should create all the required connections. However, be sure to replace the _CONCRETE wsdl with one that has the address to your actual instance. 

At design-time, the flow looks like this:

Execution and Runtime

The flow can be launched by using a SOAP request like below:

Runtime instances should look like this:

Error handling

  • The simplest and default would be that if one of the page fetches hits an error (like a sporadic downtime of the target API), the instance would sit in the Error Hospital until recovered. 
  • Slightly more complex error handling strategies can be implemented - for instance, the error handler block could simply call the next page instance and then set the current instance to stay in error hospital for later recovery (using an error rethrow or create error action of OIC) - this allows the rest of data to be fetched even if one page hit an error, pending recovery only for one instance/page.
  • The same strategy in combination with other parameters (like a retry count) could be used to retry an error a few times until a max limit is reached, before sending the OIC instance to error hospital. In fact, this recursive approach can be easily used to create retryable integration flows (also see this relate post in Idea Lab).

Limitations / Future extensions

  • It might be useful to get the same setup to work with the Rest adapter (i.e. using connections for both Invoke and Trigger)
  • Although the "Local Integration" feature is useful to invoke integration flows deployed on the same environment, this option does not show "self" or doesn't allow recursive calls as of this writing. It is unclear if this is by design or will be available in future.
  • It is important to get the "exit" conditions correctly when all data has been fetched, or we could run into an "infinite"recursion situation. This risk however also exists in a loop implementation as well and simply needs careful condition checks. Also, in the worst case scenario, the Integration flow can be deactivated mid-flight, which should abort all running instances, before re-activating it after a fix. 
  • A logical next step would be to try to get some of the page-fetches to work in parallel (this of course depends on capacity of the OIC instance and also the concurrency supported or allowed by the target system)
Other than these, I think this recursive approach that I described offers a better solution for many of the scenarios I listed, it leads to a concise design, and short-running instances. 

The article and associated samples work as of OIC Version: (200401.0200.34935)
Root directory of the sample application is here:

No comments: