Wednesday, October 18, 2017

Selective persistence of Oracle Diagnostic Logging (ODL) output

Background and Goal

In any application, logging is widely used for diagnostics and debugging. 

Logging at various "checkpoints" in the application (such as entering with a request, exiting with a response, or in an error handler) provides a fairly reliable way to trace its execution path - which a subsequent sweep or count can be used to report on. When the logs are regularly analysed and reported on, anomalies can be flagged up proactively and investigated further. Some examples include a report on requests without corresponding responses, and a report on fault counts and fault codes.

For such reporting, one can write fairly complex scripts to extract the required data OR persist specific log entries to a database for querying using SQL tools or even a custom-designed user interface. This blog post shows an easy approach to achieve this. 

Oracle Diagnostic Logging

In Oracle Fusion Middleware, the Oracle Diagnostic Logging (ODL) framework is responsible for handling all application-level logging. The main configuration for ODL is stored in a file called logging.xml, a copy of which is configured in each managed server and the Admin server (this configuration can also be done via the Fusion Middleware Control).
Briefly, ODL consists of a set of pre-defined "log handlers", represented by log_handler elements in logging.xml, each of which uses an implementation class or interface and filters, and is configured through a set of nested property elements.

Each application or sub-system is then identified by a logger element - where a logger can use one or more log handlers. Loggers form an inheritable hierarchy where sub-loggers can inherit parent logger properties but can override these as well. 

The "console-handler" and "odl-handler" are two common log handlers - sending output to the server console and to a ${SERVER_NAME}-diagnostic.log file respectively. 

Oracle Service Bus (OSB) Pipeline logs

In the Oracle Service Bus, "log" and "report" are two actions that can be used in complementary ways. 
The Log action used in OSB pipelines specifically uses the ODL framework to write content to the configured log files. The logger used is oracle.osb.logging.pipeline, and in the default setup the output goes to the odl-handler and, in turn, to the server-specific diagnostic.log file. 

You can view the standard/default configuration of the odl-handler in logging.xml.

Custom Log Handler for OSB pipeline logs

I created a slight variant of the odl-handler and called it SBMessageFlowtraceHandler.
This is the configuration I created for it:
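For readers without access to the screenshot, a sketch of what such a log_handler entry in logging.xml might look like follows - the element and property names mirror the standard odl-handler entry, but the path, file size and rotation values here are purely illustrative:

```xml
<log_handler name='SBMessageFlowtraceHandler'
             class='oracle.core.ojdl.logging.ODLHandlerFactory'>
  <!-- XML output instead of the default text -->
  <property name='format' value='ODL-XML'/>
  <!-- specific file name pattern and location (illustrative path) -->
  <property name='path'
            value='${domain.home}/servers/${weblogic.Name}/logs/flowtrace/${weblogic.Name}-flowtrace.xml'/>
  <property name='maxFileSize' value='10485760'/>
  <!-- rotate every minute so rolled files become available quickly -->
  <property name='rotationFrequency' value='1'/>
</log_handler>
```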


The key differences from the configuration for odl-handler are:
* The output generated is an XML structure instead of the default text
* I used a rotation frequency of 1 minute - I will come to the reason later
* Used a specific file name pattern and location 

The log handler created above was then mapped to the OSB logger below (which is also a parent logger of oracle.osb.logging.pipeline):
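In the absence of the original screenshot, the logger mapping in logging.xml would look roughly like this - the level and useParentHandlers values shown are assumptions, not the exact configuration from my setup:

```xml
<logger name='oracle.osb.logging' level='NOTIFICATION:1' useParentHandlers='false'>
  <handler name='SBMessageFlowtraceHandler'/>
</logger>
```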

The result of the above configuration is that any log actions used in the pipelines start getting routed to my new osb_managed_server-diagnostic.xml log file. One such entry can be seen below:

We can observe that each log entry is represented by a "msg" element. This includes not only the original content we logged, within the "txt" element, but also a lot of interesting metadata:
1. Timestamp of the message in the time attribute
2. host name / IP address where the OSB managed server was running
3. ECID - an invaluable identifier that can be used to search for other related entries in all related OSB logs across servers
4. Within the txt element, there is the 4-tuple structure enclosed in square brackets that includes the route name and whether it was the REQUEST, RESPONSE or ERROR handler pipeline. 

To map these to design time, I have included this image of the relevant section of my pipeline:

Log extraction and Persistence

Once we have made the diagnostic logs available in a well-defined format and location, the solution is just like any other OSB project that uses a file-transport-based proxy service. 
We can use a simple file transport/poller, combined with an XSL transformation and the DBAdapter, to save this data into the database!
I have created a draft OSB project with JDev/OSB 12.2.1 that does precisely this here:

The database includes a simple table with these main columns:
1) MessageDateTime - timestamp of the original message, which we can pick from the logs as shown above
2) AuditDateTime - the timestamp when the data was inserted into the database (there is likely to be a slight delay between when the logs are written and when they are polled for persistence)
3) MessagePayload - to store the actual content of the message: which would either be the full content of the 'txt' element or the part after "log summary:". In my example, this is of type XML (SQL Server)
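As an aside, the extraction logic itself is simple enough to sketch outside OSB. This Python fragment is a standalone illustration, not the OSB pipeline: the msg/txt structure mirrors what is described above, but the sample entry, attribute values and pipeline names are all made up:

```python
import xml.etree.ElementTree as ET

# Illustrative ODL-XML log entry (structure as described above; values invented)
sample = """
<msg time="2017-10-18T09:15:02.123+01:00" host_id="osbhost1"
     host_addr="10.0.0.15" ecid="abc123-0000-0001">
  <txt>[OrderPipeline, OrderPipeline_request, RouteToBackend, REQUEST] log summary: order received</txt>
</msg>
"""

def extract_log_record(msg_xml):
    """Pull out the fields we want to persist: timestamp, host, ECID and payload."""
    msg = ET.fromstring(msg_xml)
    txt = msg.findtext("txt", default="").strip()
    # Keep only the content after "log summary:" when that marker is present
    _, _, summary = txt.partition("log summary:")
    return {
        "message_datetime": msg.get("time"),
        "host": msg.get("host_id"),
        "ecid": msg.get("ecid"),
        "payload": summary.strip() if summary else txt,
    }

record = extract_log_record(sample)
print(record["ecid"], record["payload"])
```

In the real solution this mapping is done by the XSL transformation feeding the DBAdapter; the sketch just shows which attributes map to which columns.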

You can of course create your own custom database schema and pick and choose the content and format of data you need inside it. You can choose to save all logged messages or filter based on a certain log level or even certain keywords in the "summary" content of the Log activity. 

Workarounds etc.

1) To prevent the SBAuditLogger from reading the same log file that the server writes to, I used the following "mask" in the file proxy: *flowtrace-*.xml

This means only 'rolled' files are read (I had already set the rotation frequency to 1 minute, so rolling is frequent). The proxy can be configured to delete or archive the files that are read, just like any other file-polling proxy service. 
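The effect of the mask can be checked with a quick sketch - the file names here are hypothetical, the point being that the active file lacks the "-" the mask requires, while rolled copies match:

```python
from fnmatch import fnmatch

mask = "*flowtrace-*.xml"

# The active file the server is still writing to (hypothetical name)
# lacks the "flowtrace-" portion, so the proxy skips it...
assert not fnmatch("sb-ms1-flowtrace.xml", mask)
# ...while rolled copies match the mask and get picked up.
assert fnmatch("sb-ms1-flowtrace-1.xml", mask)
assert fnmatch("sb-ms1-flowtrace-2.xml", mask)
print("mask behaves as expected")
```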

2) To avoid situations where the "txt" element comes in with nested CDATA sections, I had to write a small Java utility, called from the SBAuditLogger pipeline via a Java callout. 


1) Adding a log action in an OSB pipeline:

2) Configuring Logging and log files / Understanding Oracle Diagnostic Logging:


Tuesday, October 17, 2017

Geographical clusters with the biggest concentration of web services

From a data set of approximately 145 million IP addresses running at least one publicly accessible web service (such as a website), I was able to determine these 20 geographic "clusters".

Note that these are not "the 20 locations with the most web services" but the data set clustered into 20 geo-locations - similar to centres of gravity. (This is why some of the clusters appear in the sea: such a point just happens to be central to dense areas nearby, as with Southern India and South-East Asia.) It is not surprising that many of these correspond to major technology hubs - including Silicon Valley in California and Seattle in the United States. (I did a UK-only analysis and the data clustered around London, a location between Oxford and Cambridge, and a location between Glasgow and Edinburgh - I'm guessing somewhere around Linlithgow.) These clusters were not chosen to match well-known locations - the sheer prevalence of web hosts at those places skews the cluster centres towards them.


It is possible for nearly any Internet connected device to run a service accessible over the Internet, as long as it has a public IP address. (All my discussion and analysis below assumes IPv4) 

A "service" in this context could be any data or functionality exposed to "clients" using protocols running on TCP/IP - in other words, something that can be reached using an IP address and a TCP port. To actually talk to the service, you need to know its protocol - HTTP is a widespread example. 

Historically, IP addresses have been allocated in chunks of "networks" assigned to companies or ISPs, rather than through some mathematical scheme. Each IP address has some metadata associated with it. Because the companies and ISPs that get addresses allocated to them are physically located in some part of the world, the IP addresses allocated to them also get a de-facto geo-location "allocated" to them. (It is possible that large ISPs divide their address ranges into smaller sub-nets, with different geo-locations assigned to groups of IP addresses.) 

To find out yours, you can search for "My IP address" or even "My Location" on Google. 

The IP address data I used is available from . 
The data model that they store for each IP address looks like this and includes the latitude and longitude of each IP address where available:
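Since the original screenshot is not reproduced here, a hypothetical sketch of the record shape conveys the idea - the field names and values below are illustrative only (the address is from the TEST-NET documentation range):

```json
{
  "ip": "203.0.113.7",
  "location": {
    "country_code": "GB",
    "latitude": 51.5074,
    "longitude": -0.1278
  },
  "ports": [80, 443],
  "autonomous_system": { "asn": 64496, "name": "EXAMPLE-ISP" }
}
```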


Machine learning is heavily based on statistical analysis and probability techniques, and clustering is one of them. With the large computing power ("cloud-scale" distributed computing) and large data sets available to us today, we are able to perform some very interesting analyses. The raw data I used is in JSON format, stored on an S3 bucket. To process the data, I used a cluster of Spark nodes. The processing code is Python, using an existing machine learning library to cluster the data, and executed with pyspark. 
The crux of this experiment was that 145 million potential geo-location pairs were reduced to - or rather clustered into - a group of 20: a set we can make practical use of (even if that is in writing a blog post). 
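The heavy lifting was done by pyspark and a library clustering implementation, but the underlying k-means idea can be shown in miniature with pure Python - the points, the value of k and the helper below are illustrative, not the actual job:

```python
import math
import random

def kmeans(points, k, iterations=50, seed=42):
    """Plain k-means on (lat, lon) pairs - the same idea the pyspark job applies at scale."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[i].append(p)
        # Move each centre to the mean of its cluster (its "centre of gravity")
        for i, members in enumerate(clusters):
            if members:
                centres[i] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centres

# Two obvious groups of (lat, lon) points - one around London, one around Seattle
points = [(51.5, -0.1), (51.6, -0.2), (51.4, 0.0),
          (47.6, -122.3), (47.7, -122.4), (47.5, -122.2)]
centres = sorted(kmeans(points, k=2))
print(centres)
```

The centres land near the middle of each group, which is exactly why a dense region of web hosts pulls a cluster centre towards itself even when no single point sits there.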


Map Plotted using:


1) Introduction to clustering:

Saturday, October 07, 2017

Raw results - countries list with total IP (IPv4) addresses


Presented below is a list of countries (country codes) and the total count of live IPv4 addresses where a public-facing service (such as a website) might be hosted, as counted from the scan data of 1st October 2017.

The reason these don't add up to anywhere in the ballpark of 4 billion (the total IPv4 address space) is that the data set I used appears to cover only hosts that run some public service exposed over a TCP port (e.g. a website running on port 80 or 443).

The numbers definitely look incorrect and total up to only 145,430,195 - I will continue to investigate why, but they do seem to be in proportion. 
It is likely that the scans are only able to gather data about IP addresses that were live at the time of the scan, as opposed to all allocated ones.

|     LT|  120718|
|     DZ|  362827|
|     MM|    3494|
|     CI|   18954|
|     TC|     675|
|     AZ|   39468|
|     FI|  220723|
|     SC|   83878|
|     PM|     323|
|     UA|  768681|
|     RO|  730479|
|     ZM|    9618|
|     KI|     274|
|     SL|     474|
|     NL| 3077280|
|     LA|    5319|
|     SB|     746|
|     BW|    6165|
|     MN|    9664|
|     BS|    7838|
|     PS|   36320|
|     PL| 1539024|
|     AM|   57860|
|     RE|    6976|
|     MK|   41856|
|     MX| 9361233|
|     PF|    7506|
|     TV|      41|
|     GL|   10279|
|     EE|   74403|
|     VG|   13871|
|     SM|    2514|
|     CN|11905007|
|     AT|  403069|
|     RU| 4002435|
|     IQ|   76489|
|     NA|   13269|
|     SJ|     125|
|     CG|   13541|
|     AD|   12536|
|     LI|    6136|
|     HR|   84459|
|     SV|  134530|
|   null|  827348|
|     NP|   22618|
|     CZ|  434625|
|     VA|     409|
|     PT|  278365|
|     SO|    1158|
|     PG|    3291|
|     GG|    2601|
|     CX|     125|
|     KY|    5329|
|     GH|   11492|
|     HK| 1127634|
|     CV|    1745|
|     BN|    6363|
|     LR|     769|
|     TW| 2785149|
|     BD|   88409|
|     LB|   43745|
|     PY|   33953|
|     CL|  340123|
|     TO|     756|
|     ID|  495095|
|     LY|   18077|
|     FK|    1158|
|     AU| 1875091|
|     SA| 1098611|
|     PK|  279205|
|     CA| 3073028|
|     MW|    5162|
|     BM|    6359|
|     BL|     104|
|     UZ|   12856|
|     NE|    1597|
|     GB| 5182929|
|     MT|   20472|
|     YE|    6356|
|     BR| 3554113|
|     KZ|  400583|
|     BY|   59159|
|     NC|   18117|
|     HN|   25888|
|     GT|  115383|
|     MD|  107923|
|     DE| 6338938|
|     AW|    2612|
|     GN|    1140|
|     IO|      65|
|     ES| 1810492|
|     IR|  609566|
|     NR|     178|
|     MO|   26437|
|     BH|   24639|
|     EC|  210964|
|     VI|    1233|
|     IL|  337670|
|     TR|  751779|
|     ME|   26218|
|     VE|  660044|
|     MR|    3197|
|     ZA|  453373|
|     CR|  122065|
|     AI|     469|
|     SX|     869|
|     GU|   21634|
|     KR| 4705816|
|     TZ|   14240|
|     US|45381144|
|     RS|  128773|
|     MS|     262|
|     AL|   45857|
|     MY|  462057|
|     PN|     125|
|     IN| 2169583|
|     JM|   16720|
|     CK|     650|
|     LC|    1418|
|     GM|    1627|
|     AE| 1001729|
|     MQ|    5890|
|     CM|    9684|
|     RW|    3714|
|     TG|    1992|
|     FR| 2709666|
|     GF|    1521|
|     CH|  544074|
|     MG|    5532|
|     CC|     124|
|     TN|  293295|
|     GQ|     759|
|     NU|     136|
|     TL|     745|
|     WF|     479|
|     GR|  243484|
|     PA|  200845|
|     TD|     519|
|     GI|    5229|
|     SD|   15635|
|     AG|    4250|
|     MC|   10245|
|     DJ|     723|
|     JO|   40809|
|     BA|   59273|
|     ET|    1776|
|     SG|  734373|
|     KP|     319|
|     BF|    2820|
|     IT| 3523490|
|     CU|   13847|
|     GW|     254|
|     FO|    1282|
|     MV|    9439|
|     SE|  663630|
|     PH|  392585|
|     WS|    1259|
|     BG|  538707|
|     FJ|    3198|
|     GE|   61683|
|     SK|  128175|
|     FM|     906|
|     MH|    1745|
|     CW|   21457|
|     LV|  102735|
|     MU|   27736|
|     PE|  275323|
|     LS|    5507|
|     MZ|   12728|
|     GD|    3400|
|     DM|     646|
|     KM|     389|
|     DO|  554824|
|     QA|   34995|
|     XK|     581|
|     BZ|   12967|
|     TH| 1366956|
|     EG|  327882|
|     SH|     125|
|     BI|     771|
|     BJ|    1948|
|     MF|     429|
|     GY|    3847|
|     JP| 3299718|
|     TM|     572|
|     VC|    5377|
|     ZW|   11952|
|     SN|   12707|
|     NZ|  401608|
|     OM|   49103|
|     LK|   33816|
|     BT|    2126|
|     HU|  407222|
|     KN|    2990|
|     KE|   32116|
|     SI|  130608|
|     CY|   32025|
|     ML|    9998|
|     HT|    7375|
|     GP|    4018|
|     UG|    7357|
|     IE|  636087|
|     KW|   64836|
|     GA|    8910|
|     VU|    1473|
|     BE|  347894|
|     MA|  227130|
|     AS|     320|
|     KH|   33846|
|     NI|   53612|
|     KG|   14067|
|       |  649814|
|     TT|   32719|
|     SY|   75436|
|     NO|  368080|
|     BO|   93018|
|     ER|     257|
|     CO| 1135399|
|     IM|    7208|
|     SS|     570|
|     UY|   75799|
|     NG|   37838|
|     JE|    4069|
|     YT|     232|
|     AR| 1273489|
|     CF|     249|
|     PW|     251|
|     PR|   27204|
|     TK|     135|
|     LU|   56661|
|     SZ|    5313|
|     NF|     125|
|     VN|  880606|
|     IS|   50124|
|     MP|     529|
|     AF|   14127|
|     BB|    5340|
|     BQ|    4461|
|     SR|   23450|
|     DK|  772845|
|     CD|     458|
|     TJ|    5421|
|     AO|   17188|
|     AX|    1292|
|     ST|     335|


Friday, October 06, 2017

How many programmers does it take to update a Wikipedia page?

......or what it took to count the number of IPv4 addresses in every country (as of 1st October 2017). 

This Sunday, I found that the Wikipedia page on List of countries by IPv4 address allocation was using data from 2012. I wondered what it might take to add more up-to-date information to that page. During a recent course I attended, I got to know about - a fascinating project that involves periodically scanning ALL of the IPv4 address space and storing as much publicly visible metadata about the active addresses as possible (location, ISP, open ports, services running, operating system, and any vulnerable services running). Each daily dump of the IPv4 address space is close to a terabyte.
An individual IP address record is represented as a JSON object - part of one of the records is shown here:

There is a lot of information to be gleaned from analysing this data - some might have very useful applications, and some purely satisfies curiosity. Also, copying the raw dataset is not the only way to analyse this - might allow querying their data directly on request.
Given the volumes, this clearly falls in the realm of a Big Data problem, and any querying or analytics on it is best achieved using a distributed approach - so this is a perfect problem for leveraging fully cloud-based resources.

Stage 1:

Copy the latest data set to an S3 bucket.

This might sound easy, but the full data is close to 1TB. Ideally I would have preferred a more distributed way of transferring this data, but for now, an old-fashioned wget followed by an "aws s3 cp" to S3 storage did the job. 
The wget of the compressed data set took around 24 hours, and the "aws s3 cp" of the uncompressed data took just under 48 hours (with a few hours in the middle to uncompress the downloaded lz4 file). 

For intermediate storage, I created an instance with 2TB of storage. The cost didn't seem bad if all my data transfer completed within a day or so.

Test run:
wget --user=jvsingh --ask-password

The actual command to get that ~221G file (compressed version):
nohup wget --user=jvsingh --password=***** &

(I used nohup as I knew it was going to take hours and didn't want to keep my ssh terminal open just for this)

For the second stage of uploading the uncompressed file to my S3 bucket, it seems an elegant and faster way might have been to use a multipart upload using a distributed approach. But looking at the upfront setup required for it, I decided against it for this particular test. 

Stage 2:

AWS Setup - I already had an AWS account with an SSH key-pair for the region I selected (the cheapest in terms of instance costs and also costs of S3 storage; to avoid cross-region data transfer costs and possible network latency, I used the same region for both my S3 bucket and Spark instances). 
Additionally, to allow command line tools (such as flintrock) to connect to and operate the AWS account, I had to install and set up the local AWS command line interface - which requires a pair of credentials generated through AWS IAM. 
I had also previously created an S3 bucket to hold the 1 TB data file. This allows multiple Spark instances to access the data, which would otherwise not be possible - or too complex to set up - with general-purpose disk-like storage (it might be possible with the Hadoop distributed file store, but using S3 here definitely saved me a lot of extra configuration).

Stage 3:

Download and install flintrock, and configure its YAML configuration (here's their template) to set up the Spark cluster. This is convenient as I intended to do this on AWS, which is very easy to set up with flintrock. (I used an Amazon Linux AMI - the rest of the setup is self-explanatory in the template.)
I started an initial cluster with 3 worker nodes. 

One can configure a spark cluster without flintrock as well - I found a set of steps here. Flintrock made things a lot easier. 
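For orientation, an excerpt of a flintrock config.yaml along the lines of their template is shown below - the key name, identity file, AMI and versions are placeholders, not my actual values:

```yaml
provider: ec2

services:
  spark:
    version: 2.2.0

providers:
  ec2:
    key-name: my-keypair              # key pair for the chosen region
    identity-file: /path/to/my-keypair.pem
    instance-type: m4.large
    region: us-east-1
    ami: ami-xxxxxxxx                 # Amazon Linux AMI
    user: ec2-user

launch:
  num-slaves: 3
```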

Stage 4:

  • Login to the spark master instance
  • Submit the spark job using spark-submit 

nohup ~/spark/bin/spark-submit --master spark:// --executor-memory 6G --packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.2 --conf "spark.driver.maxResultSize=2g" > main_submittedjob.out &

I first executed a dry run on a smaller 1GB dataset to make sure everything was ready and working. A snippet of results from the dry run is shown here (I used country_code instead of country name to be safe - these can always be translated and sorted later - at this point I am eager to get the main counts):

  • Gradually increase the number of worker instances to see the data analysis speed up as the work gets distributed evenly across the newly joined instances.
"flintrock add-slaves" does this seamlessly for the most part (it installed Spark and the other libraries). 
I did have to manually log in to each new instance and use the command 
spark/sbin/ spark://master_host:7077  
to ensure they got added to the cluster.

After this, I could sit back and watch with satisfaction as the jobs (rather, the individual tasks) got evenly redistributed on the new nodes. 
  • Watch progress on the spark master console and wait for the final results to appear!

Shown below, the job stages console, 30 minutes in -

Coming up: The actual results

(if nothing breaks down till then!)
I posted my initial results here - sorry to report, the counts don't quite add up. Will investigate why in due course.


1) Paul Fremantle (WSO2 co-founder)- for the tools and techniques he taught on his Cloud and Big Data course at Oxford
2) for the idea of scanning the whole of IPv4 address space, the initiative and execution

Saturday, September 30, 2017

Test driven SOA: Tool kit for comprehensive automated test coverage

In this post I am going to share some tools I find useful when developing components for the Oracle Service Bus - the same principles should apply to the Integration Cloud Service as well. 

If we are not test-first (or at least test-alongside) programming, we are essentially debug-later programming (see "Physics of Test Driven Development").

If the enterprise service bus sits in the middle of an organisation's messaging and integration landscape, there are some key architectural principles that help in getting the best out of any service bus solution:
  • It is not the place for business logic but for integration logic i.e. heavy on message transformations and often enrichment
  • Any operations, message flows or pipelines that the service bus exposes should be stateless and without side effects (ideally). To achieve this, a lot would depend on backend services too - they would ideally need to be idempotent. 
  • Exposed interfaces must be designed to be canonical, while invoked endpoints are abstracted away so that calling systems are decoupled from the systems being called (and then there are non-functional elements of decoupling that the Service Bus can help achieve too, such as through messaging - but this post is not about the value addition of service buses)
  • Like any other software, it must have comprehensive unit test coverage (no, not the platform, but what we have developed on it) - and I might be stating the obvious here, but I often find test coverage inadequate at many FMW customer sites. 
Whatever transformations, validations or enrichment the service bus does to incoming messages must have some test coverage. Good test coverage means the solution is less prone to regression defects, easier to change, and the whole solution is more agile (agility comes with good practices and tools, not with ceremonies with strange names like Scrum). 

Often I go to a customer site where they have important business data flows running on an ESB solution with complex data transformations. You never know how a change in some field or some complex template or XPath expression might lead to some unrelated side effect. Needless to say, unless I find an exhaustive set of test cases (and surprisingly often I don't - maybe that's why they call me in the first place), the first thing I do is create some - this is the only way to ensure that the external interfaces to the system work the same before and after I make a change (except for the specific change I intended to make, of course). 
It is also invaluable when I have to make improvements to the system - such as refactoring to improve old code. 

Some technical scenarios that we can address (frequently seen with service bus implementations):
HTTP to file/JMS/HTTP/database, file to file, file to JMS, JMS to JMS, JMS to HTTP, and other combinations thereof in more complex orchestrations, such as file to HTTP and then JMS.
Data formats exchanged can also vary: native text, XML, JSON, binary.

Requirements from a test framework (from an ESB point of view):
* One click to run multiple test cases
* Visual indication of pass or failure
* Can be run with mainstream build/CI tools (such as the popular maven)
* Ability to mock http endpoints 
* Ability to assert (equality, pattern matches)
* Ability to "Diff" - i.e. identify differences between two pieces of text but also identify differences between two XML or JSon documents 

For unit testing, one can consider the Service Bus (or ICS) as a message-transformation black box and get the test framework to interact with endpoints only: filesystem locations, JMS destinations, inbound http endpoints, mock http endpoints. 
I keep all endpoints on the local server (with invoked http endpoints served by a mocking tool) and use an OSB customisation file specifically for the test instance (which points to mock http endpoints where required, in addition to the local/test JMS destinations etc.)
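The black-box idea can be sketched framework-agnostically - here in Python with a stdlib stub standing in for the mocked backend. In my actual setup this role is played by JUnit and WireMock; the URL, payload and response here are illustrative:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stub backend standing in for the real invoked service
class MockBackend(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.dumps({"status": "OK"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), MockBackend)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# In a real test the request would go to the service bus proxy, which in turn
# calls the mock endpoint; here we call the mock directly to show the assertion.
url = f"http://127.0.0.1:{server.server_port}/backend"
with urllib.request.urlopen(url, data=b"<order/>") as resp:
    reply = json.loads(resp.read())

assert reply["status"] == "OK"
print("mock endpoint responded as expected")
server.shutdown()
```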

The tool kit that I have found most effective and have been using a lot lately:
1) JUnit - plain old, tried and tested, with all the power of Java at hand. 
In the Fusion Middleware environment, we get access to all the WebLogic client libraries (full and cut-down versions) to interact with JMS queues. I have made variants of this A-Team example for different scenarios, such as reading the specific number of messages I expect for a given input. 

2) WireMock - easy to set up and use, and effective. I only had to add its dependency to my Maven POM file, and with the WireMock import I was ready with my mock http service (I have not tried the individually downloadable jar). For individual test cases, I could reply with different XML or JSON responses with different data and statuses (success, failure). 
Assertions can be performed at a specific XPath level (ensuring that a specific XPath contains the value you expected) or at the full document level. 

Worth noting that in the SOA Composite test framework, we can mock endpoints as well, in addition to running them as part of a maven build - but my post is focused on OSB. 

3) XMLDiff - This is an API hidden away in one of the FMW libraries (Oracle XML Parser). 
For normal XML manipulation, we often get by with the Java DOM/SAX APIs. However, I found XMLDiff very handy when comparing two XML documents, which we often need to do in test scenarios. 
Think how you would compare an actual XML payload with an expected XML payload - XMLDiff does it for us by identifying the specific xpath where it found differences. 

Again, in a FMW environment you can add it as a library in JDeveloper or the following dependency in the maven pom:

The output of many of the diff operations is another XML document listing the differences. If it contains no "append-node" or "delete-node" elements, it means the documents are identical. 

4) SoapUI - last but not least, of course - this is a no-brainer for initiating unit tests against exposed http endpoints. Easily achieved by adding it as a plugin in your project POM.

The tools can then easily be extended to make repeatable, automated integration tests. Additional frameworks can add value where desirable (Citrus and Cucumber seem popular).

One final point: in addition to making code less prone to regression defects and more change-friendly, with the potential to allow more frequent releases, test cases also serve as a "source of truth" repository of the business rules actually implemented in code - the more there are, the better. 
Documents might go out of sync, people might leave and forget to update documents, and then there is the semantic gap between documented language and code. If a test case says a field HEIGHT cannot exceed 9.99, then only a passing test can prove that it in fact doesn't. 
So given any business requirement, my priority would be to write failing tests first to document those requirements, write code that fulfils those requirements, accommodate all the "changes of mind" (whether genuine or ......) in a more agile way, and put everything into documentation once the dust settles. 

Coming up: more sample code for testing Service Bus "code", fewer essays. 
In the meantime, I can flaunt the EUnit tests I wrote the last time I tried my hand at Erlang. It is a small component of a larger programming assignment I had to do, and the assessment report said my software met the highest number of requirements. I attribute this hands-down to the adequate test coverage I had added right from the start. 

Summary: TDD allows us to write more complex software and keep it maintainable, more change-friendly and more responsive to change.