WebSphere Default Messaging (SIBus) and WebService Error Handling
An interesting situation arose the other day while working out a solution to the following requirements on WPS 6.0.2.3:
- We have a JMS queue that collects messages for a web service endpoint.
- We have a web service that is only available during business hours (say 9-5).
- If we send a request outside business hours, we will get an exception.
- During business hours, a failed transaction should be re-tried for four hours before it is recorded as a failure.
- The retries to the service should occur X minutes apart.
- The solution should have the minimum ‘moving parts’. If it requires multiple scripts and cron jobs, it’s too complicated.
The problem:
If we don’t do anything special, all our transactions end up in the Failed Event Manager where someone is going to have to manually re-submit them. Manual anything is a bad solution in Integration, so we need to look for something better. Default messaging is pretty bad at the concept of “Stop picking up messages if the endpoint I’m trying to call is down”. It will pick every single message off the queue, fail, and put it into the FEM.
The solution that immediately came to mind was to leverage the JMS queues. I could create a new request queue on which service requesters would place their messages. Next, I would install an MDB that enforced this policy on that request queue. If a message on the queue met the above requirements, it would be placed onto the JMS queue for the web service endpoint. Basically, the MDB would act like a gatekeeper.
The problem that I ran into was how to deal with messages that the gatekeeper didn’t want YET (say outside office hours). I thought JMS message selectors would do it but they’re only static. How could I start processing messages when the office door went from CLOSED to OPEN at a specific time? I was unable to resolve this issue.
The solution that we went with involved using a long-running BPEL process to meet all the above requirements. A short-running BPEL process is no good because it must run within a single transaction, which means it has 180s to reach a resolution. Increasing this to four hours is unreasonable. A long-running BPEL process is composed of multiple transactions and is not bound by this 180s limitation. It includes dynamic duration nodes that allow it to sleep until office hours open, sleep until the next retry, and calculate how long it’s been trying to send. Also, no ‘moving parts’.
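The three calculations such a process needs (how long to sleep until the office opens, how long to sleep until the next retry, and whether the four-hour window has run out) can be sketched in plain Java. This is only an illustration, not the actual BPEL or its snippets: the class name `RetrySchedule`, the 15-minute delay standing in for "X minutes", and the choice to measure the four hours as wall-clock time from the first attempt are all assumptions. Weekends and holidays are ignored.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical helper: the wait calculations a retrying process might perform. */
final class RetrySchedule {
    // Assumed values; the requirements only say "9-5", "four hours" and "X minutes".
    static final LocalTime OPEN = LocalTime.of(9, 0);
    static final LocalTime CLOSE = LocalTime.of(17, 0);
    static final Duration RETRY_WINDOW = Duration.ofHours(4);
    static final Duration RETRY_DELAY = Duration.ofMinutes(15);

    /** True if the given moment falls inside business hours (open inclusive, close exclusive). */
    static boolean isOpen(LocalDateTime now) {
        LocalTime t = now.toLocalTime();
        return !t.isBefore(OPEN) && t.isBefore(CLOSE);
    }

    /** How long to sleep until the service next opens; zero if it is open now. */
    static Duration untilOpen(LocalDateTime now) {
        if (isOpen(now)) {
            return Duration.ZERO;
        }
        LocalDateTime nextOpen = now.toLocalDate().atTime(OPEN);
        if (!now.isBefore(nextOpen)) {
            // Already past today's opening time, so wait for tomorrow's.
            nextOpen = nextOpen.plusDays(1);
        }
        return Duration.between(now, nextOpen);
    }

    /** How long to sleep after a failure: the retry delay, extended to the next opening if needed. */
    static Duration nextWait(LocalDateTime now) {
        LocalDateTime candidate = now.plus(RETRY_DELAY);
        return RETRY_DELAY.plus(untilOpen(candidate));
    }

    /** True once retries have been running for the whole retry window. */
    static boolean givenUp(LocalDateTime firstAttempt, LocalDateTime now) {
        return Duration.between(firstAttempt, now).compareTo(RETRY_WINDOW) >= 0;
    }
}
```

With these assumed values, a failure at 16:50 yields a wait that ends at 09:00 the next morning rather than at 17:05, when the service would already be closed.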
It kind of sucks to use a long running BPEL process to solve a technical issue instead of a business problem, but it does what we need without exposing our WID/WPS developers to the world of RAD/WAS.

June 23rd, 2008 at 2:23 am
Hi,
we have pretty much the same challenge. I would be surprised if we are the only ones that want support for automatic retries against backend-systems…so I believe this should be better handled by WPS. FEM is to me a good last-resort queue, but typically not a good structure for holding messages while systems are down.
We have also considered the JMS-solution. I believe it should be feasible. Outside of business hours I expected that it should be OK to check the time, roll back the message, sleep, and retry later.
However, we are currently pursuing the use of the FEM API for automatic resubmit (having a batch job look through failed messages and decide which ones should be resubmitted). We’ll see if this turns out OK.
June 23rd, 2008 at 9:49 am
Hi Trond,
Yeah, it’s a very common pattern that I was surprised is so difficult to implement.
I heard from a developer who attempted to use the FEM API to get a handle to the business object contained in the event, and he said he was unable to get it (or that there was no method). So if you get something like this working, it’d be awesome to let me know.
Cheers.
June 24th, 2008 at 8:51 am
Dan,
Just tried it, and it seems to work pretty well. It is possible to get the business object (use the “getFailedEventWithParameters” method on the FailedEventManagerEJB), but it is not necessary. I simply search for failed events using the FailedEventManagerEJB, look through the failure exception/type/date, and invoke resubmit() on the FailedEvents that are candidates. I’ll send you the code if you want it.
My gut feeling, however, still is that there should be a resubmit queue that handles downtime/scheduled unavailability. That means that each backend system will have its own independent resubmit-handler. I’m not too happy about using the FEM for this.
June 25th, 2008 at 8:57 am
Hi Trond,
Of course I’m interested in the code
Awesome that you got that all up and working. Are you on the 6.1 level? We are at 6.0.2, so maybe that’s why you were able to get it going so quickly.
And yes, I agree that the Failed Event Manager is a ‘last resort’ location but an integration platform should be able to deal with the common pattern of retrying “out of the box”. How about a new set of properties on an activation spec that define “Business Hours”, Retry Delay etc?
June 25th, 2008 at 11:27 am
You have mail
July 2nd, 2008 at 9:20 am
One way to handle this type of situation is:
Check availability of destination. If it is not available, set a flag in say a relationship to divert the data to a persistent store. When the destination system is restored, reset the flag and process the backlog in the persistent store (I have used a DB for this).
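A minimal in-memory sketch of this flag-and-backlog idea, in plain Java, might look like the following. Everything here is a stand-in: `DivertingSender` is a hypothetical name, the `Consumer` represents the real call to the destination, and the `ArrayDeque` takes the place of the database table, so nothing in this version survives a restart.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch: divert messages to a backlog while the destination is down. */
final class DivertingSender {
    private final Consumer<String> destination;               // stand-in for the real endpoint call
    private final Queue<String> backlog = new ArrayDeque<>();  // stand-in for a persistent store
    private boolean destinationUp = true;                      // the "flag"

    DivertingSender(Consumer<String> destination) {
        this.destination = destination;
    }

    /** Set the flag: from now on, messages are diverted to the backlog. */
    void markDown() {
        destinationUp = false;
    }

    /** Reset the flag and replay the backlog in arrival order. */
    void markUp() {
        destinationUp = true;
        while (!backlog.isEmpty()) {
            // Peek first so a message is only removed after it was delivered without an exception.
            destination.accept(backlog.peek());
            backlog.remove();
        }
    }

    /** Deliver directly when the destination is up, otherwise store for later. */
    void send(String message) {
        if (destinationUp) {
            destination.accept(message);
        } else {
            backlog.add(message);
        }
    }

    int backlogSize() {
        return backlog.size();
    }
}
```

How the flag gets set and reset (a health check, an operator action, a schedule) is left open here, as it is in the comment above.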
July 23rd, 2008 at 1:21 am
Nice job, can you email me code too? thanks
October 13th, 2008 at 10:20 am
Hi Dan,
We have a small scenario in our project wherein we are using a JMS queue (SIB) to send and receive SOAP messages (SOAP over JMS).
Listener picks up the message and invokes the service.
If I have any problem (specific errors) while processing the request, I want to roll back the message to the same queue and end the process.
I can’t find any way to roll back the message. The message is lost when an unhandled runtime exception occurs.
Do you have any idea how I can do this?
Thanks in advance.
October 14th, 2008 at 3:03 pm
You’ll need to pause the Activation Spec listener once your system has detected that a transaction rollback is required. Otherwise you will find that your transaction rolls back and is immediately retried n times until the message ends up in the exception destination.
http://www.ibm.com/developerworks/wikis/display/woolf/Pausing+SIBus+MDBs
October 16th, 2008 at 1:46 am
Hi Dan,
Thanks for the inputs.
This fixpack is to enable the property on the J2CActivation specification.
I am using WID 6.1.2, which ships with WAS version 6.1.0.17. I can see that this property is already available in this version. The only problem is that I am not able to make it take effect.
I am able to set value for ‘maxSequentialMessageFailure’ property but I am not able to change the ‘Required’ column to ‘true’ in the customProperties tab of J2CActivation specification.
Have you got any idea about this?
October 21st, 2008 at 3:48 am
It’s pretty amazing that the solution for throttling message processing is based on having to create a custom application that reads error messages from a log file and then invokes server administrative features to stop the activation spec. Why didn’t they make this declarative…?!
October 21st, 2008 at 5:08 am
Hi Trond,
Can you please elaborate on how exactly I can deactivate the J2C activation spec at runtime?