WebSphere Default Messaging (SIBus) and WebService Error Handling

Posted on Apr 22, 2008 by dan

An interesting situation arose the other day while looking for a solution to the following requirements on WPS 6.0.2.3:

We have a JMS queue that collects messages for a web service Endpoint.
We have a web service that is only available during business hours (say 9-5).
If we send a request outside business hours, we will get an exception.
During business hours, a failed transaction should be re-tried for four hours before it is recorded as a failure.
The retries to the service should occur X minutes apart.
The solution should have the minimum ‘moving parts’. If it requires multiple scripts and cron jobs, it’s too complicated.
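Taken together, the requirements above boil down to a small decision that has to be made each time a message is about to be sent. Here is a rough sketch of that decision in plain Java. The class and method names are made up for illustration, the 9-5 hours come from the "say 9-5" above, and I am assuming the four-hour window is measured in wall-clock time from the first failure and that weekends are ignored. None of this is WPS API.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical sketch of the retry policy; not part of any WPS or JMS API.
class RetryPolicySketch {
    enum Decision { SEND_NOW, WAIT_FOR_OPENING, RECORD_FAILURE }

    // Assumed values taken from the requirements ("say 9-5", "four hours").
    static final LocalTime OPEN = LocalTime.of(9, 0);
    static final LocalTime CLOSE = LocalTime.of(17, 0);
    static final Duration RETRY_WINDOW = Duration.ofHours(4);

    // True when the web service is expected to be reachable.
    static boolean isOpen(LocalDateTime now) {
        LocalTime t = now.toLocalTime();
        return !t.isBefore(OPEN) && t.isBefore(CLOSE);
    }

    // firstFailure is null when the message has never failed.
    static Decision decide(LocalDateTime firstFailure, LocalDateTime now) {
        // Assumption: the window is wall-clock time since the first failure.
        if (firstFailure != null
                && Duration.between(firstFailure, now).compareTo(RETRY_WINDOW) >= 0) {
            return Decision.RECORD_FAILURE;
        }
        return isOpen(now) ? Decision.SEND_NOW : Decision.WAIT_FOR_OPENING;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println("Fresh message right now: " + decide(null, now));
    }
}
```

If the four hours should only count time during which the service is open, the elapsed-time check would need to subtract the closed periods, which this sketch does not attempt.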

The problem:

If we don’t do anything special, all our transactions end up in the Failed Event Manager where someone is going to have to manually re-submit them. Manual anything is a bad solution in Integration, so we need to look for something better. Default messaging is pretty bad at the concept of “Stop picking up messages if the endpoint I’m trying to call is down”. It will pick every single message off the queue, fail, and put it into the FEM.

The solution that immediately came to mind was to leverage the JMS queues. I could create a new request queue onto which service requesters would place their messages. Next, I would install an MDB that enforced this policy on that request queue. If a message on the queue met the above requirements, it would be placed onto the JMS Queue for the WebService Endpoint. Basically, the MDB would act like a gatekeeper.
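To make the gatekeeper idea concrete, here is a minimal sketch that uses in-memory queues as stand-ins for the two JMS queues and a plain method as a stand-in for the MDB. Everything here (the class name, `submit`, `drain`, the `doorOpen` check) is hypothetical and only models the flow; a real MDB would receive messages through `onMessage` and forward them with a JMS producer.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.BooleanSupplier;

// Hypothetical model of the gatekeeper; the queues stand in for JMS queues.
class GatekeeperSketch {
    private final Queue<String> requestQueue = new ArrayDeque<>();
    private final Queue<String> endpointQueue = new ArrayDeque<>();
    private final BooleanSupplier doorOpen; // e.g. "is it business hours?"

    GatekeeperSketch(BooleanSupplier doorOpen) {
        this.doorOpen = doorOpen;
    }

    // Service requesters place their messages on the request queue.
    void submit(String message) {
        requestQueue.add(message);
    }

    // Forward messages to the endpoint queue while the door is open.
    // Returns how many messages were forwarded by this call.
    int drain() {
        int forwarded = 0;
        while (doorOpen.getAsBoolean() && !requestQueue.isEmpty()) {
            endpointQueue.add(requestQueue.remove());
            forwarded++;
        }
        return forwarded;
    }

    int waiting() {
        return requestQueue.size();
    }

    int delivered() {
        return endpointQueue.size();
    }

    public static void main(String[] args) {
        boolean[] open = {false};
        GatekeeperSketch gate = new GatekeeperSketch(() -> open[0]);
        gate.submit("request-1");
        gate.drain();              // door closed, message stays put
        open[0] = true;
        gate.drain();              // door open, message is forwarded
        System.out.println("delivered: " + gate.delivered());
    }
}
```

Note that nothing in this model calls `drain()` by itself when the door changes from closed to open; something has to trigger it.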

The problem that I ran into was how to deal with messages that the gatekeeper didn’t want YET (say outside office hours). I thought JMS message selectors would do it but they’re only static. How could I start processing messages when the office door went from CLOSED to OPEN at a specific time? I was unable to resolve this issue.

The solution that we went with involved using a Long Running BPEL process to meet all the above requirements. A short running BPEL is no good because it must run within a single transaction, meaning 180s to resolution. Increasing this to four hours is unreasonable. A Long Running BPEL is composed of multiple transactions and is not bound by this 180s limitation. It includes dynamic duration nodes that allow it to sleep until office hours open, sleep until the next retry, and calculate how long it has been trying to send. Also, no ‘moving parts’.
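The durations fed into those wait nodes are simple date arithmetic. The sketch below shows one way to compute them in plain Java; it is not the BPEL itself, the names are invented for illustration, and it assumes 9-5 opening hours every day with no weekend or holiday handling.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper for the wait durations; not BPEL or WPS API.
class WaitDurationSketch {
    // Assumed opening hours, taken from the "say 9-5" requirement.
    static final LocalTime OPEN = LocalTime.of(9, 0);
    static final LocalTime CLOSE = LocalTime.of(17, 0);

    // How long to sleep until the office door opens; zero if already open.
    static Duration untilOpen(LocalDateTime now) {
        LocalTime t = now.toLocalTime();
        if (!t.isBefore(OPEN) && t.isBefore(CLOSE)) {
            return Duration.ZERO;
        }
        LocalDateTime nextOpen = now.toLocalDate().atTime(OPEN);
        if (!t.isBefore(CLOSE)) {
            nextOpen = nextOpen.plusDays(1); // already past closing, open tomorrow
        }
        return Duration.between(now, nextOpen);
    }

    // How long to sleep before the next retry: the retry delay, extended
    // to the next opening time if the delay would end outside office hours.
    static Duration untilNextAttempt(LocalDateTime now, Duration retryDelay) {
        return retryDelay.plus(untilOpen(now.plus(retryDelay)));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println("Sleep until open: " + untilOpen(now));
        System.out.println("Sleep until next attempt: "
                + untilNextAttempt(now, Duration.ofMinutes(15)));
    }
}
```

The 15-minute delay in `main` is just a placeholder for the "X minutes" in the requirements.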

It kind of sucks to use a long running BPEL process to solve a technical issue instead of a business problem, but it does what we need without exposing our WID/WPS developers to the world of RAD/WAS.

Author: dan

Comments

Trond Isaksen

June 23, 2008 at 2:23 am

Hi,
we have pretty much the same challenge. I would be surprised if we are the only ones that want support for automatic retries against backend-systems…so I believe this should be better handled by WPS. FEM is to me a good last-resort queue, but typically not a good structure for holding messages while systems are down.
We have also considered the JMS-solution. I believe it should be feasible. Outside of business hours I expected that it should be OK to check the time, roll back the message, sleep, and retry later.

However, we are currently pursuing the use of the FEM API for automatic resubmit (having a batch job look through failed messages and decide which ones should be resubmitted). We’ll see if this turns out OK.
dan Post author

June 23, 2008 at 9:49 am

Hi Trond,

Yeah, it’s a very common pattern that I was surprised is so difficult to implement.

I have heard from a developer who attempted to use the FEM API to get a handle to the business object contained in the event, and he said he was unable to get it (or that there was no method). So if you get something like this working, it’d be awesome to let me know.

Cheers.
Trond

June 24, 2008 at 8:51 am

Dan,
Just tried it, and it seems to work pretty good. It is possible to get the business object (use the “getFailedEventWithParameters” method on the FailedEventManagerEJB), but it is not necessary. I simply search for failed events using the FailedEventManagerEJB, look through the failure exception/type/date, and invoke resubmit() on the FailedEvents that are candidates. I’ll send you the code if you want it 🙂

My gut feeling, however, still is that there should be a resubmit queue that handles downtime/scheduled unavailability. That means that each backend system will have its own independent resubmit-handler. I’m not too happy about using the FEM for this.
dan Post author

June 25, 2008 at 8:57 am

Hi Trond,

Of course I’m interested in the code 🙂 Awesome that you got that all up and working. Are you on the 6.1 level? We are at 6.0.2, so maybe that’s why you were able to get it going so quickly.

And yes, I agree that the Failed Event Manager is a ‘last resort’ location but an integration platform should be able to deal with the common pattern of retrying “out of the box”. How about a new set of properties on an activation spec that define “Business Hours”, Retry Delay etc?
Trond Isaksen

June 25, 2008 at 11:27 am

You have mail 🙂
Chris Holding

July 2, 2008 at 9:20 am

One way to handle this type of situation is:
Check availability of destination. If it is not available, set a flag in say a relationship to divert the data to a persistent store. When the destination system is restored, reset the flag and process the backlog in the persistent store (I have used a DB for this).
David L

July 23, 2008 at 1:21 am

Nice job, can you email me code too? thanks
Rajesh_sk

October 13, 2008 at 10:20 am

Hi Dan,

We have a small scenario in our project wherein we are using a JMS Queue (SIB) to send and receive SOAP messages (SOAP over JMS).
Listener picks up the message and invokes the service.
If I have any problem (specific errors) while processing the request I want to rollback the message to the same queue and end the process.
I don’t find any way to rollback the message. The message is lost when a runtime exception (unhandled) occurs.
Do you have any idea on how I can do this?
Thanks in advance.
dan Post author

October 14, 2008 at 3:03 pm

You’ll need to pause the Activation Spec listener once your system has detected that a transaction rollback is required. Otherwise you will find that your transaction rolls back and is immediately retried n times until it ends up in the exception destination.

http://www.ibm.com/developerworks/wikis/display/woolf/Pausing+SIBus+MDBs
Rajesh_sk

October 16, 2008 at 1:46 am

Hi Dan,

Thanks for the inputs.
This fixpack is to enable the property on the J2CActivation specification.
I am using WID 6.1.2 which has WAS version of 6.1.0.17. I can see that this property is already enabled in this version. The only problem is that I am not able to bring that to effect.
I am able to set value for ‘maxSequentialMessageFailure’ property but I am not able to change the ‘Required’ column to ‘true’ in the customProperties tab of J2CActivation specification.

Have you got any idea about this?
Trond

October 21, 2008 at 3:48 am

It’s pretty amazing that the solution for throttling message processing is based on having to create a custom application that reads error messages from a log file and then invokes server administrative features to stop the activation spec. Why didn’t they make this declarative…?!
Rajesh_sk

October 21, 2008 at 5:08 am

Hi Trond,

Can you please elaborate on how exactly I can deactivate the J2Cactivation spec in runtime?
Ken

May 31, 2009 at 4:24 pm

WAS JMS default provider is crapola when it comes to anything beyond basic messaging. You can get more with MQ but be prepared for a fight. Weblogic does this nicely with a redelivery limit/delay (multiple tiers too with an override).
dan Post author

June 1, 2009 at 11:25 am

Hi Ken,

Can you expand on the problems you’ve had with the WAS Default JMS Provider?

– Dan
Elena Neroslavskaya

December 28, 2009 at 7:24 pm

Hi Dan,

Would it be possible to publish (or email to me) FEM code you and Trond were discussing?

I have also seen your reply to a similar post on devworks:
https://www.ibm.com/developerworks/forums/thread.jspa?messageID=14044278&#14044278
where you suggest JMS queues and MDB, could you please elaborate a bit more on how this would interact with mediation running on WESB and calling service provider?

Maybe a great new post?

Thank you
dan Post author

January 12, 2010 at 3:49 pm

Hi Elena,

The issue is that there’s a breakdown in the simplicity of JMS Queues when the message consumer throws an error (which could be a common case when a transaction arrives during database maintenance). In order to prevent the queue from stopping, you have to figure out where the message that caused the error goes. By default in WPS it’ll go into the Failed Event Manager, where you can resubmit it. My suggestion in the developerworks post was that you could set up another queue instead of the FEM and put messages there that failed due to known common exceptions. Then, when maintenance is complete on the database, you have a little MDB that picks the messages off this backup queue and adds them back into the real queue.

It’s completely inelegant but there’s not a lot of choice out there.
Ishwara Varnasi

February 4, 2010 at 5:21 pm

Hi Dan, question on SIBus vs MQ.
Which of SIBus and MQ is the best option to go for asynchronous messaging from within a J2EE application running on WebSphere? Is SIBus as good as MQ? I haven’t seen any articles or blogs comparing SIBus to MQ. It will be good if you could give me some insight.
Jonas

May 8, 2010 at 3:58 am

Hi Ishwara,

I don’t think there’s one clear answer to that question. Here’s however a link to infocenter that could be of use to you:

http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/topic/com.ibm.websphere.base.doc/info/aes/ae/cmj_jmsp.html

danzrobok.com

Business Integration and SOA with an IBM WebSphere slant
