Wednesday, April 30, 2008

BizTalk 2006 Oracle Adapter (ODBC) - An alternative for Poll Statement with Function (Not Procedure)

In Oracle Adapter, there are several ways to poll the data from the database, in this post, I will be discussing on the polling data without triggers, just by using Poll SQL Statement and Post Poll SQL Statement.

Part 1 - The First Project
We have a table of data which will be polled by several orchestrations based on a status field in the table to process, see below screen for the details.

By using Poll SQL Statement to get the rows and Post SQL Statement to update the rows to other status is actually working fine for most of the cases, and it is important to note that they are both executed within a transaction of serializable isolation level.

In my first project, it is working fine, because the frequency is quite low, poll 1 row of data / 60 seconds. Life is good ;)

Part 2 - The Second Project
Ok, now in which kind of scenarios where it is not working as expected?

We have a new application which needs to poll rows of data in short interval, because the expected data to process will be huge, so it is not acceptable for us to poll 1 row / 60 seconds.
In short, we need to process 3.000 rows of data in 1 hour, so that leaves us with 50 rows / 60 seconds.

So what's the issue with that? just change the query to select top 50 and off you go? hhhmm... not as smooth as we thought :P

Reasons : As the processing got huge, it seems that the server and/or the adapter are affected as well. Because of the latency, it seems that the Post Poll SQL Statement is not executed in the expected time, this cause the subsequent polls may poll the same data as the previous one, because the Update the row status in the Post Poll SQL Statement hasn't been executed yet. This eventually lead to double processing of the data.

Part 3 - The Search for Alternatives
What is the ideal way for this? I would say that call a procedure to poll the data, because we can get the rows and update them in the same time. And btw, I don't really like the way Poll SQL Statement and Post SQL Statement work, we need to make sure that the conditions / where statements in the Poll SQL Statement and Post Poll SQL Statement are the same or we'll get nasty update results :P

Unfortunately, I have been trying for calling a procedure without any good results for quite a while now, and i have posted some questions about this in the web and they have been replicated to many places now :P links

Well, several people have mentioned that they are able to use oracle procedure to poll the data, however I haven't received more details about it until now.
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=2527819&SiteID=17
http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.biztalk.general&tid=97564c53-15ee-4a8e-997c-72f8c2520241

The main problem with oracle procedure, it will need an output parameter to return a resultset, and I was not able to think a way to work with this parameter in the Poll SQL Statement.

So several days ago, i stumbled on a post (I forgot where from) but someone from the post mentioned about using function in the select statement like this below :

Select packagename.functionname from dual;


This gave me an idea, because it will just like a normal select query which returns row(s).

So the next thing to do is to have the DML statement to update the selected rows, I found a way to allow DML Statement in the function, is to use PRAGMA AUTONOMOUS_TRANSACTION which will summon its own transaction so it will not affect the caller transaction.

Part 4 - The Solutions

Solution 1 - Get one row

1. Create an oracle function to select and update the row then return the unique key

CREATE OR REPLACE FUNCTION pollonejob RETURN NUMBER IS
PRAGMA AUTONOMOUS_TRANSACTION;
JOBID NUMBER;
BEGIN
SELECT JOB_ID
INTO JOBID
FROM JOBS
WHERE JOB_STATUS = 'NEW'
AND ROWNUM = 1 FOR UPDATE;

-- This is to test the lock only
--DBMS_LOCK.SLEEP(3);

UPDATE JOBS
SET JOB_STATUS = 'POLLED'
WHERE JOB_ID = JOBID;

COMMIT;

RETURN JOBID;
END;
/


2. Setup the Poll SQL Statement :
SELECT pollonejob AS JOBID FROM dual

Or, if you can do it like this to retrieve more information based on the key :
SELECT job_id, job_desc, job_status FROM jobs WHERE job_id = pollonejob

Solution 2 - Get more rows
1. Create an oracle type as the container for the unique keys
CREATE OR REPLACE TYPE number_key_t is TABLE OF NUMBER

2. Create an oracle function returning the previously created type with a parameter to speficy how many rows to retrieve
CREATE OR REPLACE FUNCTION pollmultiplejob(noOfRows NUMBER) RETURN NUMBER_KEY_T IS
PRAGMA AUTONOMOUS_TRANSACTION;
l_data NUMBER_KEY_T := NUMBER_KEY_T();
BEGIN
SELECT JOB_ID
BULK COLLECT
INTO l_data
FROM JOBS
WHERE JOB_STATUS = 'NEW'
AND ROWNUM <= noOfRows FOR UPDATE;

-- This is to test the locking only
--DBMS_LOCK.SLEEP(3);

UPDATE JOBS
SET JOB_STATUS = 'POLLED'
WHERE JOB_ID IN (SELECT column_value FROM TABLE(CAST(l_data AS NUMBER_KEY_T)));

COMMIT;

RETURN l_data;
END;
/


3. Setup the Poll SQL Statement :
SELECT column_value FROM TABLE(CAST(pollmultiplejob(20) AS NUMBER_KEY_T))

Or, if you can do it like this to retrieve more information based on the key :
SELECT job_id, job_desc, job_status
FROM jobs WHERE job_id IN (SELECT column_value FROM TABLE(CAST(pollmultiplejob(20) AS NUMBER_KEY_T)))

Notes :
1. You do not need the Post Poll SQL Statement anymore, since when the Poll SQL Statement is executed, the rows will be updated in the same time it polls
2. Avoid setting the poll interval less than the time to execute the polling query, for example if the polling takes 30 seconds to complete, it's better to set the poll interval 40 seconds or more, consider the future data growth
3. As the function will update the data when being called in the select query, please do handle that more carefully by giving more meaningful function name, so everyone will know that the function will not only retrieve the data but also update the rows in the same time.

I have tested this with one receive location for several days now and it seems that they are working fine as expected :)

I'm just glad that I have this workaround or you might say that this is a bit hack :P but it did solve my problem and work for Oracle Adapter with ODBC, and i hope it will be useful for you guys as well.

**Update** I have tried creating 5 receive locations calling the same function and there were no duplicate poll :D
However, there is a flaw, the FOR UPDATE statement may not work as expected when trying to poll the data sequentially where in oracle you will need to have a sub query like below :
SELECT JOB_ID
BULK COLLECT
INTO l_data
FROM JOBS
WHERE JOB_ID IN (
SELECT JOB_ID
FROM (
SELECT JOB_ID
FROM JOBS
WHERE JOB_STATUS = 'NEW'
ORDER BY JOB_ID
) A
WHERE ROWNUM <= 1
)
FOR UPDATE;

I experienced duplicate polls with this, so make sure you test your query first for any duplicate issue.

Feel free to drop any comments or questions ;)

13 comments:

lastikman said...

Hi Good Day,

select *
from BTCTL_PASR_NAMES where seq_id =
(SELECT column_value from table(cast(SHAMAN.BT_POLL_MULTIPLE_PT_CREATE(1)
AS SHAMAN.BTCTL_SHAMAN_ICARUS_TYPE)))


we used this solutions its work find to the test database, but when we try to ship to live database.
the query that we used polled all status of the table. please help us thanks you very much.

ARNOLD
TMC

Bembeng Arifin said...

Hi Arnold,
Based on the query, it seems that you would return the only one row since you're using "seqid =", otherwise you would get the error ORA-01427: single-row subquery returns more than one row.
Can you please check the query in the BT_POLL_MULTIPLE_PT_CREATE function that you are using to get the rows and make sure that you use the parameter (1) in rownum = varNoInstance.
Then check that when you update the query, you are using the temporary BTCTL_SHAMAN_ICARUS_TYPE table.
Hope you can find the problem :)

lastikman said...

HI Bembeng Arifin,

select *
from BTCTL_PASR_NAMES where seq_id IN
(SELECT column_value from table(cast(SHAMAN.BT_POLL_MULTIPLE_PT_CREATE(1)
AS SHAMAN.BTCTL_SHAMAN_ICARUS_TYPE)))

Thanks for your time,
We try this query result is the same.
That why we to change the "seq_id IN" to "seq_id =" to return single row. This the function we created. thanks again for time and help.

CREATE OR REPLACE FUNCTION BT_POLL_MULTIPLE_PT_CREATE(noOfRows NUMBER) RETURN SHAMAN.BTCTL_SHAMAN_ICARUS_TYPE IS
PRAGMA AUTONOMOUS_TRANSACTION;
l_data SHAMAN.BTCTL_SHAMAN_ICARUS_TYPE := SHAMAN.BTCTL_SHAMAN_ICARUS_TYPE();
BEGIN
SELECT SEQ_ID BULK COLLECT
INTO l_data
FROM SHAMAN.BTCTL_PASR_NAMES
WHERE POLL_STATUS = 'NEW'
AND ROWNUM <= noOfRows FOR UPDATE;

-- This is to test the locking only
--DBMS_LOCK.SLEEP(3);

UPDATE SHAMAN.BTCTL_PASR_NAMES
SET POLL_STATUS = 'POLLED'
WHERE SEQ_ID IN (SELECT column_value FROM TABLE(CAST(l_data AS SHAMAN.BTCTL_SHAMAN_ICARUS_TYPE)));

COMMIT;
RETURN l_data;
END;

Bembeng Arifin said...

Hi Arnold,

I looked at the function and it seems to be fine.

There are 2 things you can try :

1. Try to invoke this in your sql Plus and see how many rows are returned and how many rows were updated.

Put this in the function after update statement and before the line commit, to print out how many rows were updated :
Dbms_Output.put_line(SQL%ROWCOUNT);

Execute :
SELECT column_value from table(cast(SHAMAN.BT_POLL_MULTIPLE_PT_CREATE(1)
AS SHAMAN.BTCTL_SHAMAN_ICARUS_TYPE))

2. Change the condition below for temporary, to make sure that you only get 1 row then test again the result :
AND ROWNUM <= noOfRows FOR UPDATE;
to
AND ROWNUM <= 1 FOR UPDATE;

rlg said...

I have similar concern:

Please see the cursor below, I want to update col1 to 'P', this will be used in multithreading environment in oracle.
Do you see any issue?
I don't need any order clause.

cursor c1 is
SELECT rowid
FROM table tbl
WHERE tbl.col1 = 'U'
AND tbl.col1 = 'abc'
AND rownum <= 500
FOR UPDATE;

Bembeng Arifin said...

Hi rlg,

Are you using the cursor in the function? If you are, can you please post the whole function at here?
The FOR UPDATE statement should work for multithread environment, if you need to test it out, you can try create some .net code to call the function and use my concurrency testing tool to execute that code in multithread execution. The details at http://bembengarifin-tech.blogspot.com/2008/06/testing-concurrency-using-threadpool.html
Hope that helps

rlg said...

Hi Bembeng Arfin,

The function is within oracle package and it is too big to post. My only concern is if 'FOR UPDATE' will cause any issue or not. I have never used this befor. Also, rownum <= 500, along with other WHERE and AND clause has created some concerns too. Someone said if you are using ROWNUM <= 500 along with other WHERE clause, you might not get correct result. I think if I am not using order by clause, I should be fine.
The purpose of this is to use all thread 100% of time. In normal environment, if you have 1000 records and 5 threads, you would divide them and have each thread work no 200 records. There is strong possibility that one or two thread will be done in 1/4 of time and go to end. My goal is to give each thread say, 100 records first then which ever gets done first, assign it with another 50 so on and so forth.

Thanks for your comment.

Bembeng Arifin said...

Hi rlg,

From what I know, FOR UPDATE statement is similar to ROWLOCK in MS. SQL Server, it will block other requests (reads) for the returned rows until the transaction is completed.
As for the incorrect result, I can say that we need to be careful when using ROWNUM, since it may not returned the rows in the correct sequence, you can read it more at http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html. If you are worry about incorrect result regarding to the concurrent requests to the database, I suggest you read more on the isolation level when retrieving the rows.
From the purpose description, it surely looks like ThreadPool is a good way to go since it works like a queue process. You can find out more at http://www.albahari.com/threading/part3.html#_Thread_Pooling and see whether it suits to your needs.
Good Luck ;)

stainless steel barometer said...

Wonderful Article. I had gained lot of things such as how the data is retrieved using the Oracle Connection. The Example taken is this article is really good.

men's dress shirts said...

Thanks for sharing this it help me a lot in my Web development career.

Pieter said...

Hi, thanks for your article. We ran into the same situation with our oracle poller for one of our mission critical apps. The post poll was updating rows that were inserted between the poll statement and the post poll statement. We implemented your approach and have had huge success. However, at least once a day we encounter a situation where the function queried the db, found a row, updated the status as being sent, but biztalk did not get the key value returned. we are polling every 3 seconds and return a max of 6 rows. becasue of the frequency we really never get 6 rows. And what i find really weird is the missed polled row is usually from a time where we have a lower load on the system. Has anyone ever encountered this situation where the function updated the row as sent but never hit biztalk be processed.

Pieter...

bluephnx said...

There is one problem with this approach, and that is the PRAGMA AUTONOMOUS_TRANSACTION directive. Normally, when BizTalk polls the database and inserts the message into the messagebox, that will be executed as one distributed transaction. When you use the PRAGMA AUTONOMOUS_TRANSACTION, the update in the polling-table will be done even though BizTalk is not able to insert the message into the messagebox, so BizTalk is not able to do a rollback if the insert intoi msgbox is unsuccessful. I used this approach earlier also, but as exaplained above, it is not perfect, and I am now working on a new and improved pollingmechanism.

Unknown said...

wonderful information, I had come to know about your blog from my friend nagesh, delhi,i have read atleast 6 posts of yours by now, and let me tell you, your website gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanks a ton once again, Regards, Single Row Function in sql




Post a Comment