In Oracle Adapter, there are several ways to poll the data from the database, in this post, I will be discussing on the polling data without triggers, just by using Poll SQL Statement and Post Poll SQL Statement.
Part 1 - The First Project
We have a table of data which will be polled by several orchestrations based on a status field in the table to process, see below screen for the details.
By using Poll SQL Statement to get the rows and Post SQL Statement to update the rows to other status is actually working fine for most of the cases, and it is important to note that they are both executed within a transaction of serializable isolation level.
In my first project, it is working fine, because the frequency is quite low, poll 1 row of data / 60 seconds. Life is good ;)
Part 2 - The Second Project
Ok, now in which kind of scenarios where it is not working as expected?
We have a new application which needs to poll rows of data in short interval, because the expected data to process will be huge, so it is not acceptable for us to poll 1 row / 60 seconds.
In short, we need to process 3.000 rows of data in 1 hour, so that leaves us with 50 rows / 60 seconds.
So what's the issue with that? just change the query to select top 50 and off you go? hhhmm... not as smooth as we thought :P
Reasons : As the processing got huge, it seems that the server and/or the adapter are affected as well. Because of the latency, it seems that the Post Poll SQL Statement is not executed in the expected time, this cause the subsequent polls may poll the same data as the previous one, because the Update the row status in the Post Poll SQL Statement hasn't been executed yet. This eventually lead to double processing of the data.
Part 3 - The Search for Alternatives
What is the ideal way for this? I would say that call a procedure to poll the data, because we can get the rows and update them in the same time. And btw, I don't really like the way Poll SQL Statement and Post SQL Statement work, we need to make sure that the conditions / where statements in the Poll SQL Statement and Post Poll SQL Statement are the same or we'll get nasty update results :P
Unfortunately, I have been trying for calling a procedure without any good results for quite a while now, and i have posted some questions about this in the web and they have been replicated to many places now :P links
Well, several people have mentioned that they are able to use oracle procedure to poll the data, however I haven't received more details about it until now.
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=2527819&SiteID=17
http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.biztalk.general&tid=97564c53-15ee-4a8e-997c-72f8c2520241
The main problem with oracle procedure, it will need an output parameter to return a resultset, and I was not able to think a way to work with this parameter in the Poll SQL Statement.
So several days ago, i stumbled on a post (I forgot where from) but someone from the post mentioned about using function in the select statement like this below :
Select packagename.functionname from dual;
This gave me an idea, because it will just like a normal select query which returns row(s).
So the next thing to do is to have the DML statement to update the selected rows, I found a way to allow DML Statement in the function, is to use PRAGMA AUTONOMOUS_TRANSACTION which will summon its own transaction so it will not affect the caller transaction.
Part 4 - The Solutions
Solution 1 - Get one row
1. Create an oracle function to select and update the row then return the unique key
CREATE OR REPLACE FUNCTION pollonejob RETURN NUMBER IS
PRAGMA AUTONOMOUS_TRANSACTION;
JOBID NUMBER;
BEGIN
SELECT JOB_ID
INTO JOBID
FROM JOBS
WHERE JOB_STATUS = 'NEW'
AND ROWNUM = 1 FOR UPDATE;
-- This is to test the lock only
--DBMS_LOCK.SLEEP(3);
UPDATE JOBS
SET JOB_STATUS = 'POLLED'
WHERE JOB_ID = JOBID;
COMMIT;
RETURN JOBID;
END;
/
2. Setup the Poll SQL Statement :
SELECT pollonejob AS JOBID FROM dual
Or, if you can do it like this to retrieve more information based on the key :
SELECT job_id, job_desc, job_status FROM jobs WHERE job_id = pollonejob
Solution 2 - Get more rows
1. Create an oracle type as the container for the unique keys
CREATE OR REPLACE TYPE number_key_t is TABLE OF NUMBER
2. Create an oracle function returning the previously created type with a parameter to speficy how many rows to retrieve
CREATE OR REPLACE FUNCTION pollmultiplejob(noOfRows NUMBER) RETURN NUMBER_KEY_T IS
PRAGMA AUTONOMOUS_TRANSACTION;
l_data NUMBER_KEY_T := NUMBER_KEY_T();
BEGIN
SELECT JOB_ID
BULK COLLECT
INTO l_data
FROM JOBS
WHERE JOB_STATUS = 'NEW'
AND ROWNUM <= noOfRows FOR UPDATE;
-- This is to test the locking only
--DBMS_LOCK.SLEEP(3);
UPDATE JOBS
SET JOB_STATUS = 'POLLED'
WHERE JOB_ID IN (SELECT column_value FROM TABLE(CAST(l_data AS NUMBER_KEY_T)));
COMMIT;
RETURN l_data;
END;
/
3. Setup the Poll SQL Statement :
SELECT column_value FROM TABLE(CAST(pollmultiplejob(20) AS NUMBER_KEY_T))
Or, if you can do it like this to retrieve more information based on the key :
SELECT job_id, job_desc, job_status
FROM jobs WHERE job_id IN (SELECT column_value FROM TABLE(CAST(pollmultiplejob(20) AS NUMBER_KEY_T)))
Notes :
1. You do not need the Post Poll SQL Statement anymore, since when the Poll SQL Statement is executed, the rows will be updated in the same time it polls
2. Avoid setting the poll interval less than the time to execute the polling query, for example if the polling takes 30 seconds to complete, it's better to set the poll interval 40 seconds or more, consider the future data growth
3. As the function will update the data when being called in the select query, please do handle that more carefully by giving more meaningful function name, so everyone will know that the function will not only retrieve the data but also update the rows in the same time.
I have tested this with one receive location for several days now and it seems that they are working fine as expected :)
I'm just glad that I have this workaround or you might say that this is a bit hack :P but it did solve my problem and work for Oracle Adapter with ODBC, and i hope it will be useful for you guys as well.
**Update** I have tried creating 5 receive locations calling the same function and there were no duplicate poll :D
However, there is a flaw, the FOR UPDATE statement may not work as expected when trying to poll the data sequentially where in oracle you will need to have a sub query like below :
SELECT JOB_ID
BULK COLLECT
INTO l_data
FROM JOBS
WHERE JOB_ID IN (
SELECT JOB_ID
FROM (
SELECT JOB_ID
FROM JOBS
WHERE JOB_STATUS = 'NEW'
ORDER BY JOB_ID
) A
WHERE ROWNUM <= 1
)
FOR UPDATE;
I experienced duplicate polls with this, so make sure you test your query first for any duplicate issue.
Feel free to drop any comments or questions ;)