[omniORB] Canceling a blocking function
Han Kiliccote
kiliccote@cmu.edu
Mon, 3 Apr 2000 08:32:00 -0400
Thanks for your suggestion; however, see below.
-----Original Message-----
From: Tres Seaver <tseaver@palladion.com>
To: Han Kiliccote <kiliccote@cmu.edu>
Cc: omniorb-list@uk.research.att.com <omniorb-list@uk.research.att.com>
Date: Sunday, April 02, 2000 3:16 PM
Subject: Re: [omniORB] Canceling a blocking function
>Han Kiliccote wrote:
>>
>> At Carnegie Mellon University, we are developing a prototype for a
>> distributed system that contains a very large number of servers (e.g.,
>> 100,000 servers). In this prototype, we need to send a request to a large
>> subset of these servers (e.g., 100).
>>
>> Currently we have a loop that uses a thread pool to attach a thread to each
>> selected server; each thread then calls a function on a different server.
>>
>> When a percentage (e.g., 50%) of these functions return, we would like to
>> cancel the operation in the remaining threads, which are blocked either
>> because their servers are down/faulty or because the call is about to
>> complete but has not yet done so.
>>
>> Currently we don't know how to do this. In each remaining thread, there is
>> a call:
>>
>> server[i]->do_function(argument) // blocked (no reply yet)
>>
>> How can we unblock this? We don't want to wait 10 seconds or more for these
>> functions to time out: since the overall operation is already deemed
>> complete, another request will arrive soon, and waiting would leave a very
>> large number of threads in the system at any given point. We also don't want
>> to lower the timeout to anything less than 10 seconds, because that would
>> cause an early abort in some cases.
>>
>> Your advice and help are greatly appreciated.
>>
>> P.S. Shall we switch to one-way functions?
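To make the pattern concrete, what we are after is roughly the following
"wait for a quorum, then abandon the stragglers" structure. This is only an
illustrative sketch: Server_var, do_function and make_argument are placeholder
names, it uses standard C++ threads for brevity (our real code drives a thread
pool), and it shows the problem rather than a solution, since the detached
threads stay blocked until their calls time out:

    // Illustrative sketch only; Server_var, do_function and make_argument are
    // placeholders, not our actual interfaces.
    #include <condition_variable>
    #include <memory>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct QuorumState {
        std::mutex              m;
        std::condition_variable done;
        std::size_t             completed = 0;
    };

    void scatter_and_wait(std::vector<Server_var>& servers)
    {
        auto state = std::make_shared<QuorumState>();    // outlives the stragglers
        const std::size_t quorum = servers.size() / 2;   // e.g. 50%

        for (std::size_t i = 0; i < servers.size(); ++i) {
            Server_var target = servers[i];
            std::thread([state, target]() {
                target->do_function(make_argument());    // may block for a long time
                std::lock_guard<std::mutex> lk(state->m);
                ++state->completed;
                state->done.notify_one();
            }).detach();                                 // a straggler keeps its thread alive
        }

        std::unique_lock<std::mutex> lk(state->m);
        state->done.wait(lk, [&]{ return state->completed >= quorum; });
        // The overall operation can proceed here, but every detached thread is
        // still blocked inside the ORB until its call times out, which is
        // exactly the resource problem described above.
    }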
>
>Consider very carefully using something like CosEvents/CosNotifications
>to manage the NxM communications you need here. One way to handle your
>scenario:
>
> 1. Create a notification channel within your "master" server (or in
> a separate server, perhaps for scalability).
>
I should have been clearer about the research. Our goal is to remove any
central server from the system (or to replace each such server with a randomly
chosen large set of servers). Each client acts like a mini-server. Master or
monolithic servers are exactly what we are trying to replace. The goal of the
research is to show that we can build a large system that still has no single
point of failure due to centralized servers.
> 2. Create another channel on which to broadcast requests from the
>    master to the slave servers.
>
> 3. In each slave server, subscribe a pull consumer to the "request"
>    channel. One thread loops as follows:
>
>    - pull new requests from the request channel and enqueue them
>
>    - pull cancellations from the request channel and mark the
>      corresponding requests.
>
>    Another thread pulls requests from the queue, processing each one
>    while checking at intervals to see if it has been cancelled. On
>    completion, the processing thread pushes the result to the "results"
>    channel.
>
>    This server could perhaps be single-threaded, since you have to
>    break the "work" up into segments to allow checking for
>    cancellation.
>
> 4. From the master server, BEFORE broadcasting your requests,
>    register a pull consumer on the "results" channel, using a filter
>    for the request ID you are about to broadcast.
>
> 5. On the master server, push the request onto the "request" channel.
>    Repeatedly pull results from the channel until reaching your
>    desired threshold. Unsubscribe from the channel (results not
>    yet received will go into the bit bucket). Broadcast a cancel
>    on the current request.
>
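For concreteness, the slave-side polling thread of step 3 might look roughly
like the untested sketch below. The "positive id = request, negative id =
cancel" payload convention and the WorkQueue type are placeholders of mine,
locking of the shared queue is omitted, and it assumes the untyped
CosEventComm/CosEventChannelAdmin interfaces plus omnithread for the back-off
sleep:

    // Rough sketch of the slave-side polling thread: pull events off the
    // request channel, enqueue new requests, mark cancelled ones.  Include the
    // CosEventComm / CosEventChannelAdmin stub headers generated for your ORB,
    // plus <omnithread.h>; queue locking is omitted for brevity.
    #include <deque>
    #include <set>

    struct WorkQueue {
        std::deque<CORBA::Long> requests;     // request ids waiting to be processed
        std::set<CORBA::Long>   cancelled;    // ids whose work should be abandoned
    };

    void poll_request_channel(CosEventChannelAdmin::EventChannel_ptr request_chan,
                              WorkQueue&                             queue)
    {
        CosEventChannelAdmin::ConsumerAdmin_var     ca  = request_chan->for_consumers();
        CosEventChannelAdmin::ProxyPullSupplier_var pps = ca->obtain_pull_supplier();
        pps->connect_pull_consumer(CosEventComm::PullConsumer::_nil());

        for (;;) {
            CORBA::Boolean has_event;
            CORBA::Any_var ev = pps->try_pull(has_event);
            if (!has_event) { omni_thread::sleep(0, 10000000); continue; }  // 10 ms back-off

            CORBA::Long id;
            if (ev.in() >>= id) {
                if (id >= 0) queue.requests.push_back(id);   // new request: enqueue it
                else         queue.cancelled.insert(-id);    // cancellation: mark it
            }
            // The worker thread pops queue.requests and, between "segments" of
            // work, checks queue.cancelled to decide whether to abandon the
            // current request.
        }
    }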
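And the master side of steps 4 and 5 (register the results consumer before
broadcasting, push the request, pull until the threshold is reached, then
unsubscribe and push a cancel) might look roughly like this, again untested
and with placeholder payloads; filtering by request ID as suggested would need
the Notification Service rather than the plain Event Service:

    // Rough sketch of the master side of the scatter-gather scheme.  Include
    // the CosEventComm / CosEventChannelAdmin stub headers for your ORB, plus
    // <omnithread.h>; error handling is omitted.
    CORBA::ULong broadcast_and_gather(
        CosEventChannelAdmin::EventChannel_ptr request_chan,
        CosEventChannelAdmin::EventChannel_ptr results_chan,
        CORBA::Long                            request_id,
        CORBA::ULong                           threshold)
    {
        // Register a pull consumer on the "results" channel BEFORE broadcasting.
        CosEventChannelAdmin::ConsumerAdmin_var     ca  = results_chan->for_consumers();
        CosEventChannelAdmin::ProxyPullSupplier_var pps = ca->obtain_pull_supplier();
        pps->connect_pull_consumer(CosEventComm::PullConsumer::_nil());

        // Obtain a push proxy on the "request" channel for the request and the cancel.
        CosEventChannelAdmin::SupplierAdmin_var     sa  = request_chan->for_suppliers();
        CosEventChannelAdmin::ProxyPushConsumer_var ppc = sa->obtain_push_consumer();
        ppc->connect_push_supplier(CosEventComm::PushSupplier::_nil());

        // Broadcast the request (the payload here is just the request id).
        CORBA::Any request;
        request <<= request_id;
        ppc->push(request);

        // Pull results until the threshold is reached.
        CORBA::ULong received = 0;
        while (received < threshold) {
            CORBA::Boolean has_event;
            CORBA::Any_var result = pps->try_pull(has_event);
            if (has_event) ++received;                       // a real version would check the request id
            else           omni_thread::sleep(0, 10000000);  // back off 10 ms
        }

        // Unsubscribe (later results go to the bit bucket), then broadcast a cancel.
        pps->disconnect_pull_supplier();
        CORBA::Any cancel;
        cancel <<= -request_id;            // placeholder convention for "cancel"
        ppc->push(cancel);
        ppc->disconnect_push_consumer();

        return received;
    }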
>One-ways won't help a whole lot here, unless the request-processing time
>is very small. The new Asynchronous Method Invocation (AMI) spec might
>help, but I imagine that you are truly CPU-bound here (else why 10^5
>servers?), so the network latency is likely not a big problem. The
>"scatter-gather" solution I proposed has the advantage of decoupling the
>master and the slaves, which becomes especially critical with large
>numbers of peers (yours is the largest number I have ever seen seriously
>proposed!).
>
Actually (in our current implementation) we are network-bound: each request
takes 0.1 ms to complete, and 4 ms is wasted on the network.
I thought omniORB does not support AMI. Am I wrong?
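In the meantime, one thing we may experiment with (assuming the ORB's Dynamic
Invocation Interface is usable for this) is the standard deferred synchronous
style, so a single thread can keep many requests outstanding instead of
blocking one thread per call. A rough, untested sketch; the operation name,
argument and return type are placeholders:

    // Send all requests without blocking, then poll until half have replied.
    // Uses the standard DII (CORBA::Request); include your ORB's CORBA headers
    // and <omnithread.h>; exception handling is omitted.
    #include <vector>

    CORBA::ULong call_until_quorum(std::vector<CORBA::Object_var>& servers,
                                   CORBA::Long                     argument)
    {
        std::vector<CORBA::Request_var> pending;

        for (std::size_t i = 0; i < servers.size(); ++i) {
            CORBA::Request_var req = servers[i]->_request("do_function");
            req->add_in_arg() <<= argument;             // placeholder in-parameter
            req->set_return_type(CORBA::_tc_long);      // placeholder return type
            req->send_deferred();                       // returns immediately
            pending.push_back(req);
        }

        const CORBA::ULong quorum = pending.size() / 2; // e.g. 50%
        CORBA::ULong       done   = 0;
        std::vector<bool>  seen(pending.size(), false);

        while (done < quorum) {
            for (std::size_t i = 0; i < pending.size(); ++i) {
                if (!seen[i] && pending[i]->poll_response()) {
                    // Reply (or exception) has arrived; a real version would
                    // call get_response() here and read return_value().
                    seen[i] = true;
                    ++done;
                }
            }
            omni_thread::sleep(0, 1000000);             // back off 1 ms between sweeps
        }
        // Requests that never complete are simply abandoned here, subject to
        // the ORB's own timeout handling.
        return done;
    }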
>Best,
>
>Tres.
>--
>=========================================================
>Tres Seaver tseaver@palladion.com