
Got an InvalidOperationException in SmartThreadPool.Cancel(true)

May 30, 2008 at 1:53 PM
It's a "Collection was modified" in "foreach (WorkItemsGroup workItemsGroup in workItemsGroups)".
workItemsGroups was empty after the exception.

There is a chance that I called several Cancel()s from different threads, either on a WorkItemsGroup or on the SmartThreadPool.

This is probably not supposed to happen... shouldn't Cancel() take a lock on the WorkItemsGroup/pool it is processing?
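
For reference, here is a minimal standalone sketch (not STP code, just an illustration) that reproduces the same failure mode: Hashtable.Synchronized() makes individual operations thread-safe, but enumeration is not, so a concurrent Remove() makes the enumerator throw.

        using System;
        using System.Collections;
        using System.Threading;

        class CollectionModifiedRepro
        {
            static readonly Hashtable _groups = Hashtable.Synchronized(new Hashtable());

            static void Main()
            {
                for (int i = 0; i < 100000; i++)
                {
                    _groups[i] = i;
                }

                // One thread empties the collection...
                Thread remover = new Thread(delegate()
                {
                    for (int i = 0; i < 100000; i++)
                    {
                        _groups.Remove(i);
                    }
                });
                remover.Start();

                try
                {
                    // ...while this thread enumerates it. This will usually
                    // throw InvalidOperationException: "Collection was
                    // modified; enumeration operation may not execute."
                    foreach (object value in _groups.Values)
                    {
                        GC.KeepAlive(value);
                    }
                }
                catch (InvalidOperationException e)
                {
                    Console.WriteLine(e.Message);
                }
                remover.Join();
            }
        }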
May 31, 2008 at 12:29 AM
Thank you for notifying me about this.

I think I should have made a copy of _workItemsGroups.Values and then canceled each work items group, like this:

            // The old code enumerated the live collection directly:
            //     ICollection workItemsGroups = _workItemsGroups.Values;
            // which throws if another thread adds or removes a group meanwhile.
            // Instead, snapshot the groups under the lock...
            WorkItemsGroup[] workItemsGroups;
            lock (_workItemsGroups.SyncRoot)
            {
                workItemsGroups = new WorkItemsGroup[_workItemsGroups.Count];
                _workItemsGroups.Values.CopyTo(workItemsGroups, 0);
            }

            // ...and cancel outside the lock, iterating the private snapshot.
            foreach (WorkItemsGroup workItemsGroup in workItemsGroups)
            {
                workItemsGroup.Cancel(abortExecution);
            }

I still need to think about this, since there is a scenario in which a WorkItemsGroup may cancel a work item that was queued after SmartThreadPool.Cancel was called.

Ami

Jun 5, 2008 at 1:13 PM
Edited Jun 5, 2008 at 1:28 PM
> WorkItemGroup may cancel a work item that was queued after the SmartThreadPool.Cancel was called

I see, it's clearly a race condition, and it's likely to occur because aborting a thread can take an arbitrarily long time.
You probably can't avoid a deeper copy of the current workItemsGroups. Or better: copy each WorkItemsGroup's current items into a single list instead of copying the list of groups.

Edit: Sorry, that would only work if cancelling an item didn't involve its WorkItemsGroup ownership.
What if, in STP.Cancel(), you passed false to workItemsGroup.Cancel(), and then let the if (abortExecution) {...} branch do the remaining work anyway?
That looks equivalent and would greatly reduce the window for the race condition (if it leaves one at all).
Jun 5, 2008 at 1:50 PM
Edited Jun 5, 2008 at 1:51 PM
I have now settled on the following implementation of Cancel(bool):

        public override void Cancel(bool abortExecution)
        {
            WorkItemsGroup[] workItemsGroups;
            List<WorkItem> runningItems = new List<WorkItem>();

            lock (_workItemsGroups.SyncRoot)
            {
                // Flip the current cancellation token and replace it with a
                // fresh one, so work items queued after this point hold the
                // new token and are not affected by this Cancel() call.
                _canceledSmartThreadPool.IsCanceled = true;
                _canceledSmartThreadPool = new CanceledWorkItemsGroup();

                // Snapshot the groups so they can be iterated outside the lock.
                workItemsGroups = new WorkItemsGroup[_workItemsGroups.Count];
                _workItemsGroups.Values.CopyTo(workItemsGroups, 0);

                if (abortExecution)
                {
                    // Remember the items that are running right now; only
                    // these will be aborted, after the lock is released.
                    foreach (ThreadEntry threadEntry in _workerThreads.Values)
                    {
                        WorkItem workItem = threadEntry.CurrentWorkItem;
                        if (null != workItem &&
                            //threadEntry.AssociatedSmartThreadPool == this &&
                            !workItem.IsCanceled)
                        {
                            runningItems.Add(workItem);
                        }
                    }
                }
            }

            // Cancel the queued items of every group, without aborting
            // anything yet.
            foreach (WorkItemsGroup workItemsGroup in workItemsGroups)
            {
                workItemsGroup.Cancel(false);
            }

            // Race condition: a work item could still slip through here - it
            // didn't get into runningItems, and it can escape
            // workItemsGroup.Cancel(false) too (see the discussion below).

            if (abortExecution)
            {
                // Abort the items that were running when the lock was taken.
                foreach (var item in runningItems)
                {
                    item.GetWorkItemResult().Cancel(true);
                }
            }
        }

We could avoid the race (and the long lock) if WorkItem had a timestamp, which could be assigned with lock-free increments.
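
For what it's worth, a rough sketch of that timestamp idea - the names here (WorkItemClock, QueueStamp) are invented, none of this is STP code: stamp each work item with Interlocked.Increment when it is queued, sample the clock once when Cancel() starts, and cancel only items stamped earlier.

        using System.Threading;

        // Hypothetical sketch; not part of SmartThreadPool.
        static class WorkItemClock
        {
            private static long _generation;

            // Lock-free: each queued work item gets a unique, increasing stamp.
            public static long NextStamp()
            {
                return Interlocked.Increment(ref _generation);
            }

            // Cancel() samples the clock once on entry.
            public static long Now()
            {
                return Interlocked.Read(ref _generation);
            }
        }

        // Usage inside a Cancel() pass (invented members, for illustration):
        //
        //     long cancelStamp = WorkItemClock.Now();
        //     ...
        //     if (workItem.QueueStamp <= cancelStamp)
        //     {
        //         // Queued before Cancel() started - safe to cancel.
        //         workItem.GetWorkItemResult().Cancel(abortExecution);
        //     }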
Jun 5, 2008 at 2:35 PM
The scenario I am talking about is:

1. There were work items in the SmartThreadPool queue and in its WorkItemsGroup queues.
2. SmartThreadPool.Cancel() was called (and hasn't finished yet).
3. A new work item is queued into a WorkItemsGroup.
4. SmartThreadPool.Cancel() cancels that WorkItemsGroup (continuing the method execution from step 2).
5. The new work item is canceled.

In this scenario the work item shouldn't have been canceled, since it was queued after the call to Cancel().
This is what I want to prevent.

I don't think the current implementation will leave work items un-canceled;
I worry about work items that will be canceled wrongly.

Note that the lock isn't held for long; it doesn't do any waits.
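
To illustrate, here is roughly the line that the CanceledWorkItemsGroup swap in the Cancel(bool) snippet above draws (a simplified sketch with an invented PoolSketch class; the real code differs in detail):

        public class CanceledWorkItemsGroup
        {
            // Flipped once by Cancel(); never reset.
            public bool IsCanceled;
        }

        public class PoolSketch
        {
            private readonly object _lock = new object();
            private CanceledWorkItemsGroup _token = new CanceledWorkItemsGroup();

            // Every work item captures the pool's *current* token when queued.
            public CanceledWorkItemsGroup TokenForNewWorkItem()
            {
                lock (_lock)
                {
                    return _token;
                }
            }

            public void Cancel()
            {
                lock (_lock)
                {
                    // Items queued before this point hold the old token and
                    // see IsCanceled == true when they are dequeued.
                    _token.IsCanceled = true;

                    // Items queued after this point capture a fresh token,
                    // so this Cancel() call cannot touch them.
                    _token = new CanceledWorkItemsGroup();
                }
            }
        }

A work item checks the IsCanceled flag of the token it captured when it was queued, not the pool's current token, so swapping the token inside the lock is what separates "queued before Cancel" from "queued after".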

Ami
Jun 5, 2008 at 7:09 PM


amibar wrote:
The scenario I am talking about is:

Yes, I understood that the first time.
But ABORTING a work item CAN take notably long - if it involves closing network connections or releasing other resources. This is what caused the rare error in my application: while one work item was being aborted, another had time to complete and leave its group.

That's because the time to complete an item is comparable to the time to abort one.

My proposed implementation avoids the scenario you described: note that it doesn't cancel any item that wasn't running when Cancel() entered the lock().
It has the symmetric problem:
1. A work item could be in the process of being queued (not running) during the lock(), and thus not get into runningItems;
2. While the loop
foreach (WorkItemsGroup workItemsGroup in workItemsGroups) ... { workItemsGroup.Cancel(false); }
is running, the item could reach the queue after its group has already been processed, thus escaping workItemsGroup.Cancel(false);
3. Then it also won't be aborted via runningItems.

But this variant leaves little time for that - workItemsGroup.Cancel(false) is quick - and an extra item executing after Cancel(true) is a problem only in rare cases.