Using IDisposables with LINQ

Objects that implement IDisposable are everywhere. The interface even gets its own language features (C#, VB, F#). However, LINQ throws a few wrenches into things:

  1. LINQ’s query syntax depends on expressions; using blocks are statements.
  2. When querying a sequence of IDisposable objects, there’s no easy way to ensure disposal after each element has been consumed.
  3. Returning deferred queries from within a using statement is often desired, but fails spectacularly.

There are possible work-arounds for each issue…

  1. Put the using statement in a method (named or anonymous) that is called from the query. See also: Thinking Functional: Using.
  2. Use a method that creates a dispose-safe iterator of the sequence, like AsSafeEnumerable().
  3. Refactor the method to inject the IDisposable dependency, as shown in the first part of Marc’s answer here.

But, as you might have guessed, I would like to propose a better solution. The code is really complex, so bear with me:

public static IEnumerable<T> Use<T>(this T obj) where T : IDisposable
{
    try
    {
        yield return obj;
    }
    finally
    {
        if (obj != null)
            obj.Dispose();
    }
}

That’s it. We’re turning our IDisposable object into a single-element sequence. The trick is that the C# compiler will build an iterator for us that properly handles the finally clause, ensuring that our object will be disposed. It might be helpful to set a breakpoint on the finally clause to get a better idea what’s happening.

So how can this simple method solve all our problems? First up: “using” a FileStream object created in a LINQ query:

var lengths = from path in myFiles
              from fs in File.OpenRead(path).Use()
              select new { path, fs.Length };

Since the result of Use() is a single-element sequence, we can think of from fs in something.Use() as an assignment of that single value, something, to fs. In fact, it’s really quite similar to an F# use binding in that it will automatically clean itself up when it goes out of scope (by its enumerator calling MoveNext()).

Next, disposing elements from a collection. I’ll use the same SharePoint problem that AsSafeEnumerable() solves:

var webs = from notDisposed in site.AllWebs
           from web in notDisposed.Use()
           select web.Title;

I find this syntax rather clumsy compared with AsSafeEnumerable(), but it’s there if you need it.

Finally, let’s defer disposal of a LINQ to SQL DataContext until after the deferred query is executed, as an answer to the previously-linked Stack Overflow question:

IQueryable<MyType> MyFunc(string myValue)
{
    return from dc in new MyDataContext().Use()
           from row in dc.MyTable
           where row.MyField == myValue
           select row;
}

void UsingFunc()
{
    var result = MyFunc("MyValue").OrderBy(row => row.SortOrder);
    foreach(var row in result)
    {
        //Do something
    }
}

The result of MyFunc now owns its destiny completely. It doesn’t depend on some potentially disposed DataContext – it just creates one that it will dispose when it’s done. There are probably situations where you would want to share a DataContext rather than create one on demand (I don’t use LINQ to SQL, I just blog about it), but again it’s there if you need it.

I’ve only started using this approach recently, so if you have any problems with it please share.

SharePoint Disposal Wish List

I was almost off my SharePoint disposal kick when Jeremy Jameson had to pull me back in with this post. Resisting the temptation to rehash some stuff I’ve covered before, I thought it might be therapeutic to spend some time discussing what the dispose story should look like. I’m sure the API for WSS 4.0 is more or less locked at this point, but it still might be useful to frame how I think about ownership of these blasted IDisposables.

In my SharePoint fantasy world, the dispose story can be explained in four bullets:

  1. Use of a disposed object throws an exception; failures to dispose properly are logged.
  2. IDisposable references retrieved from an instance property are owned by the instance.
  3. IDisposable references retrieved from a constructor, method or collection indexer are owned by the developer and should always be disposed.
  4. IDisposable references from an enumerator are owned and disposed by the enumerator.

Let’s examine these a bit more closely.

1. Disposed = Invalid = Use is Exceptional; Discoverability = Good

The leaked object logging is already pretty good, but that’s more than made up for by the epic failure to actually close objects that have been closed. Fixing discoverability would make all other points a matter of convenience.

2. Instance Properties

An instance property suggests “mine”. My context’s Site, my site’s RootWeb, my feature’s Parent, etc. If something is “mine,” then I should be responsible for it. This behavior is already consistent with three exceptions:

SPWeb.ParentWeb

This property is tricky because the instance is shared for all consumers of the SPWeb (think SPContext.Web), but will not be cleaned up if the SPWeb is disposed and has no mechanism to notify the child that it has been disposed to null out the reference to it. So what to do? Well in accordance with my fantasy guidelines, there are two options, either of which would resolve the ambiguity:

  1. Leave ParentWeb as a property and require that the SPWeb dispose its parent as part of the clean-up process.
  2. Provide SPWeb.GetParentWeb() instead, returning a new SPWeb on each call that is owned by the developer.

Since GetParentWeb() can be implemented as an extension method, I guess I would prefer option 1.

SPList.ParentWeb

This property is almost fine, but for some reason has an edge case returns a new SPWeb (details here). It should be fine as a property, but that edge case needs to disappear. Why can’t it just return SPList.Lists.Web?

MOSS UserProfile.PersonalSite

This property returns a new SPSite on each call, so ideally it would be a method instead, perhaps GetPersonalSite. That seems much easier than having UserProfile implement IDisposable to facilitate cleanup of shared instance.

3. Constructors & Methods & Indexers, Oh My!

If properties are “mine”, then everything else is “yours”. Again, the vast majority of the API already fits this behavior. The few discrepancies:

SPControl.GetContextSite() & SPControl.GetContextWeb()

I believe these are the only methods that return IDisposables that SharePoint owns. MSDN clearly indicates that behavior, but for consistency the preferred usage (consistent with our fantasy guidelines) would be to use an SPContext’s Site and Web properties instead.

MOSS PublishingWeb

I’ve already discussed the semi-disposability of PublishingWeb, which just needs a few tweaks to fit nicely in my fantasy model:

  • Implement IDisposable! Close() is already implemented, just need the interface to take advantage of our languages’ using constructs.
  • I believe ParentPublishingWeb would be fixed if SPWeb.ParentWeb were disposed with the child. Alternatively, change to GetParentPublishingWeb and ensure that the returned PublishingWeb will close the internal SPWeb on dispose.
  • PublishingWebCollection.Add() is fine, but the indexer works in such a way that an internal SPWeb is created that is not cleaned up by PublishingWeb.Close(). I would consider this a bug in the current code base, which would naturally be fixed in my fantasy.

4. Enumerating Collections

This isn’t quite as simple as it would seem. When dealing with collections and enumerators, there are essentially two classes of operation:

  1. Enumerate over all or part of the collection and perform an operation. (LINQ Select, GroupBy, OrderBy)
  2. Enumerate over the collection to pick an object from the collection to return. (LINQ First, Last, ElementAt)

In my experience, the vast majority fall into the first category, including most of LINQ. These operations shouldn’t need to know that the objects being enumerated are IDisposable; it’s up to the enumerator to behave appropriately. Rather than sacrifice these many useful operations, I suggest that the enumerator should include proper disposal of the objects it creates—this is precisely what my AsSafeEnumerable() iterator does.

The second category can either be handled using for loops and indexers, which return objects that the caller must dispose, or through higher-order functions that allow dispose-safe operation on or selection from the desired element. But again, these seem to be the exception rather than the rule, and can be handled accordingly.

Wishful Thinking

The unfortunate reality is that most of these adjustments could never happen because breaking changes are generally frowned upon. But perhaps in framing our disposal thought process around some simple rules with documented exceptions, it may be easier to get things right without leaning on tools like SPDisposeCheck.

Posted in Object Model, SharePoint. Tags: , . Comments Off on SharePoint Disposal Wish List

PowerShell Dispose-All

With all the talk of Dispose safety in object model code, it’s easy to forget that those same objects need to be cleaned up in PowerShell as well. Sure, SharePoint will clean everything up when the thread ends, but my PowerShell doesn’t get closed all that often, leaving open SPRequests for extended periods of time. To combat that, I wrote a quick function that will clean up all my IDisposable variables for me:

function Dispose-All {
    Get-Variable -exclude Runspace |
        Where-Object {
            $_.Value -is [System.IDisposable]
        } |
        Foreach-Object {
            $_.Value.Dispose()
            Remove-Variable $_.Name
        }
}

For some reason, I don’t think it would be a good idea to Dispose() the current Runspace… (Edit: Apparently I define $Runspace in a long-forgotten part of my profile script, so don’t be worried if you don’t have that defined.)

Of course this isn’t foolproof, as variables can be overwritten and not all IDisposables will have been assigned to variables in the first place, but you could certainly adapt your shell behavior to make the most out of a catch-all cleanup like this.

Implementing LINQ Select for SPWebCollection

If I’m going to suggest that others use yield return, I suppose I should take my own advice. In particular, my original implementation of SPWebCollection.Select can be improved using the same techniques I used for Where:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

And we might as well support a filtered Select as well:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector,
                                                   Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

Implementations using Cast<T>() are left as an exercise for the reader.

The function itself isn’t particularly interesting, but I did stumble on something I found rather surprising. When I first wrote up my function, I typed the following:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        yield return selector(web);
        web.Dispose();
    }
}

I’m really not sure why I typed it that way…obviously you can’t keep going after you return, right? Well it turns out you can. The generated class just waits to call Dispose() until the next call to MoveNext(), effectively picking up where it left off. Here’s what that looks like in Reflector:

switch (this.<>1__state)
{
    case 0:
        this.<>1__state = -1;
        this.<>7__wrap1a = this.webs.GetEnumerator();
        this.<>1__state = 1;
        while (this.<>7__wrap1a.MoveNext())
        {
            this.<web>5__19 = (SPWeb) this.<>7__wrap1a.Current;
            this.<>2__current = this.selector(this.<web>5__19);
            this.<>1__state = 2;
            return true;
        Label_0080:
            this.<>1__state = 1;
            this.<web>5__19.Dispose();
        }
        this.<>m__Finally1c();
        break;

    case 2:
        goto Label_0080;
}
return false;

As Select will almost always be used in a foreach, with back-to-back calls of MoveNext(), the distinction is mostly academic. Still, I prefer to know that the web will be disposed immediately after the selector is done with it.

Implementing LINQ Where for SharePoint

First, a huge thanks to Waldek Mastykarz for running with my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, which I expect would be even more pronounced on larger lists and under load.

In his test, Waldek implements Where() as follows:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}

This works as expected, but allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}

The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.

Another alternative, which also defers execution, is to leverage LINQ’s Cast<T>() operator and LINQ’s IEnumerable<T>.Where():

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}

I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.

The Problem with IDisposable

In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?

Well if we use one of the patterns described above, we should be safe if we call Dispose() for each when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}

Or using Cast<T>():

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                        Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}

Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.

Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:

public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        action(web);
        web.Dispose();
    }
}

Which would let us do something like this to print all publishing sites in a collection:

site.AllWebs.ForEach(
    w => { Console.WriteLine(w.Title); },
    w => { return w.Features.Contains(FeatureIds.OfficePublishingWeb); }
);

That is, if we define yet another useful, if simple, extension method:

public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}

And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?

Multi-Purpose PowerShell Using Function

It’s always bothered me that there isn’t a clean way to deal with IDisposables in PowerShell. It seems Adam Weigert came to the same conclusion and implemented a using function much like the statement found in C# and VB. Note that he also makes use of his implementation of PowerShell try..catch..finally, which is pretty slick. Meanwhile, I’m told Raymond Mitchell has his own using function that he uses to load assemblies, which certainly makes sense to me.

I figure the next evolution is to provide a generic using that covers all the bases:

function using {
    param (
        $inputObject = $(throw "The parameter -inputObject is required."),
        [ScriptBlock] $scriptBlock
    )

    if ($inputObject -is [string]) {
        if (Test-Path $inputObject) {
            [system.reflection.assembly]::LoadFrom($inputObject)
        } elseif($null -ne (
              new-object System.Reflection.AssemblyName($inputObject)
              ).GetPublicKeyToken()) {
            [system.reflection.assembly]::Load($inputObject)
        } else {
            [system.reflection.assembly]::LoadWithPartialName($inputObject)
        }
    } elseif ($inputObject -is [System.IDisposable] -and $scriptBlock -ne $null) {
        Try {
            &$scriptBlock
        } -Finally {
            if ($inputObject -ne $null) {
                $inputObject.Dispose()
            }
            Get-Variable -scope script |
                Where-Object {
                    [object]::ReferenceEquals($_.Value.PSBase, $inputObject.PSBase)
                } |
                Foreach-Object {
                    Remove-Variable $_.Name -scope script
                }
        }
    } else {
        $inputObject
    }
}

Some notes on the code:

  • If $inputObject is a string, I assume it’s an assembly reference…
    • If the string is a path, load as a path
    • Rather than parse the string, I figure the framework knows best; the presence of a PublicKeyToken means it’s probably a full assembly name
    • I considered adding support for this alternative to LoadWithPartialName, but I don’t feel like managing a global “assembly map”; the deprecated shortcut will have to do for now
  • If $inputObject is IDisposable and a script block was supplied…
    • Wrap script execution in Try..Finally to make sure we get to Dispose()
    • Here I disagree with Adam – if the PSObject’s Dispose method was overridden, we should assume it was done for good reason (more on this in a later post) and that the override will respect the object’s disposability.
    • After disposal, I thought it might be nice to take the variable out of scope like C#/VB. Using -scope script will look at variables in the scope where our function was called, and since we don’t know what $inputObject was named before it was passed in, I just compare references instead.
  • Otherwise just punt the object along in the pipeline

Usage

Loading assemblies is pretty straightforward:

using System.Windows.Forms
using 'System.Web, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
using C:\Dev\DisposeTest\bin\Debug\DisposeTest.dll

To test IDisposable handling, I use a simple disposable object which returns its hash code in ToString(), instantiated by a helper function:

function new-disp { new-object DisposeTest.Disposable }

To verify that variable scope is handled properly, we need two test scripts. gv is an alias for Get-Variable.

NestedTest.ps1

gv x
using ($x = new-disp) { gv x }
gv x

UsingTest.ps1

$x = 'X'
.\NestedTest.ps1
using ($y = new-disp) { gv y }
gv y

From the behavior in C#/VB, we expect that the object being ‘used’ will only be available within the scope of the script block. So when we enter NestedTest.ps1, we should see the $x remains ‘X’, inherited from the parent scope, both before and after the using statement. Similarly, we expect $y will not be accessible outside of the using block:

SharePoint Example

using Microsoft.SharePoint
using ($s = new-object Microsoft.SharePoint.SPSite('http://localhost/')) {
  $s.AllWebs | %{ using ($_) { $_ | select Title, Url } }
}
if($s -eq $null) { 'Success!' }

It’s not exceedingly friendly for interactive mode, particularly for tab completion, but it should aid script readability.