Find large files through Search in SharePoint 2010

How to get a list of large files in a (very large) SharePoint 2010 Farm, sorted by file size?

I discussed this requirement with my friend and teammate Riccardo Celesti, and we ended up with the same answer: use search!

Unfortunately, since the customer’s environment is still running on top of SharePoint Server 2010, the solution was not immediate: we would need to add a metadata property mapped to the file size (and schedule a full crawl afterwards, which may not be feasible).

Or… it’s SharePoint 2010, right? We still have the FullText SQL Query Language available.

Typically, the SQL Query Language should not be an option, since this feature has been removed in SharePoint 2013 and was already considered obsolete in SharePoint 2010 (take a look at the “tip” at the beginning of the MSDN page).

Anyway, this was a “fire and forget” requirement, so here’s what we ended up with:

Add-PSSnapin Microsoft.SharePoint.PowerShell
$site = Get-SPSite http://the-url-of-a-site
$query = New-Object Microsoft.Office.Server.Search.Query.FullTextSqlQuery($site)
$query.ResultTypes = "RelevantResults"
$query.RowLimit = 1000
$query.EnableStemming = $false
$query.TrimDuplicates = $false
$query.QueryText = "SELECT URL, filename, size, path FROM SCOPE() WHERE IsDocument=1 ORDER BY size DESC"
$results = $query.Execute()
$table = New-Object System.Data.DataTable
$table.Load($results["RelevantResults"], "OverwriteChanges")
$table | ogv
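If you’d rather have a file than a grid view, the last line can be swapped for an export (the path is just an example):

$table | Export-Csv -Path C:\temp\largefiles.csv -NoTypeInformation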


Hope you find it useful 🙂

PowerShell Profiles and Processor Architectures

I’m using PowerShell profiles quite extensively, especially because I typically work on several (virtual) machines, and profiles make it easy to keep my settings in sync.

Profiles are independent of the processor architecture, which means the same profile script runs in both 32-bit and 64-bit sessions; you should therefore be careful when loading snap-ins or invoking features that rely on a specific architecture.

One way to ensure that your profile scripts run smoothly in both x86 and x64 processes is to apply conditional logic based on the PROCESSOR_ARCHITECTURE environment variable ($env:PROCESSOR_ARCHITECTURE).
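For example, here’s a minimal sketch of such a guard in a profile script (the snap-in below is just an example of an x64-only dependency):

# On a 64-bit OS, $env:PROCESSOR_ARCHITECTURE is "AMD64" in x64 sessions and "x86" in 32-bit ones
if ($env:PROCESSOR_ARCHITECTURE -eq "AMD64")
{
    # 64-bit process: safe to load x64-only snap-ins
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
}
else
{
    Write-Warning "32-bit process detected: skipping x64-only snap-ins"
}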

I would suggest performing this check even if you think you will only ever execute scripts with the x64 shell.

You may think “no, I’ll never be using the x86 version of PowerShell”… you’re probably right, until you try to launch a script from within the Visual Studio IDE, which runs as a 32-bit process 🙁

Ugh…

PowerShell “here” strings

I use verbatim strings in C# quite a lot, even when there’s no explicit need, for example when there’s no escape sequence or line break and I could just avoid typing the extra @ at the beginning.

I definitely like the fact that there’s a similar syntactic behavior in PowerShell too.

These are called “here-strings”.

Although the two features have much in common, you have to pay a bit more attention when using here-strings in PowerShell, since the behavior is not *exactly* the same as what you get in C#.

For example, line breaks are mandatory.

The “start sequence” tokens are indeed @”+CRLF or @’+CRLF.

And the same applies to the “end sequence” tokens as well (CRLF+"@ or CRLF+'@ respectively).

And in case you are wondering what’s the difference between the single quote vs. double quote syntax, well… the single quote version disables escape sequences, whereas the double quote version does not.
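Here’s a quick sketch you can paste into a console to see the difference:

$name = "World"

# Double-quoted here-string: variables and escape sequences are expanded
$expanded = @"
Hello, $name!
"@

# Single-quoted here-string: the content is taken literally
$literal = @'
Hello, $name!
'@

$expanded   # Hello, World!
$literal    # Hello, $name!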

Strings have always been tricky.

When I started writing C/C++ code against the Windows API I was struggling with LPSTR, LPTSTR, LPCSTR, LPCTSTR to name a few (probably any combination of characters here would be a valid name, I’m sure there’s a typedef or a MACRO definition somewhere even for LPABCSTR :)).

Then came COM and BSTR and smart pointers and… and that was real pain!!

How to quickly identify large lists with PowerShell

It’s easy, and it’s just one line (ignore the word wrapping 🙂).

Get-SPWebApplication http://webappurl | Get-SPSite | Get-SPWeb |% { $_.Lists | select @{n="Url";e={$_.RootFolder.ServerRelativeUrl} }, Title, ItemCount } | sort ItemCount -Descending

Here I’m traversing the contents of a Web Application in order to iterate over each List (in each Web, in each Site of the Web Application):

Get-SPWebApplication http://webappurl | Get-SPSite | Get-SPWeb |% { $_.Lists | … }

Then, I’m using the Select-Object cmdlet to emit a custom object into the pipeline for each list, built from the list data using both standard properties (Title, ItemCount) and a calculated property for the list URL:

{ $_.Lists | select @{n="Url";e={$_.RootFolder.ServerRelativeUrl} }, Title, ItemCount }

Finally, it’s just a matter of manipulating the resulting collection of custom objects, applying sorting, filtering, grouping or any other set operation according to your specific needs.


In addition, you may choose to bring this data into Excel for further analysis, which is extremely easy to achieve by adding the Export-CSV cmdlet at the end of the pipeline.
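For example (the output path is just a placeholder):

Get-SPWebApplication http://webappurl | Get-SPSite | Get-SPWeb |% { $_.Lists | select @{n="Url";e={$_.RootFolder.ServerRelativeUrl} }, Title, ItemCount } | sort ItemCount -Descending | Export-Csv -Path C:\temp\lists.csv -NoTypeInformation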

As an example, I generated an Excel spreadsheet from the data above, applied a custom number filter to the data set and created a line chart on the data series.

How to issue HTTP Web Requests to a SharePoint 2010 site with FBA and SSL

The .NET Framework provides some handy classes that help you manage HTTP communication through requests and response objects.

Everything is plain and simple as long as you are connecting to a resource endpoint that does not require authentication and is available through an unsecured channel (i.e. plain HTTP). Which, of course, is not always the case.

I had a requirement to “invoke” a SharePoint 2010 resource (a file, for example) that is protected by Forms Based Authentication and exposed over an SSL-secured channel. The task is slightly more complicated, since it had to be performed by a client with no easy way to invoke web services through an autogenerated proxy. That is, no “Add Web Reference” available. Indeed, no RAD environment at all.

You can find my solution in the code snippet below. I wrote it in PowerShell just as a prototyping tool, so it will eventually have to be translated into the target language. Anyway, some interesting points are worth highlighting:

  • I used the .NET Framework API to “ignore” SSL certificate warnings. This may not always be an option, but the solution can be extended with certificate chain verification or whatever checks you need. This is done by setting the ServerCertificateValidationCallback static property (line 1).
  • I had to invoke the Authentication.asmx SharePoint web service in order to authenticate and then reuse the authentication cookie in subsequent requests; this is a requirement for FBA access. I handcrafted the SOAP message, since there’s no easy-to-use helper/proxy (lines 2-24).
  • I finally issued my HTTP request, reusing the authentication cookie and adding any HTTP headers needed (lines 26-29).
 1: [System.Net.ServicePointManager]::ServerCertificateValidationCallback = {$true}
 2: $cookieContainer = new-object System.Net.CookieContainer
 3: $authEnvelope = '<?xml version="1.0" encoding="utf-8"?>
 4:                 <soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
 5:                   <soap:Body>
 6:                     <Login xmlns="http://schemas.microsoft.com/sharepoint/soap/">
 7:                       <username>yourusername</username>
 8:                       <password>yourpassword</password>
 9:                     </Login>
10:                   </soap:Body>
11:                 </soap:Envelope>'
12: $encoder = [System.Text.Encoding]::UTF8
13: $authEnvelopeBytes = $encoder.GetBytes($authEnvelope)
14:
15: $authRequest = [System.Net.HttpWebRequest]([System.Net.HttpWebRequest]::Create("https://yoursite/_vti_bin/authentication.asmx"))
16: $authRequest.CookieContainer = $cookieContainer
17: $authRequest.Headers.Add("SOAPAction", "http://schemas.microsoft.com/sharepoint/soap/Login")
18: $authRequest.ContentType = "text/xml; charset=utf-8"
19: $authRequest.ContentLength = $authEnvelopeBytes.Length
20: $authRequest.Method = "POST"
21: $authRequestStream = $authRequest.GetRequestStream()
22: $authRequestStream.Write($authEnvelopeBytes, 0, $authEnvelopeBytes.Length)
23: $authRequestStream.Flush()
24: $authRequest.GetResponse()
25:
26: $request = [System.Net.HttpWebRequest]([System.Net.HttpWebRequest]::Create("https://yoursite/yourcontent"))
27: $request.CookieContainer = $authRequest.CookieContainer
29: $request.GetResponse()
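If you need the response body and not just the status, the final GetResponse call can be expanded like this (a sketch, not part of the original snippet):

$response = $request.GetResponse()
$reader = New-Object System.IO.StreamReader($response.GetResponseStream())
$content = $reader.ReadToEnd()
$reader.Close()
$response.Close()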


SharePoint Management Shell vs standard PowerShell console

You know that most of the administrative tasks you perform on a SharePoint farm can be accomplished through PowerShell scripting. You get a bunch of cmdlets (more than 600, indeed!) that are registered through the Microsoft.SharePoint.PowerShell snap-in, and you can even add your own cmdlets using standard SharePoint deployment techniques.

That’s why sometimes you just end up firing up the default PowerShell console and typing

Add-PSSnapIn Microsoft.SharePoint.PowerShell

Well… you get something that is “similar” to the SharePoint Management Shell 🙂

The SharePoint Management Shell is nothing more than a standard PowerShell console (powershell.exe) where the snap-in is automatically loaded at startup, leveraging a startup script:

C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe -NoExit " & 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG\POWERSHELL\Registration\sharepoint.ps1' "

The sharepoint.ps1 script is just a few lines of code:

$ver = $host | select version
if ($ver.Version.Major -gt 1) {$Host.Runspace.ThreadOptions = "ReuseThread"}
Add-PsSnapin Microsoft.SharePoint.PowerShell
Set-Location $home

In the snippet above you can notice the automatic loading of the SharePoint snap-in and a “change directory” instruction that brings you to your home directory (typically, C:\Users\yourusername).

But there’s an additional line:

if ($ver.Version.Major -gt 1) {$Host.Runspace.ThreadOptions = "ReuseThread"}

This line sets the PSThreadOptions of the PowerShell runspace to ReuseThread, which means that the same thread is reused for all command invocations for the entire lifetime of the shell.

The available options are (see http://msdn.microsoft.com/it-it/library/system.management.automation.runspaces.psthreadoptions(v=vs.85).aspx):

  • Default: on the local computer, UseNewThread is used for runspaces and ReuseThread is used for runspace pools. Server settings are used for remote runspaces and runspace pools.
  • ReuseThread: creates a new thread for the first invocation and then reuses that thread in subsequent invocations.
  • UseCurrentThread: execution occurs on the thread that called the Invoke method. This option is not valid for asynchronous calls.
  • UseNewThread: creates a new thread for each invocation.

(All of these fields were introduced in Windows PowerShell 2.0.)

If you want proof of this behavior, just open both a standard PowerShell console and a SharePoint Management Shell, then run the following command a few times in each shell.

[Threading.Thread]::CurrentThread.ManagedThreadId

This instruction prints the identifier of the current managed thread (i.e. the CLR Thread object).

You’ll notice that you get the same value in the SharePoint Management Shell, whereas you get different values in the standard one. By the way, PowerShell ISE uses the ReuseThread option too.
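You can also inspect the current setting directly from either console:

$Host.Runspace.ThreadOptions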

What’s the issue with the default value, i.e. having a new thread spawned (or picked up from a pool) for every command invocation?

Well… first of all, some SharePoint objects are not completely thread safe, so you should pay careful attention when using them from different threads (even if, in this particular case, the multiple threads are not executed in parallel 🙂).

Another noticeable issue you may experience, though, is resource leakage: if you create SharePoint objects on one thread and then lose any reference to that thread, you end up wasting resources that are never disposed correctly.

Wrapping it up… the SharePoint Management Shell is nothing special, but it’s a little bit more than just adding a snap-in!

Efficiently purge large lists

Sometimes you may need to clear the contents of a huge SharePoint list.

You can do this through the web user interface, but you will probably face some limitations: bulk operations are limited to 100 elements, the folder structure may be an obstacle, etc…

If you prefer writing script code (and you have remote access to a farm server), you can save quite a lot of time.

Here’s how:

function ClearList([Microsoft.SharePoint.SPWeb]$web, [Microsoft.SharePoint.SPList]$list)
{
    # Build a single CAML batch containing one Delete method per list item
    $sbDelete = new-object System.Text.StringBuilder
    [void]$sbDelete.Append("<?xml version=`"1.0`" encoding=`"UTF-8`"?><Batch>")
    $command = "<Method><SetList>" + $list.ID + "</SetList><SetVar Name=`"ID`">{0}</SetVar><SetVar Name=`"Cmd`">Delete</SetVar></Method>"
    $items = $list.Items
    foreach ($item in $items)
    {
        [void]$sbDelete.Append([System.String]::Format($command, $item.ID.ToString()))
    }
    [void]$sbDelete.Append("</Batch>")
    try
    {
        # Send the whole batch to the server in a single round trip
        $web.ProcessBatchData($sbDelete.ToString())
    }
    catch [Exception]
    {
        Write-Host $_.Exception.ToString()
    }
}
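A hypothetical invocation (the URL and list title are placeholders):

$web = Get-SPWeb http://yoursite
$list = $web.Lists["Your Huge List"]
ClearList $web $list
$web.Dispose()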

This works for SharePoint lists only; document libraries require a slightly different approach.

Hope you find it useful!

Setting IIS MIME Types via script

If you need to add (or edit) a MIME type entry stored in the IIS configuration files, you have a few tools you can use.

First off, and this is by far the easiest way, you can leverage the IIS Manager.

  • Open inetmgr.exe
  • Select the “scope”
  • Select the MIME Types feature
  • Edit an existing entry or add a new one by clicking the “Add…” action in the right pane

Of course, if you need to automate these settings, the graphical UI is not the right tool.

You could, instead, use scripting, either leveraging the appcmd console application that is part of the IIS management tools, or writing a couple of lines of code (PowerShell, why not?).

The approach I prefer is appcmd, which is actually not complicated at all.

For example, here I’m registering the video/mp4 MIME type for the .mp4 extension (type the following command on a single line; any line breaks below are just for reading convenience):

appcmd.exe set config "Default Web Site" -section:system.webServer/staticContent /+"[fileExtension='.mp4',mimeType='video/mp4']"
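And if you prefer the PowerShell route mentioned above, here’s a minimal sketch that does the same thing, assuming the WebAdministration module that ships with the IIS management tools:

Import-Module WebAdministration

# Append a new mimeMap entry to the staticContent collection of the site
Add-WebConfigurationProperty -PSPath "IIS:\Sites\Default Web Site" -Filter "system.webServer/staticContent" -Name "." -Value @{fileExtension=".mp4"; mimeType="video/mp4"}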

PowerShell – Importing and Exporting the command History

You know… one of the key architectural design principles of PowerShell is that “everything is an object”.

So, once you start thinking with this approach in mind, even some tedious tasks become extremely easy.

Consider, for example, a situation where you have typed tons of commands (or combinations of those) in a PowerShell console, and you wish to “save your work” without having to retype everything again.

If you are thinking about copying and pasting everything from the console UI (whose screen buffer size you had previously set to 1 million rows, didn’t you? 🙂), wait a moment and try to execute:

Get-History

You’ll get back a collection of objects that represent the commands you have typed so far.

And of course, since they are PowerShell objects, they can be serialized and saved just by executing the Export-Clixml cmdlet:

Get-History | Export-Clixml -Path c:\yourhistoryfile.xml

Then, just close your console, open it again and type something like:

Add-History -InputObject (Import-Clixml -Path c:\yourhistoryfile.xml)

Now, type Get-History again and… voila, everything’s there!

You can now execute one of the commands just by running the Invoke-History cmdlet, passing in the command index as returned by the Get-History output.
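For example:

Get-History            # take note of the Id column
Invoke-History -Id 42  # replays the command with Id 42 (the index is just an example)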

Easy, isn’t it?

PowerShell input parameters – No script is an island!

A script is almost always dependent on values provided by the caller, or at least on the environment that hosts the script execution.

Thus, if it’s true that the “core” of a script is its embedded logic, it is also very important to write a robust data collection strategy: parameters, type checking, value checking and fallback logic are key topics in this area.

Enter the power of script blocks!

One of my favorite features of the PowerShell scripting language is its use of script blocks, something similar to the anonymous functions you may be used to writing as lambda expressions in C# or in a functional language of your choice.

A nice use of script blocks is script parameter initialization.

Consider, for example, this simple script file:

Param([int]$n)

$n * 10

This extremely useful 🙂 computation relies on an integer being passed in by the caller.

But what happens if the user forgets to pass a number to the script?

Well… nothing, actually: the $n variable will have a default value of 0 and the script will return 0.

If 0 is not a good value for you, you can use default values for parameters:

Param([int]$n = 1)

$n * 10

Which returns 10 (10 * 1) if no input is received.

What if you want to do some value checking, instead, in order to ensure that a value is provided explicitly by the user?

A default parameter value can be an expression… oops… a script block!

Just try this, if you want to throw an error if no input is available:

Param([int]$n = $(throw "No input"))

$n * 10

Or, if you want to give your users a last chance, you could force them to provide data by explicitly requesting it:

Param([int]$n = $(Read-Host -Prompt "Tell me more…."))

$n * 10
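And, of course, the two techniques can be combined into a small fallback chain (just a sketch; the prompt and error messages are arbitrary):

Param([int]$n = $(
    $answer = Read-Host -Prompt "Tell me more…."
    if ([string]::IsNullOrEmpty($answer)) { throw "No input" }
    [int]$answer
))

$n * 10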

Cool, isn’t it?