SharePoint Management Shell vs standard PowerShell console

You know that most of the administrative tasks you perform on a SharePoint farm can be accomplished using PowerShell scripting. You get a bunch of cmdlets (more than 600, indeed!) that are registered through the Microsoft.SharePoint.PowerShell snap-in, and you can even add your own cmdlets using standard SharePoint deployment techniques.

That’s why sometimes you just end up firing up the default PowerShell console and typing:

Add-PSSnapIn Microsoft.SharePoint.PowerShell

Well… you get something that is “similar” to the SharePoint Management Shell 🙂

The SharePoint Management Shell is nothing more than a standard PowerShell console (powershell.exe) where the snap-in is automatically loaded upon startup, leveraging a startup script:

C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe -NoExit " & ' C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG\POWERSHELL\Registration\\sharepoint.ps1 ' "

The sharepoint.ps1 script is just a few lines of code:

$ver = $host | select version
if ($ver.Version.Major -gt 1)  {$Host.Runspace.ThreadOptions = "ReuseThread"}
Add-PsSnapin Microsoft.SharePoint.PowerShell
Set-location $home

In the above snippet you can notice the automatic loading of the SharePoint snap-in and a “change directory” instruction that brings you to your default home directory (typically, C:\Users\yourusername).

But there’s an additional line:

if ($ver.Version.Major -gt 1) {$Host.Runspace.ThreadOptions = "ReuseThread"}

This line sets the PSThreadOptions of the PowerShell runspace to ReuseThread, which means that the same thread is used for all command line invocations since the shell started.

The available options are (see http://msdn.microsoft.com/it-it/library/system.management.automation.runspaces.psthreadoptions(v=vs.85).aspx):

  • Default
    On the local computer, UseNewThread is used for runspaces and ReuseThread is used for runspace pools. Server settings are used for remote runspaces and runspace pools. This field is introduced in Windows PowerShell 2.0.
  • ReuseThread
    Creates a new thread for the first invocation and then re-uses that thread in subsequent invocations. This field is introduced in Windows PowerShell 2.0.
  • UseCurrentThread
    Execution occurs on the thread that called the Invoke method. This option is not valid for asynchronous calls. This field is introduced in Windows PowerShell 2.0.
  • UseNewThread
    Creates a new thread for each invocation. This field is introduced in Windows PowerShell 2.0.

If you want proof of this behavior, just open both a standard PowerShell console and a SharePoint Management Shell, then run the following command a few times in each shell.

[Threading.Thread]::CurrentThread.ManagedThreadId

This instruction prints the internal identifier of the current managed thread (i.e. the CLR Thread object).

You’ll notice that you get the same value in the SharePoint Management Shell, whereas you get different values in the standard one. By the way, PowerShell ISE uses the ReuseThread option too.

What’s the issue with the default value, i.e. having a new thread spawned (or picked up from a pool) for every command invocation?

Well… first of all, some SharePoint objects are not completely thread safe, so you should pay careful attention when using them from different threads (even if, in this particular case, the multiple threads are not executed in parallel 🙂).

Another noticeable issue you may experience, though, is resource leakage: if you create SharePoint objects on one thread and then suddenly have no reference to that thread any longer, you end up wasting resources that are never disposed correctly.
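If you write ad-hoc scripts in a standard console, the safest habit is therefore to dispose of these objects explicitly rather than relying on thread or garbage collector cleanup. Here’s a minimal sketch (the site URL is just a placeholder):

# Dispose SPSite/SPWeb deterministically instead of leaving them to thread/GC cleanup
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$site = New-Object Microsoft.SharePoint.SPSite("http://yourserver")
try {
    $web = $site.OpenWeb()
    try {
        $web.Title   # ...do your work here...
    }
    finally {
        $web.Dispose()
    }
}
finally {
    $site.Dispose()
}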

Wrapping it up… the SharePoint Management Shell is nothing special, but it’s a little bit more than just adding a snap-in!

SharePoint Internet Sites – Performance Optimization for Data Access (1)

Then comes the time to display something on your pages!

I presume that you are building a dynamic web site 🙂

By “dynamic” I mean that its content comes from a data store (SharePoint being one of these), where it is written in a serialized format used for persistence. This data is then extracted from the data store during the request processing phase, and some sort of transformation is applied to make it presentable as HTML markup on your final pages.

This process (extract => transform => render) can be extremely slow, and you may guess the reason: if you load a million rows from a database table (or 4,000 items from a SharePoint list), then transform the resultset into XML, then apply a complex XSLT transformation that finally produces 4 bytes of HTML markup, it’s clear that something is missing from your architectural design!

But even if you optimize every single step of the above process, you may end up with CPU and memory consumption that is excessive in heavy load scenarios.

The answer seems obvious: the best way to reduce the time required to load data from a persistent storage is… just not reading anything at all!

That is, use caching!

Cache?

Cool, you may say, and while saying “cool” you enable output caching on every page of your portal.

It will take just a couple of minutes for you to receive a phone call from your customer, saying that users are complaining about strange behaviors during navigation. For example:

  • “I logged in but the name that is displayed on the top of the page is not my name”
  • “I did not add any item to the cart, but suddenly I see the cart filling up with articles I’m not even interested in!”

Well, caching has its own drawbacks, for sure.

Here’s a list of pain points you need to be aware of:

  • Cached data has to be saved somewhere, and it will consume resources
  • Windows processes do not share memory (unless you do so explicitly, which I don’t suggest anyway), so in a multi-server scenario you get duplicated information (one copy of each data set for each process serving HTTP requests)
  • Sometimes you end up having multiple processes even on a single-WFE-server topology (this is called web gardening)
  • If you choose to externalize data to a common, shared location, you probably need to consider data serialization as a limitation (you can save a string, but you cannot save an XslCompiledTransform instance, just to give you an example)
  • Once you put data into a caching location, that data becomes stale, unless you implement a valid cache invalidation mechanism
  • This cache invalidation mechanism is often hard to implement
  • Coding can be tricky
  • Coding can be error prone (you should never rely on a copy of your data being available in the caching storage)

This list is by no means a suggestion to avoid caching. On the contrary, I strongly suggest you apply caching whenever it fits.

Therefore, I would like to summarize what SharePoint offers OOTB, trying to provide some best practices for each case.

Cache

You get three different flavors of caching in SharePoint 2010.

Here’s a small diagram that displays them, giving you some background that we will use later to discuss when you should use each of these techniques.

Object caching

In a word: use it!

SharePoint uses it by default as an optimization for some key components of a typical web site (the Content By Query Web Part, the navigation structure, etc.).

You should just be aware that some query filters (for example, one based on the current user) make it not applicable (and indeed the site query engine prevents caching in these situations).

And…

<developerOnly>

I would encourage you to use object caching when you write code against the SharePoint server object model.

How? You cannot explicitly query the cache structure, but you can use classes (SPSiteDataQuery, CrossSiteQueryInfo and CrossSiteQueryCache) that do the hard work for you. This is transparent, which is fine since you can forget about checking for null or stale data: everything is under the control of the cache manager.
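Just to give you an idea of the underlying pattern, here’s a minimal cross-site query sketch in PowerShell (the site URL, list selection and field names are placeholders); the caching classes mentioned above add the transparent caching layer on top of this kind of query:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$web = Get-SPWeb "http://yourserver"
try
{
    $query = New-Object Microsoft.SharePoint.SPSiteDataQuery
    $query.Webs       = "<Webs Scope='Recursive' />"
    $query.Lists      = "<Lists BaseType='0' />"    # generic lists
    $query.ViewFields = "<FieldRef Name='Title' /><FieldRef Name='Modified' />"
    $query.Query      = "<OrderBy><FieldRef Name='Modified' Ascending='FALSE' /></OrderBy>"
    $query.RowLimit   = 10

    $results = $web.GetSiteData($query)             # returns a System.Data.DataTable
    $results.Rows | ForEach-Object { $_["Title"] }
}
finally
{
    $web.Dispose()
}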

Output Caching

In a word: always consider output caching while designing and developing pages and page components, and try to apply a design that makes output caching applicable.

A little example could be helpful in this case.

Imagine you have implemented a page layout that displays a lot of aggregated data coming from external resources. This data takes quite a long time to load, and the presentation layer takes some time to render it too. Plus, this data does not change very often, so you should not worry about invalidation.

This is a perfect candidate for output caching, except for a very small portion of the page layout: a box that displays weather information, reading it from an external RSS service, filtered by the location that each user has specified in his profile settings.

If you apply output caching to the page layout, every user will see the weather for a single location (that of the first user hitting the page), and the weather will stay the same for the whole duration of the page layout caching interval.

This should not be an obstacle to applying output caching to the page layout. How can you do this?

Here’s a couple of possible approaches:

  • Use a combination of AJAX requests and JavaScript processing to read the information “on the fly” and transform the page accordingly. The HTML code of the page can be “weather ignorant”, since the only pieces remaining there are an empty container and the client script code that issues the asynchronous HTTP request and parses the results, producing the final markup. And both the empty container and the script code can be cached!
  • Use Post Cache Substitution. This is a somewhat complex technique (I mean, it’s easy for simple tasks, but it may get tricky quickly). In a nutshell, you register a control for post cache substitution, and the ASP.NET runtime calls back your control asking for a string value that it will insert into the page exactly where the control markup would have been rendered. The page keeps being cached, although parts of it are indeed recalculated for every request.

Blob Caching

I’m mentioning blob caching here for the sake of completeness. But I would like to point out that it is not related to data or markup caching at all, so it does not reduce the computation and rendering time of a page per se. It creates copies of static resources (CSS, JS, images, etc.; you can specify the resources by extension) that are saved to the file system of each web front-end server. An HTTP module is responsible for retrieving these resources, effectively bypassing the need for the document to be loaded from SharePoint (and therefore from SQL Server, which is expensive compared to raw file system access).

I’m going to talk about blob caching in a future part of this article series, but I hope this was enough to explain at least what blob caching is, especially compared to the other available caching techniques.

Tools

That said, what tools can help you investigate data access issues related to caching?

Here I’ll name a few, but consider that this list is by no means exhaustive.

  • SharePoint logging
    • ULS logs contain information about Cross Site Queries, which may or may not use caching
    • Logging database for blocking query reports (a blocking query is a good candidate for replacement with different data access logic)
  • Developer Dashboard
    • You get the execution time at a very detailed level, which may help you investigate which part of the page lifecycle needs further optimization (a sketch for turning the dashboard on follows this list)
    • If you are a developer, you can use the SPMonitoredScope for instrumentation
  • Performance counters
    • By monitoring resource consumption you may discover that you need some caching optimization
    • ASP.NET provides several counters related to its Cache Engine
  • DbgView
    • You can output trace messages that are consumable even on a live production server. This is not related to caching by itself, but it can definitely be a useful companion
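As promised above, here’s a sketch of the PowerShell commonly used to enable the Developer Dashboard in “on demand” mode (run it on a farm server with farm administrator rights):

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Enable the Developer Dashboard on demand (a small icon on the page toggles it)
$settings = [Microsoft.SharePoint.Administration.SPWebService]::ContentService.DeveloperDashboardSettings
$settings.DisplayLevel = [Microsoft.SharePoint.Administration.SPDeveloperDashboardLevel]::OnDemand
$settings.Update()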

XSLT and verbatim content with CDATA sections

Here’s a quick tip you may find useful while writing some – maybe complex – XSLT transformations.

I adopted this solution recently for some library code that emits RSS-compliant XML documents.

My need was to create XML fragments that contain unescaped HTML content, so that I could easily create HTML-formatted element values without worrying too much about HTML encoding.

Of course, CDATA sections were a suitable (and easy) option.

But you cannot simply include CDATA sections in the transformation, since that would cause an automatic escaping of the inner content… unless… you specify that one or more of the output elements will indeed contain CDATA sections.

You can achieve this goal just by specifying the cdata-section-elements attribute on the xsl:output element (see this excerpt from the XSLT reference documentation):

<output>
method = xml | html | text | qname-but-not-ncname
version = nmtoken
encoding = string
omit-xml-declaration = yes | no
standalone = yes | no
doctype-public = string
doctype-system = string
cdata-section-elements = qnames
indent = yes | no
media-type = string
Model: EMPTY
</output>
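To make this concrete, here’s a minimal, self-contained sketch (PowerShell driving XslCompiledTransform, with a made-up stylesheet and input document): since the description element is listed in cdata-section-elements, the serializer emits its value inside a CDATA section instead of escaping it.

$xslt = @'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" cdata-section-elements="description" />
  <xsl:template match="/items">
    <channel>
      <xsl:for-each select="item">
        <description><xsl:value-of select="body" /></description>
      </xsl:for-each>
    </channel>
  </xsl:template>
</xsl:stylesheet>
'@

# The source document carries HTML-encoded markup in its text nodes
$xml = '<items><item><body>&lt;b&gt;Hello&lt;/b&gt; world</body></item></items>'

$transform = New-Object System.Xml.Xsl.XslCompiledTransform
$transform.Load([System.Xml.XmlReader]::Create((New-Object System.IO.StringReader($xslt))))

$inputReader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader($xml)))
$output = New-Object System.IO.StringWriter
# Use the transform's OutputSettings so the cdata-section-elements declaration is honored
$writer = [System.Xml.XmlWriter]::Create($output, $transform.OutputSettings)

$transform.Transform($inputReader, $writer)
$writer.Flush()
$output.ToString()   # the <description> content comes out as <![CDATA[<b>Hello</b> world]]>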

Credits go to Bernie Zimmermann, whose detailed post you can find here: http://www.bernzilla.com/2008/02/12/utilizing-cdata-section-elements-in-xsl/

Efficiently purge large lists

Sometimes you may need to clear the contents of a huge SharePoint list.

You can do this through the web user interface, but you will probably face some limitations: bulk operations are limited to 100 elements, the folder structure may be an obstacle, etc…

If you prefer writing script code (and if you have remote access to a farm server), you can save quite a lot of time.

Here’s how:

function ClearList([Microsoft.SharePoint.SPWeb]$web, [Microsoft.SharePoint.SPList]$list)
{
    # Build a single CAML batch that deletes every item in the list
    $sbDelete = new-object System.Text.StringBuilder
    [void]$sbDelete.Append("<?xml version=`"1.0`" encoding=`"UTF-8`"?><Batch>")

    $command = "<Method><SetList>" + $list.ID + "</SetList><SetVar Name=`"ID`">{0}</SetVar><SetVar Name=`"Cmd`">Delete</SetVar></Method>";

    $items = $list.Items

    foreach ($item in $items)
    {
        [void]$sbDelete.Append([System.String]::Format($command, $item.ID.ToString()))
    }

    [void]$sbDelete.Append("</Batch>");

    try
    {
        # Process the whole batch server side, in a single round trip
        [void]$web.ProcessBatchData($sbDelete.ToString());
    }
    catch [Exception]
    {
        Write-Host $_.Exception.ToString()
    }
}
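A quick usage example (the site URL and the list title are placeholders for your own):

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$web = Get-SPWeb "http://yourserver/sites/yoursite"
try
{
    $list = $web.Lists["Your Huge List"]
    ClearList $web $list
}
finally
{
    $web.Dispose()
}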

This can be used only for SharePoint lists; for document libraries you should use a slightly different approach, anyway…

Hope you find it useful!

Refinement Filter Generator – Show more links

Refinements are a welcome addition to the Search/Query capabilities in the default SharePoint 2010 user interface. I remember that most of our SharePoint 2007 projects that involved some search results customization were implemented relying on the Faceted Search project (still available on CodePlex) or on some commercial tools (Ontolica, now Surfray, being one of those).

The way refinement panels work is somewhat complicated, as it entails server components (aka filter generators), server controls (OOTB) as well as some XSLT tricks that you can use to customize the refinement UI.

One of these tricks determines how the “Show More Links” feature is implemented.

You need the help of the filter generator in order to distinguish between “visible” links and links that are instead hidden, waiting for an explicit request by the user.

The filter generator returns a chunk of XML markup, which may contain two sections (two elements) holding the “top results” and the “all results” respectively (the threshold is typically defined by a property of the generator instance).

You may end up with something like this [most of the code has been omitted]:

XmlElement element = filterXml.CreateElement("FilterCategory");
element.SetAttribute("Id", category.Id);
element.SetAttribute("ShowMoreLink", category.ShowMoreLink);
element.SetAttribute("MoreLinkText", category.MoreLinkText);
element.SetAttribute("LessLinkText", category.LessLinkText);

XmlElement topElements = filterXml.CreateElement("Filters");
XmlElement moreElements = filterXml.CreateElement("MoreFilters");

Of course, most of the hard work is done by the logic of the refinement filter generator.

But… there’s something that is up to you – the site builder – which is of course the presentation of these aggregations.

OOTB, SharePoint 2010 comes with some XSLT that probably fits most of your needs.

But this may not be enough when, for example, you have developed a custom filter generator (or you have downloaded or bought one) and you need to make it available.

The example I’m bringing to your attention, and which of course gives the title to this post, is related to the way the OOTB XSLT renders the “All Results” fragment.

Just take a look and try to find out what I’m talking about:

<xsl:if test="$ShowMoreLink='TRUE'">
  <xsl:variable name="MoreFilters" select="MoreFilters/Filter" />
  <xsl:choose>
    <xsl:when test="$FilterCategoryId and ($FilterCategoryId != '') and ($FilterCategoryType = 'Microsoft.Office.Server.Search.WebControls.ManagedPropertyFilterGenerator')">…
    </xsl:when>
    <xsl:when test="$FilterCategoryId and ($FilterCategoryId != '') and ($FilterCategoryType = 'Microsoft.Office.Server.Search.WebControls.TaxonomyFilterGenerator')">…
    </xsl:when>
  </xsl:choose>

Got it?

The OOTB XSLT template applies (and then renders) the “All Results” section differently according to the type (the .NET Type) of the refinement filter generator. In the above snippet, there’s a choose/when statement (the equivalent of the “switch/case” or “Select Case” construct you may be familiar with) that is there exactly for this purpose.

Needless to say, if you need to render these links for refinement panels based on your custom generator too, you also need to update the XSLT transformation accordingly:

<xsl:when test="$FilterCategoryId and ($FilterCategoryId != '') and ($FilterCategoryType = 'GreenTeam.SharePoint2010.EnterpriseSearch.MultiValueFilterGenerator, GreenTeam.SharePoint2010.EnterpriseSearch, Version=1.0.0.0, Culture=neutral, PublicKeyToken=7b398aba874a4ea9')">

</xsl:when>

Voila!

PowerShell – Importing and Exporting the command History

You know… one of the key architectural principles of PowerShell is that “everything is an object”.

So, once you start thinking with this approach in mind, even some tedious task can become extremely easy.

Consider, for example, a situation where you have typed tons of commands (or combinations of those) in a PowerShell console, and you wish to “save your work” without having to retype everything again.

If you are thinking about copying and pasting everything from the console UI (whose screen buffer size you had previously set to one million rows, didn’t you? 🙂), wait a moment and try to execute:

Get-History

You’ll get back a collection of objects that represent the commands you have typed so far.

And of course, since they are PowerShell objects, they can be serialized and saved just by executing the Export-Clixml cmdlet:

Get-History | Export-Clixml -Path c:\yourhistoryfile.xml

Then, just close your console, start it again and type something like:

Add-History -InputObject (Import-Clixml -Path c:\yourhistoryfile.xml)

Now, type Get-History again and… voila, everything’s there!

You can now execute one of the commands just by running the Invoke-History cmdlet, passing in the command index as returned by the Get-History output.
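For example, assuming that the command you want to repeat shows up with Id 42 in the Get-History output, you would simply run:

Invoke-History -Id 42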

Easy, isn’t it?

PowerShell input parameters – No script is an island!

A script is almost always dependent on values provided by the caller, or at least on the environment that hosts the script execution.

Thus, if it’s true that the “core” of a script is its embedded logic, it is also very important to write a robust data collection strategy: parameters, type checking, value checking and fallback logic are key topics in this area.

Enter the power of script blocks!

One of my favorite features of the PowerShell scripting language is its use of script blocks, something similar to the anonymous functions you may be used to writing as lambda expressions in C# or in a functional language of your choice.

A nice use of script blocks is for script parameters initialization.

Consider, for example, this simple script file:

Param([int]$n)

$n * 10

This extremely useful 🙂 computation relies on an integer being passed in by the caller.

But what happens if the user forgets to pass a number to the script?

Well… nothing, indeed: the $n variable will get a default value of 0 and the script will return 0.

If 0 is not a good value for you, you can use default values for parameters:

Param([int]$n = 1)

$n * 10

Which returns 10 (10 * 1) if no input is received.

What if you want to do some value checking, instead, in order to ensure that a value is provided explicitly by the user?

A default parameter value can be an expression… oops… a script block!

Just try this, if you want to throw an error if no input is available:

Param([int]$n = $(throw "No input"))

$n * 10

Or, if you want to give your users a last chance, you could force them to provide data by explicitly requesting it:

Param([int]$n = $(Read-Host -Prompt "Tell me more...."))

$n * 10
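And if you want to combine the two ideas – fall back to a value taken from the environment and prompt the user only as a last resort – the default script block can host that logic too. A minimal sketch (the environment variable name is made up for the example):

Param(
    [int]$n = $(
        if ($env:DEFAULT_N) { [int]$env:DEFAULT_N }           # fallback: an environment variable
        else { [int](Read-Host -Prompt "Tell me more....") }  # last resort: ask the user
    )
)

$n * 10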

Cool, isn’t it?

SharePoint Internet Sites – Performance Optimization for IT Professionals

If you have read the introduction to this article series, you know that a web site implementation should be carried out by a heterogeneous team. A senior, clever system administrator should be part of this team.

Why?

Well, first of all SharePoint needs to be installed (this is easy) and configured (this is not always easy). I should rather say it needs to be configured well, with security and performance in mind. And this, believe me, is not easy at all.

This cannot be a guide to SharePoint configuration (I suggest you get a book on this topic, where you will find valuable information on each and every configuration topic).

But anyway, I would like to point out something you should consider, especially within public web site projects.

Network I/O

This may seem obvious, but low network throughput is one of the most frequent reasons for slow response times (and unsatisfied users!).

As a system administrator you are not always in charge of network connectivity, especially when the web site is hosted by an ISP. But as an expert, you should always give suggestions to your customer and be prepared to test the network connectivity, defining metrics and possibly a baseline that you will use for simulations when you run stress tests.

Sometimes, though, you control part of the network of the hosting system: maybe not the perimeter segment, but the internal segment is often under your control.

Here you may suffer from a very high latency in server-to-server communication. Please, do not use a 10/100 cable to connect your SharePoint servers to the SQL backend!

And even if the network connectivity between the servers is considered good in low traffic conditions, you should consider isolating the SharePoint farm and its SQL back-end in a private subnet, maybe planning for multihoming. This way you will reduce the “noise” that other services could introduce into the network traffic, preventing contention with the packets that the SharePoint services generate.

The Microsoft Windows Performance Monitor is a great tool that can help you investigate these issues. Combining it with HTTP traffic reports generated by a Fiddler session can also be a valid aid, although you will need to do some processing of the data you collect.

Disk I/O

Network connectivity is not the only thing you should pay attention to: disk I/O may be another bottleneck if you buy a $99 external hard drive for your SQL data files!

As usual, you need some capacity planning beforehand, as well as a baseline and some support tools.
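As a starting point, you can collect a few of the standard disk counters with PowerShell and keep the output as part of your baseline; a minimal sketch (tune the instances, interval and sample count to your environment):

# Sample disk latency and queue length every 5 seconds, 12 times (one minute overall)
Get-Counter -Counter @(
    '\LogicalDisk(*)\Avg. Disk sec/Read',
    '\LogicalDisk(*)\Avg. Disk sec/Write',
    '\LogicalDisk(*)\Current Disk Queue Length'
) -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Path, CookedValue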

I would suggest you take a look at these two valuable resources related to capacity planning and SQL I/O subsystem measurement:

 

Authentication

Your web site will probably be accessible to anonymous users as well as to authenticated users.

Which authentication authority are you going to use? The answer to this question may require some special consideration, since it may involve SSL protection (SSL is secure, but it adds some overhead due to traffic encryption and decryption) or a connection to an external authentication authority you trust.

The claims-based authentication that SharePoint 2010 supports is centered on the concept of security tokens, which are typically saved as cookies and, as such, passed back and forth, increasing the request payload: if you start playing with claims augmentation and have dozens of claims assignable to users, your security token size will grow accordingly.

And this is just about user-to-server authentication.

But you should remember that the SharePoint servers, the SQL servers and potentially any other service you are using on the server side usually require authentication: this authentication happens on the server side only, is typically based on Windows identities, may be claims based, and may rely on NTLM or Kerberos. Some of these settings do not depend on the configuration you apply, while others are completely under your responsibility (NTLM vs. Kerberos is one example… and you are choosing Kerberos, right?!).

Taking these considerations to the extreme (not so extreme, believe me), sometimes you end up with a domain controller within your network segment, so that you reduce the latency caused by authentication requests. Maybe you do not need this kind of topology, but this should give you an idea of how performance optimization is an extremely broad topic that requires wider knowledge than basic SharePoint configuration 🙂

Scaling

Needless to say, you will need to scale, because a single-box server will hardly be enough for a heavily loaded web site.

Talking about scaling, you know that you have the option of either:

  • Increase the resources of a single server (scaling up)
  • Add more servers (scaling out)

In the first case, you should have a deep knowledge of which type of resource should be multiplied: do you need additional RAM? Faster CPUs? Additional disk space to support a more aggressive blob cache (I’m going to talk about blob caching later, in another article of this series)? The list could go on…

In the second case, you should decide what you are going to duplicate. In other words, if you add servers, you need to know which server roles you want to make redundant (which may add fault tolerance, together with performance improvements!).

Sometimes you need to add a load balancer (hardware or software) in front of your servers. This is the case for your web front-end servers: without an NLB in front of them, who would route the client requests anywhere other than the single server you had before? 🙂

SharePoint Internet Sites – Performance Optimization

SharePoint has evolved over time. There was a significant step forward with the release of Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007, and there are several architectural improvements we can all see now that the 2010 wave has been widely adopted by customers worldwide.

From a Web Content Management perspective, MOSS 2007 brought the former MCMS 2002 product into the SharePoint family, modified in order to make it an integral part of the SharePoint platform.

Since then, a constantly increasing number of web sites have been developed on top of MOSS 2007 and, now, on top of SPS2010 (at the risk of being redundant, I have to name Ferrari.com as a stunning example).

Web sites, especially those that will be visited by hundreds of thousands of users, need special consideration up front, starting with the architectural phase where the global components and services are envisioned and planned.

During these early steps, a team needs to be assembled so that every single aspect of the web site implementation is taken into account.

You need a deep understanding of the network and server infrastructure you are going to put in place, as well as solid knowledge of the HTML/CSS/JS standards on which you will be building the pages presented to the final user. And… well, you will be developing something custom (SharePoint is a platform, not a complete, ready-to-use product, isn’t it?), and you need to do this with special attention, trying to minimize the server load so that the site can scale out and reach a wider audience with service continuity.

That’s why this series of articles tries to categorize some best practices you need to be aware of when designing and building public, Internet-facing web sites. The categorization I’m going to propose is based on your role in the project: whether you are an IT pro, a web designer or a developer, there’s something you should think about within this particular kind of project.

Enough for an introduction, let’s start with some real world insight!

(…continued…)

Refinement Panel Metadata Threshold

If you find your refiners suddenly disappearing, leaving a confused user (and some headache for you), double check this property!

Its behavior is straightforward, and is exactly what the property name suggests.

You can define a threshold (an unsigned integer value) that controls when the refiner is shown, based on the results that are returned by the search query.

If the occurrences of the underlying metadata property do not exceed this threshold, the refiner simply disappears.

This may be desirable: imagine a situation where you have defined, say, twenty refiners and you do not want to flood the page with every single kind of faceted filter. You need a way to prioritize them, and the Metadata Threshold property is what you are looking for.

On the other hand, you can of course effectively disable the threshold (just set the value to 1) if you need to keep a consistent layout across query executions.

If you know the property is there, what it does and how to control it, you gain flexibility!