Generating documents for the Excel Calculation Services

Excel Services 101

I guess many of you know about the Excel Calculation Services, one of the key technologies available with the Enterprise Edition of Microsoft Office SharePoint Server 2007 and SharePoint Server 2010.

Although they are often considered only for one of the features they provide – the rendering of an Excel document inside the browser – the Excel Calculation Services are indeed much more than that.

They provide a remote API, based on SOAP Web Services, that you can use in order to start a calculation (based, of course, on an Excel document) leveraging the resources (CPU and memory) of your SharePoint Server.

Moreover, starting from SharePoint Server 2010, a REST-based API is available that makes parts of the Excel document (yes, charts included!) available as data objects that can be retrieved with simple HTTP GET calls, making this API a perfect tool for some client-side “mashup” logic.

Combining these features with some Excel Sheets manipulation API can help you create complex solutions with a very limited effort.

Open Xml 101

Since Office 2007, the legacy “xls” binary file format has been superseded by the brand-new Office OpenXml specification, which is an ECMA and ISO standard and is implemented through the “xlsx” file format.

An xlsx file is just a ZIP archive: if you rename an xlsx file with a .zip extension and open it with your favorite archiver, you’ll end up with a nested folder structure containing images, media files and… XML files!

Those XML files are the core structure of the document, and can be manipulated in order to edit the contents of the Excel document itself.

Although you could do this “by hand”, if you have to develop against an Office OpenXml document you should definitely leverage the OpenXml SDK, a managed .NET library that makes available a bunch of classes that expose entities such as Cells, Sheets, Charts and so on.

These APIs mimic the internal representation of these entities, and you will notice that there is very little abstraction over the XML file structure; still, they are definitely easier to use and, of course, to learn, so I suggest you follow this route if you need to develop against Excel 2007/2010 documents.

Using OpenXml to produce ECS compatible documents

So… after this very (very, very) short introduction to the Excel Calculation Services and the OpenXml API, let’s try to combine them in a simple scenario.

Imagine you want to collect information from your users leveraging the OOTB SharePoint user interface (i.e. Lists and List Forms), but later use that information to drive some complex calculation and provide an output based on charts and pivot tables. That complex calculation is, indeed, the core of your solution.

You would use the Excel Calculation Services as the “brain” of your solution, so the first step is to prepare a “template” that can be rendered by the browser (the Excel Calculation Services do not support all the features of an Excel document, so be prepared to test your template as you implement it).

Your template will contain all the calculation logic (i.e. formulas, etc…) that will be applied to a set of cells where the user will be able to insert his own data. But of course, in the template those cells are empty.

Once it is ready, you could just save it into a common repository (a document library, of course) that you make inaccessible to the user (i.e. protected by authorization).

The reason why we might want to “hide” the main document from the user is to protect our intellectual property: if we let the user download the template, edit it and upload it back to SharePoint, he would be able to inspect the document and grab the logic behind it, which is not always acceptable.

Now… the template is there, it cannot be downloaded and it contains tons of computation formulas that are applied to empty cells… mmhhh… how will it ever work??!

Well… here you start writing code 🙂

First of all, you should collect the data entered by the user through the OOTB SharePoint user interface: since you are developing a server component, you could just use the SharePoint Server Object Model to access the list and its items, maybe feeding some in-memory objects for later use.

Then, and here comes the interesting part, you could open the calculation template using OpenXml. Since the document is not accessible by the user, if the code you are developing is supposed to run within the security context of the user (a web part? An application page triggered by some Ribbon button?) you should use some impersonation to access it.
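Just to fix ideas, here is a minimal sketch of how the template bytes might be fetched under elevated privileges (the server-relative file URL is hypothetical, and error handling is omitted):

```csharp
// Sketch: read the protected template while running as the application pool account.
// "Templates/CalculationTemplate.xlsx" is a hypothetical server-relative URL.
byte[] templateBytes = null;

SPSecurity.RunWithElevatedPrivileges(delegate()
{
    // Objects must be re-opened inside the delegate to get the elevated identity.
    using (SPSite site = new SPSite(SPContext.Current.Site.ID))
    using (SPWeb web = site.OpenWeb())
    {
        SPFile templateFile = web.GetFile("Templates/CalculationTemplate.xlsx");
        templateBytes = templateFile.OpenBinary();
    }
});
```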

I’m not displaying all of the previous steps since I wish to focus on the OpenXml part, as displayed by the following code snippet:

using (MemoryStream templateStream = new MemoryStream())
{
    templateStream.Write(templateBytes, 0, templateBytes.Length);
    templateStream.Position = 0;

    using (var xlsxDoc = SpreadsheetDocument.Open(templateStream, true))
    {
        // manipulate the document here...
    }
}

Here I’m supposing you have loaded a Byte[] instance (templateBytes) with the contents of the XLSX template, and I’m using that stream to open an instance of SpreadsheetDocument, the main entry point for our logic.

From there, you can get hold of a sheet using some straightforward LINQ syntax. Below you can find a short snippet that retrieves a Sheet given its name (sheetName) and then obtains the corresponding Worksheet:

Sheet sheet = xlsxDoc.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().FirstOrDefault(s => s.Name == sheetName);

Worksheet worksheet = ((WorksheetPart)xlsxDoc.WorkbookPart.GetPartById(sheet.Id)).Worksheet;

Then you can obtain the reference to a cell, identified by a row index and a column name:

var row = worksheet.GetFirstChild<SheetData>().Elements<Row>().Where(r => r.RowIndex == rowIndex).FirstOrDefault();

var cell = row.Elements<Cell>().Where(c => string.Compare(c.CellReference.Value, columnName + rowIndex, true) == 0).FirstOrDefault();

Setting the cell value is of course the last operation you should perform:

if (cell.DataType == null || cell.DataType.Value != CellValues.SharedString)
{
    cell.CellValue = new CellValue(inputFromUser);
}

Be careful in this last step: string cells (the ones identified by an “s” data type) are treated differently since they are stored in a separate workbook part, the Shared String Table part, and the cell value just contains a numeric identifier that points to the string in the Shared String Table. I’m not showing you how to copy a string value in detail, but you can imagine it’s just a matter of writing a few additional lines of code in order to find (or insert) the string in the Shared String Table before setting the cell value.
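Just to give the idea, a rough sketch of that extra step might look like the following (assuming xlsxDoc, cell and inputFromUser are the objects used above; a simplified linear search, not production code):

```csharp
// Sketch: find (or insert) the text in the Shared String Table,
// then point the cell at the resulting zero-based index.
SharedStringTablePart stringPart = xlsxDoc.WorkbookPart.SharedStringTablePart
    ?? xlsxDoc.WorkbookPart.AddNewPart<SharedStringTablePart>();

if (stringPart.SharedStringTable == null)
    stringPart.SharedStringTable = new SharedStringTable();

int index = 0;
bool found = false;
foreach (SharedStringItem item in stringPart.SharedStringTable.Elements<SharedStringItem>())
{
    if (item.InnerText == inputFromUser) { found = true; break; }
    index++;
}

if (!found)
{
    stringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(inputFromUser)));
    stringPart.SharedStringTable.Save();
}

cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
cell.CellValue = new CellValue(index.ToString());
```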

That’s it!

Once you have finished copying the values inserted by the user, you just need to save the resulting stream as a document in another SharePoint document library, which should be – of course – accessible to the user. Before uploading the final document, just remember to “finalize” the modifications:

xlsxDoc.WorkbookPart.Workbook.CalculationProperties.ForceFullCalculation = true;

xlsxDoc.WorkbookPart.Workbook.CalculationProperties.FullCalculationOnLoad = true;

xlsxDoc.WorkbookPart.Workbook.CalculationProperties.CalculationOnSave = true;

xlsxDoc.WorkbookPart.Workbook.Save();

xlsxDoc.Close();
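The upload itself can be as simple as the following sketch (the destination library name is hypothetical; templateStream is the MemoryStream used above, whose buffer is still readable via ToArray even after the document has been closed):

```csharp
// Sketch: save the generated workbook into a library the user can access.
using (SPSite site = new SPSite("http://yoursite.com"))
using (SPWeb web = site.OpenWeb())
{
    SPFolder library = web.GetFolder("Calculation Results"); // hypothetical library
    library.Files.Add("Results.xlsx", templateStream.ToArray(), true); // overwrite if present
}
```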

At the end of this process, the user will find the merged results of his own data with the calculation logic applied by the template, which makes it possible to display charts, pivot tables and reports based on the data collected through a SharePoint List user interface.

But… won’t the user be able to read the formulas that sit behind the Excel document?

It’s just a matter of authorization! If you assign the user the View Only permissions, he will only be able to open the document in the browser or download a “snapshot” of it: opening the snapshot with the Excel client will only display the results of the computation, without any reference to the formulas, which are completely hidden (I should say “flattened”) from the user.

Considerations

Of course, I have just scratched the surface of what you can do using OpenXml and the Excel Calculation Services.

For example, I did not mention Data Connections, which could be an alternative approach to the scenario I have just described: I could have defined a data connection based on Web Services that would read the user data directly from the SharePoint list, obtaining similar results. Although that would have been a “no-code” solution, it would have been much less flexible: suppose you wish to hide/show a sheet based on some condition; well, that is only possible through the use of some API.

Also, I did not talk about the infrastructure requirements for the Excel Services Service Application, nor did I talk about the settings that you need to apply (trusted libraries, global settings, identity, etc…).

But I hope that the topics I have covered can at least give you an idea of what you can accomplish with these two powerful technologies 🙂

SP Documentation Kit

I got an email from my friend and colleague Toni Frankola, where he announced the release of his great SP Documentation Kit.

Here’s an excerpt:

The idea here is simple: every once in a while I had to document a SharePoint farm. That was a boring, repetitive task I did not like to do, so I decided to make a tool to help me automate it. Here it is: http://www.spdockit.com/

Key features:

• Create professional looking (Word format) SharePoint 2010 Farm documentation

• Compare settings between different farms (e.g. QA/Production or two different customers)

• Track farm changes

Links and instructions:

1. Download from: http://www.spdockit.com/downloads/

2. Quick Start Guide: http://www.spdockit.com/help/quick-start-guide/

3. How to generate farm documentation: http://www.spdockit.com/help/how-to/create-sharepoint-farm-documentation/

I remember that Toni talked to me about this project almost one year ago, when we met at the Slovenian SharePoint Conference in Ljubljana.

At the time, I could only say “great idea, Toni”.

Now I can definitely say “great tool Toni”.

Well done!!

How to set a Custom SP2010 Error Page with PowerShell

Custom application pages mapping is a brand new feature you get in SharePoint 2010.

What this does is let you specify the page you want the system to redirect to on certain occurrences.

These occurrences can be the user login, the Access Denied page, as well as the generic error page.

Here’s how you can set this all up, with just a couple of lines of PowerShell scripting:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$wa = Get-SPWebApplication http://yoursite.com

$url = "/_LAYOUTS/GreenPoint2010/Error.aspx"

$cp = [Microsoft.SharePoint.Administration.SPWebApplication+SPCustomPage]"Error"

$wa.UpdateMappedPage($cp, $url)

$wa.Update()
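The same cmdlets cover the other mappings as well; for example, mapping the Access Denied page would look like this (the target URL is, of course, just an example):

```powershell
$cp = [Microsoft.SharePoint.Administration.SPWebApplication+SPCustomPage]"AccessDenied"
$wa.UpdateMappedPage($cp, "/_LAYOUTS/GreenPoint2010/AccessDenied.aspx")
$wa.Update()
```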

Bugs by Design

Did you know this?

“If a BDC model was created by using SharePoint Designer 2010, you must use SharePoint Designer to export the BDC model. If a model that was created by using SharePoint Designer is exported by using Central Administration or the Export-SPBusinessDataCatalogModel cmdlet, the exported model file includes only placeholders for the external content types.”

Now, just to be concise and clear, how is this feature filed? “By Design”!

I hate “bugs by design”!!

SharePoint Internet Sites – Performance Optimization for IT Professionals

If you have read the introduction to this article series, you will know that web site implementation should be done by a heterogeneous team. A senior and clever system administrator should be part of this team.

Why?

Well, first of all SharePoint needs to be installed (this is easy) and configured (this is not always easy). I should say it should be configured well, with security and performance in mind. And this, believe me, is not easy at all.

This cannot be a guide to SharePoint configuration (I suggest you get a book on this topic, where you will find valuable information on each and every configuration topic).

But anyway I would like to point out something you should consider, especially within public web site projects.

Network I/O

This may seem obvious, but a low network throughput is one of the most frequent reasons why you get slow response time (and unsatisfied users!).

As a system administrator you are not always in charge of network connectivity, especially when the web site is hosted by an ISP. But as an expert, you should always give suggestions to your customer and be prepared to test the network connectivity, defining metrics and possibly a baseline that you will use for simulations when you perform stress tests.

Sometimes, though, you control part of the network of the hosting system: maybe not the peripheral segment, but the internal segment is often under your control.

Here you may suffer from a very high latency in server-to-server communication. Please, do not use a 10/100 cable to connect your SharePoint servers to the SQL backend!

And even if the network connectivity between the servers is considered good in low traffic conditions, you should consider isolating the SharePoint farm and its SQL back-end in a private subnet, maybe planning for multihoming. This way you will reduce the “noise” that other services could introduce into the network traffic, preventing contention with the packets that the SharePoint services generate.

The Microsoft Windows Performance Monitor is a great tool that can help you investigate these issues. Combining its output with HTTP traffic reports generated by a Fiddler session can also be a valid aid, although you will need some elaboration over the data you collect.

Disk I/O

Network connectivity is not the only point you should pay attention to: disk I/O may be another bottleneck if you buy a $99 external hard drive for your SQL data files!

As usual, you need some capacity planning beforehand, as well as some baseline and some support tool.

I would suggest you take a look at these two valuable resources related to capacity planning and SQL I/O subsystem measurement:

 

Authentication

Your web site will, probably, be accessible to anonymous users and to authenticated users as well.

What is the authentication authority you are going to use? The answer to this question may require some special consideration, since it may involve SSL protection (SSL is secure, but it adds some overhead due to traffic decryption) or the connection to an external authentication authority you trust.

The claims based authentication that SharePoint 2010 supports is centered on the concept of security tokens, which are typically saved as cookies and, as such, passed back and forth, increasing the request payload: if you start playing with claims augmentation and have dozens of claims assignable to users, your security token size will increase accordingly.

And this is just about user-to-server authentication.

But you should remember that the SharePoint servers, the SQL servers and potentially any other service you are using on the server side usually require authentication: this authentication happens on the server side only, is typically based on Windows identities, and may be claims based or rely on NTLM or Kerberos. Some of these settings do not depend on the configuration you apply; some other settings are completely under your responsibility (NTLM vs Kerberos is one example… and you are choosing Kerberos, right?!!).

Taking these considerations to the extreme (not so extreme, believe me) sometimes you end up with a domain controller within your network segment, so that you reduce the latency that is caused by authentication requests. Maybe you do not need this kind of topology, but this should give you an idea of how performance optimization is an extremely hard topic that requires a wider knowledge than the basic SharePoint configuration 🙂

Scaling

Needless to say, you will need to scale because a single-box server will hardly be enough for a heavy load web site.

Talking about scaling, you know that you have the option of either:

  • Increase the resources of a single server (scaling up)
  • Add additional servers (scaling out)

In the first case, you should have a deep knowledge of what type of resources should be multiplied: do you need additional RAM? Faster CPUs? Additional disk space as a support for a more aggressive blob caching (I’m going to talk about blob caching later within another article of this series)? This list could continue…

In the second case, you should decide what you are going to duplicate. In other words, if you add servers you need to know which server roles you want to be redundant (which may add fault tolerance, together with performance improvements!).

Sometimes you need to add a balancer (hardware or software) in front of your servers. This is the case for your web front end servers: without an NLB in front of them, who will instruct client requests to be routed somewhere different than the single server you had before? 🙂

SharePoint Internet Sites – Performance Optimization

SharePoint has evolved over time. There was a significant step forward with the release of Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007, and there are several architectural improvements we can all see now that the 2010 wave has been widely adopted by customers worldwide.

From a Web Content Management perspective, MOSS 2007 brought into the SharePoint family the former MCMS 2002 product, which was modified in order to make it an integral part of the SharePoint platform.

Since then, a constantly increasing number of web sites have been developed on top of MOSS 2007 and, now, on top of SPS2010 (at the risk of being redundant, I have to name Ferrari.com as a stunning example).

Web sites, especially those that will be visited by hundreds of thousands of users, need special considerations up-front, starting with the architectural phase where the global components and services are envisioned and planned.

During these early steps, a team needs to be created so that every single aspect of the web site implementation is taken into account.

You need a deep understanding of the network and server infrastructure you are going to put in place, as well as solid knowledge of the HTML/CSS/JS standards on which you will be building the pages presented to the final user. And… well, you will be developing something custom (SharePoint is a platform, not a complete and ready-to-use product, isn’t it?), and you need to do this with special attention, trying to minimize the server load so that the solution can scale out and reach a wider audience with service continuity.

That’s why this series of articles tries to categorize some best practices you need to be aware of when designing and building public, internet facing web sites, and the categorization I’m going to propose is based on your role on the project: whether you are an ITPRO, a Web Designer or a Developer, there’s something you should think about within this particular kind of project.

Enough for an introduction, let’s start with some real world insight!

(…continued…)

Refinement Panel Metadata Threshold

If you find your refiners suddenly disappearing, leaving a confused user (and some headache for you), double check this property!

Its behavior is straightforward, and is exactly what the property name suggests.

You can define a threshold (unsigned integer value) that controls when the refiner is shown, based on the results that are returned by the search query.

If the occurrences of the underlying metadata property do not exceed this threshold, the refiner simply disappears.

This may be appreciated: imagine a situation where you have defined, say, twenty refiners and you do not want to flood the page with every single kind of faceted filter. You need a way to prioritize this, and the Metadata Threshold property is what you are looking for.

On the other hand, you can of course eliminate this behavior (just set the value to 1) if you need to keep a consistent layout across query executions.
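For reference, the threshold is set in the Filter Category Definition XML of the Refinement Panel web part; here is a sketch of a category definition (attribute values are illustrative):

```xml
<Category Title="Author"
          Description="Refine results by author"
          Type="Microsoft.Office.Server.Search.WebControls.ManagedPropertyFilterGenerator"
          MappedProperty="Author"
          MetadataThreshold="1"
          NumberOfFiltersToDisplay="4"
          MaxNumberOfFilters="20"
          ShowMoreLink="True"
          MoreLinkText="show more"
          LessLinkText="show fewer" />
```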

If a property is there, and if you know what it does and how to control it, you gain flexibility!

Excel Services Cheat Sheet

You want (or maybe you need) to learn Excel Services in 3 minutes?

Ok, ok, I know it’s impossible, but anyway, if you have to cheat, here’s some help.

Ready? Go!

· Their name is Excel Calculation Services

· They are exposed by a Service Application, which I strongly suggest you create 🙂

· You need the Enterprise Edition of SPS2010

· The ECS perform computations and rendering on the server side, starting from an Excel 2007/2010 file that should be “compatible” and uploaded into a SharePoint library

· What does compatible mean?

o Long story: http://office.microsoft.com/en-us/excel-help/differences-between-using-a-workbook-in-excel-and-excel-services-HA010021716.aspx?pid=CH101024611033. It’s for Office 2007/SharePoint 2007, but most of these limitations still apply

o Short Story: no VBA macros, no data validation, no external worksheet references, etc…

· You access computation results from a client application using SOAP Web Services or the brand new REST Services API

· You access the sheet (or part of it) rendered by the server just by clicking the “View in browser” link on the ECB, or maybe the document link itself if the default library settings state that the default open behavior is “Open with Server Side Handler”

· When the document is viewed in the browser, all logics behind it (i.e. all formulas) are flattened and you only get the results of the computation

· If a user has just the ViewOnly permissions, he can just view the document in the browser or download a snapshot of it. Once again, the snapshot contains just data, without formulas

· If a user has, at least, the Read permissions, he can also download a copy of the document, which also contains formulas and logic

· From an administrative perspective, you – the Farm Admin guy – should probably:

o Define “trusted locations” (i.e. libraries where the ECS consider documents trusted and, as such, can manage them as discussed above)

o Define external data connections (upload Data Connection Files, specify how connections are performed, take care about authentication, etc…)

· If you are guessing what a data connection is… well… it’s a connection to some external data repository! More seriously speaking, you can define such connections with the Excel Client (or retrieve them from a central location that is, of course, your SharePoint environment). Data that flows through data connections can be displayed in a sheet, with all sorts of cool features that Excel provides (pivot tables and reports, slicers, etc…)

Mmmmmhhh… it took more than 3 minutes, probably 🙂

Improving the CQWP Markup with SuppressWebPartChrome

Several SharePoint web parts, the ContentByQueryWebPart being one of those, expose a property named SuppressWebPartChrome.

When set to true, the web part code strips out the surrounding layout that constitutes the “chrome” of the web part itself.
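For example, if you are provisioning the web part from code, the property can be set as in this minimal sketch (the title is illustrative):

```csharp
// Sketch: strip the chrome markup from a Content By Query web part instance.
ContentByQueryWebPart cqwp = new ContentByQueryWebPart();
cqwp.Title = "News";               // illustrative
cqwp.SuppressWebPartChrome = true; // emit only the inner cbq-layout-main markup
```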

An example will explain this much better.

Consider the following HTML markup, generated with the SuppressWebPartChrome property set to false (its default):

<table class="s4-wpTopTable" border="0" cellpadding="0" cellspacing="0" width="100%">
     <tr>
          <td valign="top"><div WebPartID="00000000-0000-0000-0000-000000000000" HasPers="true" id="WebPartWPQ3" width="100%" class="ms-WPBody noindex" OnlyForMePart="true" allowDelete="false" style=""><div id="cbqwpctl00_News_g_ded0dcdb_9aaf_44c3_98b5_6af1963c4f3b" class="cbq-layout-main"><ul class="dfwp-column dfwp-list"><li class="dfwp-item"><div class="item link-item"><a href="http://erickson.sharepointjukebox.vmw/News/Pages/Test.aspx" title="">Test</a></div></li></ul></div></div></td>
     </tr>
</table>

And compare it with the following markup, where the SuppressWebPartChrome property has been set to true:

<div id="cbqwpctl00_News_g_ded0dcdb_9aaf_44c3_98b5_6af1963c4f3b" class="cbq-layout-main"><ul class="dfwp-column dfwp-list"><li class="dfwp-item"><div class="item link-item"><a href="http://erickson.sharepointjukebox.vmw/News/Pages/Test.aspx" title="">Test</a></div></li></ul></div>

In the first snippet, the surrounding table is the HTML portion that gets stripped out.

This is a huge improvement, especially because:

  • The resulting markup is smaller, with a lower network bandwidth consumption as the first consequence
  • You get rid of a surrounding table, which makes your markup more compliant to accessibility requirements (and, more generally, it just makes your markup better from a semantic perspective)

Needless to say, I suggest you “think in advance” and apply this very early in the development lifecycle, since you may need to adjust your CSS or JS libraries accordingly!