XPS Full Text Search in MOSS 2007 and Windows Server 2008

5. April 2010

Late in 2007 the XML Paper Specification (XPS) was published.  The means to create, view and print XPS files are integrated into your Windows OS. Dare I say it is as ubiquitous as PDF, and you may not even know it.  If you don’t have the XPS features installed, you can get them free from Microsoft or from one of the many third-party vendors deploying solutions for XPS.

Get an XPS Viewer  XPS Showcase

What is an XPS document?

The XML Paper Specification itself is platform independent, openly published, and available royalty-free. Microsoft has integrated XPS-based technologies into the Windows Vista operating system and the 2007 Microsoft Office system. Microsoft brings additional document value to its customers, partners, and the computing industry through the XPS-based technologies.

An XPS document is any file that is saved to the XML Paper Specification, or .xps, file format. You can create XPS documents (.xps files) by using any program that you can print from in Windows; however, you can view XPS documents only by using the XPS Viewer, which is included in this version of Windows.

An XML Paper Specification (XPS) document is a document format you can use to view, save, share, digitally sign, and protect your document’s content. An XPS document is like an electronic sheet of paper: you can’t change the content on a piece of paper after you print it, and you can’t edit the contents of an XPS document after you save it in the XPS format. In this version of Windows, you can create an XPS document in any program you can print from, but you can only view, sign, and set permissions for XPS documents in the XPS Viewer.

XPS FTS in MOSS 2007

The XPS format is great for SharePoint, not only for its viewability but also for its Full Text Search (FTS) capability.  The following is a step-by-step guide to enabling and configuring XPS IFilter support in Windows Server 2008 and MOSS 2007.  Note: I have a single server running all my MOSS farm services.  If I had a distributed farm with my MOSS Index service running on a dedicated server, I would enable and configure the XPS feature on the dedicated Index server.

Step-By-Step

1. On the MOSS Index server, launch Server Manager and select Add Features. Select the XPS Viewer and click Next.

2. Click Install.

3. Click Close.

4. From SharePoint 3.0 Central Administration, open the Shared Services Provider's Search Settings. Under Crawling, select File types and then select New File Type.

5. Enter xps and click on OK.

6. You will see the new file type in the list.

7. You can also confirm it has been enabled by reviewing the registry setting for the following key.

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<SITE-GUID>\Gather\Portal_Content\Extensions\ExtensionList]

8. Next, you’ll need to enter the following details in the HKLM hive for the XPS IFilter.

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\Filters\.xps]
        Default = (value not set)
        Extension = xps
        FileTypeBucket REG_DWORD = 0x00000001 (1)
        MimeTypes = application/xps

9.  In addition, you’ll need to add and set the Class ID for the XPS IFilter in the following HKLM hive location.

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.xps]

        Set the "Default" value to the CLSID of the XPS IFilter.

        Default REG_SZ = {1E4CEC13-76BD-4ce2-8372-711CB6F10FD1}

10.  Next, stop and start the Office SharePoint Server Search service.

C:\>net stop osearch
C:\>net start osearch

11. Next, run a full or incremental crawl. If you’re interested, keep an eye on the C:\Users\<search-service-account>\AppData\Local\Temp\gthrsvc folder and you’ll see the MOSS crawl writing the images to this folder for indexing.  This, of course, is why you need a beefy server for indexing: lots of file I/O.

12.  Once the crawl is complete you can verify the XPS files have been crawled via the Crawl Log’s URL Summary.
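
The registry edits in steps 8 and 9 can also be captured in a .reg file rather than typed by hand. The following is a minimal sketch, not a tested deployment script: it only builds the text of such a file. The function name is made up for illustration, and it assumes that Extension and MimeTypes are plain string (REG_SZ) values, which the steps above do not state explicitly.

```python
# Sketch: build the text of a .reg file holding the XPS IFilter entries
# from steps 8 and 9. Assumption: "Extension" and "MimeTypes" are REG_SZ.
SEARCH_SETUP = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"
XPS_IFILTER_CLSID = "{1E4CEC13-76BD-4ce2-8372-711CB6F10FD1}"


def build_xps_reg() -> str:
    """Return .reg file text for the two XPS IFilter registry keys."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        # Step 8: describe the .xps extension to the search filters.
        f"[{SEARCH_SETUP}\\Filters\\.xps]",
        '"Extension"="xps"',
        '"FileTypeBucket"=dword:00000001',
        '"MimeTypes"="application/xps"',
        "",
        # Step 9: point the extension at the XPS IFilter CLSID ("@" is the Default value).
        f"[{SEARCH_SETUP}\\ContentIndexCommon\\Filters\\Extension\\.xps]",
        f'@="{XPS_IFILTER_CLSID}"',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_xps_reg())
```

Review the printed text before saving and importing it on the Index server, and restart the search service afterwards as in step 10.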

Searching

When I search for the keyword “galleries” in MOSS Advanced Search, I get hits from the resulting full-text index.

For More Information

XML Paper Specification: Overview

ECMA International XPS Specification and Reference Guide

ECMA International XPS White Paper

Microsoft MSDN XPS Team BLOG

iFilter, MOSS, Search, SharePoint, Crawl, Windows Server 2008, XPS

Maximum Crawl Size

27. June 2009

By default, the maximum size of a file that MOSS Search will crawl is 16 MB. In many situations you'll have documents that exceed the 16 MB default threshold.

In the example below I've created and uploaded a 20+ MB searchable PDF of a series of Leo Tolstoy's famous novels, all acquired through Project Gutenberg.

When I use the MOSS Advanced Search to look for the word Gutenberg in the contents of the file, I'm not able to find my file, even though I expected the PDF to be full-text indexed by the PDF iFilter from Foxit Software that I have installed on my Index server.

A quick look at the Crawl Log shows a warning for my PDF: "The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled." As a result, I can't search within the contents of the file via a Full Text Search.

To allow my PDF, which is larger than 16 MB, to be crawled and fully indexed, I need to add the MaxDownloadSize DWORD value under the "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager" key and set it to the decimal value 25, which is the new limit in MB.

I then need to reboot my server (or the dedicated Index server).

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager]

"MaxDownloadSize"=dword:00000019

 

Note: Hexadecimal 19 = Decimal 25.
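
If you want a different limit, the only part that changes is the hexadecimal DWORD. Here is a small sketch of that conversion; the helper name is made up for illustration and is not part of any SharePoint tooling.

```python
def max_download_size_reg_line(megabytes: int) -> str:
    """Format a MaxDownloadSize limit (in MB) as a .reg DWORD line."""
    if not 0 < megabytes <= 0xFFFFFFFF:
        raise ValueError("limit must fit in a positive 32-bit DWORD")
    # .reg files write DWORD data as eight hexadecimal digits.
    return f'"MaxDownloadSize"=dword:{megabytes:08x}'


if __name__ == "__main__":
    print(max_download_size_reg_line(25))
```

Called with 25, the helper reproduces the line shown above.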

After a new Full crawl I reissue the search for "Gutenberg" and I now find my PDF and my Crawl Log does not have the warning message.

Crawl, MOSS, Search

Property Searching & Metadata Property Mapping

27. June 2009

MOSS offers Live Search or Federated content searching, but sometimes you'd like to search for content based on Site Column metadata values. To do so, there are several steps to ensure your content is searchable.

Let's walk through a scenario. I have a web application with a site column "InvoiceCompany". This column is associated with a Content Type.

Site Column

Shared Services Administration

In Central Administration, under Shared Services Administration, we'll look at and adjust some Search Settings.

Metadata Property Mappings - Managed Properties

Let's take a look at some important items in the Managed Properties. We'll get there from the Metadata property mappings page.

A property has mapping(s) to site columns.

Edit Managed Property

On the Metadata Property Mappings page we'll find the "InvoiceCompany" column that was created in my web application. SharePoint adds the "ows" prefix to the Property Name and "ows_" to the Mapping name.

Edit Crawled Property

If we explore further in the ows_InvoiceCompany mapping we'll see the property details.

Metadata Property Mapping - Crawled Properties

In order to search for a piece of content in your farm the content must be crawled. In this scenario, we're going to make sure we automatically generate a new managed property in the SharePoint category.

Crawled Properties View – SharePoint

The SharePoint Crawled Property category contains many crawled properties, possibly the most of all the categories. A crawled property will automatically be created for every content type and column name. Automatically generated crawled properties in this category begin with the "ows" prefix, as shown in this image.

Configure Search Settings

You'll need to make sure the content is crawled via the Content sources and crawl schedules.

Manage Content Sources

The Start Addresses setting is an important concept. In the illustration above we're crawling only the content on the site http://server.

Advanced Search Properties Customization

To allow your users to search on the Site Column, you'll need to edit the Properties XML of the Advanced Search Box web part: add a PropertyDef for the property in the PropertyDefs section and reference it in the ResultTypes section.
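
As a rough sketch of that edit, the snippet below adds a property to a trimmed-down stand-in for the web part's Properties XML. Several things here are assumptions rather than facts from this walkthrough: the skeleton XML, the PropertyRef element and ResultType attributes (from my recollection of the web part's default XML), and a managed property named "InvoiceCompany". Check them against the XML your own web part contains.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the Advanced Search Box "Properties" XML (assumed shape).
SKELETON = """<root>
  <PropertyDefs>
    <PropertyDef Name="Path" DataType="text" DisplayName="URL" />
  </PropertyDefs>
  <ResultTypes>
    <ResultType DisplayName="All Results" Name="default">
      <Query />
      <PropertyRef Name="Path" />
    </ResultType>
  </ResultTypes>
</root>"""


def add_search_property(xml_text, name, display_name, data_type="text"):
    """Add a PropertyDef and a PropertyRef in every ResultType for one managed property."""
    root = ET.fromstring(xml_text)
    defs = root.find("PropertyDefs")
    if defs is None:
        raise ValueError("no PropertyDefs section found")
    ET.SubElement(
        defs,
        "PropertyDef",
        {"Name": name, "DataType": data_type, "DisplayName": display_name},
    )
    # Make the property selectable for each result type.
    for result_type in root.iter("ResultType"):
        ET.SubElement(result_type, "PropertyRef", {"Name": name})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_search_property(SKELETON, "InvoiceCompany", "Invoice Company"))
```

The name used in PropertyDef has to be the managed property name from the Shared Services search settings, not the "ows_" crawled property name.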

 

Conclusion

Allowing your users to search via a Managed Property adds some nice usability, but the example above will still require some work to customize, and the results will be rendered in the traditional Live Search result set view. For your more serious implementations you'll need the KnowledgeLake Imaging Server, which allows users to perform exact relevance searches using any combination of index values. Searches can be pre-configured and saved so that users can simply enter the appropriate index values and search across multiple SharePoint sites and document libraries. The KnowledgeLake Imaging search interface is a web part that is completely customizable to display the required document and associated index columns in either list or column mode.

For More Information

TechNet: http://technet.microsoft.com/en-us/library/cc262933.aspx

MSDN: http://msdn.microsoft.com/en-us/library/ms497276.aspx

MOSS, Search, metadata