Attackmonkey Blog

Posts Tagged: Examine

Limiting an Examine Search to the Current Site 2

After some feedback from Jeroen and Shannon about yesterday's blog post, I've had a look at another method for searching across multiple sites, which works much better for sites that use multiple languages.

The issue is that different languages have different stop words, word stems and so on, so running the standard Examine analysers over, say, French content means the results won't be as accurate as they would be with a French-specific analyser.

How do we do this? Firstly, we need to set up a specific index for each site, telling each one to start at the root node for the language. Set up the index set as you would normally, and then add the IndexParentId parameter to your declaration, like this:

<IndexSet  SetName="enSiteSearchIndexSet"  IndexPath="~/App_Data/TEMP/ExamineIndexes/enSiteSearch/"  IndexParentId="1090">

Once this is done, the index will ONLY index content beneath the parent node that you specified. You can then create an index for each site, allowing you to use different analyzers for each index if you want to.

If you prefix your searchers/indexes etc with the name of the root node, you can get that in your search code and use it to get the right searcher, so you'll never have to change the search code when adding new language sites (just add the new indexes etc to your Examine config).
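
As a rough sketch of that naming idea, assuming each root node is named after its language code (e.g. "en") and each searcher is registered as the root name followed by "SiteSearchSearcher", a small helper could build the searcher name. The helper name and the suffix are assumptions for illustration, not anything Examine requires:

```csharp
using System;

public static class SearcherNames
{
    // Hypothetical convention: "<root node name>SiteSearchSearcher",
    // e.g. "en" -> "enSiteSearchSearcher". Adjust to match your Examine config.
    public static string ForSite(string rootNodeName)
    {
        if (string.IsNullOrEmpty(rootNodeName))
        {
            throw new ArgumentException("Root node name is required", "rootNodeName");
        }
        return rootNodeName + "SiteSearchSearcher";
    }
}
```

The resulting name would then be used as the key into ExamineManager.Instance.SearchProviderCollection, in place of a hard-coded searcher name.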

Limiting an Examine Search to the Current Site

If you have a multi-language or multi-site installation in Umbraco where you want a site search using Examine, you'll run into the issue that the indexes contain the results for ALL of the sites, not just the site that the user is currently on.

I've been working on a multi-language site recently and ran into just this issue. Here's how I got round it and made a search that can be included on all of the sites, with no changes needing to be made.

First up, how can we limit the search? Handily, we can use the path variable, which stores the path of the page in the Umbraco content tree, in a format something like: -1,1060,1075,1230, where -1 denotes the content root, and the rest of the numbers are the nodes between the root and page that you're looking at.

In our user control that does the search, we can get the current node, and rather than jumping back up the tree, we can just split the path variable out and get the 2nd item in the array to get the id of the site root node, like this:

var currentPage = umbraco.NodeFactory.Node.GetCurrent();
string parentId = currentPage.Path.Split(',')[1];
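
The two lines above assume the path always has at least two entries. A slightly more defensive version of the same split, written as a standalone helper (the class and method names are made up for this sketch), might look like this:

```csharp
using System;

public static class SitePath
{
    // Returns the id of the site root node, i.e. the second entry of an
    // Umbraco path such as "-1,1060,1075,1230". Returns null when the path
    // is empty or has no second entry.
    public static string GetSiteRootId(string path)
    {
        if (string.IsNullOrEmpty(path))
        {
            return null;
        }

        string[] ids = path.Split(',');
        return ids.Length > 1 ? ids[1] : null;
    }
}
```

In the user control you would pass currentPage.Path to it and skip the path filter if it returns null.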

Now we know the root node of the current site, how can we use it with our search? Handily, you can just add the path to your index settings file. However, the path gets stored in the index in a comma separated format, which is no good for searching, as Examine treats it as one big string, so searching for the root node on the raw path will return no results. However, if you were to replace the commas with spaces in the index, the numbers in the path would be treated like words, so you could search for your root node on it, and it would return only pages with the root node in their path.
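
To illustrate why the commas matter, here is a small standalone sketch. It does not use Lucene at all; it just treats space-separated ids as separate "words", which is a rough stand-in for what the analyser does at index time. The class and method names are invented for the example:

```csharp
using System;
using System.Linq;

public static class SearchPathDemo
{
    // Mirrors the comma-to-space replacement applied to the path at index time.
    public static string ToSearchPath(string path)
    {
        return path.Replace(",", " ");
    }

    // Rough stand-in for term matching: true when nodeId is one of the
    // space-separated entries. Lucene's own analysers do the real tokenising.
    public static bool ContainsNode(string searchPath, string nodeId)
    {
        return searchPath
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Contains(nodeId);
    }
}
```

With the raw path "-1,1060,1075,1230" the whole string is a single entry, so looking for "1060" finds nothing; after the replacement, "1060" is an entry in its own right and matches.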

So how to alter the index? Easy! You can plug into the Examine events to alter the index as it's being written. Basically we want to hook into the GatheringNodeData event, get the path field, replace the commas with spaces, and then save it as a new field in the Examine index. Here's an example of the code that we used in an ApplicationBase class to hook into the event handler and make the changes:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using umbraco.BusinessLogic;
using Examine;

namespace MySite.UmbracoExtensions.EventHandlers
{
    public class CmsEvents : ApplicationBase
    {
        public CmsEvents()
        {
            //Add event to allow searching by site section
            var indexerSite = ExamineManager.Instance.IndexProviderCollection["SiteSearchIndexer"];
            indexerSite.GatheringNodeData += SetSiteSearchFields;
        }

        //modifies the index field for the path variable, so that it can be searched properly
        void SetSiteSearchFields(object sender, IndexingNodeDataEventArgs e)
        {
            //grab the current data from the Fields collection
            var path = e.Fields["path"];

            //let's get rid of those commas!
            path = path.Replace(",", " ");

            //add as new field, as path seems to be a reserved word in Lucene
            e.Fields.Add("searchPath", path);
        }
    }
}

Obviously you'd need to change the "SiteSearchIndexer" part to the name of your indexer to get it to work! You'll also need to make sure that the path is included in your index (look at the default indexes in your Examine config files for an example of this).

Now all we need to do is make our Examine search look for the root id in the "searchPath" field. Here's the finished code where we get the root node, and use it in an example Examine search:

//do search
var searcher = ExamineManager.Instance.SearchProviderCollection["SiteSearchSearcher"];

var criteria = searcher.CreateSearchCriteria(UmbracoExamine.IndexTypes.Content);

Examine.SearchCriteria.IBooleanOperation filter = null;

//search on main fields (Search holds the search term)
filter = criteria.GroupedOr(new string[] { "pageHeading", "pageContent", "navigationText" }, Search);

//only show results in the current path
var currentPage = umbraco.NodeFactory.Node.GetCurrent();
string parentId = currentPage.Path.Split(',')[1];

filter = filter.And().Field("searchPath", parentId)
    //don't show hidden pages
    .Not().Field("umbracoNaviHide", "1");

var resultsTemp = searcher.Search(filter.Compile());

And now your search should only return results for pages in the current site, not pages from ALL of the sites! Nice and easy to do, and a good example of how easy it is to extend Umbraco with its event model!

You can also use this technique to search a specific area of the site, e.g. have a dropdown to filter the search by the News area or the Events area. You could also keep a single index for multiple sites, allowing for a search that spans all the sites as well.