Examine RC3 Released

by Shannon Deminick 18. August 2010 18:30

Hopefully this will be a quick RC! I’m really hoping to release v1.0 RTM by early next week at the latest. If you are able to help out with some testing, it would be amazing!!

Here's what's new:

  • PDF Indexing
  • Easily implement custom data indexing outside of Umbraco
  • More XSLT Extensions for Umbraco
  • Some framework refactoring so a new DLL: Examine.LuceneEngine.dll which contains all of the Lucene.Net implementation
    • Because of this refactoring, if you've built your own providers, you may need to update your code; otherwise, it is backwards compatible for most people.
  • More unit tests
  • More documentation

Get it while it’s hot! And don’t forget to read the release notes.

DOWNLOAD FROM CODEPLEX HERE

Categories: Examine | Umbraco

Using Examine to index & search with ANY data source

by Shannon Deminick 10. August 2010 10:38

During CodeGarden 2010 a few people were asking how to use Examine to index and search data from any data source, such as custom database tables. Previously, the only way to do this was to override the Umbraco Examine indexing provider, remove the Umbraco functionality embedded in there, and then do a lot of coding yourself. …But now there’s some great news! You can now use all of the Examine goodness, with its embedded Lucene.Net, with any data source, and you can do it VERY easily.

Some things you need to know about the new version:

  1. I haven’t made a release version of this yet as it still needs some more testing, though we are putting this into a production site next week.
  2. If you want to try this, currently you’ll need to get the latest source from Examine @ CodePlex
  3. If you are using a previous version of Examine, there are a few breaking changes, as some of the classes have been moved. Your config file should still work as is… HOWEVER, you should update it to match the new one with the new class names
  4. There are now 3 DLLs, not just 2:
    • Examine.DLL
      • Still pretty much the same… contains the abstraction layer
    • Examine.LuceneEngine.DLL
      • The new DLL to use to work with data that is not Umbraco specific
    • UmbracoExamine.DLL
      • The DLL that the Umbraco providers are in

Ok, now on to the good stuff. First, I’ve added a demo project to this post which you can download HERE. This project is a simple console app that contains a sample XML data file that has 5 records in it. Here’s what the app does:

  1. Re-indexes all data
  2. Searches the index for node id 1
  3. Ensures one record is found in the index
  4. Updates the dateUpdated time stamp for the data record
  5. Re-indexes the record with node id 1

So, assuming that you have some custom data like a custom database table, XML file, or whatever, there are really only 3 things that you need to do to get Examine indexing your custom data:

  1. Create your own ISimpleDataService
    • There is only 1 method to implement: IEnumerable<SimpleDataSet> GetAllData(string indexType)
    • This is the method that Examine will call to re-index your data
    • A SimpleDataSet is a simple object containing a Dictionary<string, string> and an IndexedNode object (which consists of a node id and a node type)
    • For example, if you had a database row, your SimpleDataSet object for the row would be the dictionary of the row's values plus its node id and type … easy.
  2. Use the ToExamineXml() extension method to re-index individual nodes/records
    • Examine relies on data being in the same XML structure as Umbraco (which we might change in version 2 sometime in the future… like next year), so we need to transform simple data into that XML structure. We’ve made this quite easy for you: all you have to do is get the data from your custom data source into a Dictionary<string, string> object and use this extension method to pass the XML structure into Examine’s ReIndexNode method.
    • For example: ExamineManager.Instance.ReIndexNode(dataSet.ToExamineXml(dataSet["Id"], "CustomData"), "CustomData");  where dataSet is a Dictionary<string, string> .
  3. Update your Examine config to use the new SimpleDataIndexer index provider and the new LuceneSearcher search provider
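As a minimal sketch of steps 1 and 2, the C# below uses local stand-in types named after the ones described above (SimpleDataSet, IndexedNode and an ISimpleDataService-style interface). Their members are assumptions modeled on the description, not Examine's actual definitions; in a real project you would reference Examine.LuceneEngine.dll instead of declaring them. The ToLegacyXml helper is hypothetical and only illustrates the kind of Umbraco 4.x style node XML that ToExamineXml() produces; the exact elements and attributes Examine emits may differ.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Stand-in for IndexedNode: a node id plus a node type (assumed member names).
public class IndexedNode
{
    public int NodeId { get; set; }
    public string Type { get; set; }
}

// Stand-in for SimpleDataSet: the row's values plus its node definition (assumed member names).
public class SimpleDataSet
{
    public IndexedNode NodeDefinition { get; set; }
    public Dictionary<string, string> RowData { get; set; }
}

// Stand-in for ISimpleDataService: the single method called to re-index everything.
public interface ISimpleDataService
{
    IEnumerable<SimpleDataSet> GetAllData(string indexType);
}

// Hypothetical data service over an in-memory "table"; a real one would query a database or XML file.
public class CustomDataService : ISimpleDataService
{
    private readonly List<Dictionary<string, string>> _rows;

    public CustomDataService(List<Dictionary<string, string>> rows)
    {
        _rows = rows;
    }

    public IEnumerable<SimpleDataSet> GetAllData(string indexType)
    {
        // One SimpleDataSet per row: the row's values, its id and the requested index type.
        return _rows.Select(row => new SimpleDataSet
        {
            NodeDefinition = new IndexedNode { NodeId = int.Parse(row["Id"]), Type = indexType },
            RowData = row
        });
    }
}

public static class SimpleDataXml
{
    // Hypothetical helper producing a legacy Umbraco style node:
    // <node id="..." nodeTypeAlias="..."><data alias="field">value</data></node>
    public static XElement ToLegacyXml(this Dictionary<string, string> data, int id, string nodeType)
    {
        return new XElement("node",
            new XAttribute("id", id),
            new XAttribute("nodeTypeAlias", nodeType),
            data.Select(kv => new XElement("data", new XAttribute("alias", kv.Key), kv.Value)));
    }
}
```

In the demo config, the dataService attribute names the class that plays the CustomDataService role, and the indexType argument presumably corresponds to the indexTypes value ("CustomData").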

If you’re not using Umbraco at all, then you’ll only need the 2 Examine DLLs (Examine.dll and Examine.LuceneEngine.dll), which don’t reference the Umbraco DLLs whatsoever, so everything is decoupled.

I’d recommend downloading the demo app and running it, as it shows you everything you need to know to get Examine running with custom data. However, I know that people like to see code in blog posts, so here’s the config for the demo app:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="Examine" type="Examine.Config.ExamineSettings, Examine"/>
    <section name="ExamineLuceneIndexSets" type="Examine.LuceneEngine.Config.IndexSets, Examine.LuceneEngine"/>
  </configSections>
  <Examine>
    <ExamineIndexProviders>
      <providers>
        <!-- Define the indexer for our custom data. Since we're only indexing one type of data,
             there's only 1 indexType specified: 'CustomData'. However, if you have more than one
             type of index (e.g. Media, Content) then you just need to list them as a comma
             separated list without spaces.
             The dataService is how Examine queries whatever data source you have; in this case
             it's a custom data service defined in this project. A custom data service only has to
             implement one method... very easy. -->
        <add name="CustomIndexer"
             type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine.LuceneEngine"
             dataService="ExamineDemo.CustomDataService, ExamineDemo"
             indexTypes="CustomData"
             runAsync="false"/>
      </providers>
    </ExamineIndexProviders>
    <ExamineSearchProviders defaultProvider="CustomSearcher">
      <providers>
        <!-- A search provider that can query a lucene index, no other work is required here -->
        <add name="CustomSearcher" type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine.LuceneEngine" />
      </providers>
    </ExamineSearchProviders>
  </Examine>
  <ExamineLuceneIndexSets>
    <!-- Create an index set to hold the data for our index -->
    <IndexSet SetName="CustomIndexSet" IndexPath="App_Data\CustomIndexSet">
      <IndexUserFields>
        <add Name="name" />
        <add Name="description" />
        <add Name="dateUpdated" />
      </IndexUserFields>
    </IndexSet>
  </ExamineLuceneIndexSets>
</configuration>

Categories: .Net | Examine | Umbraco

Examine demo site source code from CodeGarden 2010

by Shannon Deminick 1. July 2010 17:49

A few people were asking for the source code from my Examine presentation at CodeGarden, so here it is. I’m not going to go into all of the details of this site or the Examine config, as it’s pretty simple. However, I will give you a very quick rundown of it, though if you attended CodeGarden and my presentation, you’d probably already know this.

The Umbraco config for this demo site is simple: A search form, a couple of search result pages with different templates (results using XSLT extensions, results to query media using the FluentAPI, custom results using the FluentAPI). Then there’s the content: 5 very simple nodes consisting of a text field and a numeric field and a miniature blog with some posts and comments.

I’ve included all of the source files and a backup of the database that it was running on. The source files are left in the same state as we left the demo during the presentation. So to get it up and running, just restore the database to your MS SQL server, update your web.config, and put the project files into IIS (or open the solution in Visual Studio).

Download here

Categories: .Net | Examine | Umbraco

Examine slide deck for CodeGarden 2010

by Shannon Deminick 29. June 2010 16:55

A few people had asked during CodeGarden 2010 if I would post the slide deck for my Examine presentation, so here it is. There’s not a heap of information in there, since I think people would have soaked up most of the info during the examples and coding demos, but it’s posted here regardless and hopefully it helps a few people.

I’ve included a PDF version (link at the bottom) and also the image version below (if you’re too lazy to download it :)

[Slide images: slides 2 to 15]

Download slide deck here

Categories: Examine | .Net | Umbraco

Examine RC2 posted

by Aaron Powell 17. April 2010 05:08

I’ve just released Examine RC2 into the wild; you can download it from our CodePlex site.

RC2 fixes a bug in RC1 which wasn’t indexing user fields, only attribute fields.

There are a few breaking changes in RC2:

  • IQuery.MultipleFields has been removed. Use IQuery.GroupedAnd, IQuery.GroupedOr, IQuery.GroupedNot or IQuery.GroupedFlexible to define how multiple fields are added
  • ISearchCriteria.RawQuery added which allows you to pass a raw query string to the underlying provider
  • ISearcher.Search returns a new interface ISearchResults (which inherits IEnumerable<SearchResult>)
  • New interface ISearchResults, which exposes Skip (to support paging) and TotalItemCount
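Skip and TotalItemCount are what make paging possible. The sketch below pages through results using a local stand-in for ISearchResults; the member signatures (Skip(int) returning the remaining results, TotalItemCount as an int) are assumptions based on the description above, and the in-memory ListSearchResults class and Paging helper are purely hypothetical.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int Id { get; set; }
    public float Score { get; set; }
}

// Stand-in for ISearchResults (assumed members).
public interface ISearchResults : IEnumerable<SearchResult>
{
    int TotalItemCount { get; }
    IEnumerable<SearchResult> Skip(int skip);
}

// Hypothetical in-memory implementation, just so the example runs.
public class ListSearchResults : ISearchResults
{
    private readonly List<SearchResult> _results;

    public ListSearchResults(IEnumerable<SearchResult> results) { _results = results.ToList(); }

    public int TotalItemCount { get { return _results.Count; } }

    public IEnumerable<SearchResult> Skip(int skip) { return _results.Skip(skip); }

    public IEnumerator<SearchResult> GetEnumerator() { return _results.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class Paging
{
    // Returns one page of results; pageNumber is 1-based.
    public static List<SearchResult> GetPage(ISearchResults results, int pageNumber, int pageSize)
    {
        return results.Skip((pageNumber - 1) * pageSize).Take(pageSize).ToList();
    }

    // Number of pages, rounding up so a partial last page still counts.
    public static int PageCount(ISearchResults results, int pageSize)
    {
        return (results.TotalItemCount + pageSize - 1) / pageSize;
    }
}
```

Using TotalItemCount for the page count means the pager does not have to enumerate every hit, and Skip lets the provider start at the requested offset.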


I’ll be working on more documentation to explain some of the newly added and obscure features shortly :P.

Categories: .Net | Examine | Umbraco

Examine hits RC1

by Shannon Deminick 5. April 2010 12:04

I’m happy to announce that Examine and UmbracoExamine have today hit RC1!

The CodePlex site also has more extensive documentation about how to get UmbracoExamine up and running within your Umbraco website.

Go, download your copy today.

Categories: .Net | Examine | Umbraco

Examine, but not as you knew it

by Aaron Powell 21. March 2010 14:07

Almost 12 months ago Shannon blogged about Umbraco Examine, a Lucene.NET indexer which works nicely with Umbraco 4.x. Since then we’ve done quite a bit of work on Examine, and as people may be aware, we’ve integrated Examine into the Umbraco core and it will ship out of the box with Umbraco 4.1.

Something Shannon and I had discussed a few times was that we wanted to decouple Examine from Umbraco so it could be used for indexing on sites other than Umbraco.
You’ll also notice that I keep referring to it as Examine, not Umbraco Examine which most people are more familiar with.
This is because over the last week we have achieved what we’d wanted to do: we’ve decoupled Examine from Umbraco!

So what’s Examine?

Examine is a provider based, config driven search and indexer framework. Examine provides all the methods required for indexing and searching any data source you want to use.

Examine is now agnostic of the indexer/searcher API, as well as the data source. That’s right, Examine has no references within itself to Umbraco, nor does it have any references to Lucene.NET.
We still use XML internally for passing the data to index around, as it’s the easiest construct we could think of to work with and pass around.

You could implement the Examine framework in any solution to index any data you want; it could come from SQL Server, or it could be web-scraped content.
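To make the provider idea concrete, here is a deliberately tiny sketch of an indexer and searcher that know nothing about where the data came from, with the data to index passed around as XML. The interfaces and class (IToyIndexer, IToySearcher, InMemoryProvider) are hypothetical and are not Examine's actual types; a real provider would be registered in config and backed by an engine such as Lucene.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical indexer contract: receives each item as XML, whatever its original source.
public interface IToyIndexer
{
    void ReIndexNode(XElement node, string indexType);
    void DeleteFromIndex(string nodeId);
}

// Hypothetical searcher contract: returns the ids of matching items.
public interface IToySearcher
{
    IEnumerable<string> Search(string term);
}

// Toy provider that "indexes" by keeping each node's text in memory.
public class InMemoryProvider : IToyIndexer, IToySearcher
{
    private readonly Dictionary<string, string> _docs = new Dictionary<string, string>();

    public void ReIndexNode(XElement node, string indexType)
    {
        // indexType is ignored in this toy; XElement.Value concatenates all text inside the node.
        var id = (string)node.Attribute("id");
        _docs[id] = node.Value;
    }

    public void DeleteFromIndex(string nodeId)
    {
        _docs.Remove(nodeId);
    }

    public IEnumerable<string> Search(string term)
    {
        // Case-insensitive substring match; a real engine would tokenize and score per field.
        return _docs.Where(d => d.Value.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
                    .Select(d => d.Key);
    }
}
```

Because callers only see the two interfaces, the in-memory provider could be swapped for a Lucene-backed one without changing the code that feeds it XML.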

Where does that leave Umbraco Examine?

Umbraco Examine still exists, in fact it’s the primary (and currently only) implementer of Examine. Over the last week though we’ve done a lot of refactoring of Umbraco Examine to work with some changes we’ve done to the underlying Examine API.

Changes? What changes?

Last week, anyone who follows me on Twitter will have seen a lot of tweets about Umbraco Examine, covering a new search API and the breaking changes we were implementing.

While looking to refactor the underlying API of a large Umbraco site we have running I found that Examine was actually not properly designed if you wanted to search for data in specific fields, or build complex search queries.

This was a real bugger: I had many different parameters I needed to optionally search on, and only in certain fields, but since Umbraco Examine worked with just a raw string, this wasn’t possible.

So I set about creating a new fluent search API. This has actually turned out quite well, in fact so well that we now have this as the recommended search method, not raw text (which is still available).

The fluent API is part of the Examine API, so it’s also available for any implementation, not just Umbraco! Since we’ve used Lucene.NET as the initial support model, the API is designed similarly to what you’d expect from Lucene.NET, but we hope that it’s generic enough to look and feel right for any indexer/searcher.

Here’s how the fluent API looks:

searchCriteria
.Id(1080)
.Or()
.Field("headerText", "umb".Fuzzy())
.And()
.NodeTypeAlias("cws".MultipleCharacterWildcard())
.Not()
.NodeName("home");

All you have to do is pass that into your searcher. That easy, and that beautiful. I’ll do a blog post where we’ll look more deeply into the fluent API separately.
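As a rough illustration of how a fluent chain like the one above can map onto Lucene's query syntax (+ for required, - for prohibited, ~ for fuzzy, * for a trailing wildcard), here is a toy builder. It is not Examine's implementation: the ToyCriteria class, the string-based Fuzzy/MultipleCharacterWildcard helpers, and the field names used for Id, NodeTypeAlias and NodeName are assumptions made for this example.

```csharp
using System.Collections.Generic;

public static class TermExtensions
{
    // Lucene syntax: "umb~" is a fuzzy match, "cws*" a trailing wildcard.
    public static string Fuzzy(this string term) { return term + "~"; }
    public static string MultipleCharacterWildcard(this string term) { return term + "*"; }
}

public class ToyCriteria
{
    private readonly List<string> _clauses = new List<string>();
    private string _nextPrefix = ""; // "" = should, "+" = must, "-" = must not

    public ToyCriteria Field(string field, string value)
    {
        // Prefix the clause with the operator chosen just before it, then reset.
        _clauses.Add(_nextPrefix + field + ":" + value);
        _nextPrefix = "";
        return this;
    }

    public ToyCriteria Id(int id) { return Field("id", id.ToString()); }
    public ToyCriteria NodeTypeAlias(string alias) { return Field("nodeTypeAlias", alias); }
    public ToyCriteria NodeName(string name) { return Field("nodeName", name); }

    public ToyCriteria And() { _nextPrefix = "+"; return this; }
    public ToyCriteria Or() { _nextPrefix = ""; return this; }
    public ToyCriteria Not() { _nextPrefix = "-"; return this; }

    public override string ToString() { return string.Join(" ", _clauses); }
}
```

One caveat this toy ignores: in Lucene, once a query contains a required (+) clause, the unprefixed clauses only influence scoring, so a flat string like this does not mean "id OR headerText". A real query builder has to group clauses to preserve the intended logic.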

Additionally, we’ve made some other changes: because of what the framework now is, we’ve renamed our assemblies and namespaces:

  • Examine.dll
    • This was formerly UmbracoExamine.Core.dll
    • Root namespace Examine
    • Contains all the classes and methods to create your own indexer and searcher
  • UmbracoExamine.dll
    • This was formerly UmbracoExamine.Providers.dll
    • Root namespace UmbracoExamine
    • Contains all the classes and methods of an Umbraco & Lucene.NET implementation

Apologies to anyone with an existing Umbraco Examine implementation; this will result in breaking changes, but since we’ve not hit RC yet, too bad :P.

There are also some changes to the config: <IndexUmbracoFields /> has become <IndexAttributeFields />, and obviously the config registrations are different with the assembly and namespace changes.

The last change is that we’ve moved to the Ms-PL license for Examine, whose source is available on CodePlex.


Currently we’re working to tidy up the API and the documentation so that we can get the RC release out shortly, so watch this space.

Categories: Umbraco | .Net | Examine

Umbraco 5th birthday meetup in Sydney @ TheFARM

by Shannon Deminick 12. February 2010 11:39

Come on down to TheFARM to share in some beers and take part in the global Umbraco 5th birthday festivities.

Your hosts will be core Umbraco team developers Shannon Deminick & Aaron Powell, both of whom work for TheFARM (http://www.thefarmdigital.com.au).

The plan is for Shan and Aaron to run a Q&A session with some demos of the fun stuff TheFARM has been doing with Umbraco 4.1 and the work they have been doing on this next release.

  • They'll have a look at all of the new features/fixes for 4.1 (and there are TONS)
  • They'll go into a bit more detail on some of the new things that we've integrated into the core, such as LINQ to Umbraco, Umbraco Examine, new controls, enhancements, preview, etc…
  • They’ll show you some of the sites we’ve built and talk through some of the implementations with things like Flash

Hopefully, with two of the core team on hand we should be able to answer most questions thrown at us – give us a go!

Once we're out of beers... TO THE PUB!

All of the details, address, etc. are on the Our Umbraco website. Have a look and RSVP now!

http://our.umbraco.org/events/umbraco-5th-birthday-meetup-in-sydney


Just in case you don’t want to click through, here are the event details:

Tuesday, February 16, 2010 - 6:00 PM
Suite 101, 4 - 14 Buckingham St, Surry Hills, NSW

Categories: .Net | Flash | Umbraco

Umbraco Examine v4.x - Powerful Umbraco Indexing

by Shannon Deminick 20. April 2009 17:55

This post is outdated. For the latest information on Examine, please refer to either the Examine page on our site or the Examine CodePlex project home.

Umbraco Examine is a powerful, fully configurable, and extensible library used for indexing Umbraco content to allow for fast and easy content searching. It utilizes the Lucene.Net library which is included in the Umbraco installation (v2.x). It is extremely easy to set up and caters for everything from simple indexing/searching to very complex indexing/searching by utilizing its fully extensible codebase and its event model. The library was built with .Net 3.5 SP1 and has not been tested with previous versions of .Net.

Basic Setup

  • Copy the DLL files to the bin folder
  • Add the following to the <configSections> portion of your Web.config file:
<section name="UmbLuceneIndex" 
type="TheFarm.Umbraco.Lucene.Configuration.IndexSets, TheFarm.Umbraco.Lucene" />
  • For the most basic setup, add the following to the configuration in your Web.config (Also see the readme.txt and app.config files in the binaries download!):
<UmbLuceneIndex DefaultIndexSet="MyIndexSet" EnableDefaultActionHandler="true">
<IndexSet SetName="MyIndexSet" IndexPath="~/data/UmbracoExamine/" MaxResults="100">
<IndexUmbracoFields>
<add Name="id" /> <!-- REQUIRED -->
<add Name="nodeName" /> <!-- REQUIRED -->
<add Name="updateDate" />
<add Name="writerName" />
<add Name="path" />
<add Name="nodeTypeAlias" /> <!-- REQUIRED -->
</IndexUmbracoFields>
<IndexUserFields>
<add Name="PageTitle"/>
<add Name="PageContent"/>
</IndexUserFields>
<IncludeNodeTypes />
<ExcludeNodeTypes />
</IndexSet>
</UmbLuceneIndex>
  • Create the folder: ~/data/UmbracoExamine/ since this is what has been specified for the index path above. 
    • Ensure that the IIS user has full control on this folder.
  • Since EnableDefaultActionHandler is set to true, each time a node is published, it will be indexed based on the rules supplied in the configuration. When a node is unpublished, it will automatically be removed from the index.
  • Log into Umbraco, publish a node and verify that files have been created in the index path as specified above.
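The publish/unpublish behaviour described above boils down to two event handlers. This sketch models it with a hypothetical ContentService that raises Published and UnPublished events and a dictionary standing in for the index; none of these types are Umbraco's or Examine's real API, and the configured inclusion rules are left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a published Umbraco document.
public class Document
{
    public int Id { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}

// Hypothetical content service raising events, in the spirit of Umbraco's action handlers.
public class ContentService
{
    public event Action<Document> Published;
    public event Action<Document> UnPublished;

    public void Publish(Document doc) { if (Published != null) Published(doc); }
    public void UnPublish(Document doc) { if (UnPublished != null) UnPublished(doc); }
}

// Keeps the index in sync: publish means (re)index, unpublish means remove.
public class DefaultActionHandler
{
    public Dictionary<int, Dictionary<string, string>> Index { get; private set; }

    public DefaultActionHandler(ContentService content)
    {
        Index = new Dictionary<int, Dictionary<string, string>>();
        content.Published += doc => Index[doc.Id] = doc.Fields;   // re-publishing overwrites the entry
        content.UnPublished += doc => Index.Remove(doc.Id);
    }
}
```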

Basic Search

  • To perform a search:
UmbracoIndexer examine = new UmbracoIndexer();
List<SearchResult> results = examine.Search("find this", true);
  • The returned structure is simple, containing 3 properties: Id, Score and Fields:
public int Id { get; set; }
public float Score { get; set; }
public Dictionary<string, string> Fields { get; set; }
  • The Fields property contains all of the field data that has been configured in the web.config file.
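Working with the returned list is plain .NET. The sketch below uses a local SearchResult class with the same three properties shown above, plus a hypothetical ToDisplayLines helper that orders hits by Score and reads a configured field from Fields, falling back to nodeName when PageTitle was not indexed for a node.

```csharp
using System.Collections.Generic;
using System.Linq;

// Local copy of the result shape shown above.
public class SearchResult
{
    public int Id { get; set; }
    public float Score { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}

public static class ResultFormatting
{
    // Hypothetical helper: best matches first, labelled with PageTitle or, failing that, nodeName.
    public static List<string> ToDisplayLines(IEnumerable<SearchResult> results)
    {
        return results
            .OrderByDescending(r => r.Score)
            .Select(r =>
            {
                string title;
                if (!r.Fields.TryGetValue("PageTitle", out title))
                    r.Fields.TryGetValue("nodeName", out title);
                return r.Id + ": " + (title ?? "(untitled)");
            })
            .ToList();
    }
}
```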

Advanced Setup

You can create multiple indexes depending on your needs. For example, you may want different indexes for different portal sites in your content tree, or different indexes to separate the type of content being indexed, such as one for News and one for Forum. Creating different indexes is easy:

<UmbLuceneIndex DefaultIndexSet="Site1" EnableDefaultActionHandler="true">
  <!-- Create an index for a site called 'Site1' which has a starting parent
       node in the content tree of 1234. Only nodes that have the Id, or are children of
       node 1234 will be indexed. -->
  <IndexSet SetName="Site1" IndexPath="~/data/indexes/site1/" MaxResults="100"
            IndexParentId="1234">
    <IndexUmbracoFields>
      <add Name="id" /> <!-- REQUIRED -->
      <add Name="nodeName" /> <!-- REQUIRED -->
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="path" />
      <add Name="nodeTypeAlias" /> <!-- REQUIRED -->
      <add Name="parentID"/>
    </IndexUmbracoFields>
    <IndexUserFields>
      <add Name="PageTitle"/>
      <add Name="PageContent"/>
      <add Name="CommentText"/>
      <add Name="CommentUser"/>
      <add Name="umbracoNaviHide"/>
    </IndexUserFields>
    <IncludeNodeTypes>
      <add Name="HomePage" />
      <add Name="BasicPage" />
      <add Name="Comment" />
    </IncludeNodeTypes>
    <ExcludeNodeTypes />
  </IndexSet>
  <!-- Create an index for a site called 'Site2' which has a starting parent node in
       the content tree of 4567. Only nodes that have the Id, or are children of node 4567
       will be indexed. -->
  <IndexSet SetName="Site2" IndexPath="~/data/indexes/site2/" MaxResults="100"
            IndexParentId="4567">
    <IndexUmbracoFields>
      <add Name="id" /> <!-- REQUIRED -->
      <add Name="nodeName" /> <!-- REQUIRED -->
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="path" />
      <add Name="nodeTypeAlias" /> <!-- REQUIRED -->
    </IndexUmbracoFields>
    <IndexUserFields>
      <add Name="PageTitle"/>
      <add Name="PageContent"/>
      <add Name="umbracoNaviHide"/>
      <!-- You can add as many user fields here as you would like to be indexed... -->
    </IndexUserFields>
    <IncludeNodeTypes />
    <ExcludeNodeTypes>
      <!-- Index everything except for document types of 'UserNotes' -->
      <add Name="UserNotes" />
    </ExcludeNodeTypes>
  </IndexSet>
</UmbLuceneIndex>
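The comments in this config describe three filtering rules: IndexParentId limits an index set to one node and its descendants, a non-empty IncludeNodeTypes list whitelists document types, and ExcludeNodeTypes blacklists them. The sketch below expresses those rules against an Umbraco-style path value (a comma-separated list of ancestor ids ending with the node's own id, such as -1,1050,1234,1300). The IndexSetRules class is a hypothetical illustration of the rules, not the library's actual filtering code.

```csharp
using System.Collections.Generic;
using System.Linq;

public class IndexSetRules
{
    public int? IndexParentId { get; set; }
    public List<string> IncludeNodeTypes { get; set; }
    public List<string> ExcludeNodeTypes { get; set; }

    public IndexSetRules()
    {
        IncludeNodeTypes = new List<string>();
        ExcludeNodeTypes = new List<string>();
    }

    // path: comma-separated ancestor ids ending with the node's own id, e.g. "-1,1050,1234,1300".
    public bool ShouldIndex(string path, string nodeTypeAlias)
    {
        if (IndexParentId.HasValue)
        {
            var ids = path.Split(',').Select(s => int.Parse(s.Trim()));
            // The node itself or one of its ancestors must be the configured parent.
            if (!ids.Contains(IndexParentId.Value)) return false;
        }
        // An empty include list means "all document types".
        if (IncludeNodeTypes.Count > 0 && !IncludeNodeTypes.Contains(nodeTypeAlias)) return false;
        return !ExcludeNodeTypes.Contains(nodeTypeAlias);
    }
}
```

For example, with Site2's settings (IndexParentId 4567, UserNotes excluded), a BasicPage under 4567 is indexed, while a UserNotes page in the same branch is not.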

Advanced Search

There are a few overloaded search methods you can use to perform different types of searches; it all depends on what kind of results you want to achieve:

//This will create a new examiner to search in Site1 since Site1 is
//listed as the default index in the configuration.
UmbracoIndexer examine = new UmbracoIndexer();
List<SearchResult> results = examine.Search("find this in Site1", true);

//This will create a new examiner to search in Site2
UmbracoIndexer examine2 = new UmbracoIndexer("Site2");
List<SearchResult> results2 = examine2.Search("find this in Site2", true);

//disables wild card searching
List<SearchResult> results3 = examine2.Search("find exact matches in Site2", false);

//searches site 2 but only in NewsArticle document types
List<SearchResult> results4 = examine2.Search("find news in Site2", "NewsArticle", true, null);

//searches site 2 but only for nodes that are children of the node with ID 4999
List<SearchResult> results5 = examine2.Search("find something in Site2", "", true, 4999);

//searches site 1, in all of its defined doc types to be searched, but only in
//the properties PageTitle and PageContent, and will only return a maximum
//of 10 results.
List<SearchResult> results6 = examine.Search("find in Site1", "", true, null,
    new string[] { "PageTitle", "PageContent" }, 10);

Categories: .Net | Umbraco