public class HarisDocument

A class that holds information about the document to be indexed. It is a uniform document model that can be used for indexing web pages, text-based documents such as Word or PDF files, and multimedia documents such as videos and images.

Property List
Property Description
Url This is the uniform identifier of the document that will be indexed. If the document is created for a web page, this will be the URL of the page, e.g. http://www.yourdomain.com/yourblog/?postid=1. If the document is created for a PDF file, this will be the URI of the PDF file, e.g. http://cdn.yourdomain.com/docs/le-wild-book.pdf.
PartSpace Gets or sets the partspace of the document. Partspaces can be used to separate indexed documents inside a single cloud database. A common use case for this feature is categorizing indexed documents, e.g. 'news', 'articles'.
Content Each word in this field will be indexed for this document with a calculated correlation value. Later, when a word is searched, these correlation values are used to return the documents related to the searched word.
Fields This is a key-value dictionary for storing complementary information about the document. Note that all keys in the dictionary must be unique. These fields can later be used to filter or sort results, for example to create a faceted search application in your client. For example, if the resource you are indexing has an image, you can add a key-value pair like "ImageURI", "http://cdn.somedomain.com/images/image1.png". Later, when you show search results to your users, you can use this image to enrich your application's UI. Or you can define a size field when indexing the products you sell and create a filtering system based on size.
Tag The words specified in this property affect search results but are not shown to the end user. It is especially useful when the text content is less than 25 words. For example, if you are indexing a video portal, you can add text content here beyond the description shown to the user, thus increasing the accuracy of the search results.
DocumentDate The document's original creation date. If left null, this property will be set to the current date and time during indexing, for sorting purposes.
Culture The two-character culture code of the document, e.g. 'en', 'tr', 'it'. By default, this value is read from the config file. It is important to set this value correctly for each document, because it is vital to calculating correlation values for the indexed words.
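
The properties above could be populated roughly as follows. This is only a sketch: the HarisDocument class declared here is a hypothetical stand-in so that the example compiles on its own, and the property types (string, Dictionary<string, string>, DateTime?) are assumptions, since this page does not state them. The HarisDocumentExample class and the sample values for Content, Tag and "Size" are also made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's HarisDocument class.
// Property names come from the list above; the types are assumptions.
public class HarisDocument
{
    public string Url { get; set; }
    public string PartSpace { get; set; }
    public string Content { get; set; }
    public Dictionary<string, string> Fields { get; set; }
    public string Tag { get; set; }
    public DateTime? DocumentDate { get; set; }
    public string Culture { get; set; }

    public HarisDocument()
    {
        Fields = new Dictionary<string, string>();
    }
}

public static class HarisDocumentExample
{
    // Builds a document describing a blog post (illustrative values only).
    public static HarisDocument BuildSample()
    {
        var doc = new HarisDocument
        {
            Url = "http://www.yourdomain.com/yourblog/?postid=1",
            PartSpace = "articles",                 // category of the document
            Content = "Text of the blog post to be indexed.",
            Tag = "extra keywords not shown to the user",
            DocumentDate = null,                    // left null on purpose
            Culture = "en"
        };

        // Dictionary.Add throws ArgumentException on a duplicate key,
        // which matches the requirement that all keys be unique.
        doc.Fields.Add("ImageURI", "http://cdn.somedomain.com/images/image1.png");
        doc.Fields.Add("Size", "M");
        return doc;
    }

    public static void Main()
    {
        HarisDocument doc = BuildSample();
        Console.WriteLine(doc.Url);
        Console.WriteLine(doc.Fields.Count);
    }
}
```

How the populated document is then submitted for indexing is not covered on this page, so the sketch stops at construction.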

Last edited Sep 25, 2012 at 11:09 AM by Kimola, version 8
