Configuration

Most of the configuration surface of this module lies in the appropriately titled IndexConfiguration class. This class is used primarily for specifying the indexing behaviour of content types, but it is also used to store platform-agnostic settings.

Basic configuration

Let's index our pages!

# example configuration
SilverStripe\Forager\Service\IndexConfiguration:
  indexes:
    main:
      includeClasses:
        SilverStripe\CMS\Model\SiteTree:
          fields:
            title:
              property: Title
            content: true
            term_ids:
              property: Terms.ID
              options:
                type: number
            

Let's start with a few relevant nodes:

  • main: The name of the index. The rules on what this can be named will vary depending on your service provider. For example, for EnterpriseSearch it should only contain lowercase letters, numbers, and hyphens.

  • includeClasses: A list of content classes to index. Versioned DataObjects are supported by default (see Indexing DataObjects below). To add other kinds of objects you need to add a Document Type.

Indexing DataObjects

To put a DataObject in the index, it needs to be added to the index configuration and it needs to have the SearchServiceExtension added:

SilverStripe\Forager\Service\IndexConfiguration:
  indexes:
    main:
      includeClasses:
        MyProject\MyApp\Product:
          # see below for per-class options
MyProject\MyApp\Product:
  extensions:
    - SilverStripe\Forager\Extensions\SearchServiceExtension

DataObjects also require the SilverStripe\Versioned\Versioned extension. Non-versioned content is not yet supported. By default a versioned object will be added to the index when it is published and removed when it is unpublished.
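As a minimal sketch of the requirement above, both extensions could be applied to the same class in YAML. This assumes the hypothetical Product class from the previous example is not already versioned; many projects declare Versioned in the class's own extensions config instead, which should work equally well.

```yaml
# Assumption: MyProject\MyApp\Product is the illustrative class used above.
MyProject\MyApp\Product:
  extensions:
    # Required so the object has publish/unpublish states to index from
    - SilverStripe\Versioned\Versioned
    # Adds the object to the indexing workflow
    - SilverStripe\Forager\Extensions\SearchServiceExtension
```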

DataObject Fields

To define what content should be indexed, you need to add keys to the fields object. This tells the module which fields to send to the index and allows you to do some customisation. For example, with the following configuration:

SilverStripe\Forager\Service\IndexConfiguration:
  indexes:
    main:
      includeClasses:
        SilverStripe\CMS\Model\SiteTree:
          fields:
            title:
              property: getSearchTitle
            count:
              property: Count
              options:
                type: number
            content: true
            
  • fields: A map keyed by search field name. Each key matches the field name in your search index. The value can be a boolean or a configuration map with the following options.

    • property: getSearchTitle: This tells the field resolver on the document how to map the instance of the source class (SiteTree) to the value in the document (title). In this case, we want the getSearchTitle method to be called to get the value for title.
    • options.type: number: This tells the search provider what type to store the field as. Types may differ between providers, so refer to the provider module for more detail.
  • content: true: This is a shorthand that only works on DataObjects. The resolver within DataObjectDocument will first look for the PHP property $content. If that is not found on SiteTree, it will look for a DataObject property with an uppercase first letter, e.g. Content.

It is important to note that the keys of fields can be named anything you like, so long as it is valid in your search service provider (for EnterpriseSearch, that's all lowercase and underscores). There is no reason why title cannot be document_title for instance.
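To illustrate the renaming described above, here is a small sketch in which the index field is called document_title while the value still comes from the Title property. The index name main is reused from the earlier examples.

```yaml
SilverStripe\Forager\Service\IndexConfiguration:
  indexes:
    main:
      includeClasses:
        SilverStripe\CMS\Model\SiteTree:
          fields:
            # The key is the field name in the search index
            document_title:
              # The property is where the value is read from on the source object
              property: Title
```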

Indexing relational data

Content on related objects can be added to a search document as an array:

SilverStripe\Forager\Service\IndexConfiguration:
  indexes:
    myindex:
      includeClasses:
        MyProject\MyApp\BlogEntry:
          fields:
            title: true
            content: true
            tags:
              property: 'Tags.Title'
            imagename:
              property: 'FeaturedImage.Name'
            commentauthors:
              property: 'Comments.Author.Name'
            term_ids:
              property: Terms.ID
              options:
                type: number

For DataObject content, the dot syntax allows traversal of relationships. If the final property in the notation is on a list, it will use the ->column() function to derive the values as an array.

This will roughly get indexed as a structure like this:

{
  "title": "My Blog",
  "tags": ["tag1", "tag2"],
  "imagename": "Some image",
  "commentauthors": ["Author one", "Author two", "Author three"],
  "term_ids": [1, 2, 3]
}

For more information on EnterpriseSearch specific configuration, see the Search-Service-Elastic module.

Batch size

Documents are sent to the search provider to be indexed. These requests are batched together to allow provider modules to reduce API calls. You can control the batch size globally and at a per-class level.

The global batch size is set on the IndexConfiguration class. The default is 100; below is an example of reducing it to 75.

SilverStripe\Forager\Service\IndexConfiguration:
  batch_size: 75 # global batch size          

The global size will apply to all classes that are indexed, but you can change it per class. For example, the configuration below sets the batch size for the SilverStripe\CMS\Model\SiteTree class to 50. All other classes that do not define a batch_size will use the global batch size of 75.

SilverStripe\Forager\Service\IndexConfiguration:
  batch_size: 75
  indexes:
    myindex:
      includeClasses:
        SilverStripe\CMS\Model\SiteTree:
          batch_size: 50
            

Batch cooldown

If you would like to specify a "cooldown period" after each batch of a Job is processed, then you can do so with the following configuration.

SilverStripe\Forager\Jobs\BatchJob:
  # Set a cooldown of 2 seconds
  batch_cooldown_ms: 2000

Use cases:

  • Some services include rate limits. You could use this feature to effectively "slow down" your processing of records

  • Some classes can be quite process intensive (e.g. Files that require you to load them into memory in order to send them to your service provider). This "cooldown", plus batch_size at a class level, should provide you with some dials to turn to try and reduce the impact that reindexing has on your application
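The two dials mentioned above can be combined. The following is a sketch only: the values are arbitrary, and it assumes SilverStripe\Assets\File is a class you are indexing in an index named myindex.

```yaml
SilverStripe\Forager\Jobs\BatchJob:
  # Pause for half a second between batches (illustrative value)
  batch_cooldown_ms: 500

SilverStripe\Forager\Service\IndexConfiguration:
  indexes:
    myindex:
      includeClasses:
        # Assumption: files are part of this index
        SilverStripe\Assets\File:
          # Smaller batches for a memory-heavy class (illustrative value)
          batch_size: 10
```

Smaller batches with a short pause trade total reindex time for a lower peak load, so it is worth tuning both values against your provider's rate limits.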

Advanced configuration

Let's look at all the settings on the IndexConfiguration class:

| Setting | Type | Description | Default value |
| --- | --- | --- | --- |
| enabled | bool | A global setting to turn indexing on and off | true |
| batch_size | int | The default batch size used when bulk indexing (e.g. EnterpriseSearch has a limit of 100 documents per batch) | 100 |
| crawl_page_content | bool | If true, attempt to render pages in a controller and extract their content into its own field. | true |
| include_page_html | bool | If true, leave HTML in the crawled page content defined above. | false |
| use_sync_jobs | bool | If true, run queued jobs as synchronous processes. Not recommended for production, but useful in dev mode. | false |
| id_field | string | The name of the identifier field on all documents | "id" |
| source_class_field | string | The name of the field that stores the source class of the document (e.g. "SilverStripe\CMS\Model\SiteTree") | "source_class" |
| auto_dependency_tracking | bool | If true, allow DataObject documents to compute their own dependencies. This is particularly relevant for content types that declare relational data as indexable. More information in the usage section | true |
| max_document_size | int or null | The maximum size of a document in bytes. If set, any document larger than this size will not be indexed and a warning will be thrown with the details of the document | null |
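As a sketch of how a few of these settings might be combined, the block below enables synchronous jobs in dev environments only and caps document size. The block name and the size limit are illustrative assumptions, not recommended values.

```yaml
---
# Assumption: "my-forager-dev" is a hypothetical name for this config block
Name: my-forager-dev
Only:
  environment: dev
---
SilverStripe\Forager\Service\IndexConfiguration:
  # Run queued jobs synchronously; not recommended for production
  use_sync_jobs: true
  # Skip documents larger than roughly 100 KB (illustrative value, in bytes)
  max_document_size: 102400
```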

Per environment indexing

By default, index names are decorated with the environment they were created in, for instance dev-myindex or live-myindex. This ensures that production indexes don't get polluted with sensitive or test content. This decoration is known as the index_variant, and the environment variable it uses can be configured. By default, the environment variable is SS_ENVIRONMENT_TYPE.

SilverStripe\Core\Injector\Injector:
  SilverStripe\Forager\Service\IndexConfiguration:
    constructor:
      index_variant: '`MY_CUSTOM_VAR`'

This is useful if you have multiple staging environments and you don't want to overcrowd your search instance with distinct indexes for each one.

Full page indexing

Page and DataObject content is eligible for full-page indexing of its content. This is predicated upon the object having a Link() method defined that can be rendered in a controller.

The content is extracted using an XPath selector. By default, this is //main, but it can be configured.

SilverStripe\Forager\Service\PageCrawler:
  content_xpath_selector: '//body'
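A narrower selector can keep navigation and footer text out of the crawled content. The sketch below assumes your templates wrap the page body in a div with the hypothetical id main-content, and it also keeps the HTML in the extracted content using the include_page_html setting from the table above.

```yaml
SilverStripe\Forager\Service\PageCrawler:
  # Assumption: "main-content" is an id used in your own templates
  content_xpath_selector: '//div[@id="main-content"]'

SilverStripe\Forager\Service\IndexConfiguration:
  # Keep markup in the crawled content instead of stripping it
  include_page_html: true
```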

Subsites

Due to the way that filtering works with some providers (for example, Elastic Enterprise Search), you may want to split each subsite's content into a separate engine. To do so, you can use the following configuration:

SilverStripe\Forager\Service\IndexConfiguration:
  indexes:
    content-subsite0:
      subsite_id: 0
      includeClasses:
        Page: &page_defaults
          fields:
            title: true
            content: true
            summary: true
        My\Other\Class: &other_class_defaults
          fields:
            title:
              property: Title
            summary:
              property: Summary
    content-subsite4:
      subsite_id: 4 # or you can use environment variable such as 'NAME_OF_ENVIRONMENT_VARIABLE'
      includeClasses:
        Page:
          <<: *page_defaults
        My\Other\Class:
          <<: *other_class_defaults

Note the use of YAML anchors (&page_defaults) and merge keys (<<: *page_defaults), which reduces the need for copy-paste when you want the same configuration across several indexes.

Additional note:

In the sample above, if the DataObject (My\Other\Class) does not have a subsite ID, it will still be indexed because it is explicitly defined in the index configuration.

This is handled via SubsiteIndexConfigurationExtension - this logic could be replicated for other scenarios like languages if required.

Configuring search exclusion for files

By default, SilverStripe\Assets\Image is excluded from the search. To change this default setting, use the code snippet below.

---
After: silverstripe-forager-form-extension
---
SilverStripe\Forager\Extensions\SearchFormFactoryExtension:
  exclude_classes: null

If you want to exclude certain file extensions from being added to the search index, add the following configuration to your code base:

SilverStripe\Forager\Extensions\SearchFormFactoryExtension:
  exclude_file_extensions: 
    - svg
    - mp4

More information