You are viewing the RapidMiner Studio documentation for version 9.8.

Search Solr (Documents) (Solr)

Synopsis

This operator searches for Solr entries and generates a document for each result.

Description

To connect to a Solr server, you have to specify a Solr connection. This comprises the URL of a Solr server and an optional user/password combination for authentication. Typically, the Solr server URL ends with the string '/solr'.

The next step is to select a collection on the server. A collection can be imagined as a table. It is composed of several columns, which are called Solr fields. A Solr field has a type (e.g. number) and a key (the name of the column). Each entry in Solr can be imagined as a row and contains values for the respective fields.

A RapidMiner document has a set of metadata records, each consisting of a key and a related value. The metadata keys are mapped to the Solr fields. RapidMiner documents additionally have a body, so you can select a Solr field whose contents will be stored in the RapidMiner document body.
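The mapping described above can be sketched in a few lines of Python. This is only an illustration: the `Document` class and the `entry_to_document` helper are hypothetical names, not part of RapidMiner, and whether the operator also keeps the body field as a metadata record is an assumption made here.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a RapidMiner document: metadata plus a body."""
    metadata: dict = field(default_factory=dict)
    body: str = ""


def entry_to_document(entry: dict, body_field: str) -> Document:
    """Map one Solr entry (field name -> value) to a document."""
    # Every Solr field becomes a metadata record keyed by the field name.
    # Assumption: the body field is kept in the metadata as well.
    metadata = dict(entry)
    # The selected field supplies the body; a missing field gives an empty body.
    body = str(entry.get(body_field, ""))
    return Document(metadata=metadata, body=body)


doc = entry_to_document({"id": "1", "title": "Hello", "text": "Body text"}, "text")
print(doc.body)
```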

To search Solr, you have to specify a query string. You can add filters to refine your query. For example, to exclude all items whose field "popularity" has the value "6", use "!popularity:6". The range of entries to retrieve is set by the parameters offset and limit (these correspond to Solr's start and rows). You can specify which field is used to sort the retrieved entries. It is also possible to enable faceting. Faceted search breaks up search results into multiple categories. Use the categorical_facets and date_facets parameters to specify Solr fields for faceting.
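For orientation, the search options above correspond to the standard Solr HTTP query parameters q, fq, start, rows, sort, facet, and facet.field. The sketch below builds such a request URL by hand. The function name `build_select_url`, the server address, and the collection name are made up for the example, and the request the operator actually sends may differ.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, filter_query=None,
                     offset=0, limit=10, sort_field=None, sort_order="asc",
                     facet_fields=()):
    """Build a Solr /select URL from the search options (illustrative only)."""
    params = [("q", query), ("start", offset), ("rows", limit), ("wt", "json")]
    if filter_query:
        # fq restricts the result set without changing relevancy scores.
        params.append(("fq", filter_query))
    if sort_field:
        # Solr expects "<field> asc" or "<field> desc".
        params.append(("sort", f"{sort_field} {sort_order}"))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical server and collection: entries 20 to 29, excluding popularity 6.
url = build_select_url("http://localhost:8983/solr", "products", "*:*",
                       filter_query="!popularity:6", offset=20, limit=10,
                       sort_field="price", sort_order="desc",
                       facet_fields=["cat"])
print(url)
```

Using `urlencode` keeps characters such as "!", ":" and spaces correctly escaped, which matters for filter and sort expressions.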

If a Solr field supports multiple elements, the related values are provided as a JSON array.
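As a rough illustration of this behavior, a multi-valued field can be serialized as a JSON array string while single values are passed through as plain text. The helper name `metadata_value` is hypothetical, and the exact formatting the operator produces (for example, spacing inside the array) is not specified here.

```python
import json


def metadata_value(value):
    """Return a string for a metadata record; lists become JSON arrays."""
    if isinstance(value, (list, tuple)):
        # Multi-valued Solr field: encode all elements as one JSON array.
        return json.dumps(list(value))
    # Single-valued field: use its plain string form.
    return str(value)


print(metadata_value(["electronics", "hard drive"]))
print(metadata_value(6))
```

A downstream step can recover the individual elements with `json.loads`.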

Input

  • connection (Connection)

    This input port expects a Connection object, if one is provided. See the connection_entry parameter for more information.

Output

  • output (Collection)

    This port provides the main search result. It consists of a collection of documents.

  • facets (Data Table)

    This port provides the results of the faceted search. The example set contains the field name, the value that was found, and the number of occurrences.

  • connection (Connection)

    This output port delivers the Connection object from the input port. If the input port is not connected, this port delivers nothing.
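To make the layout of the facets output more concrete, the sketch below turns the facet section of a Solr JSON response into rows of field name, value, and count. It assumes Solr's default flat list format for facet counts (value, count, value, count, ...). The function name `facet_rows` and the sample data are invented for the example, and the operator's internal conversion may work differently.

```python
def facet_rows(facet_fields):
    """Flatten Solr's facet_fields mapping into (field, value, count) rows."""
    rows = []
    for field_name, flat in facet_fields.items():
        # Solr's default JSON layout alternates values and counts in one list.
        for value, count in zip(flat[0::2], flat[1::2]):
            rows.append((field_name, value, count))
    return rows


# Sample data in the shape of response["facet_counts"]["facet_fields"].
sample = {"cat": ["electronics", 3, "memory", 1]}
for row in facet_rows(sample):
    print(row)
```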

Parameters

  • connection_source This parameter indicates how the connection should be specified. It offers two options: predefined and repository. The parameter is not visible if the connection input port is connected. Range: selection
  • connection_entry This parameter is only available when the connection_source parameter is set to repository. It specifies a repository location that represents a connection entry. The connection can also be provided using the connection input port. Range: string
  • connection This parameter is only available when the connection_source parameter is set to predefined. The connection details for the Solr connection have to be specified. If you have already configured a Solr connection, you can select it from the drop-down list. If you have not configured a Solr connection yet, select the icon to the right of the drop-down list and create a new Solr connection in the Manage connections dialog. The Solr server URL is required. Additionally, you can specify a username/password combination for authentication. Range: configurable
  • collection The name of the Solr collection from which data is retrieved. Range: string
  • query The term to search for. Range: string
  • document_body_field The Solr field that is used as the RapidMiner document body. Range: string
  • filter_query A filter that refines the query without influencing the relevancy score, which determines the default sort order. For example, if the field name has to contain John but must not contain Doe, you can use 'name:John -name:Doe'. Range: string
  • offset The index of the first document to fetch. Range: integer
  • limit The maximum number of results. Range: integer
  • sort Specifies whether search results are sorted. Range: boolean
  • sort_field The Solr field which is used for sorting. Range: string
  • sort_order The sorting order of results. Range: selection
  • faceted_search Specifies whether faceted searching is used. Range: boolean
  • categorical_facets The facets to use for faceted search. Range: enumeration
  • date_facets The date facets to use for faceted search. A single date facet consists of the field name, a start date, an end date, and a gap. Range: enumeration
  • include_generated_fields Specifies whether automatically generated fields are included in the search results. These fields can be SolrCloud fields or can be based on dynamic Solr fields. Range: boolean
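A date facet as described in the date_facets parameter (field name, start date, end date, gap) resembles what Solr calls a range facet. The sketch below translates such a specification into Solr's facet.range request parameters. It is an assumption that the operator uses this mechanism, and the function name `date_facet_params`, the dictionary keys, and the field name are hypothetical.

```python
def date_facet_params(date_facets):
    """Translate date facet specs into Solr range facet request parameters."""
    params = [("facet", "true")]
    for spec in date_facets:
        name = spec["field"]
        params.append(("facet.range", name))
        # Per-field overrides use the f.<field>.<parameter> naming scheme.
        params.append((f"f.{name}.facet.range.start", spec["start"]))
        params.append((f"f.{name}.facet.range.end", spec["end"]))
        params.append((f"f.{name}.facet.range.gap", spec["gap"]))
    return params


# Hypothetical field: one bucket per month over the year 2020.
params = date_facet_params([{
    "field": "manufacturedate_dt",
    "start": "2020-01-01T00:00:00Z",
    "end": "2021-01-01T00:00:00Z",
    "gap": "+1MONTH",
}])
for key, value in params:
    print(key, value)
```

The gap uses Solr's date math syntax. When these parameters are placed in a URL, the leading "+" must be percent-encoded, which `urllib.parse.urlencode` does automatically.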