Field types (data types) and field attributes (options)

What is the meaning for field option "indexed"?

(true | false) True if this field should be "indexed". A field must be indexed if you want it to be searchable, sortable, or facetable.

What is the meaning for field option "stored"?

(true | false) True if the value of the field should be retrievable during a search. The contents of a "stored" field are saved in the index. This is useful for retrieving and highlighting the contents for display, but it is not necessary for the search itself.
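
As a rough illustration of how "indexed" and "stored" combine, the declarations here sketch three common cases. The field names and the "text" and "string" type names are assumptions made for this example and would have to exist in your own schema.

<!-- searchable and returned in results -->
<field name="title" type="text" indexed="true" stored="true"/>
<!-- searchable only: the original value cannot be displayed or highlighted -->
<field name="body" type="text" indexed="true" stored="false"/>
<!-- display only: returned with results, but not searchable, sortable, or facetable -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>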

What is the meaning for field option "default"?

The default value for this field, used when no value is provided while adding a document.
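
A minimal sketch of a field with a default, assuming a hypothetical "status" field and a "string" field type defined elsewhere in the schema:

<!-- documents added without a "status" value are indexed as if status=active had been sent -->
<field name="status" type="string" indexed="true" stored="true" default="active"/>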

What is the meaning for field option "compressed"?

(true | false) True if this field should be stored using gzip compression. (This only applies if the field type is compressible; among the standard field types, only TextField and StrField are.)

Compressing a field's data reduces storage size at the expense of slower indexing and searching. It is usually only worthwhile for fields whose values are longer than about 200 characters. You can set this threshold with the compressThreshold option, which belongs on the field type, not on the field definition.

What is the meaning of field option "compressThreshold"?

An integer. compressThreshold is the minimum length required for text compression to be invoked. This applies only if compressed=true. A common pattern is to set compressThreshold on the field type definition, and turn compression on and off in the individual field definitions.
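
The pattern described above might look like the following sketch, which applies only to Solr versions that support the "compressed" option. The type name "text_compressible" and the field names are assumptions for this example, and the analyzer definition is omitted for brevity.

<!-- the field type sets the threshold: values shorter than 200 characters are not compressed -->
<fieldType name="text_compressible" class="solr.TextField" compressThreshold="200"/>
<!-- compression is then switched on or off per field -->
<field name="body" type="text_compressible" indexed="true" stored="true" compressed="true"/>
<field name="summary" type="text_compressible" indexed="true" stored="true" compressed="false"/>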

What is the meaning for field option "omitNorms"?

(true | false) Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.

Basically, if the length of a field does not affect your scores for the field, and you are not doing index-time document boosting, then set omitNorms to true. Some memory will be saved. For typical general text fields, you should leave norms enabled. Omit them if you aren't scoring on a field, or if the length of the field would be irrelevant if you did.
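
A sketch of the two cases, assuming hypothetical field names and "string" and "text" field types defined elsewhere in the schema:

<!-- identifier-like field: length and index-time boosts are irrelevant, so norms are omitted -->
<field name="sku" type="string" indexed="true" stored="true" omitNorms="true"/>
<!-- general full-text field: norms are kept so that length normalization affects scoring -->
<field name="description" type="text" indexed="true" stored="true" omitNorms="false"/>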

What is the meaning for field option "omitTermFreqAndPositions"?

(true | false) If set to true, term frequencies, positions, and payloads are omitted from the postings for this field. This can improve performance for fields that don't require that information, and it reduces the storage space required for the index. Queries that rely on positions, such as phrase queries, will silently fail to find documents when issued against a field with this option set.
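
For illustration, a field that is only matched as a whole value can usually drop this information, while a field that must answer phrase queries cannot. The field and type names in this sketch are assumptions.

<!-- matched as a whole value, never with phrase queries -->
<field name="category" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
<!-- phrase queries are expected here, so positions are kept -->
<field name="description" type="text" indexed="true" stored="true" omitTermFreqAndPositions="false"/>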

What is the meaning for field option "sortMissingLast" and "sortMissingFirst"?

When one of these is set to true, documents that have no value for the field are placed last (sortMissingLast) or first (sortMissingFirst) in results sorted on that field, regardless of the sort direction. The default behavior is for such documents to appear first when sorting ascending and last when sorting descending.
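
A minimal sketch, with the option set on the field type. The "manufacturer" field name is an assumption for this example.

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<field name="manufacturer" type="string" indexed="true" stored="true"/>

With this declaration, a request sorted on "manufacturer" in either direction should list documents that lack a manufacturer after all documents that have one.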

What is the meaning for field option "termVectors"?

This tells Lucene to store term vector information, which is used in a few cases to improve performance. Enable it if the field is to be used by the MoreLikeThis feature, or if it is a large field that is used for highlighting.
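
For example, a large body field that feeds MoreLikeThis or highlighting might be declared as in this sketch. The field name and the "text" type are assumptions.

<field name="content" type="text" indexed="true" stored="true" termVectors="true"/>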

What is the meaning for field option "multiValued"?

(true | false) True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document.

Enable this if a field can contain more than one value. The order supplied at index time is maintained. Internally, the values are separated by a configurable amount of virtual white space, the positionIncrementGap.

What is the meaning for field option "positionIncrementGap"?

For a multiValued field, this is the number of (virtual) spaces between each value to prevent inadvertent querying across field values.
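
The interaction between multiValued and positionIncrementGap can be sketched as follows. The "author" field name, the gap of 100, and the sample values are assumptions for this example, and the analyzer definition is omitted for brevity.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"/>
<field name="author" type="text" indexed="true" stored="true" multiValued="true"/>

<add>
<doc>
<field name="author">John Smith</field>
<field name="author">Mary Jones</field>
</doc>
</add>

With a gap of 100, the last term of the first value and the first term of the second value are about 100 positions apart, so a phrase query for "Smith Mary" should not match this document unless a similarly large slop is allowed.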

How to track when a document was added to the index?

Add a date field and specify the field option default="NOW".
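
A sketch of such a field, assuming a field type named "date" is already defined in the schema. The field name "timestamp" is likewise only a convention.

<!-- filled in with the time of indexing whenever a document does not supply a value -->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>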

What is the meaning for field option "required"?

Set this to true if you want Solr to fail to index a document that does not have a value for this field.
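
For example, with the hypothetical declaration here, an add request whose document has no "id" value would be rejected. The field name and the "string" type are assumptions.

<field name="id" type="string" indexed="true" stored="true" required="true"/>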

What is the meaning of field type "poly"?

Some FieldTypes can be "poly" field types. A poly FieldType is one that can potentially create multiple Fields per "declared" field. The primary example in Solr is the PointType. Depending on the dimension specified, one or more Fields will be created. For example:

<fieldType name="location" class="solr.PointType" dimension="2" subFieldTypes="double"/>

Declares a FieldType that can be used to represent a point in 2 dimensions (i.e. a lat/lon). The subFieldTypes value tells Solr what the underlying representation will be for the values in the field, in this case a FieldType called "double".

Thus, a Field declaration like:

<field name="store" type="location" indexed="true" stored="true"/>

can be indexed like:

<add>
<doc>
<field name="store">35.9,-79.0</field>
</doc>
</add>

Under the hood, Solr will create two fields (using dynamic fields) to store the information.
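
One way to make the underlying dynamic fields explicit is to give PointType a subFieldSuffix instead of subFieldTypes. The sketch here assumes a dynamic field pattern "*_d" backed by a field type named "double". Both names are assumptions for this example.

<fieldType name="location" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
<!-- the sub-fields created for each coordinate are expected to match this pattern -->
<dynamicField name="*_d" type="double" indexed="true" stored="false"/>
<field name="store" type="location" indexed="true" stored="true"/>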

What is the default value for multiValued attribute?

In schema version 1.0, the multiValued attribute did not exist; all fields were multiValued by nature. The multiValued attribute was introduced in schema version 1.1 and is false by default.

What is the default value for omitTermFreqAndPositions attribute?

The omitTermFreqAndPositions attribute was introduced in schema version 1.2. It is true by default, except for text fields.
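
The version numbers in these answers are schema versions, not Solr release numbers. The schema version is declared with the version attribute on the root element of schema.xml, roughly as in this sketch, where "example" is a placeholder name:

<schema name="example" version="1.2">
  <!-- field types and fields go here; the defaults described for version 1.2 apply -->
</schema>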

What is the fully qualified classname for solr.BoolField?

A field type has a unique name and is implemented by the Java class specified in the class attribute:

<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

A fully qualified classname in Java looks like org.apache.solr.schema.BoolField. The last piece is the simple name of the class, and the part preceding it is called the package name. In order to make configuration files in Solr more concise, the package name can be abbreviated to just solr for most of Solr's built-in classes.
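
Written without the abbreviation, the same declaration would look like this:

<fieldType name="boolean" class="org.apache.solr.schema.BoolField" sortMissingLast="true" omitNorms="true"/>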


For the most part, Lucene only deals with strings, so integers, floats, dates, and doubles require special handling to be searchable.

The <types> section allows you to define a list of <fieldType> declarations, each naming the underlying Solr class that should be used for that type as well as the default options you want for fields that use that type.
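
A minimal sketch of such a section, using the BoolField declaration from this page and a similar StrField declaration:

<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
</types>

Any field declared with type="string" or type="boolean" then inherits sortMissingLast="true" and omitNorms="true" unless its own definition overrides them.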

Any subclass of FieldType may be used as a field type class, using either its full package name, or the "solr" alias if it is in the default Solr package.

For common numeric types (integer, float, etc.), multiple implementations are provided, and the right one depends on your needs. See SolrPlugins for information on how to ensure that your own custom field types can be loaded into Solr.
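
As an illustration, the example schema shipped with some Solr releases declares several integer types along these lines. Which classes are available depends on the Solr version in use, so treat the sketch as an assumption to check against your own distribution.

<!-- plain integer: suited to exact matching and sorting -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<!-- indexed at several precisions: larger index, faster range queries -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>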

http://wiki.apache.org/solr/FieldOptionsByUseCase
http://wiki.apache.org/solr/SolrPerformanceFactors
http://wiki.apache.org/solr/SchemaXml
http://www.ibm.com/developerworks/java/library/j-solr1/
