This page documents RapidMiner Studio version 9.9.

Date to Nominal (RapidMiner Studio Core)

Synopsis

This operator parses the date values of the specified date attribute with respect to the given date format string and transforms the values into nominal values.

Description

The Date to Nominal operator transforms the selected date attribute into a new nominal attribute whose values follow a user-specified format. The format is given as a date format string in the date format parameter. This operator can be useful for time-based OLAP, for example to change the granularity of time stamps from day to week or month. The date attribute is selected by the attribute name parameter. The old date attribute is removed and replaced by the new nominal attribute unless the keep old attribute parameter is set to true. Understanding date and time patterns is essential for using this operator properly.
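
The operator itself is configured in RapidMiner Studio rather than in code, but the effect of the date format string on granularity can be sketched in plain Java. This sketch assumes that the pattern syntax behaves like java.text.SimpleDateFormat, which the pattern rules in this description closely resemble. The class GranularitySketch and its methods are hypothetical helpers, not part of RapidMiner.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of RapidMiner.
class GranularitySketch {
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    // Formats a date as a nominal (string) value using the given pattern.
    static String toNominal(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(ZONE);
        return format.format(date);
    }

    // Builds the time stamp 2001-07-04 12:08:56 in the Pacific time zone.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date stamp = sampleDate();
        System.out.println(toNominal(stamp, "yyyy-MM-dd")); // day granularity
        System.out.println(toNominal(stamp, "yyyy-'W'ww")); // week granularity
        System.out.println(toNominal(stamp, "yyyy-MM"));    // month granularity
    }
}
```

All time stamps that fall into the same week or month map to the same string, which is what makes the resulting nominal attribute usable as a coarser grouping key.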

Date and Time Patterns

This section explains date and time patterns, which you need to understand in order to specify the date format string in the date format parameter. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters that represent the components of a date or time. Text can be quoted using single quotes (') to avoid interpretation as date or time components. All other characters are not interpreted as date or time components; they are copied unchanged into the output during formatting and matched against the input string during parsing.
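
The quoting rule can be illustrated with a minimal Java sketch, again assuming that the patterns behave like java.text.SimpleDateFormat. QuotingSketch and its methods are hypothetical names chosen for this example. It shows quoted literal text, an unquoted word that is rejected because it contains a reserved letter, and a doubled single quote that produces a literal quote character.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of RapidMiner.
class QuotingSketch {
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    static String format(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(ZONE);
        return format.format(date);
    }

    // Returns true if the pattern is accepted, false if it contains
    // an unquoted letter that is not a defined pattern letter.
    static boolean isValidPattern(String pattern) {
        try {
            new SimpleDateFormat(pattern, Locale.US);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Builds the time stamp 2001-07-04 12:08:56 in the Pacific time zone.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date stamp = sampleDate();
        // Quoted text is copied literally into the result.
        System.out.println(format(stamp, "yyyy.MM.dd 'at' HH:mm"));
        // Unquoted, 'a' is read as AM/PM and 't' is a reserved letter.
        System.out.println(isValidPattern("yyyy.MM.dd at HH:mm"));
        // Two single quotes in a row stand for one literal single quote.
        System.out.println(format(stamp, "MMM d, ''yy"));
    }
}
```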

Here is a brief description of the defined pattern letters. The format types such as 'Text', 'Number', 'Year' and 'Month' are described in detail after this list.

  • G: This pattern letter is the era designator. For example: AD, BC etc. It follows the rules of 'Text' format type.
  • y: This pattern letter represents year. yy represents year in two digits e.g. 96 and yyyy represents year in four digits e.g. 1996. This pattern letter follows the rules of the 'Year' format type.
  • M: This pattern letter represents the month of the year. It follows the rules of the 'Month' format type. For example, a month can be represented as March, Mar or 03.
  • w: This pattern letter represents the week number of the year. It follows the rules of the 'Number' format type. For example, the first week of January can be represented as 01 and the last week of December can be represented as 52.
  • W: This pattern letter represents the week number of the month. It follows the rules of the 'Number' format type. For example, the first week of January can be represented as 01 and the fourth week of December can be represented as 04.
  • D: This pattern letter represents the day number of the year. It follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and last day of December can be represented as 365 (or 366 in case of a leap year).
  • d: This pattern letter represents the day number of the month. It follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and the last day of December can be represented as 31.
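  • F: This pattern letter represents the day of the week in the month, that is, which occurrence of its weekday within the month the date falls on (for example, 2 for the second Wednesday of the month). It follows the rules of the 'Number' format type.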
  • E: This pattern letter represents the name of the day of the week. It follows the rules of the 'Text' format type. For example, Tuesday or Tue etc.
  • a: This pattern letter represents the AM/PM portion of the 12-hour clock. It follows the rules of the 'Text' format type.
  • H: This pattern letter represents the hour of the day (from 0 to 23). It follows the rules of the 'Number' format type.
  • k: This pattern letter represents the hour of the day (from 1 to 24). It follows the rules of the 'Number' format type.
  • K: This pattern letter represents the hour of the day for 12-hour clock (from 0 to 11). It follows the rules of the 'Number' format type.
  • h: This pattern letter represents the hour of the day for 12-hour clock (from 1 to 12). It follows the rules of the 'Number' format type.
  • m: This pattern letter represents the minutes of the hour (from 0 to 59). It follows the rules of the 'Number' format type.
  • s: This pattern letter represents the seconds of the minute (from 0 to 59). It follows the rules of the 'Number' format type.
  • S: This pattern letter represents the milliseconds of the second (from 0 to 999). It follows the rules of the 'Number' format type.
  • z: This pattern letter represents the time zone. It follows the rules of the 'General Time Zone' format type. Examples include Pacific Standard Time, PST, GMT-08:00 etc.
  • Z: This pattern letter represents the time zone. It follows the rules of the 'RFC 822 Time Zone' format type. Examples include -0800 etc.
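
To see what each pattern letter yields for a concrete time stamp, the letters can be applied one at a time. The following sketch assumes that the patterns behave like java.text.SimpleDateFormat; PatternLetterSketch is a hypothetical class name. The sample time stamp is 2001-07-04 12:08:56 in the U.S. Pacific time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of RapidMiner.
class PatternLetterSketch {
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    static String format(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(ZONE);
        return format.format(date);
    }

    // Builds the time stamp 2001-07-04 12:08:56 in the Pacific time zone.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date stamp = sampleDate();
        // One pattern letter at a time, in the order of the list above.
        String[] letters = {"G", "y", "M", "w", "W", "D", "d", "F", "E", "a",
                            "H", "k", "K", "h", "m", "s", "S", "z", "Z"};
        for (String letter : letters) {
            System.out.println(letter + " -> " + format(stamp, letter));
        }
    }
}
```

Note how the four hour letters differ for a time just after noon: H, k and h all give 12, while K gives 0.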

Please note that all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved. Pattern letters are usually repeated, as their number determines the exact presentation. Here is the explanation of various format types:

  • Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used (if available). For parsing, both forms are acceptable independent of the number of pattern letters.
  • Number: For formatting, the number of pattern letters is the minimum number of digits. The numbers that are shorter than this minimum number of digits are zero-padded to this amount. For example if the minimum number of digits is 3 then the number 5 will be changed to 005. For parsing, the number of pattern letters is ignored unless it is needed to separate two adjacent fields.
  • Year: If the underlying calendar is the Gregorian calendar, the following rules are applied: For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a 'Number' format type. For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern 'MM/dd/yyyy', the string '01/11/12' parses to 'Jan 11, 12 AD'. For parsing with the abbreviated year pattern ('y' or 'yy'), this operator must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the operator is created. For example, using a pattern of 'MM/dd/yy' and the operator created on Jan 1, 1997, the string '01/11/12' would be interpreted as Jan 11, 2012 while the string '05/04/64' would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits will be parsed into the default century. Any other numeric string, such as a one-digit string, a string of three or more digits, or a two-character string that is not all digits (for example, '-1'), is interpreted literally. So '01/02/3' or '01/02/003' are parsed, using the same pattern, as 'Jan 2, 3 AD'. Likewise, '01/02/-3' is parsed as 'Jan 2, 4 BC'. Otherwise, if the underlying calendar is not the Gregorian calendar, calendar-system-specific forms are applied. If the number of pattern letters is 4 or more, a calendar-specific long form is used. Otherwise, a calendar-specific short or abbreviated form is used.
  • Month: If the number of pattern letters is 3 or more, the month is interpreted as 'Text' format type; otherwise, it is interpreted as a 'Number' format type.
  • General time zone: Time zones are interpreted as 'Text' format type if they have names. Time zones can also be given as a GMT offset value, for example GMT-08:00. For parsing, RFC 822 time zones are also accepted.
  • RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format is used. For parsing, general time zones are also accepted.
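
The effect of the number of pattern letters on the 'Text', 'Number', 'Month' and 'Year' format types can be checked with a short Java sketch. It assumes that the patterns behave like java.text.SimpleDateFormat; FormatTypeSketch and its methods are hypothetical names. The last call illustrates literal year parsing with the pattern 'MM/dd/yyyy'.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of RapidMiner.
class FormatTypeSketch {
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    static String format(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(ZONE);
        return format.format(date);
    }

    // Parses the text with the pattern and returns the calendar year.
    static int parsedYear(String text, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(ZONE);
        try {
            Calendar cal = new GregorianCalendar(ZONE, Locale.US);
            cal.setTime(format.parse(text));
            return cal.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // Builds the time stamp 2001-07-04 12:08:56 in the Pacific time zone.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date stamp = sampleDate();
        // Text: fewer than 4 letters gives the short form, 4 or more the full form.
        System.out.println(format(stamp, "E") + " / " + format(stamp, "EEEE"));
        // Number: the letter count is the minimum number of digits.
        System.out.println(format(stamp, "d") + " / " + format(stamp, "ddd"));
        // Month: 1 or 2 letters give a number, 3 or more give text.
        System.out.println(format(stamp, "M") + " / " + format(stamp, "MM") + " / "
                + format(stamp, "MMM") + " / " + format(stamp, "MMMM"));
        // Year: exactly 2 letters truncate the year to 2 digits.
        System.out.println(format(stamp, "yy") + " / " + format(stamp, "yyyy"));
        // Parsing with more than 2 year letters takes the year literally.
        System.out.println(parsedYear("01/11/12", "MM/dd/yyyy"));
    }
}
```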

This operator also supports localized date and time pattern strings by defining the locale parameter. In these strings, the pattern letters described above may be replaced with other, locale-dependent pattern letters.
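
A locale also determines the language of textual components such as day and month names. The following Java sketch illustrates this, assuming that the locale parameter acts like the java.util.Locale passed to java.text.SimpleDateFormat; this is an assumption about the operator, and LocaleSketch is a hypothetical class name.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of RapidMiner.
class LocaleSketch {
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    // Formats the date with the given pattern and locale.
    static String format(Date date, String pattern, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        format.setTimeZone(ZONE);
        return format.format(date);
    }

    // Builds the time stamp 2001-07-04 12:08:56 in the Pacific time zone.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date stamp = sampleDate();
        String pattern = "EEEE, d MMMM yyyy";
        // The same pattern yields day and month names in the locale's language.
        System.out.println(format(stamp, pattern, Locale.US));
        System.out.println(format(stamp, pattern, Locale.GERMANY));
        System.out.println(format(stamp, pattern, Locale.FRANCE));
    }
}
```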

The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone.

  • "yyyy.MM.dd G 'at' HH:mm:ss z": 2001.07.04 AD at 12:08:56 PDT
  • "EEE, MMM d, ''yy": Wed, Jul 4, '01
  • "h:mm a": 12:08 PM
  • "hh 'oclock' a, zzzz": 12 oclock PM, Pacific Daylight Time
  • "K:mm a, z": 0:08 PM, PDT
  • "yyyy.MMMMM.dd GGG hh:mm aaa": 2001.July.04 AD 12:08 PM
  • "EEE, d MMM yyyy HH:mm:ss Z": Wed, 4 Jul 2001 12:08:56 -0700
  • "yyMMddHHmmssZ": 010704120856-0700
  • "yyyy-MM-dd'T'HH:mm:ss.SSSZ": 2001-07-04T12:08:56.235-0700
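
These example patterns can be tried outside of RapidMiner with a few lines of Java, assuming that the patterns behave like java.text.SimpleDateFormat. PatternExamplesSketch is a hypothetical class name. The sketch fixes the locale and time zone explicitly and adds 235 milliseconds to the time stamp so that the last pattern has a fraction to show.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of RapidMiner.
class PatternExamplesSketch {
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    static String format(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(ZONE);
        return format.format(date);
    }

    // Builds 2001-07-04 12:08:56.235 in the Pacific time zone.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        cal.set(Calendar.MILLISECOND, 235);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date stamp = sampleDate();
        String[] patterns = {
            "yyyy.MM.dd G 'at' HH:mm:ss z",
            "EEE, MMM d, ''yy",             // '' produces a literal single quote
            "h:mm a",
            "hh 'oclock' a, zzzz",
            "K:mm a, z",
            "yyyy.MMMMM.dd GGG hh:mm aaa",
            "EEE, d MMM yyyy HH:mm:ss Z",
            "yyMMddHHmmssZ",
            "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
        };
        for (String pattern : patterns) {
            System.out.println(pattern + " -> " + format(stamp, pattern));
        }
    }
}
```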

Input

  • example set input (Data Table)

    This input port expects an ExampleSet. In the attached Example Process it receives the output of the Subprocess operator, but the output of other operators can also be used as input. The ExampleSet should contain at least one date attribute; without one, this operator has nothing to convert.

Output

  • example set output (Data Table)

    The selected date attribute is converted to a nominal attribute according to the specified date format string and the resultant ExampleSet is delivered through this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed through unchanged to this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • attribute_name: The name of the date attribute is specified here. The attribute name can be selected from the drop down box of the attribute name parameter if the meta data is known. Range: string
  • date_format: This is the most important parameter of this operator. It specifies the date time format of the desired nominal attribute. This date format string specifies what portion of the date attribute should be stored in the nominal attribute. Date format strings are discussed in detail in the description of this operator. Range:
  • locale: This is an expert parameter. A long list of locales is provided; users can select any of them. Range: selection
  • keep_old_attribute: This parameter indicates whether the original date attribute should be kept or discarded. Range: boolean

Tutorial Processes

Introduction to the Date to Nominal operator

This Example Process starts with a Subprocess operator. The subprocess delivers an ExampleSet with just a single attribute. The name of the attribute is 'deadline_date'. The type of the attribute is date. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, all the examples of this attribute have both date and time information. The Date to Nominal operator is applied on this ExampleSet to change the type of the 'deadline_date' attribute from date to nominal. Have a look at the parameters of the Date to Nominal operator. The attribute name parameter is set to 'deadline_date'. The date format parameter is set to 'EEEE'. Here is an explanation of this date format string:

'E' is the pattern letter used for the representation of the name of the day of the week. As explained in the description, if the number of pattern letters is 4 or more, the full form is used. Thus 'EEEE' represents the day of the week in full form, e.g. Monday, Tuesday etc. As a result, the date attribute is changed to a nominal attribute whose only possible values are the names of the days of the week. Please note that this date format string specifies the format of the values of the new nominal attribute of the ExampleSet.
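
The effect of the 'EEEE' format string on a column of dates can be imitated in plain Java. This is a sketch under the assumption that the pattern behaves like java.text.SimpleDateFormat. The class TutorialSketch, its methods and the sample deadlines are made up for illustration and are not the data of the Example Process.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of RapidMiner.
class TutorialSketch {
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    // Builds a date at 12:00:00 on the given day of July 2001.
    static Date julyDay(int dayOfMonth) {
        Calendar cal = new GregorianCalendar(ZONE, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, dayOfMonth, 12, 0, 0);
        return cal.getTime();
    }

    // Converts every date to the full name of its day of the week.
    static List<String> toDayNames(List<Date> dates) {
        SimpleDateFormat format = new SimpleDateFormat("EEEE", Locale.US);
        format.setTimeZone(ZONE);
        List<String> names = new ArrayList<>();
        for (Date date : dates) {
            names.add(format.format(date));
        }
        return names;
    }

    // Made-up stand-in for the values of a date attribute.
    static List<Date> sampleDeadlines() {
        List<Date> dates = new ArrayList<>();
        for (int day : new int[] {2, 4, 9, 11, 13}) {
            dates.add(julyDay(day));
        }
        return dates;
    }

    public static void main(String[] args) {
        // Five distinct dates collapse to at most seven distinct nominal values.
        System.out.println(toDayNames(sampleDeadlines()));
    }
}
```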