You are viewing the RapidMiner Studio documentation for version 9.2.

Nominal to Date (RapidMiner Studio Core)

Synopsis

This operator converts the selected nominal attribute into a date, time, or date_time attribute. The nominal values are parsed into date and/or time values according to the specified date format string.

Description

The Nominal to Date operator converts the selected nominal attribute of the input ExampleSet into a date and/or time attribute. The attribute is selected by the attribute name parameter, and the type of the resulting attribute is specified by the date type parameter. The nominal values are parsed into date and/or time values according to the date format string given in the date format parameter. Unless the keep old attribute parameter is set to true, the original nominal attribute is removed and replaced by the new date and/or time attribute.
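
The following is a minimal sketch of the conversion described above, not RapidMiner's own code. It assumes that the date format string follows the rules of Java's java.text.SimpleDateFormat (the pattern descriptions in this document closely mirror that class). The names NominalToDateSketch, convert, and DateType are hypothetical, and representing a 'time' value on the placeholder date January 1, 1970 is also an assumption made only for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

public class NominalToDateSketch {
    // Hypothetical stand-in for the three values of the date type parameter.
    enum DateType { DATE, TIME, DATE_TIME }

    // Parses each nominal value with the date format pattern and drops the
    // portion that the chosen DateType ignores. Throws an unchecked exception
    // naming the first value that does not match the pattern.
    static List<Date> convert(List<String> values, String pattern, DateType type,
                              Locale locale, TimeZone zone) {
        SimpleDateFormat parser = new SimpleDateFormat(pattern, locale);
        parser.setTimeZone(zone);
        Calendar cal = Calendar.getInstance(zone, locale);
        List<Date> result = new ArrayList<>();
        for (int row = 0; row < values.size(); row++) {
            try {
                cal.setTime(parser.parse(values.get(row)));
            } catch (ParseException e) {
                throw new IllegalArgumentException(
                        "row " + row + " does not match pattern: " + values.get(row), e);
            }
            if (type == DateType.DATE) {
                // Keep the calendar date, reset the time of day to midnight.
                cal.set(Calendar.HOUR_OF_DAY, 0);
                cal.set(Calendar.MINUTE, 0);
                cal.set(Calendar.SECOND, 0);
                cal.set(Calendar.MILLISECOND, 0);
            } else if (type == DateType.TIME) {
                // Keep the time of day on a placeholder date (assumption).
                cal.set(1970, Calendar.JANUARY, 1);
            }
            result.add(cal.getTime());
        }
        return result;
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        List<Date> dates = convert(List.of("2001-07-04 12:08:56"),
                "yyyy-MM-dd HH:mm:ss", DateType.DATE, Locale.US, utc);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        out.setTimeZone(utc);
        System.out.println(out.format(dates.get(0))); // 2001-07-04 00:00:00
    }
}
```

The sketch uses SimpleDateFormat's default lenient parsing; whether the operator itself parses leniently is not stated in this document.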

Date and Time Patterns

This section explains date and time patterns. Understanding them is necessary, especially for specifying the date format string in the date format parameter. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters that represent the components of a date or time. Text can be enclosed in single quotes (') to prevent it from being interpreted as date or time components; two consecutive single quotes ('') represent one literal single quote. All other characters are not interpreted as date or time components; they are simply matched against the input string during parsing.
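
To illustrate the quoting rules, here is a small sketch that assumes the date format string behaves like a pattern for Java's java.text.SimpleDateFormat. The class QuotedLiterals and its helper methods are hypothetical names. The quoted 'T' is matched as literal text, while '-' and ':' need no quotes because they are not letters. Leaving the T unquoted makes the pattern invalid, because T is not a defined pattern letter.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class QuotedLiterals {
    // 'T' is quoted, so it is matched as literal text. '-' and ':' are not
    // letters, so they are matched literally without any quoting.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    // Parses a value in UTC; wraps ParseException so callers need no throws clause.
    static Date parseUtc(String pattern, String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable: " + value, e);
        }
    }

    // Returns false when the pattern contains an unquoted letter that is not a
    // defined pattern letter (SimpleDateFormat rejects it in its constructor).
    static boolean isValidPattern(String pattern) {
        try {
            new SimpleDateFormat(pattern, Locale.US);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUtc(PATTERN, "2001-07-04T12:08:56").getTime()); // 994248536000
        System.out.println(isValidPattern("yyyy-MM-ddTHH:mm:ss")); // false: unquoted T
        System.out.println(isValidPattern(PATTERN));                // true
    }
}
```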

Here is a brief description of the defined pattern letters. The format types such as 'Text', 'Number', 'Year', and 'Month' are described in detail after this list.

  • G: This pattern letter is the era designator. For example: AD, BC etc. This pattern letter follows the rules of 'Text' format type.
  • y: This pattern letter represents year. yy represents year in two digits e.g. 96 and yyyy represents year in four digits e.g. 1996. This pattern letter follows the rules of the 'Year' format type.
  • M: This pattern letter represents the month of the year. This pattern letter follows the rules of the 'Month' format type. For example, a month can be represented as March, Mar, or 03.
  • w: This pattern letter represents the week number of the year. This pattern letter follows the rules of the 'Number' format type. For example, the first week of the year can be represented as 01 and a week at the end of December as 52; the exact numbering depends on the week rules of the locale.
  • W: This pattern letter represents the week number of the month. This pattern letter follows the rules of the 'Number' format type. For example, the first week of January can be represented as 01 and the fourth week of December as 04.
  • D: This pattern letter represents the day number of the year. This pattern letter follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and last day of December can be represented as 365 (or 366 in case of a leap year).
  • d: This pattern letter represents the day number of the month. This pattern letter follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and the last day of December can be represented as 31.
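  • F: This pattern letter represents the day of the week in the month, that is, which occurrence of its weekday within the month a date is. For example, the second Wednesday of a month is represented as 2. This pattern letter follows the rules of the 'Number' format type.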
  • E: This pattern letter represents the name of the day of the week. This pattern letter follows the rules of the 'Text' format type. For example, Tuesday or Tue etc.
  • a: This pattern letter represents the AM/PM portion of the 12-hour clock. This pattern letter follows the rules of the 'Text' format type.
  • H: This pattern letter represents the hour of the day (from 0 to 23). This pattern letter follows the rules of the 'Number' format type.
  • k: This pattern letter represents the hour of the day (from 1 to 24). This pattern letter follows the rules of the 'Number' format type.
  • K: This pattern letter represents the hour of the day for 12-hour clock (from 0 to 11). This pattern letter follows the rules of the 'Number' format type.
  • h: This pattern letter represents the hour of the day for 12-hour clock (from 1 to 12). This pattern letter follows the rules of the 'Number' format type.
  • m: This pattern letter represents the minutes of the hour (from 0 to 59). This pattern letter follows the rules of the 'Number' format type.
  • s: This pattern letter represents the seconds of the minute (from 0 to 59). This pattern letter follows the rules of the 'Number' format type.
  • S: This pattern letter represents the milliseconds of the second (from 0 to 999). This pattern letter follows the rules of the 'Number' format type.
  • z: This pattern letter represents the time zone. This pattern letter follows the rules of the 'General Time Zone' format type. Examples include Pacific Standard Time, PST, GMT-08:00 etc.
  • Z: This pattern letter represents the time zone. This pattern letter follows the rules of the 'RFC 822 Time Zone' format type. Examples include -0800.
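
The four hour letters H, k, K, and h differ only in their ranges, which is easiest to see at midnight and noon. This sketch assumes SimpleDateFormat semantics for the pattern letters; HourLetters and hours are hypothetical names, and the quoted labels such as 'H=' are literal text, not pattern letters.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class HourLetters {
    // Formats a time of day with all four hour letters plus the AM/PM marker.
    // The quoted 'H=', 'k=', 'K=', 'h=' are literal labels.
    static String hours(int hourOfDay) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = new GregorianCalendar(utc, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, hourOfDay, 0, 0);
        SimpleDateFormat fmt = new SimpleDateFormat("'H='H 'k='k 'K='K 'h='h a", Locale.US);
        fmt.setTimeZone(utc);
        return fmt.format(cal.getTime());
    }

    public static void main(String[] args) {
        System.out.println(hours(0));  // H=0 k=24 K=0 h=12 AM
        System.out.println(hours(12)); // H=12 k=12 K=0 h=12 PM
        System.out.println(hours(23)); // H=23 k=23 K=11 h=11 PM
    }
}
```

In practice, 'h' or 'K' should be combined with 'a', since without the AM/PM marker a 12-hour value is ambiguous.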

Please note that all other letters from 'A' to 'Z' and from 'a' to 'z' are reserved. Pattern letters are usually repeated, as their number determines the exact presentation. The format types are explained below:

  • Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used (if available). For parsing, both forms are acceptable independent of the number of pattern letters.
  • Number: For formatting, the number of pattern letters is the minimum number of digits. The numbers that are shorter than this minimum number of digits are zero-padded to this amount. For example if the minimum number of digits is 3 then the number 5 will be changed to 005. For parsing, the number of pattern letters is ignored unless it is needed to separate two adjacent fields.
  • Year: If the underlying calendar is the Gregorian calendar, the following rules are applied. For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a 'Number' format type. For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So, using the pattern 'MM/dd/yyyy', the string '01/11/12' parses to Jan 11, 12 AD. For parsing with the abbreviated year pattern ('y' or 'yy'), this operator must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the operator is created. For example, using a pattern of 'MM/dd/yy' and the operator created on Jan 1, 1997, the string '01/11/12' would be interpreted as Jan 11, 2012, while the string '05/04/64' would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits are parsed into the default century. Any other numeric string, such as a one-digit string, a string of three or more digits, or a two-character string that is not all digits (for example, '-1'), is interpreted literally. So '01/02/3' and '01/02/003' are both parsed, using the same pattern, as Jan 2, 3 AD. Likewise, '01/02/-3' is parsed as Jan 2, 4 BC. If the underlying calendar is not the Gregorian calendar, calendar-specific forms are applied: if the number of pattern letters is 4 or more, a calendar-specific long form is used; otherwise, a calendar-specific short or abbreviated form is used.
  • Month: If the number of pattern letters is 3 or more, the month is interpreted as 'Text' format type; otherwise, it is interpreted as 'Number' format type.
  • General time zone: Time zones that have names are interpreted as 'Text' format type, e.g. Pacific Standard Time or PST. A time zone can also be written as a GMT offset, e.g. GMT-08:00. For parsing, RFC 822 time zones are also accepted.
  • RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format is used, e.g. -0800. For parsing, general time zones are also accepted.
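
The two-digit year rule is the subtlest of these format types, so here is a sketch that reproduces its examples. It assumes the rules match Java's java.text.SimpleDateFormat. Because the real window depends on when the format object is created, the sketch pins it with SimpleDateFormat.set2DigitYearStart to 80 years before a chosen "creation" date (Jan 1, 1997, as in the text). TwoDigitYears, utcDate, and parse are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class TwoDigitYears {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Midnight UTC on the given date; calendarMonth uses Calendar.JANUARY etc.
    static Date utcDate(int year, int calendarMonth, int day) {
        Calendar cal = new GregorianCalendar(UTC, Locale.US);
        cal.clear();
        cal.set(year, calendarMonth, day);
        return cal.getTime();
    }

    // Parses value with pattern, using a two-digit-year window that starts
    // 80 years before createdAt, and renders the result as "MMM d, y G".
    static String parse(String pattern, String value, Date createdAt) {
        SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.US);
        parser.setTimeZone(UTC);
        Calendar windowStart = new GregorianCalendar(UTC, Locale.US);
        windowStart.setTime(createdAt);
        windowStart.add(Calendar.YEAR, -80);
        parser.set2DigitYearStart(windowStart.getTime());

        SimpleDateFormat printer = new SimpleDateFormat("MMM d, y G", Locale.US);
        printer.setTimeZone(UTC);
        try {
            return printer.format(parser.parse(value));
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable: " + value, e);
        }
    }

    public static void main(String[] args) {
        Date created = utcDate(1997, Calendar.JANUARY, 1);
        System.out.println(parse("MM/dd/yy", "01/11/12", created));   // Jan 11, 2012 AD
        System.out.println(parse("MM/dd/yy", "05/04/64", created));   // May 4, 1964 AD
        System.out.println(parse("MM/dd/yyyy", "01/11/12", created)); // Jan 11, 12 AD
        System.out.println(parse("MM/dd/yy", "01/02/3", created));    // Jan 2, 3 AD
    }
}
```
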

This operator also supports localized date and time pattern strings through the locale parameter. In these strings, the pattern letters described above may be replaced with other, locale-dependent pattern letters.

The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone.

  • "yyyy.MM.dd G 'at' HH:mm:ss z": 2001.07.04 AD at 12:08:56 PDT
  • "EEE, MMM d, ''yy": Wed, Jul 4, '01
  • "h:mm a": 12:08 PM
  • "hh 'oclock' a, zzzz": 12 oclock PM, Pacific Daylight Time
  • "K:mm a, z": 0:08 PM, PDT
  • "yyyy.MMMMM.dd GGG hh:mm aaa": 2001.July.04 AD 12:08 PM
  • "EEE, d MMM yyyy HH:mm:ss Z": Wed, 4 Jul 2001 12:08:56 -0700
  • "yyMMddHHmmssZ": 010704120856-0700
  • "yyyy-MM-dd'T'HH:mm:ss.SSSZ": 2001-07-04T12:08:56.235-0700
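
The examples above can be reproduced with a short sketch, assuming the patterns behave exactly like Java's java.text.SimpleDateFormat patterns. PatternExamples and format are hypothetical names. The instant is 2001-07-04 12:08:56.235 in the America/Los_Angeles zone (daylight saving time in July, hence PDT and -0700); the 235 milliseconds only show up in the pattern that uses 'S'.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class PatternExamples {
    // Formats 2001-07-04 12:08:56.235 Pacific time with the given pattern.
    static String format(String pattern) {
        TimeZone pacific = TimeZone.getTimeZone("America/Los_Angeles");
        Calendar cal = new GregorianCalendar(pacific, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        cal.set(Calendar.MILLISECOND, 235);
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(pacific);
        return fmt.format(cal.getTime());
    }

    public static void main(String[] args) {
        String[] patterns = {
            "yyyy.MM.dd G 'at' HH:mm:ss z",
            "EEE, MMM d, ''yy",
            "h:mm a",
            "hh 'oclock' a, zzzz",
            "K:mm a, z",
            "yyyy.MMMMM.dd GGG hh:mm aaa",
            "EEE, d MMM yyyy HH:mm:ss Z",
            "yyMMddHHmmssZ",
            "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
        };
        // Prints each pattern next to its formatted result.
        for (String p : patterns) {
            System.out.println(p + " -> " + format(p));
        }
    }
}
```

Time zone names such as PDT come from the locale data of the Java runtime, so the 'z' and 'zzzz' outputs may vary slightly between Java versions; the numeric 'Z' offsets do not.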

Input

  • example set (IOObject)

    This input port expects an ExampleSet. In the attached Example Process, it is the output of the Retrieve operator; the output of other operators can also be used as input. Meta data must be attached to the input data, because the attributes are specified in their meta data. The ExampleSet should contain at least one nominal attribute; otherwise, this operator has nothing to convert.

Output

  • example set (IOObject)

    The selected nominal attribute is converted to the selected date type, and the resulting ExampleSet is delivered through this port.

  • original (IOObject)

    The ExampleSet that was given as input is passed through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view it in the Results Workspace.

Parameters

  • attribute_name: The name of the nominal attribute that is to be converted to a date type is specified here. Range: string
  • date_type: This parameter specifies the type of the resulting attribute.
    • date: If the date type parameter is set to 'date', the resulting attribute will be of date type. The time portion (if any) of the nominal values will be ignored.
    • time: If the date type parameter is set to 'time', the resulting attribute will be of time type. The date portion (if any) of the nominal values will be ignored.
    • date_time: If the date type parameter is set to 'date_time', the resulting attribute will be of date_time type, keeping both the date and the time portion.
    Range: selection
  • date_format: This is the most important parameter of this operator. It specifies the date and time format of the values of the selected nominal attribute. Date format strings are discussed in detail in the description of this operator. Range:
  • time_zone: This is an expert parameter. It specifies the time zone in which the nominal values are interpreted; a long list of time zones is provided, and users can select any of them. Range: selection
  • locale: This is an expert parameter. It specifies the locale used for the locale-dependent parts of the date format, such as month and day names; a long list of locales is provided, and users can select any of them. Range: selection
  • keep_old_attribute: This parameter indicates whether the original nominal attribute should be kept or discarded. Range: boolean
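
The effect of the locale and time_zone parameters can be sketched as follows, assuming they correspond to the Locale and TimeZone settings of Java's java.text.SimpleDateFormat; that mapping is an assumption, and LocaleAndZone and parse are hypothetical names. The locale decides which month and day names are recognized, and the time zone decides which instant a value without its own zone denotes.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.OptionalLong;
import java.util.TimeZone;

public class LocaleAndZone {
    // Parses value and returns epoch milliseconds, or empty if the value does
    // not match the pattern under the given locale.
    static OptionalLong parse(String value, String pattern, Locale locale, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return OptionalLong.of(fmt.parse(value).getTime());
        } catch (ParseException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // The locale decides which day and month names are recognized.
        String german = "Mittwoch, 4. Juli 2001";
        System.out.println(parse(german, "EEEE, d. MMMM yyyy", Locale.GERMAN, "UTC").isPresent()); // true
        System.out.println(parse(german, "EEEE, d. MMMM yyyy", Locale.US, "UTC").isPresent());     // false

        // The value carries no zone, so the time zone setting picks the instant.
        String local = "2001-07-04 12:08:56";
        long utc = parse(local, "yyyy-MM-dd HH:mm:ss", Locale.US, "UTC").getAsLong();
        long pacific = parse(local, "yyyy-MM-dd HH:mm:ss", Locale.US, "America/Los_Angeles").getAsLong();
        System.out.println((pacific - utc) / 3_600_000); // 7 (hours)
    }
}
```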

Tutorial Processes

Introduction to the Nominal to Date operator

This Example Process starts with a subprocess that delivers an ExampleSet with a single nominal attribute named 'deadline_date'. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, all values of this attribute contain both date and time information. The Nominal to Date operator is applied to this ExampleSet to change the type of the 'deadline_date' attribute from nominal to date.

Have a look at the parameters of the Nominal to Date operator. The attribute name parameter is set to 'deadline_date'. The date type parameter is set to 'date' (not 'date_time'), so the time portion of the values will not be available in the resulting attribute. The date format parameter is set to 'EEEE, MMMM d, yyyy h:m:s a z'. This date format string consists of the following parts:

  • 'EEEE': 'E' is the pattern letter for the name of the day of the week. As explained in the description, if the number of pattern letters is 4 or more, the full form is used, so 'EEEE' represents the day of the week in full form, e.g. Monday, Tuesday.
  • 'MMMM': 'M' is the pattern letter for the month of the year. With 3 or more letters the month is treated as text, and with 4 or more letters the full form is used, so 'MMMM' represents the month in full form, e.g. January, February.
  • 'd': the pattern letter for the day of the month.
  • 'yyyy': 'y' is the pattern letter for the year; 'yyyy' represents the year in four digits, e.g. 2011, 2012.
  • 'h', 'm', 's': the pattern letters for the hour (1 to 12), the minute, and the second. A single letter also covers multi-digit values, e.g. 10 or 11 for hours, 51 or 52 for minutes, 40 or 41 for seconds. The difference between the doubled forms ('hh', 'mm', 'ss') and the single forms ('h', 'm', 's') lies in formatting: the doubled forms pad single-digit values with a leading 0 (01, 02, and so on), while the single forms write them without modification (1, 2, and so on). For parsing, the number of letters is ignored, as explained for the 'Number' format type.
  • 'a': the pattern letter for the AM/PM marker of the 12-hour clock.
  • 'z': the pattern letter for the time zone.

Please note that this date format string describes the format of the nominal values of the selected attribute in the input ExampleSet. It tells RapidMiner which portion of each nominal value represents which component of the date or time, e.g. year or month.
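
As a sketch of how the tutorial's date format string behaves, assuming it follows Java's java.text.SimpleDateFormat rules, the class below formats one date with the single-letter and doubled-letter variants and then parses both spellings with the tutorial pattern. TutorialPattern, utc, format, and parse are hypothetical names, and the sample values are invented for illustration; they are not taken from the tutorial's data.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class TutorialPattern {
    static final String TUTORIAL = "EEEE, MMMM d, yyyy h:m:s a z";
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Builds a UTC date; calendarMonth uses Calendar.JANUARY etc.
    static Date utc(int year, int calendarMonth, int day, int hour, int minute, int second) {
        Calendar cal = new GregorianCalendar(UTC, Locale.US);
        cal.clear();
        cal.set(year, calendarMonth, day, hour, minute, second);
        return cal.getTime();
    }

    static String format(String pattern, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(UTC);
        return fmt.format(date);
    }

    // Wraps ParseException so callers need no throws clause.
    static Date parse(String pattern, String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(UTC);
        try {
            return fmt.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable: " + value, e);
        }
    }

    public static void main(String[] args) {
        Date date = utc(2012, Calendar.JANUARY, 2, 15, 4, 5);
        // Single letters write values unpadded, doubled letters pad to two digits.
        System.out.println(format("EEEE, MMMM d, yyyy h:m:s a", date));      // Monday, January 2, 2012 3:4:5 PM
        System.out.println(format("EEEE, MMMM dd, yyyy hh:mm:ss a", date)); // Monday, January 02, 2012 03:04:05 PM
        // For parsing, the letter count is ignored, so both spellings match the tutorial pattern.
        Date a = parse(TUTORIAL, "Monday, January 2, 2012 3:4:5 PM GMT");
        Date b = parse(TUTORIAL, "Monday, January 02, 2012 03:04:05 PM GMT");
        System.out.println(a.equals(date) && b.equals(date)); // true
    }
}
```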