You are viewing the RapidMiner Studio documentation for version 9.6.
Using the Sensor Link extension
The Sensor Link extension connects RapidMiner to the OSIsoft PI System, allowing you to extract operational data and update data points from RapidMiner processes.
Sensor Link uses the PI Web API and is compatible with PI Web API 2017 R2 and newer.
Please note that OSIsoft requires a PI System Access (PSA) license for using programmatic APIs such as the PI Web API. The API being available, e.g., as part of a PI Vision installation, does not mean that a PSA license is in place.
Install the Sensor Link extension
To install the extension, go to the Extensions menu, open the Marketplace (Updates and Extensions), and search for Sensor Link. For more detail, see Adding extensions.
Connect to the PI Web API
Sensor Link uses RapidMiner’s connection framework, which allows you to manage connections centrally and to reuse them across operators. The extension supports both HTTP Basic Authentication (username and password) and Windows Authentication as the current Windows user (Kerberos, NTLM). You can create a new connection from the Connections menu:
For any connection, you need to specify the PI Web API endpoint and the default root path. The root path should be the name of a data server; it can be overridden in the operator parameters if required.
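As a rough illustration of how the endpoint, the root path, and a data item fit together, the following Python sketch builds a PI Web API request that resolves a PI Point by its path. The `points?path=` resource and the `\\server\point` path syntax belong to the PI Web API; the helper `point_lookup_url`, the endpoint `https://pi.example.com/piwebapi`, and the data server name `PISRV1` are hypothetical, and whether Sensor Link resolves data items exactly this way is an assumption.

```python
from urllib.parse import urlencode


def point_lookup_url(endpoint: str, root_path: str, data_item: str) -> str:
    """Hypothetical helper: URL that asks the PI Web API to resolve a PI Point.

    Assumes the data item is resolved relative to the root path (a data
    server name) using the PI path syntax \\\\<server>\\<point>.
    """
    # Build the PI path, e.g. \\PISRV1\BA:TEMP.1
    path = "\\\\" + root_path + "\\" + data_item
    # urlencode percent-encodes the backslashes and the colon.
    query = urlencode({"path": path})
    return endpoint.rstrip("/") + "/points?" + query


# Hypothetical endpoint and data server.
print(point_lookup_url("https://pi.example.com/piwebapi", "PISRV1", "BA:TEMP.1"))
# https://pi.example.com/piwebapi/points?path=%5C%5CPISRV1%5CBA%3ATEMP.1
```

Keeping the root path separate from the data item is what makes it possible to override the data server per operator while reusing the same connection.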
HTTP Basic Authentication
The following connection is an example using HTTP Basic Authentication. It connects RapidMiner to OSIsoft’s public test system:
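For reference, HTTP Basic Authentication (RFC 7617) sends the username and password joined by a colon and Base64-encoded in the `Authorization` header. A minimal Python sketch, with a hypothetical helper name and placeholder credentials:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Value of an HTTP Basic Authorization header (RFC 7617)."""
    # "username:password" as UTF-8 bytes, then Base64, then back to text.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


# Placeholder credentials, for illustration only.
print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so Basic credentials should only be sent to an `https://` endpoint.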
Windows authentication
To connect to an instance that requires Windows Authentication (Kerberos), select Windows SSO (Kerberos/NTLM) as the authentication method:
With this authentication method, Sensor Link authenticates as the current Windows user.
SSL settings and troubleshooting
By default, Sensor Link trusts a secured connection only if RapidMiner recognizes the certificate presented by the endpoint and the certificate was issued for the endpoint’s hostname.
If a connection to an internal system fails with an SSL error, this is most likely due to one of these two requirements not being met. In this case, we strongly recommend adding the certificate to RapidMiner Studio first. For more details, see Trust a self-signed SSL certificate.
Alternatively, you can configure the connection to trust any self-signed certificate (less secure). You can also deactivate the Verify host setting to accept certificates whose host name does not match the endpoint.
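Purely as an analogy, and not as a description of how Sensor Link is implemented, the recommended options above map onto Python's standard `ssl` module: certificate validation and host name checking are separate switches. The function `make_tls_context` and its parameters are hypothetical; the sketch deliberately omits the "trust any self-signed certificate" option.

```python
import ssl
from typing import Optional


def make_tls_context(extra_cafile: Optional[str] = None, verify_host: bool = True) -> ssl.SSLContext:
    """Hypothetical helper mirroring the SSL options with Python's ssl module."""
    # Secure default: system trust store, certificate validation required,
    # and host name checking enabled.
    ctx = ssl.create_default_context()
    if extra_cafile is not None:
        # Counterpart of adding a self-signed certificate to the trust store
        # (the preferred fix for SSL errors with internal systems).
        ctx.load_verify_locations(cafile=extra_cafile)
    if not verify_host:
        # Counterpart of deactivating "Verify host": the certificate is still
        # validated, but its host name no longer has to match the endpoint.
        ctx.check_hostname = False
    return ctx
```

Note that turning off host name checking leaves `verify_mode` at `ssl.CERT_REQUIRED`, which is why it is a narrower relaxation than trusting any certificate.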
Examples
You can find the operators Compressed Data, Current Values, Sample Data, Calculate Data, and Publish Data by searching for PI in the Operators panel:
For all operators, you can specify the connection either by connecting the input port or by selecting it in the corresponding parameter (only visible if no input is connected). For most operators, the only other mandatory parameter is the first data item or an expression (performance equation). By default, the PI Web API returns data for the last 24 hours.
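Start and end times are given as PI time strings such as `*` (now), `T` (today at midnight), and `Y` (yesterday at midnight), optionally followed by an offset like `-1h`. The 24-hour default corresponds, as far as we know, to the PI Web API defaults `*-1d` (start) and `*` (end). The Python sketch below resolves only this small subset of the syntax; `resolve_pi_time` is a hypothetical helper, and the real PI time syntax supports many more forms.

```python
import re
from datetime import datetime, timedelta

# PI-style unit letters (subset): s = seconds, m = minutes, h = hours, d = days.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def resolve_pi_time(expr: str, now: datetime) -> datetime:
    """Resolve '*', 'T' or 'Y' plus an optional '+N<unit>' / '-N<unit>' offset."""
    m = re.fullmatch(r"\s*([*TY])\s*(?:([+-])\s*(\d+)\s*([smhd]))?\s*", expr)
    if m is None:
        raise ValueError(f"unsupported time string: {expr!r}")
    base, sign, amount, unit = m.groups()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    ref = {"*": now, "T": midnight, "Y": midnight - timedelta(days=1)}[base]
    if sign:
        delta = timedelta(**{_UNITS[unit]: int(amount)})
        ref = ref + delta if sign == "+" else ref - delta
    return ref


now = datetime(2020, 5, 14, 15, 30)
# Default window: the last 24 hours.
print(resolve_pi_time("*-1d", now), resolve_pi_time("*", now))
```

Under these assumptions, a window from `Y` to `T` covers exactly the previous calendar day, regardless of when the query runs.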
Calculate Data
The following example queries 5-minute averages for the data points BA:CONC.1 and BA:TEMP.1 (specified under additional data items). The start and end time parameters use the relative expressions Y and T for yesterday and today, respectively. For an overview of the time strings supported for start and end times, please refer to the PI Web API documentation.
This query results in a data table similar to this:
Please note that Calculate Data has only limited support for non-numeric data points: only the calculation methods count and percent good can be used with such points.
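To make the idea of an interval average concrete, here is a Python sketch of a time-weighted average, which weights each value by how long it was in effect instead of averaging the recorded values directly. It assumes linear interpolation between recorded values (trapezoidal integration), numeric values only, and timestamps in seconds; it ignores PI-specific details such as step points, bad values, and the calculation basis, so it approximates rather than reproduces what the PI Web API computes. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, numeric value)


def interpolate(points: List[Point], t: int) -> float:
    """Linearly interpolate the value at time t from recorded points sorted by time."""
    times = [p[0] for p in points]
    i = bisect_right(times, t)  # number of recorded points at or before t
    if i == 0 or t > times[-1]:
        raise ValueError("t lies outside the recorded range")
    t0, v0 = points[i - 1]
    if t0 == t:
        return v0
    t1, v1 = points[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def time_weighted_average(points: List[Point], start: int, end: int) -> float:
    """Average of the interpolated signal over [start, end], weighted by time."""
    # Integration boundaries: window edges plus recorded timestamps strictly inside.
    ts = [start] + [t for t, _ in points if start < t < end] + [end]
    vs = [interpolate(points, t) for t in ts]
    # Trapezoidal rule, which is exact for a piecewise-linear signal.
    area = sum((ts[k + 1] - ts[k]) * (vs[k] + vs[k + 1]) / 2 for k in range(len(ts) - 1))
    return area / (end - start)


def interval_averages(points: List[Point], start: int, end: int, step: int) -> List[Tuple[int, float]]:
    """One average per interval starting at t, e.g. step=300 for 5-minute averages.

    Assumes end - start is a multiple of step and that recorded data covers [start, end].
    """
    return [(t, time_weighted_average(points, t, t + step)) for t in range(start, end, step)]


# Hypothetical data: a value rising linearly from 0 to 60 over 10 minutes.
print(interval_averages([(0, 0.0), (600, 60.0)], 0, 600, 300))  # [(0, 15.0), (300, 45.0)]
```

Because an average needs arithmetic on the values, it has no meaning for text states, which is consistent with non-numeric points being limited to count and percent good.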
Sample Data
This example samples the data points BA:CONC.1, BA:TEMP.1, and BA:ACTIVE.1 (specified under additional data items) at 10-minute intervals. Each sample is obtained by interpolating between the two nearest recorded values. This time, we use absolute start and end times:
Sample Data supports both numeric and non-numeric data points. The operator looks up each point’s type automatically and maps it to the corresponding RapidMiner attribute type (real, integer, or polynominal).
This query results in the following data table:
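The sampling rule described in this section can be sketched in Python as follows. Numeric points are linearly interpolated between the two nearest recorded values; for non-numeric points such as BA:ACTIVE.1, the sketch carries the most recent recorded value forward, since text states cannot be interpolated. That non-numeric behavior, the flat continuation after the last recorded value, and the helper names `value_at` and `sample` are assumptions, not documented Sensor Link behavior.

```python
from bisect import bisect_right
from typing import Any, List, Tuple


def value_at(points: List[Tuple[int, Any]], t: int, numeric: bool = True) -> Any:
    """Value at time t from recorded (timestamp, value) pairs sorted by time."""
    times = [p[0] for p in points]
    i = bisect_right(times, t)  # number of recorded points at or before t
    if i == 0:
        raise ValueError("no recorded value at or before t")
    t0, v0 = points[i - 1]
    if not numeric or t0 == t or i == len(points):
        # Non-numeric point, exact hit, or nothing recorded after t:
        # carry the previous recorded value forward (assumption).
        return v0
    t1, v1 = points[i]
    # Linear interpolation between the two nearest recorded values.
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def sample(points: List[Tuple[int, Any]], start: int, end: int, step: int, numeric: bool = True) -> List[Tuple[int, Any]]:
    """Samples at start, start + step, ..., up to and including end."""
    return [(t, value_at(points, t, numeric)) for t in range(start, end + 1, step)]


# Hypothetical recorded data, sampled every 10 minutes (600 s).
print(sample([(0, 10.0), (1200, 30.0)], 0, 1200, 600))
print(sample([(0, "Active"), (900, "Inactive")], 0, 1200, 600, numeric=False))
```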
Compressed Data
The Compressed Data operator retrieves raw recorded data. Over time, the PI Data Archive may compress the recorded data, e.g., by removing values that contribute little to interpolation; hence the operator’s name. Its interface is similar to that of the other operators:
Its output differs, however: the timestamps are not necessarily equidistant, and when data is retrieved for multiple data points, the points are not guaranteed to share the same timestamps. The operator handles this by performing an outer join on the timestamps and leaving missing cells empty (displayed as ‘?’):
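The outer join described above can be sketched in a few lines of Python: every timestamp seen for any data point becomes a row, and cells without a recorded value stay `None`, playing the role of RapidMiner's ‘?’. The function name `outer_join` and the sample data are hypothetical; this illustrates the joining rule, not the operator's code.

```python
from typing import Any, Dict, List, Tuple

Series = Dict[str, List[Tuple[int, float]]]  # data point -> [(timestamp, value), ...]


def outer_join(series: Series) -> List[Dict[str, Any]]:
    """One row per distinct timestamp; missing cells are None."""
    names = list(series)
    rows: Dict[int, Dict[str, Any]] = {}
    for name, points in series.items():
        for t, value in points:
            # Create an all-missing row the first time a timestamp is seen.
            rows.setdefault(t, {n: None for n in names})[name] = value
    # Rows ordered by timestamp.
    return [{"time": t, **rows[t]} for t in sorted(rows)]


# Hypothetical raw data: A and B share only the timestamp 10.
for row in outer_join({"A": [(0, 1.0), (10, 2.0)], "B": [(5, 3.0), (10, 4.0)]}):
    print(row)
```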
Current Values
The Current Values operator is similar to the Compressed Data operator in that it returns raw recorded data. As its name suggests, however, it returns only the most recent value for each data point:
When querying multiple data points at once, the result can still contain multiple rows, because the most recent values of different points may have different timestamps:
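Why a "current values" query can yield several rows is easiest to see in a small Python sketch: it keeps the latest recorded value per data point and then groups those values by timestamp. The function name `current_values` and the data are hypothetical; the sketch only illustrates the effect of mismatching timestamps.

```python
from typing import Any, Dict, List, Tuple


def current_values(series: Dict[str, List[Tuple[int, float]]]) -> List[Dict[str, Any]]:
    """Latest value per data point, grouped into one row per distinct timestamp."""
    names = list(series)
    # Most recent (timestamp, value) pair for each data point with any data.
    latest = {name: max(points, key=lambda p: p[0]) for name, points in series.items() if points}
    rows: Dict[int, Dict[str, Any]] = {}
    for name, (t, value) in latest.items():
        rows.setdefault(t, {n: None for n in names})[name] = value
    return [{"time": t, **rows[t]} for t in sorted(rows)]


# A was last recorded at t=10, B at t=7, so the result has two rows.
print(current_values({"A": [(0, 1.0), (10, 2.0)], "B": [(5, 3.0), (7, 4.0)]}))
```

If all points happen to share their latest timestamp, the same function returns a single row.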
Time filtering with performance equations
The connector does not implement the Time Filtered function known from PI DataLink. However, you can compute the amount of time a condition evaluates to true by using the equivalent functions within the performance equation itself.
The example above evaluates the expression TimeLt('BA:TEMP.1', '*-1h', '*', 10) once every full hour. The expression computes the amount of time, in seconds, during which the recorded temperature BA:TEMP.1 was below 10 degrees in the preceding hour:
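For intuition about what TimeLt returns, the following Python sketch computes how long a recorded signal stays below a threshold within a window. It assumes linear interpolation between recorded values, so a crossing is located between two samples, and it only counts time covered by recorded data; PI's own implementation may treat step points, bad values, and window edges differently. The function name `time_lt` is hypothetical.

```python
from typing import List, Tuple


def time_lt(points: List[Tuple[int, float]], start: int, end: int, threshold: float) -> float:
    """Seconds within [start, end] during which the interpolated signal is < threshold."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        # Overlap of this recorded segment with the window.
        a, b = max(t0, start), min(t1, end)
        if a >= b:
            continue
        # Interpolated values at the overlap's edges.
        va = v0 + (v1 - v0) * (a - t0) / (t1 - t0)
        vb = v0 + (v1 - v0) * (b - t0) / (t1 - t0)
        if va < threshold and vb < threshold:
            total += b - a
        elif va < threshold or vb < threshold:
            # Exactly one edge is below: locate where the line crosses the threshold.
            tc = a + (threshold - va) * (b - a) / (vb - va)
            total += (tc - a) if va < threshold else (b - tc)
    return total


# Hypothetical hour: temperature rises linearly from 5 to 15 degrees.
print(time_lt([(0, 5.0), (3600, 15.0)], 0, 3600, 10))  # 1800.0
```

Evaluating the expression once every full hour corresponds to calling `time_lt(points, h - 3600, h, 10)` for each full hour `h`.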
Filter expressions
All data retrieval operators support filter expressions. Let’s revisit the example of the Sample Data operator. If we only want to retrieve data for rows where BA:ACTIVE.1 indicates an active state, we can do so by using the filter 'BA:ACTIVE.1' = "Active":
Please note that after filtering, the returned data is no longer guaranteed to be equidistant. For example, there is a 20-minute gap between rows 2 and 3 because an inactive row between them was dropped (compare with the original table in the Sample Data example):
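A quick way to see why filtering breaks equidistance is to apply a filter to an already equidistant table. The Python sketch below does this client-side with hypothetical data; it only mimics the effect of the filter 'BA:ACTIVE.1' = "Active" and is not how the operator or the PI Web API evaluates filter expressions. The helpers `filter_rows` and `gaps` are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_rows(rows: List[Row], column: str, wanted: Any) -> List[Row]:
    """Keep rows whose column equals the wanted value (client-side stand-in for a filter)."""
    return [row for row in rows if row[column] == wanted]


def gaps(rows: List[Row]) -> List[int]:
    """Differences between consecutive timestamps, in seconds."""
    times = [row["time"] for row in rows]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical 10-minute samples; the third one is inactive.
sampled = [
    {"time": 0, "BA:ACTIVE.1": "Active"},
    {"time": 600, "BA:ACTIVE.1": "Active"},
    {"time": 1200, "BA:ACTIVE.1": "Inactive"},
    {"time": 1800, "BA:ACTIVE.1": "Active"},
]
active = filter_rows(sampled, "BA:ACTIVE.1", "Active")
print(gaps(sampled), gaps(active))  # [600, 600, 600] [600, 1200]
```

Dropping the inactive sample leaves a 1200-second (20-minute) gap, which matches the effect described above.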