Get Page (Web Mining)

Synopsis

Gets a page via HTTP.

Description

This operator sends a GET request via HTTP. The returned page is output as a document.
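As a rough illustration of what such a request involves outside the operator, the following Python sketch sends a GET request and returns the page as text. It is not the operator's implementation. The function name get_page and its arguments are hypothetical, and the standard library's urlopen accepts only a single timeout, so the separate connection and read timeouts are collapsed into one value here.

```python
from urllib.request import Request, urlopen


def get_page(url, timeout_ms=10000, user_agent=None):
    """Send a GET request and return the response body as text.

    Hypothetical helper for illustration only; timeout_ms is one
    combined timeout, given in milliseconds like the operator's settings.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = Request(url, headers=headers, method="GET")
    # urlopen expects the timeout in seconds.
    with urlopen(request, timeout=timeout_ms / 1000) as response:
        raw = response.read()
        # Use the charset announced by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

A call such as get_page("https://example.com/", user_agent="demo-agent") would return the page source as a string, which roughly corresponds to the document the operator delivers at its output port.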

Output

  • output

    The output port.

Parameters

  • url: The URL to read from.
  • random user agent: Chooses a user agent at random from a set of 7000 user agents.
  • user agent: The user agent string sent with the request.
  • connection timeout: The timeout (in ms) for establishing the connection.
  • read timeout: The timeout (in ms) for reading from the URL.
  • follow redirects: Specifies whether redirects should be followed.
  • accept cookies: Specifies whether cookies should be accepted.
  • cookie scope: Specifies the scope of the cookies used.
  • request method: Specifies the HTTP request method.
  • query parameters: The query parameters as key/value pairs.
  • request properties: Defines the properties sent with the HTTP request, so that the request matches the needs of your web service.
  • override encoding: Normally, the encoding of the retrieved page is determined automatically. In rare cases this fails or the server reports a wrong encoding. Enable this option to override the automatically detected encoding.
  • encoding: The encoding used for reading or writing files.
  • keep sensitive headers: Keeps the "Authorization" and "Cookie" headers during a redirect to a different domain or subdomain.
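
To make a few of these parameters more concrete, the sketch below mimics query parameters, request properties, override encoding, and keep sensitive headers in plain Python. All function names are hypothetical. The operator's internal behavior is not documented here, so the logic is an assumption: in particular, "different domain or subdomain" is approximated by comparing host names for equality.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import Request

# Headers treated as sensitive, per the parameter description above.
SENSITIVE_HEADERS = {"authorization", "cookie"}


def build_request(url, query_parameters=None, request_properties=None,
                  user_agent=None, request_method="GET"):
    """Assemble a request from operator-style settings without sending it."""
    parts = urlsplit(url)
    extra = urlencode(query_parameters or {})
    # Append the key/value pairs to any query string already in the URL.
    query = "&".join(q for q in (parts.query, extra) if q)
    full_url = urlunsplit(parts._replace(query=query))
    headers = dict(request_properties or {})
    if user_agent:
        headers["User-Agent"] = user_agent
    return Request(full_url, headers=headers, method=request_method)


def decode_page(raw, detected_encoding, override_encoding=False,
                encoding="utf-8"):
    """Decode the body, preferring the manual encoding when overriding."""
    chosen = encoding if override_encoding else (detected_encoding or "utf-8")
    return raw.decode(chosen, errors="replace")


def headers_for_redirect(headers, from_url, to_url,
                         keep_sensitive_headers=False):
    """Return the headers to send along when following a redirect.

    Assumption: a redirect counts as leaving the original domain or
    subdomain whenever the host names differ.
    """
    same_host = urlsplit(from_url).hostname == urlsplit(to_url).hostname
    if same_host or keep_sensitive_headers:
        return dict(headers)
    return {name: value for name, value in headers.items()
            if name.lower() not in SENSITIVE_HEADERS}
```

Dropping the sensitive headers by default keeps credentials from being forwarded to a host the original request never targeted. Enabling keep sensitive headers trades that protection for compatibility with services that redirect between their own hosts.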