Get Pages (Web Mining)


Gets pages from URLs in an attribute and stores them into a new attribute.


This operator retrieves the pages whose URLs are contained in the input data set. For each row in the data set, the URL is read from the attribute specified by the parameter link attribute. A GET request is sent to that URL, and the returned page is stored in a new attribute whose name is specified by the parameter page attribute.
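The per-row behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not the operator's actual implementation: rows are modelled as dictionaries, and the names `fetch_page`, `get_pages` and `timeout_s` are hypothetical helpers introduced for this example.

```python
from urllib.request import Request, urlopen


def fetch_page(url, timeout_s=10.0):
    """Send a GET request to `url` and return the response body as text.

    Hypothetical helper; the timeout value is an assumption.
    """
    request = Request(url, method="GET")
    with urlopen(request, timeout=timeout_s) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_pages(rows, link_attribute, page_attribute, fetch=fetch_page):
    """Return copies of `rows` with the fetched page stored in a new attribute.

    `rows` is a list of dicts standing in for the example set (assumption).
    """
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row[page_attribute] = fetch(row[link_attribute])
        result.append(new_row)
    return result
```

Passing `fetch` as an argument keeps the sketch easy to try without network access, for example `get_pages(rows, "url", "page", fetch=lambda u: "stub")`.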


  • Example Set (Data Table)

    The input Example Set port. The example set must contain the attribute that holds the URLs.


  • Example Set (Data Table)

    The output Example Set port. It delivers the example set with the retrieved pages stored in the new attribute.


  • link_attribute: The attribute that contains the URLs.
  • page_attribute: The name of the attribute that should contain the pages.
  • random_user_agent: Chooses a user agent at random from a set of 7000 user agents.
  • user_agent: The user agent property.
  • connection_timeout: The timeout (in ms) for the connection.
  • read_timeout: The timeout (in ms) for reading from the URL.
  • follow_redirects: Specifies whether redirects should be followed.
  • accept_cookies: Specifies whether cookies should be accepted.
  • cookie_scope: Specifies the scope of the cookies used.
  • request_method: Specifies the request method.
  • delay: Specifies whether execution should not be delayed, or should be delayed by a fixed or a random amount of time.
  • delay_amount: The delay amount in ms.
  • min_delay_amount: The minimum delay amount in ms.
  • max_delay_amount: The maximum delay amount in ms.
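
How the three delay parameters might interact can be sketched as follows. This is a guess at the logic, not the operator's code: the mode names "none", "fixed" and "random", the default values, and the function name `delay_ms` are all assumptions made for illustration.

```python
import random


def delay_ms(delay, delay_amount=1000, min_delay_amount=0,
             max_delay_amount=1000, rng=random):
    """Return the number of milliseconds to wait before the next request.

    The mode names and default values are assumptions, not documented values.
    """
    if delay == "none":
        return 0
    if delay == "fixed":
        return delay_amount
    if delay == "random":
        # Uniformly chosen wait between the two bounds, both inclusive.
        return rng.randint(min_delay_amount, max_delay_amount)
    raise ValueError(f"unknown delay mode: {delay!r}")
```

A caller would then pause between requests with something like `time.sleep(delay_ms("fixed", delay_amount=500) / 1000)`.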