Loop Azure Data Lake Storage Gen2 (Cloud Connectivity)

Synopsis

This operator loops over all files in the specified folder of your Microsoft Azure Data Lake Storage Gen2.

Description

After you have configured your Azure Data Lake Storage Gen2 account, you can process all Azure Data Lake Storage files within the selected folder.

Be aware that the operator does not read the files as example sets itself. For this reason, you must connect the file input in the inner process of this operator to an appropriate operator that processes the file. For example, if you want to load Excel files from your Azure Data Lake Storage folder, you must connect the file input in the inner process to the Read Excel operator.

Input

  • in (IOObject)

    Optional input data which is delivered to the inner process.

  • connection (Connection)

    This input port accepts a Connection object, if one is provided. See the connection entry parameter for more information.

Output

  • out (IOObject)

    Output data of the inner process.

  • connection (Connection)

    This output port delivers the Connection object from the input port. If the input port is not connected, this port delivers nothing.

Parameters

  • connection source This parameter indicates how the connection should be specified. It gives you two options, predefined and repository. The parameter is not visible if the connection input port is connected.
  • connection entry This parameter is only available when the connection source parameter is set to repository. This parameter is used to specify a repository location that represents a connection entry. The connection can also be provided using the connection input port.
  • connection This parameter is only available when the connection source parameter is set to predefined. It specifies the connection details for the Azure Data Lake Storage Gen2 connection. If you have already configured an Azure Data Lake Storage Gen2 connection, you can select it from the drop-down list. If you have not configured an Azure Data Lake Storage Gen2 connection yet, select the icon to the right of the drop-down list and create a new Azure Data Lake Storage Gen2 connection in the Manage connections box. The account name and account key are required.
  • folder Provide the name of the Azure Data Lake Storage folder over which you want to loop. Note that you need Read and Execute permissions on the root directory to be able to list its content.
  • filter An optional regular expression used to filter the files that the loop iterates over, e.g. 'a.*b' for all files starting with 'a' and ending with 'b'. Ignored if empty.
  • filtered string Indicates which part of the file name is matched against the filter expression.
    • file_name: Filtered on the name, e.g. 'myfile.txt'
    • full_path: Filtered on the full path, e.g. '/myfolder/myfile.txt'
    • parent_path: Filtered on the parent folder, e.g. 'myfolder/'
  • file name macro The name of the macro which will contain the name of the current file for each file the loop iterates over, e.g. 'myfile.txt'.
  • file path macro The name of the macro which will contain the full path of the current file for each file the loop iterates over, e.g. '/myfolder1/myfolder2/myfile.txt'.
  • parent path macro The name of the macro which will contain the parent folder of the current file for each file the loop iterates over, e.g. '/myfolder1/myfolder2/'.
  • recursive If selected, the loop will also iterate over all files in all subfolders of the selected folder. Otherwise, it will only iterate over the files in the selected folder.
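
The interplay of the folder, filter, filtered string, recursive and macro parameters can be illustrated with a small Python sketch. This is not the operator's implementation, and it does not connect to Azure. The function names loop_files and split_path and the sample paths are hypothetical. The sketch also assumes that a file is kept when the selected string matches the expression as a whole, and that all paths start with '/'. The operator's exact matching rules may differ.

```python
import posixpath
import re


def split_path(full_path):
    """Return (file_name, parent_path) for a '/'-separated file path."""
    parent, name = posixpath.split(full_path)
    # Keep a trailing slash on the parent folder, as in '/myfolder1/myfolder2/'.
    parent_path = parent if parent.endswith("/") else parent + "/"
    return name, parent_path


def loop_files(paths, folder, pattern="", filtered_string="file_name", recursive=False):
    """Yield one dict of macro values per file the loop would visit.

    Assumption: a file is kept when the selected string fully matches `pattern`.
    """
    prefix = "/" + folder.strip("/")
    prefix = prefix.rstrip("/") + "/"  # '/myfolder/' or '/' for the root
    regex = re.compile(pattern) if pattern else None  # empty filter is ignored
    for full_path in paths:
        if not full_path.startswith(prefix):
            continue  # file lies outside the selected folder
        name, parent = split_path(full_path)
        if not recursive and parent != prefix:
            continue  # file lies in a subfolder, but recursion is switched off
        candidates = {"file_name": name, "full_path": full_path, "parent_path": parent}
        if regex is not None and not regex.fullmatch(candidates[filtered_string]):
            continue  # selected string does not match the filter
        # These values correspond to the three macro parameters.
        yield {"file_name": name, "file_path": full_path, "parent_path": parent}


if __name__ == "__main__":
    # Hypothetical folder listing used only for illustration.
    listing = ["/myfolder/a1b", "/myfolder/myfile.txt", "/myfolder/sub/axb"]
    for macros in loop_files(listing, "myfolder", pattern="a.*b", recursive=True):
        print(macros)
```

Switching filtered_string to full_path in this sketch means the expression has to describe the whole path, for example '.*\.txt' rather than 'myfile\.txt'.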