You are viewing the RapidMiner Hub documentation for version 10.2.

Job Archive

Job Cleanup only considers archived jobs.

RapidMiner AI Hub uses the Job Archive feature to keep the working database tables small. Past jobs that are in a final state (FINISHED, ERROR, STOPPED or TIMEDOUT) are automatically moved to dedicated archive database tables. Two factors decide whether a job is archived: the job must be in a final state, and its last update must be old enough (see the JOBSERVICE_SCHEDULED_ARCHIVE_MAX_AGE property).
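The two-factor rule described above can be sketched in a few lines of Python. This is only an illustration, not the actual AI Hub implementation: the `Job` class, its field names, and the function `is_archive_candidate` are hypothetical, and the exact comparison (here "at least as old as the maximum age") is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Final states named in the documentation.
FINAL_STATES = {"FINISHED", "ERROR", "STOPPED", "TIMEDOUT"}


@dataclass
class Job:
    """Hypothetical stand-in for a job record; not the real AI Hub schema."""
    job_id: str
    state: str
    last_update: datetime


def is_archive_candidate(job: Job, now: datetime, max_age_minutes: int = 5) -> bool:
    """Return True if the job is in a final state AND its last update is
    at least max_age_minutes old (the comparison operator is an assumption)."""
    if job.state not in FINAL_STATES:
        return False
    return now - job.last_update >= timedelta(minutes=max_age_minutes)
```

With the default of 5 minutes, a job that finished 10 minutes ago would qualify, while a running job or a job that failed one minute ago would not.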

By default, the Executions page of the web interface only shows jobs that are pending, are being executed, or have not been archived yet.

To view archived jobs, select the Show archived executions checkbox in the web interface.

The following image shows four jobs in the Job Archive, all of which were either stopped or finished successfully. Archived jobs can be viewed, but they cannot be stopped again.

To change when and how frequently jobs are archived, set the following environment variables:

  1. JOBSERVICE_SCHEDULED_ARCHIVE_ENABLED: A boolean value (true or false) that determines whether the Job Archive is enabled. By default, the Job Archive feature is enabled.

  2. JOBSERVICE_SCHEDULED_ARCHIVE_CRON_EXPRESSION: A cron expression that defines when the archive task runs. By default, the archive task runs every 5 minutes. The expression follows the pattern <second> <minute> <hour> <day> <month> <weekday>. For example, 0 */30 * * * * runs the archive task every 30 minutes, whereas 0 0 0 * * * runs it once a day at midnight.

  3. JOBSERVICE_SCHEDULED_ARCHIVE_MAX_AGE: The age in minutes, measured from a job's last update, beyond which a job in a final state becomes a candidate for archiving. By default, this value is set to 5 minutes. Set it to any number greater than zero. Note that the value should match how often the archive task runs (see JOBSERVICE_SCHEDULED_ARCHIVE_CRON_EXPRESSION): if you want to archive all jobs that are older than 2 minutes, the task also needs to run at least every 2 minutes.
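The relationship between the cron expression and the maximum age can be checked before deployment. The following Python sketch is a hedged illustration: the helper names `minute_interval` and `check_archive_settings` are hypothetical, and the parser deliberately understands only the simple "every N minutes" form `0 */N * * * *` used in the example above. It is not a general cron parser and is not part of AI Hub.

```python
from typing import Optional


def minute_interval(cron_expression: str) -> Optional[int]:
    """Return N for expressions of the form '0 */N * * * *', else None.

    Only this one simple form is recognized (an assumption made for the sketch).
    """
    fields = cron_expression.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields: second minute hour day month weekday")
    second, minute, *rest = fields
    if second == "0" and minute.startswith("*/") and all(f == "*" for f in rest):
        return int(minute[2:])
    return None


def check_archive_settings(env: dict) -> Optional[bool]:
    """Return True if the task runs at least as often as the maximum age,
    False if it runs less often, and None if the interval cannot be determined."""
    max_age = int(env["JOBSERVICE_SCHEDULED_ARCHIVE_MAX_AGE"])
    if max_age <= 0:
        raise ValueError("JOBSERVICE_SCHEDULED_ARCHIVE_MAX_AGE must be greater than zero")
    interval = minute_interval(env["JOBSERVICE_SCHEDULED_ARCHIVE_CRON_EXPRESSION"])
    if interval is None:
        return None
    return interval <= max_age


# Example values: archive jobs older than 2 minutes, task runs every 2 minutes.
settings = {
    "JOBSERVICE_SCHEDULED_ARCHIVE_ENABLED": "true",
    "JOBSERVICE_SCHEDULED_ARCHIVE_CRON_EXPRESSION": "0 */2 * * * *",
    "JOBSERVICE_SCHEDULED_ARCHIVE_MAX_AGE": "2",
}
```

Calling `check_archive_settings(settings)` on these example values reports that the schedule and the maximum age are consistent. Changing the cron expression to run every 30 minutes while keeping a maximum age of 2 would be flagged, because jobs could then sit unarchived for far longer than 2 minutes.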