Job Archive

Job Cleanup only considers archived jobs.

RapidMiner AI Hub uses the Job Archive feature to keep the working database tables small. Past jobs in a final state such as FINISHED, ERROR, STOPPED or TIMEDOUT are automatically moved to dedicated archive database tables. Besides the final state, the time of a job's last update is the second factor that decides whether it is archived (see the JOBSERVICE_SCHEDULED_ARCHIVE_MAX_AGE property).
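The two conditions above can be illustrated with a minimal Python sketch. It is not the product's implementation: the helper name is_archive_candidate is hypothetical, the set of final states contains only the four states named above, and treating a job as old enough once its last update is at least the maximum age in the past is an assumption.

```python
from datetime import datetime, timedelta

# Final states named in the text above; the real list may contain more.
FINAL_STATES = {"FINISHED", "ERROR", "STOPPED", "TIMEDOUT"}


def is_archive_candidate(state, last_update, now, max_age_minutes=5):
    """Return True if a job meets both archiving conditions.

    Hypothetical helper for illustration only. The ">=" comparison is an
    assumption; the documentation does not state the exact boundary.
    """
    in_final_state = state in FINAL_STATES
    old_enough = now - last_update >= timedelta(minutes=max_age_minutes)
    return in_final_state and old_enough


now = datetime(2024, 1, 1, 12, 0)
# Finished 10 minutes ago: both conditions hold.
print(is_archive_candidate("FINISHED", now - timedelta(minutes=10), now))
# Finished 1 minute ago: final state, but too recent.
print(is_archive_candidate("FINISHED", now - timedelta(minutes=1), now))
```

A job that is still running is never a candidate in this sketch, regardless of when it was last updated.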

By default, the Executions page of the web interface only shows jobs which are pending, being executed or have not been archived yet.

img/job-archive-1.png

To view archived jobs, select the Show archived executions checkbox in the web interface.

img/job-archive-filter.png

The following image shows four jobs in the Job Archive, all of which were either stopped or finished successfully. Archived jobs can only be viewed; they cannot be stopped again.

img/job-archive-2.png

To change when and how frequently jobs are archived, the following environment variables can be set:

  1. JOBSERVICE_SCHEDULED_ARCHIVE_ENABLED: A boolean value (true or false) that determines whether the Job Archive is enabled or disabled. By default, the Job Archive feature is enabled.

  2. JOBSERVICE_SCHEDULED_ARCHIVE_CRON_EXPRESSION: This property uses a cron expression to define when the archive task is executed. By default, the archive task is configured to run every 5 minutes. The expression follows the cron pattern <second> <minute> <hour> <day> <month> <weekday>. So 0 */30 * * * * would run the job archive task every 30 minutes, whereas 0 0 0 * * * would run it once a day at midnight.

  3. JOBSERVICE_SCHEDULED_ARCHIVE_MAX_AGE: This property defines, in minutes, how old a job in a final state must be before it becomes a candidate for archiving. By default, this value is set to 5 minutes. It can be set to any number greater than zero. Please note that the property value needs to be aligned with how often the archive task itself runs (see JOBSERVICE_SCHEDULED_ARCHIVE_CRON_EXPRESSION). If you would like to archive all jobs which are older than 2 minutes, you also need to run the task at least every 2 minutes.
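The interplay between the maximum age and the schedule can be made concrete with a small Python sketch. It assumes that each run of the archive task moves every job in a final state whose last update is older than the maximum age, and that runs are evenly spaced; the function name worst_case_archive_delay is hypothetical and only serves the illustration.

```python
def worst_case_archive_delay(max_age_minutes, run_interval_minutes):
    """Upper bound, in minutes, between a job's last update and its archiving.

    Assumption: a job that just misses the age threshold at one run has to
    wait a full interval for the next run, so the bound is the sum of both.
    """
    if max_age_minutes <= 0 or run_interval_minutes <= 0:
        raise ValueError("both values must be greater than zero")
    return max_age_minutes + run_interval_minutes


# Defaults from the text: maximum age 5 minutes, task every 5 minutes.
print(worst_case_archive_delay(5, 5))   # 10
# Maximum age 2 minutes, but the task only runs every 30 minutes.
print(worst_case_archive_delay(2, 30))  # 32
```

Under these assumptions, a small maximum age has little effect when the task runs rarely, which is why the two settings should be chosen together.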