Job Archive

The Job Archive feature is available in RapidMiner AI Hub 9.10.4 and later.

Job Cleanup only considers archived jobs.

RapidMiner AI Hub uses the Job Archive feature to keep the working database tables small. Past jobs in a final state such as FINISHED, ERROR, STOPPED or TIMEDOUT are automatically moved to dedicated archive database tables. Besides the final state, the time of a job's last update is the second factor that decides whether it is archived (see the maxAge property).
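The two criteria above can be illustrated with a minimal Python sketch. It is not the AI Hub implementation: the function name `is_archive_candidate`, its parameters, and the choice of an inclusive age comparison are assumptions made only for illustration. The state names come from the text above, which may not list every final state.

```python
from datetime import datetime, timedelta, timezone

# Final states named in the text above. The real list may be longer (assumption).
FINAL_STATES = {"FINISHED", "ERROR", "STOPPED", "TIMEDOUT"}


def is_archive_candidate(state: str, last_update: datetime, now: datetime,
                         max_age_minutes: int = 5) -> bool:
    """Hypothetical check: the job is in a final state AND its last update
    is at least max_age_minutes old (inclusive comparison is an assumption)."""
    if state not in FINAL_STATES:
        return False
    return now - last_update >= timedelta(minutes=max_age_minutes)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Finished 10 minutes ago: old enough with the default of 5 minutes.
    print(is_archive_candidate("FINISHED", now - timedelta(minutes=10), now))
    # Finished 1 minute ago: in a final state, but too recent.
    print(is_archive_candidate("FINISHED", now - timedelta(minutes=1), now))
```

Both conditions must hold at the moment the archive task runs, which is why the age threshold and the task schedule interact.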

By default, the Executions page of the web interface only shows jobs that are pending, being executed, or not yet archived.

To view archived jobs, select the Only archived executions checkbox in the web interface. The following image shows four jobs in the Job Archive, all of which were either stopped or finished successfully. Archived jobs can only be viewed; they cannot be stopped again.

To change when and how frequently jobs are archived, adjust the following properties in the file within the <rapidminer-home>/configuration folder of your RapidMiner Server home directory.

  1. jobservice.scheduled.archive.enabled: A boolean value (true or false) that determines whether the Job Archive is enabled. By default, the Job Archive feature is enabled.

  2. jobservice.scheduled.archive.cronExpression: A cron expression that defines when the archive task is executed. By default, the archive task runs every 5 minutes. The expression follows the pattern <second> <minute> <hour> <day> <month> <weekday>. For example, 0 */30 * * * * runs the archive task every 30 minutes, whereas 0 0 0 * * * runs it once a day at midnight.

  3. jobservice.scheduled.archive.maxAge: This property defines, in minutes, how old the last update of a job in a final state must be before the job is archived. By default, this value is 5 minutes. Set it to any number greater than zero. Note that the value needs to be aligned with how often the archive task runs (see cronExpression): if you want to archive all jobs that are older than 2 minutes, the task also needs to run at least every 2 minutes.
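As an illustration, the three properties could be set as follows. This is a sketch that assumes the configuration file uses standard key=value properties syntax. The values mirror the documented defaults, but the exact default cron string is not stated here, so 0 */5 * * * * is an assumption that matches "every 5 minutes".

```properties
# Enable the Job Archive (documented default)
jobservice.scheduled.archive.enabled=true
# Run the archive task every 5 minutes
# Pattern: <second> <minute> <hour> <day> <month> <weekday>
# (assumed expression; the documentation only says "every 5 minutes")
jobservice.scheduled.archive.cronExpression=0 */5 * * * *
# Archive jobs in a final state whose last update is at least 5 minutes old
jobservice.scheduled.archive.maxAge=5
```

If you lower maxAge, shorten the schedule in cronExpression accordingly so that the task runs at least as often as the configured age.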