Archivematica DIP export
General
This module exports content that has already been ingested into Archivematica, so that a copy of the archived material can be processed further.
Module logic
DIP export depends on the Archivematica SIP ID that was saved when the archivematica_(parallel_)_ingest module ran.
The standard DIP export does not keep the folder structure intact, so the Archivematica automation tools are used for DIP export instead. They are installed on the virtual machine that also runs the Archivematica instance and are accessed over SSH using the fabric library. A shell script that controls the automation tools is executed there. So that the automation tools can store the DIP directly, the processing network storage is mounted on the Archivematica virtual machine. The script's standard output is returned to the module.
Processing steps
Read SIP ID
First, the module extracts the SIP ID from the JSON file written by the Archivematica ingest module. This is done by the function tasks.retrieve_sip_uid(), which takes the temporary processing folder for the current article as its parameter. The name of the file in which the SIP ID is stored is taken from constants.py.
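A minimal sketch of this step, assuming the ingest module writes a JSON object that holds the ID under a single key. The file name sip_id.json, the key sip_uuid, and the function body are hypothetical placeholders; in the module the file name comes from constants.py.

```python
import json
from pathlib import Path

# Hypothetical stand-in for the file name defined in constants.py.
SIP_ID_FILENAME = "sip_id.json"


def retrieve_sip_uid(processing_dir, filename=SIP_ID_FILENAME, key="sip_uuid"):
    """Return the SIP ID stored in the JSON file inside processing_dir."""
    json_path = Path(processing_dir) / filename
    with json_path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # The key name is an assumption; adjust it to what the ingest module writes.
    return data[key]
```

A missing file raises FileNotFoundError and a missing key raises KeyError, so a failed ingest shows up here rather than later during the export.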
Create DIP folder
Next, the function tasks.create_dip_dir() creates a subdirectory in the temporary processing folder where the automation tools can save the exported DIP.
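Creating the folder could look roughly like the following sketch. The subdirectory name dip and the function signature are assumptions made for illustration, not the module's actual values.

```python
from pathlib import Path


def create_dip_dir(processing_dir, dip_dirname="dip"):
    """Create the DIP subdirectory in processing_dir and return its path."""
    dip_dir = Path(processing_dir) / dip_dirname
    # exist_ok=True keeps a re-run from failing if the folder already exists.
    dip_dir.mkdir(parents=True, exist_ok=True)
    return dip_dir
```

Returning the path lets the caller pass it on as the export target folder.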
Start the automation tools
The command that starts the automation tools is sent to the Archivematica virtual machine by the tasks.export_dip() function. The SIP ID and the export target folder are among the parameters passed to the automation tools, along with credentials for the Archivematica storage service.
The export task is started with the submit method of Prefect tasks (task.submit). This returns a Prefect future that can be checked later to see whether the task completed successfully. The module calls future.result() and waits up to 180 seconds for the export to finish. If the export is still unfinished at that point, a timeout error is raised. In that case, the SSH connection is closed and reopened, which interrupts the automation tools.
The reason for this approach is that the automation tools sometimes stop responding without having exported the DIP.
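The submit-then-wait pattern described above can be sketched with Python's standard concurrent.futures module. This is a stand-in for the Prefect future, not the module's code: run_with_timeout, export_fn, and on_timeout are hypothetical names, and on_timeout represents closing and reopening the SSH connection.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(export_fn, timeout_s, on_timeout):
    """Run export_fn in the background; call on_timeout if it takes too long.

    Returns (True, result) on success and (False, None) after a timeout.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(export_fn)
    try:
        return True, future.result(timeout=timeout_s)
    except FutureTimeout:
        # In the module, this is where the SSH connection would be closed
        # and reopened to interrupt the hanging automation tools.
        on_timeout()
        return False, None
    finally:
        # Do not block on a worker that may still be hanging.
        executor.shutdown(wait=False)
```

With timeout_s set to 180, a call such as run_with_timeout(export, 180, reset_connection) would mirror the behaviour described here. Note that a timeout only stops the waiting; the background work itself has to be interrupted separately, which is why the connection is reset.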
Write harvesting logs
The module adds information about the newly created DIP (input: SIP UUID) to history.csv as a processing log entry.
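As an illustration only, appending such an entry could look like the sketch below. The column names, the module label dip_export, and the function append_history are assumptions; the real layout of history.csv may differ.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout; the real history.csv may use other columns.
FIELDNAMES = ["timestamp", "module", "input", "output"]


def append_history(history_path, sip_uuid, dip_path, module="dip_export"):
    """Append one processing log row for a newly created DIP."""
    history_path = Path(history_path)
    # Write the header only when the file is new or empty.
    write_header = not history_path.exists() or history_path.stat().st_size == 0
    with history_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "module": module,
            "input": sip_uuid,       # the SIP UUID is the input of this step
            "output": str(dip_path),
        })
```

Opening the file in append mode keeps the entries of earlier modules intact.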
Finish folder processing
Finally, the temporary folder is renamed so that the next module can process it.