
I have a large file system in which I have to delete certain directories from time to time. Currently I have a script which, amongst other things, deletes a folder and subsequently generates an email notification. However, as the deletion of a directory can take anything from a few seconds to a few days, I would like to do this asynchronously.

I can cook up a solution by, say, generating little snippets like rm -rf /some/directory in the appropriate cron directory, but that approach might get clogged if a large number of large directories need to be deleted.
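
Roughly what I mean is something like the following sketch (the spool location and file names are purely illustrative):

    #!/bin/bash
    # Illustrative only: write a one-shot deletion snippet into a spool
    # directory, to be picked up and run later by a separate cron job.
    target="/some/directory"            # directory slated for removal
    spool="/var/spool/pending-deletes"  # hypothetical spool location

    mkdir -p "$spool"
    job="$spool/delete-$(date +%s).sh"
    printf '#!/bin/bash\nrm -rf "%s"\n' "$target" > "$job"
    chmod +x "$job"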

Is anyone aware of a better solution?

2 Answers


What is slowing down your deletion is not the file removal itself (such operations are batched in the journal and committed to the main filesystem in large chunks, so they are already asynchronous in a sense), but rather the synchronous reads needed to discover what to delete. In other words, it is the metadata traversal needed to list all the inodes to be deleted that causes the biggest hit - by far. There is no real escaping from that, unfortunately.

Some things you can do:

  • use a fast cache device to cache as much metadata as possible
  • use disposable volumes/filesystems, where "delete many files" becomes "simply discard the entire volume or filesystem"
  • schedule partial, progressive deletion via cron or similar tools (see the sketch below)
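
A minimal sketch of the last option, assuming a queue file with one directory path per line and a cron entry that invokes the script periodically (all paths and names are hypothetical):

    #!/bin/bash
    # Hypothetical sketch: work through a queue of directories to delete,
    # one batch per cron run, with a lock so runs never pile up.
    queue="/var/spool/delete-queue.txt"   # one absolute path per line (assumed)
    lockfile="/var/run/delete-queue.lock"

    exec 9>"$lockfile"
    flock -n 9 || exit 0          # a previous run is still busy; skip this one

    [ -s "$queue" ] || exit 0
    batch=$(mktemp)
    mv "$queue" "$batch"          # grab the current batch; new entries go to a fresh queue
    : > "$queue"

    while IFS= read -r dir; do
        [ -d "$dir" ] && ionice -c3 rm -rf -- "$dir"
    done < "$batch"
    rm -f "$batch"

A crontab entry such as */30 * * * * /usr/local/sbin/process-delete-queue (path hypothetical) would then chip away at the queue without ever running more than one deletion at a time.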

For more info about delete performance and other things which slow down file removal, you can read this answer.

  • The directories I want to delete are actually the home and scratch (on GPFS and Lustre, respectively) directories of former users of an HPC system. I don't have much latitude to tweak the basic configuration, but I am happy to just deal with the problem at the level of directories. I don't really care that deletion will take a long time; I just don't want it to delay the script which performs the other housekeeping activities associated with removing a user. I guess I'll just generate some sort of list of directories which can then be removed by a cron job.
    – loris
    May 17 at 12:28

Deleting a single folder should be nearly instantaneous. It is searching the directory tree and deleting the many files and directories within it that is likely the issue.

"that might get clogged"

I don't know what you mean by this.

If you are worried that one execution may overlap with the next, why is that an issue? If there is a valid reason for ensuring exclusivity of instances, then use a lock file or limit the run time with timeout.
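
For example, a wrapper along these lines (the lock path and the timeout value are placeholders) keeps a single instance running and caps its runtime:

    #!/bin/bash
    # Sketch: flock ensures only one instance runs at a time;
    # timeout aborts the deletion if it exceeds the given limit.
    flock -n /var/lock/cleanup.lock \
        timeout 4h rm -rf /some/directory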

  • Yes, I am deleting large directory trees. By clogging I mean that if deletions take longer than the cron interval, the number of deletion processes running could increase in an uncontrolled manner. I'd probably want a mechanism to limit that.
    – loris
    May 17 at 12:31
