Python: How to atomically hardlink files for deduplication?
I have an application that deduplicates the contents of two directories using hardlinks. I'd like to protect the code against crashes, such that a traceback can never result in loss of data.

Currently, the code looks like this:
```python
import os
import tempfile

for path_a, path_b in paths_to_dedupe:
    handle, tmp = tempfile.mkstemp(dir=os.path.dirname(path_a))
    os.close(handle)  # mkstemp returns an int fd, so os.close(), not handle.close()
    os.remove(tmp)
    os.link(path_a, tmp)
    os.rename(tmp, path_b)
```
The problem is the call to os.remove(): it makes the preceding mkstemp call no longer entirely safe, because a temp file of the same name could be created in the window between the remove and the link (however unlikely).

Is there a safer way to do this?
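Note that the final os.rename() step is not the risky part: on POSIX systems, rename atomically replaces an existing destination, so path_b is never observed as missing. A minimal demonstration of that property (the file names here are made up for illustration):

```python
import os
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "src")
dst = os.path.join(d, "dst")
with open(src, "w") as f:
    f.write("new")
with open(dst, "w") as f:
    f.write("old")

# dst already exists; os.rename() replaces it atomically on POSIX.
os.rename(src, dst)
with open(dst) as f:
    print(f.read())  # prints "new"
```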
I considered copying and modifying tempfile._mkstemp_inner() so that it runs _os.link rather than _os.open.
The simplest method is to loop until os.link succeeds:
```python
while True:
    handle, tmp = tempfile.mkstemp(…)
    os.close(handle)
    os.remove(tmp)
    try:
        os.link(path_a, tmp)
    except OSError as e:
        if e.errno == 17:  # File exists
            continue
        raise
    break
```
And in that case, there's no real benefit to using tempfile (i.e., because you don't need to end up with an open file), so you might as well do it yourself:
```python
while True:
    tmp = path_b + "-temp-%s" % (os.urandom(8).hex(),)
    try:
        os.link(path_a, tmp)
    except OSError as e:
        if e.errno == 17:  # File exists
            continue
        raise
    break
```
And, of course, you'll want to throw a try/finally around the whole thing to make sure you don't accidentally leave a temp file lying around if things explode before it's moved into place :)
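Putting the pieces together, here is one possible sketch of the full operation. The helper name atomic_hardlink_replace is mine, and I've swapped the magic number 17 for errno.EEXIST; treat this as an illustration rather than a drop-in implementation:

```python
import errno
import os

def atomic_hardlink_replace(path_a, path_b):
    # Hypothetical helper: make path_b a hardlink to path_a without
    # ever leaving path_b missing, and without leaking temp files.
    while True:
        # Random sibling name next to the destination.
        tmp = path_b + "-temp-%s" % os.urandom(8).hex()
        try:
            os.link(path_a, tmp)
        except OSError as e:
            if e.errno == errno.EEXIST:  # name collision; retry
                continue
            raise
        break
    try:
        os.rename(tmp, path_b)  # atomic replace on POSIX
    except BaseException:
        os.remove(tmp)  # don't leave the temp link behind
        raise
```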