Python: How to atomically hardlink files for deduplication?


I have an application that deduplicates the contents of two directories using hardlinks. I'd like to protect the code against crashes, such that no traceback can ever result in loss of data.

Currently, the code looks like this:

    import os
    import tempfile

    for path_a, path_b in paths_to_dedupe:
        handle, tmp = tempfile.mkstemp(dir=os.path.dirname(path_a))
        os.close(handle)  # mkstemp returns a file descriptor, not a file object
        os.remove(tmp)
        os.link(path_a, tmp)
        os.rename(tmp, path_b)

The problem is that the code calls os.remove() -- which makes the call to mkstemp() no longer entirely safe (because a tempfile of the same name could be created in between, however unlikely).

Is there a safer way to do this?

I have considered copying and modifying tempfile._mkstemp_inner() so that it runs _os.link rather than _os.open.

The simplest method is to loop until os.link() succeeds:

    while True:
        handle, tmp = tempfile.mkstemp(…)
        os.close(handle)
        os.remove(tmp)
        try:
            os.link(path_a, tmp)
        except OSError as e:
            if e.errno == 17:  # EEXIST: file exists
                continue
            raise
        break

And in this case, there's no real benefit to using tempfile (i.e., because you don't need to end up with an open file), so you might as well do it yourself:

    while True:
        tmp = path_b + "-temp-%s" % (os.urandom(8).hex(),)  # .encode("hex") on Python 2
        try:
            os.link(path_a, tmp)
        except OSError as e:
            if e.errno == 17:  # EEXIST: file exists
                continue
            raise
        break

And, of course, you'll want to throw a try/finally around the whole thing to make sure you don't accidentally leave a temp file lying around if something explodes before it's moved into place :)
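Putting it all together, here is a minimal sketch of that approach (the function name `atomic_hardlink` is hypothetical, and this assumes a POSIX filesystem where `os.rename()` atomically replaces the destination):

```python
import errno
import os

def atomic_hardlink(path_a, path_b):
    """Make path_b a hardlink to path_a without a window where data is lost."""
    # Create a uniquely-named temp link next to the destination; retry on
    # the (unlikely) chance the random name already exists.
    while True:
        tmp = path_b + "-temp-%s" % (os.urandom(8).hex(),)
        try:
            os.link(path_a, tmp)
        except OSError as e:
            if e.errno == errno.EEXIST:
                continue
            raise
        break
    try:
        # On POSIX, rename() atomically replaces path_b with the new link.
        os.rename(tmp, path_b)
    finally:
        # If the rename failed, clean up the temp link; after a successful
        # rename the temp name no longer exists, so this is a no-op.
        if os.path.exists(tmp):
            os.remove(tmp)
```

The `finally` clause guarantees the temporary link never survives a crash between `os.link()` and `os.rename()`, while the original files are only ever touched by the atomic rename itself.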

