jemimah wrote:Here is some code from the snapmerge script. Adding more layers makes it slower because each whiteout file needs to be checked on each layer. This script is already painfully slow, which is the main reason I don't want to add more layers.
I suppose it's worth experimenting and seeing how much difference it makes.
Code:
while read N
do
 BN="`basename "$N"`"
 DN="`dirname "$N"`"
 [ "$BN" = ".wh.aufs" ] && continue #w003 aufs has file .wh..wh.aufs in /initrd/pup_rw.
 [ "$DN" = "." ] && continue
 if [ "$BN" = "__dir_opaque" ];then #w003
  #'.wh.__dir_opaque' marks ignore all contents in lower layers...
  rm -rf "${BASE}/${DN}/*" 2>/dev/null #wipe anything in save layer.
  #also need to save the whiteout file to block all lower layers (may be readonly)...
  touch "${BASE}/${DN}/.wh.__dir_opaque" 2>/dev/null
  rm -f "$SNAP/$DN/.wh.__dir_opaque" #should force aufs layer "reval".
  continue
 fi
 #comes in here with the '.wh.' prefix stripped off, leaving actual filename...
 rm -rf "$BASE/$N"
 #if file exists on a lower layer, have to save the whiteout file...
 BLOCKFILE=""
 [ -e "/initrd/pup_ro1/$N" ] && BLOCKFILE="yes"
 [ -e "/initrd/pup_ro2/$N" ] && BLOCKFILE="yes"
 [ -e "/initrd/pup_ro3/$N" ] && BLOCKFILE="yes"
 [ -e "/initrd/pup_ro4/$N" ] && BLOCKFILE="yes"
 [ -e "/initrd/pup_ro5/$N" ] && BLOCKFILE="yes"
 [ -e "/initrd/pup_ro6/$N" ] && BLOCKFILE="yes"
 [ -e "/initrd/pup_ro7/$N" ] && BLOCKFILE="yes" #v424
 [ -e "/initrd/pup_ro8/$N" ] && BLOCKFILE="yes" #v424
 [ -e "/initrd/pup_ro9/$N" ] && BLOCKFILE="yes" #v424
 [ "$BLOCKFILE" = "yes" ] && touch "${BASE}/${DN}/.wh.${BN}"
 rm -f "$SNAP/$DN/.wh.$BN" #remove whiteout file. should force aufs layer "reval".
done
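The nine hard-coded layer tests in the quoted loop always run all nine checks, even after a match. As a minimal sketch (not the snapmerge code itself), the same test could be written as a helper that stops at the first layer containing the file. The names `exists_in_lower_layer` and `LAYER_ROOT` are hypothetical; `LAYER_ROOT` only exists so the helper can be tried outside /initrd.

```shell
#!/bin/sh
# Sketch only: LAYER_ROOT and exists_in_lower_layer are hypothetical names,
# not part of the snapmerge script.
LAYER_ROOT="${LAYER_ROOT:-/initrd}"

# Return 0 if path $1 (relative, '.wh.' prefix already stripped) exists
# in any read-only layer pup_ro1..pup_ro9, stopping at the first hit.
exists_in_lower_layer() {
    for i in 1 2 3 4 5 6 7 8 9; do
        if [ -e "$LAYER_ROOT/pup_ro$i/$1" ]; then
            return 0
        fi
    done
    return 1
}
```

Inside the loop it would be used roughly as `exists_in_lower_layer "$N" && touch "${BASE}/${DN}/.wh.${BN}"`. Whether this saves noticeable time depends on how many whiteouts actually match an early layer.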
shinobar wrote:
jemimah wrote:Here is some code from the snapmerge script. Adding more layers makes it slower because each whiteout file needs to be checked on each layer. This script is already painfully slow
Yes, jemimah. It is so slow.
I wonder why we need to check them. Why not unconditionally copy all the files in pup_rw...?
Code:
#also need to save the whiteout file to block all lower layers
I also wonder what the rc.update does...?
jemimah wrote:
shinobar wrote:I wonder why we need to check them. Why not unconditionally copy all the files in pup_rw...?
Code:
#also need to save the whiteout file to block all lower layers
I also wonder what the rc.update does...?
Say I create a new file, then delete it, and a whiteout file gets saved to the save file. Then later I add an SFS containing a file of the same name. The file will not appear because the whiteout file is there blocking it. I believe there is code in the init script to check for this condition and delete the interfering whiteouts when you add an SFS, but I know from experience that even that doesn't always work.
But that's an interesting thought - maybe the whiteout checking code in snapmerge is redundant and can be removed. However, it may be an error condition in AUFS to have a whiteout file with no file below it. I know for sure UnionFS is really picky about that, but I think AUFS is more tolerant.
However, I think the real bottleneck in the script is checking for free space in the save file for every single file copied down. That could be omitted in the case where your save file has more free space than the size of the files in RAM - but otherwise I think you need to do it.
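The shortcut described above, a single up-front comparison instead of a free-space check per file, might look like the sketch below. The function names and arguments are hypothetical, and `du` on the RAM layer only approximates the space the same files will need in the save file, so a real script would probably want a safety margin.

```shell
#!/bin/sh
# Sketch only: the function names and arguments are hypothetical.

# Size in KB of everything under directory $1.
used_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Free KB on the filesystem holding directory $1
# (-P keeps each filesystem on one line, so the value is in column 4 of line 2).
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Return 0 when the whole RAM layer $1 is smaller than the free space
# in save layer $2, so per-file free-space checks could be skipped.
fits_in_save() {
    [ "$(used_kb "$1")" -lt "$(free_kb "$2")" ]
}
```

A merge script could call `fits_in_save /initrd/pup_rw "$BASE"` once and fall back to the per-file check only when it returns non-zero.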
jamesbond wrote:shinobar, sorry to hijack your thread. I'll move off your lawn very quickly after this.
Been thinking about it too ... I'm comparing the situation that requires snapmergepuppy with the one where /pup_rw is mounted directly on the pupsave file. In the latter case, no management of whiteout files is done (as shinobar said) - and yet things work correctly.
In the specific PUPMODE where the merge script is required, these are the conditions:
a) there are, effectively, two pupsave files - the tmpfs layer, and the real pupsave (mounted ro by aufs)
b) we want to create the impression that these two pupsaves work as one
c) we don't want to duplicate items from pupsave to tmpfs
d) optionally, tmpfs and pupsave are allowed to have different sizes
a) & b) are rather easy to accomplish; it's c) & d) which cause the most headache and the need for the merge script. Actually, c) is also the cause of a problem if your real pupsave file is almost full, yet the tmpfs is empty (ie fresh boot). One can keep adding things without knowing that one cannot save the stuff anymore. Kinda like vmware thin provisioning, but without enough backing storage.
If it's only a) & b) - easy - just load pupsave to tmpfs at start, and then rsync everything to pupsave during shutdown (or during merge). The real pupsave doesn't even need to be part of the branch.
But we need to do c) and d) since that's the agreed design criteria for now. Based on the above, I think the only check needed is as follows, for a combination of a "real file" and its corresponding whiteout file:
1. whiteout file exists in tmpfs, real file exists in pupsave
Cause ==> the file has just been deleted during user session.
Action ==> delete real file in pupsave & create the whiteout file (to prevent any file from lower layer getting exposed).
Then delete the whiteout in tmpfs.
2. whiteout file exists in tmpfs, real file doesn't exist in pupsave
Cause ==> whiteout is for a file in lower layer
Action ==> create whiteout file in pupsave
Then delete the whiteout in tmpfs.
3. real file exists in tmpfs, whiteout exists in pupsave
Cause ==> new file created over previously deleted file (from previous session)
Action ==> copy file from tmpfs to pupsave, and delete whiteout in pupsave
Then delete the real file in tmpfs.
4. real file exists in tmpfs, whiteout doesn't exist in pupsave
Cause ==> new file created in this session
Action ==> copy file from tmpfs to pupsave,
Then delete the real file in tmpfs.
5. real file exists in tmpfs, real file also exists in pupsave
Cause ==> file is updated in this session
Action ==> copy file from tmpfs to pupsave,
Then delete the real file in tmpfs.
Of course when I say "file" it also applies to directories.
I think that should handle 90% of the cases. We skip the corner-case rule of "save the whiteout file only if a lower layer SFS has the real file" - I don't really see why this is necessary.
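The five cases above can be sketched as a merge that looks only at tmpfs and pupsave and never touches the SFS layers. This is a rough illustration, not a replacement for snapmergepuppy: `merge_layers` and its arguments are hypothetical names, aufs-internal `.wh..wh.*` entries are skipped, and opaque-directory markers, empty directories and free-space checks are not handled.

```shell
#!/bin/sh
# Sketch only: merge_layers and its arguments are hypothetical names.
# $1 = tmpfs (RAM) layer, $2 = mounted pupsave layer.
merge_layers() {
    ram="$1"; save="$2"

    # Cases 1 and 2: a whiteout exists in tmpfs.
    ( cd "$ram" && find . -name '.wh..wh.*' -prune -o -name '.wh.*' -print ) |
    while IFS= read -r wh; do
        dir=$(dirname "$wh")
        name=$(basename "$wh")
        rm -rf "$save/$dir/${name#.wh.}"   # case 1: drop the deleted file, if present
        mkdir -p "$save/$dir"
        touch "$save/$wh"                  # cases 1 and 2: keep lower layers hidden
        rm -f "$ram/$wh"                   # then delete the whiteout in tmpfs
    done

    # Cases 3, 4 and 5: a real file exists in tmpfs.
    ( cd "$ram" && find . -name '.wh.*' -prune -o \( -type f -o -type l \) -print ) |
    while IFS= read -r f; do
        dir=$(dirname "$f")
        name=$(basename "$f")
        mkdir -p "$save/$dir"
        rm -f "$save/$dir/.wh.$name"       # case 3: whiteout left by an earlier session
        # cases 3, 4, 5: copy down, and free the RAM copy only if the copy worked
        cp -a "$ram/$f" "$save/$f" && rm -f "$ram/$f"
    done
}
```

Note that cases 3, 4 and 5 end up as the same code path: remove any matching whiteout in pupsave, copy the file down, then delete it from tmpfs.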
If the slowness comes from checking all those files in the SFS layers, then by dealing only with tmpfs and pupsave, this delay should be greatly reduced. If it doesn't, then the above may not help. In fact, I'm doubting the need to have c) and d) in the first place ... I mean, if you have that very important big file you need to save, you can always save it in /mnt/home (ie the real storage).
Ok, I'm off - jemimah we can start another thread on this if you want to.
Shinobar, thanks for the update, I'll test it and get back to you.