Speeding up the SnapMerge

Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#21 Post by technosaurus »

jamesbond wrote:s7 is much faster but doesn't cope very well when savefile is almost full (because it copy before delete)
mv -f ???

Note: I never use a save file; I'm just trying to help where I can without my Puppy PC available. Sorry for the suggestions without real code.

Looking at it, I see a way to combine all 3 loops into 1 and walk the tree only once, using a function that calls itself for directories.

Oversimplified version:

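Code:

walk() {
    dirstuff "$1"    # placeholder: per-directory work, moved out of the loop
    for x in "$1"/* "$1"/.[!.]*; do    # .[!.]* also matches .wh.* names
        [ -e "$x" ] || [ -L "$x" ] || continue    # skip unmatched glob patterns
        if [ -d "$x" ] && [ ! -L "$x" ]; then walk "$x" & else do_other_stuff "$x"; fi
    done; wait; }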
Something like this can work on several directories at once in separate background processes, and it doesn't need find because it uses shell builtins to check each file.

But the order of the checks really matters: the first check should cover most cases, so that the later checks are usually unnecessary.
The only easy way I can think of to check for .wh files faster is to treat the name as an array of characters, so that ${x:0:4} is equal to .wh. (cool, eh?). Since this is a simple string comparison that doesn't have to access the file (slow), it should come first; then the dir check (to start another background job sooner if necessary); then links (because that only needs 1 check).

dirstuff first
for x in ....
    if .wh* do .wh stuff
    else if dir recursively call this function
    else if link do link stuff
    else (must be a file) ... do file stuff
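As a rough illustration, the outline above could be fleshed out like this. It is only a sketch under assumptions: `dirstuff`, `handle_wh`, `handle_link` and `handle_file` are hypothetical placeholder names that just print a line, and a POSIX `case` pattern stands in for `${x:0:4}` (same string-only test, but it also runs in dash). The checks follow the order given: name test, dir, link, file.

```sh
#!/bin/sh
# Hypothetical sketch of the single-pass walk outlined above.
# dirstuff, handle_wh, handle_link and handle_file are placeholders that
# only print what a real snapmerge step would do at that point.

dirstuff()    { echo "dir $1"; }
handle_wh()   { echo "whiteout $1"; }
handle_link() { echo "link $1"; }
handle_file() { echo "file $1"; }

walk() {
    dirstuff "$1"                                # per-directory work, once per dir
    for x in "$1"/* "$1"/.[!.]*; do              # .[!.]* adds dotfiles such as .wh.*
        [ -e "$x" ] || [ -L "$x" ] || continue   # unmatched globs stay literal: skip them
        case ${x##*/} in
            .wh..wh.*) ;;                        # aufs housekeeping entries: ignored here
            .wh.*)                               # 1. string test on the name, no disk access
                handle_wh "$x" ;;
            *)
                if [ -d "$x" ] && [ ! -L "$x" ]; then
                    walk "$x" &                  # 2. real dir: recurse in a background job
                elif [ -L "$x" ]; then
                    handle_link "$x"             # 3. symlink (never followed)
                else
                    handle_file "$x"             # 4. whatever is left is a file
                fi ;;
        esac
    done
    wait                                         # wait for this level's background jobs
}

# Demo on a throwaway tree
root=$(mktemp -d)
mkdir -p "$root/usr/bin"
: > "$root/usr/bin/tool"
: > "$root/usr/bin/.wh.oldtool"
ln -s tool "$root/usr/bin/tool-link"
walk "$root"
rm -rf "$root"
```

Because subdirectories run as background jobs, output lines from different directories can come out in any order. The dir test excludes symlinks so a link pointing at a parent directory cannot cause endless recursion.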

Recursion always used to give me a headache, so let me know if anything is not clear and I will try to clarify.

EDIT: thinking about the .wh. files, the same substrings could be used to check for the real files: ${x:4} is the name of the real file, so that
[ -e "${DIR}/${x:4}" ] && echo "file ${x:4} exists in ${DIR}"
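A small sketch of that idea, with hypothetical helper names `real_name` and `hidden_exists`. It swaps `${x:4}` for the POSIX `${x#.wh.}`, which strips the prefix only when it is actually there (so a plain name passes through unchanged) and also works outside bash/ash.

```sh
#!/bin/sh
# Hypothetical helpers: ${name#.wh.} is the POSIX counterpart of the
# bash/ash substring ${name:4}; it leaves the name of the real file
# that the whiteout hides.

real_name() { printf '%s\n' "${1#.wh.}"; }

# hidden_exists DIR WHNAME: succeeds if DIR holds the file WHNAME refers to
hidden_exists() {
    r=$(real_name "$2")
    [ -e "$1/$r" ] || [ -L "$1/$r" ]          # -L also counts dangling symlinks
}

DIR=$(mktemp -d)
: > "$DIR/foo"
for x in .wh.foo .wh.bar; do
    if hidden_exists "$DIR" "$x"; then
        echo "file $(real_name "$x") exists in $DIR"
    fi
done
rm -rf "$DIR"
```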
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#22 Post by jamesbond »

technosaurus wrote:
jamesbond wrote:s7 is much faster but doesn't cope very well when savefile is almost full (because it copy before delete)
mv -f ???
No, because it checks for different files in different layers.
the only easy way I can think of to check for .wh files faster is to think of them as an array of characters such that ${x:0:4} is equal to .wh. (cool eh?)
I did that, and again I was pleasantly surprised that it works in both bash and ash. I'm not sure I understand the rest of your point, though.

I've got the s8 script I hinted at in my previous post. Its speed is the same as s7. I was about to post it here, but I just noticed a big hole in my scripts: I forgot to handle opaque dirs properly. I'm so close :evil: But I'll be back ...
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#23 Post by jpeps »

It's interesting watching the whiteout list grow. I erased them all after doing a pfix-clean this morning. I'll have to think about why I need them.


Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#24 Post by jamesbond »

Tadaa ... s9 (version 9 of the script). Same performance as s7. Comment out the echo lines to reduce verbosity. This code only does the copy-down; I'm not sure what else snapmergepuppy does. Perhaps I'll try it out later, but meanwhile anyone is welcome to try it.

Code:

# jamesbond 2011 - GPLv3
# s9 - improved s8 with bugfix for dir opaque
# check whiteout from tmpfs and puprw, make room before doing rsync
# Note: works for AUFS only
#  0m6secs (lang utf8)

# change these two variables. Do not use trailing slash.
TMPFS=/mnt/layer1     # real location is /initrd/pup_rw
PUPSAVE=/mnt/layer2   # real location is /initrd/pup_ro1

################# main ###################
# check for new whiteouts - remove them from pupsave
echo "deleting newly deleted files"
find "$TMPFS" | sed '
# dont process .wh..wh.orph
/\.wh\.\.wh\.orph/ d
# dont process .wh..wh.plnk
/\.wh\.\.wh\.plnk/ d
# dont process .wh..wh.aufs
/\.wh\.\.wh\.aufs/ d
# dont process .wh..wh..opq
/\.wh\.\.wh\.\.opq/ d
# process whiteout files
/\.wh\./ p
# and delete anything else
d' | while read -r FILE; do
	#echo $FILE					# $FILE is TMPFS_WHITEOUT
	# assumed derivation of the names used below:
	FULLNAME="${FILE#"$TMPFS"}"		# path relative to the tmpfs layer
	#echo $FULLNAME
	BASE="${FULLNAME%/*}"			# directory holding the whiteout
	#echo $BASE
	LEAF="${FULLNAME##*/}"			# whiteout name ...
	LEAF="${LEAF#.wh.}"			# ... minus the .wh. prefix = real name
	#echo $LEAF
	#echo $BASE/$LEAF
	PUPSAVE_FILE="$PUPSAVE$BASE/$LEAF"	# the hidden file in the pupsave layer
	echo "Deleting $PUPSAVE_FILE"
	rm -rf "$PUPSAVE_FILE"		# delete the file/dir if it's there
done

# check for old whiteouts - remove them from pupsave
echo "deleting old whiteouts"
find "$PUPSAVE" | sed '
# dont process .wh..wh.orph
/\.wh\.\.wh\.orph/ d
# dont process .wh..wh.plnk
/\.wh\.\.wh\.plnk/ d
# dont process .wh..wh.aufs
/\.wh\.\.wh\.aufs/ d
# dont process .wh..wh..opq
/\.wh\.\.wh\.\.opq/ d
# process whiteout files
/\.wh\./ p
# and delete anything else
d' | while read -r FILE; do
	# assumed: same name derivation as in the first loop
	FULLNAME="${FILE#"$PUPSAVE"}"		# path relative to the pupsave layer
	#echo $FULLNAME
	BASE="${FULLNAME%/*}"
	#echo $BASE
	LEAF="${FULLNAME##*/}"
	LEAF="${LEAF#.wh.}"
	#echo $LEAF
	#echo $BASE/$LEAF
	TMPFS_FILE="$TMPFS$BASE/$LEAF"		# same path in the tmpfs layer
	#echo $TMPFS_FILE

	# delete whiteout only if a new file/dir has been created in the tmpfs layer
	if [ -e "$TMPFS_FILE" -o -L "$TMPFS_FILE" ]; then
		# if TMPFS_FILE is a dir, we need to add diropq when remove its pupsave whiteout
		[ -d "$TMPFS_FILE" ] &&	touch "$TMPFS_FILE/.wh..wh..opq"
		echo "Deleting whiteout $FILE"
		rm -f "$FILE"
	fi
done

# by now we should be consistent - so rsync everything
# and cleanup tmpfs if rsync is successful
echo rsync-ing
if rsync -a "$TMPFS"/ "$PUPSAVE"; then
	find "$TMPFS" -maxdepth 1 | sed '
	# dont process the first line - thats our tmpfs mountpoint
	1 d
	# dont process .wh..wh.orph
	/\.wh\.\.wh\.orph/ d
	# dont process .wh..wh.plnk
	/\.wh\.\.wh\.plnk/ d
	# dont process .wh..wh.aufs
	/\.wh\.\.wh\.aufs/ d' | while read -r FILE; do
		rm -rf "$FILE"
	done
else
	Xdialog --infobox "Your save file is full, please copy important items manually elsewhere." 0 0 10000
fi
Posts: 1105
Joined: Thu 11 Dec 2008, 19:49

#25 Post by Q5sys »

jamesbond wrote:Tadaa ... s9 (version 9 of the script). Same performance as s7. Comment out the echo to reduce verbosity. This code is only for the copy-down only - I'm not sure what else snapmergepuppy does. Perhaps I'll try it out later - but meanwhile, anyone is welcome to try it.
How much faster do you estimate this is than the default way of doing things? I'm quite impressed by everyone's work in this thread.
Image for everyone. :P
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#26 Post by jemimah »

I'm not sure if it's safe to delete items out of the RAM layer.

Barry has this

Code:

#flock -x -n "$N" -c rm -f "$N" #remove if file not in use
But it's commented out, so I guess it didn't work. It also raises the point that if a file is open, moving it and then deleting it may cause corruption.
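To see why a flock-based "remove if not in use" check is weak, here is a small sketch. It assumes the flock(1) utility (util-linux or busybox) is installed; flock locks are advisory, so they only interact with other processes that also use flock, and a program that simply has the file open does not stop the lock from being taken.

```sh
#!/bin/sh
# Sketch, assuming flock(1) from util-linux or busybox is available.
# Advisory locks: acquiring one does not prove nobody has the file open.

f=$(mktemp)

# A process that merely has the file open does not block the lock:
exec 3< "$f"                              # this shell now has the file open
flock -x -n "$f" -c 'echo "lock acquired although the file is open"'
exec 3<&-

# Only another flock user makes the non-blocking (-n) attempt fail:
flock -x "$f" -c 'sleep 2' &
sleep 1                                   # give the background job time to take the lock
flock -x -n "$f" -c 'echo "lock acquired"' || echo "busy: held by another flock user"

wait
rm -f "$f"
```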
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#27 Post by jamesbond »

If that's the case, we can drop the "delete tmpfs" code and keep just the rsync. But then the tmpfs won't be freed even after the copy-down; is that the expected behaviour?
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#28 Post by jemimah »

jamesbond wrote:If that's the case we can drop the "delete tmpfs" code and leave the rsync alone. But that won't free up the tmpfs even after copy-down - is this the expected behaviour?
That's how the current script works. It would be neat if we did figure out how to remove those unused files. Maybe someone wants to ask Barry if he remembers what the issue was?

Maybe we could just run lsof and get the list that way.
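A rough sketch of such a per-file check, Linux only. Instead of lsof it scans the `/proc/<pid>/fd` symlinks directly; with lsof installed, `lsof -t -- FILE` should give the same yes/no through its exit status. `is_open` is a hypothetical helper; it expects an absolute, symlink-free path, only sees processes it is allowed to inspect (run as root to see all), and cannot close the window between checking and deleting.

```sh
#!/bin/sh
# Hypothetical is_open helper: succeeds if any visible process has FILE open.
# Scans /proc/<pid>/fd/* symlinks (Linux only) instead of calling lsof.

is_open() {
    for fd in /proc/[0-9]*/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "$1" ] && return 0
    done
    return 1
}

f=$(mktemp)
exec 3< "$f"                       # this shell now holds the file open
is_open "$f" && echo "in use: $f"
exec 3<&-                          # close it again
is_open "$f" || echo "not open right now: $f"
rm -f "$f"
```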
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#29 Post by jemimah »

I suppose there will always be some delay between checking if the file is open and actually deleting it, during which the file may become open. I guess that's what the "flock" is for.

Wikipedia says this about flock.
Both flock and fcntl have quirks that occasionally puzzle programmers more familiar with other operating systems.
Mandatory locks have no effect on the unlink function. As a result, certain programs may, effectively, circumvent mandatory locking. The authors of Advanced Programming in the UNIX Environment (Second Edition) observed that the ed editor did so (page 456).
Seems like a complicated problem.
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#30 Post by jamesbond »

Normally, deleting (= unlinking) a file or dir while it is open is not an issue at all. The name is deleted and new processes can't see it, but processes that already opened it can keep using it through their old file handle. Only when every process holding an open handle has closed it or terminated is the space freed and the file/dir finally removed. So in this case I don't see the need to use flock. I also read an article arguing that using flock while unlinking defeats the very purpose of flock: http://world.std.com/~swmcd/steven/tech/flock.html
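The userspace behaviour described above is easy to check with a temp file. This sketch uses the shell itself as the "old process" holding an open descriptor; it says nothing about how aufs behaves in kernel mode.

```sh
#!/bin/sh
# Sketch of unlink semantics: removing the name does not take the data
# away from a process that already has the file open.
f=$(mktemp)
echo "still readable" > "$f"
exec 3< "$f"                  # an "old process" (here: this shell) holds an open handle
rm -f "$f"                    # unlink: the name is gone, new opens will fail
[ -e "$f" ] || echo "name removed"
cat <&3                       # prints: still readable
exec 3<&-                     # last handle closed: only now is the space freed
```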

That's only true for userspace apps, though, and since aufs runs in kernel mode ... I'm not sure how far it holds. I've been playing with deletion for a while (a very short while) and it doesn't seem to have any adverse effects for me, but perhaps that's because I'm not running it on my rootfs.

Using lsof won't help much: on an ordinary day (browsing etc.) a lot of dirs are open, which means they can't be deleted. We may as well not do the deletion at all; it's not worth the effort.
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#31 Post by jamesbond »

Q5sys wrote:
jamesbond wrote:Tadaa ... s9 (version 9 of the script). Same performance as s7. Comment out the echo to reduce verbosity. This code is only for the copy-down only - I'm not sure what else snapmergepuppy does. Perhaps I'll try it out later - but meanwhile, anyone is welcome to try it.
How much faster do you estimate this is over the default way of doing things? I'm quite impressed by everyones work in this thread.
Image for everyone. :P
Until this is really merged into a puplet for testing, no one can tell for sure, unfortunately. Benchmarks don't always translate into real-world performance :oops:
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#32 Post by technosaurus »

Umm... are we trying to manually do what aubrsync does?
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#33 Post by jamesbond »

technosaurus wrote:Umm... are we trying to manually do what aubrsync does?
Hahaha yes !!! Good find technosaurus :)

EDIT: Incidentally, that script also uses rsync ... so we're on the right track (when trying to reinvent the wheel, that is) :D
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#34 Post by technosaurus »

I really have no clue how the save to cd/dvd parts work, but...
Seems like it could be sensible?
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#35 Post by jemimah »

Here is the actual script.
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#36 Post by jemimah »

Looking at the comments in the code:
The dst_branch must be mounted as writable.
During the operation, the mntpnt is set readonly.
If you are opening a file for writing on the writable branch,
you need to close the file before invoking this script.
They do have a move option, but it has problems.
Like above (2 branches), move and reflect all modifications
from upper to lower. Almost all files on the upper branch will
be removed. You can still use this aufs after the
operation. But the inode number may be changed. If your
application which depends upon the inode number was running at
that time, it may not work correctly.
I get the feeling this is not going to be very transparent to the user. :)
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#37 Post by jamesbond »

There's a long way ahead ... but I think that if one doesn't deliberately try to break things (like removing /lib or /usr/sbin and then re-creating and re-populating them), it should be OK in most cases. Again, it's a statement that can only be proven by experiment ...
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#38 Post by jamesbond »

I took the plunge and tried my script (replacing snapmergepuppy directly). I ran the script: it doesn't crash, but all other executables are gone (and freemem shows zero space available). It still doesn't crash, although a reboot is impossible; the power button is required.
After a reboot, I removed all the tmpfs-deletion stuff, and things work! (No crash, the system continues as normal, data is saved, and reboot works properly.) I think I just need to be careful about what can and cannot be deleted, as jemimah said before.
I tried this on my netbook with a hard disk, so the delay isn't noticeable. I should try it on my eeepc with its slow SD-card access; then we can get a benchmark.
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#39 Post by jemimah »

So the slow performance of the script is really only a problem at shutdown. However, while the system is running, it would actually be better if the snapmerge didn't hog your CPU.

It should be fine to use aubrsync at shutdown, as all files should be closed at that point anyway.

I've been stress-testing Dougal's patch and it's about twice as fast. But I've discovered some bugs during my tests: deleted files come back from the dead after a reboot. It turns out the original script has the same problem, so hopefully I can identify the source.
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#40 Post by jpeps »

jemimah wrote: But turns out the original script has the same problem - hopefully I can identify the source.
I see the debugging line "set -x" is commented out.