Need file3 with ONLY differences between file1 and file2

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
laurentius77
Posts: 82
Joined: Wed 30 Mar 2011, 07:02


#1 Post by laurentius77 »

I have a problem that is probably simple for many, but so far I haven't been able to solve it:
file1 contains only a single line:
01 02 03 04 05 06

file2 also contains only a single line:
01 02 03 04 05 06 07 08 09 10

I want to get file3, also a single line, containing only the differences between file1 and file2,

like this:
file3 with ONLY the differences
07 08 09 10

I tried to find an answer to this seemingly simple problem by googling it, but I only found solutions for other, more complex situations, not for this one.
Thank you

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#2 Post by MochiMoppel »

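Many ways. As long as your real file contents are as simple as in your example, you can try:

Code:

#!/bin/bash
# $(< file) and ${var/pattern} are bash features; a strictly POSIX sh may not support them
CONTENT_FILE1=$(< /root/tmp/file1)
CONTENT_FILE2=$(< /root/tmp/file2)
# Delete file1's text from file2's text; what is left is the difference
echo -n ${CONTENT_FILE2/"$CONTENT_FILE1"} > /root/tmp/file3
Change the paths as needed.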
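How the substitution works: the `${VAR/pattern}` expansion in the script above deletes the first occurrence of file1's text from file2's text, so it only gives the expected result when file1's line appears verbatim inside file2's line. Below is a minimal sketch of the same idea applied to two strings instead of files, assuming bash; the function name `text_minus` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (not from the thread): delete the first literal
# occurrence of $1 from $2 and print what is left, blanks collapsed.
text_minus() {
    local small=$1 big=$2
    local rest=${big/"$small"}
    # Unquoted expansion: word splitting drops leading/trailing blanks
    echo $rest
}

text_minus "01 02 03 04 05 06" "01 02 03 04 05 06 07 08 09 10"   # prints: 07 08 09 10
text_minus "1 3" "1 2 3"                                         # prints: 1 2 3
```

The second call shows the limitation: if file1's values do not form one contiguous run inside file2, nothing is removed and the whole line comes back.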

laurentius77

Empty result... I will try more; maybe I misspelled something

#3 Post by laurentius77 »

MochiMoppel wrote: Many ways. As long as your real file contents are as simple as in your example, you can try [...]

laurentius77

I need a method for real sorting from lower to higher values

#4 Post by laurentius77 »

Thanks.
The script is working, but I need more: the results are not what I expected, because my files should first be sorted like this:

1 2 3 4 5 6 7 8 9 10 11 12 13 14

but they are like this:

1 10 11 12 13 14 2 3 4 5 6 7 8 9

How can I sort them in ascending order, with the smallest number first and the largest number last?

sort -n doesn't work for my single-line files.

Or I need something that identifies which numbers from file2 are not found in file1.

Any idea?
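Why `sort -n` seems to do nothing here: sort orders whole lines, and a one-line file has only one line to sort. A minimal sketch, assuming the values are separated by spaces and/or commas, is to put each value on its own line, sort numerically, then join the result back into one line. The function name `sort_line` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: numerically sort the values of a single line that are
# separated by spaces and/or commas, printing them back on one line.
sort_line() {
    tr ' ,' '\n\n' |      # one value per line
        grep -v '^$' |    # drop empty entries left by repeated separators
        sort -n |         # numeric ascending sort: 2 comes before 10
        paste -s -d ' ' - # join the lines again with single spaces
}

echo "1 10 11 12 13 14 2 3 4 5 6 7 8 9" | sort_line
# prints: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
```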

MochiMoppel

Re: I need a method for real sorting from lower to higher values

#5 Post by MochiMoppel »

laurentius77 wrote: The script is working but I need more
Then you should have asked for more in the first place, giving a realistic example of your input files and the expected result. Now the topic has changed from file differences to file sorting? Sorting what? The input or the output? Clearly I no longer know what you are asking for.

laurentius77

Real example

#6 Post by laurentius77 »

I'm so sorry for the confusion...

Here is a real example:

file1
0,1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,5,6,7,8,9

file2
0,1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,5,6,7,8,9

I expect:

file3
44

That is, just a file containing what differs between those two files.

Sorry again for my mistake.
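For this real example, where either file may hold the extra values, one possible approach (not the one the thread settled on) is `comm -3`, which prints only the lines found in just one of two sorted inputs. A minimal sketch, assuming bash (for the `<( )` process substitution) and comma-separated values; the names `to_lines` and `line_diff` are hypothetical, and the two strings stand in for the contents of file1 and file2.

```bash
#!/bin/bash
# Hypothetical helpers: report the values that occur in exactly one of two
# comma-separated lines, no matter which line is longer.
to_lines() { tr ',' '\n' | grep -v '^$'; }

line_diff() {
    # comm needs both inputs sorted the same way, so plain sort is used here;
    # -3 suppresses values common to both, tr removes comm's column tab
    comm -3 <(printf '%s\n' "$1" | to_lines | sort) \
            <(printf '%s\n' "$2" | to_lines | sort) |
        tr -d '\t' | sort -n | paste -s -d ',' -
}

f1="0,1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,5,6,7,8,9"
f2="0,1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,5,6,7,8,9"
line_diff "$f1" "$f2"   # prints: 44
```

To work on the real files, something like `line_diff "$(cat file1)" "$(cat file2)" > file3` should do.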

MochiMoppel

#7 Post by MochiMoppel »

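Try:

Code:

#!/bin/bash
# Add the comma to the field separators so unquoted expansions split on commas too
IFS=${IFS},
CONTENT_FILE1=$(< /root/tmp/file1)
# One value per line, then sort numerically
CONTENT_FILE1=$(printf '%s\n' $CONTENT_FILE1 | sort -n)
CONTENT_FILE2=$(< /root/tmp/file2)
CONTENT_FILE2=$(printf '%s\n' $CONTENT_FILE2 | sort -n)
# Strip sorted file1 from the front of sorted file2; what remains are the extra values
echo -n ${CONTENT_FILE2#"$CONTENT_FILE1"} > /root/tmp/file3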
This assumes that file2 has more values than file1 (as in your first example). I don't know whether your second example swapped this rule(?) intentionally or by mistake. If there is no such rule, you will first have to check which of the two files is the bigger one, then adapt the last line.

The results in file3 will be space-delimited.
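The check described above can be automated. A minimal sketch, assuming bash and values separated by commas and/or whitespace; `extra_values` is a hypothetical name. Like the original script, it still assumes the shorter sorted list is exactly the beginning of the longer one (true when, as in the examples so far, only higher values were added). It also compares whole values, so a 3 at the end of the shorter list cannot be mistaken for the start of a 30 in the longer one.

```bash
#!/bin/bash
# Hypothetical variant of the script above that decides by itself which list
# is the longer one. Like the original, it assumes the shorter sorted list is
# exactly the beginning of the longer sorted list.
extra_values() {
    local IFS=$' \t\n,'           # split on whitespace and commas
    local a b na nb t nl=$'\n'
    a=$(printf '%s\n' $1 | sort -n)
    b=$(printf '%s\n' $2 | sort -n)
    set -- $a; na=$#              # count the values in each list
    set -- $b; nb=$#
    if [ "$na" -gt "$nb" ]; then  # make b the longer list
        t=$a; a=$b; b=$t
    fi
    # Append a newline to both so that "3" cannot match the start of "30"
    t=$b$nl
    echo ${t#"$a$nl"}             # drop the shorter list from the front
}

extra_values "01 02 03 04 05 06" "01 02 03 04 05 06 07 08 09 10"  # prints: 07 08 09 10
extra_values "1,10,2,3,4" "1,2,3"                                  # prints: 4 10
```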

laurentius77

Thank you

#8 Post by laurentius77 »

Thank you, sir, and I'm sorry for my lack of attention.
Yes, indeed, one file has more values than the other. In the second example (the real one), file2 has fewer values than file1. But I will adapt the script to make it work.
Thanks a lot!

MochiMoppel wrote: Try [...]

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#9 Post by Ted Dog »

SORT:

-u, --unique
used to work for me; you may need to set a delimiter, since it is not the default <linefeed>

but it's on the right track.

cat file1 file2 >file3;
sort -u file3

Ted Dog

#10 Post by Ted Dog »

Wow, I remember I last used sort -u this month 21 (or 20) years ago. I had a programming contract that started on Columbus Day (also a Monday, so you should be able to figure out whether it was 20 or 21 years ago), and I did not go to work since it was a federal holiday. MAN, were they mad that I did not go to work! I.T. did work on Columbus Day, even at the investment bank. :?

Looks like SORT is smarter than it once was; I can't find a setting to treat commas as linefeeds.

6502coder
Posts: 677
Joined: Mon 23 Mar 2009, 18:07
Location: Western United States

#11 Post by 6502coder »

@OP

I hope I'm not just muddying the waters, but following up on MochiMoppel's last post above, just what exactly are you assuming about the two files? For example,

a) is it possible for EACH file to have values that the OTHER file does not, as in:

file1:
1, 3, 4, 6, 13, 21

file2:
1, 2, 4, 6, 17

where file1 has 3, 13, and 21 but file2 does not, and conversely file2 has 2 and 17 but file1 does not.

And if this is possible, do you care which file the differences come from? If the result is

file3:
2, 3, 13, 17, 21

does it matter that there is no way to tell (from file3) which value came from which input file?

b) is it safe to assume that the values in each input file are numbers, and are sorted in some order (either numeric or lexicographic)?

I ask these things not to nitpick, but because the more precisely you state your requirements, the better the solution "we" (i.e. MochiMoppel :wink: ) can provide.
Last edited by 6502coder on Mon 12 Oct 2015, 20:43, edited 1 time in total.
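Regarding question a): if the origin of each difference matters, the two directions can be reported separately. A minimal sketch using awk (any POSIX awk should do), with the example lists above passed as strings; the function name `labeled_diff` is hypothetical, and duplicate values inside one list are not treated specially.

```bash
#!/bin/bash
# Hypothetical helper: print the values found only in the first list and
# only in the second list, so the origin of each difference stays visible.
# Values may be separated by commas and/or spaces.
labeled_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, /[ ,]+/)
        nb = split(b, y, /[ ,]+/)
        # Remember which values each list contains
        for (i = 1; i <= na; i++) if (x[i] != "") inA[x[i]] = 1
        for (i = 1; i <= nb; i++) if (y[i] != "") inB[y[i]] = 1
        s = ""
        for (i = 1; i <= na; i++) if (x[i] != "" && !(x[i] in inB)) s = s " " x[i]
        print "only in file1:" s
        s = ""
        for (i = 1; i <= nb; i++) if (y[i] != "" && !(y[i] in inA)) s = s " " y[i]
        print "only in file2:" s
    }'
}

labeled_diff "1, 3, 4, 6, 13, 21" "1, 2, 4, 6, 17"
# prints:
# only in file1: 3 13 21
# only in file2: 2 17
```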

MochiMoppel

#12 Post by MochiMoppel »

Ted Dog wrote:... on the right track.
cat file1 file2 >file3;
sort -u file3
Maybe right track, but wrong train: sort -u does not return only the unique values (which would indeed be the solution). Instead, it only eliminates duplicates. In laurentius77's example you would end up with a file containing exactly the same data as the bigger of the two input files.
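One tool that keeps only the values seen exactly once is `uniq -u`, which, like `sort -u`, works on sorted lines. A minimal sketch, assuming each value appears at most once per file and values are separated by spaces and/or commas (they are turned into lines with `tr`, which also sidesteps the comma-delimiter problem mentioned earlier); `unique_values` is a hypothetical name. Like file3 in 6502coder's example, the result does not say which file a value came from.

```bash
#!/bin/bash
# Hypothetical helper: print the values that occur in only one of the two
# lists (their symmetric difference). Assumes no value repeats within a list.
unique_values() {
    printf '%s\n%s\n' "$1" "$2" |
        tr ' ,' '\n\n' | grep -v '^$' |   # one value per line
        sort | uniq -u |                  # keep lines that occur exactly once
        sort -n | paste -s -d ' ' -       # back to one numerically sorted line
}

unique_values "01 02 03 04 05 06" "01 02 03 04 05 06 07 08 09 10"   # prints: 07 08 09 10
unique_values "1,3,4,6,13,21" "1,2,4,6,17"                         # prints: 2 3 13 17 21
```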

Ted Dog

#13 Post by Ted Dog »

Lol, did I say it was 20 or more years ago? I may have left off a difference flag, but that was PART of the way I solved the same type of problem. I also used AWK ... or SED, I think, but I can never recall all the command flags exactly.

MochiMoppel

#14 Post by MochiMoppel »

Ted Dog wrote: Lol did I say it was 20 or more years ago..
Yes, you did. At this age it's getting hard for old dogs to remember old tricks :lol:

Still not sure whether I'm on the right train, but here is a more robust solution. It determines which file holds the highest value, then returns all values from that file that are greater than the highest value of the other file. Does this make sense? In the example below it returns '31 40' (from file2). It returns an empty string if both files contain the same maximum value.

Code:

#!/bin/bash
# $'\n' quoting and (( )) arithmetic used below are bash features
# Create some test files
echo -n "11,30,5,28"    > /tmp/file1
echo -n "24,7,40,9,31"  > /tmp/file2

IFS=${IFS},
CONTENT_FILE1=$(< /tmp/file1)
CONTENT_FILE1=$(printf '%s\n' $CONTENT_FILE1 | sort -n)
HIGHEST_VALU1=${CONTENT_FILE1##*$'\n'}

CONTENT_FILE2=$(< /tmp/file2)
CONTENT_FILE2=$(printf '%s\n' $CONTENT_FILE2 | sort -n)
HIGHEST_VALU2=${CONTENT_FILE2##*$'\n'}

if ((HIGHEST_VALU2 > HIGHEST_VALU1)); then
	LARGER=$CONTENT_FILE2
	SMALLER=$CONTENT_FILE1
	MIN_VAL=$HIGHEST_VALU1
else
	LARGER=$CONTENT_FILE1
	SMALLER=$CONTENT_FILE2
	MIN_VAL=$HIGHEST_VALU2
fi
for VAL in $LARGER ;do
	(($VAL > $MIN_VAL)) && RESULT="${RESULT}${VAL} "
done
RESULT=${RESULT% }  #Remove trailing space
echo -n $RESULT > /tmp/file3

# Show test result
gxmessage -file  /tmp/file3

smokey01
Posts: 2813
Joined: Sat 30 Dec 2006, 23:15
Location: South Australia :-(

#15 Post by smokey01 »

There used to be a great little app called xfdiff-cut for doing just this.
Drag file1 to the left pane and the second file to the right pane, then click apply. The differences are shown in the bottom panel.

It was present in all earlier Pups but seems to have disappeared.

I have a version here, compiled in Slacko64, if it's of any help.

MochiMoppel

#16 Post by MochiMoppel »

smokey01 wrote:It was present in all earlier pups but seems to have disappeared..
Maybe for a good reason :wink:
Attachment: xdiff-cut.png

Argolance
Posts: 3767
Joined: Sun 06 Jan 2008, 22:57
Location: PORT-BRILLET (Mayenne - France)

#17 Post by Argolance »

Hello,

Code:

$ cat file1
zzzz
eeee
rrrr
yyyy
uuuu
iiii
$ cat file2
yyyy
aaaa
zzzz
rrrr
eeee
uuuu
iiii
$ comm -3 <(sort file1)  <(sort file2)
aaaa
This works fine in console but not inside my script! :shock:

Regards.

L18L
Posts: 3479
Joined: Sat 19 Jun 2010, 18:56
Location: www.eussenheim.de/

comm

#18 Post by L18L »

Argolance wrote:

Code:

$ comm -3 <(sort file1)  <(sort file2)
aaaa
This works fine in console but not inside my script! :shock:
This

Code:

# 
# LANGUAGE=en comm -3 $(sort file1)  $(sort file2)
comm: extra operand ‘rrrr’
Try 'comm --help' for more information.
# 
# sort file1 > file1s
# sort file2 > file2s
# LANGUAGE=en comm -3 file1s  file2s
	aaaa
# 
works in my console :wink:

Thanks for pointing out this nice tool: comm

MochiMoppel

#19 Post by MochiMoppel »

Argolance wrote:This works fine in console but not inside my script! :shock:
It doesn't work when your script starts with the shebang #!/bin/sh. It works when you change it to #!/bin/bash.

From the bash manual:
  • "When invoked as sh, Bash enters POSIX mode after reading the startup files.
    The following list is what’s changed when ‘POSIX mode’ is in effect:
    <snip>
    28. Process substitution is not available."
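Given that restriction, one way to keep `#!/bin/sh` is to replace the process substitutions with temporary files, as L18L did by hand. A minimal sketch, assuming the `mktemp` utility is available (it is on typical Linux systems); `comm3_sorted` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical POSIX-sh variant of: comm -3 <(sort file1) <(sort file2)
# Process substitution is unavailable in sh, so sort into temporary files.
comm3_sorted() {
    t1=$(mktemp) || return 1
    t2=$(mktemp) || { rm -f "$t1"; return 1; }
    sort "$1" > "$t1"
    sort "$2" > "$t2"
    # column 1: lines only in $1; column 2 (tab-indented): lines only in $2
    comm -3 "$t1" "$t2"
    status=$?
    rm -f "$t1" "$t2"
    return "$status"
}

# usage: comm3_sorted file1 file2
```

With Argolance's two files, this should print `aaaa` indented by one tab, since that line only occurs in file2.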

smokey01

#20 Post by smokey01 »

MochiMoppel wrote:
smokey01 wrote:It was present in all earlier pups but seems to have disappeared..
Maybe for a good reason :wink:
I agree.

I've been doing a bit of searching, and Diffuse looks pretty good.
http://diffuse.sourceforge.net/

It seems to work on all file types.
