Need file3 with ONLY differences between file1 and file2

laurentius77 · #1 Post by **laurentius77** » Sun 11 Oct 2015, 09:51

I have a problem that may be simple for many, but for me it has been unsolvable until now:
file1, with a single line of content:
01 02 03 04 05 06

file2, also with a single line of content:
01 02 03 04 05 06 07 08 09 10

I want to get file3, also a single line, containing only the differences between file1 and file2,

like this:
file3, with ONLY the differing content:
07 08 09 10

I tried to find an answer to this seemingly simple problem on Google, but I only found solutions for other, more complex situations, not for this one.
Thank you

MochiMoppel · #2 Post by **MochiMoppel** » Sun 11 Oct 2015, 10:21

There are many ways. As long as your real file contents are as simple as in your example, you can try:

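#!/bin/bash
# bash is needed: $(< file) and ${var/pattern} are not POSIX sh
CONTENT_FILE1=$(< /root/tmp/file1)
CONTENT_FILE2=$(< /root/tmp/file2)
# Delete the first occurrence of file1's text from file2's text;
# unquoted, so word splitting drops the leftover leading space
echo -n ${CONTENT_FILE2/$CONTENT_FILE1}  > /root/tmp/file3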

Change the path as needed.

laurentius77 · #3 Post by **laurentius77** » Sun 11 Oct 2015, 10:26

MochiMoppel wrote:Many ways. As long as your real file contents are as simple as in your example you can try
#!/bin/sh
CONTENT_FILE1=$(< /root/tmp/file1)
CONTENT_FILE2=$(< /root/tmp/file2)
echo -n ${CONTENT_FILE2/$CONTENT_FILE1}  > /root/tmp/file3
Change the path as needed.

laurentius77 · #4 Post by **laurentius77** » Sun 11 Oct 2015, 12:42

Thanks
The script works, but I need more: the results are not what I expected, because my files would first need to be sorted, like:

1 2 3 4 5 6 7 8 9 10 11 12 13 14

but they actually look like:

1 10 11 12 13 14 2 3 4 5 6 7 8 9

How can I sort them in ascending order, smallest number first and largest number last?

sort -n doesn't work for my single-line files.

Or else I need something that identifies which numbers in file2 are not found in file1.

Any idea?
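A minimal sketch for the sorting question above, assuming the values are separated by spaces or commas: sort works line by line, so the single line is first split into one value per line, sorted with sort -n, and joined back into one line with paste. The function name sort_line is made up for this example.

```shell
#!/bin/sh
# sort_line (hypothetical helper): sort the values of a single line numerically.
# sort only orders lines, so: split into one value per line, sort, join again.
# Spaces and commas are both treated as separators (assumption).
sort_line() {
    # Separators -> newlines, drop empty entries, sort numerically, rejoin with spaces
    printf '%s\n' "$1" | tr ', ' '\n\n' | grep -v '^$' | sort -n | paste -sd ' ' -
}

sort_line "1 10 11 12 13 14 2 3 4 5 6 7 8 9"
# prints: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
```

For a file, pass its contents, e.g. sort_line "$(cat file1)".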

MochiMoppel · #5 Post by **MochiMoppel** » Sun 11 Oct 2015, 13:17

laurentius77 wrote:The script is working but I need more

Then you should have asked for more in the first place, giving a realistic example of your input files and the expected result. Now the topic has changed from file differences to file sorting? Sorting what? The input or the output? Clearly, I no longer know what you are asking for.

laurentius77 · #6 Post by **laurentius77** » Sun 11 Oct 2015, 13:54

I'm so sorry for the confusion...

Here is a real example:

file1
0,1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,5,6,7,8,9

file2
0,1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,5,6,7,8,9

I expect

file3
44

Just a file with what differs between those two files.

Sorry again for my mistake.
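Since the order of the values should not matter here, one option, assuming spaces or commas as separators, is to let grep compute a one-sided set difference: with -F -x -v -f it prints every line of one list that does not exactly match any line of the other, so no sorting is needed for the comparison itself. This is a sketch; one_per_line and missing_from are made-up names, and mktemp is not in POSIX but is available in GNU coreutils and BusyBox.

```shell
#!/bin/sh
# missing_from A B (hypothetical helper): print the values of file B that
# do not occur in file A, whatever their order. Values may be separated by
# spaces or commas (assumption based on the examples in this thread).

# Print each value of file $1 on its own line, dropping empty entries.
one_per_line() {
    tr ', ' '\n\n' < "$1" | grep -v '^$'
}

missing_from() {
    tmp=$(mktemp) || return 1
    one_per_line "$1" > "$tmp"
    # -F: fixed strings, -x: whole line must match (so 4 does not match 44),
    # -v: keep non-matching lines, -f: read the patterns from the temp file
    one_per_line "$2" | grep -vxF -f "$tmp" | sort -n | paste -sd ' ' -
    rm -f "$tmp"
}

# Demo with a shortened version of the real example (44 only in the first file)
f1=$(mktemp); f2=$(mktemp)
printf '0,1,10,2,44,5' > "$f1"
printf '0,1,10,2,5' > "$f2"
missing_from "$f2" "$f1"    # prints: 44
rm -f "$f1" "$f2"
```

Swapping the two arguments gives the other direction, so it does not matter which file holds more values.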

MochiMoppel · #7 Post by **MochiMoppel** » Sun 11 Oct 2015, 16:50

Try

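#!/bin/bash
# bash is needed: $(< file) is not POSIX sh
# Add the comma to the word separators (default: space, tab, newline)
IFS=${IFS},
# Read each file, put one value per line and sort numerically
CONTENT_FILE1=$(< /root/tmp/file1)
CONTENT_FILE1=$(printf '%s\n' $CONTENT_FILE1 | sort -n)
CONTENT_FILE2=$(< /root/tmp/file2)
CONTENT_FILE2=$(printf '%s\n' $CONTENT_FILE2 | sort -n)
# Remove the sorted file1 values from the front of the sorted file2 values;
# unquoted, so word splitting turns the remaining newlines into spaces
echo -n ${CONTENT_FILE2#$CONTENT_FILE1} > /root/tmp/file3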

This assumes that file2 has more values than file1 (as in your first example). I don't know whether your 2nd example had this rule(?) switched intentionally or by mistake. If there is no such rule, you will first have to check which of the 2 files is the bigger one, then adapt the last line.

Results in file3 will be space delimited.
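A sketch of the check suggested above, written as a hypothetical helper extra_values: it counts the values in each file and strips the sorted values of the smaller file from the front of the sorted values of the larger one. Like the script above, it assumes the extra values sort after all shared values, which holds for both examples in this thread; it is not a general set difference (if the prefix does not match, the larger list is printed unchanged).

```shell
#!/bin/sh
# extra_values (hypothetical helper): find which file has more values,
# then strip the sorted values of the smaller file from the front of
# the sorted values of the larger one.
# Only correct when the extra values sort after all shared values.

# Sorted values of a file, one per line (spaces/commas as separators).
sorted_values() {
    tr ', ' '\n\n' < "$1" | grep -v '^$' | sort -n
}

extra_values() {
    v1=$(sorted_values "$1")
    v2=$(sorted_values "$2")
    # Count the values (non-empty lines) in each list
    n1=$(printf '%s\n' "$v1" | grep -c .)
    n2=$(printf '%s\n' "$v2" | grep -c .)
    # The quoted "$v1"/"$v2" is matched literally, not as a glob pattern
    if [ "$n2" -ge "$n1" ]; then
        rest=${v2#"$v1"}
    else
        rest=${v1#"$v2"}
    fi
    # Unquoted on purpose: word splitting turns the newlines into spaces
    echo $rest
}
```

With this, extra_values /root/tmp/file1 /root/tmp/file2 > /root/tmp/file3 gives the same result whichever of the two files is bigger.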

laurentius77 · #8 Post by **laurentius77** » Sun 11 Oct 2015, 19:07

Thank you sir and I'm sorry for my lack of attention.
Yes, indeed, one file has more values than the other. In the second example (the real case), file2 has fewer values than file1. But I will adapt the script to make it functional.
Thank you a lot!

Ted Dog · #9 Post by **Ted Dog** » Sun 11 Oct 2015, 20:10

SORT:

-u, --unique
used to work for me; you may need to set a delimiter, since it is not the default <linefeed>

but this is on the right track.

cat file1 file2 >file3;
sort -u file3

Ted Dog · #10 Post by **Ted Dog** » Sun 11 Oct 2015, 20:29

Wow, I remember last using sort -u this month 21 (or 20) years ago. I had a programming contract that started on Columbus Day (also a Monday, so you should be able to figure out whether it was 20 or 21 years ago), and I did not go to work since it was a federal holiday. MAN, were they mad that I did not go to work! I.T. worked even at the investment bank on Columbus Day.

Looks like sort is smarter than it once was, but I can't find a setting to treat commas as linefeeds.

6502coder · #11 Post by **6502coder** » Sun 11 Oct 2015, 23:31

@OP

I hope I'm not just muddying the waters, but following up on MochiMoppel's last post above, just what exactly are you assuming about the two files? For example,

a) is it possible for EACH file to have values that the OTHER file does not, as in:

file1:
1, 3, 4, 6, 13, 21

file2:
1, 2, 4, 6, 17

where file1 has 3, 13, and 21 but file2 does not, and conversely file2 has 2 and 17 but file1 does not.

And if this is possible, do you care which file the differences come from? If the result is

file3:
2, 3, 13, 17, 21

does it matter that there is no way to tell (from file3) which value came from which input file?

b) is it safe to assume that the values in each input file are numbers, and are sorted in some order (either numeric or lexicographic)?

I ask these things not to nitpick, but because the more precisely you state your requirements, the better the solution "we" (i.e. MochiMoppel) can provide.
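If each file can contain values the other lacks (case a above) and it matters which file a value came from, comm can separate the two sides: comm -23 keeps the lines found only in the first sorted input, comm -13 those found only in the second. A sketch under the same separator assumption; report_differences and values_of are made-up names, and both inputs are sorted with LC_ALL=C so that sort and comm agree on the order.

```shell
#!/bin/sh
# report_differences (hypothetical helper): for two files, print the values
# found only in the first and those found only in the second.
# Spaces and commas are treated as separators (assumption).

# One value per line, sorted in byte order (the order comm compares in under LC_ALL=C).
values_of() {
    tr ', ' '\n\n' < "$1" | grep -v '^$' | LC_ALL=C sort -u
}

report_differences() {
    s1=$(mktemp) || return 1
    s2=$(mktemp) || { rm -f "$s1"; return 1; }
    values_of "$1" > "$s1"
    values_of "$2" > "$s2"
    # comm -23: lines only in the first input; comm -13: only in the second.
    # The results are re-sorted numerically just for display.
    only1=$(LC_ALL=C comm -23 "$s1" "$s2" | sort -n | paste -sd ' ' -)
    only2=$(LC_ALL=C comm -13 "$s1" "$s2" | sort -n | paste -sd ' ' -)
    echo "only in $1: $only1"
    echo "only in $2: $only2"
    rm -f "$s1" "$s2"
}
```

With the file1/file2 example from case a, this reports 3 13 21 as only in file1 and 2 17 as only in file2.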

MochiMoppel · #12 Post by **MochiMoppel** » Mon 12 Oct 2015, 01:50

Ted Dog wrote:... on the right track.
cat file1 file2 >file3;
sort -u file3

Maybe right track, but wrong train: sort -u does not return only the unique values (which would indeed be the solution). Instead it only eliminates duplicates. In laurentius77's example you would end up with a file containing exactly the same data as the bigger of the two input files.
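Ted Dog's idea does work once uniq -u replaces sort -u: after sorting, uniq -u prints only the lines that are not repeated, i.e. the values that appear in just one of the two files, whereas sort -u keeps one copy of every value. tr handles the commas by turning them into line feeds. A sketch, assuming no value appears twice within the same file (such a value would be dropped as well); unique_values is a made-up name. Like comm -3, it cannot tell which file a value came from.

```shell
#!/bin/sh
# unique_values (hypothetical helper): values that occur in exactly one
# of the two files. Assumes no value is repeated within the same file.
unique_values() {
    {
        # One value per line; the echo guards against files without a
        # final newline, so the last value of $1 does not merge with $2.
        tr ', ' '\n\n' < "$1"; echo
        tr ', ' '\n\n' < "$2"; echo
    } | grep -v '^$' | sort | uniq -u | sort -n | paste -sd ' ' -
}
```

For example, unique_values /root/tmp/file1 /root/tmp/file2 > /root/tmp/file3 gives 44 for laurentius77's real example.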

Ted Dog · #13 Post by **Ted Dog** » Mon 12 Oct 2015, 03:18

Lol, did I say it was 20 or more years ago? I may have left off a difference flag, but that was PART of the way I solved the same type of problem. I also used AWK... or SED, I think, but I can never recall all the command flags exactly.

MochiMoppel · #14 Post by **MochiMoppel** » Mon 12 Oct 2015, 03:45

Ted Dog wrote:Lol did I say it was 20 or more years ago..

Yes you did. At this age it's getting hard for old dogs to remember old tricks

Still not sure if I'm on the right train, here is a more robust solution. It determines the file with the highest value, then returns all values from that file that are greater than the highest value of the other file. Does this make sense? In the example below it returns '31 40' (from file2). It returns an empty string if both files contain the same maximum value.

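#!/bin/bash
# bash is needed: $(< file), $'\n' and (( )) are not POSIX sh
# Create some test files
echo -n "11,30,5,28"    > /tmp/file1
echo -n "24,7,40,9,31"  > /tmp/file2

# Split on commas as well, then sort each file's values numerically, one per line
IFS=${IFS},
CONTENT_FILE1=$(< /tmp/file1)
CONTENT_FILE1=$(printf '%s\n' $CONTENT_FILE1 | sort -n)
# Highest value = text after the last newline of the sorted list
HIGHEST_VALU1=${CONTENT_FILE1##*$'\n'}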

CONTENT_FILE2=$(< /tmp/file2)
CONTENT_FILE2=$(printf '%s\n' $CONTENT_FILE2 | sort -n)
HIGHEST_VALU2=${CONTENT_FILE2##*$'\n'}

if ((HIGHEST_VALU2 > HIGHEST_VALU1)); then
	LARGER=$CONTENT_FILE2
	SMALLER=$CONTENT_FILE1
	MIN_VAL=$HIGHEST_VALU1
else
	LARGER=$CONTENT_FILE1
	SMALLER=$CONTENT_FILE2
	MIN_VAL=$HIGHEST_VALU2
fi
for VAL in $LARGER ;do
	(($VAL > $MIN_VAL)) && RESULT="${RESULT}${VAL} "
done
RESULT=${RESULT% }  #Remove trailing space
echo -n $RESULT > /tmp/file3

# Show test result
gxmessage -file  /tmp/file3

smokey01 · #15 Post by **smokey01** » Mon 12 Oct 2015, 04:05

There used to be a great little app called xfdiff-cut for doing just this.
Drag file1 to the left pane, the second file to the right and click apply. The differences are shown in the bottom panel.

It was present in all earlier pups but seems to have disappeared.

I have a version here compiled in slacko64 if it's of any help.

MochiMoppel · #16 Post by **MochiMoppel** » Mon 12 Oct 2015, 05:59

smokey01 wrote:It was present in all earlier pups but seems to have disappeared..

Maybe for a good reason

Argolance · #17 Post by **Argolance** » Mon 12 Oct 2015, 16:11

Hello,

$ cat file1
zzzz
eeee
rrrr
yyyy
uuuu
iiii
$ cat file2
yyyy
aaaa
zzzz
rrrr
eeee
uuuu
iiii
$ comm -3 <(sort file1)  <(sort file2)
aaaa

This works fine in the console but not inside my script!

Regards.

L18L · #18 Post by **L18L** » Mon 12 Oct 2015, 17:36

Argolance wrote:
$ comm -3 <(sort file1)  <(sort file2)
aaaa
This works fine in console but not inside my script!

This

# 
# LANGUAGE=en comm -3 $(sort file1)  $(sort file2)
comm: extra operand ‘rrrr’
Try 'comm --help' for more information.
# 
# sort file1 > file1s
# sort file2 > file2s
# LANGUAGE=en comm -3 file1s  file2s
	aaaa
#

works in my console

Thanks for pointing out this nice tool: comm

MochiMoppel · #19 Post by **MochiMoppel** » Tue 13 Oct 2015, 06:08

Argolance wrote:This works fine in console but not inside my script!

It doesn't work when your script starts with the shebang #!/bin/sh. It works when you change it to #!/bin/bash.

From the bash manual:

"When invoked as sh, Bash enters POSIX mode after reading the startup files.
The following list is what’s changed when ‘POSIX mode’ is in effect:
<snip>
28. Process substitution is not available."
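Besides switching the shebang to #!/bin/bash, the temporary-file approach from L18L's console session can be built into the script itself. A sketch (comm_sorted is a made-up name): only one temporary file is needed, because comm reads standard input when a file operand is "-".

```shell
#!/bin/sh
# comm_sorted (hypothetical helper): same result as
#   comm -3 <(sort "$1") <(sort "$2")
# but without process substitution, so it also runs under #!/bin/sh.
# One sorted copy goes to a temporary file, the other arrives on stdin ("-").
comm_sorted() {
    tmp=$(mktemp) || return 1
    sort "$1" > "$tmp"
    sort "$2" | comm -3 "$tmp" -
    rm -f "$tmp"
}

# Usage: comm_sorted file1 file2
```

As in L18L's output, lines found only in the second file are indented with a tab.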

smokey01 · #20 Post by **smokey01** » Wed 14 Oct 2015, 06:42

MochiMoppel wrote:
smokey01 wrote:It was present in all earlier pups but seems to have disappeared..
Maybe for a good reason

I agree.

I've been doing a bit of searching and diffuse looks pretty good.
http://diffuse.sourceforge.net/

It seems to work on all file types.

(old)Puppy Linux Discussion Forum

Empty result... I will try more, maybe I misspelled something

I need a method for real sorting from lower to higher values

Re: I need a method for real sorting from lower to higher values

Real example

Thank you

comm