(old)Puppy Linux Discussion Forum

Posted: **Wed 22 Aug 2012, 14:55**

Hello!
I am working on a new rating system in pMusic. I want to see how many times the track is played last week or last month.
The idea is to add 'date +%s' to the rating-index each time the track has played. The index file could look like this

Code: Select all

/root/file1.mp3| 1345641102,1345644563,1345647584,...
/root/file2.mp3| 1345623045,1345643786,1345648695,...

Now the question is:
How many times has file2 played since 1345643390?

In other words:
How many numbers in line 2 is above 1345643390?

The list will be long, so we need a fast calculation

Thank you
Sigmund

Posted: **Wed 22 Aug 2012, 15:20**

So those numbers are time stamps that show each time the file was played? Are they in chronological order?

Couldn't you just find the number of commas in each line? (The reverse of counting sheep by counting their legs and dividing by four.

)

Posted: **Wed 22 Aug 2012, 16:24**

Code: Select all

#!/bin/ash
#
log="/root/file2.mp3| 1345623045,1345643786,1345648695"
#
#How many numbers in log is below 1345643390? 
#How many numbers in log is below oneTIME
oneTIME=1345643390 # seconds since 1970-1-1
COUNT=0
for VAR in `echo $log | cut -d'|' -f2 | tr ',' '\n'`; do
 [ $(( $VAR - $oneTIME )) -gt 0 ] &&  COUNT=$(( $COUNT + 1 ))
done
echo $COUNT numbers are above $oneTIME

The following should be faster if in chronological order:

Code: Select all

#!/bin/ash
#
log="/root/file2.mp3| 1345623045,1345643786,1345648695"
#
#How many numbers in log is above 1345643390? 
#How many numbers in log is above oneTIME
oneTIME=1345643390 # seconds since 1970-1-1
COUNT=0
for VAR in `echo $log | cut -d'|' -f2 | tr ',' '\n'`; do
 [ $(( $VAR - $oneTIME )) -lt 0 ] &&  COUNT=$(( $COUNT + 1 )) || break
done
echo $COUNT numbers are below $oneTIME

You have to subtract this from the total number of commas.

Posted: **Wed 22 Aug 2012, 19:46**

zigbert,

If you don't care about dates, you might just do:

Code: Select all

grep <file name> music_play_log | wc -l

Regards,
s

Posted: **Wed 22 Aug 2012, 20:54**

Thank you guys for your feedback

Yes, the numbers will be in in chronological order.

@seaside
The rating system today shows how many times you have played the track - with no dates. I want more.....

@L18L
This is sure a solution, but it will be slow when rating-index grows.

Sigmund

Posted: **Wed 22 Aug 2012, 21:04**

zigbert wrote:[...]it will be slow when rating-index grows.

How about binary search algorithm?
Although it looks pretty complex to implement (at least for me) when using such a type of index (item1, item2, item3...), but it should be easier with arrays.
Here's the snippet I found:
http://www.bashguru.com/2011/01/binarys ... cript.html

Greetings!

Posted: **Thu 23 Aug 2012, 13:02**

SFR wrote:How about binary search algorithm?
Although it looks pretty complex to implement (at least for me) when using such a type of index (item1, item2, item3...), but it should be easier with arrays.
Here's the snippet I found:
http://www.bashguru.com/2011/01/binarys ... cript.html

Here is what I made from that snippet:

Code: Select all

#!/bin/sh
# SCRIPT : derived from: binarysearch.sh
# SCRIPT : bsearch.sh
# USAGE  : bsearch.sh
# PURPOSE: Searches given number in a sorted list.
#                         \\\\ ////
#                         \\ - - //
#                            @ @
#                    ---oOOo-( )-oOOo---
#
#####################################################################
#                      Define Functions Here                        #
#####################################################################

binarysearch()
{
    status=-1
    i=1
    array=($(echo "$@"))
    LowIndex=0
    HeighIndex=$((${#array[@]}-1))
    while [ $LowIndex -lt $HeighIndex ]
    do
        MidIndex=$(($LowIndex+($HeighIndex-$LowIndex)/2))
        MidElement=${array[$MidIndex]}
        case $(($MidElement - $SearchedItem)) in
        -*) LowIndex=$(($MidIndex+1)) ;;
         0) status=0
            searches=$i
            return ;; 
        *)  HeighIndex=$(($MidIndex-1)) ;;
        esac
        let i++
    done
}

#####################################################################
#                       Variable Declaration                       #
#####################################################################

clear
echo "Enter Sorted Array Elements : "
read -a ARRAY
count=${#ARRAY[@]}

#####################################################################
#                       Main Script Starts Here                     #
#####################################################################

while [ : ]
do
     echo -n "
     Type in a number or break using CTRL-C: "
     read SearchedItem
     binarysearch "${ARRAY[@]}"
     [ $SearchedItem -gt $MidElement ] \
      && echo "nothing found equal or later than $SearchedItem" \
      || echo "$SearchedItem found after $searches searches, MidIndex=$MidIndex MidElement=$MidElement"

done

Posted: **Thu 23 Aug 2012, 15:50**

And here's it's application in zigbert's problem:

Code: Select all

#!/bin/bash

#usage: played_since filename time

#data stored in database.txt

file="$1";
time="$2";

ifs_bak="$IFS"
IFS=""

while read -d "|" record; do
	#check for file
	if test "$record" = "$file"; then
		IFS=","
		#Get all timestamps in array
		read -a times
		#Binary search
		i=$(( ${#times[*]}/2 ))
		d=$(( $i/2 ))
		test "$d" -eq "0" && d=1 #precaution
		ulim=$(( ${#times[*]}-1 ))
		while test "$i" -ge "0" -a "$i" -lt "$ulim"; do #check whether index is in range
			if test "$time" -lt "${times[$i]}"; then
				i=$(( $i-$d ))
			elif test "$time" -gt "${times[$(( $i+1 ))]}"; then
				i=$(( $i+$d ))
			else
				#found, tell no. of records greater than this
				echo "$(( $ulim-$i ))"
				exit
			fi
			d=$(( ($d/2) ))
			test "$d" -eq "0" && d=1 #precaution
		done
		#In case of 'out of range'
		if test "$ulim" -lt "0"; then
			echo "0"
		else
			if test "$time" -lt "$times"; then
				echo "${#times[*]}"
			elif test "$time" -gt "${times[$ulim]}"; then
				echo "0"
			else
				echo "bug" #This should not be reached
			fi
		fi
		exit
	fi
	read trash
done < database.txt

IFS="$ifs_bak"

echo "File not found"

I have found that overhead of reading data from file into array is much larger than time required for binary search, we have to do something there.

Plus, if the lists are going to be really large it would be performance advantage to store timestamps of each track in a separate file.

I am using a c++ program to generate about 87 MB test file to search in.

Code: Select all

//g++ -Wall -o data_gen source.cpp

#include <iostream>
#include <fstream>
#include <cstdlib>

using namespace std;

int main()
{
	ofstream strm("database.txt", ios::out);
	
	int i, j;
	for (i = 0; i < 100; i++)
	{
		unsigned long rand_time = 0;
		strm << "file" << i << ".mp3|";
		for (j = 0; j < 100000; j++)
		{
			rand_time += (rand() % 1000);
			strm << rand_time << ",";
		}
		rand_time += (rand() % 1000);
		strm << rand_time << "\n";
	}
	
	strm.close();
	
	return 0;
}

Time taken for last file roughly equals time to generate the database.

Code: Select all

# time ./data_gen 

real	0m6.873s
user	0m3.936s
sys	0m0.257s
# time ./played_since file0.mp3 48000000
3881

real	0m0.269s
user	0m0.173s
sys	0m0.027s
# time ./played_since file99.mp3 48000000
3672

real	0m6.849s
user	0m5.696s
sys	0m0.677s
#

Posted: **Thu 23 Aug 2012, 16:59**

Thank you guys
Pretty amazing you want to spend so much time helping out!!!!!

I made a test.
1.) Checking my existing rating index - there are about 2000 tracks in the index.
2.) Make a fake index based on the new structure including 2000 tracks and 25 timestamps for each track. All together 574kb.
3.) Run the played_since script to get an assumption of how fast it can build an updated list of favorite tracks.

Code: Select all

# cut -d'|' -f1 ./database.txt > /tmp/played_since; time while read I; do ./played_since.sh $I 1345643390; done < /tmp/played_since
...
...
...
24
24
24
24
24
24
24
24
24
24

real	0m30.799s
user	0m21.339s
sys	0m5.560s
#

I have a modern system, and still it takes half a minute to build a rating list. Is this an understanding why ratings in the big music-apps uses stars for rating instead of a exact status.

But still, it would be awesome to define a given time (last week, januar 2011, or whatever) and then get My favorite tracks in that period.

Sigmund

Posted: **Thu 23 Aug 2012, 17:55**

If you want to process all files then why not go this way:

Code: Select all

#!/bin/bash

#usage: played_since_foreach time

time="$1";

ifs_bak="$IFS"
IFS=""

while read -d "|" record; do
	IFS=","
	#Get all timestamps in array
	read -a times
	#Binary search
	i=$(( ${#times[*]}/2 ))
	d=$(( $i/2 ))
	test "$d" -eq "0" && d=1 #precaution
	ulim=$(( ${#times[*]}-1 ))
	count=-1
	while test "$i" -ge "0" -a "$i" -lt "$ulim"; do #check whether index is in range
		if test "$time" -lt "${times[$i]}"; then
			i=$(( $i-$d ))
		elif test "$time" -gt "${times[$(( $i+1 ))]}"; then
			i=$(( $i+$d ))
		else
			#found, tell no. of records greater than this
			count="$(( $ulim-$i ))"
			break
		fi
		d=$(( ($d/2) ))
		test "$d" -eq "0" && d=1 #precaution
	done
	#In case of 'out of range'
	if test "$count" -lt "0"; then
		if test "$ulim" -lt "0"; then
			count=0
		else
			if test "$time" -lt "$times"; then
				count="${#times[*]}"
			elif test "$time" -gt "${times[$ulim]}"; then
				count=0
			else
				count="bug" #This should not be reached
			fi
		fi
	fi
	#Show result for this record
	echo "$record: $count"
done < database.txt

IFS="$ifs_bak"

This takes about 20 seconds to process my 84 mb test file.

zigbert wrote: 2.) Make a fake index based on the new structure including 2000 tracks and 25 timestamps for each track. All together 574kb.

I think I overestimated the size

, such a file takes about 2s on my system using the above script. But I think it could be even faster.

Posted: **Thu 23 Aug 2012, 18:32**

1.1 sec here

But I think it could be even faster.

Would be good for those with older hardware. - And those are many in the Puppy world.

Thanks a lot (so far)
Sigmund

Posted: **Thu 23 Aug 2012, 19:15**

akash_rawal
I don't think 5 sec is problematic if user request building a specific rating list....

The speed issue arrives when pMusic wants to give a general rating list of the most popular songs in the Overview mode. A workaround could be to always put the last played track at the bottom of the rating list. When later requesting a quick rating-list we could just 'tail -n 200 > database.txt' before executing played_since_foreach.

That means your code will somehow do the job as is, but if speed improvements is possible, it would sure help.

Sigmund

Posted: **Fri 24 Aug 2012, 01:32**

something like:

Code: Select all

LOGENTRY=`grep $filename log.txt`
#grep is faster than while read case on large files
STAMPS=${LOGENTRY#*|}
#you could use bash arrays instead - set allows "arrays" in sh, ash, etc...
set ${STAMPS//,/ }
#for speed change IFS to "," vs. the //,/ } but don't forget to reset it
while ([ $1 -lt $timestamp ]) do
shift #a hacky way to remove all entries < timestamp
done
# $# is the number of args passed or in this case set with "set ..."
echo The number of times played since $timestamp is $#

haven't tested for accuracy, but something along these lines should be pretty fast... even faster using busybox ash, based on past experience

Posted: **Fri 24 Aug 2012, 12:27**

technosaurus wrote:...haven't tested for accuracy, but ...

Hope this will suffice for accuracy...using zigbert´s original database
# cat database.txt
/root/file1.mp3| 1345641102,1345644563,1345647584
/root/file2.mp3| 1345623045,1345643786,1345648695
#
# ./file_times_played_since 1345644563
The number of times played /root/file1.mp3 since 1345644563 is 2
The number of times played /root/file2.mp3 since 1345644563 is 1
#
# ./file_times_played_since 1345648694
ash: 1345648694: unknown operand
The number of times played /root/file1.mp3 since 1345648694 is 0
The number of times played /root/file2.mp3 since 1345648694 is 1
#

file_times_played_since:

Code: Select all

#!/bin/ash
timestamp=$1

ifs_bak=$IFS
IFS=","

while read LOGENTRY; do

filename=${LOGENTRY%%|*}
#LOGENTRY=`grep $filename database.txt`
#grep is faster than while read case on large files

STAMPS=${LOGENTRY#*|}
#you could use bash arrays instead - set allows "arrays" in sh, ash, etc...
#set ${STAMPS//,/ }
set ${STAMPS}
#for speed change IFS to "," vs. the //,/ } but don't forget to reset it
while ([ $1 -lt $timestamp ]) do
shift #a hacky way to remove all entries < timestamp
done
# $# is the number of args passed or in this case set with "set ..."
echo The number of times played $filename since $timestamp is $#

done < database.txt

IFS=$ifs_bak

The only issue I have found is when result is zero:
ash: 1345648694: unknown operand

Posted: **Fri 24 Aug 2012, 14:43**

Oops need to prepend the -lt test with a [ $1 ] && ... For the case where it has not been played in the time period

Posted: **Fri 24 Aug 2012, 14:47**

This sounds interesting.
In am on the run this weekend, so I can't compare executing-time until Monday.

Sigmund

Posted: **Fri 24 Aug 2012, 14:50**

technosaurus wrote:Oops need to prepend the -lt test with a [ $1 ] && ... For the case where it has not been played in the time period

....and that means????

Posted: **Fri 24 Aug 2012, 16:27**

My submission:

Code: Select all

#!/bin/ash
LANG=C

FIND=3421  # timestamp to find (random)
TIMES=3650 # 3650 timestamps (tracks played everyday for 10 years) - number from 1 to $TIMES
SONGS=2000 # 2000 songs

# generate fake timestamp data
tstamps=$(seq -s, 1 $TIMES)
line="/path/to/song.mp3 | $tstamps"
#echo $line

# generate fake playlist (repeat timestamp data $SONGS times)
for a in $(seq 1 $SONGS); do
	echo $line
done | 

# do actual work
awk -F, -v TS=$FIND '{
		split($1, a, "|"); $1=a[2] # fix first entry
		max=NF; min=1; 
		
		while (max-min > 1) {
			i=int( (max+min)/2 )
			if (TS >= $i) min = i
			else max = i
		}
		print a[1], min
}'

With TIMES=3650 and SONGS=2000 as above (meaning, 3650 timestamps per song, with 2000 songs), result:

Code: Select all

# time ./fav.sh  > /dev/null

real	0m1.641s
user	0m2.916s
sys	0m0.073s

.

Using Ziggy's original parameter of 25 timestamps per song and 2000 songs (TIMES=25, SONGS=2000), result

Code: Select all

# time ./fav.sh  > /dev/null

real	0m0.068s
user	0m0.080s
sys	0m0.010s

Using 25 timestamps for 20,000 songs (TIMES=25, SONGS=20000), result:

Code: Select all

# time ./fav.sh  > /dev/null

real	0m0.479s
user	0m0.730s
sys	0m0.057s

Using 3650 timestamps for 20,000 songs (TIMES=3650, SONGS=20000), result:

Code: Select all

# time ./fav.sh  > /dev/null

real	0m16.855s
user	0m30.308s
sys	0m0.520s

Of course these are on my machine and is not directly comparable with anyone else's number. Ziggy need to run this on his own machine to test

cheers!

Posted: **Fri 24 Aug 2012, 17:05**

zigbert wrote:
technosaurus wrote:Oops need to prepend the -lt test with a [ $1 ] && ... For the case where it has not been played in the time period
....and that means????

change

Code: Select all

while ([ $1 -lt $timestamp ]) do

to

Code: Select all

while ( [ $1 ] && [ $1 -lt $timestamp ]) do

tested successfully (no more: ash: 1345623045: unknown operand ):
# ./file_times_played_since 1345648694
The number of times played /root/file1.mp3 since 1345648694 is 0
The number of times played /root/file2.mp3 since 1345648694 is 1
#

Posted: **Fri 24 Aug 2012, 17:16**

@jamesbond - over 100 lines grep becomes faster than using builtins, and it doesn't slow it down significantly for less than 100 lines, so its probably worth calling grep in this case since we can't assume a user has/plays only a few songs.

L18L wrote:
zigbert wrote:
technosaurus wrote:Oops need to prepend the -lt test with a [ $1 ] && ... For the case where it has not been played in the time period
....and that means????
change
Code: Select all
while ([ $1 -lt $timestamp ]) do 
to
Code: Select all
while ( [ $1 ] && [ $1 -lt $timestamp ]) do
tested successfully (no more: ash: 1345623045: unknown operand ):
# ./file_times_played_since 1345648694
The number of times played /root/file1.mp3 since 1345648694 is 0
The number of times played /root/file2.mp3 since 1345648694 is 1
#

thanks, I was/am posting from my phone.

(old)Puppy Linux Discussion Forum

Bash: Count numbers in list

Bash: Count numbers in list

Re: Bash: Count numbers in list

Count numbers in list

Binary search application

Bash: Count numbers in list

Count numbers in list

Re: Count numbers in list