This thread will mostly focus on binary operations in bash, but it was inspired by my thinking about how one could use more concurrency in sc0ttman's package manager (i.e. pkg) to optimize it. Regarding the package manager, this may be more an academic exercise than of practical importance, but I don't know. In my fork of pkg I plan to put my concurrency experiments in a separate file, apart from one that deviates less from sc0ttman's work.
What I'm thinking
Anyway, why am I thinking about binary operations? I was thinking about an efficient way of determining which files have priority to install/download and what we need to do to them. Sc0ttman uses a simple for loop to do recursion when listing the dependencies, and this works well because determining all the dependencies shouldn't need that many levels of recursion.
Code: Select all
# recursive search deps of deps
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
  deps_list_file="${TMPDIR}all_deps_${i}"
  # ... (loop body continues in pkg's source)
Now, since this loop keeps track of the recursion depth, we can use this information to decide which package to download or install first. We can create a directory of all the packages to process (e.g. download or install), and in this directory put a file prefixed by the recursion depth; the file would contain a bunch of status flags. Each bit would denote some status (e.g. things we have done or need to do). E.g.
Bit#1 - 1=download; 0=no download
Bit#2 - 1=install; 0=no install
Bit#3 - 1=downloading, 0=not downloading
etc.
Using this type of system we can check multiple flags at once in an if statement without chaining operators like "and" (&& or -a) and "or" (|| or -o). There are also fewer bytes that one needs to read from the file.
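As a sketch of what that single-test check could look like (the flag names, bit positions, and file naming scheme here are my own illustration, not anything pkg actually defines):

```shell
# Illustrative only: these flag names, bit assignments, and the
# depth-prefixed file path are my own sketch, not pkg's format.
DOWNLOAD=$(( 1 << 0 ))     # bit 1: needs download
INSTALL=$(( 1 << 1 ))      # bit 2: needs install
DOWNLOADING=$(( 1 << 2 ))  # bit 3: download in progress

# Write a combined status for a package found at recursion depth 3.
status=$(( DOWNLOAD | INSTALL ))
echo "$status" > /tmp/3_somepkg

# Test both flags with one bitwise AND instead of an && chain.
want=$(( DOWNLOAD | INSTALL ))
s=$(< /tmp/3_somepkg)
if (( (s & want) == want )); then
  echo "somepkg: download and install both pending"
fi
```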
Note that these files will be stored in RAM, so their access times will be fast, but if we want faster semantics we could simulate the file-manipulation semantics with associative arrays or in-memory databases (a memory-mapped file?).
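For the associative-array variant, a minimal sketch (bash 4+; the "depth_name" key format mirrors the depth-prefixed filenames and is again my own invention):

```shell
# Same flag scheme held in memory instead of in files (bash 4+).
# Flag values and the "depth_name" key format are illustrative.
declare -A pkg_status
DOWNLOAD=1
INSTALL=2

pkg_status[3_somepkg]=$(( DOWNLOAD | INSTALL ))

# Clear the DOWNLOAD bit once the fetch completes.
pkg_status[3_somepkg]=$(( ${pkg_status[3_somepkg]} & ~DOWNLOAD ))

echo "${pkg_status[3_somepkg]}"  # only the INSTALL bit (2) remains
```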
Okay, how do binary operations work in bash?
Okay, keep in mind that this is only a brainstorming thread for me but here are some things that I dug up:
Here is an example of a bitwise or operation in bash:
Code: Select all
$ echo "$(( 0x123456 | 0x876543 ))"
9925975
If you want to read a binary file into a string form you can use hexdump:
Read only X bytes (https://unix.stackexchange.com/a/469239):
Code: Select all
head -cX binary.file | hexdump -v -e '/1 "%u\n"' | while read c; do echo "$c"; done
You can also use the head statement to extract a given number of bytes, but you have to be careful when doing this because the $() operator strips trailing newline chars (see post). If you want to stick with shell utilities, you can use head to extract a number of bytes, and od to convert a byte into a number.
Code: Select all
export LC_ALL=C  # make sure we aren't in a multibyte locale
n=$(head -c 1 | od -An -t u1)
string=$(head -c $n)
Another method is to use the dd command:
https://unix.stackexchange.com/a/10966...
dd reads any and all data... It certainly won't baulk at zero as a "length"... but if you have \x00 anywhere in your data, you will need to be creative in how you handle it; dd has no problems with it, but your shell script will have problems (though it depends on what you want to do with the data)... The following basically outputs each "data string" to a file, with a line divider between each string...
...Code: Select all
((count=1))  # count of bytes to read
dd if=binfile bs=1 skip=$skip count=$count of=datalen 2>/dev/null
(( $(<datalen wc -c) != count )) && { echo "INFO: End-Of-File" ; break ; }
strlen=$((0x$(<datalen xxd -ps)))  # xxd is shipped as part of the 'vim-common' package
This is probably the best method if you don't need the result in human-readable form, and it has the added plus that you can skip over so many bytes of data. The ability to skip bytes might allow for a fast lookup. Dare I say, a database written in Bash! (Now there's an academic exercise.)
However, the dd syntax is a bit esoteric and it doesn't give the answer in human-readable form. As a consequence, one might prefer hexdump.
Final Thoughts
As noted above, this all may be an academic exercise and a way of learning about binary operations in bash. However, if one can make significant performance gains by using binary operations, then it can not only speed up installation times but also speed up build systems. Typically the list of installed packages isn't large, so creating a file in RAM to represent each installed package isn't a huge cost in either RAM or I/O usage. The biggest bottleneck, however -- aside from potentially download times -- is processing the package databases.
Linux distributions typically use text files in their package management systems because they are easy to troubleshoot and the quantity of data isn't that large. One might be able to convert these into a database by copying them to RAM (i.e. the /tmp folder) and then indexing each entry. This would be a very fast ad hoc database. A binary search could then be used to find the starting byte of a given record in the package database, seeking to the appropriate point in the text file via the dd command.
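A minimal sketch of that ad-hoc database idea (the database path and "name|version" record format are invented for illustration; also note I build the index with an associative array here rather than the binary search mentioned above -- the dd seek step is the same either way):

```shell
# Sketch only: the database path and one-record-per-line
# "name|version" format are invented for illustration.
db=/tmp/pkgdb.txt
printf '%s\n' 'alpha|1.0' 'bravo|2.3' 'charlie|0.9' > "$db"

# Build an index of the starting byte of each record.
declare -A offset
pos=0
while IFS= read -r line; do
  offset[${line%%|*}]=$pos
  pos=$(( pos + ${#line} + 1 ))  # +1 for the newline
done < "$db"

# Jump straight to a record with dd instead of scanning the file.
lookup() {
  dd if="$db" bs=1 skip="${offset[$1]}" 2>/dev/null | head -n 1
}

lookup bravo  # prints: bravo|2.3
```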
Yes, bash might not be the best language for this, but the ideas could be ported to any language. The advantage of using bash is that it is almost always available on Linux, even on very small systems, and much of the package management tooling is already written in bash.