AWK code that writes AWK code


#1 Post by s243a »

Intro

I thought some people might find AWK code that writes AWK code interesting. I know it is the type of thing that the style police hate. The idea here is to generate code that quickly computes a canonical alias. By canonical, I mean a standard form: we choose just one alias to be the canonical alias, and every alias for the same pkg returns the same canonical result.

Background

This will be applied to finding categories as follows. Puppy has a file that lists the packages in each category:
/usr/local/petget/categories.dat


Each category is defined like an array, in the form:

Code: Select all

PKGCAT_Desktop_applet=" gfontsel glipper minixcal xclipboard "
/usr/local/petget/categories.dat#L12

AWK uses associative arrays (also known as dictionaries, maps, hashmaps, or hashtables; the exact name depends on the implementation). For instance, if the associative arrays are not implemented via a hash table, then some of these names aren't applicable. Typically, though, associative arrays have fast lookup, like a hash table.

We can instead use the package name as the key and the category as the value. This way we can quickly look up the category. However, the name of the package might not be in the same format as in categories.dat. We might have to trim a version number off it or translate it to a different alias for the package (different Linux distros name the packages differently).
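
To make the package-as-key idea concrete, here is a minimal sketch and not the code from my scripts. It assumes every relevant line of categories.dat has the form PKGCAT_<category>=" pkg1 pkg2 ", and it simply uses whatever follows PKGCAT_ as the category. The function name pkg_category and the "unknown" fallback are made up for illustration. It sticks to portable awk so that it should run under gawk, mawk, or busybox awk.

```shell
# Hypothetical helper: pkg_category <categories-file> <pkg>...
# Prints "pkg|category" for each package name given on the command line.
pkg_category() {
  cat_file=$1; shift
  printf '%s\n' "$@" | awk '
    # First input (the categories file): build CAT[package] = category.
    NR == FNR {
      if ($0 ~ /^PKGCAT_[A-Za-z0-9_]+="/) {
        eq = index($0, "=")
        category = substr($0, 8, eq - 8)   # text between "PKGCAT_" and "="
        list = substr($0, eq + 1)
        gsub(/"/, "", list)                # drop the surrounding quotes
        n = split(list, pkgs, " ")         # " " means split on runs of whitespace
        for (i = 1; i <= n; i++) CAT[pkgs[i]] = category
      }
      next
    }
    # Second input (stdin): one package name per line, one array lookup each.
    { print $0 "|" (($0 in CAT) ? CAT[$0] : "unknown") }
  ' "$cat_file" -
}
```

With the PKGCAT_Desktop_applet line shown above saved in a file, calling pkg_category on that file with the names glipper and foo should print glipper|Desktop_applet and foo|unknown. Note that the NR == FNR trick assumes the categories file is not empty.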

What this code does

This code only solves one part of the problem: the translation of a package name to a different alias.

Test Input

Code: Select all

rxvt-unicode,urxvt,urxvt-unicode
gtk+,gtk+2*
gtkdialog,gtkdialog3
dbus*,libdbus*,libdbus-glib*
mesa,mesa_*,libgl1-mesa*,mesa-common*
sane,sane-backends
samba,samba-tng,samba_*,mountcifs
udev,udev_*,libudev*,libgudev*
xdg_puppy,xdg-utils
perl_tiny,perl-base,perl-modules,perlapi*
* alias_file="$(realpath ~/cat_test_alias.in)"
** This alias data was taken from sc0ttman's pkg (/usr/sbin/pkg#L212)

Test Output

Code: Select all

function get_canonical_name(s){
  if (s in CANONICAL_ARY){
    # exact alias: a single array lookup
    return CANONICAL_ARY[s]
  } else {
    # glob aliases: gawk-only switch with regex cases (patterns are unanchored)
    switch(s){
      case /gtk\+2.*/:
        return "gtk+"
      case /libdbus-glib.*/:
        return "dbus"
      case /libdbus.*/:
        return "dbus"
      case /dbus.*/:
        return "dbus"
      case /mesa-common.*/:
        return "mesa"
      case /libgl1-mesa.*/:
        return "mesa"
      case /mesa_.*/:
        return "mesa"
      case /samba_.*/:
        return "samba"
      case /libgudev.*/:
        return "udev"
      case /libudev.*/:
        return "udev"
      case /udev_.*/:
        return "udev"
      case /perlapi.*/:
        return "perl_tiny"
    }
  }
  # no alias known: the name is its own canonical form
  return s
}
function init_CANONICAL_ARY(){
  CANONICAL_ARY["perl_tiny"]="perl_tiny"
  CANONICAL_ARY["perl-base"]="perl_tiny"
  CANONICAL_ARY["samba-tng"]="samba"
  CANONICAL_ARY["sane-backends"]="sane"
  CANONICAL_ARY["gtkdialog3"]="gtkdialog"
  CANONICAL_ARY["xdg_puppy"]="xdg_puppy"
  CANONICAL_ARY["mountcifs"]="samba"
  CANONICAL_ARY["mesa"]="mesa"
  CANONICAL_ARY["urxvt-unicode"]="rxvt-unicode"
  CANONICAL_ARY["gtkdialog"]="gtkdialog"
  CANONICAL_ARY["perl-modules"]="perl_tiny"
  CANONICAL_ARY["urxvt"]="rxvt-unicode"
  CANONICAL_ARY["rxvt-unicode"]="rxvt-unicode"
  CANONICAL_ARY["xdg-utils"]="xdg_puppy"
  CANONICAL_ARY["gtk+"]="gtk+"
  CANONICAL_ARY["samba"]="samba"
  CANONICAL_ARY["sane"]="sane"
  CANONICAL_ARY["udev"]="udev"
}
The actual generator code can be found on pastebin:

https://pastebin.com/kwmtNern

I also have even rougher code showing how I will use the generated code above:

https://pastebin.com/4Pw5QrW5

I don't recommend looking too closely at either of these pastebin scripts yet because they are not finished. What I will say, though, is that since AWK code is applied repeatedly over a data file, it makes sense to me to optimize it, and if you can optimize AWK code by having it written by other code, then so be it. Style police be damned!
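
For anyone who wants the gist without reading the pastebin scripts, here is a much smaller sketch of the same idea. It is not the pastebin code. It assumes the alias file format shown under Test Input (canonical name first, aliases after it, a trailing * meaning "anything starting with this"). The names gen_canonical and canonical_names are made up for this example. One deliberate difference from the generated code above: instead of a gawk switch with regex cases, it emits index(s, prefix) == 1 tests, which need no regex escaping and should run in any POSIX awk.

```shell
# Hypothetical generator: reads alias lines and prints AWK source on stdout.
gen_canonical() {
  awk -F, '
    {
      canon = $1
      sub(/\*$/, "", canon)              # "dbus*" gives the canonical name "dbus"
      for (i = 1; i <= NF; i++) {
        a = $i
        if (a ~ /\*$/) {                 # glob alias: becomes a prefix test
          sub(/\*$/, "", a)
          globs[++ng] = a; gcanon[ng] = canon
        } else {                         # literal alias: becomes an array entry
          lits[++nl] = a; lcanon[nl] = canon
        }
      }
    }
    END {
      # Emit the lookup table, then the function that consults it.
      print "function init_CANONICAL_ARY() {"
      for (i = 1; i <= nl; i++)
        printf "  CANONICAL_ARY[\"%s\"] = \"%s\"\n", lits[i], lcanon[i]
      print "}"
      print "function get_canonical_name(s) {"
      print "  if (s in CANONICAL_ARY) return CANONICAL_ARY[s]"
      for (i = 1; i <= ng; i++)
        printf "  if (index(s, \"%s\") == 1) return \"%s\"\n", globs[i], gcanon[i]
      print "  return s"
      print "}"
    }
  ' "$1"
}

# Hypothetical driver: canonical_names <alias-file> <pkg>...
# Glues the generated functions to a two-line main program and runs it.
canonical_names() {
  alias_file=$1; shift
  prog="$(gen_canonical "$alias_file")
BEGIN { init_CANONICAL_ARY() }
{ print get_canonical_name(\$0) }"
  printf '%s\n' "$@" | awk "$prog"
}
```

With the test input above as the alias file, canonical_names called with urxvt and libdbus-1-3 should print rxvt-unicode and dbus. The sketch assumes alias names never contain a double quote or a backslash, since those would break the generated string literals.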

Anyway, when I finish everything that I have in mind here, I will apply it towards resolving (at least in part):
Issue #44 in pkg - slack2pup and ppa2pup can't get good package categories

Other AWK Topics

AWK: match($2,/^(.*[^:digit:])([:digit:]*$|$)/,pkg_split)
awk: Converting deb dependency info into puppy format
AWK Based Version Comparison
Find me on [url=https://www.minds.com/ns_tidder]minds[/url] and on [url=https://www.pearltrees.com/s243a/puppy-linux/id12399810]pearltrees[/url].


#2 Post by s243a »

I've made some further progress on this. I was able to parse the following test input:

Code: Select all

...
PACKAGE NAME:  compiz-0.8.8-i586-4.txz
PACKAGE LOCATION:  ./slackware/x
PACKAGE SIZE (compressed):  748 K
PACKAGE SIZE (uncompressed):  5120 K
PACKAGE REQUIRED:  atk,bzip2,cairo,dbus,dbus-glib,expat,fontconfig,freetype,fuse,gcc,gcc-g++,gdk-pixbuf2,glib2,glu,gtk+2,harfbuzz,libICE,libSM,libX11,libXau,libXcomposite,libXcursor,libXdamage,libXdmcp,libXext,libXfixes,libXi,libXinerama,libXrandr,libXrender,libXres,libXxf86vm,libcroco,libdrm,libffi,libpng,librsvg,libwnck,libxcb,libxml2,libxshmfence,libxslt,mesa,pango,pixman,startup-notification,util-linux,xcb-util,xz,zlib
PACKAGE CONFLICTS:  
PACKAGE SUGGESTS:  
PACKAGE DESCRIPTION:
compiz: compiz (OpenGL window and compositing manager)
compiz:
compiz: Compiz is an OpenGL compositing manager that use
compiz: GLX_EXT_texture_from_pixmap for binding redirected top-level windows
compiz: to texture objects. It has a flexible plug-in system and it is designed
compiz: to run well on most graphics hardware.
compiz:
...
~/SLACO_TEST.in

to produce
compiz_0.8.8|compiz|0.8.8|4|Desktop|5120K|slackware/x|compiz-0.8.8-i586-4.txz|+atk,+bzip2,+cairo,+dbus,+dbus-glib,+expat,+fontconfig,+freetype,+fuse,+gcc,+gcc-g++,+gdk-pixbuf2,+glib2,+glu,+gtk+2,+harfbuzz,+libICE,+libSM,+libX11,+libXau,+libXcomposite,+libXcursor,+libXdamage,+libXdmcp,+libXext,+libXfixes,+libXi,+libXinerama,+libXrandr,+libXrender,+libXres,+libXxf86vm,+libcroco,+libdrm,+libffi,+libpng,+librsvg,+libwnck,+libxcb,+libxml2,+libxshmfence,+libxslt,+mesa,+pango,+pixman,+startup-notification,+util-linux,+xcb-util,+xz,+zlib|Compiz is an OpenGL compositing manager that use|slackware|14.2|
Here is a link to the fully generated AWK code:

https://pastebin.com/cCp0sH7N

This test code will be incorporated into slack2pup_gawk, which I hope to merge into sc0ttman's package manager (i.e. pkg).
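
To give a rough idea of the parsing step without opening the pastebin link, here is a stripped-down sketch. It is not the generated code. It handles only the fields that can be read straight from the record (name, version, build, uncompressed size, location, file name, dependencies) and leaves out category, description, and the distro fields. The function name slack_record_to_pup is made up, and it assumes file names follow the name-version-arch-build pattern and that PACKAGE REQUIRED is a plain comma-separated list as in the test input.

```shell
# Hypothetical helper: reads PACKAGES.TXT-style records from files or stdin
# and prints name_ver|name|ver|build|size|location|file|deps per package.
slack_record_to_pup() {
  awk '
    /^PACKAGE NAME:/ {
      file = $3                              # e.g. compiz-0.8.8-i586-4.txz
      base = file
      sub(/\.t[a-z]z$/, "", base)            # drop .txz, .tgz, ...
      n = split(base, p, "-")                # last three parts: version, arch, build
      build = p[n]; ver = p[n-2]
      name = p[1]                            # the name itself may contain "-"
      for (i = 2; i <= n - 3; i++) name = name "-" p[i]
    }
    /^PACKAGE LOCATION:/ { loc = $3; sub(/^\.\//, "", loc) }
    /^PACKAGE SIZE \(uncompressed\):/ { size = $4 $5 }
    /^PACKAGE REQUIRED:/ {
      deps = $3
      if (deps != "") { gsub(/,/, ",+", deps); deps = "+" deps }
    }
    # The description header is the last fixed field, so emit the line here.
    /^PACKAGE DESCRIPTION:/ {
      print name "_" ver "|" name "|" ver "|" build "|" size "|" loc "|" file "|" deps
      deps = ""
    }
  ' "$@"
}
```

Run on the compiz record above, the output line should start with compiz_0.8.8|compiz|0.8.8|4|5120K|slackware/x|compiz-0.8.8-i586-4.txz| followed by the +atk,+bzip2,... dependency list.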

Some related files:
~/cat_test_alias.in Some alias names for packages. I don't think I need this at the moment, but it could be useful if I apply the code to non-Slackware distros (e.g. Debian).

~/cat_test_cat.dat A truncated version of /usr/local/petget/categories.dat for testing.
~/testcat This is the function that I call to drive the test. It contains mostly the awk code that is present in slack2pup_gawk.
~/build_cannonical This contains most of the awk code that is not part of my original slack2pup_gawk code. In this file, the awk code used to translate a package name to a canonical alias is actually generated by awk code, as I noted in my first post.

Anyway, when I get this all working, I'll have to see how much speed improvement there is vs the original slack2pup, which was written by sc0ttman. I'm not sure if slack2pup had performance issues, but I do know that ppa2pup, the related code to convert a Debian repo, did have major speed issues. I think that the code in slack2pup is based on a script in 0setup (see comment by 01micko), and since 0setup is fast enough, maybe slack2pup is too. We'll see. I need to look more into the performance of the current slack2pup and review the related code in 0setup.

Anyway, for this kind of application, my impression is that gawk is faster and more flexible than the ash-based script approaches. It will take further testing for me to verify this.
