A bit of an obscure one.
I am trying to write a scanner for compiled code. I know ELF works in a similar way to the PE header.
Looking at the compiled file from a simple helloworld.c program, I notice there is a lot of padding in the file. I would postulate that much of this padding can be removed so I can get nice compact binary files.
As a side project, what I want to do is scan an executable, check its format, and look for weirdness in the ELF, similar to a PE header check for bizarre sparseness, etc.
But how is ELF implemented in Puppy?
I am using the tweak hex editor to look at the files.
I found this article:
http://www.linuxjournal.com/node/1060/print
readelf is also nice; I have been playing with it today.
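A first pass of such a scanner could validate the 16-byte e_ident block before trusting anything else in the header. This is a minimal sketch assuming glibc's <elf.h> for the constants; the enum and function names are made up for illustration.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for a first-pass scan of e_ident. */
enum ident_result {
    IDENT_NOT_ELF,     /* shorter than e_ident or wrong magic   */
    IDENT_BAD_CLASS,   /* EI_CLASS is neither 32- nor 64-bit    */
    IDENT_BAD_DATA,    /* EI_DATA is neither LSB nor MSB        */
    IDENT_BAD_VERSION, /* EI_VERSION is not EV_CURRENT          */
    IDENT_OK
};

/* Check only the first EI_NIDENT (16) bytes of a file image. */
enum ident_result check_ident(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return IDENT_NOT_ELF;
    if (buf[EI_CLASS] != ELFCLASS32 && buf[EI_CLASS] != ELFCLASS64)
        return IDENT_BAD_CLASS;
    if (buf[EI_DATA] != ELFDATA2LSB && buf[EI_DATA] != ELFDATA2MSB)
        return IDENT_BAD_DATA;
    if (buf[EI_VERSION] != EV_CURRENT)
        return IDENT_BAD_VERSION;
    return IDENT_OK;
}
```

Only when this returns IDENT_OK would the scanner move on to e_machine, the program headers and so on.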
Elf file header
Last edited by wibble on Tue 13 Aug 2013, 04:03, edited 1 time in total.
Sounds like you want sstrip from
http://www.muppetlabs.com/~breadbox/sof ... ckers.html
or
https://github.com/BR903/ELFkickers
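The core idea behind sstrip, as far as it can be summarised, is that the loader only reads what the program headers point at, so anything after the last segment can be cut off. The helper below only computes that cut-off for a 32-bit file whose headers are already in host byte order; it is a simplified sketch with a made-up name, not the ELFkickers code, and a real tool also has to fix up the ELF header so it no longer refers to the removed section header table.

```c
#include <elf.h>
#include <stddef.h>

/* Smallest file size that still holds the ELF header, the program
   header table, and the file contents of every segment, i.e. the
   maximum of p_offset + p_filesz over all program headers. */
size_t segment_end(const Elf32_Ehdr *eh, const Elf32_Phdr *ph, size_t n)
{
    size_t end = (size_t)eh->e_phoff + (size_t)eh->e_phnum * eh->e_phentsize;
    if (end < eh->e_ehsize)
        end = eh->e_ehsize;
    for (size_t i = 0; i < n; i++) {
        size_t seg_end = (size_t)ph[i].p_offset + ph[i].p_filesz;
        if (seg_end > end)
            end = seg_end;
    }
    return end;
}
```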
elf.h is the header that defines the ELF format.
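One thing worth knowing when using elf.h from a scanner: the structs describe the layout, but multi-byte fields are stored in the byte order named by e_ident[EI_DATA], which need not match the host. A minimal sketch of reading e_machine that way (the helper names are made up):

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 16-bit field using the file's own byte order (EI_DATA). */
static uint16_t read_half(const unsigned char *p, unsigned char data)
{
    if (data == ELFDATA2MSB)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* e_machine sits at the same offset (18) in Elf32_Ehdr and Elf64_Ehdr,
   so this works for both classes once e_ident has been validated. */
uint16_t elf_machine(const unsigned char *hdr)
{
    return read_half(hdr + offsetof(Elf32_Ehdr, e_machine), hdr[EI_DATA]);
}
```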
Thanks, that looks like what I need.
Further update...
OK, so I spent a fun evening encoding the current ELF e_machine list.
I started to freak out thinking about where I was going to get code samples for 200+ device families. Feeling dejected, I thought: well, perhaps I can just tweak the ELF header and get the machine to think it's a file from a different machine family...
Loading up tweak, I spot the e_machine number (03 for Intel) and change it to 02: let's make this a file from a SPARC...
Success, well, partial. However, I get this message:
Code:
tiny.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
sh-3.00# tweak goat.o
sh-3.00# file goat.o
goat.o: ELF 32-bit LSB relocatable, SPARC - invalid byte order, version 1 (SYSV), not stripped
sh-3.00# tweak goat.o
sh-3.00# file goat.o
goat.o: ELF 32-bit LSB relocatable, AT&T WE32100 - invalid byte order, version 1 (SYSV), not stripped
What's up with the invalid byte order?
Thinking things through, it was probably a bit newbish to assume I could fool the file program into accepting this as a legit file from another machine type when the other header flags probably didn't make sense for that machine (for example, perhaps SPARC doesn't do 32-bit? Perhaps it's 64-bit, etc.?). I don't know...
This presents a bit of a chore, as I now need to get the documentation for the 200-odd processor/chip families and figure out the ELF file setup for each.
Just assembling the documentation will take ages...
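The "invalid byte order" part of the message has a concrete cause: SPARC and the AT&T WE32100 are big-endian, but the file's e_ident[EI_DATA] byte still says little-endian (the "LSB" in the same output), so file flags the combination. A scanner could do the same check with a machine table that records each architecture's usual byte order. This is an illustrative excerpt with made-up names, not the full list:

```c
#include <elf.h>
#include <stddef.h>

/* Excerpt of an e_machine table with each architecture's usual byte
   order. Bi-endian architectures (ARM, MIPS, PowerPC, ...) would need
   an "either" entry and are left out of this sketch. */
struct machine_info {
    unsigned short em;  /* e_machine value (EM_* from elf.h) */
    const char *name;
    unsigned char data; /* ELFDATA2LSB or ELFDATA2MSB        */
};

static const struct machine_info machines[] = {
    { EM_M32,    "AT&T WE32100",   ELFDATA2MSB },
    { EM_SPARC,  "SPARC",          ELFDATA2MSB },
    { EM_386,    "Intel 80386",    ELFDATA2LSB },
    { EM_68K,    "Motorola 68000", ELFDATA2MSB },
    { EM_X86_64, "x86-64",         ELFDATA2LSB },
};

const struct machine_info *lookup_machine(unsigned short em)
{
    for (size_t i = 0; i < sizeof machines / sizeof machines[0]; i++)
        if (machines[i].em == em)
            return &machines[i];
    return NULL;
}

/* 1 if EI_DATA disagrees with the machine's usual byte order,
   0 if it agrees or the machine is not in the table. */
int byte_order_mismatch(unsigned short em, unsigned char ei_data)
{
    const struct machine_info *m = lookup_machine(em);
    return m != NULL && m->data != ei_data;
}
```

Faking a SPARC file properly would take more than flipping EI_DATA: every multi-byte header field would also have to be byte-swapped, and the code inside would still be x86 instructions.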
Update...
As I suspected.
Code:
sh-3.00# gcc -Wall tiny.c
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
7122 a.out
becomes:
Code:
sh-3.00# ld -s tiny.o
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
240 a.out
And that is without messing too much with the file. OK, I lost portability with this method, but it's now less than a 29th of the size of the original file! Hooray!
But once you start, it's hard to stop; optimization, I have come to find, is the programming equivalent of crack.
Looking at the program, there is still a fair bit of fat on it, and I think it can be reduced further. According to the tutorial, it's possible to get it down to 45 bytes, which is less than a 158th of the original size. Nice! Now I can see how embedded system programs can function in 4096 bytes of memory...
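Checking the ratios against the 7122- and 240-byte sizes reported by wc -c above and the tutorial's 45 bytes:

```latex
\frac{7122}{240} \approx 29.7 \qquad \frac{7122}{45} \approx 158.3
```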
The program, as you can see, was very simple: it just returns the value 42 (the ultimate answer!) as its exit status.
Time to mess with the program some more.
Excellent tutorial!
http://www.muppetlabs.com/~breadbox/sof ... eensy.html
The only thing I found was that the comment is already removed in my version of gcc (all good, no need to remove it like in the example).
The only problem I have so far is...
Code:
sh-3.00# gcc -s -nostdlib tiny.s
tiny.s: Assembler messages:
tiny.s:1: Error: no such instruction:
Hmm... time to go further down the rabbit hole.
In fairness, I can understand why the C libraries are bloated like that: you need a lot of redundant functionality to provide the portability that's necessary for many things.
This is also only a trivial program; with a real application this would certainly be a much more painful process.
Code:
sh-3.00# nasm -f bin -o a.out tiny.asm
sh-3.00# chmod +x a.out
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
91 a.out
91 bytes after rolling my own ELF header...
A nice size reduction from the 240 bytes, but no real magic yet...
Like everything else in life, the last mile is the hardest, I guess...
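For reference, 91 bytes is exactly a 52-byte Elf32_Ehdr, one 32-byte Elf32_Phdr and 7 bytes of code. This sketch builds such an image in memory with the elf.h structs. It assumes a little-endian host (the structs are copied byte-for-byte), and the load address 0x08048000, the 7 code bytes (mov bl,42 / xor eax,eax / inc eax / int 0x80) and the function name are illustrative choices, not the poster's actual file.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define LOAD_ADDR 0x08048000u /* assumed load address */

/* exit(42) on i386 Linux: status in bl, eax = 1 = __NR_exit, int 0x80 */
static const unsigned char exit42[] = {
    0xB3, 0x2A, /* mov bl, 42   */
    0x31, 0xC0, /* xor eax, eax */
    0x40,       /* inc eax      */
    0xCD, 0x80  /* int 0x80     */
};

/* Fill buf with ELF header + one PT_LOAD segment + code.
   Returns the total size, or 0 if buf is too small. */
size_t build_tiny_elf(unsigned char *buf, size_t cap)
{
    const size_t total = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) + sizeof exit42;
    Elf32_Ehdr eh;
    Elf32_Phdr ph;

    if (cap < total)
        return 0;

    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);
    eh.e_ident[EI_CLASS] = ELFCLASS32;
    eh.e_ident[EI_DATA] = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_type = ET_EXEC;
    eh.e_machine = EM_386;
    eh.e_version = EV_CURRENT;
    eh.e_entry = LOAD_ADDR + sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr);
    eh.e_phoff = sizeof(Elf32_Ehdr);
    eh.e_ehsize = sizeof(Elf32_Ehdr);
    eh.e_phentsize = sizeof(Elf32_Phdr);
    eh.e_phnum = 1; /* no section headers at all */

    memset(&ph, 0, sizeof ph);
    ph.p_type = PT_LOAD;
    ph.p_offset = 0; /* map the whole file, headers included */
    ph.p_vaddr = LOAD_ADDR;
    ph.p_paddr = LOAD_ADDR;
    ph.p_filesz = total;
    ph.p_memsz = total;
    ph.p_flags = PF_R | PF_X;
    ph.p_align = 0x1000;

    memcpy(buf, &eh, sizeof eh);
    memcpy(buf + sizeof eh, &ph, sizeof ph);
    memcpy(buf + sizeof eh + sizeof ph, exit42, sizeof exit42);
    return total;
}
```

Writing the buffer out with fwrite() and doing chmod +x should give a file that exits with status 42 on an x86 Linux kernel able to run 32-bit binaries.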
Oh yes, from this point on it was serious mangling of the file format. But surprisingly, the current implementation of the Linux kernel does not actually do a whole lot with the ELF data, so you can use the portions it does not care about to store bytes for other uses.
As I suspected from the beginning, the padding could be used to store program code!
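For the scanner idea from the start of the thread, this suggests one simple heuristic: the spare bytes at the end of e_ident (from EI_PAD onwards) are reserved and should be zero, so non-zero content there is exactly the kind of weirdness a hand-packed executable can leave behind. This sketch only covers e_ident (the function name is hypothetical); which other header fields the kernel ignores is best checked against its ELF loader before adding more rules.

```c
#include <elf.h>
#include <stddef.h>

/* Count non-zero bytes in e_ident[EI_PAD .. EI_NIDENT-1]. These bytes
   are reserved and should be zero, so anything stored here is suspect. */
int ident_pad_nonzero(const unsigned char *ident)
{
    int count = 0;
    for (size_t i = EI_PAD; i < EI_NIDENT; i++)
        if (ident[i] != 0)
            count++;
    return count;
}
```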
But in all honesty, 80% of the work was in shaving off that last 46 bytes. It's just not worth it.
But the thing is, getting to 91 bytes was not the massive slog I thought it would be. With that kind of size optimization I would be well chuffed even with this.
That said, it is extremely cool.
Code:
sh-3.00# nasm -f bin -o a.out tiny.asm
sh-3.00# chmod +x a.out
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
45 a.out
A 45-byte executable...
But then I got to thinking... the exit interrupt was pretty useful. I wonder what the other interrupts do...
Looking up the file:
"/usr/include/asm/unistd.h" 5L, 82C
What the dickens...
Code:
# ifdef __i386__
# include "unistd_32.h"
# else
# include "unistd_64.h"
# endif
No big deal; it looks like that tutorial was written before 64-bit x86 came along...
Anyway, even a simpleton like myself can see it's just a straight choice between the 64-bit and 32-bit headers... let's go with the sane choice (as this is Puppy) and go for unistd_32.h.
Code:
#ifndef _ASM_X86_UNISTD_32_H
#define _ASM_X86_UNISTD_32_H
...
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
#define __NR_break 17
#define __NR_oldstat 18
#define __NR_lseek 19
#define __NR_getpid 20
#define __NR_mount 21
#define __NR_umount 22
#define __NR_setuid 23
#define __NR_getuid 24
#define __NR_stime 25
#define __NR_ptrace 26
#define __NR_alarm 27
#define __NR_oldfstat 28
#define __NR_pause 29
#define __NR_utime 30
#define __NR_stty 31
#define __NR_gtty 32
#define __NR_access 33
#define __NR_nice 34
#define __NR_ftime 35
#define __NR_sync 36
#define __NR_kill 37
#define __NR_rename 38
#define __NR_mkdir 39
#define __NR_rmdir 40
#define __NR_dup 41
#define __NR_pipe 42
...
Wait, that's not such a mind melt: these are system calls! Hmm, this might not be so bad!
So here is a question: where can I find the system call lists for other systems? Perhaps I am being naive, but I would assume the same logic applies to, say, embedded devices?
As a simple exercise, I would like to write a script to enable cross-platform compilation of the code.
All it would need is the table of system call numbers: I pass in the system I want to run the code on, and it writes an asm file for me after looking up the call numbers for that particular platform.
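The lookup at the heart of that script could be as simple as a table keyed by architecture and call name, filled in from each architecture's kernel unistd headers like the unistd_32.h above. This is a hypothetical sketch with only two calls and four targets:

```c
#include <stddef.h>
#include <string.h>

/* Linux system call numbers differ per architecture, so a generator
   must look the number up for its target before emitting assembly. */
struct syscall_entry {
    const char *arch;
    const char *name;
    int number;
};

static const struct syscall_entry table[] = {
    { "i386",    "exit",  1 }, { "i386",    "write",  4 },
    { "x86_64",  "exit", 60 }, { "x86_64",  "write",  1 },
    { "arm",     "exit",  1 }, { "arm",     "write",  4 }, /* EABI */
    { "aarch64", "exit", 93 }, { "aarch64", "write", 64 },
};

/* Return the system call number, or -1 if the pair is not in the table. */
int syscall_number(const char *arch, const char *name)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].arch, arch) == 0 && strcmp(table[i].name, name) == 0)
            return table[i].number;
    return -1;
}
```

The number is not the only thing that changes: the instruction that enters the kernel (int 0x80 on i386, syscall on x86_64, svc on ARM) and the registers that carry the number and arguments differ too, so the generator needs a per-architecture template as well as this table.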
- technosaurus
- Posts: 4853
- Joined: Mon 19 May 2008, 01:24
- Location: Blue Springs, MO
- Contact:
You may find my libc.h useful here
http://murga-linux.com/puppy/viewtopic.php?t=80916
It has examples of how to do syscalls using the __NR_ definitions.
With the right compiler options plus strip/sstrip, it can build a statically linked ELF executable in under 300 bytes.
I am currently splitting it into libc.c and libc.h, with the object file weighing in at under 10 KB. Not all functions are implemented, and they are only optimized for size (hopefully being statically linked and small enough to fit entirely in cache helps with speed).
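For comparison, glibc also lets you make a raw system call by number through its syscall() function, with the SYS_* names from <sys/syscall.h> mapping to the __NR_* value of whatever architecture you compile for. This is the standard glibc route, not the approach used in the libc.h linked above, and it still links against libc, so it will not be as small; the wrapper name is made up.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write a C string with a raw system call. SYS_write expands to the
   target's __NR_write (4 on i386, 1 on x86_64), so no number is
   hard-coded in the source. */
long raw_write(int fd, const char *s)
{
    return syscall(SYS_write, fd, s, strlen(s));
}
```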
Check out my github repositories (https://github.com/technosaurus). I may eventually get around to updating my blogspot (http://bashismal.blogspot.com).