John Pierce

. SLAE, Security+

Project Euler exercise 1 in C, Python 3 and x86 assembly language

Same project in C, Python 3 and x86 assembly in linux

I'm doing a bit of work just to refresh my knowledge of syntax, and maybe to learn some new stuff to boot.  I thought it would be cool to program some simple things in several different languages as an exercise.  Here's the first one I worked on:

Euler project #1, sum all the integers from 0 to 1000 divisible by 3 or 5.

The simple solution would be to set up a loop from 0 to 1000, test each number to see if it's divisible, and if so, add that number to a running sum.  Analyzing the problem a bit, though, we know that's a lot more work than necessary.  We could easily limit the universe using just additions and subtractions as follows:

    sum = 0
    for n = 0 to 1000 step 3
        sum += n
    for n = 0 to 1000 step 5
        sum += n
    for n = 0 to 1000 step 15
        sum -= n
    print sum

That still takes a lot of steps, and high school math taught us that we can sum a series easily in one step.  Taking an example case of summing all the positive integers divisible by 3 less than 1000, we get the series {3,6,9,...999}.  We know there are 333 values (999/3), and we know that 999+3 = 996 + 6 = 993 + 9 = 1002.  If we multiply 1002 * 333/2 we get 166,383 which is the sum of the universe we're looking for.  If we sum this value for 3 and for 5 then subtract the duplicates (3 and 5 are both factors so these will be double counted), we'll have the result we're looking for.   Here's a solution that doesn't use any loops:

    sum = 0
    maxMultiplier = 999 div 3    ; get the integer multiplier for maximum multiple of 3
    maxProduct = 3 * maxMultiplier
    sum = (maxProduct+3)*(maxMultiplier/2)
    maxMultiplier = 999 div 5    ; get the integer multiplier for maximum multiple of 5
    maxProduct = 5 * maxMultiplier
    sum += (maxProduct+5)*(maxMultiplier/2)
    maxMultiplier = 999 div 15    ; get the integer multiplier for maximum multiple of 15
    maxProduct = 15 * maxMultiplier
    sum -= (maxProduct+15)*(maxMultiplier/2)

    One could generalize so that the divisors could be changed or the minimum/maximum could change as well.

    Below is the same program written in C, Python 3 and x86 linux.  I'll do the generalization in C and Python, but assembly language will just handle this particular instance.  I wrote newton as a function in each of the programs even though it didn't save many lines of coding in the higher level languages.  This was helpful in the final step which was to debug the C program to see how gcc implements the floating point math.


/* prjeuler1.c
    project euler challenge 1 -
    sum all the numbers divisible by 3 or 5 for 0<x<1000
    This program is generalized to sum from an arbitrary minimum
    to maximum but requires two divisors.  It defaults to given
    arguments if none are provided.


#include <stdio.h>

int Newton(int iMax, int iFact)
        int iMM;        // maximum multiple of factor in universe
        iMM = iMax/iFact;    
        return (int)(iMM+1)*iFact*iMM/2.0;


int main(int argc, char *argv[])
        int iSum = 0;                            
        int iArg1Max = 1000;
        int iArg2Factor1 = 3;
        int iArg3Factor2 = 5;
        if (argc == 4)     // are arguments provided (no check on valid)
            iArg1Max = atoi(argv[1]);
            iArg2Factor1 = atoi(argv[2]);
            iArg3Factor2 = atoi(argv[3]);
        iArg1Max -= 1;    // don't want to include the maximum value in the universe
        iSum += Newton(iArg1Max,iArg2Factor1);
        iSum += Newton(iArg1Max,iArg3Factor2);
        iSum -= Newton(iArg1Max,iArg2Factor1*iArg3Factor2);
        printf("Sum of the numbers less than %d divisible by %d or %d is %d.\n", iArg1Max+1, iArg2Factor1, iArg3Factor2, iSum);


And here are a couple of runs:

Sum of the numbers less than 1000 divisible by 3 or 5 is 233168.

./prjeuler1 10 3 5
Sum of the numbers less than 10 divisible by 3 or 5 is 23.

What about Python?  I'll generate the same output using the same method.


import sys

def Newton(iMax,iFact) :
#	iMM, remainder = divmod(iMax,iFact)
	iMM = iMax//iFact

if len(sys.argv) == 4:	# arguments given
	iArg1Max = int(sys.argv[1])
	iArg2Factor1 = int(sys.argv[2])
	iArg3Factor2 = int(sys.argv[3])
	iArg1Max = 1000
	iArg2Factor1 = 3
	iArg3Factor2 = 5
iSum = Newton(iArg1Max-1,iArg2Factor1)
iSum += Newton(iArg1Max-1,iArg3Factor2)
iSum -= Newton(iArg1Max-1,iArg2Factor1*iArg3Factor2)
print("\nSum of the numbers less than %d divisible by %d or %d is %d.\n" % (iArg1Max,iArg2Factor1,iArg3Factor2,iSum))


And here's some output:  

./ 1000 3 5

Sum of the numbers less than 1000 divisible by 3 or 5 is 233168.

./ 10 3 5

Sum of the numbers less than 10 divisible by 3 or 5 is 23.

Finally, I'll do the same thing in assembly language, without the option of changing the inputs at run time.  I'll have to do both integer and floating point math to use the same technique, so many of the instructions will begin with "f" and will deal with the floating-point processor unit (FPU).  Instructions to convert between integer and float start with "fi" for example fild = load (push) integer to FPU stack, st0.  Here's the code:     


; Filename:     prjeuler1asm.nasm
; Author:   John Pierce

bits 32
global main

section .text

    ; ebx contains iFact
    ; return value = (iMM+1)*iFact*iMM/2
    push ebx                            ; store for later
    mov eax, [iMax]                     ; numerator
    mov ecx, [esp]                      ; iFact in denominator
    xor edx,edx                         ; zero out for remainder
    div ecx                             ; whole number part in eax, remainder in edx
    mov [iMM],eax                       ; iMM = 333 in first case
    fild dword [iMM]                    ; push iMM to fp
    mov eax,2
    push eax
    fild dword [esp]                    ; push 2 to fp
    fdiv                                ; st0 = iMM/2 after inherent pop (166.5 in the first case)
    pop eax                             ; clear the value from the stack, now esp points back to iFact (3 in first case)
    fild dword [esp]                    ; push iFact
    fmul                                ; st0 = iMM/2 * iFact after inherent pop (499.5 in the first case)
    fild dword [iMM]                    ; push retrieved iMM to st0
    fld1                                ; push +1.0 onto the stack
    fadd                                ; st0 = iMM + 1 after inherent pop (334 in the first case), st1 contains iMM/2*iFact
    fmul                                ; st0 = (iMM+1)*iFact*iMM/2 after inherent pop (166833 in the first case)
    fist dword [result]                 ; and store the result
    pop ebx                             ; restore the stack

    mov eax, [iMax]                                ; requirement is "less than" so I'll decrement by 1
    dec eax
    mov [iMax], eax
    mov ebx, [iFact1]                              ; set up for the first call
    call newton
    mov eax, [result]                              ; iSum = 0 accumulate the result
    mov [iSum],eax        
    mov ebx, [iFact2]                              ; and set up for second call
    call newton
    mov eax, [iSum]
    add eax, [result]
    mov [iSum],eax
    mov ebx,[iFact1]
    mov eax,[iFact2]
    xor edx,edx                                    ; zero out, but I know there won't be overflow so no checks
    mul ebx                                        ; multiply the factors together
    mov ebx,eax                                    ; store the factor for the call
    call newton
    mov eax, [iSum]
    sub eax, [result]                              ; subtract the sum for numbers divisible by both
    push eax                                       ; store iSum to the stack
    push dword [iFact2]                            ; factor 1 to stack
    push dword [iFact1]                            ; factor 2 to the stack
    mov eax,[iMax]
    inc eax                                        ; restore iMax to original value
    push eax                                       ; iMax to the stack
    push format_str                                ; *string to the stack
    extern printf
    call printf                                    ; call the c function to print the info
    add esp,20                                     ; restore the stack pointer to where it was at the start
    mov eax,0

section .data
    iMax dd 1000           ; sum numbers less than this
    iFact1 dd 3            ; that are divisible by this
    iFact2 dd 5            ; or this
    iSum dd 0              ; accumulator for the answer
    iMM dd 0               ; maximum multiplier (how many numbers to sum)
    format_str db "Sum of the numbers less than %d divisible by %d or %d is %d.", 10, 0 ; w/ newline and null terminator

section .bss
    result resw 1          ; reserve a dw for Newton result

And the output:

nasm -f elf32 prjeuler1asm.nasm
gcc -m32 -o prjeuler1asm prjeuler1asm.o

Sum of the numbers less than 1000 divisible by 3 or 5 is 233168.

So, having completed this exercise, Python and C look pretty similar.  The instructions only differed in minor ways, so close that I moved the code over from C and edited.  As expected, the x86 assembly took longer, and it's a bit confusing without any comments, but it's not too bad.  Particularly, the use of C libraries can reduce complexity of tasks like printing out results.  It was a fun exercise, and I would recommend it to anyone interested in exploring exactly how processes might be accomplished.  It gives a feel for what is happening under the hood when math is performed.  Having cut my teeth on 8086 assembly (16 bit with no FPU), the x86 instruction set makes a lot of tasks a comparative cake walk.  Now that I'm done, it's time to study the code from gcc  in gdb so maybe I can learn some new tricks. 



disas Newton
Dump of assembler code for function Newton:
   0x0804844d <+0>:	push   ebp
   0x0804844e <+1>:	mov    ebp,esp
   0x08048450 <+3>:	sub    esp,0x18
   0x08048453 <+6>:	mov    eax,DWORD PTR [ebp+0x8]         ; iMax, 0x3e7 default case
   0x08048456 <+9>:	cdq                                    ; convert double to quad
   0x08048457 <+10>:	idiv   DWORD PTR [ebp+0xc]             ; iMax/iFact, 0x3e7/0x3 default case, first call
   0x0804845a <+13>:	mov    DWORD PTR [ebp-0x4],eax 	       ; iMM (333)
   0x0804845d <+16>:	mov    eax,DWORD PTR [ebp-0x4]         ; wasted step, data is already in eax
   0x08048460 <+19>:	add    eax,0x1                         ; inc eax, now iMM+1
   0x08048463 <+22>:	imul   eax,DWORD PTR [ebp+0xc]         ; eax now (iMM+1)*iFact
   0x08048467 <+26>:	imul   eax,DWORD PTR [ebp-0x4]         ; eax now (iMM+1)*iFact*iMM (0x51672 = 333,666)
   0x0804846b <+30>:	mov    DWORD PTR [ebp-0x14],eax
   0x0804846e <+33>:	fild   DWORD PTR [ebp-0x14]            ; push integer to FPU
   0x08048471 <+36>:	fld    QWORD PTR ds:0x8048660          ; push float to FPU
                                                               ; st0            2	
                                                               ; st1            333666	
   0x08048477 <+42>:	fdivrp st(1),st                        ; st1/st0, store in st1, pop st0
                                                               ; st0  166833
   0x08048479 <+44>:	fnstcw WORD PTR [ebp-0x16]             ; store precision control
   0x0804847c <+47>:	movzx  eax,WORD PTR [ebp-0x16]         ; mov and zero extend (eax = 0x37f)
   0x08048480 <+51>:	mov    ah,0xc                          ; alter the top nibble (eax = 0xc7f)
   0x08048482 <+53>:	mov    WORD PTR [ebp-0x18],ax
   0x08048486 <+57>:	fldcw  WORD PTR [ebp-0x18]             ; load control word for rounding, precision, etc. 
   0x08048489 <+60>:	fistp  DWORD PTR [ebp-0x14]            ; store what's in st0 with pop, (166,833)
   0x0804848c <+63>:	fldcw  WORD PTR [ebp-0x16]             ; reset precision, etc.
   0x0804848f <+66>:	mov    eax,DWORD PTR [ebp-0x14]        ; store the result in eax for return
   0x08048492 <+69>:	leave  
   0x08048493 <+70>:	ret    
End of assembler dump.

It doesn't look a whole lot different from mine as far as efficiency.  They did a bunch of integer math first, then loaded to FPU for the div by 2.  I did most of mine within the FPU instead.  A couple of innocuous changes that I didn't implement (setting precision, convert double to quad) were reasonable as they didn't know my fixed constraints.  I think we were both pretty efficient.



Most Recent Articles

First bit::

This is a writeup of the format string vulnerability in level 4 of the 64bitprimer VM from vulnhu

First bit::

Installation of the software to make a yubikey 4 work in FIDO U2F mode on Debian Jessie i386

First bit::

Lesson(s) learned

First bit::

This one stumped me. Overall, it was a great competition for me as I got to learn a whole lot of new things. I had never worked on a Mac, other than as a user, had never used Hopper, lldb or any of the other tools for reversing on a Mac, and haven't got any experience in the Objective C/Swift framework.

First bit::

4 rounds, lots of debugging


Categories: Network security, Videos
First bit::

Explains the workings of a DMZ, walks through setting up and testing of a DMZ in a virtual machine lab environment

Categories: Network security, Videos
First bit::

In this video I go through the process of setting up an SSH tunnel to hide an IP and also setting

Categories: Exploits, Videos
First bit::

Useful for someone who is interested in what a buffer overflow is. Does not go into the details of development, just explains generally and demonstrates the use of one.

Categories: Exploits, Videos
First bit::

a demonstration of a vulnerability discovered and published by Muts in 2004, exploited on a Windows XP SP3 machine using Python, Immunity Debugger, and Metasploit.

Categories: Network security, Videos
First bit::

In this video I demo some simple iptables rules and show them how to perform network traffic analysis to test them out.