read file of "magic numbers" (DEADCAFEh, etc.) into array
given output from DumpPE -disasm:
    get Win32 filename from first line of dumppe output
    ignore everything until "Disassembly" is seen
    for each line in disassembly
        if it's a function label
            if opstring length > minlength and
               opstring contains at least one ret or jmp
                if opstring is at maxlength, its tail might be junk, so:
                    chop it off at the first ret or jmp
                output filename, previous function name, and opstring
            set up for next opstring:
               set new function name; clear opstring; oplength = 0
        else if it's an instruction line, and opstring isn't at maxlength
            if line contains a large hex operand
                if the hex operand is found in the "magic" array
                   add mnemonic "_" magic number to opstring
            else if it's a Windows API call
                add API name to opstring without trailing 'W' or 'A'
            else if it's a branch target
                add "loc" to opstring
            else if it's common (mov, push, pop, add esp)
                if it's an API mov
                   add "mov_" and API name to opstring
                else
                   do nothing
            else if it's junk code (nop, int 3, etc.)
                do nothing
            else if it's data ; relying on DumpPE to separate code/data
                do nothing
            else if the same instruction has repeated many times in a row
                do nothing
            else if mnemonic has prefix (rep, lock, etc.)
                add prefix "_" mnemonic to opstring
            else
                add mnemonic to opstring
            if added to opstring
                oplength++
    stop processing when the hex dump is seen
    output the last function's opstring
send output to mkmd5db

Figure 4: Pseudocode for the Opstring program.
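
The per-instruction rules and the output test in Figure 4 can be sketched in Python. This is only an illustration under stated assumptions, not the Opstring program itself: it starts from already-parsed instructions rather than raw DumpPE text, and the Insn record, the LARGE and MAX_REPEATS thresholds, the minlength/maxlength defaults, and the space-separated output are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional

LARGE = 0xFFFF            # assumed cutoff for a "large" hex operand
MAX_REPEATS = 3           # assumed meaning of "many times in a row"
COMMON = {"mov", "push", "pop"}
JUNK = {"nop", "int3"}
TERMINATORS = {"ret", "jmp"}

@dataclass
class Insn:
    """Hypothetical pre-parsed disassembly line (not DumpPE's own format)."""
    mnemonic: str
    prefix: Optional[str] = None       # rep, lock, ...
    hex_operand: Optional[int] = None
    api_name: Optional[str] = None     # set when the operand names an API
    is_branch_target: bool = False
    is_data: bool = False
    adjusts_esp: bool = False          # "add esp, N"

def strip_aw(name):
    """Drop a trailing 'A' or 'W' so ANSI/Unicode variants compare equal."""
    return name[:-1] if name.endswith(("A", "W")) else name

def classify(insn, magic, run):
    """Return the opstring token for one instruction, or None to skip it."""
    m = insn.mnemonic
    if insn.hex_operand is not None and insn.hex_operand > LARGE:
        # Literal reading of the figure: non-magic large operands add nothing.
        return f"{m}_{insn.hex_operand:08X}" if insn.hex_operand in magic else None
    if m == "call" and insn.api_name:
        return strip_aw(insn.api_name)
    if insn.is_branch_target:
        return "loc"
    if m in COMMON or insn.adjusts_esp:
        return "mov_" + insn.api_name if (m == "mov" and insn.api_name) else None
    if m in JUNK or insn.is_data:
        return None
    if run > MAX_REPEATS:
        return None
    if insn.prefix:
        return f"{insn.prefix}_{m}"
    return m

def build_opstring(insns, magic, maxlength=32):
    """Collect tokens for one function, stopping at maxlength."""
    tokens, last, run = [], None, 0
    for insn in insns:
        if len(tokens) >= maxlength:
            break
        run = run + 1 if insn.mnemonic == last else 1
        last = insn.mnemonic
        tok = classify(insn, magic, run)
        if tok is not None:
            tokens.append(tok)
    return tokens

def finish(tokens, minlength=4, maxlength=32):
    """Apply the output test; return the opstring text or None."""
    ends = [i for i, t in enumerate(tokens) if t in TERMINATORS]
    if len(tokens) <= minlength or not ends:
        return None
    if len(tokens) >= maxlength:
        # Assumption: the first ret/jmp itself is kept when chopping.
        tokens = tokens[: ends[0] + 1]
    return " ".join(tokens)

magic = {0xDEADCAFE}
insns = [
    Insn("push"),
    Insn("mov", hex_operand=0xDEADCAFE),
    Insn("call", api_name="CreateFileW"),
    Insn("nop"),
    Insn("stosd", prefix="rep"),
    Insn("xor"),
    Insn("ret"),
]
print(finish(build_opstring(insns, magic)))
```

In this sketch the push and nop are dropped, the magic constant is folded into its mnemonic, and CreateFileW is recorded as CreateFile, so two builds of the same function that differ only in padding or in ANSI/Unicode API variants would yield the same opstring.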
