A Coding Implementation to Recover Hidden Malware IOCs with FLARE-FLOSS Beyond Classic Strings Analysis

In this tutorial, we explore how FLARE-FLOSS recovers hidden and obfuscated strings from a Windows PE file. We begin by setting up FLOSS and the MinGW-w64 cross-compiler. We then synthesize a small malware-like executable that stores strings in four ways: plain static strings, stack-built strings, tight strings, and XOR-encoded strings that are decoded at runtime. After that, we compare the limits of the traditional strings utility with FLOSS's deeper static analysis and emulation-based string recovery. Through this process, we learn how analysts can uncover URLs, registry paths, suspicious APIs, and other indicators of compromise that plain string extraction often misses.

import subprocess, os, sys, json, re, time
from pathlib import Path


def banner(t): print("\n" + "═"*72 + f"\n  {t}\n" + "═"*72)
def sh(cmd, quiet=False, check=False):
   r = subprocess.run(cmd, shell=True, capture_output=True, text=True)
   if not quiet:
       if r.stdout: print(r.stdout.rstrip()[:4000])
       if r.returncode and r.stderr: print("[stderr]", r.stderr.rstrip()[:1500], file=sys.stderr)
   if check and r.returncode: raise RuntimeError(cmd)
   return r


banner("STEP 1 — Install FLOSS + MinGW-w64")
sh("pip install -q flare-floss")
sh("apt-get -qq update && apt-get -qq install -y mingw-w64 binutils-mingw-w64", quiet=True)
sh("floss --version 2>&1 | head -3")

We set up the core Python imports, a banner helper, and the shell command runner used throughout the tutorial. We then install FLARE-FLOSS and the MinGW-w64 cross-compiler. Finally, we verify the FLOSS installation by checking its version before moving on to executable generation.
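
Because the install commands above run quietly, a missing tool may only surface later as a confusing failure. The following is a minimal sketch, not part of the original workflow, that checks whether the command-line tools are on PATH. The helper name `missing_tools` is hypothetical, and the tool names are assumed from the commands this tutorial invokes.

```python
import shutil

def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Tool names assumed from the commands used in this tutorial.
REQUIRED = ["floss", "x86_64-w64-mingw32-gcc", "strings"]

missing = missing_tools(REQUIRED)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

Running this before Step 2 makes it easier to tell an installation problem apart from a compilation or analysis problem.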

banner("STEP 2 — Build a synthetic malware-like PE")
WORK = Path("/content/floss_tutorial"); WORK.mkdir(parents=True, exist_ok=True); os.chdir(WORK)


SECRETS = [
   ("FAKE_FLAG_DECODED_SECRET",                0x37),
   ("https://c2-totally-fake.example/beacon",  0x42),
   ("SOFTWARE\\Microsoft\\Run\\PersistDemo",   0x5A),
   ("kernel32.dll!VirtualAllocEx",             0x29),
]
def xor_arr(s, k): return ",".join(f"0x{(ord(c)^k)&0xff:02x}" for c in s)


c = [
   '#include <stdio.h>',
   '__attribute__((noinline)) static void xord(char* b, int n, int k){',
   '    for (int i = 0; i < n; i++) b[i] ^= (char)k;  /* in-place XOR decode */',
   '}',
   'int main(void){',
   '    puts("PLAIN_STATIC_HELLO_FROM_FLOSS_TUTORIAL");',
   '',
   '    volatile char stk[20];',
]
seq = "STACK_BUILT_STRING"
for i, ch in enumerate(seq): c.append(f"    stk[{i}]='{ch}';")
c += [f"    stk[{len(seq)}]=0;", "    puts((char*)stk);", "",
     "    volatile char tght[]={'T','I','G','H','T','-','S','T','R',0};",
     "    puts((char*)tght);", ""]
for i,(s,k) in enumerate(SECRETS):
   c += [f"    char enc{i}[] = {{ {xor_arr(s,k)}, 0x00 }};",
         f"    xord(enc{i}, {len(s)}, 0x{k:02x});",
         f"    puts(enc{i});"]
c += ["    return 0;", "}"]
(WORK/"sample.c").write_text("\n".join(c))
sh("x86_64-w64-mingw32-gcc -O0 -fno-stack-protector -o sample.exe sample.c -static-libgcc", check=True)
print(f"\n✓ sample.exe built ({(WORK/'sample.exe').stat().st_size:,} bytes)")
sh("file sample.exe")

We create a synthetic Windows PE file that serves as a safe sample for learning string recovery. We plant strings using four techniques: plain static text, a stack-built string assigned one character at a time, a tight string, and XOR-encoded secrets that the xord routine is meant to decode in place at runtime. We then compile the generated C source into sample.exe so FLOSS can analyze it like a real Windows executable.
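
To make the XOR technique concrete, here is a small pure-Python sketch of the same single-byte XOR round trip. The function name `xor_bytes` is ours, not part of the tutorial code; the secret and key are taken from the SECRETS list above.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

plain = b"https://c2-totally-fake.example/beacon"  # one of the planted secrets
key = 0x42                                         # its key in SECRETS

encoded = xor_bytes(plain, key)    # the form stored in the compiled binary
decoded = xor_bytes(encoded, key)  # what a decoder routine reconstructs at runtime

print("encoded:", encoded.hex())
print("decoded:", decoded.decode())
```

Only the encoded bytes exist in the file on disk, which is why a tool that scans for readable text does not see the URL.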

banner("STEP 3 — Classic `strings` baseline (what gets MISSED)")
classic = set(subprocess.run("strings -a -n 6 sample.exe", shell=True,
             capture_output=True, text=True).stdout.splitlines())
print(f"`strings` extracted {len(classic):,} candidates total.")
print("Coverage of our planted secrets in plain `strings`:")
planted = ["PLAIN_STATIC_HELLO_FROM_FLOSS_TUTORIAL", "STACK_BUILT_STRING", "TIGHT-STR"] + [s for s,_ in SECRETS]
for s in planted:
   hit = any(s in line for line in classic)
   print(f"  {'✓ FOUND ' if hit else '✗ MISSED'}  {s}")


banner("STEP 4 — Run FLOSS (vivisect static + emulation; ~30–90 s)")
t0 = time.time()
sh("floss --json sample.exe > floss.json 2> floss.log")
print(f"\n[FLOSS finished in {time.time()-t0:.1f}s]")
print("--- last lines of FLOSS log ---")
sh("tail -15 floss.log")

We run the traditional strings command first to understand what a basic string extraction tool can and cannot detect. We compare each planted secret against the classic output to identify which strings are found and which are missed. We then run FLOSS on the executable and save both the JSON output and the log file for deeper structured analysis.
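
The reason the baseline misses encoded values can be shown with a rough approximation of what a strings-style scan does: it looks for runs of printable ASCII bytes. This is a simplified sketch under that assumption, not the actual implementation of the strings utility, and `ascii_strings` is a hypothetical helper name.

```python
import re

def ascii_strings(blob: bytes, min_len: int = 6):
    """Return runs of printable ASCII that are at least `min_len` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]

# Synthetic blob: one plaintext string plus one XOR-encoded URL (key 0x42).
secret = b"https://c2-totally-fake.example/beacon"
blob = b"\x00\x01PLAIN_HELLO\x00\xff" + bytes(b ^ 0x42 for b in secret) + b"\x00"

found = ascii_strings(blob)
print("plaintext recovered:", "PLAIN_HELLO" in found)
print("URL recovered:      ", any("https://" in s for s in found))
```

The encoded bytes may still appear in the output as a run of unreadable printable characters, but the URL itself is never reported because it only exists after decoding.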

banner("STEP 5 — Parse FLOSS JSON output")
with open("floss.json") as f: data = json.load(f)


def extract(key):
   out = []
   for e in data.get("strings", {}).get(key, []):
       if isinstance(e, dict): out.append(e)
       else: out.append({"string": e})
   return out


static_s, stack_s = extract("static_strings"), extract("stack_strings")
tight_s,  decoded_s = extract("tight_strings"),  extract("decoded_strings")
buckets = {"static": static_s, "stack": stack_s, "tight": tight_s, "decoded": decoded_s}


print(f"  metadata.version : {data.get('metadata', {}).get('version','?')}")
for k,v in buckets.items(): print(f"  {k+'_strings':<17}: {len(v):>5}")


print("\nDecoded strings recovered (with decoder routine info):")
for e in decoded_s:
   s = e.get("string","")
   rtn = e.get("decoding_routine"); addr = e.get("address")
   rtn_s = f"0x{rtn:x}" if isinstance(rtn,int) else str(rtn)
   addr_s = f"0x{addr:x}" if isinstance(addr,int) else str(addr)
   print(f"  decoder={rtn_s:<12} at={addr_s:<12} → {s!r}")
print("\nStack / tight strings recovered:")
for e in stack_s + tight_s: print(f"  → {e.get('string','')!r}")

We load the FLOSS JSON output and organize the recovered strings into static, stack, tight, and decoded categories. We print the metadata and string counts to understand the overall recovery results. We also inspect decoded, stack, and tight strings to see which hidden values FLOSS successfully extracts.
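
A useful follow-up question is which recovered strings are absent from the static bucket, since those are the ones obfuscation was hiding. The sketch below works on a hand-written stand-in dictionary, not real FLOSS output; it only mirrors the keys that the parsing code above reads. The helper names `strings_by_bucket` and `only_hidden` are hypothetical.

```python
# Hand-written stand-in for floss.json (NOT real FLOSS output). The key names
# mirror the ones read by the tutorial's parsing code; the values are made up.
SAMPLE = {
    "metadata": {"version": "x.y.z"},
    "strings": {
        "static_strings": [{"string": "PLAIN_STATIC_HELLO_FROM_FLOSS_TUTORIAL"}],
        "stack_strings": [{"string": "STACK_BUILT_STRING"}],
        "tight_strings": [],
        "decoded_strings": [{"string": "FAKE_FLAG_DECODED_SECRET"}],
    },
}

BUCKETS = ("static_strings", "stack_strings", "tight_strings", "decoded_strings")

def strings_by_bucket(doc):
    """Map each bucket name to a plain list of string values."""
    section = doc.get("strings", {})
    out = {}
    for key in BUCKETS:
        entries = section.get(key) or []
        out[key] = [e.get("string", "") if isinstance(e, dict) else str(e)
                    for e in entries]
    return out

def only_hidden(doc):
    """Strings from the stack/tight/decoded buckets that are not static strings."""
    by = strings_by_bucket(doc)
    static = set(by["static_strings"])
    hidden = by["stack_strings"] + by["tight_strings"] + by["decoded_strings"]
    return [s for s in hidden if s not in static]

print(only_hidden(SAMPLE))
```

On a real run, passing the loaded `data` dictionary instead of `SAMPLE` would give the list of values that only deeper analysis produced, assuming the JSON layout matches what the parsing code expects.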

banner("STEP 6 — IOC hunting in the deobfuscated strings")
PATTERNS = [
   ("URL",          re.compile(r"https?://[^\s\"<>]+")),
   ("IP",           re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
   ("PE/script",    re.compile(r"[A-Za-z0-9_]+\.(?:exe|dll|sys|ps1|bat)\b", re.I)),
   ("Win32 API",    re.compile(r"\b(?:Reg(?:Open|Set|Create|Delete)Key(?:Ex)?A?|VirtualAlloc(?:Ex)?|CreateRemoteThread|WinExec|LoadLibraryA?|GetProcAddress|InternetOpenA?)\b")),
   ("Registry",     re.compile(r"SOFTWARE\\\\?[A-Za-z0-9_\\\\]+", re.I)),
   ("Base64-like",  re.compile(r"\b[A-Za-z0-9+/]{24,}={0,2}\b")),
]
hits = []
for kind, items in buckets.items():
   for e in items:
       s = e.get("string","")
       for label, pat in PATTERNS:
           if pat.search(s): hits.append((kind, label, s))


if hits:
   print(f"{'BUCKET':<10}{'IOC':<14}STRING")
   print("-"*72)
   for kind,lbl,s in hits[:40]:
       print(f"{kind:<10}{lbl:<14}{s[:80]}")
   print(f"\n→ {len(hits)} IOC hits total. Note: hits in the 'stack', 'tight', and 'decoded' buckets")
   print("  would be invisible to plain `strings`.")
else:
   print("(no IOC pattern matches)")


banner("STEP 7 — Visualize string-type counts and length distribution")
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 4.5))


labels = list(buckets); counts = [len(v) for v in buckets.values()]
bars = ax1.bar(labels, counts, color=["#5fa8d3","#62b6cb","#cae9ff","#ff7b7b"])
ax1.set_title("FLOSS strings by type"); ax1.set_ylabel("count")
for b,n in zip(bars,counts): ax1.text(b.get_x()+b.get_width()/2, n, str(n), ha="center", va="bottom")


for kind, items in buckets.items():
   lens = [len(e.get("string","")) for e in items]
   if lens: ax2.hist(lens, bins=30, alpha=0.55, label=f"{kind} (n={len(lens)})")
ax2.set_title("String-length distribution"); ax2.set_xlabel("characters")
ax2.set_ylabel("frequency (log)"); ax2.set_yscale("log"); ax2.legend()
plt.tight_layout(); plt.savefig("floss_summary.png", dpi=110); plt.show()


print("\n✓ Tutorial complete.")
print(f"   Artifacts: {WORK/'sample.exe'}, {WORK/'floss.json'}, {WORK/'floss_summary.png'}")

We search all recovered strings for useful indicators such as URLs, IP addresses, DLL names, Win32 APIs, registry paths, and base64-like values. We display each IOC match with its corresponding string bucket so we can understand where important evidence appears. We finish by visualizing string counts and length distributions, then save the final summary image as an artifact.
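
The matching loop above can report the same string more than once when it appears in several buckets. As a minimal sketch of one way to tidy that up, the following classifies each distinct string once. The pattern set is a trimmed, simplified variant of the tutorial's PATTERNS, and the helper names `classify` and `unique_hits` are hypothetical.

```python
import re

# Simplified variants of the tutorial's patterns, for illustration only.
PATTERNS = [
    ("URL",      re.compile(r"https?://[^\s\"<>]+")),
    ("File",     re.compile(r"[A-Za-z0-9_]+\.(?:exe|dll|sys|ps1|bat)\b", re.I)),
    ("API",      re.compile(r"\b(?:VirtualAlloc(?:Ex)?|CreateRemoteThread|LoadLibraryA?|GetProcAddress)\b")),
    ("Registry", re.compile(r"SOFTWARE\\[A-Za-z0-9_\\]+", re.I)),
]

def classify(s):
    """Return the labels of every pattern that matches the string."""
    return [label for label, pat in PATTERNS if pat.search(s)]

def unique_hits(strings):
    """Map each distinct matching string to its labels, keeping first-seen order."""
    hits = {}
    for s in strings:
        labels = classify(s)
        if labels:
            hits.setdefault(s, labels)
    return hits

samples = [
    "FAKE_FLAG_DECODED_SECRET",
    "https://c2-totally-fake.example/beacon",
    "SOFTWARE\\Microsoft\\Run\\PersistDemo",
    "kernel32.dll!VirtualAllocEx",
    "https://c2-totally-fake.example/beacon",  # duplicate on purpose
]
for s, labels in unique_hits(samples).items():
    print(f"{', '.join(labels):<12} {s}")
```

Keeping the labels as a list per string also makes it easy to see values that match more than one category, such as a DLL name combined with an API name.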

In conclusion, we built a complete hands-on workflow for analyzing obfuscated strings in a synthetic Windows executable using FLARE-FLOSS. We saw how simple command-line string extraction can miss important evidence, while FLOSS can recover decoded, stack-based, and tightly constructed strings that are useful during malware triage. We also parsed FLOSS's JSON output, hunted for IOC patterns, and visualized the recovered string categories to make the results easier to understand. This workflow gives us a practical foundation for using FLOSS in reverse engineering, malware analysis, and security research.

The post A Coding Implementation to Recover Hidden Malware IOCs with FLARE-FLOSS Beyond Classic Strings Analysis appeared first on MarkTechPost.
