“Load the module from memory” sounds like one thing. It is not one thing. Each operating system gives you different primitives, hides different walls, and has a different definition of what counts as “never touched disk.” I’ve been working on paker, a Python library that ships encrypted packages over the network and imports them without writing bytes to the filesystem, and the platform-by-platform reality turned out to be messier than the pitch. Post one in this series covered what paker is and what it’s for; this one is about the plumbing underneath.
The patterns below apply whether you’re building a plugin system, shipping a proprietary SDK, or doing anything where the word “executable” appears without a corresponding path on disk. paker is the concrete example I’ll point at, but the primitives are the OS’s, not mine.
Linux: memfd_create is the clean case
Linux gives you the primitive you want. memfd_create(2) was added in kernel 3.17 and creates an anonymous file descriptor backed entirely by RAM. You write bytes to the fd, then dlopen("/proc/self/fd/N"), and the loader maps the code into executable memory. The kernel never creates a filesystem entry.
import ctypes, os, importlib.util
from importlib.machinery import ExtensionFileLoader
# so_bytes holds the raw bytes of the extension's .so file
libc = ctypes.CDLL(None, use_errno=True)
fd = libc.memfd_create(b"", 0x0001) # empty tag, MFD_CLOEXEC
if fd < 0:
    raise OSError(ctypes.get_errno(), "memfd_create failed")
os.write(fd, so_bytes)
# Import via the procfs path; the fd stays open on purpose
loader = ExtensionFileLoader("module_name", f"/proc/self/fd/{fd}")
spec = importlib.util.spec_from_loader("module_name", loader)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
Outside the process, ls -l /proc/<pid>/fd/ shows memfd: as the link target. Another process can reach the bytes through /proc/<pid>/fd/<fd> (the memfd man page documents exactly this pattern for shared mappings), but it has to pass normal /proc access checks: matching UID, no container isolation, ptrace_scope friendly. There’s no stable filesystem path, only a procfs entry whose accessibility depends on process-access rules.
If you pass an empty tag instead of the module name, the anonymous mapping doesn’t publish what it is.
So far so clean. Then the real gotcha.
glibc can treat a reused /proc/self/fd/N path as the same library
glibc’s dynamic linker reference-counts loaded objects and returns the same handle when the same library is opened again, as dlopen(3) describes. That is fine for normal libraries whose on-disk paths are stable. With memfd, the “path” looks like /proc/self/fd/3, and 3 is just whatever file descriptor number the kernel handed you. Close that fd after loading and the next memfd_create will usually get fd 3 back, because the kernel hands out the lowest free number. In my glibc tests, dlopen("/proc/self/fd/3") then returned the cached handle for the previous library.
I verified this empirically in Docker on glibc: close fd 3, open a new memfd for a different module, dlopen returns the wrong shared object. No error, no warning. The code just calls into the wrong library.
So the fix is to keep the memfd fd open for the lifetime of the process. The mmap’d pages survive an fd close on their own as long as the mapping exists, so the lifetime question isn’t about keeping the code mapped. It’s that closing the fd lets the kernel reuse the same fd number for a different memfd. Once /proc/self/fd/3 can mean a different shared object later, glibc’s loaded-object cache can hand you the wrong handle. Cleaning up the fd is how you reintroduce a collision bug that will fire on the first process loading more than a handful of extensions.
Every primitive has a papercut. This one was mine.
macOS: “zero disk” is a lie on Apple Silicon
Apple Silicon and mandatory code signing killed in-memory loading as a general technique. There is no macOS equivalent of memfd_create. The closest public API is NSCreateObjectFileImageFromMemory, and Meta’s Red Team X documented the behavior change: starting with dyld3, NSLinkModule stopped doing in-memory loading and instead writes a temporary file with an NSCreateObjectFileImageFromMemory-XXXXXXXX fingerprint in $TMPDIR. The API still exists. It just doesn’t do what the name says, and the temp artifact has a predictable fingerprint that monitoring tools can see.
The practical ceiling on modern macOS is “milliseconds on disk.” You mkstemp a file with a random name, write the bytes, dlopen it, and os.unlink immediately. The directory entry disappears but the kernel keeps the mmap’d pages alive for the loaded module.
import os, tempfile, ctypes
# so_bytes holds the raw bytes of the extension's shared library
fd, path = tempfile.mkstemp(prefix="", suffix="")
try:
    os.write(fd, so_bytes)
    os.close(fd)
    # Depending on how the host binary is signed, you may need to disable
    # library validation or otherwise satisfy macOS code-signing rules.
    handle = ctypes.CDLL(path, mode=os.RTLD_NOW)
finally:
    # Runs even if dlopen fails, so the plaintext file is not left behind
    os.unlink(path) # directory entry gone, mmap pages still valid
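The "unlink while mapped" behavior can be checked without a real dylib. This is a minimal sketch using a plain mmap instead of dlopen, assuming POSIX unlink semantics (macOS or Linux) and a non-empty payload; read_after_unlink is a hypothetical name.

```python
import mmap
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Map a temp file, remove its directory entry, then read through the mapping."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        mapping = mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ)
    finally:
        os.close(fd)
        os.unlink(path)  # the name is gone from this point on
    try:
        assert not os.path.exists(path)
        return mapping[:]  # pages are still backed by the unlinked inode
    finally:
        mapping.close()


recovered = read_after_unlink(b"hello from an unlinked file")
print(recovered)
```

The same reasoning applies to the dlopen case: the loader's mappings keep the inode alive, so the path only needs to exist for the duration of the call.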
Signing is its own rabbit hole. Apple’s forums distinguish between library validation (who signed the dylib), unsigned executable memory (JIT-style entitlements), and the deprecated reflective-loading APIs. Different problems, different entitlements. Which one you need depends on your host binary’s signature and whether the bundle you’re loading is ad-hoc signed, team-signed, or unsigned.
If your threat model assumes an attacker with root and a filesystem watcher, the milliseconds-on-disk window is still a window. If your threat model is “no plaintext bytes should survive this process,” macOS lets you get close but never all the way there.
Windows: a full PE loader, until Static TLS stops you
Windows is surprisingly good if you’re willing to bring your own loader. The _memimporter code from py2exe (itself derived from Joachim Bauch’s MemoryModule, MPL-2.0) implements a PE loader in C: VirtualAlloc a region, copy the headers and sections into it, walk the relocation table, resolve the import address table by calling GetProcAddress against LoadLibrary’d dependencies, mark pages executable, call PyInit_<modname>. No filesystem syscalls anywhere in the path.
It works. I’ve shipped numpy, Pillow, and a dozen other packages through it on both x64 and ARM64. The ARM64 work surfaced two real bugs that had been dormant because nobody had tried:
- HOST_MACHINE was hardcoded to IMAGE_FILE_MACHINE_AMD64. On ARM64 the PE header says 0xAA64, and the check was rejecting every valid library.
- After relocations and section copies, ARM64 needs an explicit FlushInstructionCache before executing the new pages. x64 gets away without it because its instruction cache stays coherent with data writes; ARM64 makes no such guarantee and will happily execute the old bytes.
Fixing both of those was a Saturday. Static TLS is not a Saturday.
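For the first of those bugs, the shape of a non-hardcoded check looks roughly like this. It is a Python sketch of the idea, not the C fix itself: the machine constants come from the PE/COFF specification, while the platform.machine() mapping and the function names are my assumptions. synthetic_header builds a fake header prefix purely for demonstration.

```python
import platform
import struct

# Machine constants from the PE/COFF specification.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_MACHINE_ARM64 = 0xAA64

# Assumed mapping from platform.machine() strings to PE machine types.
_HOST_MACHINES = {
    "AMD64": IMAGE_FILE_MACHINE_AMD64,
    "x86_64": IMAGE_FILE_MACHINE_AMD64,
    "ARM64": IMAGE_FILE_MACHINE_ARM64,
    "arm64": IMAGE_FILE_MACHINE_ARM64,
    "aarch64": IMAGE_FILE_MACHINE_ARM64,
    "x86": IMAGE_FILE_MACHINE_I386,
}


def pe_machine(image: bytes) -> int:
    """Return the Machine field from the COFF header of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return machine


def matches_host(image: bytes, host: str = "") -> bool:
    """Compare the image's machine type with the host's instead of a fixed constant."""
    expected = _HOST_MACHINES.get(host or platform.machine())
    return expected is not None and pe_machine(image) == expected


def synthetic_header(machine: int) -> bytes:
    """Build a minimal fake MZ/PE prefix for demonstration; not a loadable image."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    return bytes(dos) + b"PE\0\0" + struct.pack("<H", machine) + bytes(18)


arm64_image = synthetic_header(IMAGE_FILE_MACHINE_ARM64)
print(hex(pe_machine(arm64_image)))
print(matches_host(arm64_image, "ARM64"), matches_host(arm64_image, "AMD64"))
```

One caveat on the assumption: platform.machine() reports the machine, not the process, so a 32-bit interpreter on a 64-bit host would need a different source of truth.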
Static TLS is where my in-memory Windows path stops
Any extension that uses static TLS (__declspec(thread) or thread_local with static initialization) cannot be loaded safely and generally by a normal user-mode PE loader. I hit this first with Rust-based Python extensions, but the reliable test isn’t the language. It’s the PE header: if IMAGE_DIRECTORY_ENTRY_TLS is present and non-empty, the module needs the Windows loader’s TLS machinery.
The reason is where the TLS data lives. TlsAlloc gives you dynamic indices, and TlsSetValue writes through them into TEB->TlsSlots. That one’s fine. But the compiler doesn’t emit TlsAlloc code for static TLS. It emits direct indexing into TEB->ThreadLocalStoragePointer, which is a separate array. That array is managed by LdrpHandleTlsData in ntdll, an undocumented internal function responsible for resizing the array when a new DLL with TLS is loaded. When LdrpHandleTlsData never runs, your extension’s thread-local indexes point into memory that was never allocated for it.
Projects that have tried to work around this all make trade-offs I wasn’t willing to ship:
- Blackbone reimplements LdrpHandleTlsData via pattern-scanning ntdll. It maintains thirteen-plus byte patterns across Windows versions; every Windows update is a potential break.
- Fatmike’s PELoader does handle static TLS for a single loaded module. The author is explicit that simply iterating TLS callbacks isn’t enough for Rust and that custom per-thread TLS-data initialization is required, with a TlsCallbackProxy shim to forward thread events. The project describes itself as experimental on this surface and loads one target at a time.
- Manual slot management with TlsAlloc addresses the wrong array entirely. The one your compiler never touches.
This is why Python freezing tools often keep a real file path in the loop for native extensions. py2exe made that trade-off explicit for Python 3.12+, where bundle_files < 3 is no longer supported. The maintainer closed the issue with “I did not see a viable implementation.” At some point delegating to the OS loader is the boring solution that works.
Accepting that took longer than fixing it. The TEB structure has been stable for decades. There is no user-mode path around it that’s both general and stable, and a library that quietly breaks on the next Windows update is worse than a tempfile.
So for static TLS, paker detects it, writes the extension to a tempfile, and lets LoadLibrary do its job. For the majority of extensions without static TLS, the PE loader runs and nothing touches disk.
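The detection step is the PE-header test described earlier: look up the TLS entry in the optional header's data directory and see whether it is populated. Below is a sketch of that check in Python. The offsets follow the PE/COFF specification; has_static_tls and synthetic_pe64 are illustrative names, and this is not paker's actual code.

```python
import struct

IMAGE_DIRECTORY_ENTRY_TLS = 9  # index into the optional header's data directory


def has_static_tls(image: bytes) -> bool:
    """Return True if the PE image has a non-empty TLS data directory entry."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:    # PE32
        count_off, dirs_off = 92, 96
    elif magic == 0x20B:  # PE32+
        count_off, dirs_off = 108, 112
    else:
        raise ValueError("unknown optional header magic")
    (count,) = struct.unpack_from("<I", image, opt + count_off)
    if count <= IMAGE_DIRECTORY_ENTRY_TLS:
        return False  # the directory is too short to contain a TLS entry
    rva, size = struct.unpack_from(
        "<II", image, opt + dirs_off + 8 * IMAGE_DIRECTORY_ENTRY_TLS
    )
    return rva != 0 and size != 0


def synthetic_pe64(tls_rva: int = 0, tls_size: int = 0) -> bytes:
    """Build fake PE32+ headers for demonstration; not a loadable image."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<H", 0x8664) + bytes(18)
    opt = bytearray(112 + 16 * 8)
    struct.pack_into("<H", opt, 0, 0x20B)
    struct.pack_into("<I", opt, 108, 16)  # NumberOfRvaAndSizes
    struct.pack_into("<II", opt, 112 + 8 * IMAGE_DIRECTORY_ENTRY_TLS, tls_rva, tls_size)
    return bytes(dos) + b"PE\0\0" + coff + bytes(opt)


with_tls = has_static_tls(synthetic_pe64(0x3000, 0x28))
without_tls = has_static_tls(synthetic_pe64())
print(with_tls, without_tls)
```

The check only reads headers, so it can run on the decrypted bytes before deciding which load path to take.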
What you actually get, by platform
| | Linux | macOS (arm64) | Windows (no TLS) | Windows (static TLS) |
|---|---|---|---|---|
| Bytes on disk | None | Milliseconds | None | Lifetime of load |
| Visible filesystem path | No | Briefly | No | Yes |
| Plaintext artifact after crash | None | Possible before unlink | None | Temp file may remain |
| Extra constraints | Keep fd open | Signing / entitlements | PE loader quirks | None, it’s a file |
“Loaded from memory” is not a binary property. Linux gives you a real zero-disk path with one behavioral gotcha. macOS gives you a brief window, not a guarantee, and the gap isn’t closing. Windows gives you zero-disk for most of your bindings and a hard wall for the rest.
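Written as code, the table collapses into a small dispatch. This is only a sketch of the decision, with enum and function names I chose for illustration; it says nothing about how paker structures it internally.

```python
import sys
from enum import Enum


class Strategy(Enum):
    MEMFD = "memfd, fd held open"
    UNLINKED_TEMPFILE = "tempfile, unlinked right after dlopen"
    PE_LOADER = "in-memory PE loader"
    TEMPFILE = "tempfile loaded by the OS loader"


def choose_strategy(platform_name: str, static_tls: bool = False) -> Strategy:
    """Map a sys.platform value (and the static-TLS flag on Windows) to a load path."""
    if platform_name.startswith("linux"):
        return Strategy.MEMFD
    if platform_name == "darwin":
        return Strategy.UNLINKED_TEMPFILE
    if platform_name == "win32":
        return Strategy.TEMPFILE if static_tls else Strategy.PE_LOADER
    raise NotImplementedError(f"no loading strategy for {platform_name!r}")


print(choose_strategy(sys.platform) if sys.platform in ("linux", "darwin", "win32") else "unsupported")
```

Only the Windows branch depends on the module being loaded, which is why the static-TLS check has to run before the strategy is picked.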
If you’re choosing techniques, the question I’d ask isn’t “can I load without touching disk” but “what’s the worst-case footprint when the happy path doesn’t apply.” The packages that break the pattern are usually the ones you most want to ship.
paker hides most of this. pip install paker, paker.dumps(...) on one end, paker.loads(...) on the other, and you get the correct primitive per platform with the gotchas already handled: memfd fds held open on Linux, ephemeral tempfile on macOS, PE loader on Windows with a static-TLS fallback. Source, docs, and a remote-agent example live at github.com/desty2k/paker.
Footnote on memory hygiene: if you’re decrypting bytes before loading, the buffer lifetime matters too. mmap + mlock keeps the plaintext out of swap, MADV_DONTDUMP keeps it out of core dumps, and zeroing via a volatile pointer loop after use prevents the compiler from eliminating the scrub. The ideas carry across platforms even though the exact calls don’t (MADV_DONTDUMP, for one, is Linux-only), and all of it is secondary to the loading primitive itself.
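In Python the closest equivalent is an anonymous mmap that you lock, exclude from dumps, and overwrite before closing. The sketch below is best-effort by design: ScrubbedBuffer is a hypothetical helper, mlock is reached through ctypes and may fail under a low RLIMIT_MEMLOCK, and MADV_DONTDUMP is applied only where the mmap module exposes it. Any bytes objects you copied the plaintext into elsewhere are not covered.

```python
import ctypes
import mmap


class ScrubbedBuffer:
    """Hypothetical helper: anonymous mapping, best-effort locked, zeroed on close."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._map = mmap.mmap(-1, size)  # anonymous, not backed by a file
        if hasattr(mmap, "MADV_DONTDUMP"):  # Linux with Python 3.8+
            try:
                self._map.madvise(mmap.MADV_DONTDUMP)
            except OSError:
                pass
        self.locked = self._try_mlock()

    def _try_mlock(self) -> bool:
        """Ask the kernel to keep these pages out of swap; report whether it agreed."""
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            view = ctypes.c_char.from_buffer(self._map)
            addr = ctypes.c_void_p(ctypes.addressof(view))
            ok = libc.mlock(addr, ctypes.c_size_t(self.size)) == 0
            del view  # release the buffer export so the mmap can be closed later
            return ok
        except (OSError, AttributeError, TypeError):
            return False

    def write(self, data: bytes) -> None:
        self._map[:len(data)] = data

    def read(self, length: int) -> bytes:
        return self._map[:length]

    def scrub(self) -> None:
        self._map[:] = b"\x00" * self.size  # overwrite in place

    def close(self) -> None:
        self.scrub()
        self._map.close()


buf = ScrubbedBuffer(4096)
buf.write(b"secret")
before = buf.read(6)
buf.scrub()
after = buf.read(6)
locked = buf.locked
buf.close()
print(before, after, locked)
```

The locked flag is worth logging rather than asserting on, since containers and unprivileged users often run with small memlock limits.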