<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>softwaredevelopment - agyild</title>
    <link>https://agyild.writeas.com/tag:softwaredevelopment</link>
    <description>Security Researcher</description>
    <pubDate>Fri, 24 Apr 2026 17:52:56 +0000</pubDate>
    <item>
      <title>Offensive threat modeling &amp; IOC-proof ID generation</title>
      <link>https://agyild.writeas.com/offensive-threat-modeling-and-ioc-proof-id-generation?pk_campaign=rss-feed</link>
      <description>The content below is archived and might not be up-to-date. You can find its latest version on my new personal website. This is Part I of a two-part article about forensics- and attribution-resistant development tradecraft for offensive software development. In this first part, I am going to give some tips and examples on how to apply threat modeling methodology to the development process, and also share a simple technique that I experimented with back when I was researching fingerprinting-resistant data creation and storage methods.</description>
      <content:encoded><![CDATA[<p><strong>The content below is archived and might not be up-to-date. You can find its latest version on my new personal <a href="https://agyild.website/posts/2020/03/offensive-threat-modeling-ioc-proof-id-generation/" rel="nofollow">website</a>.</strong></p>

<hr/>

<p><a href="https://flic.kr/p/dCweus" rel="nofollow"><img alt="Drawing of police lineup of birds" src="https://i.snap.as/9KIEP2e.jpg"></a></p>

<p>This is Part I of a two-part article about forensics- and attribution-resistant development tradecraft for offensive software development. In this first part, I am going to give some tips and examples on how to apply threat modeling methodology to the development process, and also share a simple technique that I experimented with back when I was researching fingerprinting-resistant data creation and storage methods.

<strong>DISCLAIMER:</strong> The following content is for educational use and research purposes only. The author does not condone unlawful acts.</p>

<h2 id="background-threat-modeling-fundamentals">Background &amp; Threat Modeling fundamentals</h2>

<p>If you are developing any sort of software where things should go under the radar, anything <em>static</em> is bad. Static means things stay as they are and can be pointed at anytime and anywhere. Things that can be pointed at can be <em>named</em>, which is something humans have done naturally since they were babies. And as Thomas L. Friedman (2007) once famously <a href="https://web.archive.org/web/20190924022511/https://www.nytimes.com/2007/04/15/magazine/15green.t.html" rel="nofollow">said</a>: <q cite="https://web.archive.org/web/20190924022511/https://www.nytimes.com/2007/04/15/magazine/15green.t.html">In the world of <del>ideas</del> cyber, to name something is to <del>own</del> pwn it.</q></p>

<blockquote><p>Look, ma! It&#39;s a MAGNETICPOLARGOOSE!</p></blockquote>

<p>Yes, we are talking about attribution, which is undesirable, especially if you are conducting a covert operation, because there is no longer a <em>cover</em> to speak of and the adversary knows that it&#39;s <em>you</em>. We are also talking about fingerprinting, which is again undesirable because it can be used for detecting a particular feature of an implant, whether during execution as part of an anti-malware solution or post-execution as part of a <abbr title="Digital Forensics and Incident Response">DFIR</abbr> process. Today, we are going to talk about fingerprinting.</p>

<p>Before we move on, let&#39;s remember the fundamentals of threat modeling and in particular, options for remedying an attack (Shostack, 2014):</p>

<dl>
<dt>Mitigation</dt>
<dd>Doing things to make it harder to take advantage of a threat. (e.g. using multifactor authentication)</dd>
<dt>Elimination</dt>
<dd>Eliminating the feature, not doing it in the first place. (e.g. disabling unused services on a computer)</dd>
<dt>Transference</dt>
<dd>Letting someone or something else handle the risk. (e.g. using antivirus software)</dd>
<dt>Acceptance</dt>
<dd>Accepting the risk by regarding it as manageable and/or not mission critical. (e.g. <a href="https://youtu.be/RNzS66iRcjw" rel="nofollow"><em>not</em> doing this</a> in your everyday life)</dd>
</dl>

<p>The first obvious static part of a piece of software is its executable code section, which is a prime candidate for crafting signatures since it might contain many unique fingerprintable patterns. As for the remedies: you cannot <em>eliminate</em> it because software needs to run and the show must go on. You cannot <em>accept</em> it because you don&#39;t want to be detected. That leaves us with two options: transference and mitigation. You can <em>transfer</em> the threat by using a commercial software packing solution such as the infamous <a href="https://web.archive.org/web/20191115102140/https://vmpsoft.com/" rel="nofollow">VMProtect</a> (but then again, what kind of <abbr title="Advanced Persistent Threat">APT</abbr> would you be, and what would people say behind your back?). And finally, you can <em>mitigate</em> it on your own.</p>
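
<p>The process of elimination above can be sketched in a few lines of Python. This is only an illustration: the <code>Remedy</code> enum and the <code>viable_remedies</code> helper, along with its two boolean parameters, are hypothetical names made up for this sketch and are not taken from Shostack (2014).</p>

<pre><code>from enum import Enum


class Remedy(Enum):
    MITIGATION = "mitigation"
    ELIMINATION = "elimination"
    TRANSFERENCE = "transference"
    ACCEPTANCE = "acceptance"


def viable_remedies(feature_is_required, risk_is_tolerable):
    """Return the remedies left after applying the two constraints."""
    remedies = set(Remedy)
    if feature_is_required:
        # A feature that must exist cannot be eliminated.
        remedies.discard(Remedy.ELIMINATION)
    if not risk_is_tolerable:
        # A risk that cannot be tolerated cannot simply be accepted.
        remedies.discard(Remedy.ACCEPTANCE)
    return remedies


# Executable code: it has to exist, and detection is not tolerable.
print(sorted(r.value for r in viable_remedies(True, False)))
</code></pre>

<p>With both constraints applied, only mitigation and transference remain, which matches the reasoning in the paragraph above.</p>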

<p>There are various approaches to mitigating the fingerprinting of an executable&#39;s code sections. One of the oldest tricks in the book is using a poly/oligo/meta-morphic (seriously, how many are there?) engine to <em>morph</em> the code into many variations that share the same goal, therefore making it <em>not</em> static (Szor, 2005). Each of these techniques has its own caveats and I will not go into detail about them; you can find more information in the referenced book, and also <a href="https://web.archive.org/web/0/https://www.codeproject.com/Articles/1236410/Evolution-of-Polymorphic-Malware" rel="nofollow">here</a> and <a href="https://web.archive.org/web/20200317043325/https://www.codeproject.com/Articles/1165717/Metamorphic-engines" rel="nofollow">here</a>.</p>

<p>More advanced methods aim to mitigate against <em>scanning</em> instead. It makes sense: if you cannot <em>see</em> something, you cannot point at it and <em>name</em> it, and so on. There are many variations of these sorts of techniques. Off the top of my head, they start with simple <a href="https://web.archive.org/web/20191209230111/https://www.codeproject.com/Articles/30815/An-Anti-Reverse-Engineering-Guide#RemovePEHeader" rel="nofollow">removal of the PE header</a> to make it harder to dump the executable and various <a href="https://web.archive.org/web/20200309052254/https://attack.mitre.org/techniques/T1055/" rel="nofollow">process injection</a> techniques to run under the disguise of another, possibly whitelisted, piece of software; they move on to more advanced techniques such as hooking system calls to redirect potentially threatening <abbr title="Application Programming Interface">API</abbr> calls (Blunden, 2012); and they end with even more obscure and highly advanced techniques such as <a href="https://web.archive.org/web/20200131162620/https://revers.engineering/7-days-to-virtualization-a-series-on-hypervisor-development/" rel="nofollow">running code under a hypervisor</a> or an <a href="https://web.archive.org/web/20191207132307/https://arxiv.org/abs/1902.03256" rel="nofollow">SGX enclave</a>.</p>

<p>But one thing that is usually overlooked is the <em>data</em>, which is also the subject of today&#39;s demonstration. Well, actually, code is data as well. But labeling it as code implies that it is executable. Marking something as executable introduces its own limitations: the set of instructions eventually needs to be read by a <abbr title="Central Processing Unit">CPU</abbr>, and therefore its legibility needs to be protected. All of the preceding <i>x</i>morphic code techniques comply with this limitation. And anti-scanning based approaches don&#39;t even touch the code; they just shield it by abusing the access control logic.</p>

<p>When I say data, what I mean is read-writable memory or <em>strings</em>. That&#39;s it. And in many cases, you don&#39;t really need them, because generally something read-writable is meant for humans. And as someone once said to me on <abbr title="Internet Relay Chat">IRC</abbr>, <q>If you are doing strings, you are doing it wrong.</q> (Note that the IRC channel in question once <em>permabanned</em> a user just for asking whether the Windows API has a function that makes an entire string uppercase at once, akin to Python, so take it with a grain of salt.) Anyway, the main idea is that if you are developing low-level software, which is also the go-to approach for developing spyware, strings are usually a byproduct and there are better means to query something. In the end, there is usually an 8-bit or larger numeric value that shows the status of the thing that you are really querying. The same goes for exfiltration: if telemetry collection is part of your operation, don&#39;t just send out a <em>huge</em> message that reads <code>Microsoft Windows 10, Build 18363.418</code>; instead, encode it according to your target specifications as a value between 0 and 255 (by using something like <a href="https://web.archive.org/web/20190904161847/https://docs.microsoft.com/en-us/cpp/cpp/enumerations-cpp?view=vs-2019" rel="nofollow"><code>enum</code></a>, for instance), and send it out as a single byte. Not only do you reduce the time it takes to transmit the message, and therefore the risk of detection, but you also save space. Do not use strings unless you absolutely have to. It may seem like a very simple principle to grasp at first, but you&#39;d be surprised to learn how many developers in the wild make these simple operational mistakes.</p>
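
<p>As a minimal sketch of the single-byte idea, assuming both ends agree on a lookup table in advance: the <code>OsBuild</code> enum, its numbering, the <code>BUILD_TABLE</code> mapping and the <code>encode_build</code>/<code>decode_build</code> helpers below are all hypothetical, and only illustrate the size difference between the string and the encoded value.</p>

<pre><code>from enum import IntEnum


class OsBuild(IntEnum):
    """Hypothetical lookup table; the numbering is arbitrary."""
    UNKNOWN = 0
    WIN10_1903 = 1
    WIN10_1909 = 2


# Hypothetical mapping from a build number string to its table entry.
BUILD_TABLE = {
    "18362": OsBuild.WIN10_1903,
    "18363": OsBuild.WIN10_1909,
}


def encode_build(build_number):
    """Encode a build number string as exactly one byte."""
    return bytes([BUILD_TABLE.get(build_number, OsBuild.UNKNOWN)])


def decode_build(payload):
    """Recover the table entry from the single byte."""
    return OsBuild(payload[0])


text = "Microsoft Windows 10, Build 18363.418"
packed = encode_build("18363")
# Compare the size of the human-readable string with the encoded value.
print(len(text.encode("ascii")), len(packed))
</code></pre>

<p>An enum backed by a single byte caps the table at 256 entries, which is the same 0 to 255 range mentioned above. Anything missing from the table falls back to <code>UNKNOWN</code> rather than raising an error.</p>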

<p>But are strings only meant to be used for conveying a message? Well, no. They are also used as identifiers or names, for instance to get a handle to a <a href="https://web.archive.org/web/20200316150208/https://docs.microsoft.com/en-us/windows/win32/sysinfo/kernel-objects" rel="nofollow">kernel object</a> such as a file, mutex, named pipe, etc.</p>

<p>So, whenever we create a kernel object, including dropping a file on the disk, we need to give it a <em>name</em>. And if you have been following me closely, you should see where this is going. You see, whenever you name a kernel object, in a way you are also leaving a signature with your name on it for forensics to find, document and share with everyone as an <abbr title="Indicator of Compromise">IOC</abbr>. So let&#39;s talk a little bit about that.</p>

<h2 id="fingerprinting-resistant-identifier-generation">Fingerprinting-resistant identifier generation</h2>

<p>So how can we overcome this problem? Simple, just don&#39;t. I mean, try to do without naming kernel objects and implement some other kind of measure. Contrary to what I said before, in this case <em>elimination</em> as a threat modeling remedy is an acceptable solution because data is more expendable than code. If you need to share a pipe or a mutex, pass its handle through some other means that doesn&#39;t require a preshared name. If you need to store data for later use, ask yourself whether it is critical to the mission and whether you can do without state persistence, or consider something like <a href="https://web.archive.org/web/20190508181819/https://wikileaks.org/ciav7p1/cms/page_31227915.html" rel="nofollow">storing it</a> inside the motherboard&#39;s <abbr title="Non-volatile Random Access Memory">NVRAM</abbr> (it still requires an identifier but also has the characteristics of an <em>anti-scan</em> measure to shield against unwanted eyes).</p>

<p>In some rare and limited cases you can <em>transfer</em> the risk, for example by weaponizing previously installed write protection software on the target computer, such as <a href="https://www.faronics.com/products/deep-freeze" rel="nofollow">Deep Freeze</a>. Wrote data on the disk? No problem, Deep Freeze will take care of it (but always <abbr title="Read the Friendly Manual">RTFM</abbr> and test, test, test; what if they have an out-of-band logging solution?). Like I said, those cases are rare, and this one also doesn&#39;t give any protection during execution.</p>

<p>Finally, let&#39;s talk about the real thing: <em>mitigation</em>. Whenever there is a need to name something covertly, one of my favorite techniques is to generate the name <em>randomly</em>. Here are a bunch of strings generated by <a href="https://www.random.org/strings/?num=10&amp;len=20&amp;digits=on&amp;upperalpha=on&amp;loweralpha=on&amp;unique=on&amp;format=html&amp;rnd=new" rel="nofollow">RANDOM.ORG</a>:</p>

<pre><code>s9QZEycn9aGjRBP98LWO
nWnp7DbLtgs8yIt8nRXQ
6GzgLhVPubTHp0GIwrDa
z16lns0dQ15fAzymiYC1
Fe4Peghp3qT4usvKlZxo
4rRQPSlCwRD7S4ALq3HG
PS2JtxlvzW2ICKTBXZit
STExZZd74MT9qJqTezRU
HFn6sFmLFuP9NkgFcUB6
FyqMQqk4GFf543vIv3AA
</code></pre>

<p>However, that doesn&#39;t really solve anything. Even if you were to put them in the executable at compile time, they are still <em>unique</em>. For all we care, we could just say <code>ThisIsARandomString</code> and there would be no difference. So we need to tweak it a little bit, and in order to do that, we need to define what I call <em>hierarchies of time domains for random value generation</em> (if there is already a better name for it, please let me know). Basically, there are several phases where a random value could start its existence, sorted from general to specific:</p>
<ol><li>Global-time (i.e. generated once, inserted into the code and never changed afterwards)</li>
<li>Compile-time (i.e. generating a new value for each new compilation of the code)</li>
<li>Run-time (i.e. generating a value during the execution of the code)</li>
<li>Sub run-time (i.e. generating a value during a <em>specific execution frame</em>)</li></ol>

<p>Those are definitely not set in stone, and the list could be expanded or shrunk according to one&#39;s needs. Generally, every random value we see around that is used in cryptography and such is generated in the run-time domain, because in theory the generated values are not <em>tied</em> to anything except the universe (they are universally random, so to speak) and are therefore more secure. But did you notice that I said <em>tie</em>? Yes, one funny and, in our case, useful characteristic of random number generators is that you can change their time domain as long as you <em>seed</em> them with a value that originates from your target domain.</p>
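
<p>To make the seeding idea concrete, here is a small sketch built on Python&#39;s <code>random.Random</code>. The <code>make_id</code> helper and the seed strings are placeholders of my own, standing in for values taken from whichever domain you pick; the only point is that the same seed always reproduces the same value:</p>

<pre><code class="language-python">import random
import string

ALPHABET = string.ascii_lowercase


def make_id(seed, length=16):
    # A PRNG seeded with the same value always yields the same sequence,
    # so the identifier is "tied" to wherever the seed came from
    prng = random.Random(seed)
    return ''.join(prng.choice(ALPHABET) for _ in range(length))


# Placeholder seeds; in practice they would originate from the chosen domain
same_domain_a = make_id('build-0001')
same_domain_b = make_id('build-0001')
other_domain = make_id('build-0002')

print(same_domain_a == same_domain_b)  # True: same seed, same identifier
print(same_domain_a == other_domain)   # a different seed gives a different sequence
</code></pre>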

<p>To better understand, let&#39;s continue with an example use case. One of the oldest tricks for checking if an application instance is already running is to <a href="https://web.archive.org/web/20200316150011/https://docs.microsoft.com/en-us/windows/win32/termserv/kernel-object-namespaces" rel="nofollow">create a kernel object in the global or per-session namespace</a>, such as a mutex or a semaphore. Basically, if a kernel object with the given name already exists, the application is already running, so there is no need to create a new instance and it is safe to exit. But this method also has its own risks: if you choose a very common name such as <code>Mutex1</code>, there is a high risk that you are going to collide with another application. So the safer practice is to generate a random <abbr title="Globally Unique Identifier">GUID</abbr> value at global-time and use it as the preshared name.</p>

<p>However, although that might reduce the collision risk, the value&#39;s uniqueness also means it can be used as a fingerprint to craft a signature. To overcome that, we need to lower its time domain by seeding it with a value originating from the targeted domain. For instance, if you want to target the <em>per-user</em> execution frame in the sub run-time, you can use the user name and the computer name (to increase entropy, in case the user name is very generic like <em>user</em> or <em>john</em>) as the seed:</p>

<pre><code class="language-python">#!/usr/bin/env python3

# fingresid_poc1.py

import os
import itertools

global_guid = &#39;snxsqgvlslfdhhoykjhtryxsskcagymk&#39;

# Get user and computer names from environment variables (set on Windows);
# default to empty strings so a missing variable does not raise a TypeError
subrun_seed = os.getenv(&#39;USERNAME&#39;, &#39;&#39;) + os.getenv(&#39;COMPUTERNAME&#39;, &#39;&#39;)
if not subrun_seed:
    raise SystemExit(&#39;USERNAME and COMPUTERNAME are not set&#39;)

# XOR each character code of the GUID with the seed, repeating the seed as needed
subrun_guid_ord = [ord(g) ^ ord(s) for g, s in zip(global_guid, itertools.cycle(subrun_seed))]

# &#34;Asciify&#34; the numeric values; for the sake of demonstration we are just distributing them within the 97-122 region, which corresponds to lowercase ASCII
subrun_guid = &#39;&#39;.join(chr((n - 97) % 26 + 97) for n in subrun_guid_ord)
print(subrun_guid) # e.g. iieoipsuuqjcwsnissrnkdtkjlvivwtx
</code></pre>

<p>The preceding <abbr title="Proof of Concept">PoC</abbr> takes <code>global_guid</code> and creates a similar-looking <code>subrun_guid</code> by combining it with the computer and user names, but this time it is in the sub run-time domain and is a unique value just for this computer and user. In a way, it is very similar to <em>salting</em> a password before storing the digest. Note that I chose a simple, low-entropy lowercase ASCII GUID for the sake of demonstration. In production, you should use proper GUIDs, but you will also have to deal with normalizing or <em>asciifying</em> them.</p>

<p>Even though this example was good enough for demonstration, we can still tweak it a little bit further. We can, for instance, combine this <em>mitigation</em> mechanism with a <em>transference</em> one by abusing <abbr title="Address Space Layout Randomization">ASLR</abbr> and the Windows memory model, and also swap the ID generation mechanism for an alternative one.</p>

<pre><code class="language-python">#!/usr/bin/env python3

# fingresid_poc2.py

import ctypes
import random
import string

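# Declare the prototype so the returned handle is not truncated to 32 bits on 64-bit Windows
GetModuleHandleW = ctypes.windll.kernel32.GetModuleHandleW
GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
GetModuleHandleW.restype = ctypes.c_void_p

# Get the base addresses of two commonly linked system DLLs
subrun_address_kernel32 = GetModuleHandleW(&#39;kernel32&#39;)
subrun_address_ntdll = GetModuleHandleW(&#39;ntdll&#39;)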

# Initiate and seed the PRNG with numeric values of the addresses
subrun_prng = random.Random(subrun_address_kernel32 ^ subrun_address_ntdll)

# Craft an identifier in sub run-time that changes randomly every boot
subrun_uid = &#39;&#39;.join(subrun_prng.choice(string.ascii_letters + string.digits) for _ in range(32))
print(subrun_uid) # e.g. GUfK0Jw628yFLmEo2kWctDd31MPAhcU1
</code></pre>

<p>In this example we queried the base addresses of two commonly linked system DLLs and used them as the seed for a sub run-time <abbr title="Pseudorandom Number Generator">PRNG</abbr>. Then we used the resulting PRNG to choose ASCII letters and digits to craft a 32-character identifier that is expected to change on <em>each reboot</em>; in other words, it is in the <em>per-reboot</em> sub run-time execution frame. This works beautifully because system DLLs&#39; base addresses are redetermined during each boot, and ASLR takes care of making things <em>not static</em> by giving us just enough entropy. Here is a more detailed explanation (Yosifovich, Solomon, Ionescu &amp; Russinovich, 2017):</p>

<blockquote><p>...For <abbr title="Dynamic-link Library">DLL</abbr>s, computing the load offset begins with a per-boot, system-wide value called the image bias. This is computed by <code>MiInitializeRelocations</code> and stored in the global memory state structure (<code>MI_SYSTEM_INFORMATION</code>) in the <code>MiState.Sections.ImageBias</code> fields (<code>MiImageBias</code> global variable in Windows 8.x/2012/R2). This value corresponds to the <abbr title="Time Stamp Counter">TSC</abbr> of the current CPU when this function was called during the boot cycle, shifted and masked into an 8-bit value. This provides 256 possible values on 32 bit systems; similar computations are done for 64-bit systems with more possible values as the address space is vast. Unlike executables, this value is computed only once per boot and shared across the system to allow DLLs to remain shared in physical memory and relocated only once. If DLLs were remapped at different locations inside different processes, the code could not be shared. The loader would have to fix up address references differently for each process, thus turning what had been shareable read-only code into process-private data. Each process using a given DLL would have to have its own private copy of the DLL in physical memory.</p></blockquote>

<p>This is great, because now you can use the resulting identifier to create a mutex to check if the application is already running. And since the identifier itself is random and changes across reboots, the name alone cannot serve as a static fingerprint. However, it should be noted that if some other software on the machine uses the exact same method, you are again risking a collision. So you might want to combine it with a compile-time originated value to differentiate yourself.</p>

<h2 id="final-notes" id="final-notes">Final notes</h2>

<p>Basically, what we have done so far is randomize the <em>selection</em> of characters. You should remember that whenever you introduce another layer of randomization, you are making it harder to fingerprint something. If you were to use the last example as it is, forensics would create an IOC such as <q>32-character string consisting of ASCII letters and digits between 0-9</q>. In order to make it more resistant, we could also randomize the <em>length</em> of the identifier. But it&#39;s not that simple.</p>

<p>First of all, if you choose the maximum number of allowed characters as the upper limit for your random length, and you get something like 1337 as a result, chances are it is going to get flagged as an anomaly. Because seriously, what kind of a sick bastard would choose a name that long? That brings us to the disadvantage of randomization: the more random something is, the more behaviorally abnormal it becomes. So the best practice is to choose a range where the lower limit is high enough to make collisions less likely, and the upper limit is low enough to stay under the radar.</p>

<p>And even then <em>entropy analysis</em> could be used to detect weird looking names. But entropy analysis has its own problems. What if some impatient user creates a file with a name like <q>asdjhajdhasdasdadasgqwoekqehasold.xls</q>? (You&#39;d be surprised.) So due to the false positive risk, it could only be used as a secondary signature to further support other IOCs.</p>
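
<p>For reference, here is a rough sketch of what such an entropy check could look like from the analyst&#39;s side. It computes plain Shannon entropy over the characters of a name; the <code>shannon_entropy</code> helper is my own illustration, and any threshold for what counts as <q>weird</q> is left as an assumption for whoever runs it:</p>

<pre><code class="language-python">import math
from collections import Counter


def shannon_entropy(name):
    """Return the Shannon entropy of a string in bits per character."""
    if not name:
        return 0.0
    counts = Counter(name)
    total = len(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Compare an ordinary name, a generated identifier and a keyboard-mashed one
samples = ('report', 'GUfK0Jw628yFLmEo2kWctDd31MPAhcU1', 'asdjhajdhasdasdadasgqwoekqehasold')
for sample in samples:
    print(sample, round(shannon_entropy(sample), 2))
</code></pre>

<p>Scores like these depend heavily on length and alphabet size, which is one more reason a single cut-off value tends to produce false positives.</p>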

<p>Also some consideration should be made regarding the target characteristics. For instance, if the target computer is located in Asia, then Latin characters alone might be enough to raise flags. So it is advisable to adapt the exact methods you choose according to where you are targeting.</p>

<p>When you are modifying the time domain of a random value, always remember that a time domain lower in the hierarchy (that is, a more specific one) will always supersede a higher one. So whenever you combine compile-time with run-time, the resulting value will be in the run-time domain. Whenever you combine a per-reboot execution frame with a <em>per-login</em> one, the result will be in the per-login sub run-time execution frame, etc. The more specific a time frame, the greater its effect. The latest value from the most specific time domain acts as the <em>password</em>, while the previous ones from higher and more generic domains act as the <em>salt</em>.</p>
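
<p>The superseding rule can be illustrated with a short sketch. It assumes the two values are simply hashed together with SHA-256 to form the seed, which is only one of many possible ways to combine them; the <code>combine</code> and <code>make_id</code> helpers and all input strings are hypothetical placeholders:</p>

<pre><code class="language-python">import hashlib
import random
import string


def combine(*values):
    # Hash all inputs together; changing any one of them changes the seed
    digest = hashlib.sha256('|'.join(values).encode('utf-8')).digest()
    return int.from_bytes(digest, 'big')


def make_id(seed, length=16):
    prng = random.Random(seed)
    return ''.join(prng.choice(string.ascii_lowercase) for _ in range(length))


COMPILE_TIME_VALUE = 'placeholder-build-constant'  # hypothetical, fixed per build

# Hypothetical values from a more specific domain, e.g. one per boot
id_boot_1 = make_id(combine(COMPILE_TIME_VALUE, 'boot-1'))
id_boot_2 = make_id(combine(COMPILE_TIME_VALUE, 'boot-2'))

print(id_boot_1 == make_id(combine(COMPILE_TIME_VALUE, 'boot-1')))  # True
print(id_boot_1 == id_boot_2)  # the more specific value decides the outcome
</code></pre>

<p>The compile-time constant stays the same in both calls, yet the result follows the more specific input, which is the salt-and-password relationship described above.</p>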

<p>A better and more advanced application of this technique could be achieved by using <abbr title="Natural Language Processing">NLP</abbr> to mimic human writing, by using code samples from public repositories. I might research this in the future or someone might want to beat me to it. Would love to see how it would work.</p>

<p>Well, that should do it for now. In the next part, I plan to talk about how black propaganda &amp; disinformation tactics can be used against attribution attempts.</p>

<h2 id="references" id="references">References</h2>
<ol><li>Shostack, A. (2014). <cite>Threat Modeling: Designing for Security</cite>. John Wiley &amp; Sons.</li>
<li>Szor, P. (2005). <cite>The Art of Computer Virus Research and Defense</cite>. Addison Wesley Professional.</li>
<li>Blunden, B. (2012). <cite>The Rootkit Arsenal: Escape and Evasion in the Dark Corners of the System</cite> (2<sup>nd</sup> ed.). Jones &amp; Bartlett Learning.</li>
<li>Yosifovich, P. &amp; Solomon, D. A. &amp; Ionescu, A. &amp; Russinovich, M. E. (2017). <cite>Windows Internals, Part 1: System architecture, processes, threads, memory management, and more</cite> (7<sup>th</sup> ed.). Microsoft Press.</li></ol>

<p><a href="https://agyild.writeas.com/tag:english" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">english</span></a> <a href="https://agyild.writeas.com/tag:offensive" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">offensive</span></a> <a href="https://agyild.writeas.com/tag:antiforensics" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">antiforensics</span></a> <a href="https://agyild.writeas.com/tag:windows" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">windows</span></a> <a href="https://agyild.writeas.com/tag:softwaredevelopment" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">softwaredevelopment</span></a></p>
]]></content:encoded>
      <guid>https://agyild.writeas.com/offensive-threat-modeling-and-ioc-proof-id-generation</guid>
      <pubDate>Tue, 17 Mar 2020 05:32:41 +0000</pubDate>
    </item>
  </channel>
</rss>