<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0"><channel><title>alex-s168's blog</title><link>https://alex.vxcc.dev/</link><description>Alexander Nutz (aka alex-s168)'s blog</description><atom:link href="https://alex.vxcc.dev/rss-hybrid.xml" rel="self" type="application/rss+xml"/><docs>http://www.rssboard.org/rss-specification</docs><generator>python-feedgen</generator><language>en-US</language><lastBuildDate>Sun, 12 Apr 2026 18:17:03 +0000</lastBuildDate><item><title>Paying for things</title><link>article-paying-for-things.typ.desktop.html</link><description>Big corporations got us spoiled with free stuff</description><content:encoded>&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;Paying for things&lt;/title&gt;
    &lt;meta charset="utf-8"&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;header&gt;
      &lt;div style="
        
        
        
        
        "&gt;
        &lt;p&gt;&lt;br&gt;&lt;/p&gt;
        &lt;h1&gt;Paying for things&lt;/h1&gt;
        &lt;p&gt;&lt;span style="font-size: 9pt"&gt;&lt;p&gt;Last modified: 12. April 2026 20:07 (Git #&lt;code&gt;&lt;code&gt;f6a9bb61&lt;/code&gt;&lt;/code&gt;)&lt;/p&gt;&lt;p&gt;Written by &lt;a href="https://alex.vxcc.dev"&gt;Alexander Nutz&lt;/a&gt;&lt;/p&gt;&lt;/span&gt;&lt;/p&gt;
      &lt;/div&gt;
    &lt;/header&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;These days, most things are available for “free”. Google search is free. Gmail is free. Even Google Photos is free?? Obviously, companies don’t give these things away because they are nice: they want to collect your data and show you ads.&lt;/p&gt;
      &lt;p&gt;The situation’s gotten so bad that most people refuse to pay for small things [citation needed].&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;There is a good &lt;a href="https://help.kagi.com/kagi/why-kagi/why-pay-for-search.html"&gt;article&lt;/a&gt; by Kagi (search engine &amp;amp; more) about why you should pay for search.&lt;br&gt;&lt;/p&gt;
      &lt;p&gt;You should pay for everything you use. Especially things you use many times daily, like search engines, your e-mail software, your web browser, and even your desktop compositor!&lt;/p&gt;
      &lt;p&gt;Many people will say “But what about free and open source!”. No. YOU should pay for &lt;strong&gt;everything&lt;/strong&gt; you use. Of course you should not have to pay a subscription, and the software should not have a paywall (or at least not a restrictive paywall), but you, as a user, should be willing to pay the software maintainers, developers, and hosters!&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;Of course this does not apply to shitty or LLM-written software, like Windows, or GitHub. &lt;a href="https://stephango.com/quality-software"&gt;Quality software deserves your hard‑earned cash&lt;/a&gt; . This also applies to other media like: music, entertainment, educational content, photographs, blog articles, …&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;NO Pay-walls!&lt;/h2&gt;
      &lt;p&gt;Just because I said you should pay for media, like blog articles, doesn’t mean everyone should start putting paywalls in front of their blogs! Quite the opposite actually! All research, summaries of research, etc., should be publicly accessible (for free)!&lt;/p&gt;
      &lt;p&gt;Instead, blogs could have a (non-intrusive!) donate button. Normalize micro-transactions on the web! (These could be implemented with, for example, Cardano, or more privacy-focused DeFi.)&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Against streaming subscriptions&lt;/h2&gt;
      &lt;p&gt;Even though streaming subscriptions are nice in theory, media should not be streamed. You should be able to own your media.&lt;/p&gt;
      &lt;p&gt;Potential alternatives, with music as example:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;Buy the album/song once, and get the raw audio file. &lt;a href="https://bandcamp.com/"&gt;Bandcamp&lt;/a&gt; allows you to do that.&lt;/li&gt;
        &lt;li&gt;Always have access to the raw audio file, but reliably pay the creators depending on how often you listen to it.&lt;/li&gt;
      &lt;/ul&gt;
      &lt;p&gt;The only thing worse than streaming music is DRM:&lt;span style="white-space: pre-wrap"&gt;&amp;#x20;&lt;/span&gt;&lt;a href="https://www.defectivebydesign.org/"&gt;&lt;img src="res/badges/defectivebydesign" alt="link to defectivebydesign" attributionsrc="https://512b.dev/ote/dbd.gif" fetchpriority="low" style="padding-left:10px; padding-right:14px" width="123.19999999999999" height="43.4"&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Against subscriptions in general&lt;/h2&gt;
      &lt;p&gt;There are two kinds of subscription users:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;those who underuse their subscription: they pay much more for the subscription than they would have paid for each item / stream / search / … individually.&lt;/li&gt;
        &lt;li&gt;those who overuse their subscription: they save a ton of money by having the subscription, and pay less per item than most others.&lt;/li&gt;
      &lt;/ul&gt;
      &lt;p&gt;The second kind is rather uncommon, so most companies offer subscriptions knowing that most users will underuse them, which makes them more money.&lt;/p&gt;
      &lt;p&gt;Another probable reason subscriptions exist is that it’s hard to do seamless micro-transactions. When you make a DeFi payment, your wallet typically pops up immediately, you click pay, and it just works. When you make, for example, a PayPal payment, you often have to log in again, the payment website loads really slowly, etc. The payment processors could easily fix this if they actually cared. But obviously they don’t.&lt;/p&gt;
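      &lt;p&gt;To make the two kinds of users concrete, here is a small break-even calculation written as a Python sketch. The prices are made-up example values, and the function names are hypothetical.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
# Break-even sketch: subscription vs. paying per item.
# All prices used below are hypothetical example values.


def pay_per_item_cost(items_per_month: int, price_per_item: float) -> float:
    """Monthly cost if every item (song, stream, search, ...) is paid individually."""
    return items_per_month * price_per_item


def break_even_items(subscription_price: float, price_per_item: float) -> float:
    """Items per month at which the subscription costs as much as paying per item."""
    return subscription_price / price_per_item


def user_kind(items_per_month: int, subscription_price: float, price_per_item: float) -> str:
    """Which of the two kinds of users someone is (a tie counts as underusing)."""
    if pay_per_item_cost(items_per_month, price_per_item) > subscription_price:
        return "overuses"
    return "underuses"


# Hypothetical: a 10 EUR/month subscription vs. 0.50 EUR per item.
print(break_even_items(10.0, 0.5))  # 20.0
print(user_kind(5, 10.0, 0.5))      # underuses (would have paid 2.50 EUR)
print(user_kind(40, 10.0, 0.5))     # overuses (would have paid 20.00 EUR)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
      &lt;p&gt;With these (hypothetical) prices, anyone using 20 or fewer items per month is in the first group.&lt;/p&gt;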
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;What can you do?&lt;/h2&gt;
      &lt;p&gt;Consider paying or donating to the software you use daily. For example, use &lt;a href="https://kagi.com/"&gt;Kagi search&lt;/a&gt;, donate to the developers of your Linux desktop environment, buy songs as directly from artists as possible, …&lt;/p&gt;
    &lt;/div&gt;
    &lt;p&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</content:encoded><author>alexander.nutz@vxcc.dev (Alexander Nutz)</author><guid isPermaLink="false">https://vxcc.dev/alex/article-paying-for-things.typ.desktop.html</guid></item><item><title>Truthear HEXA vs PURE vs GATE</title><link>article-truthear.typ.desktop.html</link><description>Being financially irresponsible and spending way too much money on audio tech</description><content:encoded>&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;HEXA vs PURE vs GATE Review&lt;/title&gt;
    &lt;meta charset="utf-8"&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;header&gt;
      &lt;div style="
        
        
        
        
        "&gt;
        &lt;p&gt;&lt;br&gt;&lt;/p&gt;
        &lt;h1&gt;Truthear HEXA vs PURE vs GATE&lt;/h1&gt;
        &lt;p&gt;&lt;span style="font-size: 9pt"&gt;&lt;p&gt;Last modified: 12. April 2026 20:07 (Git #&lt;code&gt;&lt;code&gt;f6a9bb61&lt;/code&gt;&lt;/code&gt;)&lt;/p&gt;&lt;p&gt;Written by &lt;a href="https://alex.vxcc.dev"&gt;Alexander Nutz&lt;/a&gt;&lt;/p&gt;&lt;/span&gt;&lt;/p&gt;
      &lt;/div&gt;
    &lt;/header&gt;
    &lt;aside&gt;
      &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;Note that I have no idea what I’m talking about, and these are just my personal oppinions!&lt;/div&gt;
    &lt;/aside&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;After being sick of bad headphones, I decided to buy Truthear HEXAs for 99€, and they were by far the best headphones I ever tried…&lt;/p&gt;
      &lt;p&gt;They produce a really clear sound, with strong bass, allowing you to notice musical details you would have never heard with most cheap setups. However, sometimes you don’t catch the “vibe” of the music: they make you overanalyze everything, and are almost too optimal. Quoting the HEXA product page: &lt;a href="https://web.archive.org/web/20260408175610/https://truthear.com/products/hexa#:~:text=Present%20the%20excellent%20objective%20index"&gt;“Present the excellent objective index”&lt;/a&gt;.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;That didn’t annoy me too much though, so I just decided to keep them. But one day, when pulling the headphones out of my bag, I was shocked to see that I had snapped the left headphone-connecting pins on the cable in half! Turns out you’re not supposed to just throw them into a bag…&lt;/p&gt;
      &lt;p&gt;I managed to “repair” the cable by soldering on different pins, but the wires aren’t just standard multi-strand copper wires, so I had a hard time soldering on new contacts, and the fix broke after a few days.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;Since cheap &amp;amp; good replacement cables are hard to find, I decided to buy Truthear GATEs for 20€, which have compatible cables. But I was thinking: “might as well get another pair of headphones as comparision”, so I decided to also get Truthear PUREs for 99€…&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Comparison&lt;/h2&gt;
      &lt;ul&gt;
        &lt;li&gt;GATE: 20 € (at time of writing)&lt;/li&gt;
        &lt;li&gt;HEXA: 99 € (at time of writing)&lt;/li&gt;
        &lt;li&gt;PURE: 99 € (at time of writing)&lt;/li&gt;
      &lt;/ul&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;GATE&lt;/h3&gt;
      &lt;p&gt;Really solid for the price point. They have a similar “clearness” to the HEXAs, but are noticeably worse than both the HEXAs and the PUREs.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;HEXA vs PURE&lt;/h3&gt;
      &lt;p&gt;It’s easy to notice that they sound different, but it’s hard to tell which is “better”…&lt;/p&gt;
      &lt;p&gt;The PUREs sound more “fun” and warm than the HEXAs.&lt;/p&gt;
      &lt;p&gt;Recommendation based on genre:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;Dark R&amp;amp;B: I think the clear sound of the HEXAs is much better for lots of Dark R&amp;amp;B&lt;/li&gt;
        &lt;li&gt;HipHop: PUREs sound more exciting, and seem to have better bass&lt;/li&gt;
        &lt;li&gt;EDM: The PUREs have better bass in many songs, but tend to be noticeably worse than the HEXAs for lead &amp;amp; vocals&lt;/li&gt;
      &lt;/ul&gt;
      &lt;p&gt;Also consider that according to Truthear, the PUREs are supposed to be an improvement over the HEXA design, based on feedback by “professionals”…&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Favourites&lt;/h2&gt;
      &lt;p&gt;If I could only pick one, I’d go with the HEXAs: even though the “clearness” is annoying sometimes, in most cases they just sound much better than “warmer” headphones, like the PUREs or the ZERO:RED (which I tried for one day when getting the HEXAs, and then returned).&lt;/p&gt;
      &lt;p&gt;Keep in mind that I’m not a “professional”, so you should definitely read other people’s reviews too. If you don’t want to spend 100€ on headphones, you should probably get GATEs; they are good enough for most things.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Foam Eartips&lt;/h2&gt;
      &lt;p&gt;In my opinion, the foam eartips sound much better than the silicone eartips. Sadly, the GATEs don’t come with any, and the HEXAs and PUREs only come with one set. Be careful with the included foam eartips! They break easily…&lt;/p&gt;
      &lt;p&gt;If you need replacement eartips, I suggest getting &lt;a href="https://www.complyfoam.com"&gt;Comply&lt;/a&gt; eartips. You can just click the button on the top right, and enter the model name of your headphones.&lt;/p&gt;
      &lt;p&gt;I got Comply 600 core series (small &amp;amp; medium) eartips, which work for the HEXAs, GATEs, and PUREs. They seem to last much longer than the included eartips, and also don’t itch after wearing for a long time.&lt;/p&gt;
      &lt;p&gt;Also note that slightly larger eartips mean better sound (and less sound from the environment), but might hurt a bit after wearing them for a long time.&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</content:encoded><author>alexander.nutz@vxcc.dev (Alexander Nutz)</author><guid isPermaLink="false">https://vxcc.dev/alex/article-truthear.typ.desktop.html</guid></item><item><title>Challenges with automatically inlining functions</title><link>compiler-inlining.typ.desktop.html</link><description>Compiler backends should automatically inline functions to avoid function call overhead. A greedy approach has many issues. We'll be exploring better approaches.</description><content:encoded>&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;Challenges with automatically inlining functions&lt;/title&gt;
    &lt;meta charset="utf-8"&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;header&gt;
      &lt;div style="
        
        
        
        
        "&gt;
        &lt;p&gt;&lt;br&gt;&lt;/p&gt;
        &lt;h1&gt;Challenges with automatically inlining functions&lt;/h1&gt;
        &lt;p&gt;&lt;span style="font-size: 9pt"&gt;&lt;p&gt;Last modified: 12. April 2026 20:07 (Git #&lt;code&gt;&lt;code&gt;f6a9bb61&lt;/code&gt;&lt;/code&gt;)&lt;/p&gt;&lt;p&gt;Written by &lt;a href="https://alex.vxcc.dev"&gt;Alexander Nutz&lt;/a&gt;&lt;/p&gt;&lt;/span&gt;&lt;/p&gt;
      &lt;/div&gt;
    &lt;/header&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Introduction&lt;/h2&gt;
      &lt;p&gt;Function calls have some overhead, and they also act as barriers that keep other optimizations from seeing across the call. Because of that, compiler backends (should) inline function calls. There are, however, many issues with just greedily inlining calls…&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Greedy inlining with heuristics&lt;/h2&gt;
      &lt;p&gt;This is the most obvious approach. We can just inline all functions that are called only once, and then inline calls where the inlined function does not have many instructions.&lt;/p&gt;
      &lt;p&gt;Example:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code&gt;function f32 $square(f32 %x) {&lt;br&gt;@entry:&lt;br&gt;  // this is stupid, but I couldn't come up with a better example&lt;br&gt;  f32 %e = add %x, 0&lt;br&gt;  f32 %out = mul %e, %x&lt;br&gt;  ret %out&lt;br&gt;}&lt;br&gt;&lt;br&gt;function f32 $hypot(f32 %a, f32 %b) {&lt;br&gt;@entry:&lt;br&gt;  f32 %as = call $square(%a)&lt;br&gt;  f32 %bs = call $square(%b)&lt;br&gt;  f32 %sum = add %as, %bs&lt;br&gt;  f32 %o = sqrt %sum&lt;br&gt;  ret %o&lt;br&gt;}&lt;br&gt;&lt;br&gt;function f32 $tri_hypot({f32, f32} %x) {&lt;br&gt;@entry:&lt;br&gt;  f32 %a = extract %x, 0&lt;br&gt;  f32 %b = extract %x, 1&lt;br&gt;  f32 %o = call $hypot(%a, %b) // this is a "tail call"&lt;br&gt;  ret %o&lt;br&gt;}&lt;br&gt;&lt;br&gt;// let's assume that $hypot is used someplace else in the code too&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
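      &lt;p&gt;A minimal Python sketch of this greedy strategy on a toy call graph, using instruction counts transcribed from the example above. Each function is reduced to an instruction count and a list of callees. The data layout, the threshold, the &lt;code&gt;&lt;code&gt;$other&lt;/code&gt;&lt;/code&gt; caller and all function names in the sketch are assumptions made for this illustration, not taken from a real compiler.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
from collections import Counter

THRESHOLD = 5  # hypothetical: callees with at most this many instructions are inlined


def greedy_inline(funcs, threshold=THRESHOLD, max_steps=1000):
    """Greedily inline calls in a toy call graph (a sketch, not a real compiler pass).

    funcs maps a function name to {"size": int, "calls": [callee, ...]}, where
    "size" counts the body's instructions (calls included, "ret" excluded).
    funcs is updated in place; its iteration order decides which call is inlined first.
    """
    for _ in range(max_steps):  # guard against endless inlining through call cycles
        call_sites = Counter(c for f in funcs.values() for c in f["calls"])
        candidate = None
        for name, f in funcs.items():
            for callee in f["calls"]:
                body = funcs[callee]
                if callee == name or callee in body["calls"]:
                    continue  # skip (simple) recursion
                if body["size"] > threshold and call_sites[callee] != 1:
                    continue  # too big, and not the only call site
                candidate = (f, callee)
                break
            if candidate:
                break
        if candidate is None:
            break  # nothing left to inline
        f, callee = candidate
        body = funcs[callee]
        f["size"] += body["size"] - 1  # the call is replaced by the callee's body
        f["calls"].remove(callee)
        f["calls"].extend(body["calls"])
    return funcs


def example():
    # Sizes transcribed from the example IR ("ret" not counted).
    return {
        "$square": {"size": 2, "calls": []},
        "$hypot": {"size": 4, "calls": ["$square", "$square"]},
        "$tri_hypot": {"size": 3, "calls": ["$hypot"]},
        "$other": {"size": 2, "calls": ["$hypot"]},  # hypothetical other user of $hypot
    }


print(greedy_inline(example())["$tri_hypot"])  # {'size': 3, 'calls': ['$hypot']}
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
      &lt;p&gt;Note that the result depends on the order in which the loop visits the functions.&lt;/p&gt;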
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;Let’s assume our inlining treshold is 5 operations. Then we would get – Waait there are multiple options…&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Issue 1: (sometimes) multiple options&lt;/h3&gt;
      &lt;p&gt;If we inline the &lt;code&gt;&lt;code&gt;$square&lt;/code&gt;&lt;/code&gt; calls, then &lt;code&gt;&lt;code&gt;$hypot&lt;/code&gt;&lt;/code&gt; will have too many instructions to be inlined into &lt;code&gt;&lt;code&gt;$tri_hypot&lt;/code&gt;&lt;/code&gt;:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code&gt;...&lt;br&gt;function f32 $hypot(f32 %a, f32 %b) {&lt;br&gt;@entry:&lt;br&gt;  // more instructions than our inlining threshold:&lt;br&gt;  f32 %ase = add %a, 0&lt;br&gt;  f32 %as = mul %ase, %a&lt;br&gt;  f32 %bse = add %b, 0&lt;br&gt;  f32 %bs = mul %bse, %b&lt;br&gt;  f32 %sum = add %as, %bs&lt;br&gt;  f32 %o = sqrt %sum&lt;br&gt;  ret %o&lt;br&gt;}&lt;br&gt;...&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;The second option is to inline the &lt;code&gt;&lt;code&gt;$hypot&lt;/code&gt;&lt;/code&gt; call into &lt;code&gt;&lt;code&gt;$tri_hypot&lt;/code&gt;&lt;/code&gt;. (There are also some other options.)&lt;/p&gt;
      &lt;p&gt;Now in this case, it seems obvious to prefer inlining &lt;code&gt;&lt;code&gt;$square&lt;/code&gt;&lt;/code&gt; into &lt;code&gt;&lt;code&gt;$hypot&lt;/code&gt;&lt;/code&gt;.&lt;/p&gt;
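      &lt;p&gt;One way to make “obvious” concrete is to count instructions for both options. The Python sketch below transcribes the example IR into opcode lists (without &lt;code&gt;&lt;code&gt;ret&lt;/code&gt;&lt;/code&gt;) and scores each option with a made-up cost: total code size plus a fixed penalty per remaining call instruction. The penalty weight is an arbitrary assumption.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
# Opcode lists transcribed from the example IR ("ret" omitted).
SQUARE = ["add", "mul"]
HYPOT = ["call $square", "call $square", "add", "sqrt"]
TRI_HYPOT = ["extract", "extract", "call $hypot"]


def inline(body, callee_name, callee_body):
    """Replace every call to callee_name in body with a copy of callee_body."""
    out = []
    for insn in body:
        out.extend(callee_body if insn == "call " + callee_name else [insn])
    return out


def cost(functions, call_penalty=3):
    """Hypothetical cost: total static code size plus a penalty per call instruction."""
    size = sum(len(f) for f in functions)
    calls = sum(insn.startswith("call ") for f in functions for insn in f)
    return size + call_penalty * calls


# Option 1: inline $square into $hypot; $square is then unused and can be dropped.
option1 = [inline(HYPOT, "$square", SQUARE), TRI_HYPOT]
# Option 2: inline $hypot into $tri_hypot; $square and $hypot must both stay.
option2 = [SQUARE, HYPOT, inline(TRI_HYPOT, "$hypot", HYPOT)]

print(cost(option1), cost(option2))  # 12 24
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
      &lt;p&gt;Under this (arbitrary) model option 1 wins, which matches the intuition. Other weights, or the ABI costs below, can flip the decision.&lt;/p&gt;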
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Issue 2: ABI requirements on argument passing&lt;/h3&gt;
      &lt;p&gt;If we assume the target ABI only has one f32 register for passing arguments, then we would have to generate additional instructions for passing the second argument of &lt;code&gt;&lt;code&gt;$hypot&lt;/code&gt;&lt;/code&gt;, and then it might actually be more efficient to inline &lt;code&gt;&lt;code&gt;$hypot&lt;/code&gt;&lt;/code&gt; instead of &lt;code&gt;&lt;code&gt;$square&lt;/code&gt;&lt;/code&gt;.&lt;/p&gt;
      &lt;p&gt;This example is not realistic, but the issue does occur in a lot of real code.&lt;/p&gt;
      &lt;p&gt;Another related issue is that the more arguments a function takes, the more data has to be moved around at each call site to get it into the fixed locations the ABI requires.&lt;/p&gt;
      &lt;p&gt;A solution to this is to make the inlining heuristic depend not only on code size, but also on the number of arguments / outputs passed to and from the function.&lt;/p&gt;
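      &lt;p&gt;A minimal Python sketch of such a heuristic. It assumes a hypothetical ABI with a fixed number of float argument registers (one, as in the example above) and made-up instruction costs. The idea is that a call which needs stack arguments costs more, so a larger callee may still be worth inlining.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
FLOAT_ARG_REGS = 1  # hypothetical ABI: float arguments passed in registers
STACK_ARG_COST = 2  # hypothetical: store at the call site + load in the callee
CALL_OVERHEAD = 2   # hypothetical: the call and ret instructions


def call_cost(num_float_args: int, num_results: int = 1) -> int:
    """Estimated instructions spent on one call that inlining would remove."""
    stack_args = max(0, num_float_args - FLOAT_ARG_REGS)
    extra_results = max(0, num_results - 1)  # assume one result register
    return CALL_OVERHEAD + stack_args * STACK_ARG_COST + extra_results


def should_inline(callee_size: int, num_float_args: int, budget: int = 3) -> bool:
    """Inline if the net code growth (callee body minus saved call cost) is small."""
    return budget >= callee_size - call_cost(num_float_args)


# The 6-instruction $hypot body (with $square inlined) takes two float arguments:
print(should_inline(6, 2))  # True: 6 - 4 = 2 fits the budget
print(should_inline(6, 1))  # False: the same body with one argument grows code by 4
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;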
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Issue 3: (sometimes) prevents optimizations&lt;/h3&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code&gt;function f32 $myfunc(f32 %a, f32 %b) {&lt;br&gt;@entry:&lt;br&gt;  f32 %sum = add %a, %b&lt;br&gt;  f32 %sq = sqrt %sum&lt;br&gt;  ...&lt;br&gt;}&lt;br&gt;&lt;br&gt;function $callsite(f32 %a, f32 %b) {&lt;br&gt;@entry:&lt;br&gt;  f32 %as = mul %a, %a&lt;br&gt;  f32 %bs = mul %b, %b&lt;br&gt;  f32 %x = call $myfunc(%as, %bs)&lt;br&gt;  ...&lt;br&gt;}&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
      &lt;p&gt;If the target has an efficient &lt;code&gt;&lt;code&gt;hypot&lt;/code&gt;&lt;/code&gt; operation, then that operation can only be used if we inline &lt;code&gt;&lt;code&gt;$myfunc&lt;/code&gt;&lt;/code&gt; into &lt;code&gt;&lt;code&gt;$callsite&lt;/code&gt;&lt;/code&gt;, because only then does instruction selection see the multiplications and the square root together.&lt;/p&gt;
      &lt;p&gt;This means that inlining decisions now depend on… instruction selection??&lt;/p&gt;
      &lt;p&gt;This is not the only optimization prevented by not inlining the call. If &lt;code&gt;&lt;code&gt;$callsite&lt;/code&gt;&lt;/code&gt; were to be called in a loop, then not inlining would prevent vectorization.&lt;/p&gt;
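      &lt;p&gt;A toy illustration in Python. Expressions are nested tuples, and a made-up instruction selector rewrites &lt;code&gt;&lt;code&gt;sqrt(a*a + b*b)&lt;/code&gt;&lt;/code&gt; into a hypothetical &lt;code&gt;&lt;code&gt;hypot&lt;/code&gt;&lt;/code&gt; instruction. The sketch ignores floating-point details. A real hypot instruction may round differently or avoid intermediate overflow, so a compiler could only do this under relaxed floating-point rules.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
def select_hypot(expr):
    """Toy isel rule: sqrt(add(mul(x, x), mul(y, y))) becomes hypot(x, y)."""
    if not isinstance(expr, tuple):
        return expr  # a plain value such as "%a"
    op, *args = expr
    args = [select_hypot(a) for a in args]
    if op == "sqrt" and isinstance(args[0], tuple) and args[0][0] == "add":
        lhs, rhs = args[0][1], args[0][2]
        if all(isinstance(t, tuple) and t[0] == "mul" and t[1] == t[2] for t in (lhs, rhs)):
            return ("hypot", lhs[1], rhs[1])
    return (op, *args)


def substitute(expr, env):
    """Inlining in miniature: replace parameter names with the caller's arguments."""
    if isinstance(expr, tuple):
        return (expr[0], *(substitute(a, env) for a in expr[1:]))
    return env.get(expr, expr)


# Body of $myfunc on its own: its operands are just parameters.
myfunc_body = ("sqrt", ("add", "%a", "%b"))
print(select_hypot(myfunc_body))  # ('sqrt', ('add', '%a', '%b')): no match

# Inlined into $callsite, the arguments are the multiplications %as and %bs.
inlined = substitute(myfunc_body, {"%a": ("mul", "%a", "%a"), "%b": ("mul", "%b", "%b")})
print(select_hypot(inlined))  # ('hypot', '%a', '%b')
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;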
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Function outlining&lt;/h2&gt;
      &lt;p&gt;A related optimization is “outlining”. It’s the opposite of inlining: it moves duplicate code into a function to reduce code size, and sometimes to increase performance (because of instruction caching).&lt;/p&gt;
      &lt;p&gt;If we do inlining separately from outlining, we often get suboptimal code.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;A better approach&lt;/h2&gt;
      &lt;p&gt;We can instead first inline &lt;strong&gt;all&lt;/strong&gt; inlinable calls, and &lt;strong&gt;then&lt;/strong&gt; perform more aggressive outlining.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Step 1: inlining&lt;/h3&gt;
      &lt;p&gt;We inline &lt;strong&gt;all&lt;/strong&gt; function calls, except for:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;recursion (self-recursion obviously, but also mutual recursion, which could otherwise be inlined forever)&lt;/li&gt;
        &lt;li&gt;functions explicitly marked as no-inline by the user&lt;/li&gt;
      &lt;/ul&gt;
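      &lt;p&gt;A Python sketch of how the calls to inline could be selected. The call graph is given as a dict from caller to callees. Besides self-recursion, the sketch also skips mutual recursion: a call is skipped if the caller can be reached again from the callee, since fully inlining a recursive cycle would never terminate. The graph and function names are made up. A real pass would also repeat this after each round, because inlining exposes new calls.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
def reachable(graph, start):
    """All functions reachable from `start` through one or more calls."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(graph.get(f, ()))
    return seen


def calls_to_inline(graph, noinline=frozenset()):
    """Every call edge, except calls to no-inline functions and recursive calls."""
    result = []
    for caller, callees in graph.items():
        for callee in callees:
            if callee in noinline:
                continue  # explicitly marked as no-inline by the user
            if caller in reachable(graph, callee):
                continue  # self-recursion (callee == caller) or mutual recursion
            result.append((caller, callee))
    return result


graph = {
    "$main": ["$log", "$tri_hypot"],
    "$log": [],
    "$tri_hypot": ["$hypot"],
    "$hypot": ["$square", "$square"],
    "$square": [],
    "$fact": ["$fact"],  # self-recursive
    "$even": ["$odd"],   # mutually recursive with $odd
    "$odd": ["$even"],
}
print(calls_to_inline(graph, noinline={"$log"}))
# [('$main', '$tri_hypot'), ('$tri_hypot', '$hypot'), ('$hypot', '$square'), ('$hypot', '$square')]
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;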
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Step 2: detect duplicate code&lt;/h3&gt;
      &lt;p&gt;There are many algorithms for doing this.&lt;/p&gt;
      &lt;p&gt;The goal of this step is to both:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;maximize the size of outlinable sections&lt;/li&gt;
        &lt;li&gt;minimize the size of the code&lt;/li&gt;
      &lt;/ul&gt;
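      &lt;p&gt;One simple (and slow, roughly cubic) way to do this, sketched in Python. The sketch enumerates every contiguous instruction sequence, keeps the ones that occur at least twice without overlapping, and scores them by how many instructions outlining would save. The scoring assumes that each call costs one instruction and that the outlined function needs one extra &lt;code&gt;&lt;code&gt;ret&lt;/code&gt;&lt;/code&gt;. Real outliners use much faster data structures (LLVM’s MachineOutliner, for example, uses a suffix tree) and far more detailed cost models.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
from collections import defaultdict


def find_repeats(code, min_len=2):
    """Map each instruction sequence occurring at least twice to its start positions.

    Occurrences are collected left to right, skipping any that would overlap
    the previous occurrence of the same sequence.
    """
    repeats = {}
    for length in range(min_len, len(code) // 2 + 1):
        starts = defaultdict(list)
        for i in range(len(code) - length + 1):
            seq = tuple(code[i:i + length])
            if not starts[seq] or i >= starts[seq][-1] + length:
                starts[seq].append(i)
        repeats.update({s: p for s, p in starts.items() if len(p) >= 2})
    return repeats


def best_candidate(code, min_len=2):
    """Return (sequence, saved instructions) for the most profitable sequence."""
    best, best_saving = None, 0
    for seq, positions in find_repeats(code, min_len).items():
        n, k = len(seq), len(positions)
        saving = n * k - (n + 1 + k)  # k copies become 1 body + ret + k calls
        if saving > best_saving:
            best, best_saving = seq, saving
    return best, best_saving


code = ["ld", "mul", "add", "st"] * 3 + ["ret"]
print(best_candidate(code))  # (('ld', 'mul', 'add', 'st'), 4)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;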
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Step 3: slightly reduce size of outlinable section&lt;/h3&gt;
      &lt;p&gt;The goal is to reduce the size of outlinable sections to make the code more efficient.&lt;/p&gt;
      &lt;p&gt;This should be ABI- and instruction-dependent, and have the goals of:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;reducing argument shuffles required at all call sites&lt;/li&gt;
        &lt;li&gt;reducing register pressure&lt;/li&gt;
        &lt;li&gt;not preventing good instruction selection (isel) choices and other optimizations.&lt;/li&gt;
      &lt;/ul&gt;
      &lt;p&gt;This also depends on the targeted code size.&lt;/p&gt;
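      &lt;p&gt;A toy version of this step, sketched in Python. Instructions are &lt;code&gt;&lt;code&gt;(dest, op, operands)&lt;/code&gt;&lt;/code&gt; triples, and the outlined function’s arguments are the values the section reads without defining them. Each call is assumed to cost one instruction plus one move per argument. These costs are hypothetical, and results, register pressure and isel are ignored.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
def live_ins(section):
    """Values the section reads without defining them first: its arguments."""
    defined, needed = set(), []
    for dest, _op, operands in section:
        for v in operands:
            if v not in defined and v not in needed:
                needed.append(v)
        defined.add(dest)
    return needed


def net_saving(section, occurrences, arg_cost=1):
    """Instructions saved by outlining `section`, including argument moves."""
    n = len(section)
    per_call = 1 + arg_cost * len(live_ins(section))  # call + argument setup
    return n * occurrences - (n + 1) - occurrences * per_call


def trim(section, occurrences, arg_cost=1):
    """Drop instructions from either end while that improves the net saving."""
    best = section
    improved = True
    while improved and len(best) > 1:
        improved = False
        for cand in (best[1:], best[:-1]):
            if net_saving(cand, occurrences, arg_cost) > net_saving(best, occurrences, arg_cost):
                best, improved = cand, True
                break
    return best


section = [
    ("%t", "add", ["%a", "%b"]),  # reads two values from outside the section
    ("%u", "mul", ["%t", "%x"]),
    ("%v", "mul", ["%u", "%x"]),
    ("%w", "add", ["%v", "%x"]),
    ("%o", "sqrt", ["%w"]),
]
trimmed = trim(section, occurrences=6)
print(net_saving(section, 6), net_saving(trimmed, 6))  # 0 1
print([dest for dest, _, _ in trimmed])                # ['%u', '%v', '%w', '%o']
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;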
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Step 4: perform outlining&lt;/h3&gt;
      &lt;p&gt;This is obvious.&lt;/p&gt;
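      &lt;p&gt;For completeness, here is a minimal Python sketch of the rewrite itself on a flat instruction list. It ignores operands, live-in and live-out values, and register allocation, all of which a real outliner has to handle. The name of the new function is made up.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
def outline(code, seq, name):
    """Replace non-overlapping occurrences of seq in code with calls to `name`.

    Returns the rewritten code and the body of the new function.
    """
    seq = list(seq)
    if not seq:
        raise ValueError("cannot outline an empty sequence")
    out, i = [], 0
    while len(code) > i:
        if code[i:i + len(seq)] == seq:
            out.append("call " + name)
            i += len(seq)
        else:
            out.append(code[i])
            i += 1
    return out, seq + ["ret"]


code = ["ld", "mul", "add", "st"] * 3 + ["ret"]
new_code, body = outline(code, ("ld", "mul", "add", "st"), "$outlined0")
print(new_code)  # ['call $outlined0', 'call $outlined0', 'call $outlined0', 'ret']
print(body)      # ['ld', 'mul', 'add', 'st', 'ret']
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
      &lt;p&gt;Here the 13 instructions become 4 + 5 = 9.&lt;/p&gt;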
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Issue 1: high compile-time memory usage&lt;/h3&gt;
      &lt;p&gt;Inlining &lt;strong&gt;all&lt;/strong&gt; function calls first will increase the memory usage during compilation by A LOT.&lt;/p&gt;
      &lt;p&gt;I’m sure that there is a smarter way to implement this method, without actually performing the inlining…&lt;/p&gt;
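      &lt;p&gt;To see how bad the blow-up can get, here is a Python sketch that only computes the fully inlined size of every function, memoized over an acyclic call graph, without copying any IR. It is just a size estimate, not the smarter implementation hinted at above. The example chain of functions is made up.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
from functools import lru_cache


def inlined_sizes(funcs):
    """Size of every function after inlining all calls, without building any IR.

    funcs maps a name to (own_size, [callees]); own_size excludes the call
    instructions, which disappear when inlined. The call graph must be acyclic.
    """
    @lru_cache(maxsize=None)
    def size(name):
        own, callees = funcs[name]
        return own + sum(size(c) for c in callees)

    return {name: size(name) for name in funcs}


# Hypothetical chain: f0 has 1 instruction, and every f(i) has 1 instruction
# plus two calls to f(i-1).
chain = {"f0": (1, [])}
for i in range(1, 21):
    chain[f"f{i}"] = (1, [f"f{i - 1}", f"f{i - 1}"])

sizes = inlined_sizes(chain)
print(sizes["f20"])  # 2097151, i.e. 2**21 - 1 instructions from 21 tiny functions
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;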
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Conclusion&lt;/h2&gt;
      &lt;p&gt;Function inlining is much more complex than one might think.&lt;/p&gt;
      &lt;p&gt;Subscribe to the &lt;a href="atom.xml"&gt;Atom feed&lt;/a&gt; to get notified about future compiler-related articles.&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</content:encoded><author>alexander.nutz@vxcc.dev (Alexander Nutz)</author><guid isPermaLink="false">https://vxcc.dev/alex/compiler-inlining.typ.desktop.html</guid></item><item><title>Approaches to pattern matching in compilers</title><link>compiler-pattern-matching.typ.desktop.html</link><description>If you are working on more advanced compilers, you have probably had to work with pattern matching already. In this article, we will explore different approaches.</description><content:encoded>&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;Approaches to Compiler Pattern Matching&lt;/title&gt;
    &lt;meta charset="utf-8"&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;header&gt;
      &lt;div style="
        
        
        
        
        "&gt;
        &lt;p&gt;&lt;br&gt;&lt;/p&gt;
        &lt;h1&gt;Approaches to pattern matching in compilers&lt;/h1&gt;
        &lt;p&gt;&lt;span style="font-size: 9pt"&gt;&lt;p&gt;Last modified: 12. April 2026 20:07 (Git #&lt;code&gt;&lt;code&gt;f6a9bb61&lt;/code&gt;&lt;/code&gt;)&lt;/p&gt;&lt;p&gt;Written by &lt;a href="https://alex.vxcc.dev"&gt;Alexander Nutz&lt;/a&gt;&lt;/p&gt;&lt;/span&gt;&lt;/p&gt;
      &lt;/div&gt;
    &lt;/header&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Introduction&lt;/h2&gt;
      &lt;p&gt;Compilers often have to deal with pattern matching and rewriting (find-and-replace) inside the compiler IR (intermediate representation).&lt;/p&gt;
      &lt;p&gt;Common use cases for pattern matching in compilers:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;“peephole optimizations”: one of the most common kinds of optimization in compilers. They find a short sequence of code and replace it with some other code. For example, replacing &lt;code&gt;&lt;code data-lang="c"&gt;x &lt;span style="color: #d73948"&gt;&amp;amp;&lt;/span&gt; (&lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #d73948"&gt;&amp;lt;&amp;lt;&lt;/span&gt; b)&lt;/code&gt;&lt;/code&gt; with a bit test operation.&lt;/li&gt;
        &lt;li&gt;finding a sequence of operations for complex optimization passes to operate on: advanced compilers have complex optimizations that can’t really be performed with simple IR operation replacements, and instead require complex logic. Patterns are used here to find operation sequences where those optimizations are applicable, and also to extract details inside that sequence.&lt;/li&gt;
        &lt;li&gt;code generation: converting the IR to machine code / VM bytecode. A compiler needs to find operations (or sequences of operations) inside the IR, and “replace” them with machine code.&lt;/li&gt;
      &lt;/ul&gt;
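      &lt;p&gt;A minimal Python sketch of the peephole example, written by hand in the style of the next section. Expressions are nested tuples, and the op names are made up. A bit test produces a boolean rather than the masked value, so the sketch only rewrites the case where the result of the AND is compared against zero.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
def is_shl_one(e):
    """True if e is shl(1, b), i.e. a single-bit mask."""
    return isinstance(e, tuple) and e[0] == "shl" and e[1] == 1


def peephole_bittest(expr):
    """Hand-written rule: ne(and(x, shl(1, b)), 0) becomes bittest(x, b)."""
    if not isinstance(expr, tuple):
        return expr  # a value name or a constant
    op, *args = expr
    args = [peephole_bittest(a) for a in args]  # rewrite operands first
    if op == "ne" and args[1] == 0 and isinstance(args[0], tuple) and args[0][0] == "and":
        _, lhs, rhs = args[0]
        if is_shl_one(rhs):
            return ("bittest", lhs, rhs[2])
        if is_shl_one(lhs):  # "and" is commutative
            return ("bittest", rhs, lhs[2])
    return (op, *args)


expr = ("ne", ("and", "%x", ("shl", 1, "%b")), 0)
print(peephole_bittest(expr))  # ('bittest', '%x', '%b')
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;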
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Simplest Approach&lt;/h2&gt;
      &lt;p&gt;Currently, most compilers do most of their pattern matching in hand-written code in the compiler’s own implementation language. For example, in MLIR, most (but not all) pattern matches are performed in C++ code.&lt;/p&gt;
      &lt;p&gt;The only advantage to this approach is that it doesn’t require a complex pattern matching system.&lt;/p&gt;
      &lt;p&gt;I only recommend doing this for small compiler toy projects.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Disadvantages&lt;/h3&gt;
      &lt;p&gt;Doing pattern matching this way has many disadvantages.&lt;/p&gt;
      &lt;p&gt;&lt;br&gt;Some (but not all):&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;debugging pattern match rules can be hard&lt;/li&gt;
        &lt;li&gt;IR rewrites need to be tracked manually (for debugging)&lt;/li&gt;
        &lt;li&gt;source locations and debug information also need to be tracked manually, which often isn’t implemented very well.&lt;/li&gt;
        &lt;li&gt;verbose and barely readable pattern matching code&lt;/li&gt;
        &lt;li&gt;overall error-prone&lt;/li&gt;
      &lt;/ul&gt;
      &lt;p&gt;I myself did pattern matching this way in my old compiler backend, and I speak from experience when I say that this approach &lt;strong&gt;sucks&lt;/strong&gt; (in most cases).&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Pattern Matching DSLs&lt;/h2&gt;
      &lt;p&gt;A custom language for describing IR patterns and IR transformations (aka rewrites).&lt;/p&gt;
      &lt;p&gt;I will put this into the category of “structured pattern matching”.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;An example is Cranelift’s ISLE DSL:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="lisp"&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; x ^ x == 0.&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;rule&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;simplify&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;bxor&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;ty_int&lt;/span&gt; ty) x x))&lt;br&gt;      (&lt;span style="color: #4b69c6"&gt;subsume&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;iconst_u&lt;/span&gt; ty &lt;span style="color: #b60157"&gt;0&lt;/span&gt;)))&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;Another example is tinygrad’s pattern system:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;(&lt;span style="color: #4b69c6"&gt;UPat&lt;/span&gt;(Ops.AND, src&lt;span style="color: #d73948"&gt;=&lt;/span&gt;(&lt;br&gt;   UPat.&lt;span style="color: #4b69c6"&gt;var&lt;/span&gt;(&lt;span style="color: #198810"&gt;"&lt;/span&gt;&lt;span style="color: #198810"&gt;x&lt;/span&gt;&lt;span style="color: #198810"&gt;"&lt;/span&gt;),&lt;br&gt;   &lt;span style="color: #4b69c6"&gt;UPat&lt;/span&gt;(Ops.SHL, src&lt;span style="color: #d73948"&gt;=&lt;/span&gt;(&lt;br&gt;     UPat.&lt;span style="color: #4b69c6"&gt;const&lt;/span&gt;(&lt;span style="color: #b60157"&gt;1&lt;/span&gt;),&lt;br&gt;     UPat.&lt;span style="color: #4b69c6"&gt;var&lt;/span&gt;(&lt;span style="color: #198810"&gt;"&lt;/span&gt;&lt;span style="color: #198810"&gt;b&lt;/span&gt;&lt;span style="color: #198810"&gt;"&lt;/span&gt;)))),&lt;br&gt; &lt;span style="color: #d73948"&gt;lambda&lt;/span&gt; x,b: &lt;span style="color: #4b69c6"&gt;UOp&lt;/span&gt;(Ops.BIT_TEST, src&lt;span style="color: #d73948"&gt;=&lt;/span&gt;(x, b)))&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
      &lt;p&gt;Fun fact: tinygrad actually decompiles the Python code inside the second element of the pair (the lambda), and runs multiple optimization passes on it.&lt;/p&gt;
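      &lt;p&gt;The core of such a system can be surprisingly small. The following Python sketch is &lt;strong&gt;not&lt;/strong&gt; tinygrad’s or ISLE’s API. It is a minimal, hypothetical matcher where patterns are plain data (tuples with &lt;code&gt;&lt;code&gt;Var&lt;/code&gt;&lt;/code&gt; placeholders), and each rule pairs a pattern with a function that builds the replacement from the bound variables, similar in spirit to the tinygrad example.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Pattern variable: matches any expression and binds it to `name`."""
    name: str


def match(pat, expr, env):
    """Structurally match pat against expr, recording variable bindings in env."""
    if isinstance(pat, Var):
        if pat.name in env:  # a repeated variable must match the same expression
            return env[pat.name] == expr
        env[pat.name] = expr
        return True
    if isinstance(pat, tuple):
        return (isinstance(expr, tuple) and len(pat) == len(expr) and pat[0] == expr[0]
                and all(match(p, e, env) for p, e in zip(pat[1:], expr[1:])))
    return pat == expr  # constants must match exactly


def rewrite(rules, expr):
    """Rewrite operands first, then apply the first rule that matches at the root."""
    if isinstance(expr, tuple):
        expr = (expr[0], *(rewrite(rules, e) for e in expr[1:]))
    for pat, build in rules:
        env = {}
        if match(pat, expr, env):
            return build(**env)
    return expr


rules = [
    # x ^ x == 0, like the ISLE rule above
    (("bxor", Var("x"), Var("x")), lambda x: 0),
    # and(x, shl(1, b)) -> bit_test(x, b), like the tinygrad pattern above
    (("and", Var("x"), ("shl", 1, Var("b"))), lambda x, b: ("bit_test", x, b)),
]
print(rewrite(rules, ("and", "%y", ("shl", 1, "%n"))))  # ('bit_test', '%y', '%n')
print(rewrite(rules, ("add", ("bxor", "%z", "%z"), 1)))  # ('add', 0, 1)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;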
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;This approach is used by many popular compilers such as LLVM, GCC, and Cranelift for peephole optimizations and code generation.&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Advantages&lt;/h3&gt;
      &lt;ul&gt;
        &lt;li&gt;&lt;strong&gt;debugging and tracking of rewrites, source locations, and debug information can be done properly&lt;/strong&gt;&lt;/li&gt;
        &lt;li&gt;patterns themselves can be inspected and modified programmatically.&lt;/li&gt;
        &lt;li&gt;they are easier to use and read than manual pattern matching in the source code.&lt;/li&gt;
      &lt;/ul&gt;
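      &lt;p&gt;As an example of the second point, here is a Python sketch. With patterns stored as plain data, the compiler can build a dispatch table by root operation, or list every rule that mentions an operation before that operation’s semantics change. The rule names, the op names and the &lt;code&gt;&lt;code&gt;?x&lt;/code&gt;&lt;/code&gt; variable syntax are made up.&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="python"&gt;
```python
from collections import defaultdict

# Rules stored as plain data; "?x"-style strings are pattern variables.
RULES = {
    "xor_self": ("bxor", "?x", "?x"),
    "and_bit": ("and", "?x", ("shl", 1, "?b")),
    "shl_zero": ("shl", "?x", 0),
}


def ops_in(pattern):
    """All op names that appear anywhere in a pattern."""
    if not isinstance(pattern, tuple):
        return set()
    found = {pattern[0]}
    for sub in pattern[1:]:
        found |= ops_in(sub)
    return found


def index_by_root(rules):
    """Dispatch table: root op -> names of the rules that can fire on it."""
    table = defaultdict(list)
    for name, pat in rules.items():
        table[pat[0]].append(name)
    return dict(table)


def rules_using(rules, op):
    """Names of all rules whose pattern mentions op anywhere."""
    return sorted(name for name, pat in rules.items() if op in ops_in(pat))


print(index_by_root(RULES))       # {'bxor': ['xor_self'], 'and': ['and_bit'], 'shl': ['shl_zero']}
print(rules_using(RULES, "shl"))  # ['and_bit', 'shl_zero']
```
&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;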
      &lt;p&gt;&lt;br&gt;There is however an even better alternative:&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Pattern Matching Dialects&lt;/h2&gt;
      &lt;p&gt;I will also put this method into the category of “structured pattern matching”.&lt;/p&gt;
      &lt;p&gt;&lt;br&gt;The main example of this is MLIR, with the &lt;code&gt;&lt;code&gt;pdl&lt;/code&gt;&lt;/code&gt; and the &lt;code&gt;&lt;code&gt;transform&lt;/code&gt;&lt;/code&gt; dialects. Sadly, few projects/people use these dialects; most instead do pattern matching in C++ code, probably because the dialects aren’t documented very well.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;What are compiler dialects?&lt;/h3&gt;
      &lt;p&gt;Modern compilers, especially multi-level compiler frameworks such as MLIR, group their operations into “dialects”.&lt;/p&gt;
      &lt;p&gt;Each dialect represents either a specific kind of operation, like arithmetic, or the operations of a specific backend/frontend, such as the &lt;code&gt;&lt;code&gt;llvm&lt;/code&gt;&lt;/code&gt;, &lt;code&gt;&lt;code&gt;emitc&lt;/code&gt;&lt;/code&gt;, and &lt;code&gt;&lt;code&gt;spirv&lt;/code&gt;&lt;/code&gt; dialects in MLIR.&lt;/p&gt;
      &lt;p&gt;Dialects commonly contain operations, data types, as well as optimization and dialect conversion passes.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Core Concept&lt;/h3&gt;
      &lt;p&gt;The IR patterns and transformations are represented using the compiler’s IR. This is mostly done in a separate dialect, with dedicated operations for operating on IR.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Examples&lt;/h3&gt;
      &lt;p&gt;MLIR’s &lt;code&gt;&lt;code&gt;pdl&lt;/code&gt;&lt;/code&gt; dialect can be used to replace &lt;code&gt;&lt;code&gt;arith.addi&lt;/code&gt;&lt;/code&gt; with &lt;code&gt;&lt;code&gt;my.add&lt;/code&gt;&lt;/code&gt; like this:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="llvm"&gt;pdl.pattern &lt;span style="color: #d73948"&gt;@replace_addi_with_my_add&lt;/span&gt; : benefit(&lt;span style="color: #b60157"&gt;1&lt;/span&gt;) {&lt;br&gt;  &lt;span style="color: #d73948"&gt;%arg0&lt;/span&gt; = pdl.operand&lt;br&gt;  &lt;span style="color: #d73948"&gt;%arg1&lt;/span&gt; = pdl.operand&lt;br&gt;  &lt;span style="color: #d73948"&gt;%type&lt;/span&gt; = pdl.type&lt;br&gt;  &lt;span style="color: #d73948"&gt;%op&lt;/span&gt; = pdl.operation &lt;span style="color: #198810"&gt;"arith.addi"&lt;/span&gt;(&lt;span style="color: #d73948"&gt;%arg0&lt;/span&gt;, &lt;span style="color: #d73948"&gt;%arg1&lt;/span&gt; : !pdl.value, !pdl.value) -&gt; (&lt;span style="color: #d73948"&gt;%type&lt;/span&gt; : !pdl.type)&lt;br&gt;&lt;br&gt;  pdl.rewrite &lt;span style="color: #d73948"&gt;%op&lt;/span&gt; {&lt;br&gt;    &lt;span style="color: #d73948"&gt;%new_op&lt;/span&gt; = pdl.operation &lt;span style="color: #198810"&gt;"my.add"&lt;/span&gt;(&lt;span style="color: #d73948"&gt;%arg0&lt;/span&gt;, &lt;span style="color: #d73948"&gt;%arg1&lt;/span&gt; : !pdl.value, !pdl.value) -&gt; (&lt;span style="color: #d73948"&gt;%type&lt;/span&gt; : !pdl.type)&lt;br&gt;    pdl.replace &lt;span style="color: #d73948"&gt;%op&lt;/span&gt; with &lt;span style="color: #d73948"&gt;%new_op&lt;/span&gt;&lt;br&gt;  }&lt;br&gt;}&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
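      &lt;p&gt;The key property is that the pattern above is itself just IR, so ordinary compiler code can walk it like any other list of operations. The toy Python interpreter below models that idea. The tuple encoding, &lt;code&gt;&lt;code&gt;run_pattern&lt;/code&gt;&lt;/code&gt;, and the dict-based target op are hypothetical and far simpler than what MLIR actually does; result types are omitted, and only a single matched (root) operation is supported.&lt;/p&gt;

```python
# The pdl pattern from above, encoded as plain tuples (result types omitted).
PATTERN = [
    ("pdl.operand", "%arg0"),
    ("pdl.operand", "%arg1"),
    ("pdl.operation", "%op", "arith.addi", ["%arg0", "%arg1"]),
]
REWRITE = [
    ("pdl.operation", "%new_op", "my.add", ["%arg0", "%arg1"]),
    ("pdl.replace", "%op", "%new_op"),
]

def run_pattern(pattern, rewrite, target):
    """Match target ({"name": ..., "operands": [...]}) against the pattern ops,
    then execute the rewrite ops. Returns the replacement op, or None."""
    env = {}
    for op in pattern:
        if op[0] == "pdl.operand":
            env[op[1]] = None  # declared; bound once an operation uses it
        elif op[0] == "pdl.operation":
            _, result, name, operands = op
            if target["name"] != name or len(target["operands"]) != len(operands):
                return None
            for var, value in zip(operands, target["operands"]):
                if env[var] is not None and env[var] != value:
                    return None  # same operand variable bound to two different values
                env[var] = value
            env[result] = target
    replacement = None
    for op in rewrite:
        if op[0] == "pdl.operation":
            _, result, name, operands = op
            env[result] = {"name": name, "operands": [env[v] for v in operands]}
        elif op[0] == "pdl.replace":
            _, old, new = op
            if env[old] is target:
                replacement = env[new]
    return replacement

print(run_pattern(PATTERN, REWRITE, {"name": "arith.addi", "operands": ["%a", "%b"]}))
# {'name': 'my.add', 'operands': ['%a', '%b']}
print(run_pattern(PATTERN, REWRITE, {"name": "arith.muli", "operands": ["%a", "%b"]}))
# None
```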
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Advantages&lt;/h3&gt;
      &lt;ul&gt;
        &lt;li&gt;the pattern matching infrastructure can optimize its own patterns: the compiler can operate on patterns and rewrite rules as if they were normal operations. This removes the need for special infrastructure for pattern matching DSLs.&lt;/li&gt;
        &lt;li&gt;the compiler could compile patterns ahead of time (AOT).&lt;/li&gt;
        &lt;li&gt;the compiler could optimize, analyze, and combine patterns to reduce compile time.&lt;/li&gt;
        &lt;li&gt;the compiler’s IR (de)serialization infrastructure can also be used to exchange peephole optimizations.&lt;/li&gt;
        &lt;li&gt;bragging rights: your compiler represents its patterns in its own IR.&lt;/li&gt;
      &lt;/ul&gt;
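      &lt;p&gt;The point about exchanging peephole optimizations can be illustrated with a small sketch: once a rewrite rule is plain data, shipping it somewhere else is just serialization. Here JSON stands in for the compiler’s own textual IR format, and the list-based rule encoding (strings starting with &lt;code&gt;&lt;code&gt;$&lt;/code&gt;&lt;/code&gt; are pattern variables) is a hypothetical choice for illustration.&lt;/p&gt;

```python
import json

# Hypothetical rule encoding: an op is a list [name, operand, ...]; strings that
# start with "$" are pattern variables, everything else is a literal.
RULE = {
    "name": "and-shl-to-bit-test",
    "match": ["and", "$x", ["shl", 1, "$b"]],
    "replace": ["bit_test", "$x", "$b"],
}

def is_var(t):
    return isinstance(t, str) and t.startswith("$")

def match(pat, node, env):
    """Structural match; binds pattern variables in env."""
    if is_var(pat):
        return env.setdefault(pat, node) == node
    if isinstance(pat, list):
        return (isinstance(node, list) and len(pat) == len(node)
                and all(match(p, n, env) for p, n in zip(pat, node)))
    return pat == node

def instantiate(template, env):
    """Fill a replacement template with the bound variables."""
    if is_var(template):
        return env[template]
    if isinstance(template, list):
        return [instantiate(t, env) for t in template]
    return template

def apply_rule(rule, node):
    env = {}
    return instantiate(rule["replace"], env) if match(rule["match"], node, env) else node

# Serialize the rule to text (e.g. to share it with another compiler or keep it
# in a test corpus), then load it back and use it.
wire = json.dumps(RULE)
loaded = json.loads(wire)
print(loaded == RULE)  # True
print(apply_rule(loaded, ["and", "a", ["shl", 1, "n"]]))  # ['bit_test', 'a', 'n']
```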
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Combining with a DSL&lt;/h3&gt;
      &lt;p&gt;I recommend having a pattern matching / rewrite DSL that transpiles to pattern matching / rewrite dialect operations.&lt;/p&gt;
      &lt;p&gt;The advantage over having only a rewrite dialect is that patterns become even more readable (and maintainable!).&lt;/p&gt;
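      &lt;p&gt;As a rough illustration, the sketch below transpiles the small Lisp-like rule syntax used later in this article (for example &lt;code&gt;&lt;code&gt;(add x 1) =&gt; (inc x)&lt;/code&gt;&lt;/code&gt;) into flat lists of pattern-dialect-style operations. The &lt;code&gt;&lt;code&gt;pat.*&lt;/code&gt;&lt;/code&gt; op names are made up for this example; a real implementation would emit the compiler’s actual rewrite dialect. MLIR’s PDLL language is one real DSL of this kind, which compiles to &lt;code&gt;&lt;code&gt;pdl&lt;/code&gt;&lt;/code&gt;.&lt;/p&gt;

```python
import itertools
import re

def tokenize(src):
    # Parentheses are separate tokens; everything else is split on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", src)

def parse(tokens, pos=0):
    """Parse one s-expression starting at tokens[pos]; return (tree, next_pos)."""
    tok = tokens[pos]
    if tok == "(":
        name, pos, args = tokens[pos + 1], pos + 2, []
        while tokens[pos] != ")":
            arg, pos = parse(tokens, pos)
            args.append(arg)
        return ("op", name, args), pos + 1
    if re.fullmatch(r"-?\d+", tok):
        return ("lit", int(tok)), pos + 1
    var, _, constraint = tok.partition(":")  # "x:Const" -> ("x", "Const")
    return ("var", var, constraint or None), pos + 1

def lower(tree, ops, names, ids):
    """Append flat pattern ops for tree to ops and return the SSA name of its value."""
    if tree[0] == "var":
        _, var, constraint = tree
        if var not in names:  # the first use of a variable declares an operand
            names[var] = "%" + var
            ops.append(("pat.operand", names[var], constraint))
        return names[var]
    if tree[0] == "lit":
        ops.append(("pat.constant", f"%{next(ids)}", tree[1]))
    else:
        operands = [lower(arg, ops, names, ids) for arg in tree[2]]
        ops.append(("pat.operation", f"%{next(ids)}", tree[1], operands))
    return ops[-1][1]

def compile_rule(text):
    """Transpile 'LHS => RHS' into (match_ops, rewrite_ops); both share variables."""
    lhs, rhs = text.split("=>")
    names, ids, match_ops, rewrite_ops = {}, itertools.count(), [], []
    lower(parse(tokenize(lhs))[0], match_ops, names, ids)
    lower(parse(tokenize(rhs))[0], rewrite_ops, names, ids)
    return match_ops, rewrite_ops

match_ops, rewrite_ops = compile_rule("(add x 1) => (inc x)")
print(match_ops)
# [('pat.operand', '%x', None), ('pat.constant', '%0', 1), ('pat.operation', '%1', 'add', ['%x', '%0'])]
print(rewrite_ops)
# [('pat.operation', '%2', 'inc', ['%x'])]
```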
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;E-Graphs&lt;/h3&gt;
      &lt;p&gt;&lt;span class style="white-space:nowrap;"&gt;&lt;a href="https://en.wikipedia.org/wiki/E-graph"&gt;E-Graphs&lt;/a&gt;&lt;/span&gt; are magical data structures that can compactly represent all the equivalent programs produced by applying many transformations, and then select the best one.&lt;/p&gt;
      &lt;p&gt;An example implementation is &lt;a href="https://egraphs-good.github.io/"&gt;egg&lt;/a&gt;.&lt;/p&gt;
      &lt;p&gt;Even though E-Graphs solve most of these problems, I still recommend using a pattern matching dialect, especially in multi-level compilers. It is more flexible and more future-proof, for example if you decide that you want to match some complex patterns manually.&lt;/p&gt;
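      &lt;p&gt;To give an idea of how this works, here is a deliberately tiny e-graph in Python: a union-find over e-class ids, a hash-cons of e-nodes, a naive congruence-restoring &lt;code&gt;&lt;code&gt;rebuild&lt;/code&gt;&lt;/code&gt;, and cost-based extraction. It is only a sketch: it is not egg’s API, it applies the rewrite &lt;code&gt;&lt;code&gt;(mul x 2) =&gt; (shl x 1)&lt;/code&gt;&lt;/code&gt; by hand instead of using e-matching, and the cost function is a made-up example.&lt;/p&gt;

```python
class EGraph:
    """Toy e-graph. An e-node is a tuple (op, child_class_id, ...); leaves are (name,)."""

    def __init__(self):
        self.parent = []    # union-find parent pointer for each e-class id
        self.hashcons = {}  # e-node -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            a = self.parent[a]
        return a

    def canon(self, node):
        return (node[0],) + tuple(self.find(c) for c in node[1:])

    def add(self, node):
        node = self.canon(node)
        if node not in self.hashcons:
            self.hashcons[node] = len(self.parent)
            self.parent.append(len(self.parent))
        return self.find(self.hashcons[node])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a

    def rebuild(self):
        """Naively restore congruence: identical canonical e-nodes must share a class."""
        changed = True
        while changed:
            changed, table = False, {}
            for node, cls in self.hashcons.items():
                node, cls = self.canon(node), self.find(cls)
                if node in table and self.find(table[node]) != cls:
                    self.union(table[node], cls)
                    changed = True
                table[node] = self.find(cls)
            self.hashcons = table

    def extract(self, root, cost):
        """Cheapest term in root's e-class, where cost(op) prices a single node."""
        best = {}  # e-class id -> (total cost, term)
        changed = True
        while changed:
            changed = False
            for node, cls in self.hashcons.items():
                cls = self.find(cls)
                kids = [best.get(self.find(c)) for c in node[1:]]
                if None in kids:
                    continue  # some child e-class has no term yet
                total = cost(node[0]) + sum(k[0] for k in kids)
                if cls not in best or best[cls][0] > total:
                    best[cls] = (total, (node[0],) + tuple(k[1] for k in kids))
                    changed = True
        return best[self.find(root)][1]

eg = EGraph()
a, two, one = eg.add(("a",)), eg.add(("2",)), eg.add(("1",))
root = eg.add(("mul", a, two))
# Apply "(mul x 2) => (shl x 1)" by hand: add the new form and merge the classes.
# Both forms are kept; nothing is decided yet.
eg.union(root, eg.add(("shl", a, one)))
eg.rebuild()
print(eg.extract(root, lambda op: {"mul": 4}.get(op, 1)))  # ('shl', ('a',), ('1',))
```

      &lt;p&gt;Because both forms live in the same e-class, the rewriter never has to commit to one of them; the decision is deferred to extraction, where a different cost function would pick the other form.&lt;/p&gt;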
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;More Advantages of Structured Pattern Matching&lt;/h2&gt;
      &lt;h3&gt;Smart Pattern Matchers&lt;/h3&gt;
      &lt;p&gt;Instead of brute-forcing all peephole optimizations (of which there can be a LOT in advanced compilers), the compiler can organize the patterns to make matching more efficient. I haven’t investigated how to do this yet. If you have any ideas regarding this, please &lt;a href="https://alex.vxcc.dev"&gt;contact me.&lt;/a&gt;&lt;/p&gt;
      &lt;p&gt;Structured patterns enable other ways to speed up the matching and rewriting process, too.&lt;/p&gt;
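      &lt;p&gt;One simple and common first step is to index the rules by the operation at the root of their pattern, so that a node is only checked against rules that could possibly match it. The Python sketch below shows this; the tuple IR and the rules are hypothetical. Merging the patterns further into a decision tree would share even more of the matching work.&lt;/p&gt;

```python
from collections import defaultdict

class RuleSet:
    """Rewrite rules grouped by the op at the root of their pattern, so a node is
    only checked against rules that could possibly match it."""

    def __init__(self, rules):
        self.by_root = defaultdict(list)
        for root_op, rule in rules:
            self.by_root[root_op].append(rule)
        self.attempts = 0  # number of rule functions actually called

    def rewrite_once(self, node):
        for rule in self.by_root.get(node[0], ()):
            self.attempts += 1
            new = rule(node)
            if new is not None:
                return new
        return node

# Hypothetical rules over a tuple IR (op, lhs, rhs); each returns None if it does not apply.
RULES = [
    ("add", lambda n: n[1] if n[2] == 0 else None),              # x + 0 => x
    ("mul", lambda n: n[1] if n[2] == 1 else None),              # x * 1 => x
    ("mul", lambda n: ("shl", n[1], 1) if n[2] == 2 else None),  # x * 2 => x shl 1
    ("sub", lambda n: 0 if n[1] == n[2] else None),              # x - x => 0
]

rs = RuleSet(RULES)
print(rs.rewrite_once(("mul", "a", 2)))  # ('shl', 'a', 1)
print(rs.attempts)  # 2: only the two "mul" rules were tried, not all four
```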
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Reversible Transformations&lt;/h3&gt;
      &lt;p&gt;I don’t think any compiler currently does this. If you know of one, again, please &lt;a href="https://alex.vxcc.dev"&gt;contact me.&lt;/a&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;Optimizing compilers typically deal with code (mostly written by people) that is expressed at a lower level than the compiler could represent it. For example, humans tend to write code like &lt;code&gt;&lt;code data-lang="c"&gt;x &lt;span style="color: #d73948"&gt;&amp;amp;&lt;/span&gt; (&lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #d73948"&gt;&amp;lt;&amp;lt;&lt;/span&gt; b)&lt;/code&gt;&lt;/code&gt; to test a bit, but compilers tend to have a high-level bit-test operation (with exceptions). One reason for having higher-level primitives is that they allow the compiler to do more high-level optimizations; another is that some target architectures have a dedicated bit-test instruction (such as x86’s &lt;code&gt;&lt;code&gt;bt&lt;/code&gt;&lt;/code&gt;) that can be more efficient.&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;This is not just the case for “low-level” things like bit tests, but also for high-level concepts, like a reduction over an array, or even the implementation of a whole algorithm. For example, LLVM recently gained the ability to detect implementations of &lt;a href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check"&gt;CRC&lt;/a&gt;.&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;LLVM actually doesn’t have many dedicated operations like a bit-test operation. Instead, it canonicalizes bit-test patterns to &lt;code&gt;&lt;code data-lang="c"&gt;(x &lt;span style="color: #d73948"&gt;&amp;amp;&lt;/span&gt; (&lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #d73948"&gt;&amp;lt;&amp;lt;&lt;/span&gt; b)) &lt;span style="color: #d73948"&gt;!=&lt;/span&gt; &lt;span style="color: #b60157"&gt;0&lt;/span&gt;&lt;/code&gt;&lt;/code&gt; and matches that form in the compiler passes that expect bit-test operations.&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;&lt;br&gt;Now let’s go back to the &lt;code&gt;&lt;code data-lang="c"&gt;x &lt;span style="color: #d73948"&gt;&amp;amp;&lt;/span&gt; (&lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #d73948"&gt;&amp;lt;&amp;lt;&lt;/span&gt; b)&lt;/code&gt;&lt;/code&gt; (bit test) example. Optimizing compilers should be able to detect it, as well as other bit-test patterns (like &lt;code&gt;&lt;code data-lang="c"&gt;(x &lt;span style="color: #d73948"&gt;&amp;amp;&lt;/span&gt; (&lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #d73948"&gt;&amp;lt;&amp;lt;&lt;/span&gt; b)) &lt;span style="color: #d73948"&gt;&gt;&lt;/span&gt; &lt;span style="color: #b60157"&gt;0&lt;/span&gt;&lt;/code&gt;&lt;/code&gt;), and replace them with a bit-test operation. But they also have to be able to convert bit-test operations back into their implementation for compilation targets that don’t have a bit-test instruction. Currently, compiler backends do this with separate patterns for converting bit tests to their dedicated operation, and back.&lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;A better solution is to associate a set of implementations with the bit-test operation, and make the compiler &lt;strong&gt;automatically reverse&lt;/strong&gt; those to generate the best implementation (in the instruction selector, for example).&lt;/p&gt;
      &lt;p&gt;Implementing pattern/transformation reversal can be challenging, however, but it provides many benefits, and in my opinion all “big” compilers should definitely do this.&lt;/p&gt;
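      &lt;p&gt;A minimal sketch of the idea, assuming a hypothetical tuple IR: the single table &lt;code&gt;&lt;code&gt;IMPLS&lt;/code&gt;&lt;/code&gt; lists the known implementations of &lt;code&gt;&lt;code&gt;bit_test&lt;/code&gt;&lt;/code&gt;. Read forwards, it recognizes those implementations and raises them to &lt;code&gt;&lt;code&gt;bit_test&lt;/code&gt;&lt;/code&gt;; read backwards, it lowers &lt;code&gt;&lt;code&gt;bit_test&lt;/code&gt;&lt;/code&gt; to an implementation that only uses instructions the target has. Reversal is easy here because every implementation is a plain template without side conditions; real transformations are harder to invert.&lt;/p&gt;

```python
import operator

# Hypothetical tuple IR: (op, operand, ...). In templates, strings starting with
# "?" are placeholders for the operands of the high-level op.
SEMANTICS = {
    "and": operator.and_, "shl": operator.lshift, "shr": operator.rshift,
    "ne": operator.ne,
    "bit_test": lambda x, b: (x >> b) % 2 == 1,  # reference meaning of bit_test
}

# High-level op -> (its operand placeholders, implementations computing it).
IMPLS = {
    "bit_test": (("?x", "?b"), [
        ("ne", ("and", "?x", ("shl", 1, "?b")), 0),
        ("ne", ("and", ("shr", "?x", "?b"), 1), 0),
    ]),
}

def match(pat, node, env):
    if isinstance(pat, str) and pat.startswith("?"):
        return env.setdefault(pat, node) == node
    if isinstance(pat, tuple):
        return (isinstance(node, tuple) and len(pat) == len(node)
                and all(match(p, n, env) for p, n in zip(pat, node)))
    return pat == node

def subst(t, env):
    if isinstance(t, tuple):
        return tuple(subst(x, env) for x in t)
    return env.get(t, t)

def ops_in(t):
    """Set of op names a template uses."""
    if not isinstance(t, tuple):
        return set()
    result = {t[0]}
    for x in t[1:]:
        result |= ops_in(x)
    return result

def raise_op(node):
    """Forward direction: recognise a known implementation, emit the high-level op."""
    for op, (params, impls) in IMPLS.items():
        for impl in impls:
            env = {}
            if match(impl, node, env):
                return (op,) + tuple(env[p] for p in params)
    return node

def lower_op(node, target_ops):
    """Reverse direction: expand the op into an implementation the target supports."""
    if node[0] in target_ops or node[0] not in IMPLS:
        return node
    params, impls = IMPLS[node[0]]
    for impl in impls:
        if ops_in(impl).issubset(target_ops):
            return subst(impl, dict(zip(params, node[1:])))
    raise ValueError(f"no usable implementation of {node[0]}")

def evaluate(t, env):
    """Evaluate a term; used to check that every implementation is correct."""
    if isinstance(t, tuple):
        return SEMANTICS[t[0]](*(evaluate(x, env) for x in t[1:]))
    return env.get(t, t)

# Check each listed implementation against the reference meaning on small inputs.
for impl in IMPLS["bit_test"][1]:
    assert all(evaluate(impl, {"?x": x, "?b": b}) == SEMANTICS["bit_test"](x, b)
               for x in range(64) for b in range(6))

bt = raise_op(("ne", ("and", "a", ("shl", 1, "n")), 0))
print(bt)                                  # ('bit_test', 'a', 'n')
print(lower_op(bt, {"bit_test", "and"}))   # target has bit_test: kept as is
print(lower_op(bt, {"and", "shr", "ne"}))  # ('ne', ('and', ('shr', 'a', 'n'), 1), 0)
```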
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Runtime Library&lt;/h3&gt;
      &lt;p&gt;Compilers typically come with a runtime library that implements complex operations that many target architectures don’t support natively (for example, 128-bit integer division, or floating-point arithmetic on targets without an FPU).&lt;/p&gt;
      &lt;p&gt;The implementations of those functions should also be available as patterns in that pattern matching dialect. This allows your backend to detect user code that is implemented the same way as a runtime library function, giving you some additional optimizations for free.&lt;/p&gt;
      &lt;p&gt;I don’t think any compiler currently does this either.&lt;/p&gt;
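      &lt;p&gt;A rough sketch of that idea, assuming the runtime function bodies are available in the compiler’s IR (here as hypothetical tuples; &lt;code&gt;&lt;code&gt;rt_abs&lt;/code&gt;&lt;/code&gt; is a made-up function, not part of any real runtime library): the function’s parameters are turned into pattern variables, and matching user code is replaced with a call. This version only finds syntactically identical code; combining it with canonicalization or an e-graph would catch more variants.&lt;/p&gt;

```python
# Hypothetical runtime-library function written in the compiler's (tuple) IR:
# rt_abs(v) = select(v lt 0, -v, v)
RUNTIME = {
    "rt_abs": (("v",), ("select", ("lt", "v", 0), ("neg", "v"), "v")),
}

def as_pattern(params, body):
    """Turn a function body into a pattern by marking its parameters as variables."""
    if isinstance(body, tuple):
        return tuple(as_pattern(params, x) for x in body)
    return ("?", body) if body in params else body

def match(pat, node, env):
    if isinstance(pat, tuple) and pat[0] == "?":  # pattern variable
        return env.setdefault(pat[1], node) == node
    if isinstance(pat, tuple):
        return (isinstance(node, tuple) and len(pat) == len(node)
                and all(match(p, n, env) for p, n in zip(pat, node)))
    return pat == node

PATTERNS = {name: (params, as_pattern(params, body))
            for name, (params, body) in RUNTIME.items()}

def use_runtime(node):
    """Replace user code that re-implements a runtime function with a call to it."""
    for name, (params, pattern) in PATTERNS.items():
        env = {}
        if match(pattern, node, env):
            return ("call", name) + tuple(env[p] for p in params)
    if isinstance(node, tuple):
        return tuple(use_runtime(x) for x in node)
    return node

user_code = ("mul", ("select", ("lt", "x", 0), ("neg", "x"), "x"), 3)
print(use_runtime(user_code))  # ('mul', ('call', 'rt_abs', 'x'), 3)
```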
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Problems with Pattern Matching&lt;/h2&gt;
      &lt;p&gt;The main problem is ordering the patterns.&lt;/p&gt;
      &lt;p&gt;As an example, consider these three patterns:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="lisp"&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; A&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; x:Const y) =&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; y x)&lt;br&gt;&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; B&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; x y:Const) z:Const) =&gt; (&lt;span style="color: #4b69c6"&gt;lea&lt;/span&gt; x y (&lt;span style="color: #4b69c6"&gt;const_neg&lt;/span&gt; z))&lt;br&gt;&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; C&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; x &lt;span style="color: #b60157"&gt;1&lt;/span&gt;) =&gt; (&lt;span style="color: #4b69c6"&gt;inc&lt;/span&gt; x)&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;Now what should the compiler do when it sees this:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="lisp"&gt;(&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt;) &lt;span style="color: #b60157"&gt;2&lt;/span&gt;)&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;All three patterns would match:&lt;/p&gt;
      &lt;p&gt;&lt;code&gt;&lt;pre&gt;&lt;code data-lang="lisp"&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; apply A&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt;) &lt;span style="color: #b60157"&gt;2&lt;/span&gt;) =&gt; (&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt;) &lt;span style="color: #b60157"&gt;2&lt;/span&gt;)&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; now apply B (A matches again too, but would only swap the operands back)&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt;) &lt;span style="color: #b60157"&gt;2&lt;/span&gt;) =&gt; (&lt;span style="color: #4b69c6"&gt;lea&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;const_neg&lt;/span&gt; &lt;span style="color: #b60157"&gt;2&lt;/span&gt;))&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; nothing applies anymore&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; alternatively apply B&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt;) &lt;span style="color: #b60157"&gt;2&lt;/span&gt;) =&gt; (&lt;span style="color: #4b69c6"&gt;lea&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;const_neg&lt;/span&gt; &lt;span style="color: #b60157"&gt;2&lt;/span&gt;))&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; nothing applies anymore&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; alternatively apply C&lt;/span&gt;&lt;br&gt;(&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;add&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt; &lt;span style="color: #b60157"&gt;1&lt;/span&gt;) &lt;span style="color: #b60157"&gt;2&lt;/span&gt;) =&gt; (&lt;span style="color: #4b69c6"&gt;sub&lt;/span&gt; (&lt;span style="color: #4b69c6"&gt;inc&lt;/span&gt; &lt;span style="color: #b60157"&gt;5&lt;/span&gt;) &lt;span style="color: #b60157"&gt;2&lt;/span&gt;)&lt;br&gt;&lt;span style="color: #74747c"&gt;;;&lt;/span&gt;&lt;span style="color: #74747c"&gt; nothing applies anymore&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/code&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;Now which of those transformations should be performed?&lt;/p&gt;
      &lt;p&gt;This is not as easy to solve as it seems, especially during instruction selection, where (because of instruction scheduling) the performance on a processor depends on a whole sequence of instructions, not on each instruction individually.&lt;/p&gt;
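      &lt;p&gt;One brute-force answer is to explore every order and let a cost model decide. The Python sketch below encodes rules A, B, and C as functions on nested tuples, collects every term reachable from &lt;code&gt;&lt;code&gt;(sub (add 5 1) 2)&lt;/code&gt;&lt;/code&gt; (the &lt;code&gt;&lt;code&gt;seen&lt;/code&gt;&lt;/code&gt; set stops rule A from swapping the operands back and forth forever), and picks the cheapest term. The per-op costs are invented, which is exactly the weak point: the real cost depends on the surrounding instructions.&lt;/p&gt;

```python
# Terms are nested tuples (op, operand, ...); Python ints are constants.
def rule_a(t):  # (add x:Const y) => (add y x)
    if t[0] == "add" and isinstance(t[1], int):
        return ("add", t[2], t[1])
    return None

def rule_b(t):  # (sub (add x y:Const) z:Const) => (lea x y (const_neg z))
    if (t[0] == "sub" and isinstance(t[1], tuple) and t[1][0] == "add"
            and isinstance(t[1][2], int) and isinstance(t[2], int)):
        return ("lea", t[1][1], t[1][2], ("const_neg", t[2]))
    return None

def rule_c(t):  # (add x 1) => (inc x)
    if t[0] == "add" and t[2] == 1:
        return ("inc", t[1])
    return None

RULES = [rule_a, rule_b, rule_c]

def one_step(t):
    """Yield every term obtained by applying one rule at any position in t."""
    if not isinstance(t, tuple):
        return
    for rule in RULES:
        new = rule(t)
        if new is not None:
            yield new
    for i, child in enumerate(t[1:], start=1):
        for new_child in one_step(child):
            yield t[:i] + (new_child,) + t[i + 1:]

def reachable(start):
    """All terms reachable from start by any sequence of rewrites."""
    seen, todo = {start}, [start]
    while todo:
        for nxt in one_step(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

# Invented costs; const_neg of a constant folds at compile time, so it is free.
COST = {"add": 1, "sub": 1, "inc": 1, "lea": 1, "const_neg": 0}

def cost(t):
    if not isinstance(t, tuple):
        return 0
    return COST[t[0]] + sum(cost(c) for c in t[1:])

terms = reachable(("sub", ("add", 5, 1), 2))
print(len(terms))  # 5: the start term plus the four results listed above
# Ties are broken by repr so the choice is deterministic.
print(min(terms, key=lambda t: (cost(t), repr(t))))  # ('lea', 1, 5, ('const_neg', 2))
```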
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h3&gt;Superscalar CPUs&lt;/h3&gt;
      &lt;p&gt;Modern processor architecture features like superscalar execution make this even more complicated.&lt;/p&gt;
      &lt;p&gt;&lt;br&gt;As a simple, &lt;strong&gt;unrealistic&lt;/strong&gt; example, let’s imagine a CPU (core) that has one bit-operations execution unit and two ALU execution units / ports.&lt;br&gt;This means that the CPU can execute two instructions on the ALU units and one instruction on the bit-ops unit at the same time.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;One might think that always optimizing &lt;code&gt;&lt;code&gt;a &amp;amp; (1 &amp;lt;&amp;lt; b)&lt;/code&gt;&lt;/code&gt; to a bit test operation is good for performance. But in this example, that is not the case.&lt;/p&gt;
      &lt;p&gt;If we have a function that does a lot of bitwise operations next to each other, and the compiler replaces all bit tests with bit test operations, suddenly all operations depend on the bit ops unit, which means that instead of executing 3 instructions at a time (ignoring pipelining), the CPU can only execute one instruction at a time.&lt;/p&gt;
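      &lt;p&gt;A back-of-the-envelope model makes this concrete. The Python sketch below assumes the imaginary CPU from above, that &lt;code&gt;&lt;code&gt;shl&lt;/code&gt;&lt;/code&gt; and &lt;code&gt;&lt;code&gt;and&lt;/code&gt;&lt;/code&gt; run on the ALU ports while only the bit-test instruction needs the bit-ops port, that every instruction takes one cycle, and that the bit tests are independent (it only counts throughput and ignores dependencies and latency). Under these assumptions, neither “always use bit test” nor “never use bit test” is best.&lt;/p&gt;

```python
import math

# The imaginary CPU from the text: two ALU ports and one bit-ops port.
PORTS = {"alu": 2, "bitops": 1}

def cycles(instrs):
    """Cycles to issue instrs (a list of port names), one instruction per port per cycle."""
    return max(math.ceil(instrs.count(port) / width) for port, width in PORTS.items())

def lower_bit_tests(n, as_bt):
    """n independent bit tests: as_bt of them use the bit-test instruction,
    the rest are lowered to shl + and (two ALU instructions each)."""
    return ["bitops"] * as_bt + ["alu"] * (2 * (n - as_bt))

for as_bt in (0, 4, 6, 12):
    print(as_bt, cycles(lower_bit_tests(12, as_bt)))
# 0 12
# 4 8
# 6 6
# 12 12
```

      &lt;p&gt;Splitting the bit tests evenly between the two forms keeps all three ports busy every cycle, which halves the cycle count compared to either extreme.&lt;/p&gt;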
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;This shows that we won’t know whether an optimization is actually good until we are at a late point in the compilation process, where we can simulate the CPU’s instruction scheduling.&lt;/p&gt;
      &lt;p&gt;This does not only apply to instruction selection, but also to higher-level optimizations, such as loop and control-flow optimizations.&lt;/p&gt;
    &lt;/div&gt;
    &lt;div style="
        
        
        
        
        "&gt;
      &lt;p&gt;&lt;br&gt;&lt;/p&gt;
      &lt;h2&gt;Conclusion&lt;/h2&gt;
      &lt;p&gt;One can see why pattern matching dialects are the best approach to pattern matching.&lt;/p&gt;
      &lt;p&gt;&lt;br&gt;Someone wanted me to insert a takeaway here, but I won’t.&lt;/p&gt;
      &lt;p&gt;&lt;br&gt;PS: I’ll hunt down everyone who still decides to do pattern matching in their compiler source after reading this article.&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</content:encoded><author>alexander.nutz@vxcc.dev (Alexander Nutz)</author><guid isPermaLink="false">https://vxcc.dev/alex/compiler-pattern-matching.typ.desktop.html</guid></item><item><title>Making the favicon</title><link>article-favicon.typ.desktop.html</link><description>It turns out that websites need a favicon, and making one is hard...</description><author>alexander.nutz@vxcc.dev (Alexander Nutz)</author><guid isPermaLink="false">https://vxcc.dev/alex/article-favicon.typ.desktop.html</guid></item></channel></rss>