<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Alza Bitz</title>
  <link href="https://alza-bitz.github.io/atom.xml" rel="self"/>
  <link href="https://alza-bitz.github.io"/>
  <updated>2025-09-15T10:53:59+00:00</updated>
  <id>https://alza-bitz.github.io</id>
  <author>
    <name>Alex Coyle</name>
  </author>
  <entry>
    <id>https://alza-bitz.github.io/clojure-support-for-popular-data-tools</id>
    <link href="https://alza-bitz.github.io/clojure-support-for-popular-data-tools"/>
    <title>Clojure Support for Popular Data Tools: A Data Engineer's Perspective, and a New Clojure API for Snowflake</title>
    <updated>2025-08-28T23:59:59+00:00</updated>
    <content type="html"><![CDATA[<p>In this article I look at the extent of Clojure support for some popular on-cluster data processing tools that Clojure users might need for their data engineering or data science tasks. Then for <a href='https://snowflake.com'>Snowflake</a> in particular, <strong>I go further and present a new Clojure API.</strong></p><p>Why is the level of Clojure support important? As an example, consider that <a href='https://scicloj.org'>Scicloj</a> is mostly focused on cases where your data fits on a single machine. As such, if you need to work with a large dataset, it will be necessary to compute on-cluster and extract a smaller result before continuing your data science task locally.</p><p>However, without sufficient Clojure support for on-cluster processing, anyone needing that facility for their data science or data engineering task would be forced to reach outside the Clojure ecosystem. That adds complexity in terms of interop, compatibility and overall stack requirements.</p><p>With that in mind, let's examine the level of Clojure support for some popular on-cluster data processing tools. For each tool I selected its official Clojure library if one exists, or, if not, the most popular and well-known community-supported alternative with at least 100 stars and 10 contributors on GitHub. 
I then applied the following criteria to the library to classify it as "supported" or "support unknown":</p><ol><li>CI/CD build passing</li><li>Most recent commit less than 12 months ago</li><li>Most recent release less than 12 months ago</li><li>Maintainers responded to any issue or question less than 12 months ago</li><li>Maintainers either accepted or rejected any PR less than 12 months ago</li></ol><p>If I couldn't find any such library at all, I classified it as having "no support".</p><table><thead><tr><th>Tool Category</th><th>Supported</th><th>Support Unknown</th><th>No Support</th></tr></thead><tbody><tr><td><strong>On-cluster batch processing</strong></td><td></td><td>1. <a href='https://spark.apache.org'>Spark</a> (see <a href='#spark_interop_with_geni'>Spark Interop with Geni</a> below)</td><td></td></tr><tr><td><strong>On-cluster stream processing</strong></td><td></td><td>2. <a href='https://kafka.apache.org/documentation/streams'>Kafka Streams</a> (see <a href='#kafka_interop_with_jackdaw'>Kafka Interop with Jackdaw</a> below)</td><td>3. <a href='https://spark.apache.org/streaming'>Spark Structured Streaming</a>,<br>4. <a href='https://flink.apache.org'>Flink</a></td></tr><tr><td><strong>On-cluster batch and stream processing</strong></td><td></td><td></td><td>5. <a href='https://databricks.com'>Databricks</a> (see <a href='#spark_interop_with_geni'>Spark Interop with Geni</a> below),<br>6. <a href='https://snowflake.com'>Snowflake</a> (see <a href='#snowflake_interop_with_a_new_clojure_api!'>Snowflake Interop</a> below)</td></tr></tbody></table><p>Please note, I don't wish to make any critical judgments based on either the summary analysis above or the more detailed analysis below. 
The goal is to understand the situation with respect to Clojure support and highlight any gaps, although I suppose I am also inadvertently highlighting the difficulties of maintaining open source software!</p><h3 id="spark&#95;interop&#95;with&#95;geni">Spark Interop with Geni</h3><p><a href='https://github.com/zero-one-group/geni'>Geni</a> is the go-to library for Spark interop. Some months back, I was motivated to evaluate its coverage of Spark features. In particular, I wanted to understand what would be involved in supporting <a href='https://spark.apache.org/spark-connect/'>Spark Connect</a>, as it would reduce the complexity of computing on-cluster directly from the Clojure REPL.</p><p>However, I found a number of issues that would need to be addressed in order to support Spark Connect and Databricks:</p><ol><li>Problems with the <a href='https://github.com/zero-one-group/geni/issues/345'>default session</a>.</li><li>Problems with <a href='https://github.com/zero-one-group/geni/issues/356'>support for Databricks</a>, although I suspect this is related to point 1.</li></ol><p>Also, in general, by my criteria the support classification is "support unknown":</p><ol><li>CI/CD build <a href='https://github.com/zero-one-group/geni/actions'>failing</a>.</li><li>Version <a href='https://cljdoc.org/d/zero.one/geni/0.0.42/doc/readme'>0.0.42 API docs</a> <a href='https://cljdoc.org/builds/73977'>broken</a>; this also affects version 0.0.41.</li><li>No commits since November 2023.</li><li>No releases since November 2023.</li><li>No PRs accepted or rejected since November 2023.</li><li>No response when attempting to contact the author or maintainers.</li></ol><h3 id="kafka&#95;interop&#95;with&#95;jackdaw">Kafka Interop with Jackdaw</h3><p><a href='https://github.com/FundingCircle/jackdaw'>Jackdaw</a> is the go-to library for Kafka interop. 
However, by my criteria the support classification is also "support unknown":</p><ol><li>No commits since August 2024.</li><li>No releases since December 2023.</li><li>No PRs accepted or rejected since August 2024. As a further example, <a href='https://github.com/FundingCircle/jackdaw/pull/374'>here's a PR</a> raised in May 2024 but not yet commented on either way by the maintainers.</li></ol><h3 id="snowflake&#95;interop&#95;with&#95;a&#95;new&#95;clojure&#95;api!">Snowflake Interop with a New Clojure API!</h3><p>Although the <a href='https://docs.snowflake.com/en/developer-guide/snowpark/java/index'>Snowpark</a> library has Java and Scala bindings, it doesn't provide anything for Clojure. As such, it's currently not possible to interact with Snowflake using the Clojure way.</p><p>To address this gap, I decided to try my hand at creating a <a href='https://github.com/alza-bitz/snowpark-clj'>Clojure API for Snowflake</a> as part of a broader effort to improve the overall situation regarding Clojure support for popular data tools.</p><p>The aim is to validate this approach as a foundation for enabling a wide range of data science or data engineering use cases from the Clojure REPL, in situations where Snowflake is the data warehouse of choice.</p><p>The <a href='https://github.com/alza-bitz/snowpark-clj/blob/main/README.md'>README</a> provides usage examples for all the current features, but I've copied the essential ones here to illustrate the API:</p><h4 id="load&#95;clojure&#95;data&#95;from&#95;local&#95;and&#95;save&#95;to&#95;a&#95;snowflake&#95;table">Load Clojure data from local and save to a Snowflake table</h4><pre><code class="lang-clojure">&#40;require '&#91;snowpark-clj.core :as sp&#93;&#41;

;; Sample data
&#40;def employee-data
  &#91;{:id 1 :name &quot;Alice&quot; :age 25 :department &quot;Engineering&quot; :salary 75000}
   {:id 2 :name &quot;Bob&quot; :age 30 :department &quot;Marketing&quot; :salary 65000}
   {:id 3 :name &quot;Charlie&quot; :age 35 :department &quot;Engineering&quot; :salary 80000}&#93;&#41;

;; Create session and save data
&#40;with-open &#91;session &#40;sp/create-session &quot;snowflake.edn&quot;&#41;&#93;
  &#40;-&gt; employee-data
      &#40;sp/create-dataframe session&#41;
      &#40;sp/save-as-table &quot;employees&quot; :overwrite&#41;&#41;&#41;
</code></pre><h4 id="compute&#95;over&#95;snowflake&#95;table(s)&#95;on-cluster&#95;and&#95;extract&#95;results&#95;locally">Compute over Snowflake table(s) on-cluster and extract results locally</h4><pre><code class="lang-clojure">&#40;with-open &#91;session &#40;sp/create-session &quot;snowflake.edn&quot;&#41;&#93;
  &#40;let &#91;table-df &#40;sp/table session &quot;employees&quot;&#41;&#93;
    &#40;-&gt; table-df
        &#40;sp/filter &#40;sp/gt &#40;sp/col table-df :salary&#41; &#40;sp/lit 70000&#41;&#41;&#41;
        &#40;sp/select &#91;:name :salary&#93;&#41;
        &#40;sp/collect&#41;&#41;&#41;&#41;
;; =&gt; &#91;{:name &quot;Alice&quot; :salary 75000} {:name &quot;Charlie&quot; :salary 80000}&#93;
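
;; A sketch of a possible next step (not part of the library's examples):
;; the collected rows are plain Clojure maps, so any further local
;; processing needs only clojure.core, e.g. totalling the extracted salaries
&#40;reduce + &#40;map :salary &#91;{:name &quot;Alice&quot; :salary 75000} {:name &quot;Charlie&quot; :salary 80000}&#93;&#41;&#41;
;; =&gt; 155000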
</code></pre><p>As an early-stage proof-of-concept, it only covers the essential parts of the underlying API without being too concerned with performance or completeness. Other more advanced features are noted and planned, pending further elaboration.</p><p><strong>I hope you find it useful and I welcome any feedback or contributions!</strong></p>]]></content>
  </entry>
  <entry>
    <id>https://alza-bitz.github.io/scicloj-on-edtech-platforms</id>
    <link href="https://alza-bitz.github.io/scicloj-on-edtech-platforms"/>
    <title>Scicloj on EdTech Platforms: Enabling Clojure-based Data Science in the Browser</title>
    <updated>2025-08-27T23:59:59+00:00</updated>
    <content type="html"><![CDATA[<p>You may or may not be aware that the Clojure data science stack, a.k.a. <a href='https://scicloj.github.io/'>Scicloj</a>, has been gaining momentum in recent years. To give a few highlights, <a href='https://cnuernber.github.io/dtype-next/index.html'>dtype-next</a> / <a href='https://generateme.github.io/fastmath/fastmath.core.html'>fastmath</a> are comparable with <a href='https://scipy.org/'>scipy</a> / <a href='https://numpy.org/'>numpy</a> for numerical work, and <a href='https://techascent.github.io/tech.ml.dataset/walkthrough.html'>tech.ml.dataset</a> / <a href='https://github.com/scicloj/tablecloth'>tablecloth</a> are comparable with <a href='https://pandas.pydata.org/'>Pandas</a> for tabular data. Kira McLean's <a href='https://youtu.be/MguatDl5u2Q'>2023 Conj presentation</a> explains that feature parity is almost upon us, and also offers some reasoning on why data scientists are now considering Clojure as an alternative to Python or R.</p><p>However, the leading EdTech platforms don't have much support for Clojure, so the potential benefits of both the language and Scicloj are not currently accessible to those communities.</p><p><strong>The good news</strong> is that I have created <a href='https://github.com/alza-bitz/nrepl-ws-client'>a proof-of-concept for using a browser to write, load and evaluate Clojure code</a> running on a <a href='https://github.com/alza-bitz/nrepl-ws-server'>remote server using websockets</a>, processing the results for display using the Scicloj notebook library <a href='https://github.com/scicloj/clay'>clay</a>.</p><p>I don't believe this combination has been achieved before. 
It is a significant step when you consider that Scicloj is Java/JVM-based on account of the underlying math support; there is no JavaScript or ClojureScript implementation, and that is likely to remain the case.</p><p><strong>This work opens up the possibility for Clojure-based data science on e-learning platforms</strong> so that anyone anywhere can learn and experiment with the Scicloj stack.</p><p>Here are some examples of new e-learning content that could be unlocked:</p><p>Theory:</p><ul><li>Value vs state & functional programming</li><li>Concurrent programming.</li></ul><p>Clojure hands-on:</p><ul><li>Interactive programming and structural editing with the REPL</li><li>Data processing with lazy sequences and transducers</li><li>Data science notebooks covering stats, ML or LLM with rendered tabular data and charts.</li></ul><p>More recently, I gave an update on my progress at the <a href='https://clojureverse.org/t/visual-tools-meeting-34-clojure-in-wasm-docker-nrepl-el-nrepl-ws-clay-summary-partial-recording/11452'>Scicloj Visual Tools #34 meetup</a>, including a live demo:</p><p><a href='https://youtu.be/i3x0z9mzWm0?si=LfzqCQTdFFBaAGjP&t=50'><img src="https://img.youtube.com/vi/i3x0z9mzWm0/hqdefault.jpg" alt="Watch the video" /></a></p><p>I hope you can appreciate the opportunity here. I'm happy to give a live demonstration to anyone who's interested!</p>]]></content>
  </entry>
  <entry>
    <id>https://alza-bitz.github.io/what-time-is-it</id>
    <link href="https://alza-bitz.github.io/what-time-is-it"/>
    <title>What Time Is It? Understanding the Complexity of Data Streaming Tools</title>
    <updated>2025-05-28T23:59:59+00:00</updated>
    <content type="html"><![CDATA[<h3 id="i&#95;argue&#95;that&#95;static&#95;documentation&#95;is&#95;insufficient&#95;to&#95;reason&#95;about&#95;the&#95;stateful&#95;operations&#95;of&#95;data&#95;streaming&#95;tools.">I argue that static documentation is insufficient to reason about the stateful operations of data streaming tools.</h3><p><img src="assets/images/what-time-is-it-256.jpg" alt="A representation of the complexity in data streaming tools" /></p><p>In a computer program, when values change over time we call this <strong>state</strong>. This is why we have two different words available to us for differentiating the context: the specific use of “state” instead of “value” signifies to the reader that we are intentionally composing two things, namely <strong>value</strong> and <strong>time</strong>.</p><p>Each of those concepts are simpler to reason about on their own, but when put together they require much more care. When you hear people saying “state is inherently complex”, this is what they are referring to. This is especially relevant when we are learning about <strong>data streaming tools</strong> as they need to consider state in many areas, and at scale: windowed aggregations, joins and other stateful operations, not to mention horizontal scaling, memory management through watermarks and checkpoints, fault tolerance and more.</p><p>So how best to understand the state management of these tools? 
Let’s take a look at what some of the most popular options provide to educate and inform users in this regard:</p><table><thead><tr><th>Tool</th><th><a href='https://flink.apache.org/'>Flink</a></th><th><a href='https://kafka.apache.org/documentation/streams/'>Kafka Streams</a></th><th><a href='https://spark.apache.org/streaming/'>Spark Structured Streaming</a></th><th><a href='https://storm.apache.org/'>Storm</a></th></tr></thead><tbody><tr><td>Word count of reference docs</td><td>20,042</td><td>45,492</td><td>19,908</td><td>28,682</td></tr><tr><td>Informing users on stateful operations:<br>t = written text<br>d = diagrams & charts<br>a = animations<br>u = unit test facility<br>s = simulator</td><td>t,d,u,s <a href='#fn-1' id='fnref1'><sup>1</sup></a></td><td>t,d,u</td><td>t,d,u</td><td>t,d,u</td></tr><tr><td>Execution plan checked against documented capabilities before running it?</td><td>Yes <a href='#fn-2' id='fnref2'><sup>2</sup></a></td><td>No <a href='#fn-3' id='fnref3'><sup>3</sup></a></td><td>Yes <a href='#fn-4' id='fnref4'><sup>4</sup></a></td><td>Yes <a href='#fn-5' id='fnref5'><sup>5</sup></a></td></tr><tr><td>If a simulator is available, where does it run?<br>1 = local<br>2 = browser + server / cloud<br>3 = browser only</td><td>N/A</td><td>N/A</td><td>N/A</td><td>N/A</td></tr></tbody></table><p>Based on the above, we can see that</p><ul><li>Despite their complexity, these tools are mostly limited to written documentation for educating and informing users on their stateful operations.</li><li>While they all provide unit test facilities, these are intended to test your usage or composition of these operations rather than understanding the operations themselves. 
One could also argue that unit testing targets a later phase of your project than “educate and inform”.</li><li>They perform only limited checks on their execution plans, and as a result things can slip through the cracks.</li></ul><p>With that in mind, consider again that stateful data streaming problems necessarily involve the consideration of time, and as such they are fundamentally a <strong>dynamic</strong> concern. By contrast, written documentation is of course only <strong>static</strong>, and for that reason I will submit that it is <strong>inefficient and inadequate for the intended purpose</strong>.</p><p>Indeed, I was affected by this issue personally when I <a href='https://stackoverflow.com/questions/79476798/spark-structured-streaming-empty-result-for-a-stream-stream-inner-join-to-compu'>ran into trouble with one of these tools</a>. I still don't know if the problem I encountered is due to a misunderstanding of the (20,000 word) documentation or a <a href='https://issues.apache.org/jira/browse/SPARK-51399'>bug</a>.</p><hr/><p>So what’s the solution? Well, consider that we learn best by a combination <a href='https://practera.com/what-is-the-experiential-learning-theory-of-david-kolb'>of</a> <a href='https://citl.indiana.edu/teaching-resources/evidence-based/active-learning.html'>reading</a> <a href='https://www.psychologymadeeasy.in/posts/levels-of-processing-craik-and-lockhart'>and</a> <a href='https://en.wikipedia.org/wiki/Generation_effect'>doing</a> rather than by reading alone. 
The “doing” is something that happens in real time, and one way to achieve this is by <strong>simulation</strong>.</p><p>In our case, I will define a simulation thus:</p><p><i>A means to observe the effects of a stateful operation, where for the same input as given to the production equivalent, the simulation will give the same output.</i></p><p>Given that the stated purpose of a simulator in our case is to educate and inform, I will also add to the definition that it must require zero installation or setup. Further, to reduce costs and complexity, it should also require minimal resources and ideally be fully serverless or standalone in operation. Finally, since the goal is to represent stateful operations, it should be capable of representing them visually and dynamically, for example using animated forms.</p><p><i>I think there’s an opportunity for these tools (or new ones) to provide <strong>visual simulators</strong> as the primary means of reasoning for their stateful operations, and also as a complement to their existing documentation.</i></p><p>So with that out of the way, if we could build such a simulator, what would it look like, how would it work and how could it be built? Here’s a motivational blueprint!</p><ol><li>Make the simulator available in a web browser.</li><li>Write the core functions for the streaming solution and its stateful operations in a hosted language that compiles to code that can run in a browser. Then the same code can be used for both a production implementation and the browser-based simulation.</li><li>Represent unbounded inputs using generators over lazy sequences.</li><li>Define the execution plan specification and plan validation rules as data. 
Then, both the code that checks plans against the rules and any written reference guide can parse this same data, avoiding the possibility of inconsistencies.</li><li>Within the simulator, represent stateful operations as declarative example-based or property-based BDD style given-when-then constructs, with an animated accumulation of results over time.</li></ol><p>In conclusion, I hope you can appreciate the benefits that simulations would bring in this space, and I also hope to have suitably motivated other people in the community to take the baton!</p><hr/><h3 id="credits.">Credits.</h3><p><a href='https://www.linkedin.com/in/garciaigor'>Igor Garcia</a> for your feedback and advice: thank you 🙏</p><hr/><ol class='footnotes'><li id='fn-1'>Flink provides an <a href='https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/try-flink/flink-operations-playground'>operations playground</a> but it doesn’t specifically cover stateful operations. There’s also a worked example based on <a href='https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/try-flink/datastream'>fraud detection</a>, but the explanations are 100% written.<a href='#fnref1'>&#8617;</a></li><li id='fn-2'>Flink performs semantic checks for jobs defined using the Table API and SQL. However, the pre-execution validation is not exhaustive, and certain subtle errors or issues might only manifest as unexpected behavior or silent failures during runtime.<a href='#fnref2'>&#8617;</a></li><li id='fn-3'>Kafka Streams doesn’t have a distinct pre-execution validation phase in the traditional sense. Instead it relies on a combination of static type checking and the <a href='https://www.confluent.io/blog/test-kafka-streams-with-topologytestdriver/'>TopologyTestDriver</a> as part of a unit testing strategy.<a href='#fnref3'>&#8617;</a></li><li id='fn-4'>Spark Structured Streaming has a multi-layered validation process to ensure correctness and feasibility of computations before their execution. 
The <a href='https://books.japila.pl/spark-structured-streaming-internals/UnsupportedOperationChecker'>UnsupportedOperationChecker</a> enforces streaming-specific rules during the logical planning stage.<a href='#fnref4'>&#8617;</a></li><li id='fn-5'>Apache Storm's pre-execution topology checks focus on structural and configuration validity, and exceptions are thrown to indicate structural problems, configuration errors and authorization failures. Although it provides facilities for programmatically defining and inspecting topology structure and configuration, a dedicated validation API against documented capabilities is absent.<a href='#fnref5'>&#8617;</a></li></ol>]]></content>
  </entry>
  <entry>
    <id>https://alza-bitz.github.io/programmers-how-do-you-partition</id>
    <link href="https://alza-bitz.github.io/programmers-how-do-you-partition"/>
    <title>Programmers: How Do You Partition A Collection?</title>
    <updated>2025-02-22T23:59:59+00:00</updated>
    <content type="html"><![CDATA[<h3 id="i&#95;argue&#95;that&#95;the&#95;design&#95;of&#95;newer&#95;languages&#95;is&#95;more&#95;efficient&#95;for&#95;solving&#95;simple&#95;problems&#95;when&#95;compared&#95;to&#95;older&#95;languages.">I argue that the design of newer languages is more efficient for solving simple problems when compared to older languages.</h3><p><img src="assets/images/programmers-how-do-you-partition-256.jpg" alt="An overly complex machine for dividing a collection of candy into equal amounts" /></p><p>It's a good question! Let's start by defining the problem. A collection is defined here as any number of items, and partitioning is defined as the division of said items into n groups of equal size, with the last group possibly containing fewer than n items.</p><p>Well that seems simple enough to understand. It's a single task with only one dimension and no dependencies. I wouldn't be surprised if a child could apply this algorithm using real objects without understanding how to code it, so it should be relatively simple for us to implement, right?</p><p>Firstly, let's see how it could be done with the three most popular languages <a href='#fn-1' id='fnref1'><sup>1</sup></a> using Stack Overflow data as a guide (captured on 14th Feb):</p><p><strong>Python.</strong> Lots and lots of ways, based on the most popular post with <a href='https://stackoverflow.com/questions/312443/how-do-i-split-a-list-into-equally-sized-chunks'>1.7m views and 69 answers</a>.</p><p><strong>Java.</strong> Lots of ways, based on these popular posts with <a href='https://stackoverflow.com/questions/12026885/is-there-a-common-java-utility-to-break-a-list-into-batches'>250k views and 23 answers</a>, <a href='https://stackoverflow.com/questions/5824825/efficient-way-to-divide-a-list-into-lists-of-n-size'>166k views and 18 answers</a>, unless you can use Java 22. 
<a href='#fn-2' id='fnref2'><sup>2</sup></a></p><p><strong>JavaScript.</strong> Lots and lots of ways, based on the most popular post with <a href='https://stackoverflow.com/questions/8495687/split-array-into-chunks'>1.3m views and 88 answers</a>, unless the environment supports Baseline 2024. <a href='#fn-3' id='fnref3'><sup>3</sup></a></p><p>In conclusion, for all these more popular languages we have:</p><ul><li>Many different ways, with different syntax and different APIs.</li><li>The suitability of each depends on the context (small / large, sync / async) and details of the data structure we're working with.</li><li>The availability of each depends on the version, of which there are many.</li></ul><hr/><p>By contrast, let's look at three much less popular languages <a href='#fn-4' id='fnref4'><sup>4</sup></a>, again consulting Stack Overflow:</p><p><strong>Kotlin.</strong> Just a few ways, based on the most popular post with <a href='https://stackoverflow.com/questions/40699007/divide-list-into-parts'>37k views and 6 answers</a>. Although there are 6 answers, our problem can be solved using <code>kotlin.collections/chunked</code> in the standard library from version 1.2 onwards. <a href='#fn-5' id='fnref5'><sup>5</sup></a></p><p><strong>Rust.</strong> Just a few ways, based on the most popular post with <a href='https://stackoverflow.com/questions/67536734/how-to-partition-vector-of-results-in-rust'>6k views and 2 answers</a>. Although there are 2 answers, our problem can be solved using <code>trait.Iterator/array&#95;chunks</code> from the standard library. <a href='#fn-6' id='fnref6'><sup>6</sup></a></p><p><strong>Clojure.</strong> I couldn't find a specific Stack Overflow question for our case, but no matter: the top Google result takes me directly to <code>partition</code> and from there to <code>partition-all</code>. <a href='#fn-7' id='fnref7'><sup>7</sup></a></p><hr/><p>There seems to be a clear difference here. 
Despite the expected reduction in the number of post views, <strong>these less popular languages offer a single solution in their standard libraries that doesn't depend nearly as much on either the context or data structures involved</strong>.</p><p>Without getting into why this difference might exist (although see note 1 below), <i>we have a situation today where a large population of programmers have to either carry this complexity in their heads or disrupt their flow to look it up, and then make a suitable decision, just to do such a simple thing?</i></p><p>This might help us to understand why programmers are now using AI coding assistants to reduce context switching and work more efficiently on the actual problem they're trying to solve.</p><p>These new tools are feasible because datasets of questions and answers have grown over time to become large enough for LLMs to work effectively. However, I would submit that this growth has come not only from the popularity of these languages, but also from their shortcomings.</p><p><strong>Wouldn't it be better if these simple things were made easy, to have the solutions at hand, without reaching for AI?</strong></p><p>To be clear, I've seen <a href='https://youtu.be/oNhqqiKuUmw'>more complex examples</a> where AI assisted coding has significant benefits, but it wouldn't be needed in these simple cases if we stopped to reconsider our choices.</p><p>The alternative is surely a future where more and more output from these coding assistants ends up feeding the models used by those same assistants to serve these types of programming questions, and repeat. What could possibly go wrong? All this just to solve some simple problems.</p><p>A related issue concerning our choices is how well any given language is deemed to work with AI coding assistants. These tools might appear to be less effective for some languages because the LLMs don't have as much data to work with. 
However, we shouldn't naively assume this arises solely from a lack of popularity. We can see that some languages offer ready solutions for simple problems, and as a consequence we might expect considerably fewer questions and answers to be posted online in the first place.</p><p><strong>I think this is an opportunity to reflect on what we are doing, what we are using and where we are going.</strong></p><p>I think it's possible for us as programmers to do what's asked of us, and even do it with less disruption and more efficiency, but we can also focus on using AI where it actually adds value by enhancing our capabilities, instead of using it to compensate for the baggage brought by popular languages.</p><p>Programming could still be fun, rather than frustrating. Although if it's to be done by a machine, who cares right? Whilst there's no doubt that AI will fundamentally change the nature of programming in the coming years, through our choices and influence I hope you can see that the outcome for us is most definitely not inevitable.</p><p>Some thoughts to consider next time you're trying to do a simple thing and wondering why it's anything but!</p><hr/><h3 id="credits.">Credits.</h3><p><a href='https://www.linkedin.com/in/garciaigor'>Igor Garcia</a> for your feedback and advice: thank you 🙏</p><h3 id="notes.">Notes.</h3><p><strong>1.</strong>  I offer two hypotheses for why the most popular languages don't provide coverage for our case:</p><p>a) Due to incidental complexity resulting from their age and the prioritisation of backwards compatibility (the “shortcomings” and “baggage” I refer to in the article). The less popular languages I selected are all much younger so they can learn from the past and adopt more recent academic research, in particular concerning data structures, their abstractions and the functions that operate on them. 
</p><p>b) In recent times there has been a greater need for these kinds of functions over data, because even if you don't need them in your implementation, you're more likely to need them in your tests. When these languages were conceived, even example-based unit testing wasn't a thing, but now we have property-based tests where the ability to generate data is essential.</p><p><strong>2.</strong> What I present here isn't particularly new, except perhaps for the analysis and AI perspective. Essentially it's just another manifestation of what happens when programmer convenience and replaceability are prioritised above other concerns, as described by Rich Hickey in his 2011 presentation <a href='https://youtu.be/LKtk3HCgTa8?t=546&si=v4mc0CTZOFGA87ay'>Simple Made Easy</a>.</p><p><strong>3.</strong> I'm not too familiar with Rust or Kotlin, so if you think my analysis is incorrect, please let me know.</p><p><strong>4.</strong> Yes, I did use an LLM-based generator for the image in the header, but I'm fine with that, apart from the fact that it did not let me give any attribution or reward to the creators of the source images used by its model.</p><hr/><ol class='footnotes'><li id='fn-1'><a href="https://spectrum.ieee.org/top-programming-languages-2024">https://spectrum.ieee.org/top-programming-languages-2024</a><a href='#fnref1'>&#8617;</a></li><li id='fn-2'><a href="https://openjdk.org/jeps/461">https://openjdk.org/jeps/461</a><a href='#fnref2'>&#8617;</a></li><li id='fn-3'><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/groupBy">https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/groupBy</a><a href='#fnref3'>&#8617;</a></li><li id='fn-4'><a href="https://spectrum.ieee.org/top-programming-languages-2024">https://spectrum.ieee.org/top-programming-languages-2024</a><a href='#fnref4'>&#8617;</a></li><li id='fn-5'><a 
href="https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.collections/chunked.html">https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.collections/chunked.html</a><a href='#fnref5'>&#8617;</a></li><li id='fn-6'><a href="https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.array_chunks">https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.array_chunks</a><a href='#fnref6'>&#8617;</a></li><li id='fn-7'><a href="https://clojuredocs.org/clojure.core/partition-all">https://clojuredocs.org/clojure.core/partition-all</a><a href='#fnref7'>&#8617;</a></li></ol>]]></content>
  </entry>
</feed>
