Why AI Isn't Ready to Write Instrument Control Software

AI coding assistants have made remarkable strides, generating code across a wide range of applications. They are trained on vast datasets, extracting patterns and even demonstrating basic reasoning abilities. The promise is that AI's general reasoning—emerging from massive datasets—will eventually solve any problem, given the speed of modern computers.

However, when it comes to instrument control software (ICS), AI falls short—not just slightly, but fundamentally. ICS requires deep domain knowledge, hardware intuition, and the ability to diagnose failures that AI simply cannot match. Based on my experience working in ICS, AI is nowhere near capable of functioning on its own, and I doubt it will ever be more than a productivity tool used by experienced engineers.

The Challenge of Firmware

One of the biggest limitations of AI in ICS is firmware development. Unlike application software, professional-grade firmware that interacts directly with hardware is scarce in open-source datasets. AI has access to toy Arduino programs and sample code from STMicroelectronics, but not complete, functional systems. And when firmware fails, it often does so in obscure, unpredictable ways—locking up boards, corrupting memory, or introducing subtle timing issues.

For example, when I needed demo code for a Nucleo-144 board with Ethernet, I had AI generate it. The code compiled and ran without errors, but did nothing. After hours of consulting user groups and tweaking the code, I got it working by adapting an old STM32 example to the new Hardware Abstraction Layer. Even then there was a critical flaw: the DMA buffers required by the Ethernet controller overlapped the region used by malloc()/free(). This led to intermittent crashes triggered by network traffic.
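For what it's worth, the usual cure is to pin the DMA buffers to a dedicated RAM region that the heap can never touch. Below is a minimal sketch using GCC section attributes; the ".eth_dma" section name, buffer count, and 32-byte alignment are my own illustrative assumptions, and the matching linker-script entry (placing ".eth_dma" in a RAM bank excluded from the heap and stack) is not shown.

    /* Sketch: pin Ethernet DMA buffers to a dedicated RAM region so they
     * cannot collide with the heap used by malloc()/free(). Assumes a GNU
     * toolchain and a linker script that places the ".eth_dma" output
     * section in a RAM bank excluded from the heap/stack region. */
    #include <stdint.h>

    #define ETH_RX_BUF_COUNT 4U
    #define ETH_RX_BUF_SIZE  1536U  /* full Ethernet frame, padded */

    /* The section attribute forces placement; the alignment must match
     * the Ethernet controller's DMA requirements (32 bytes here, as an
     * example). */
    static uint8_t eth_rx_buf[ETH_RX_BUF_COUNT][ETH_RX_BUF_SIZE]
        __attribute__((section(".eth_dma"), aligned(32)));

With the buffers anchored this way, no amount of malloc() traffic can wander into memory the Ethernet DMA engine is writing.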

AI had no way of detecting this issue. The root cause became clear only because I have years of experience debugging low-level memory problems. Along the way I also ran into previously unreported bugs in the latest STM32CubeIDE (which I reported) as well as flaws in the existing examples. Firmware engineers develop a specific set of "battle scars": experience that AI simply does not possess.

AI Can't Handle Undocumented Hardware Behaviors

Higher up the stack, ICS often involves controlling hardware like stepper motors over protocols such as RS232. Here, AI's inability to reason beyond documentation becomes even more apparent.

In one case, I was working with a stepper motor controller that supported resetting the coordinate system (i.e., setting the current position to zero). This worked fine most of the time. After extensive testing, however, I discovered that certain command sequences issued at high speed, with the controller in closed-loop mode, would put the motor into an error state in which it drove continuously.

AI couldn't have caught this because the failure mode wasn't documented. It couldn't have reasoned through the problem, either. Only my experience writing similar software helped me deduce that the controller was still updating internal state after returning a completion status. I worked around the issue by carefully adjusting command timing, but an AI-generated solution would have failed catastrophically.
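To give a flavor of the workaround, here is a minimal sketch: rather than trusting the controller's completion status, the code polls the reported position until it actually reads zero (or a timeout expires) before the next command goes out. The command string and the send_cmd(), read_position(), and millis() helpers are hypothetical stand-ins for the real RS232 driver, which I can't reproduce here.

    /* Sketch of the timing workaround. The controller acknowledges a
     * "set zero" command before its internal state has settled, so we
     * verify the position ourselves instead of trusting the ack.
     * send_cmd(), read_position(), and millis() are placeholders for
     * the real RS232 driver and a monotonic millisecond tick. */
    #include <stdbool.h>
    #include <stdint.h>

    extern void     send_cmd(const char *cmd);
    extern int32_t  read_position(void);
    extern uint32_t millis(void);

    static bool zero_coordinates(uint32_t timeout_ms)
    {
        send_cmd("SET ZERO");            /* hypothetical command string */

        uint32_t start = millis();
        while (millis() - start < timeout_ms) {
            if (read_position() == 0)
                return true;             /* internal state has settled */
        }
        return false;                    /* controller never settled */
    }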

Another example of a hardware problem that can't be anticipated is bad wiring. Prototype machines are often seat-of-the-pants affairs, with no EMI shielding and long wires dangling where they don't belong. AI can help write tools that detect and characterize such hardware failures, but only with very careful, detailed prompting. Again, an experienced engineer is required.

Debugging Complex Timing Issues in Embedded Systems

Recent studies have shown that AI does a poor job of debugging software. Timing issues that arise when interacting with hardware take the problem to a new level. One particularly tricky case involved an R&D prototype of a DNA sequencer that combined components from two other instruments. On startup, some devices intermittently failed to initialize.

Analyzing the logs, I noticed that not only were bytes being dropped, but later commands were overwriting earlier ones, which suggested a buffer overflow. After discussing it with the FPGA engineer, we traced the issue to an internal RS232 ring buffer overflowing and wrapping around, corrupting commands. The problem appeared only on our instrument because it had more devices, which increased the command volume on startup.
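For readers who haven't hit this one, here is a minimal sketch of the failure mode: a fixed-size ring buffer whose unguarded write laps the read index when full, silently destroying unread command bytes. A guarded write makes the overflow visible instead. The sizes and names are illustrative, not the FPGA's actual implementation.

    /* Sketch of a ring buffer overflow. put_unchecked() reproduces the
     * failure: when the buffer is full, the write index laps the read
     * index and earlier, still-unconsumed command bytes are silently
     * overwritten. put_checked() surfaces the overflow instead. */
    #include <stdbool.h>
    #include <stdint.h>

    #define RING_SIZE 256  /* power of two, so masking works as modulo */

    static uint8_t  ring[RING_SIZE];
    static uint16_t head, tail;  /* head: write index, tail: read index */

    /* Unguarded put: silently corrupts unread data on overflow. */
    static void put_unchecked(uint8_t b)
    {
        ring[head & (RING_SIZE - 1)] = b;
        head++;
    }

    /* Guarded put: refuses the byte and signals the caller, so the
     * overflow shows up in a log instead of as corrupted commands. */
    static bool put_checked(uint8_t b)
    {
        if ((uint16_t)(head - tail) >= RING_SIZE)
            return false;  /* buffer full: drop and report */
        ring[head & (RING_SIZE - 1)] = b;
        head++;
        return true;
    }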

Without experienced engineers, this problem would have remained a mystery. Someone might have resorted to adding sleep() calls—treating the symptom rather than diagnosing the cause. In fact, the lead electrical engineer initially proposed swapping boards, which would have wasted time without solving anything.

Internal Inconsistency of Responses

While preparing this article, I had a question about tax filing requirements involving IRS Form 5500-EZ. Plenty of information about this is openly accessible on the web, so training data is not an issue. I went to Grok and asked the question. Its response contained a logical contradiction within the first (and only!) paragraph.

I wanted to include the transcript for your pleasure, but doing so would violate Grok's license agreement. Suffice it to say, it first said you don't have to file if you closed the plan, then said you must file if it was the plan's final year. Amused, I did what I often do as a hobby: I made Grok explain why it made such an obvious mistake and what it was going to do about it in the future. To its credit, it told me a way to update my settings to force it to devote more resources to logical consistency, though it gave no guarantees. I have permanently added the following to my settings:

[Logical Consistency Check Required]

All is not lost, however: ChatGPT got the correct answer on the first try.

What's the lesson here? If you want an AI to do any serious work, you need to know the strengths and weaknesses of each system, and be prepared to examine every line of what it produces. For code, that means doing much more than a simple review, where you implicitly assume that the author is not a confident idiot.

In ICS, I have found examining the product with a fine-toothed comb to be about as labor-intensive as writing it myself. The Singularity is coming, but I am getting more and more convinced that it’s still a ways off.

OK, AI falls short today, but what about tomorrow?

AI (a blanket term) has made remarkable progress. Much of that progress comes from pairing specialized subsystems (like DeepSeek's mixture-of-experts architecture) or designing feedback loops that continuously refine guesses (like AlphaFold). For code generation, it is reasonable to think that a paid service would devote the resources to build effective simulators that test and refine its output until it behaves correctly, especially for targets like web browsers, which already have simulators developed by Big Tech.

But that's the problem: it's all software. Pulling that off when one of the players is a hardware component requires correctly and economically modeling that hardware. Tools like Renode purport to do this for IoT devices, but it is unclear how many of the peripherals used in scientific instruments they support. Correctly modeling your custom PCB (which is likely a trade secret) and its connections to all those peripherals is a tall order for an AI. I'm skeptical that the big players will decide to build a mixture-of-experts system for custom firmware simulation.

Conclusion

These cases illustrate a fundamental limitation of AI in ICS: it cannot intuit failure modes, diagnose hardware-specific behaviors, or work from first principles. It can (and does) deliver obvious mistakes interwoven with true gems from its training set. While AI can be a useful assistant, it is no substitute for expert knowledge. In ICS, the most valuable tool remains the engineer who understands the intricacies of hardware, timing, and system interactions, and who has a clear, logical mind to ensure that everything works not just in theory, but in reality.