One solution I haven’t seen recommended much is to have a Claude instruction/skill that explicitly audits the diff of every upgrade, and force this manual audit as part of your upgrade workflow. This seems like it would work pretty reliably.
This is what many AI supply-chain security startups (like the one that posted the article) are already doing with all NPM packages, so save yourself the Claude tokens. All of these compromises were detected within minutes, but it takes some time (<1 hour) for NPM to unpublish all of the affected packages.
Sorry for my ignorance, but then couldn't we build this into NPM itself? So before a package is publicly available it would be quaranteened and checked.
Super dumb question as someone who has been using some form of AI for dev since 2023:
How does having an AI audit external code help? Can they not be prompt injected to ignore a malicious change?
I guess I am sort of concerned that they are a pretty thin layer and even if you put "DO NOT ALLOW PROMPT INJECTION", it's a bit like saying "make no mistakes". There _is_ a priority between `system` and `user` level messages as I had recalled, so a specifically made tool that has its own system prompt should prevent injection while asking Claude CLI could still allow for prompt injection.
There are prompt guard classifiers that can detect prompt injections, but they are not perfect (false positives, obfuscation) and should be only a part of the defense.
The concern is real and unsolved. I think security researchers have an advantage here because they still can fall back to manual audits if their automated analysis (or scores thereof) is off.