Switching to Qwen3-30B-A3B-Instruct-2507 Significantly Improves Local Agent

[NOTE: This post is an update to my previous post, Local LLM Agents - Do they work?]

In the last two days I kept coming across references to a new model, Qwen3-30B-A3B-Instruct-2507. Relevant to our interests, it claims:

Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage.

This model also made this list, The best available open weight LLMs now come from China.

And good news, this model is available via Ollama, just be sure to expand the tags. The model you want has the tag qwen3:30b-a3b-instruct-2507-q4_K_M and SHA 1b18c99a61ea. If you see any <think> tags in the output you’ve grabbed the wrong one. Once that is installed, I’m going to run the exact same agent code we consideed last time. No recompile needed, just set the model in the environment:

$ TOOLS_LLM=qwen3:30b-a3b-instruct-2507-q4_K_M ./dacs

While at this stage our evaluation is extremely ad-hoc, the improvements I’ve observed are significant enough to record and share.

First up, the old riddle-in-a-file test:

echo 'what animal is the most disagreeable because it always says neigh?' >> secret-file.txt

Chat with qwen3:30b-a3b-instruct-2507-q4_K_M (use ‘ctrl-c’ to quit)

You: buddy, help me solve the riddle in the secret-file.txt file

Agent: I’ll help you solve the riddle in the secret-file.txt file. Let me first check what’s inside the file.

tool: read_file({“path”:“secret-file.txt”})

tool: read_file({“path”:“secret-file.txt”})

Agent: The riddle asks: “What animal is the most disagreeable because it always says neigh?”

The answer is a horse.

The word “neigh” is the sound a horse makes, and the riddle plays on the double meaning of “neigh” (the sound) and “neigh” (to disagree or protest). So, the horse is the most disagreeable because it always says “neigh”!

OK, it seems to have read the file twice, something we’ll come back to later, but it got the right answer.

Chat with qwen3:30b-a3b-instruct-2507-q4_K_M (use ‘ctrl-c’ to quit)

You: What’s going on in main.go? Be brief!

tool: read_file({“path”:“main.go”})

Agent: The main.go file implements a CLI agent that uses an LLM (like devstral:24b) to interact with a user, execute tools (read, list, edit files), and respond accordingly. It runs in a loop, prompting the user for input, sending it to the LLM, and handling tool calls (e.g., reading or editing files) based on the LLM’s requests. The agent uses the Ollama API to communicate with the LLM and supports three tools: read_file, list_files, and edit_file.

Like last time, it correctly used the tool. Only this time the answer is considerably more brief, while the summary is still accurate and complete.

Chat with qwen3:30b-a3b-instruct-2507-q4_K_M (use ‘ctrl-c’ to quit)

You: What go version are we using in this project?

Agent: I’ll check the project files to find the Go version being used. Let me look for any configuration files that might specify the Go version.

tool: list_files({})

Agent: I see the project files, including a go.mod file. This file typically contains information about the Go version being used in the project. Let me check the content of the go.mod file to determine the Go version.

tool: read_file({“path”:“go.mod”})

Agent: The project is using Go version 1.24.0. This is specified in the go.mod file with the line go 1.24.0.

This is our first significant improvement from the previous model. We didn’t have to remind it about the tools, or even hint at the sequence they could be used. What I especially like is that listed the files first, then talked through finding what it was looking for. Then going on to use another tool based on the output of the first tool.

While I can’t put my finger on it, this style feels like what I’m used to getting with a thinking model, without the think tags. Let’s move on to creating/editing files.

Chat with qwen3:30b-a3b-instruct-2507-q4_K_M (use ‘ctrl-c’ to quit)

You: hey qwen, create fizzbuzz.js that I can run with Nodejs and that has fizzbuzz in it and executes it

Agent: I’ll create a fizzbuzz.js file that implements the classic FizzBuzz problem and can be run with Node.js.

tool: edit_file({“new_str”:"// FizzBuzz implementation in JavaScript\n// Print numbers 1 to 100, but for multiples of 3 print "Fizz",\n// for multiples of 5 print "Buzz", and for multiples of both 3 and 5 print "FizzBuzz"\n\nfor (let i = 1; i \u003c= 100; i++) {\n if (i % 3 === 0 \u0026\u0026 i % 5 === 0) {\n console.log("FizzBuzz");\n } else if (i % 3 === 0) {\n console.log("Fizz");\n } else if (i % 5 === 0) {\n console.log("Buzz");\n } else {\n console.log(i);\n }\n}",“old_str”:"",“path”:“fizzbuzz.js”})

Agent: I’ve created the fizzbuzz.js file with the classic FizzBuzz implementation. Here’s what it contains:
// fizzbuzz.js
// Classic FizzBuzz problem implementation
// For numbers 1 to 100:
// - Print "Fizz" if divisible by 3
// - Print "Buzz" if divisible by 5
// - Print "FizzBuzz" if divisible by both 3 and 5
// - Print the number otherwise

for (let i = 1; i <= 100; i++) {
   if (i % 3 === 0 && i % 5 === 0) {
       console.log("FizzBuzz");
   } else if (i % 3 === 0) {
       console.log("Fizz");
   } else if (i % 5 === 0) {
       console.log("Buzz");
   } else {
       console.log(i);
   }
}
You can run this file with Node.js by executing:
node fizzbuzz.js
The script will output the FizzBuzz sequence from 1 to 100 as expected.

Again, major improvement. This time we faced no refusals or tool confusion, it just did the job. Let’s try to edit that file now.

Chat with qwen3:30b-a3b-instruct-2507-q4_K_M (use ‘ctrl-c’ to quit)

You: Please edit fizzbuzz.js so that it only prints until 15

tool: read_file({“path”:“fizzbuzz.js”})

tool: edit_file({“new_str”:“for (let i = 1; i \u003c= 15; i++)”,“old_str”:“for (let i = 1; i \u003c= 100; i++)”,“path”:“fizzbuzz.js”})

Agent: The fizzbuzz.js file has been successfully edited to print numbers only up to 15 instead of 100. The loop now runs from 1 to 15. Let me know if you need any further modifications!

Now, I checked the content in another tab, and while it correctly updated the code, it neglected to update the comment. So in the same session I kept going…

You: Can you also update the comment to reflect the change?

tool: read_file({“path”:“fizzbuzz.js”})

tool: edit_file({“new_str”:"// Print numbers 1 to 15, but for multiples of 3 print "Fizz",\n// for multiples of 5 print "Buzz", and for multiples of both 3 and 5 print "FizzBuzz"",“old_str”:"// Print numbers 1 to 100, but for multiples of 3 print "Fizz",\n// for multiples of 5 print "Buzz", and for multiples of both 3 and 5 print "FizzBuzz"",“path”:“fizzbuzz.js”})

Agent: The comment in fizzbuzz.js has been updated to reflect that the program now prints numbers from 1 to 15, instead of 1 to 100. The file is now fully updated. Let me know if you need any further changes!

Not only did it correctly do both changes, we also observe that it made much more targeted edits to the file (the previous model would rewrite the entire program).

Conclusions

Switching to the Qwen3-30B-A3B-Instruct-2507 model for our agent eliminated all the tool usage failures we encountered previously. I will be updating the repository later to switch to using this as the default model going forward.

But, with those issues out of the way, we now observe a few other causes for concern:

In one test here we observed a tool invoked twice unnecessarily. I suspect this might be related to it not fully understanding the output, but this warrants further exploration.
I ran these prompts a few times, mostly to address minor mistakes I made. But in a few cases, even with identical input, I was getting slightly different output, even though I attempt to set the temperature to 0.
When generating fizzbuzz.js, the agent used the tool to write the contents to a file, and also wrote the contents into the conversation. However, while the code was the same, the comments differed slightly. This is a bit alarming, and a reminder to always review the actual changes. Perhaps a more sophisticated system prompt could set some ground-rules to discourage this behavior.

Finally, when reading up on this model, I found they have a whole section in the model card devoted to Agentic Use. In addition, they link to their project Qwen-Agent, which may be a great source of additional information.

With this local model, our agent seems to have a grasp of these basic tools, and the future looks very promising, stay tuned!