
AI-powered SQL generators perform well on simple prompts but often break on queries that involve grouped calculations, aggregation boundaries, or multi-table relationships. These failures are subtle, easy to miss, and costly in production.
This article examines one deceptively simple question: "What is the percentage of orders that have more than one product?" It walks through why two popular open-source models failed to answer it correctly, and how fine-tuning a model on a targeted dataset produced a correct, executable result.
Why SQL Generation Breaks on Aggregation Queries
Answering a percentage-based aggregation question in SQL requires more than translating words into syntax. It requires understanding grouping semantics, aggregation order, and execution constraints. Generic SQL generation models are trained on broad datasets that do not always encode these rules precisely enough for edge cases.
The result is queries that look correct at a glance but fail the moment they hit a database engine.
The Prompt Being Tested
"What is the percentage of orders that have more than one product?"

This prompt tests whether a model understands how to group orders by ID, count line items per order, filter for orders with more than one item, and then calculate a percentage across the full order table. It requires a subquery or a HAVING clause; it cannot be solved with a flat aggregation.
SQLCoder2: What It Generated and Why It Failed
SQLCoder2 produced this query:
```sql
SELECT CAST(COUNT(DISTINCT CASE WHEN COUNT(shopify_orderlineitem.id) > 1
            THEN shopify_order.id END) AS FLOAT) / NULLIF(COUNT(DISTINCT shopify_order.id), 0) AS proportion
FROM shopify_order
JOIN shopify_orderlineitem ON shopify_order.id = shopify_orderlineitem.order_id;
```

Execution error:
```
SQL Error [42803]: ERROR: aggregate function calls cannot be nested
```

The problem is structural. SQL does not allow aggregate functions like COUNT() to be nested directly inside another aggregate without a grouping boundary. To identify orders with multiple line items, you need a grouped subquery or a HAVING clause. This query provides neither.
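The fix is to make that grouping boundary explicit. As a minimal sketch against the same schema, the multi-item orders can be identified on their own before any percentage arithmetic is attempted:

```sql
-- Sketch only: aggregate line items per order first, then filter on the
-- aggregate with HAVING. This is the grouping step the generated query skipped.
SELECT order_id
FROM shopify_orderlineitem
GROUP BY order_id
HAVING COUNT(id) > 1;
```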
SQLCoder-34B: What It Generated and Why It Failed
SQLCoder-34B produced this query:
```sql
SELECT CAST(COUNT(DISTINCT o.id) filter (
       WHERE COUNT(DISTINCT oi.product_id) > 1) AS FLOAT)
       / NULLIF(COUNT(DISTINCT o.id), 0) AS proportion
FROM shopify_order o
JOIN shopify_orderlineitem oi ON o.id = oi.order_id
GROUP BY o.id;
```

Execution error:
```
SQL Error [42803]: ERROR: aggregate functions are not allowed in FILTER
```

This query places an aggregate condition inside a FILTER clause, which SQL does not permit. Conditions like COUNT(*) > 1 must be evaluated in a grouped context using HAVING, not inside a filter predicate. The larger model made a different structural mistake, but the result was the same: a query that cannot execute.
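FILTER itself is legal in PostgreSQL; the restriction is that its predicate must reference plain columns, not aggregates. As a hedged sketch (not part of the original experiment), the same idea can be expressed by computing per-order counts in a grouped subquery first, so the FILTER predicate sees an ordinary column:

```sql
-- Sketch: per-order line-item counts come from a grouped subquery, so the
-- FILTER predicate compares a plain column instead of a nested aggregate.
SELECT COUNT(*) FILTER (WHERE per_order.item_count > 1) * 100.0
       / NULLIF(COUNT(*), 0) AS percentage
FROM shopify_order o
LEFT JOIN (
    SELECT order_id, COUNT(*) AS item_count
    FROM shopify_orderlineitem
    GROUP BY order_id
) per_order ON per_order.order_id = o.id;
```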
What These Failures Have in Common
Both models produced syntactically plausible SQL that failed at runtime. The core issues were:
- Neither model correctly handled nested aggregation, which requires a subquery or HAVING clause
- Both reached for complex structures like CASE statements and FILTER clauses when simpler patterns would have worked
- Neither validated output against execution rules before returning it
- Both misidentified where the aggregation boundary needed to be placed
These are not random errors. They reflect systematic gaps in how generic SQL generation models handle multi-step aggregation logic.
The Fine-Tuning Approach
Improving SQL queries with fine-tuning starts with identifying the exact failure patterns. After seeing consistent structural errors in both models, a custom fine-tuning approach was used to explicitly teach correct aggregation patterns for this class of queries.
Building the Dataset
A custom dataset was built around the original prompt and variations involving grouped counts, percentage calculations, and multi-table joins. Each entry paired a natural-language question with an executable SQL query that followed correct aggregation rules.
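To make the pairing concrete, an entry in this style might look like the following. The question and query here are illustrative reconstructions, not actual samples from the dataset; only the table names come from the schema used throughout this article.

```sql
-- Hypothetical dataset entry (illustrative, not taken from the training set).
-- Question: "How many orders contain exactly one product?"
SELECT COUNT(*)
FROM (
    SELECT order_id
    FROM shopify_orderlineitem
    GROUP BY order_id
    HAVING COUNT(*) = 1
) single_item_orders;
```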
The dataset contained 196 samples split as follows:
- Training set: 80% (approximately 157 entries)
- Validation set: 20% (approximately 39 entries)
Training Process
Fine-tuning completed in 43 minutes using a transfer-learning approach with Hugging Face Transformers, PEFT, and LoRA. LoRA reduced the number of trainable parameters while preserving the base model's general knowledge.
Training and validation loss curves remained closely aligned throughout, indicating stable learning without overfitting. The training loss decreased steadily, and the validation loss followed the same trend without significant divergence. This confirmed the model was generalizing to new examples rather than memorizing the training set.
What the Fine-Tuned Model Generated
```sql
select
  (COUNT(*) * 100.0 / (
    select
      COUNT(*)
    from
      shopify_order))
from
  shopify_order
where
  id in (
    select
      order_id
    from
      shopify_orderlineitem
    group by
      order_id
    having
      COUNT(*) > 1);
```

This query executes correctly.
The subquery in the WHERE clause first identifies orders that have more than one line item using GROUP BY and HAVING. The outer query then counts how many orders meet that condition and divides by the total order count to produce a percentage. Multiplying by 100.0 both converts the ratio to a percentage and forces floating-point division, so the result is not truncated by integer division.
Why This Query Works Where the Others Failed
The fine-tuned model avoided nested aggregations entirely. Instead of trying to count inside a count, it separated the two concerns: identifying qualifying orders in a subquery, then calculating the proportion in the outer query.
This is the correct pattern for this class of problem, and it is exactly what the custom training dataset was designed to teach. For very large datasets, replacing the IN clause with EXISTS or a JOIN would improve performance, but the logic is correct as written.
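As one illustration of that alternative (a sketch, not something produced or benchmarked in this experiment), the same logic can be expressed with EXISTS so each order is checked directly instead of materializing the full IN list:

```sql
-- Sketch of an EXISTS-based rewrite with the same semantics as the generated query.
SELECT COUNT(*) * 100.0 / (SELECT COUNT(*) FROM shopify_order) AS percentage
FROM shopify_order o
WHERE EXISTS (
    SELECT 1
    FROM shopify_orderlineitem li
    WHERE li.order_id = o.id
    GROUP BY li.order_id
    HAVING COUNT(*) > 1
);
```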
Technical Stack
The fine-tuning pipeline used the following tools:
- PyTorch: primary deep learning framework for model training
- Hugging Face Transformers: access to pre-trained models and fine-tuning utilities
- PEFT: parameter-efficient fine-tuning methods including LoRA
- Datasets: data handling and preprocessing
- Accelerate: distributed training and mixed precision support
About LoRA
LoRA (Low-Rank Adaptation) is a fine-tuning technique developed by Microsoft in 2021. It adds pairs of low-rank matrices to the weights of the original model. Only these matrices are updated during training, while the base model weights remain frozen.
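In concrete terms, for a frozen weight matrix W of size d × k, LoRA learns an update of the form W + BA, where B is d × r, A is r × k, and the rank r is much smaller than d and k, so only B and A are trained.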
This significantly reduces the number of trainable parameters without sacrificing task-specific performance. It is what made a 43-minute fine-tuning run practical rather than requiring days of GPU compute.
What This Approach Gets Right
Fine-tuning a pre-trained model rather than training from scratch offers clear advantages. Training time was 43 minutes instead of days or weeks.
Computational requirements were a fraction of full training. The model retained general SQL knowledge from its pre-training while learning the specific aggregation patterns the dataset introduced.
Targeted datasets improve SQL generation accuracy in ways that prompt engineering alone cannot achieve. A model that has seen hundreds of examples of correct HAVING clause usage will apply that pattern reliably. A model that has only seen it occasionally will not.
Limitations of This Approach
The fine-tuned model is dataset-specific. It performs well on queries that match the patterns in its training data but may not generalize cleanly to significantly different query structures without additional training.
As database schemas evolve, the model may need to be retrained or updated, which requires ongoing effort. The quality of the output is also influenced by the quality of the base model. If the base model has systematic weaknesses, fine-tuning cannot fully correct them without a dataset large enough to override those patterns.
Conclusion
Improving SQL queries with fine-tuning is more effective than relying on prompt engineering or generic models alone. Off-the-shelf SQL generation models often fail on aggregation-heavy queries despite producing syntactically convincing output. Fine-tuning with a focused, execution-verified dataset enables models to learn correct aggregation patterns and produce reliable SQL for specific use cases.
The investment is small: 196 samples, 43 minutes of training, and a working query where two generic models failed. For teams relying on AI-assisted SQL generation in production, targeted fine-tuning is a more reliable path than prompt engineering alone.
Frequently Asked Questions
Why do AI SQL generators fail on aggregation queries?
Generic models are trained on broad datasets that do not always encode SQL execution rules precisely. Queries involving nested counts, grouped filters, or percentage calculations require specific structural patterns that models must be explicitly taught through targeted training data.
What is LoRA and why is it used for fine-tuning?
LoRA is a parameter-efficient fine-tuning technique that adds low-rank matrices to the model's existing weights. Only these matrices are updated during training, which significantly reduces compute requirements while still adapting the model to new tasks effectively.
How much data is needed to fine-tune an SQL model?
This case used 196 samples and produced a correct, executable result. For narrow, well-defined query types, a small high-quality dataset is often more effective than a large noisy one.
Can this approach work for other SQL query types?
Yes. The same fine-tuning methodology can be applied to query optimization, schema-specific generation, data cleaning queries, and other structured SQL tasks by building a targeted dataset around the relevant patterns.
What tools are used for SQL fine-tuning?
Common tools include PyTorch, Hugging Face Transformers, PEFT for parameter-efficient methods like LoRA, and the Datasets library for data preprocessing and loading.



