How AI Agents Helped Migrate a Data Lake From Snowflake to BigQuery | HackerNoon

News Room · Published 2 February 2026 (last updated 2026/02/02 at 8:15 PM)

Problem Statement

Simple: Migrating the Data Lake from Snowflake to BigQuery.

| As-Is (Snowflake) | To-Be (BigQuery) |
|---|---|
| `SELECT orderid::STRING AS orderidstr FROM orders;` | `SELECT CAST(orderid AS STRING) AS orderidstr FROM orders;` |

During the Snowflake-to-BigQuery migration, the task was not only to move terabytes of data across platforms, but to do so efficiently and cost-effectively while refactoring the data pipelines at the same time.

That’s where the real complexity surfaced. We were following an ELT approach for the data lake, which meant the extraction layer would remain unchanged. But the loading and transformation logic had to be refactored. The technology stack included Python for development, AWS for infrastructure, and Snowflake as the source data store, all of which now needed to align with BigQuery’s ecosystem.

Over time, the platform had accumulated extensive SQL—complex queries, layered views, and tightly coupled stored procedures. Translating this logic was not a simple syntax conversion; it required rethinking how transformations were executed, optimized, and governed in a fundamentally different data environment.

Evolution of pattern detection and replacement

Taking a step back, I reviewed what I used to rely on for pattern detection and replacement, and what the options are now.

| ~~Regular Expressions (Regex)~~ |
|---|
| ~~String Matching Algorithms~~ |
| ~~Traditional NLP~~ |
| ==Use AI Agents== |

Given the scale and complexity of the migration, comprising thousands of database objects (tables, views, tasks, procedures), hundreds of SQL scripts, and hundreds of ad hoc queries, it was evident that leveraging large language models embedded within the IDE was the most practical and effective approach for refactoring and validating SQL. Generative AI has matured significantly over the past few years, so the next step was to decide how to frame the instruction. Should it be a direct task:

  1. “Refactor the Snowflake SQL into a BigQuery-compatible format”? Should it establish clearer context and expertise, such as
  2. “Given your expertise in translating Snowflake SQL to BigQuery, translate the following SQL”? Or would a more guided approach be more effective:
  3. “You are a domain expert in translating Snowflake SQL to BigQuery; here are a few example translations; now translate the SQL in file …”?
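The three framings are easy to compare side by side in code. Below is a minimal Python sketch of a prompt builder; the `build_prompt` helper and the two few-shot translation pairs are hypothetical illustrations, not part of the actual migration tooling.

```python
# Hypothetical few-shot pairs; a real guide would curate these from the codebase.
FEW_SHOT_EXAMPLES = [
    ("SELECT orderid::STRING FROM orders;",
     "SELECT CAST(orderid AS STRING) FROM orders;"),
    ("SELECT IFF(x > 0, 'pos', 'neg') FROM t;",
     "SELECT IF(x > 0, 'pos', 'neg') FROM t;"),
]

def build_prompt(sql: str, style: str = "guided") -> str:
    """Return one of the three prompt framings discussed above."""
    if style == "direct":
        return f"Refactor the Snowflake SQL into a BigQuery-compatible format:\n{sql}"
    if style == "expert":
        return ("Given your expertise in translating Snowflake SQL to BigQuery, "
                f"translate the following SQL:\n{sql}")
    # "guided": expert persona plus few-shot example translations
    shots = "\n".join(f"Snowflake: {s}\nBigQuery: {b}" for s, b in FEW_SHOT_EXAMPLES)
    return ("You are a domain expert in translating Snowflake SQL to BigQuery.\n"
            f"Here are a few example translations:\n{shots}\n"
            f"Now translate this SQL:\n{sql}")
```

In practice the third, guided framing gave the most consistent results, which is exactly what AGENTS.md later codified.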

AGENTS.md Adoption

Introduction

The AGENTS.md file is a guide for AI agents, providing context and instructions to help them work on the project. It is an open-format file. Unlike the structured configurations found in `*.json` or `*.yml`, the md file is a "human-to-machine" plain-text README. It is designed to convey the high-level context, tribal knowledge, and specific behavioral instructions that code alone cannot. The result: no syntax errors, no compilation errors, and fewer of the other nightmares familiar to data engineers.

Structure: Formatting & Hierarchy

A simple Markdown-style hierarchy signals the relative importance of different instructions:

  • # (single hash): the highest level of the hierarchy (project vision or major sections).
  • ## (double hash): the second level (specific modules or workflows).
  • ### (triple hash): the third level and below (detailed tasks or technical nuances).

There are no specific constraints apart from the markers above.
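Because the format is just hash-marked headings, the instruction hierarchy can be recovered trivially, which is what makes the file agent-friendly. A minimal sketch, assuming plain `#`/`##`/`###` markers as described above; the `outline` helper is hypothetical:

```python
import re

def outline(md_text: str) -> list[tuple[int, str]]:
    """Extract (level, title) pairs from #/##/### heading markers."""
    pairs = []
    for line in md_text.splitlines():
        m = re.match(r"^(#{1,3})\s+(.*)", line)
        if m:
            pairs.append((len(m.group(1)), m.group(2).strip()))
    return pairs

sample = """# Snowflake to BigQuery SQL Refactoring Guide
## Quick Start Checklist
### Step 0: Detect JSON Structure First
"""
```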

Sample File

Here is a sample AGENTS.md file. All instructions and guidelines are in plain text.

The first line states the project vision and the expectations for the AI agents. The second line carries the detailed instructions.

```markdown
# Snowflake to BigQuery SQL Refactoring Guide
## Reusable Setup for Future Refactoring Tasks

This guide provides a systematic approach to refactoring Snowflake SQL to BigQuery SQL,
reusing the patterns and tools established for the Adobe API refactoring.
---
```

One of the biggest challenges in this migration was handling JSON structures. Multiple pipelines dump JSON-type response payloads into Snowflake VARIANT columns, and Snowflake gracefully handles or flattens them with the LATERAL FLATTEN keyword. The same is not the case in BigQuery. The section below from the md file covers most JSON shapes in BigQuery, so a combination of UNNEST and JSON parsing did the trick: first identify which JSON structure you are dealing with, then derive the UNNEST expression from that.

## Quick Start Checklist
### Step 0: MANDATORY - Detect JSON Structure First (CRITICAL)

⚠️ ALWAYS run this diagnostic query BEFORE refactoring to avoid zero-row issues:

```sql
-- Comprehensive JSON Structure Diagnostic
-- Run this for EACH table with JSON content BEFORE refactoring
SELECT
  -- Basic info
  COUNT(*) AS total_rows,
  COUNTIF(content IS NOT NULL) AS rows_with_content,
  COUNTIF(SAFE.PARSE_JSON(content) IS NULL) AS parse_failures,
  -- JSON type (CRITICAL: determines parsing approach)
  JSON_TYPE(PARSE_JSON(ANY_VALUE(content))) AS json_type,
  -- Array lengths for different paths (identifies correct path)
  MAX(ARRAY_LENGTH(JSON_QUERY_ARRAY(PARSE_JSON(content), '$'))) AS direct_array_max,
  MAX(ARRAY_LENGTH(JSON_QUERY_ARRAY(PARSE_JSON(content), '$.content'))) AS content_array_max,
  MAX(ARRAY_LENGTH(JSON_QUERY_ARRAY(PARSE_JSON(content), '$.data'))) AS data_array_max,
  MAX(ARRAY_LENGTH(JSON_QUERY_ARRAY(PARSE_JSON(content), '$.rows'))) AS rows_array_max,
  -- Sample preview (shows actual structure)
  SUBSTR(ANY_VALUE(content), 1, 200) AS sample_preview,
  -- Check for double-quoted JSON (escaped JSON string)
  CASE
    WHEN JSON_TYPE(PARSE_JSON(ANY_VALUE(content))) = 'OBJECT'
     AND JSON_TYPE(PARSE_JSON(JSON_VALUE(PARSE_JSON(ANY_VALUE(content)), '$.content'))) = 'ARRAY'
      THEN 'YES - Double-quoted JSON detected'
    WHEN JSON_TYPE(PARSE_JSON(ANY_VALUE(content))) = 'STRING'
      THEN 'YES - Needs double parsing'
    ELSE 'NO'
  END AS is_double_quoted_json
FROM {project_id}.{dataset}.{table_name};
```

**Decision Tree Based on Diagnostic Results:**

  1. If `json_type = "ARRAY"` and `direct_array_max > 0`:
     • Structure: `[{...}, {...}]` (direct array)
     • Use: `UNNEST(JSON_QUERY_ARRAY(PARSE_JSON(content), '$'))`
  2. If `json_type = "OBJECT"` and `content_array_max > 0`:
     • Structure: `{"content": [{...}, {...}]}` (nested content)
     • Use: `UNNEST(JSON_QUERY_ARRAY(PARSE_JSON(content), '$.content'))`
  3. If `json_type = "OBJECT"` and `rows_array_max > 0`:
     • Structure: `{"rows": [{...}, {...}]}` (nested rows)
     • Use: `UNNEST(JSON_QUERY_ARRAY(PARSE_JSON(content), '$.rows'))`
  4. If `is_double_quoted_json = 'YES'`:
     • Structure: `{"content": "[{\"id\":\"...\"}]"}` (escaped JSON string)
     • Use: `UNNEST(JSON_QUERY_ARRAY(PARSE_JSON(JSON_VALUE(PARSE_JSON(content), '$.content')), '$'))`
     • This is the most common cause of zero-row issues!
  5. If `json_type = "OBJECT"` and all `*_array_max` are NULL:
     • Structure: single object `{...}` (not an array)
     • Use: direct extraction without UNNEST: `JSON_VALUE(PARSE_JSON(content), '$.field')`
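The decision tree is mechanical enough to express in code, which also makes it testable. Here is a Python sketch; the `choose_unnest_path` function is hypothetical, but the input field names mirror the aliases produced by the diagnostic query above.

```python
def choose_unnest_path(d: dict) -> str:
    """Map diagnostic-query results to the UNNEST/extraction expression,
    following the decision tree above. `d` uses the diagnostic aliases."""
    # Escaped JSON strings are the most common cause of zero-row issues,
    # so check the double-quoted flag first.
    if str(d.get("is_double_quoted_json", "NO")).startswith("YES"):
        return ("UNNEST(JSON_QUERY_ARRAY(PARSE_JSON("
                "JSON_VALUE(PARSE_JSON(content), '$.content')), '$'))")
    if d["json_type"] == "ARRAY" and (d.get("direct_array_max") or 0) > 0:
        return "UNNEST(JSON_QUERY_ARRAY(PARSE_JSON(content), '$'))"
    if d["json_type"] == "OBJECT":
        if (d.get("content_array_max") or 0) > 0:
            return "UNNEST(JSON_QUERY_ARRAY(PARSE_JSON(content), '$.content'))"
        if (d.get("rows_array_max") or 0) > 0:
            return "UNNEST(JSON_QUERY_ARRAY(PARSE_JSON(content), '$.rows'))"
        # Single object: direct extraction, no UNNEST needed.
        return "JSON_VALUE(PARSE_JSON(content), '$.field')"
    raise ValueError(f"Unrecognized diagnostic shape: {d}")
```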
Below are some common environment-based refactoring guidelines.

### Step 1: Analyze the SQL File

  • [ ] Identify all Snowflake-specific syntax:
    • LATERAL FLATTEN patterns
    • `::type` casting
    • Snowflake functions (IFF, NVL, DECODE, etc.)
    • USE WAREHOUSE, USE DATABASE, USE SCHEMA
    • DELETE FROM ... INSERT INTO patterns
    • QUALIFY clauses
  • [ ] List all tables involved (source and target)
  • [ ] Identify incremental load patterns
  • [ ] Note special handling (NaN, NULL, date formats, etc.)
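A first pass over this checklist can be automated with a simple scanner that flags the Snowflake-specific constructs listed above. The `find_snowflake_syntax` helper and pattern names below are hypothetical; a real pass would still rely on the agent for anything structural.

```python
import re

# Snowflake-specific constructs from the Step 1 checklist.
SNOWFLAKE_PATTERNS = {
    "LATERAL FLATTEN": re.compile(r"\bLATERAL\s+FLATTEN\b", re.IGNORECASE),
    ":: type cast": re.compile(r"::\s*\w+"),
    "IFF/NVL/DECODE": re.compile(r"\b(IFF|NVL|DECODE)\s*\(", re.IGNORECASE),
    "USE statements": re.compile(r"\bUSE\s+(WAREHOUSE|DATABASE|SCHEMA)\b", re.IGNORECASE),
    "QUALIFY clause": re.compile(r"\bQUALIFY\b", re.IGNORECASE),
}

def find_snowflake_syntax(sql: str) -> list[str]:
    """Return the names of Snowflake-specific patterns present in `sql`."""
    return [name for name, pat in SNOWFLAKE_PATTERNS.items() if pat.search(sql)]
```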

### Pattern 1: Type Casting

| Snowflake | BigQuery |
|-----------|----------|
| value:field::string | JSON_VALUE(item, '$.field') |
| value:field::float | CAST(JSON_VALUE(item, '$.field') AS FLOAT64) |
| value:field::int | CAST(JSON_VALUE(item, '$.field') AS INT64) |
| value:field::date | CAST(JSON_VALUE(item, '$.field') AS DATE) |
| value:field::timestamp_ntz | CAST(JSON_VALUE(item, '$.field') AS TIMESTAMP) |
| value:field::numeric(10,2) | ROUND(CAST(JSON_VALUE(item, '$.field') AS NUMERIC), 2) |

### Pattern 2: DELETE + INSERT → TRUNCATE + INSERT

```sql
-- Snowflake
DELETE FROM {edw}.schema.table;
INSERT INTO {edw}.schema.table …

-- BigQuery
TRUNCATE TABLE {{project_id}}.EDW_SCHEMA.table;
INSERT INTO {{project_id}}.EDW_SCHEMA.table …
```

### Pattern 3: Function Conversions

| Snowflake | BigQuery |
|-----------|----------|
| IFF(cond, a, b) | IF(cond, a, b) |
| NVL(expr, default) | COALESCE(expr, default) |
| IFNULL(expr, default) | COALESCE(expr, default) |
| DECODE(expr, s1, r1, default) | CASE expr WHEN s1 THEN r1 ELSE default END |
| split(str, '/')[1] | SPLIT(str, '/')[SAFE_OFFSET(1)] |
| YEAR(date) | EXTRACT(YEAR FROM date) |
| TO_DATE(str) | CAST(str AS DATE) or PARSE_DATE('%Y-%m-%d', str) |
| TO_CHAR(num) | CAST(num AS STRING) |
| CHARINDEX(substr, str) | STRPOS(str, substr) |
| EDITDISTANCE(str1, str2) | EDIT_DISTANCE(str1, str2) |
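For the purely mechanical one-to-one renames in this table (IFF, NVL, IFNULL, EDITDISTANCE), even a dict-driven pass would do; anything structural, like DECODE or the flipped argument order of CHARINDEX vs. STRPOS, still needs the agent. A hypothetical sketch:

```python
import re

# Only the renames that are safe without touching arguments.
# DECODE, CHARINDEX, etc. change structure and are left to the agent.
SIMPLE_RENAMES = {
    "IFF": "IF",
    "NVL": "COALESCE",
    "IFNULL": "COALESCE",
    "EDITDISTANCE": "EDIT_DISTANCE",
}

def rename_functions(sql: str) -> str:
    """Apply the mechanical function renames from the Pattern 3 table."""
    for old, new in SIMPLE_RENAMES.items():
        sql = re.sub(rf"\b{old}\s*\(", f"{new}(", sql, flags=re.IGNORECASE)
    return sql
```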

### Pattern 4: QUALIFY Clause

```sql
-- Snowflake
SELECT …
QUALIFY ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) = 1

-- BigQuery
SELECT … FROM (
  SELECT …,
    ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) AS rn
  FROM …
)
WHERE rn = 1
```

### Pattern 5: Date/Time Functions

```sql
-- Snowflake
DATEADD(day, -90, CURRENT_DATE())
DATEDIFF(day, date1, date2)
TO_TIMESTAMP(str, 'YYYY-MM-DDTHH24:MI:SS')

-- BigQuery
DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
DATE_DIFF(date2, date1, DAY)
PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%S', str)
```

Finally, to close out the md file, we can include validation of the translated SQL. We can also ask the agent to connect to BigQuery and perform a dry run.

### Step 2: Validate

  • [ ] Run validation queries
  • [ ] Test each table individually
  • [ ] Fix any issues found
  • [ ] Document assumptions and changes

Overall, this is how the md file is structured. Keep it in the repository root directory and instruct the IDE agent to follow it. The agent can also reference different md files depending on where they are placed in the repository.

Conclusion

To wrap up: translating hundreds of batch SQL scripts and ad hoc queries, and validating them, would have been a daunting and time-intensive effort. AI-assisted workflows, powered by the AGENTS.md file, made it far simpler. By codifying domain knowledge, refactoring patterns, and validation rules in a single, human-readable guide, AGENTS.md established a reliable baseline for consistency and correctness. This approach not only reduced manual effort but also made the complexity manageable at scale, making it a key enabler of the successful transition to BigQuery and laying a strong foundation for the future platform.

References

  • https://agents.md/
  • https://docs.cloud.google.com/bigquery/docs
  • https://docs.snowflake.com/en/
