
feat: ibis-framework: Snowflake backend should pass force_microsecond_precision=True #11983

@kbzsl

Description


Is your feature request related to a problem?

When using the Snowflake backend, to_polars() can fail with a schema mismatch error on timestamp columns. The root cause is that ibis.backends.snowflake.Backend._make_batch_iter calls cursor.fetch_arrow_batches() without passing force_microsecond_precision=True.

ArrowInvalid: Schema at index 6 was different: 
timestamp_column: timestamp[ns]
vs
timestamp_column: timestamp[us]

The ibis Snowflake backend fetches Arrow data in two places:

  • _make_batch_iter — called internally by expr.to_polars() and expr.to_pyarrow_batches()
  • Backend.to_pyarrow — called internally by expr.to_pyarrow()

What is the motivation behind your request?

The Snowflake connector already has a built-in fix: cursor.fetch_arrow_batches(force_microsecond_precision=True), which forces all timestamp columns to timestamp[us] across all batches.

Connector docstring (from snowflake-connector-python, cursor.py, SnowflakeCursorBase.fetch_arrow_batches):

force_microsecond_precision: When True, all timestamp columns are converted to microsecond precision, ensuring consistent schema across all batches. This is useful when your data contains timestamps outside the nanosecond range (1677-2262), such as '9999-12-31' or '0001-01-01'. When False (default), precision is determined per-batch based on the data, which may cause pyarrow schema mismatch errors when combining batches.

Note: the force_microsecond_precision parameter is exposed in fetch_arrow_all(), fetch_arrow_batches(), fetch_pandas_all() and fetch_pandas_batches().

Describe the solution you'd like

Expose the connector's force_microsecond_precision parameter through the ibis Snowflake backend. Two options:

  1. Add a new force_microsecond_precision parameter to ibis.snowflake.connect(). This gives users explicit control and preserves backward compatibility by defaulting to False. The parameter would be stored on the backend instance at connect time and threaded through to both fetch methods.
# user-facing API
ibis_con = ibis.snowflake.connect(
    **params,
    force_microsecond_precision=True,  # new parameter
)
  2. As an alternative quick fix, pass force_microsecond_precision=True in the two places where ibis fetches Arrow data from Snowflake:
# ibis/backends/snowflake/__init__.py

# in _make_batch_iter (line 522):
for t in cur.fetch_arrow_batches(force_microsecond_precision=True):  # was: fetch_arrow_batches()

# in to_pyarrow (line 450):
res = cur.fetch_arrow_all(force_microsecond_precision=True)  # was: fetch_arrow_all()

This would immediately fix the issue, but it unconditionally forces microsecond precision for every user; since that is a behavioral change, it is the less desirable option.
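The connect-time variant (option 1) could be threaded through the backend roughly as follows. This is a hedged sketch: the class and method names below are illustrative stand-ins, not the actual ibis internals.

```python
# Hypothetical sketch of option 1: store the flag at connect time and
# thread it through both Arrow fetch paths. Names are illustrative only.
class SnowflakeBackend:
    def do_connect(self, *, force_microsecond_precision: bool = False, **params):
        # Defaulting to False preserves current behavior.
        self._force_us = force_microsecond_precision

    def _fetch_batches(self, cur):
        # Stand-in for the _make_batch_iter code path.
        return cur.fetch_arrow_batches(
            force_microsecond_precision=self._force_us
        )

    def _fetch_all(self, cur):
        # Stand-in for the to_pyarrow code path.
        return cur.fetch_arrow_all(force_microsecond_precision=self._force_us)
```

With this shape, both expr.to_polars() and expr.to_pyarrow() would honor the flag the user passed to connect(), and omitting it keeps today's behavior.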

What version of ibis are you running?

Environment

  • ibis-framework: 12.0.0
  • snowflake-connector-python: 4.3.0
  • Python: 3.12
  • OS: Windows

What backend(s) are you using, if any?

Snowflake

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

  • Assignees: no one assigned
  • Labels: feature (Features or general enhancements)
  • Status: backlog
  • Milestone: none