Is your feature request related to a problem?
When using the Snowflake backend, to_polars() can fail with a schema mismatch error on timestamp columns. The root cause is that ibis.backends.snowflake.Backend._make_batch_iter calls cursor.fetch_arrow_batches() without passing force_microsecond_precision=True, so different batches can come back with different timestamp precisions. A typical error:
ArrowInvalid: Schema at index 6 was different:
timestamp_column: timestamp[ns]
vs
timestamp_column: timestamp[us]
The ibis Snowflake backend fetches Arrow data in two places:
_make_batch_iter — called internally by expr.to_polars() and expr.to_pyarrow_batches()
Backend.to_pyarrow — called internally by expr.to_pyarrow()
What is the motivation behind your request?
The Snowflake connector already has a built-in fix: cursor.fetch_arrow_batches(force_microsecond_precision=True), which forces all timestamp columns to timestamp[us] across all batches.
Connector docstring (from snowflake-connector-python, cursor.py, SnowflakeCursorBase.fetch_arrow_batches):
force_microsecond_precision: When True, all timestamp columns are converted to microsecond precision, ensuring consistent schema across all batches. This is useful when your data contains timestamps outside the nanosecond range (1677-2262), such as '9999-12-31' or '0001-01-01'. When False (default), precision is determined per-batch based on the data, which may cause pyarrow schema mismatch errors when combining batches.
Note: the force_microsecond_precision parameter is exposed in fetch_arrow_all(), fetch_arrow_batches(), fetch_pandas_all() and fetch_pandas_batches().
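The 1677-2262 range quoted in the docstring follows from storing nanoseconds since the Unix epoch in a signed 64-bit integer. Below is a minimal standard-library sketch of that arithmetic. The helper name fits_in_timestamp_ns is hypothetical and used only for illustration, and the bounds are rounded to microseconds because Python's datetime has no nanosecond resolution.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# timestamp[ns] stores nanoseconds since the epoch in a signed 64-bit integer.
# Convert the int64 limits to microseconds, rounding toward zero so that both
# bounds stay inside the true nanosecond range.
NS_MAX = EPOCH + timedelta(microseconds=(2**63 - 1) // 1000)
NS_MIN = EPOCH + timedelta(microseconds=-(2**63 // 1000))


def fits_in_timestamp_ns(value: datetime) -> bool:
    """Hypothetical helper: True if `value` is representable as timestamp[ns]."""
    return NS_MIN <= value <= NS_MAX


if __name__ == "__main__":
    print("nanosecond range:", NS_MIN, "to", NS_MAX)
    for candidate in (datetime(2024, 1, 1), datetime(9999, 12, 31), datetime(1, 1, 1)):
        print(candidate.date(), fits_in_timestamp_ns(candidate))
```

A batch containing only values inside this range can come back as timestamp[ns], while a batch containing a sentinel such as 9999-12-31 cannot, which is consistent with the per-batch precision described in the docstring.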
Describe the solution you'd like
Implement the force_microsecond_precision parameter.
- Add a new force_microsecond_precision parameter to ibis.snowflake.connect(). This gives users explicit control and preserves backward compatibility by defaulting to False. The parameter would be stored on the backend instance at connect time and threaded through to both fetch methods.
# user-facing API
ibis_con = ibis.snowflake.connect(
**params,
force_microsecond_precision=True, # new parameter
)
- As an alternative quick fix, pass force_microsecond_precision=True in both places where ibis fetches Arrow data from Snowflake:
# ibis/backends/snowflake/__init__.py
# in _make_batch_iter (line 522):
for t in cur.fetch_arrow_batches(force_microsecond_precision=True) # was: fetch_arrow_batches()
# in to_pyarrow (line 450):
res = cur.fetch_arrow_all(force_microsecond_precision=True) # was: fetch_arrow_all()
This would fix the issue immediately, but it always forces microsecond precision. It is therefore not a desirable solution, as it introduces a behavioral change for existing users.
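The first option, storing the flag at connect time and threading it through to both fetch calls, could look roughly like the sketch below. This is not the actual ibis implementation: SketchBackend, FakeCursor, and _fetch_kwargs are hypothetical names, and the fake cursor only records the keyword arguments it receives so that the sketch runs without a Snowflake connection.

```python
from typing import Any, Dict, Iterator, List, Tuple


class FakeCursor:
    """Stand-in for a Snowflake cursor (hypothetical); records the kwargs it gets."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def fetch_arrow_batches(self, **kwargs: Any) -> Iterator[str]:
        self.calls.append(("fetch_arrow_batches", kwargs))
        return iter(["batch-0", "batch-1"])

    def fetch_arrow_all(self, **kwargs: Any) -> str:
        self.calls.append(("fetch_arrow_all", kwargs))
        return "table"


class SketchBackend:
    """Hypothetical backend showing how the flag could be threaded through."""

    def __init__(self, cursor: FakeCursor, force_microsecond_precision: bool = False) -> None:
        self._cursor = cursor
        # Stored once at connect time, defaulting to the current behavior.
        self._force_microsecond_precision = force_microsecond_precision

    def _fetch_kwargs(self) -> Dict[str, Any]:
        # Pass the flag only when the user opted in, so the default call
        # to the connector is identical to today's call.
        if self._force_microsecond_precision:
            return {"force_microsecond_precision": True}
        return {}

    def make_batch_iter(self) -> Iterator[str]:
        yield from self._cursor.fetch_arrow_batches(**self._fetch_kwargs())

    def to_pyarrow(self) -> str:
        return self._cursor.fetch_arrow_all(**self._fetch_kwargs())


if __name__ == "__main__":
    cursor = FakeCursor()
    backend = SketchBackend(cursor, force_microsecond_precision=True)
    list(backend.make_batch_iter())
    backend.to_pyarrow()
    print(cursor.calls)
```

Building the keyword arguments in one helper keeps the two fetch paths consistent. Omitting the keyword entirely when the flag is False means the default code path calls the connector exactly as it does now.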
What version of ibis are you running?
Environment
- ibis-framework: 12.0.0
- snowflake-connector-python: 4.3.0
- Python: 3.12
- OS: Windows
What backend(s) are you using, if any?
Snowflake
Code of Conduct