fix(rest): skip Hadoop-only vended storage credentials during resolution#3241
fix(rest): skip Hadoop-only vended storage credentials during resolution#3241plusplusjiajia wants to merge 2 commits into
Conversation
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
a1315d1 to
4a4f18c
Compare
|
Hi @kevinjqliu — would you have a chance to take a look at this when you get cycles? Small fix to _resolve_storage_credentials (the function from #3042). |
| _PLANNING_RESPONSE_ADAPTER = TypeAdapter(PlanningResponse) | ||
|
|
||
|
|
||
| def _is_hadoop_only_config(config: Properties) -> bool: |
There was a problem hiding this comment.
Config is a dict[str, str] according to the OpenAPI doc
There was a problem hiding this comment.
@rambleraptor Thanks, good point. Changed to dict[str, str] to match the OpenAPI spec.
| # Java S3FileIO falls back to the "s3" ROOT_PREFIX credential; scope it to | ||
| # schemes pyarrow's S3FileSystem handles so non-S3 schemes (gs://, abfs://, | ||
| # etc.) don't get handed s3.* keys. | ||
| if best_match is None and location.startswith(("s3://", "s3a://", "s3n://", "oss://")): |
There was a problem hiding this comment.
Wouldn't s3:// get caught by line 477 already?
There was a problem hiding this comment.
@rambleraptor Great catch! Narrowed the fallback to oss:// only.
| # Java S3FileIO falls back to the "s3" ROOT_PREFIX credential; scope it to | ||
| # schemes pyarrow's S3FileSystem handles so non-S3 schemes (gs://, abfs://, | ||
| # etc.) don't get handed s3.* keys. | ||
| if best_match is None and location.startswith(("s3://", "s3a://", "s3n://", "oss://")): |
There was a problem hiding this comment.
I understand that we want s3 prefixed credentials to get mapped to s3a + s3n. What's oss here?
There was a problem hiding this comment.
@rambleraptor Thanks for asking — OSS (Alibaba Cloud) is S3-API-compatible. This mirrors Java's S3FileIO.clientForStoragePath() ROOT_PREFIX fallback behavior.
ffdc84e to
7b73de3
Compare
7b73de3 to
0d085b0
Compare
0d085b0 to
23a1299
Compare
Rationale for this change
REST catalogs can return multiple StorageCredential entries per table to serve different client runtimes. A common pattern is one entry with Hadoop-style fs.* keys alongside a second entry with canonical s3.* / gs.* keys consumed by the cloud-native SDKs).
Java's FileIO implementations each filter vended credentials down to their own key namespace. S3FileIO.clientForStoragePath() only consumes entries with an s3-prefixed label (S3FileIO.java:413-414) and, when no URI prefix matches the storage path, falls back to the client keyed at the root "s3" prefix. pyiceberg has no HadoopFileIO, so Hadoop-style credential bundles have no consumer on the Python side; but _resolve_storage_credentials did a blind longest-prefix URI match across the full credential list, so when a Hadoop-style entry happened to be the longest URI-prefix match for a given location the Python FileIO ended up with fs.* keys it cannot use, and silently fell through to unauthenticated access.