Replaced all instances of gsutil commands with their gcloud equiv…#325

Open
johntbates wants to merge 2 commits into main from pr-gcloud
@johntbates

…alents.

@johntbates johntbates requested a review from PeterSu92 March 5, 2026 18:02
@bgorissen

This doesn't work because the cloud-sdk image is too old to include the gcloud storage command group:

$ docker run -it gcr.io/google.com/cloudsdktool/cloud-sdk:294.0.0-slim
root@027bc5441bb5:/# gcloud storage
ERROR: (gcloud) Invalid choice: 'storage'.

Updating the image to cloud-sdk:572.0.0-slim resolves this error but introduces new ones because Python 2 has been removed. You need to change python to python3 (here and here).

Since we're updating the image anyway, I recommend switching to cloud-sdk:572.0.0-alpine to reduce image pull time (115 MB vs. 358 MB; 572.0.0-stable is even smaller but lacks Python). With alpine you have to set CLOUDSDK_PYTHON=/usr/local/bin/python3, as otherwise gcloud can't find Python.

Combining these two suggestions, you need to make the following patches:

diff --git a/dsub/providers/google_batch.py b/dsub/providers/google_batch.py
index 73a35d4..a07d875 100644
--- a/dsub/providers/google_batch.py
+++ b/dsub/providers/google_batch.py
@@ -189,7 +189,7 @@ _CONTINUOUS_LOGGING_CMD = textwrap.dedent("""\
 
   # Prep the log filter script
   echo "${{{log_filter_var}}}" \
-    | python -c '{python_decode_script}' \
+    | python3 -c '{python_decode_script}' \
     > "{log_filter_script_path}"
   chmod a+x "{log_filter_script_path}"
 
@@ -528,6 +528,7 @@ class GoogleBatchJobProvider(google_utils.GoogleJobProviderBase):
         'STDOUT_PATH': '{}-stdout.log'.format(logging_prefix),
         'STDERR_PATH': '{}-stderr.log'.format(logging_prefix),
         'USER_PROJECT': user_project,
+        'CLOUDSDK_PYTHON': '/usr/local/bin/python3',
     }
     if include_filter_script:
       env[_LOG_FILTER_VAR] = repr(_LOG_FILTER_PYTHON)
diff --git a/dsub/providers/google_utils.py b/dsub/providers/google_utils.py
index 9839159..5cf274a 100644
--- a/dsub/providers/google_utils.py
+++ b/dsub/providers/google_utils.py
@@ -70,7 +70,7 @@ def make_runtime_dirs_command(script_dir: str, tmp_dir: str,
 
 # Action steps that interact with GCS need gcloud storage and Python.
 # Use the 'slim' variant of the cloud-sdk image as it is much smaller.
-CLOUD_SDK_IMAGE = 'gcr.io/google.com/cloudsdktool/cloud-sdk:294.0.0-slim'
+CLOUD_SDK_IMAGE = 'gcr.io/google.com/cloudsdktool/cloud-sdk:572.0.0-alpine'
 
 # Name of the data disk
 DATA_DISK_NAME = 'datadisk'
@@ -274,7 +274,7 @@ PREPARE_CMD = textwrap.dedent("""\
   {mk_runtime_dirs}
 
   echo "${{{script_var}}}" \
-    | python -c '{python_decode_script}' \
+    | python3 -c '{python_decode_script}' \
     > "{script_path}"
   chmod a+x "{script_path}"
 
@@ -347,6 +347,7 @@ class GoogleJobProviderBase(base.JobProvider):
       env['INPUT_{}'.format(idx)] = var.name
       env['INPUT_RECURSIVE_{}'.format(idx)] = str(int(var.recursive))
       env['INPUT_SRC_{}'.format(idx)] = var.value
+      env['CLOUDSDK_PYTHON'] = '/usr/local/bin/python3'
 
       # For wildcard paths, the destination must be a directory
       dst = os.path.join(mount_point, var.docker_path)
@@ -378,6 +379,7 @@ class GoogleJobProviderBase(base.JobProvider):
       env['OUTPUT_RECURSIVE_{}'.format(idx)] = str(int(var.recursive))
       env['OUTPUT_SRC_{}'.format(idx)] = os.path.join(mount_point,
                                                       var.docker_path)
+      env['CLOUDSDK_PYTHON'] = '/usr/local/bin/python3'
 
       # For wildcard paths, the destination must be a directory
       if '*' in var.uri.basename:
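
The patch above appears to set CLOUDSDK_PYTHON each time an input or output variable is processed in google_utils.py. As a possible alternative, the variable could be set once when the action environment is assembled. The following is only a minimal sketch of that idea: the helper name with_cloud_sdk_python and the constant CLOUDSDK_PYTHON_PATH are hypothetical and do not exist in dsub, and the interpreter path is the one quoted above for the alpine image.

```python
# Hypothetical constant; the path is the one suggested for the alpine image.
CLOUDSDK_PYTHON_PATH = '/usr/local/bin/python3'


def with_cloud_sdk_python(env, python_path=CLOUDSDK_PYTHON_PATH):
  """Returns a copy of env with CLOUDSDK_PYTHON set, unless already present.

  This helper is illustrative only and is not part of dsub.
  """
  result = dict(env)  # Copy so the caller's dict is left untouched.
  result.setdefault('CLOUDSDK_PYTHON', python_path)
  return result


# Example: wrap an existing action environment before handing it to the action.
action_env = with_cloud_sdk_python({'USER_PROJECT': 'my-project'})
print(action_env)
```

Using setdefault keeps any value a caller has already chosen, so switching to an image with a different interpreter location would only require changing the one constant.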
