Sync
sync
sync(
clusters: list[str] | None = None,
uv_sync_args: list[str] | None = None,
sync_datasets: bool = True,
) -> list[Remote]
Synchronizes the current project across clusters.
- Synchronizes code across all clusters.
- Gathers results on the "main" cluster (mila).
- Runs `uv sync` on that cluster as well (important so that jobs can be run in offline mode).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| clusters | list[str] \| None | List of SSH hostnames of the target clusters. If empty, attempts to sync with all clusters in the config that have an active SSH connection. | None |
Returns:
| Type | Description |
|---|---|
| list[Remote] | A list of Remote objects corresponding to the clusters that were synced with. |
How it could work (proof-of-concept):
- Checks the git state.
- Pushes to GitHub.
- TODO: Check syncing without GitHub.
- Over SSH, runs `git fetch` on all remote clusters.
- Gathers results from all other clusters onto the Mila cluster using rsync.
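The proof-of-concept steps can be written out as the ordered sequence of commands they would run. This sketch is hypothetical: `plan_sync_commands`, the `results/` directory name, and running `rsync` on the main cluster itself are assumptions. The last choice follows from the fact that rsync cannot copy directly between two remote hosts, and it further assumes the main cluster can reach the other clusters over SSH.

```python
def plan_sync_commands(
    main_cluster: str,
    other_clusters: list[str],
    project_dir: str,
    git_status_porcelain: str,
) -> list[list[str]]:
    """Hypothetical: turn the proof-of-concept steps into argv lists, in order.

    `git_status_porcelain` is the output of `git status --porcelain`; any
    output means uncommitted changes, which a push-based sync would miss.
    `project_dir` is assumed to contain no spaces (it is not shell-quoted).
    """
    # 1. Check the git state.
    if git_status_porcelain.strip():
        raise RuntimeError("Uncommitted changes; commit them before syncing.")
    # 2. Push to GitHub (assumes the current branch has an upstream).
    commands: list[list[str]] = [["git", "push"]]
    # 3. Over SSH, fetch the pushed commits on every remote clone.
    for host in [main_cluster, *other_clusters]:
        commands.append(["ssh", host, f"cd {project_dir} && git fetch"])
    # 4. Gather results onto the main cluster; rsync runs *on* the main cluster.
    for host in other_clusters:
        commands.append(
            ["ssh", main_cluster,
             f"rsync -a {host}:{project_dir}/results/ {project_dir}/results/"]
        )
    return commands
```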
get_active_remotes
Returns a Remote for each cluster that has an active SSH connection.
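A common way to tell whether an SSH connection is active is to query OpenSSH's connection-multiplexing master with `ssh -O check <host>`. This command exits with status 0 when a master connection for that host is running, and it requires `ControlMaster`/`ControlPath` to be configured. Whether `get_active_remotes` works this way is an assumption. The sketch below returns hostnames rather than `Remote` objects, `has_active_connection` and `active_hosts` are hypothetical names, and the `run` parameter exists only so the check can be tested without SSH.

```python
import subprocess
from collections.abc import Callable, Iterable

Runner = Callable[..., subprocess.CompletedProcess]


def has_active_connection(host: str, run: Runner = subprocess.run) -> bool:
    """True if an OpenSSH ControlMaster connection to `host` is running.

    `ssh -O check` asks the multiplexing master for `host`; exit status 0
    means it is alive, non-zero means there is no usable master.
    """
    result = run(["ssh", "-O", "check", host], capture_output=True, text=True)
    return result.returncode == 0


def active_hosts(hosts: Iterable[str], run: Runner = subprocess.run) -> list[str]:
    """Hypothetical: keep only the hosts that have an active master connection."""
    return [host for host in hosts if has_active_connection(host, run=run)]
```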
sync_task_function
Syncs a single cluster and reports progress using the provided report_progress function.
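Syncing several clusters in parallel with a per-cluster progress callback could look like the sketch below. Several parts are assumptions: the `(cluster, completed, total)` callback signature, splitting a sync into a list of step functions, and the `sync_one_cluster`/`sync_all` names are all hypothetical. Because the callback may be invoked from several worker threads, it should be thread-safe.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# Hypothetical callback signature: (cluster, steps completed, total steps).
ReportProgress = Callable[[str, int, int], None]


def sync_one_cluster(
    cluster: str, steps: list[Callable[[], None]], report_progress: ReportProgress
) -> str:
    """Run each sync step for one cluster, reporting progress after every step."""
    total = len(steps)
    report_progress(cluster, 0, total)
    for completed, step in enumerate(steps, start=1):
        step()
        report_progress(cluster, completed, total)
    return cluster


def sync_all(
    clusters: list[str],
    make_steps: Callable[[str], list[Callable[[], None]]],
    report_progress: ReportProgress,
) -> list[str]:
    """Sync every cluster in its own thread; return clusters in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(clusters))) as pool:
        futures = [
            pool.submit(sync_one_cluster, c, make_steps(c), report_progress)
            for c in clusters
        ]
        # .result() re-raises any exception raised by that cluster's task.
        return [f.result() for f in futures]
```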
clone_project
clone_project(remote: Remote)
Sets up the project repo on the given remote cluster.
New idea:
- Assume GitHub: push to GitHub if needed, then clone from GitHub on the remotes.
- Worry about authentication later; for now, just raise an error if needed.
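Following the "assume GitHub" idea, cloning on a remote can be made idempotent by skipping the clone when a checkout already exists. This is a sketch under assumptions: `build_clone_command` is hypothetical, and the repository URL in the test is a placeholder. Authentication is deliberately not handled. If the remote lacks access, `git clone` (and therefore `ssh`) exits with a non-zero status, which the caller can turn into an error.

```python
import shlex


def build_clone_command(host: str, repo_url: str, project_dir: str) -> list[str]:
    """Hypothetical: clone `repo_url` into `project_dir` on `host` unless already cloned.

    `test -d <dir>/.git` succeeds for an existing checkout, so `||` skips the
    clone on re-runs. No authentication handling: failures surface as a
    non-zero exit status from ssh.
    """
    target = shlex.quote(project_dir)
    remote_cmd = f"test -d {target}/.git || git clone {shlex.quote(repo_url)} {target}"
    return ["ssh", host, remote_cmd]
```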
fetch_results
fetch_results(
remote: Remote, config: CluvConfig
) -> list[Path]
Fetches results from a remote cluster to the local machine using rsync, via the results symlink.
Returns the list of newly-synced run directories (those that did not exist locally before the rsync ran).
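The "newly synced" return value can be computed by listing the local run directories before and after the rsync and taking the set difference. This minimal sketch assumes each immediate subdirectory of the local results directory is one run. `fetch_new_runs` and `list_run_dirs` are hypothetical names, and `run` is injectable only for testing.

```python
import subprocess
from collections.abc import Callable
from pathlib import Path


def list_run_dirs(results_dir: Path) -> set[Path]:
    """Immediate subdirectories of `results_dir`, treated here as run directories."""
    if not results_dir.is_dir():
        return set()
    return {p for p in results_dir.iterdir() if p.is_dir()}


def fetch_new_runs(
    host: str,
    remote_results: str,
    local_results: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[Path]:
    """Hypothetical: rsync remote results locally; return only run dirs that are new."""
    before = list_run_dirs(local_results)
    local_results.mkdir(parents=True, exist_ok=True)
    # Trailing slash on the source: copy the directory's contents into local_results.
    run(["rsync", "-a", f"{host}:{remote_results}/", f"{local_results}/"], check=True)
    return sorted(list_run_dirs(local_results) - before)
```

With the trailing slash, the source path should also be resolved through a remote `results` symlink, so rsync copies the contents of the directory it points to rather than the link itself.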
create_results_dir_with_symlink_to_scratch
create_results_dir_with_symlink_to_scratch(
remote: Remote, results_symlink: str, results_path: str
)
On the remote, create results_path and symlink project/
results_path may contain env vars (e.g. $SCRATCH); they are resolved via the remote login shell.
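Because `results_path` may contain variables such as `$SCRATCH`, the command must leave them unexpanded locally and let a login shell on the remote expand them. This is a sketch under assumptions: `build_results_setup_command` is hypothetical, `bash` is assumed to be available on the remote, and paths are assumed not to contain double quotes. `ln -sfn` replaces an existing symlink instead of creating a new link inside the directory it points to, so the command can be re-run.

```python
import shlex


def build_results_setup_command(
    host: str, results_path: str, results_symlink: str
) -> list[str]:
    """Hypothetical: create `results_path` on `host` and point `results_symlink` at it.

    Paths are wrapped in double quotes (not shlex.quote) so that variables
    such as $SCRATCH still expand inside the remote login shell.
    """
    script = (
        f'mkdir -p "{results_path}" && '
        f'ln -sfn "{results_path}" "{results_symlink}"'
    )
    # shlex.quote keeps the remote default shell from expanding anything;
    # `bash -l` then loads the login environment and expands the variables.
    return ["ssh", host, "bash", "-lc", shlex.quote(script)]
```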