
download_hash.py (205 lines, 9.7 KiB)

[release-2.25] Refactor and expand download_hash.py (#11539)

* download_hash.py: generalized and data-driven. The script was limited to one hardcoded URL for the Kubernetes binaries and a fixed set of architectures. The solution is three-fold: 1. use a URL template dictionary for each download, which makes it easy to add support for new downloads; 2. source the architectures to search from the existing data; 3. enumerate the versions already in the data and search upward from the newest one until no newer version is found (newer in the version-ordering sense, irrespective of actual age).
* download_hash.py: support for 'multi-hash' files + runc. runc upstream does not provide one hash file per asset in its releases, but a single file with all the hashes. To handle this (and any other arbitrary upstream format), add a dictionary mapping the name of the download to a lambda which transforms the file provided by upstream into a dictionary of hashes keyed by architecture.
* download_hash: argument handling with argparse. Allow the script to be called with a list of components, to only fetch new version checksums for those. By default, new version checksums are fetched for all components supported by the script.
* download_hash: propagate new patch versions to all archs
* download_hash: add support for 'simple hash' components
* download_hash: support 'multi-hash' components
* download_hash: document missing support
* download_hash: use persistent session. This reuses HTTP connections and is more efficient; rough measurements show it saves around 25-30% of execution time.
* download_hash: cache requests for 'multi-hash' files. This avoids re-downloading and re-parsing the same file for each architecture.
* download_hash: document usage

Co-authored-by: Max Gautier <mg@max.gautier.name>
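Usage sketch (assuming the script is run from the directory containing it, so the relative path to checksums.yml resolves): pass the component names to refresh as positional arguments, or none at all to refresh every supported component, for example:

    python3 download_hash.py containerd_archive runc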
#!/usr/bin/env python3

# After a new version of a supported component has been released,
# run this script to update roles/kubespray-defaults/defaults/main/checksums.yml
# with the new hashes.

from itertools import count, groupby
from functools import cache
import argparse

import requests
from ruamel.yaml import YAML
from packaging.version import Version

CHECKSUMS_YML = "../roles/kubespray-defaults/defaults/main/checksums.yml"

def open_checksums_yaml():
    yaml = YAML()
    yaml.explicit_start = True
    yaml.preserve_quotes = True
    yaml.width = 4096

    with open(CHECKSUMS_YML, "r") as checksums_yml:
        data = yaml.load(checksums_yml)

    return data, yaml

def version_compare(version):
    return Version(version.removeprefix("v"))
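
# Upstream hash file locations, as URL templates.
# {version}, {os} and {arch} are filled in by the code below; entries pointing
# to a single multi-hash file (SHA256SUMS and friends, one file covering all
# architectures) have a matching parser in download_hash_extract further down,
# while the others resolve to one hash file per (version, os, arch) asset.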
downloads = {
    "calicoctl_binary": "https://github.com/projectcalico/calico/releases/download/{version}/SHA256SUMS",
    "ciliumcli_binary": "https://github.com/cilium/cilium-cli/releases/download/{version}/cilium-{os}-{arch}.tar.gz.sha256sum",
    "cni_binary": "https://github.com/containernetworking/plugins/releases/download/{version}/cni-plugins-{os}-{arch}-{version}.tgz.sha256",
    "containerd_archive": "https://github.com/containerd/containerd/releases/download/v{version}/containerd-{version}-{os}-{arch}.tar.gz.sha256sum",
    "crictl": "https://github.com/kubernetes-sigs/cri-tools/releases/download/{version}/critest-{version}-{os}-{arch}.tar.gz.sha256",
    "crio_archive": "https://storage.googleapis.com/cri-o/artifacts/cri-o.{arch}.{version}.tar.gz.sha256sum",
    "etcd_binary": "https://github.com/etcd-io/etcd/releases/download/{version}/SHA256SUMS",
    "kubeadm": "https://dl.k8s.io/release/{version}/bin/linux/{arch}/kubeadm.sha256",
    "kubectl": "https://dl.k8s.io/release/{version}/bin/linux/{arch}/kubectl.sha256",
    "kubelet": "https://dl.k8s.io/release/{version}/bin/linux/{arch}/kubelet.sha256",
    "nerdctl_archive": "https://github.com/containerd/nerdctl/releases/download/v{version}/SHA256SUMS",
    "runc": "https://github.com/opencontainers/runc/releases/download/{version}/runc.sha256sum",
    "skopeo_binary": "https://github.com/lework/skopeo-binary/releases/download/{version}/skopeo-{os}-{arch}.sha256",
    "yq": "https://github.com/mikefarah/yq/releases/download/{version}/checksums-bsd",  # see https://github.com/mikefarah/yq/pull/1691 for why we use this url
}

# TODO: downloads not supported
# youki: no checksums in releases
# kata: no checksums in releases
# gvisor: sha512 checksums
# crun: PGP signatures
# cri_dockerd: no checksums or signatures
# helm_archive: PGP signatures
# krew_archive: different yaml structure
# calico_crds_archive: different yaml structure

# TODO:
# noarch support -> k8s manifests, helm charts
# different checksum format (needs download role changes)
# different verification methods (gpg, cosign) (needs download role changes)
#   (or verify the sig in this script and only use the checksum in the playbook)
# perf improvements (async)

def download_hash(only_downloads: list[str]) -> None:
    # Handle files which bundle multiple hashes, in various formats.
    # Each lambda is expected to produce a dictionary of hashes indexed by arch name.
    download_hash_extract = {
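        # calico's SHA256SUMS covers every release asset; keep only the
        # 'calicoctl-linux-<arch>' lines, whose format is roughly (illustrative):
        #   <sha256>  calicoctl-linux-amd64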
        "calicoctl_binary": lambda hashes : {
            line.split('-')[-1] : line.split()[0]
            for line in hashes.strip().split('\n')
            if line.count('-') == 2 and line.split('-')[-2] == "linux"
            },
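        # etcd's SHA256SUMS lists one line per tarball, roughly (illustrative):
        #   <sha256>  etcd-v3.5.12-linux-amd64.tar.gz
        # -> keyed by the trailing architecture component.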
        "etcd_binary": lambda hashes : {
            line.split('-')[-1].removesuffix('.tar.gz') : line.split()[0]
            for line in hashes.strip().split('\n')
            if line.split('-')[-2] == "linux"
            },
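        # nerdctl's SHA256SUMS covers both the plain and the 'full' archives;
        # the filter keeps only the plain linux ones, roughly (illustrative):
        #   <sha256>  nerdctl-1.7.6-linux-amd64.tar.gz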
        "nerdctl_archive": lambda hashes : {
            line.split()[1].removesuffix('.tar.gz').split('-')[3] : line.split()[0]
            for line in hashes.strip().split('\n')
            if [x for x in line.split(' ') if x][1].split('-')[2] == "linux"
            },
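        # runc ships a single runc.sha256sum file covering all release assets;
        # the hardcoded [3:9] slice assumes the 'runc.<arch>' entries sit on
        # those lines, each roughly (illustrative):
        #   <sha256>  runc.amd64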
        "runc": lambda hashes : {
            parts[1].split('.')[1] : parts[0]
            for parts in (line.split()
                          for line in hashes.split('\n')[3:9])
            },
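        # yq's checksums-bsd file uses BSD-style digest lines for several
        # algorithms, roughly (illustrative):
        #   SHA256 (yq_linux_amd64) = <sha256>
        # keep only the SHA256 lines for the bare linux binaries.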
        "yq": lambda rhashes_bsd : {
            pair[0].split('_')[-1] : pair[1]
            # pair = (yq_<os>_<arch>, <hash>)
            for pair in ((line.split()[1][1:-1], line.split()[3])
                         for line in rhashes_bsd.splitlines()
                         if line.startswith("SHA256"))
            if pair[0].startswith("yq")
            and pair[0].split('_')[1] == "linux"
            and not pair[0].endswith(".tar.gz")
            },
    }

    data, yaml = open_checksums_yaml()
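    # A single Session reuses HTTP connections across requests (the commit
    # message reports roughly 25-30% less runtime), and @cache below avoids
    # re-fetching and re-parsing the same multi-hash file once per architecture.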
    s = requests.Session()

    @cache
    def _get_hash_by_arch(download: str, version: str) -> dict[str, str]:
        hash_file = s.get(downloads[download].format(
            version = version,
            os = "linux",
            ),
            allow_redirects=True)
        if hash_file.status_code == 404:
            print(f"Unable to find {download} hash file for version {version} at {hash_file.url}")
            return None
        hash_file.raise_for_status()
        return download_hash_extract[download](hash_file.content.decode())
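
    # checksums.yml layout (illustrative):
    #   <download>_checksums:
    #     <arch>:
    #       <version>: <sha256 | 0 | NONE>
    # '0' marks a hash that still needs to be fetched, 'NONE' a version that
    # does not exist upstream for that download.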
    for download, url in (downloads if only_downloads == []
                          else {k: downloads[k] for k in downloads.keys() & only_downloads}).items():
        checksum_name = f"{download}_checksums"

        # Propagate new patch versions to all architectures
        for arch in data[checksum_name].values():
            for arch2 in data[checksum_name].values():
                arch.update({
                    v: ("NONE" if arch2[v] == "NONE" else 0)
                    for v in (set(arch2.keys()) - set(arch.keys()))
                    if v.split('.')[2] == '0'})
                # This is necessary to make the script idempotent,
                # by only adding a vX.X.0 version (= minor release) in each arch
                # and letting the rest of the script populate the potential
                # patch versions
        for arch, versions in data[checksum_name].items():
            for minor, patches in groupby(versions.copy().keys(), lambda v: '.'.join(v.split('.')[:-1])):
                for version in (f"{minor}.{patch}" for patch in
                                count(start=int(max(patches, key=version_compare).split('.')[-1]),
                                      step=1)):
                    # Those barbaric generators do the following:
                    # group all patch versions by minor number, take the newest one
                    # and start from there to find new versions
                    if version in versions and versions[version] != 0:
                        continue
                    if download in download_hash_extract:
                        hashes = _get_hash_by_arch(download, version)
                        if hashes is None:
                            break
                        sha256sum = hashes.get(arch)
                        if sha256sum is None:
                            break
                    else:
                        hash_file = s.get(downloads[download].format(
                            version = version,
                            os = "linux",
                            arch = arch
                            ),
                            allow_redirects=True)
                        if hash_file.status_code == 404:
                            print(f"Unable to find {download} hash file for version {version} (arch: {arch}) at {hash_file.url}")
                            break
                        hash_file.raise_for_status()
                        sha256sum = hash_file.content.decode().split()[0]
                    if len(sha256sum) != 64:
                        raise Exception(f"Checksum has an unexpected length: {len(sha256sum)} (binary: {download}, arch: {arch}, release: {version}, checksum: '{sha256sum}')")
                    data[checksum_name][arch][version] = sha256sum

        data[checksum_name] = {arch: {r: releases[r] for r in sorted(releases.keys(),
                                                                     key=version_compare,
                                                                     reverse=True)}
                               for arch, releases in data[checksum_name].items()}

    with open(CHECKSUMS_YML, "w") as checksums_yml:
        yaml.dump(data, checksums_yml)
        print(f"\n\nUpdated {CHECKSUMS_YML}\n")

parser = argparse.ArgumentParser(description=f"Add new patch version hashes in {CHECKSUMS_YML}",
                                 formatter_class=argparse.RawTextHelpFormatter,
                                 epilog=f"""
This script only looks up new patch versions relative to those already existing
in the data in {CHECKSUMS_YML},
which means it won't add new major or minor versions.
In order to add one of these, edit {CHECKSUMS_YML}
by hand, adding the new version with a patch number of 0 (or the lowest relevant
patch version); then run this script.

Note that the script will try to add the versions on all
architecture keys already present for a given download target.

The value '0' for a version hash is treated as a missing hash, so the script
will try to download it again.
To mark a version which does not exist upstream (yanked, or upstream does not
have monotonically increasing version numbers), use the special value 'NONE'.

EXAMPLES:

crictl_checksums:
  ...
  amd64:
+   v1.30.0: 0
    v1.29.0: d16a1ffb3938f5a19d5c8f45d363bd091ef89c0bc4d44ad16b933eede32fdcbb
    v1.28.0: 8dc78774f7cbeaf787994d386eec663f0a3cf24de1ea4893598096cb39ef2508"""
)
parser.add_argument('binaries', nargs='*', choices=downloads.keys())

args = parser.parse_args()

download_hash(args.binaries)