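What would you like to be added:
A priority field on each inference flavor, so that users can control the order in which flavors are matched during scheduling. A higher value means the flavor is preferred. For example: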
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m
  inferenceConfig:
    flavors:
      - name: h800
        priority: 5 # higher priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h800
        limits:
          nvidia.com/gpu: 4
      - name: h100
        priority: 4
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h100
        limits:
          nvidia.com/gpu: 4
      - name: a100
        priority: 3
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a100
        limits:
          nvidia.com/gpu: 4
      - name: a20
        priority: 2
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a20
        limits:
          nvidia.com/gpu: 4
      - name: t4
        priority: 1 # lower priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: t4
        limits:
          nvidia.com/gpu: 4
Why is this needed:
When multiple flavors are defined for a model, there is currently no explicit way to control their matching order during scheduling. The scheduler uses the order defined in the list, which may not reflect the intended preference.
https://github.com/InftyAI/scheduler-plugins/blob/685a4d9f8a769f7f5634a6680f374a05c72823cd/pkg/plugins/resource_fungibility/resource_fungibility.go#L228-L248
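One possible way the scheduler could honor the proposed field is to sort flavors by descending priority before matching them. The sketch below assumes a hypothetical, trimmed-down Flavor type with an int32 Priority that defaults to 0 when unset; it is not the existing llmaz API or the plugin's actual code. Using a stable sort means flavors with equal priority keep their list order, so models that never set priority would keep today's list-order behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// Flavor is a hypothetical stand-in for an OpenModel inference flavor.
// Priority is the proposed field (assumption): higher values are tried
// first, and 0 is used when the field is unset.
type Flavor struct {
	Name         string
	Priority     int32
	NodeSelector map[string]string
}

// sortFlavorsByPriority returns a copy of flavors ordered by descending
// Priority. sort.SliceStable keeps flavors with equal priority in their
// original list order, and the input slice is left unmodified.
func sortFlavorsByPriority(flavors []Flavor) []Flavor {
	sorted := make([]Flavor, len(flavors))
	copy(sorted, flavors)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	return sorted
}

// names extracts flavor names, for printing and comparison.
func names(flavors []Flavor) []string {
	out := make([]string, 0, len(flavors))
	for _, f := range flavors {
		out = append(out, f.Name)
	}
	return out
}

func main() {
	// Flavors deliberately listed out of preference order.
	flavors := []Flavor{
		{Name: "t4", Priority: 1},
		{Name: "a100", Priority: 3},
		{Name: "h800", Priority: 5},
		{Name: "a20", Priority: 2},
		{Name: "h100", Priority: 4},
	}
	fmt.Println(names(sortFlavorsByPriority(flavors))) // prints [h800 h100 a100 a20 t4]
}
```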
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.