Is your feature request related to a problem? Please describe.
Creating a ticket to discuss the CI/CD requirements for the faultinj tooling.
There are still some open questions about the tooling:
A. The artifact is a .so file; where should we deploy it?
An internal-only or an external Artifactory store? Or do we ask developers to build it themselves whenever they want the tool?
B. What is the plan for this tooling? Do we plan to release it?
Is there a roadmap for it, e.g., what are we trying to achieve in the next release?
C. We have several scenarios in the design doc, but there are still no specific test specs (SW & HW) and expected results to make sure the nightly runs are deterministic. It would be nice to have some tables clarifying the details that define each scenario, instead of simply giving a command. E.g.:
Spark test with some specific configs
some faultinj-specific configs
driver 450.xx
ubuntu 18.04
GPU with 12 GiB memory
should return error count X. Then with driver 465.yy / centos7 / a 24 GiB GPU, it should return error count Y/Z/A.
Or state explicitly that CUDA/OS/GPU type does not matter here, or that we do not care about the error count, or that the setup meets our expectations as long as the test errors out. Then we could set up a regular run for it.
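To make the expectation explicit and machine-checkable, the per-scenario spec could be captured as a small declarative matrix that a nightly job iterates over. A minimal sketch below; every field name, driver version, and error count here is an illustrative placeholder, not the real spec:

```python
# Hypothetical scenario matrix for deterministic faultinj nightly runs.
# Each entry pins the SW/HW environment and the expected outcome, so a
# regression shows up as a mismatch instead of an ambiguous failure.
SCENARIOS = [
    {
        "name": "cuda-fault-scenario",  # placeholder scenario name
        "driver": "450.xx",             # placeholder driver branch
        "os": "ubuntu 18.04",
        "gpu_mem_gib": 12,
        "expected_error_count": 3,      # illustrative value
    },
    {
        "name": "cuda-fault-scenario",
        "driver": "465.yy",
        "os": "centos7",
        "gpu_mem_gib": 24,
        "expected_error_count": 5,      # illustrative value
    },
]


def check(scenario, observed_error_count):
    """Return True when a nightly run matches the pinned expectation."""
    return observed_error_count == scenario["expected_error_count"]
```

A table like this would also make it easy to mark a dimension as "don't care" (e.g., drop the `os` key) when CUDA/OS/GPU type is irrelevant for a scenario.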
Thanks
@sameerz can you help share some info about the plan here? At least for me, I would like to understand what the must-haves of faultinj are for the next release, 22.10. Thanks!