sagemaker-rightline

sagemaker-rightline is a Python package that eases validation of properties of a SageMaker Pipeline object.

Note that at present, this package is in an early stage of development and is not yet ready for production use. We welcome contributions!

Content

API

Features

⚙️ Configuration

The Configuration class is responsible for running the Validations against the Pipeline object and returning a Report. The Configuration class is instantiated with

  • a sagemaker.workflow.pipeline.Pipeline object, and

  • a list of Validations.
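To make this flow concrete, here is a minimal, self-contained sketch of the pattern. It does not use the package itself: ToyConfiguration, ToyValidation and ToyResult are hypothetical stand-ins, and a plain dict takes the place of the SageMaker Pipeline object. The real classes may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyResult:
    """Outcome of one validation (hypothetical stand-in for a report row)."""
    validation_name: str
    success: bool
    message: str


@dataclass
class ToyValidation:
    """Hypothetical validation: a name plus a check over the pipeline."""
    name: str
    check: Callable[[Dict], bool]

    def run(self, pipeline: Dict) -> ToyResult:
        ok = self.check(pipeline)
        return ToyResult(self.name, ok, "passed" if ok else "failed")


class ToyConfiguration:
    """Runs every validation against one pipeline and collects the results."""

    def __init__(self, validations: List[ToyValidation], pipeline: Dict) -> None:
        self.validations = validations
        self.pipeline = pipeline

    def run(self) -> List[ToyResult]:
        return [v.run(self.pipeline) for v in self.validations]


# A dict stands in for a sagemaker.workflow.pipeline.Pipeline object.
toy_pipeline = {"name": "demo", "steps": ["preprocess", "train"]}

toy_config = ToyConfiguration(
    validations=[
        ToyValidation("HasSteps", lambda p: len(p["steps"]) > 0),
        ToyValidation("HasEvaluateStep", lambda p: "evaluate" in p["steps"]),
    ],
    pipeline=toy_pipeline,
)
toy_results = toy_config.run()
for r in toy_results:
    print(f"{r.validation_name}: {r.message}")
```

Keeping each check behind a common run method is what lets the configuration treat all validations uniformly and gather their outcomes into one result list.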

✔️ Validations

A Validation is a class that inherits from the Validation base class. It is responsible for validating a single property of the Pipeline object. We differentiate between Validations that check the Pipeline object itself (class names beginning with “Pipeline”) and Validations that check the Pipeline object’s Step objects (class names beginning with “Step”). Depending on the specific Validation, a different set of StepTypeEnum values may be supported.

For example, StepImagesExist supports Processing and Training steps. It checks that every image URI referenced by Steps of these types in the Pipeline object actually exists in the target ECR.
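The idea of a step-level check that only applies to certain step types can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: ToyStep and find_missing_images are hypothetical names, and a plain set of known image URIs stands in for a lookup against ECR.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ToyStep:
    """Hypothetical stand-in for a pipeline step."""
    name: str
    step_type: str  # e.g. "Processing", "Training", "Lambda"
    image_uri: str  # empty string if the step has no container image


# Only steps of these types are inspected; all others are skipped.
SUPPORTED_STEP_TYPES = {"Processing", "Training"}


def find_missing_images(steps: List[ToyStep], known_images: Set[str]) -> List[str]:
    """Return names of supported steps whose image URI is not in known_images."""
    return [
        step.name
        for step in steps
        if step.step_type in SUPPORTED_STEP_TYPES
        and step.image_uri not in known_images
    ]


toy_steps = [
    ToyStep("preprocess", "Processing", "registry/preprocess:1"),
    ToyStep("train", "Training", "registry/train:2"),
    ToyStep("notify", "Lambda", ""),
]

# Assumption: this set replaces a real registry lookup.
known_images = {"registry/preprocess:1"}

missing = find_missing_images(toy_steps, known_images)
print(missing)
```

The Lambda step is ignored because its type is not in the supported set, so only the Training step with an unknown image is reported.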

The following Validations are currently implemented:

  • PipelineParametersAsExpected

  • StepImagesExist

  • StepKmsKeyIdAsExpected

  • StepNetworkConfigAsExpected

  • StepLambdaFunctionExists

  • StepRoleNameExists

  • StepRoleNameAsExpected

  • StepTagsAsExpected

  • StepInputsAsExpected

  • StepOutputsAsExpected

  • StepOutputsMatchInputsAsExpected

  • StepCallbackSqsQueueExists

  • PipelineProcessingStepsIONamesUnique

In most cases, a Validation subclass requires passing a Rule object to its constructor.

📜 Rules

A Rule is a class that inherits from the Rule base class. It is responsible for defining the rule that a Validation checks for. For example, passing the list of expected KMSKeyIDs and the Rule Equals to StepKmsKeyIdAsExpected will check that all Step objects of the Pipeline object have a KmsKeyId property that matches the passed KMSKeyIDs.

Note that not all Validations require a Rule object, e.g. StepImagesExist.

The following Rules are currently implemented:

  • Equals

  • Contains

All rules support the negative parameter (default: False), which allows for inverting the rule.
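As a rough illustration of how a rule and its negative parameter could interact, consider the sketch below. ToyRule, ToyEquals and ToyContains are hypothetical stand-ins, and the comparison semantics (exact list equality, and "every expected item is present") are assumptions made for this example rather than the package's documented behavior.

```python
from typing import Any, List


class ToyRule:
    """Hypothetical base rule; negative=True inverts the outcome."""

    def __init__(self, negative: bool = False) -> None:
        self.negative = negative

    def _compare(self, observed: List[Any], expected: List[Any]) -> bool:
        raise NotImplementedError

    def run(self, observed: List[Any], expected: List[Any]) -> bool:
        outcome = self._compare(observed, expected)
        return not outcome if self.negative else outcome


class ToyEquals(ToyRule):
    """Assumed semantics: observed and expected are identical lists."""

    def _compare(self, observed: List[Any], expected: List[Any]) -> bool:
        return observed == expected


class ToyContains(ToyRule):
    """Assumed semantics: every expected item appears in observed."""

    def _compare(self, observed: List[Any], expected: List[Any]) -> bool:
        return all(item in observed for item in expected)


observed_keys = ["key-a", "key-b"]

equals_result = ToyEquals().run(observed_keys, ["key-a"])
contains_result = ToyContains().run(observed_keys, ["key-a"])
not_contains_result = ToyContains(negative=True).run(observed_keys, ["key-a"])

print(equals_result, contains_result, not_contains_result)
```

Handling the inversion once in the base class means each concrete rule only has to define its positive comparison.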

📝 Report

A Report instance is returned when the Configuration is run (optionally, a pandas.DataFrame is returned instead). It contains the results of the Validations that were run against the Pipeline object as well as additional information to allow for further analysis.

Usage

from sagemaker.processing import NetworkConfig, ProcessingInput, ProcessingOutput
from sagemaker.workflow.parameters import ParameterString
from sagemaker_rightline.model import Configuration
from sagemaker_rightline.rules import Contains, Equals
from sagemaker_rightline.validations import (
    PipelineParametersAsExpected,
    StepImagesExist,
    StepKmsKeyIdAsExpected,
    StepNetworkConfigAsExpected,
    StepLambdaFunctionExists,
    StepRoleNameExists,
    StepRoleNameAsExpected,
    StepTagsAsExpected,
    StepInputsAsExpected,
    StepOutputsAsExpected,
    StepOutputsMatchInputsAsExpected,
    StepCallbackSqsQueueExists,
)

# Import a dummy pipeline
from tests.fixtures.pipeline import get_sagemaker_pipeline, DUMMY_BUCKET

sm_pipeline = get_sagemaker_pipeline()

# Define Validations
validations = [
    StepImagesExist(),
    PipelineParametersAsExpected(
        parameters_expected=[
            ParameterString(
                name="parameter-1",
                default_value="some-value",
            ),
        ],
        rule=Contains(),
    ),
    StepKmsKeyIdAsExpected(
        kms_key_id_expected="some/kms-key-alias",
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepNetworkConfigAsExpected(
        network_config_expected=NetworkConfig(
            enable_network_isolation=False,
            security_group_ids=["sg-1234567890"],
            subnets=["subnet-1234567890"],
        ),
        rule=Equals(negative=True),
    ),
    StepLambdaFunctionExists(),
    StepRoleNameExists(),
    StepRoleNameAsExpected(
        role_name_expected="some-role-name",
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepTagsAsExpected(
        tags_expected=[{
            "some-key": "some-value",
        }],
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepInputsAsExpected(
        inputs_expected=[
            ProcessingInput(
                source=f"s3://{DUMMY_BUCKET}/input-1",
                destination="/opt/ml/processing/input",
                input_name="input-2",
            )
        ],
        step_type="Processing",  # either step_type or step_name must be set to filter
        rule=Contains(),
    ),
    StepOutputsAsExpected(
        outputs_expected=[
            ProcessingOutput(
                source="/opt/ml/processing/output",
                destination=f"s3://{DUMMY_BUCKET}/output-1",
                output_name="output-1",
            )
        ],
        step_name="sm_processing_step_spark",  # optional
        rule=Contains(),
    ),
    StepOutputsMatchInputsAsExpected(
        inputs_outputs_expected=[
            {
                "input": {
                    "step_name": "sm_processing_step_sklearn",
                    "input_name": "input-1",
                },
                "output": {
                    "step_name": "sm_processing_step_sklearn",
                    "output_name": "output-1",
                },
            }
        ]
    ),
    StepCallbackSqsQueueExists(),
]

# Add Validations and SageMaker Pipeline to Configuration
cm = Configuration(
    validations=validations,
    sagemaker_pipeline=sm_pipeline,
)

# Run the full Configuration
df = cm.run()

# Show the report
df

Release

Publishing a new version to PyPI is done via the Release functionality. This triggers the publish.yml workflow, which creates a new release with the version from the tag and publishes the package to PyPI.

Contributing

Contributions welcome! We’ll add a guide shortly.