List of Custom Steps in this Project

| Name | Brief Description | Owner/Contact | Viya Version Supported | Last Update |
| --- | --- | --- | --- | --- |
| _template | Template to use for contributions | SAS | 2020.1.5 or later | 22DEC2023 |
| Airflow - Generate DAG | Generates an Apache Airflow DAG using SAS Studio Flow, where flow steps represent Airflow tasks using the SAS Airflow Provider | Nicolas Robert | 2023.12 or later | 09JAN2024 |
| Anonymize and Mask Data | Anonymize and Mask Data using QKB definitions | Mary Kathryn Queen | 2023.06 or later | 19FEB2024 |
| Append Table | Appends data to a target table with support for maintaining unique incremental id | Torben Juul Johansson | 2020.1.5 or later | 26OCT2022 |
| CAS - Convert Char to Varchar | Create a copy of a table and convert chars to varchars | Carlo Petti | 2023.01 or later | 05SEP2023 |
| CAS - Generate unique ID | Generates a new column containing a unique identifier (ID) per observation for a given input CAS table | Sundaresh Sankaran | 2022.11 or later | 07FEB2023 |
| CAS - Load Tables from Folders in Filesystem | Load all files in a directory to CAS tables | Sundaresh Sankaran / Wilbram Hazejager | 2022.11 or later | 21DEC2022 |
| CAS - Submit Python and R code | Submit Python/R code to CAS server using CAS Gateway action | David Weik | 2023.11 or later | 11JAN2024 |
| CAS - Validate unique ID | Validates if column contains unique values for a given input CAS table | Sundaresh Sankaran | 2023.06 or later | 11JUL2023 |
| Catalog - List Agents | Extract list of configured SAS Information Catalog agents into a table | David Weik | 2023.12 or later | 15JAN2024 |
| Catalog - Run Agent | Triggers the run of a SAS Information Catalog agent | David Weik | 2023.12 or later | 15JAN2024 |
| Create Listing Of Directory CLOD | Create table containing names of all files in directory | Stephan Weigandt | 2020.1.5 or later | 21OCT2022 |
| Data Synthesis with Python faker | Generate synthetic data using Python faker module (also includes custom steps to install/load Python modules) | Angus Looney / Duncan Bain | 2021.1.1 or later | 22DEC2022 |
| DQ - Cluster Analysis | Compare pairs of rows in a cluster and identify potential false positives | Clemens Knobloch | 2023.10 or later | 23JAN2024 |
| DQ - Change Case | Upper-, Lower-, or Propercase data values using QKB locale specific rules | Clemens Knobloch | 2023.01 or later | 15MAR2023 |
| DQ - Clustering | Cluster records based on column values, e.g. match codes | Lorenzo Toja / Arnold Toporowski | 2021.1.1 or later | 05DEC2022 |
| DQ - Create QKB Reference Tables | Create QKB Reference Tables | Mary Kathryn Queen | 2023.06 or later | 10OCT2023 |
| DQ - Identify | Obtain the Identity Type of data values using the dqIdentify function | Arnold Toporowski | 2021.1.1 or later | 29NOV2022 |
| DQ - Match Code | Create Match Codes based on locale, using SAS QKB and dqMatch function | Lorenzo Toja / Arnold Toporowski | 2021.1.1 or later | 22NOV2022 |
| DQ - Parsing | Parse a string into a set of tokens using QKB locale specific rules | Clemens Knobloch | 2023.01 or later | 15MAR2023 |
| DQ - Standardize | Create standardized values based on locale, using SAS QKB and dqStandardize function (includes support for generating masked values) | Lorenzo Toja / Arnold Toporowski | 2023.06 or later | 27JUL2023 |
| DQ - Surviving Record | Extract the best record (aka Golden Record) from clusters of records, with support for standard deduplication routines and user-defined rules | Lorenzo Toja | 2023.06 or later | 24AUG2023 |
| DuckDB | Uses DuckDB to read and write data in various DBMS and file types | Clemens Knobloch | 2023.01 or later | 12JUL2023 |
| Dynamic Aggregations From Timeseries DAFT | Perform dynamic aggregations on timeseries data | Stephan Weigandt | Tested on 2022.1.2 (should work on earlier versions) | 08MAY2023 |
| Export - ADLS File Writer | Write SAS tables to Parquet files on Azure Data Lake Storage (ADLS) | Alfredo Lorie | 2023.02 or later | 20APR2023 |
| Export - GCSFS File Writer | Write SAS and CAS datasets to Google Cloud Storage (GCS) in Parquet and Delta Lake format using Python | Ignacio Rodríguez | 2023.11 or later | 21DEC2023 |
| Export - Parquet | Export SAS tables to Parquet files using SAS Libname engine | Neil Griffin | 2023.10 or later | 15DEC2023 |
| Extract Job Definition Metadata | Extract all content (SAS Code, HTML forms, XML Prompts and Parameters) from a SAS Job Definition and store it in files | David Weik | 2023.11 or later | 23NOV2023 |
| Extract Text Features | Supports extracting many different features from text fields | David Weik / Ulrich Reincke / Rens Feenstra | 2022.10 or later | 27NOV2022 |
| GEO - Shape Files | Manage Shape Files used in GIS systems for use in SAS Visual Analytics | Stephano Tucciarone | 2023.10 or later | 06FEB2024 |
| GeoDistance with Rounding | Calculate the distance between 2 supplied lat/long locations in either kilometers or miles | Mary Kathryn Queen | 2020.1.5 or later | 28SEP2022 |
| Get Exchange Rates | Get Exchange Rates from Service Provider | David Weik | 2023.08 or later | 04SEP2023 |
| Git - Clone Git Repo | Clone Git Repo as part of running a flow | Sundaresh Sankaran | 2022.11 or later | 25JAN2023 |
| Git - Delete Local Repo | Delete local Repo as part of running a flow | Sundaresh Sankaran | 2022.11 or later | 25JAN2023 |
| Git - List Local Repo Changes | List changed files inside local Git repository folder into a dataset as part of running a flow for easy reporting | Sundaresh Sankaran | 2022.11 or later | 07FEB2023 |
| Git - Stage, Commit, Pull and Push Changes | Stage, commit, pull and push changes as part of running a flow | David Weik | 2023.01 or later | 19FEB2023 |
| Great Expectations - Execute Rule | Run business rules based on Great Expectations Python package | Stephen Kotiang | 2023.03 or later | 11OCT2023 |
| Great Expectations - Generate Expectation Suite | Generate rules on input data using Great Expectations | Mackenzie Looney | 2023.04 or later | 19OCT2023 |
| Great Expectations - Run Expectations Suite | Compare data against an Expectation Suite | Mackenzie Looney | 2023.04 or later | 19OCT2023 |
| Import - ADLS File Reader | Read Parquet files from Azure Data Lake Storage (ADLS) with support for Delta Lake file format | Alfredo Lorie | 2023.02 or later | 25APR2023 |
| Import - CSV with long column names | Import CSV file with long column names (>32 chars) in header row | Ignacio Rodríguez | 2023.11 or later | 15JAN2024 |
| Import - Data Ingestion Auto Pilot DIAP (Light) for External Files | Ingest external file(s) from directory with push of a button | Stephan Weigandt | 2020.1.5 or later | 28JUL2023 |
| Import - Extract Table from PDF | Extract tabular data from within a PDF document and load it into a SAS dataset | Sundaresh Sankaran / Dragos Coles | 2023.03 or later | 01MAY2023 |
| Import - GCSFS File Reader | Read Parquet and Delta Lake files from Google Cloud Storage (GCS) and write to SAS and CAS datasets using Python | Ignacio Rodríguez | 2023.11 or later | 21DEC2023 |
| Import - Google Sheets | Import public Google Sheets as a SAS data set | David Weik | 2022.12 or later | 12JAN2023 |
| Import - HTML Table | Import HTML table(s) from web page as SAS data set(s) using Python Pandas | David Weik | 2023.07 or later | 28JUL2023 |
| LLM - Prompt Catalog | Submit queries to a Large Language Model, test prompts (prompt engineering) and save prompt history | Xin Ru Lee | 2023.12 or later | 02FEB2024 |
| Lookup | Add column by performing lookup on other table (using data step hash object) | Torben Juul Johansson | 2021.2.1 or later | 21SEP2022 |
| NLP - Categories Testing Framework | Test and assess SAS Visual Text Analytics categorization model | Sundaresh Sankaran | 2023.12 or later | 10JAN2024 |
| NLP - Extract Identities | Pull entities out of documents or freeform text | Arnold Toporowski | 2022.12 or later | 13DEC2023 |
| NLP - Extract Rule Configuration | Extracts the rule configuration within rules-based Visual Text Analytics Concepts or Categories model definitions for use in downstream applications | Sundaresh Sankaran | 2023.04 or later | 01AUG2023 |
| NLP - Identify Language | Identifies the language used for text data in an input table and creates a column containing the ISO 639-1 language code | Sundaresh Sankaran | 2022.12 or later | 15FEB2023 |
| NLP - Predefined Sentiment Analysis | Analyse a text corpus for the sentiment expressed in it | Sundaresh Sankaran | 2023.03 or later | 04APR2023 |
| NLP - Profile Text | Profile text within a document corpus and understand its linguistic structure | Sundaresh Sankaran | 2022.07 or later | 19OCT2022 |
| NLP - Score Text Classifier | Score a text corpus with a text classifier model trained using the deep learning (BERT-based) textClassifier.trainTextClassifier CAS action | Sundaresh Sankaran | 2023.02 or later | 19MAR2023 |
| NLP - Sentence Splitter | Splits a text column into multiple observations with constituent sentences using CAS actions | Sundaresh Sankaran | 2023.08 or later | 03NOV2023 |
| NLP - Train Text Classifier | Train a text classifier model based on deep learning (BERT-based transformer) architecture using textClassifier.trainTextClassifier CAS action (supports GPUs) | Sundaresh Sankaran | 2023.02 or later | 18MAR2023 |
| OCR - AWS Textract | Use the AWS Textract service to perform different types of OCR on files that can be stored in S3 buckets or on the SAS Compute file system | Jannic Horst | 2022.09 or later | 08JAN2024 |
| OCR - Azure AI Document Intelligence Table Extraction | Use Microsoft Azure's Document Intelligence cloud AI services to extract tables contained in PDFs and images stored on a URL | Sundaresh Sankaran | 2023.09 or later | 02JAN2024 |
| Python - Load Objects to SAS | Load Python objects to SAS Compute or CAS tables | Sundaresh Sankaran | 2023.08 or later | 01SEP2022 |
| Python - Virtual Environments | A collection of 5 SAS Studio custom steps which help you create, activate, and switch between virtual Python environments for use within SAS Viya | Sundaresh Sankaran | 2020.1.5 or later | 12JUL2022 |
| R Runner | Submit R scripts with support for input and output table | Samiul Haque / Sundaresh Sankaran | 2023.08 or later | 18AUG2023 |
| Rank Columns - Starter template | Simple Example (based on template) | SAS | 2020.1.5 or later | 26AUG2022 |
| SAS Content - Copy File from File System | Copy file from Compute file system into SAS Content folder programmatically | Sundaresh Sankaran | 2022.11 or later | 09JAN2024 |
| SAS Content - Create Folder | Creates a new folder in SAS Content programmatically | Sundaresh Sankaran | 2022.11 or later | 18DEC2023 |
| SAS Content - Obtain Folder URI | Obtain URI of selected SAS Content folder and save it in a global macro variable | Sundaresh Sankaran | 2022.11 or later | 18DEC2023 |
| SCD Loader | Slowly Changing Dimensions loader with support for type 1 and type 2 changes | Torben Juul Johansson | 2020.1.5 or later | 28SEP2022 |
| Send SMTP Email | Send Email message | Mary Kathryn Queen | 2022.1.4 or later | 03APR2023 |
| Send Teams Message | Send Microsoft Teams Messages to a Teams channel | David Weik / Tamara Fischer | 2022.10 or later | 14JUN2023 |
| Surrogate Key Generator | Generates a surrogate key based on a business key | Torben Juul Johansson | 2020.1.5 or later | 29SEP2022 |
| Synthetic Data Generation | A collection of 4 SAS Studio custom steps which help you train, score and assess Synthetic Data models | Sundaresh Sankaran | 2022.09 or later | 06OCT2022 |
| Translate Text | Translates text stored in a column using DeepL API | David Weik | 2023.04 or later | 10MAY2023 |
| Update column labels | Update column labels from a (metadata) table, delimited file, or interactively | Ignacio Rodríguez | 2023.11 or later | 21DEC2023 |
| Vector Databases - Hydrate Chroma DB Collection | Populate a Chroma vector database collection with documents and embeddings contained in a CAS table | Sundaresh Sankaran | 2023.12 or later | 24JAN2024 |
| Vector Databases - Query Chroma DB Collection | Query a Chroma vector database collection with documents and store results in CAS table | Sundaresh Sankaran | 2023.12 or later | 30JAN2024 |
| Vector Search - Fast KNN | Identify nearest neighbors to observations in an input query table | Sundaresh Sankaran | 2023.11 or later | 09FEB2024 |