Exporting selected model data for external analysis

Data scientists can export snapshots of the adaptive models that drive Pega Customer Decision Hub™ predictions from the Adaptive Decision Manager (ADM) data mart for further analysis in their favorite analytical tool. To limit the scope of the export to the data of interest, learn how to build data flows to populate a data set that contains all relevant data.

Transcript

Data scientists continuously monitor the state of their adaptive models. For an in-depth analysis of the adaptive model data, they can export the data to use in analytical tools like Python or R. This demo shows you how to build a data flow in Dev Studio to populate a data set with the model data that is relevant to you. The GitHub repository CDH Tools is publicly available, and the tooling helps to easily build meaningful plots and more.

U+ Bank uses Pega Customer Decision Hub™ to determine which credit card offer to show on their website when a customer logs in.

This image shows the credit card offer for Troy

For each offer the customer is eligible for, an adaptive model based on the Web Click Through Rate model configuration determines the likelihood that the customer will click on the web banner.

As a data scientist, you may want to export the data for these specific models from the ADM data mart for further analysis. Two database tables contain the required information and populate two data sets in regular updates or at the request of the user. The Model Snapshots data set contains snapshots that include the model ID, the model name, the configuration name, and model attributes such as the number of predictors, the model performance, and many others.

This image shows the model snapshot table

The ADM Predictor Snapshots data set contains snapshots of the binning of individual predictors.

This image shows the predictor snapshot table

You can export the raw data sets that contain all available data on adaptive models in the system, but to limit the size of the exports you may want to select the data that you are interested in before exporting. To select the model and predictor snapshots of models based on the Web Click Through Rate model configuration, create data flows to populate a data set that contains only the relevant information.

Data flows are scalable and resilient data pipelines that you can use to ingest, process, and move data to one or more destinations. A data flow consists of components that transform the data in the pipeline and starts with a source component.

To start building a data flow that selects the snapshots of the models that you are interested in, use the Model Snapshots data set as the source.

Next, add a filter component that only lets the snapshots of the models through that are based on the Web Click Through Rate model configuration.

This image shows the filter configuration

In a second data flow, use the ADM Predictor Snapshots data set as the source.

This image shows the second new data flow

You can use the abstract output of the first data flow, containing the Web Click Through Rate model data, to select the relevant predictor snapshots from the ADM Predictor Snapshots data set, based on matching model IDs.

This image shows both snapshot tables with the Model ID highlighted

To do so, you add a Compose component and you configure the SelectedModels data flow as the second source. Compose the ADM Predictor Snapshots with the output of the SelectedModels data flow into a property that holds the model data. The condition is that the Model IDs match up.

Add a filter component to filter out any predictor snapshots that do not match up to one of the selected models.

This image shows the filter conditions to filter relevant data

As the destination of the data for the selected models, use a Decision Data Store data set. The underlying storage for this type of data set is an Apache Cassandra database, which can be exported in the JSON format.

This image shows the destination data set

To export the data set containing all available model data for the models based on the Web Click Through Rate model configuration for external analysis, you run the data flow that you have just created. After you run the data flow, you can export the output dataset in a JSON file format.

Alternatively, you can also store the results of the data flow in a File data set that is configured to be in the repository of your choice. For each predictor snapshot, the data set contains all predictor data, and also the data from the matching adaptive model snapshot.

This demo has concluded. What did it show you?

How to build a data flow to export a subset of the ADM data mart.
How to create a Decision Data Store data set based on a Cassandra table.
How to export ADM data for analysis in external tools by running a data flow.

This Topic is available in the following Module:

Monitoring adaptive models v4

Get help

If you are having problems with your training, please review the Pega Academy Support FAQs.

¿Le ha resultado útil este contenido?

Sí

¿Quiere ayudarnos a mejorar este contenido?

Sugerir una edición

Get Started with Community

Exporting selected model data for external analysis

Transcript

This Topic is available in the following Module:

We'd prefer it if you saw us at our best.