Installing Unscrambl Drive

Introduction

Unscrambl Drive is an on-premises, next-generation enterprise marketing automation product. Built for marketers, it enables marketing teams to plan and manage the customer lifecycle end-to-end and to deliver personalized campaigns quickly, moving from micro segments to a segment of one.

It runs on RedHat 7 (or CentOS 7), Ubuntu 14, and Ubuntu 16. On RedHat, x86_64 and ppc64le (little endian) builds are available.

In the next sections, we will go over the software dependencies that must be satisfied prior to the installation of Unscrambl Drive.

Dependencies

The server where Unscrambl Drive is installed must be configured according to a few guidelines. The software also relies on a set of external dependencies that must be installed prior to setting it up for use. In this section, we discuss both of these issues.

Operating System configuration

The server (or servers) where Drive runs must be properly configured with respect to certain OS and shell resource limits.

While these limits are checked during installation, this section provides some guidelines for setting them ahead of time.

First, the number of file descriptors that can be used by a process must be equal to or greater than 640,000.

The soft and hard limits are defined as part of the OS configuration (defined in /etc/security/limits.conf) and used/enforced by the (Bash) shell.

In the shell, the soft limits for each of these resources (including the maximum number of open files) can be inspected as follows:

ulimit -Sa

And the hard limits as follows:

ulimit -Ha

Naturally, neither limit can go beyond the OS limit that applies to the server.

If the limit is too low in the current shell, but the OS-level limit is sufficient, you can raise it by adding the following line to your local .bashrc:

ulimit -n 640000

Nevertheless, in most cases, after a fresh OS install, it is necessary to update /etc/security/limits.conf to meet Drive’s needs.

We recommend that you consult your OS documentation, but usually the following settings can be added towards the end of the file (assuming you want the limits applied to all users):

*    soft    nofile 640000
*    hard    nofile 640000

# End of file

After applying this change, a new login must be made with the userid that will be used by Drive (no reboot is necessary).
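
As a quick cross-check from the new login, the following Python sketch reads the soft and hard open-file limits through the standard resource module and compares them with the 640,000 minimum stated above. The function names are illustrative and are not part of Unscrambl Drive; the installer's own check may work differently.

```python
import resource

REQUIRED_NOFILE = 640_000  # minimum number of file descriptors stated in this guide


def limit_is_sufficient(limit, required=REQUIRED_NOFILE):
    """Return True if a soft or hard limit meets the requirement."""
    # RLIM_INFINITY means "no limit", which always satisfies the requirement.
    return limit == resource.RLIM_INFINITY or limit >= required


def check_nofile():
    """Report the current process's open-file limits and whether they suffice."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "soft": soft,
        "hard": hard,
        "soft_ok": limit_is_sufficient(soft),
        "hard_ok": limit_is_sufficient(hard),
    }
```

Calling check_nofile() from a Python interpreter started by the Drive userid shows the limits that the user's processes actually inherit.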

External Dependencies

The external software dependencies required by Drive are specific to the operating system version and architecture where the software will run.

There are two types of external dependencies: those provided by the OS vendor and those provided by external vendors.

Starting with the dependencies provided by external vendors, which require manual installation, you will need to obtain:

  • mandatory, only for Power8 ppc64le installations: IBM JDK
  • optional: IBM InfoSphere Streams, IBM SPSS, as well as Oracle WebLogic, if these are capabilities you have acquired in your particular Unscrambl Drive installation. These are commercial products and you should consult their respective documentation to have them installed in your environment. Their integration with Unscrambl Drive is described later in this section, where the installer will request additional information regarding their location in the file system.

Installing the IBM JDK (for Power8 ppc64le)

The IBM JDK can be installed by running the following commands:

sudo subscription-manager repos --enable rhel-7-for-power-le-supplementary-rpms
sudo yum install java-1.8.0-ibm-devel

Unscrambl Drive assumes that the default path for the JDK installation is /usr/lib/jvm/java-ibm.

If that is not the case, i.e., you have installed the JDK elsewhere, Unscrambl Drive will alternatively use the UNSCRAMBL_JAVA_PATH environment variable to locate the JRE within the JDK installation, so ensure that the following variable is set and available before running the installer:

export UNSCRAMBL_JAVA_PATH=<my-own-jdk-location>
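
The lookup described above can be sketched in Python as follows. This is only an illustration: the precedence (the environment variable first, then the default path) and the bin/java layout under the JDK are assumptions, and the function names are hypothetical rather than part of the product.

```python
import os

DEFAULT_JDK_PATH = "/usr/lib/jvm/java-ibm"  # default location stated in this guide


def resolve_jdk_path(environ=os.environ):
    """Return the JDK location, preferring UNSCRAMBL_JAVA_PATH when it is set."""
    override = environ.get("UNSCRAMBL_JAVA_PATH", "").strip()
    return override or DEFAULT_JDK_PATH


def has_java_binary(jdk_path):
    """Check for an executable bin/java under the JDK (assumed layout)."""
    java = os.path.join(jdk_path, "bin", "java")
    return os.path.isfile(java) and os.access(java, os.X_OK)
```

Running has_java_binary(resolve_jdk_path()) before starting the installer gives an early hint that the variable points at a usable JDK.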

Installing the OS-managed packages

When it comes to OS-managed dependencies, i.e., dependencies that can be installed using the operating system package management, the installer will look for and warn you about missing dependencies.

The installation package comes with a utility, dependency_checker, that can be used to ensure that all dependencies are in place ahead of the installation.

This utility inspects the environment for RedHat- or Ubuntu-provided software (referred to as OS-provided software in the rest of this documentation) as well as for specific Python packages required by Unscrambl Drive, which are provided as a virtualenv environment, pre-configured to match the Unscrambl Drive needs.

Operating system software packages must be installed using the regular mechanism for downloading and installing them, usually yum on RedHat and apt-get on Ubuntu.

When using dependency_checker to extract the list of required dependencies, the output will be similar to (but not necessarily the same as) the following:

$ ./drive/bin/dependency_checker -l
List of OS package dependencies:

advance-toolchain-at8.0-runtime: 8.0 (installed)
mariadb: 5.5 (not installed)
mariadb-libs: 5.5 (installed)
mariadb-server: 5.5 (installed)

List of Python package dependencies (available in the Unscrambl Drive virtualenv):
...

Note that, in this example, one external dependency (mariadb) is not currently installed. In this case, assuming this host is running RedHat Linux, yum must be used to install mariadb.
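
When the list is long, the missing packages can be extracted from the listing with a small script. The Python sketch below assumes the "name: version (installed|not installed)" line format shown in the sample above; since the actual output may differ, treat the pattern as a starting point. The function name is illustrative.

```python
import re

# Matches lines such as "mariadb: 5.5 (not installed)".
DEPENDENCY_LINE = re.compile(
    r"^(?P<name>[\w.+-]+):\s*(?P<version>\S+)\s*"
    r"\((?P<status>not installed|installed)\)\s*$"
)


def missing_packages(output):
    """Return (name, version) for every package reported as not installed."""
    missing = []
    for line in output.splitlines():
        match = DEPENDENCY_LINE.match(line.strip())
        if match and match.group("status") == "not installed":
            missing.append((match.group("name"), match.group("version")))
    return missing
```

The output of ./drive/bin/dependency_checker -l can be captured to a file, or piped into the script, and passed to missing_packages as a single string.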

Note

Installing OS packages on a server without Internet connection

In many cases, the server (or cluster) where Unscrambl Drive is going to be installed is not directly connected to the Internet.

In such cases, the installation of additional OS-level packages can be accomplished by having access to the Operating System installation CD/DVD or, simply, to an .iso image with the OS installation.

If your installation is RedHat-based, use one of the following alternatives:

If your installation is Ubuntu-based, use one of the following alternatives:

  • if a DVD is available, please follow the DVD-based apt repository directions outlined by Canonical to create a locally available apt repository.

  • if an .iso file is available, mount it first and then use the mounting point in the steps above as the location when running apt-cdrom. To mount the .iso perform the following steps either sudo-ed or by logging in as root:

    $ mkdir -p <mounting point location>
    $ mount -o loop <file>.iso <mounting point location>

When installing external operating system-managed dependencies, as long as the major and minor version numbers match, the dependency is considered satisfied.
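
The matching rule can be illustrated with a short Python sketch. It assumes plain dotted version strings such as 5.5 or 5.5.68; distribution-specific prefixes (for example, an RPM epoch) are not handled, and the function names are hypothetical.

```python
import re


def major_minor(version):
    """Return the (major, minor) pair at the start of a dotted version string."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"cannot find major.minor in {version!r}")
    return int(match.group(1)), int(match.group(2))


def dependency_satisfied(required, installed):
    """A dependency is satisfied when major and minor version numbers match."""
    return major_minor(required) == major_minor(installed)
```

For example, a requirement of 5.5 is satisfied by an installed 5.5.68, but not by 5.6.1 or 10.1.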

Configuring MariaDB for use by Unscrambl Drive

Unscrambl Drive requires a relational database to store configuration as well as runtime data used by the applications as it runs.

Currently, the only supported database is MariaDB.

Installing MariaDB

If you need to have it installed prior to installing Unscrambl Drive, please refer to MariaDB’s online installation instructions to become acquainted with both the installation and configuration process. The actual installation is done via apt-get or yum, depending on whether you are installing it on Ubuntu or RedHat.

MariaDB runs as a service on both Ubuntu and RedHat, and its behavior is controlled by a configuration file (usually located in /etc/my.cnf.d/server.cnf or /etc/mysql/my.cnf). In this file, specifically in the mysqld section of the configuration, the following entry must be commented out or removed:

bind-address = 127.0.0.1

This change will ensure that other hosts in the cluster can interact with the MariaDB server.
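
To confirm that no active bind-address entry remains, the configuration file can be scanned with a few lines of Python. The sketch below is a simplified reader, not a full MariaDB option-file parser: it only looks at the mysqld section of the text it is given and does not follow include directives. The function name is illustrative.

```python
def active_bind_address(config_text):
    """Return the bind-address set in [mysqld], or None if absent or commented out."""
    section = None
    value = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "mysqld" and "=" in line:
            key, _, candidate = line.partition("=")
            # Option names may be written with dashes or underscores.
            if key.strip().replace("_", "-") == "bind-address":
                value = candidate.strip()
    return value
```

A return value of None for the contents of the relevant configuration file indicates that the entry has been removed or commented out in that file.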

Configuration modifications only become active after a server restart, which requires restarting the specific operating system service as follows. On RedHat:

$ sudo service mariadb restart

And on Ubuntu:

$ sudo service mysql restart

Finally, ensure that the MariaDB server is automatically started at boot time by appropriately configuring init, systemd, cron or whatever mechanism is in place for automating the startup of services.

Securing MariaDB

Once the installation is performed, it’s also recommended that MariaDB’s installation and configuration be hardened:

$ sudo mysql_secure_installation

The simplest way to make the MariaDB server (mysqld) available to Unscrambl Drive is to install it on the same host where Unscrambl Drive is going to be placed. It is also necessary to ensure that the server is using its default port (3306):

$ netstat -lptn | grep 3306
tcp   0   0 0.0.0.0:3306   0.0.0.0:*   LISTEN   11430/mysqld

Note that placing the MariaDB server on another host as well as using a different port number can be done, but both of these options require additional configuration.
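
Reachability from another host in the cluster can also be tested without netstat. The Python sketch below simply attempts a TCP connection; a successful connection shows that something is listening on the port, not that it is MariaDB. The function name is illustrative.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For instance, port_accepts_connections("<database host>", 3306) can be run from each host that needs to reach the database server.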

Setting up a MariaDB user

Once MariaDB is properly installed, it must be configured with a user and password to be used by Unscrambl Drive when establishing connections with the database server. By default, both the user and password are set to drive.

To create a MariaDB user called drive with the password drive, start MariaDB’s interactive shell using MariaDB’s root user:

$ mysql -u root -p

And issue the following command:

MariaDB [(none)]> CREATE USER '<user>'@'%' IDENTIFIED BY '<password>';

Replace <user> and <password> with drive, specifically:

MariaDB [(none)]> CREATE USER 'drive'@'%' IDENTIFIED BY 'drive';

Once the user is created, it must be given privileges to create the Unscrambl Drive databases:

MariaDB [(none)]> GRANT ALL PRIVILEGES ON `drive_%` . * TO 'drive'@'%';
MariaDB [(none)]> FLUSH PRIVILEGES;

You can now close the MariaDB interactive shell by pressing CTRL-D (the Control and the D key, together) or by using the exit command.

At this point, you should be able to log in to MariaDB as the user drive and authenticate using the initial, default password you just configured:

$ mysql -u drive -p

If the new user has been properly configured, you will once again be greeted by the MariaDB interactive shell:

$ mysql -u drive -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 39
...
MariaDB [(none)]>

Using a non-default MariaDB configuration

The use of MariaDB on a different host as well as the use of a different user name and/or password requires updating the Unscrambl Drive configuration file unscrambl/etc/drive.json.

This JSON file can be edited using a regular text editor.

To change the password used by the Unscrambl Drive database server user, it’s necessary to first encrypt the new password so it’s not stored in clear text in the configuration file. Unscrambl Drive includes a tool that can be used to perform this operation:

$ $UNSCRAMBL_HOME/bin/password_encryptor -p <new_password>

Once you have the newly encrypted password, modify the drive|database|{userName,userPassword}, and, if necessary [1], the solution|database|{userName,userPassword} entries in the configuration file with the new password as well as a new user name.

[1] As mentioned, some Unscrambl Drive installations have optional features (i.e., a solution) that also make use of a relational database.

Note that the configuration file is nested, so the notation used in the prior sentence means, e.g., the entry userName as well as the entry userPassword are under the entry database, which is under the solution entry.
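
The pipe notation maps directly onto nested JSON objects. The Python sketch below shows the lookup on a hypothetical fragment: the key names come from the notation above, while the values and the exact surrounding structure of drive.json are assumptions made for illustration.

```python
import json

# Hypothetical fragment; the real drive.json contains many more entries.
EXAMPLE_CONFIG = json.loads("""
{
    "solution": {
        "database": {
            "userName": "drive",
            "userPassword": "<encrypted password>"
        }
    }
}
""")


def lookup(config, path, separator="|"):
    """Follow a path such as 'solution|database|userName' through nested objects."""
    node = config
    for key in path.split(separator):
        node = node[key]  # raises KeyError if an entry is missing
    return node
```

Here lookup(EXAMPLE_CONFIG, "solution|database|userName") returns the user name stored under solution, then database.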

Note that if you are changing one or both of these settings, MariaDB itself must be updated with this new user/password information.

To change the host and port used to reach the MariaDB server, locate and update one or more of the following entries: drive|database|address and, if necessary, solution|database|address. The address is a host/port pair with the following format: hostname:port, e.g., foo.unscrambl.com:3306.
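
Before saving an edited address, it can be validated with a small helper. The Python sketch below assumes that both the host name and the port are mandatory, as the hostname:port format suggests; whether Unscrambl Drive accepts an address without a port is not covered here. The function name is illustrative.

```python
def parse_address(address):
    """Split 'hostname:port' into (hostname, port), validating the port."""
    host, separator, port_text = address.strip().rpartition(":")
    if not separator or not host or not port_text.isdigit():
        raise ValueError(f"expected hostname:port, got {address!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range in {address!r}")
    return host, port
```

With the example above, parse_address("foo.unscrambl.com:3306") yields the host foo.unscrambl.com and the port 3306.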

Configuring Redis for use by Unscrambl Drive

Unscrambl Drive uses Redis, an in-memory data structure store.

You might be required to address certain OS-level requirements to ensure that Redis runs efficiently. As the Installation Process is carried out, you might face warnings similar to the following:

Warning

The operating system in this host is not optimally configured to run Redis, details:

  • overcommit_memory is set to 0. Redis background save process may fail under low memory conditions. To fix this issue add ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf and then reboot or run the command ‘sysctl vm.overcommit_memory=1’ for this to take effect. For more details, please refer to http://redis.io/topics/admin
  • when Transparent Huge Pages (THP) support is enabled in the Linux kernel (as is the case now), it can lead to high latency when Redis forks to persist data to disk. THP support can be disabled by executing the command ‘echo never > /sys/kernel/mm/transparent_hugepage/enabled’ as root, and adding it to your /etc/rc.local in order to retain the setting after a reboot. When making it permanent, make sure that the rc.local file has the execution permission for the owner (if this file is a symlink, ensure that the permission is set for actual file being linked). For more details, please refer to http://redis.io/topics/admin
  • unable to enforce the TCP backlog settings of 511 yet-to-be-listened TCP connections in Redis. To fix this issue add ‘net.core.somaxconn=511’ to /etc/sysctl.conf and then reboot or run the command ‘sysctl -w net.core.somaxconn=511’ for this setting to take effect immediately, but only until the next reboot. For more details, please refer to http://redis.io/topics/admin

Ensure that, if these warnings are displayed, the changes they suggest are put in place.
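
The same three settings can be inspected ahead of the installation. The Python sketch below reads them from the standard Linux /proc and /sys locations and compares them with the values recommended in the warnings; it flags any Transparent Huge Pages mode other than never, which may be stricter than the installer's own check. The function names are illustrative.

```python
from pathlib import Path


def active_thp_mode(text):
    """Extract the active (bracketed) mode, e.g. 'never' from 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def redis_findings(overcommit_memory, thp_mode, somaxconn):
    """Return a list of settings that differ from the recommended values."""
    findings = []
    if overcommit_memory != 1:
        findings.append("vm.overcommit_memory should be 1")
    if thp_mode != "never":
        findings.append("transparent_hugepage/enabled should be 'never'")
    if somaxconn < 511:
        findings.append("net.core.somaxconn should be at least 511")
    return findings


def read_host_settings():
    """Read the current values from the running Linux kernel."""
    overcommit = int(Path("/proc/sys/vm/overcommit_memory").read_text())
    thp = active_thp_mode(
        Path("/sys/kernel/mm/transparent_hugepage/enabled").read_text()
    )
    somaxconn = int(Path("/proc/sys/net/core/somaxconn").read_text())
    return overcommit, thp, somaxconn
```

On the target host, redis_findings(*read_host_settings()) returns an empty list when all three settings match the recommendations.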

Installation Package

The Unscrambl Drive installation tarball includes:

  • The Unscrambl Drive software platform itself, comprising all of the necessary components to run Unscrambl Drive-supported applications.
  • The pre-configured Python virtualenv environment, comprising all Python dependencies required by Unscrambl Drive applications to run.
  • The external open source software required by Unscrambl Drive, e.g., Apache Tomcat, Apache Kafka, among others.

Installation Process

The installation process must be carried out using the userid that will manage the Unscrambl Drive software.

It is recommended that this user be named drive or, if using a hybrid Unscrambl Drive/InfoSphere Streams environment, streams.

Prior to starting the actual installation, if HTTPS-secured web access to Unscrambl Drive will be made available, two items should be prepared: an (optional) DNS entry, which provides a user-friendly URL to end users, and a self-signed or commercial SSL certificate, which must be on hand because it is required to complete the product installation.

It might be helpful to become familiar with the infrastructure used to provide HTTPS access to Drive by reading the steps outlined in the Configuring a web proxy section before attempting the installation steps.

The installation process begins by un-tarring the software tarball:

$ tar xvfz drive-2.1.0-rhel07ppc64le.tar.gz

In the file name, 2.1.0 denotes the version you are installing and rhel07ppc64le the specific operating system (rhel07 for RedHat 7) and hardware architecture (ppc64le for Power8, 64-bit, little endian).
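
The naming convention can be checked programmatically, for example to confirm that a downloaded tarball matches the target host. The Python sketch below is based only on the single file name shown above; the pattern for other platforms is an assumption, and the function name is illustrative.

```python
import re

# drive-<version>-<os><arch>.tar.gz, e.g. drive-2.1.0-rhel07ppc64le.tar.gz
TARBALL_NAME = re.compile(
    r"^drive-(?P<version>\d+(?:\.\d+)*)-(?P<os>[a-z]+\d+)(?P<arch>\w+)\.tar\.gz$"
)


def parse_tarball_name(filename):
    """Return (version, os, architecture) encoded in an installation tarball name."""
    match = TARBALL_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized tarball name: {filename!r}")
    return match.group("version"), match.group("os"), match.group("arch")
```

For the tarball above, the function returns the version 2.1.0, the operating system rhel07, and the architecture ppc64le.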

To place the software in the recommended location, create the target directory and extract the tarball into it:

$ mkdir -p /opt/unscrambl/drive/rhel07ppc64le
$ tar xzvf drive-2.1.0-rhel07ppc64le.tar.gz -C /opt/unscrambl/drive/rhel07ppc64le

Other locations are acceptable, but /opt/unscrambl/drive/rhel07ppc64le is the recommended path. You are now ready to perform the configuration steps:

$ cd /opt/unscrambl/drive/rhel07ppc64le/unscrambl/bin
$ ./installer

installer is an interactive program and will guide you through specific installation and configuration choices:

Unscrambl Drive is a commercial product, subjected to End-User License Agreement terms. A paper-based or digital
copy of these terms must have been signed and agreed by someone authorized to do so in your organization, prior to
carrying out this configuration. A non-customer specific copy of these terms is included for your reference in this
installation package (unscrambl/license/eula.pdf). Do you confirm that you are authorized to proceed with the
configuration based on the terms specified in your organization's own license agreement with Unscrambl Inc. (y/n)?

As a first configuration step, you will be asked about the directory that will be used to host the Unscrambl Drive instance, i.e., the location in the file system that Unscrambl Drive will use to host its services, their logs, and the data that will be kept by the Unscrambl Drive data management services.

Next, you will be asked which network interface to use for external TCP/IP traffic. You should choose the interface that provides connectivity to other hosts in the cluster (if any) and/or external services the Unscrambl Drive application will interface with:

Please select the network interface to use for external TCP/IP traffic
(default: [eth0]):

[0] lo: 127.0.0.1
[1] eth0: 10.0.0.123
[2] tun0: 10.8.0.45

Make your choice by selecting the number corresponding to the interface
you want (default is '1' for interface 'eth0'): 1

The 'eth0' network interface will be used for all external TCP/IP traffic.

Once a network interface is selected, the installer will proceed to ask which transport protocol should be used between Unscrambl Drive’s web-based frontend and its backend services:

Do you want to secure the interaction between the Unscrambl Drive browser-based interface and its backend ([n]o:
HTTP will be used / [y]es: HTTPS will be used and you will be guided to generate and install an SSL certificate)
(y/n)?

Warning

Unscrambl Drive has access to and handles potentially sensitive information:

  • Authentication information: the use of Unscrambl Drive requires a user account. Unscrambl Drive has its own directory of users or can, alternatively, defer to an enterprise-wide LDAP server. In either case, user passwords are employed to ensure that a user is both authenticated and authorized to use the system. While Unscrambl Drive never stores user passwords in the clear, certain interactions require transferring passwords from the web-based front end to the backend. Hence, encryption (through the use of HTTPS) is STRONGLY recommended.
  • Metadata and structural information about subscriber and corporate data feeds: Unscrambl Drive carries out analytics employing data that is often private and sensitive. Once again, interactions between its web-based interface and its backend require manipulating such data, and encryption, in the form of HTTPS interactions, is STRONGLY recommended to preserve end-to-end confidentiality.

While Unscrambl Drive is generally hosted in an internal network, never facing non-corporate users, it does integrate with other segments of the enterprise computing environment. Unscrambl recommends that administrators take every possible precaution to protect the integrity and confidentiality of the data consumed and produced by this platform.

There are multiple possible configurations to choose from and each one has certain advantages and risks:

  • HTTP, available network-wide (STRONGLY DISCOURAGED): this is the simplest form of installing Unscrambl Drive. Nevertheless, it is insecure as potentially sensitive information is transmitted in the clear, flowing from the user’s browser to the server without any encryption, including passwords (potentially, even the ones used for authenticating via enterprise-wide repositories such as a corporate LDAP server, if such an integration is eventually enabled).

Note

SSL certificates

An SSL certificate is a data file that digitally binds a cryptographic key to an organization’s identity. For instance, when installed on a web server, it activates the padlock used by the browser to indicate a secure connection and, hence, the use of the HTTPS protocol in interactions between the browser and a server.

In general, an SSL certificate is obtained from a Certificate Authority (CA) and it attests the ownership of a public key by the named subject of the certificate, providing an assurance that an interaction is occurring between a client and a properly identified server. A CA acts as a third party, trusted both by the subject (owner) of the certificate and by the party relying upon the certificate.

The Unscrambl Drive web service can make use of an SSL certificate to ensure that interactions between the web-based client and server are properly authenticated (i.e., to provide an assurance to the browser-based client that it is indeed speaking to an actual server) as well as encrypted, such that no sensitive information flows over between a client and server in clear text form.

SSL certificates can be purchased from several vendors or obtained for free from organizations such as Let’s Encrypt. Commercial SSL certificates are typically verified and accepted by mainstream web browsers such as Google Chrome and Mozilla Firefox.

SSL certificates can also be provided by any entity hosting a Public Key Infrastructure (PKI). For instance, an organization’s IT department might host its own internal PKI and issue self-signed certificates. These certificates work just like commercial certificates, but they will typically produce a warning or an outright rejection from web browsers, which will not recognize the PKI’s CA. In such cases, upon being directed by the organization’s IT department, the certificate may be added to the browser and accepted as legitimate, thus silencing the warnings that would otherwise be raised every time a web server presenting it is accessed.

Unscrambl Drive can be configured with either type of certificate, but Unscrambl strongly recommends that a certificate be obtained from an officially recognized commercial or non-profit CA.

  • HTTPS, available network-wide (RECOMMENDED): this configuration is deemed safe, as it minimizes the chances of a sensitive data breach. In such a configuration, the interactions between the browser-based user interface and Unscrambl Drive’s backend are carried out over HTTPS (encrypted). In this case, the Unscrambl Drive backend service (hosted by Apache Tomcat) is the direct destination of calls performed by a user’s browser to the server servicing the REST APIs.

    The disadvantage of this approach is that the URL to access Unscrambl Drive will be in the form https://drive.company.com:<port>/drive. In other words, the port where the service runs will be part of the URL. Typically, HTTPS servers bind and run on the privileged port 443 and such a port number can be omitted from the URL. While the Apache Tomcat-based Drive backend can be configured to use port 443, this configuration is not recommended since it requires running the web server as root, which opens up the possibility of a complete takeover of the host where the web server runs should an unknown (but possible) security vulnerability be exploited.

    Alternatively, it is also possible to adjust the operating system settings to allow non-root-owned applications to use a privileged port. Nevertheless, such a configuration is both non-standard and complex from a system administration standpoint.

    If this configuration is chosen, the installer will ask for a non-privileged port and offer to install an SSL certificate, commercial or self-signed, as part of the installation process.

  • HTTP, available only in the localhost interface, proxied by an HTTPS web proxy (STRONGLY RECOMMENDED): this configuration is also deemed safe, as it minimizes the chances of a sensitive data breach. As with the prior configuration, the interactions between the browser-based user interface and the web proxy in front of Unscrambl Drive’s backend are HTTPS-encrypted. The web proxy (both Apache httpd and nginx are supported) employs a regular local HTTP connection to the Unscrambl Drive backend. While interception of end user communication with the web server is possible if the host is compromised and root access is available to the malicious party, this is not significantly riskier than the prior alternative, since in both cases man-in-the-middle attacks are equally possible.

    The benefit of this approach is that the web proxy (which is specifically hardened for this task), not Unscrambl Drive’s backend, runs as root and the web proxy’s internal design is optimized for these types of interactions.

    In this configuration, the installation of an SSL certificate as well as the configuration of the HTTPS endpoint is done by installing and configuring the web proxy, a step described in the Configuring a web proxy section.

Finally, an Unscrambl Drive installation can be optionally configured with additional integration points to external software packages including IBM SPSS Modeler Solution Publisher, IBM InfoSphere Streams, as well as Oracle WebLogic Server. Each of these integration points requires a prior licensed installation for each of these software packages. When installing an Unscrambl Drive version enabled with one or more of these integration points, additional installation steps will take place. These steps, applicable only to the specific integration points enabled in the installation, will be carried out by the installer program to ensure that it can find the proper version for each of the external packages needed by them. For instance:

Please enter the path to an existing IBM SPSS Modeler Solution Publisher
installation to use: /opt/x86_64/ibm/spss/ModelerSolutionPublisher/17.1

Please enter the path to an existing IBM InfoSphere Streams
installation to use: /opt/rhel07ppc64le/ibm/streams/4.1.1

ERROR: failed to validate IBM InfoSphere Streams
installation at '/opt/rhel07ppc64le/ibm/streams/4.1.1'
Product information file '/opt/rhel07ppc64le/ibm/streams/4.1.1/.product' does not exist
Please provide a valid path to an existing IBM InfoSphere Streams installation

Please enter the path to an existing IBM InfoSphere Streams
installation to use: /opt/rhel07ppc64le/ibm/streams/4.1.1.0

Please enter the path to an existing Oracle WebLogic Server
installation to use: /opt/unscrambl/drive/noarch/weblogic-10.3.6.0

If everything is correctly configured, a success status message will be printed out:

the Unscrambl Drive environment has been configured successfully...

Now that the configuration is complete, a test can be executed. To run any Unscrambl Drive application, the shell where the application is going to run must be configured by invoking the environment_setter script (<drive-install-location>/unscrambl/bin/environment_setter):

$ source ./environment_setter

Once the script is sourced, several environment variables are configured and the Python virtualenv environment is activated, indicating that we are ready to start an Unscrambl Drive application:

[drive]|$ env | sort | grep UNSCRAMBL
UNSCRAMBL_ARCH=ppc64le
UNSCRAMBL_BUILD_ARCH=ppc64le
UNSCRAMBL_HOME=/opt/unscrambl/drive/rhel07ppc64le/unscrambl
UNSCRAMBL_JAVA_PATH=/usr/lib/jvm/java-ibm
UNSCRAMBL_OS=rhel07ppc64le
UNSCRAMBL_PACKAGE_VERSION=2.1.0
UNSCRAMBL_PYTHON_PATH=/opt/unscrambl/drive/rhel07ppc64le/pyenv
UNSCRAMBL_READLINK=readlink
UNSCRAMBL_TOMCAT_PATH=/opt/unscrambl/drive/noarch/apache-tomcat-8.5.15
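
A script that launches Unscrambl Drive applications may want to verify that the shell has been prepared before going any further. The Python sketch below checks for a few of the variables shown in the sample output above; the exact set of variables may vary by platform and version, so the list is an assumption, and the function name is illustrative.

```python
import os

# A subset of the variables shown in the sample output; adjust as needed.
EXPECTED_VARIABLES = (
    "UNSCRAMBL_HOME",
    "UNSCRAMBL_JAVA_PATH",
    "UNSCRAMBL_PYTHON_PATH",
    "UNSCRAMBL_TOMCAT_PATH",
)


def missing_variables(environ=os.environ, expected=EXPECTED_VARIABLES):
    """Return the expected variables that are unset or empty in the environment."""
    return [name for name in expected if not environ.get(name)]
```

An empty list from missing_variables() suggests that environment_setter has been applied to the current shell.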

This preparation must be repeated in every shell and session where an Unscrambl Drive application will run. Now, we can run a test application (<drive-install-location>/unscrambl/bin):

[drive]|$ cd unscrambl/bin
[drive]|$ ./installation_tester -c -sns
INFO: this application is using '/tmp/<user>/unscrambl' to output and store its
code-generated assets...
.
.

Note that we are invoking the test application installation_tester using the -c flag to indicate that the application should be rebuilt (in case it has been executed before), thus ensuring that a fresh version, based on the current installation, is executed. Unscrambl Drive applications are normally rebuilt only when needed, so this precaution is usually unnecessary.

Configuring Unscrambl Drive’s Runtime Environment

The Unscrambl Drive runtime environment can be configured based on the specific performance and reliability requirements of a customer deployment. These runtime environment settings are specified in the following file: $UNSCRAMBL_HOME/etc/runtime_environment.json.

An important runtime configuration is the persistence behavior of PinPoint, which is the Redis-based distributed in-memory store used by Drive to maintain subscriber profiles. As a primary mechanism, PinPoint relies on append-only logging to persist its data. As an additional mechanism, it also periodically checkpoints its data to disk.

The checkpointing can be configured using the pinPoint|server|checkpointConfiguration|schedules setting inside the runtime_environment.json file. The default checkpoint configuration is given as follows:

"schedules":
[
    {
        "operationCount": 1,
        "timeInSeconds": 86400
    },
    {
        "operationCount": 1000000,
        "timeInSeconds": 3600
    }
]

This configuration indicates that the state is checkpointed every hour if at least 1 million profiles are updated since the last checkpoint, and every day if at least one profile is updated since the last checkpoint. More schedules can be added or the existing schedules can be updated, as needed. Since the append-only logging already provides persistence, specifying frequent checkpointing is unnecessary and can adversely affect the CPU usage of the PinPoint servers.
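
The way the schedules combine can be expressed as a simple predicate: a checkpoint is due as soon as any one schedule has both its time and its operation count satisfied. The Python sketch below models that reading of the default configuration; it illustrates the rule as described above and is not PinPoint's actual implementation. The function name is hypothetical.

```python
DEFAULT_SCHEDULES = [
    {"operationCount": 1, "timeInSeconds": 86400},
    {"operationCount": 1000000, "timeInSeconds": 3600},
]


def checkpoint_due(schedules, operations_since_last, seconds_since_last):
    """Return True if any schedule's operation count and time are both reached."""
    return any(
        operations_since_last >= schedule["operationCount"]
        and seconds_since_last >= schedule["timeInSeconds"]
        for schedule in schedules
    )
```

With the defaults, 1,000,000 updates after one hour trigger a checkpoint, while 999,999 updates do not until a full day has elapsed.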