Installing Unscrambl Drive¶
Introduction¶
Unscrambl Drive is an on-premise, next-generation enterprise marketing automation product. Built for marketers, it enables marketing teams to plan and manage the customer lifecycle end-to-end and to deliver personalized campaigns quickly, moving from micro segments to a segment of one.
It runs on RedHat 7 (or CentOS 7), Ubuntu 14, and Ubuntu 16. On RedHat, x86_64 and ppc64le (little endian) builds are available.
In the next sections, we will go over the software dependencies that must be satisfied prior to the installation of Unscrambl Drive.
Dependencies¶
The server where Unscrambl Drive is installed must be configured according to a few guidelines. The software also relies on a set of external dependencies that must be installed prior to setting it up for use. In this section, we discuss both of these issues.
Operating System configuration¶
The server (or servers) where Drive runs must be properly configured with respect to certain OS and shell resource limits.
While these limits are going to be checked during installation, we will provide certain helpful guidelines in this section.
First, the number of file descriptors that can be used by a process must be equal to or greater than 640,000.
The soft and hard limits are defined as part of the OS configuration (in /etc/security/limits.conf) and used/enforced by the (Bash) shell.
In the shell, the soft limits for each of these resources (including the maximum number of open files) can be inspected as follows:
ulimit -Sa
And the hard limits as follows:
ulimit -Ha
Naturally neither limit can go beyond the OS limit that applies to the server.
If the current limit is too low for the current shell, but sufficient as far as the OS is concerned, you can update your local .bashrc, raising the limit as follows:
ulimit -n 640000
Nevertheless, in most cases, after a fresh OS install, it is necessary to update /etc/security/limits.conf to meet Drive’s needs.
We recommend that you consult your OS documentation, but usually the following settings can be added towards the end of the file (assuming you want the limits applied to all users):
* soft nofile 640000
* hard nofile 640000
# End of file
After applying this change, a new login must be made with the userid that will be used by Drive (no reboot is necessary).
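The limits discussed above can be checked from the account that will run Drive before starting the installation. The following is a minimal sketch, not part of the product: the function name nofile_limit_ok is hypothetical, and the only values taken from this guide are the 640000 threshold and the ulimit -S / -H options.

```shell
# Return 0 when a limit value (as printed by `ulimit -n`) meets the requirement.
# nofile_limit_ok is a hypothetical helper name, not an Unscrambl Drive tool.
nofile_limit_ok() {
    required=$1
    current=$2
    # "unlimited" always satisfies the requirement.
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$required" ]
}

# Report the soft (S) and hard (H) open-file limits of the current shell.
for kind in S H; do
    value=$(ulimit -"$kind"n)
    if nofile_limit_ok 640000 "$value"; then
        echo "nofile limit (-$kind): $value, OK"
    else
        echo "nofile limit (-$kind): $value, below 640000"
    fi
done
```

Running it in a fresh login shell of the Drive userid shows whether the limits.conf change has taken effect for that user.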
External Dependencies¶
The external software dependencies required by Drive are specific to the operating system version and architecture where the software will run.
There are two types of external dependencies: the ones provided by the OS vendor and the ones from external vendors.
Starting with the dependencies provided by external vendors, requiring a manual installation, you will need to obtain:
- mandatory, only for Power8 ppc64le installations: IBM JDK
- optional: IBM InfoSphere Streams, IBM SPSS, as well as Oracle WebLogic, if these are capabilities you have acquired in your particular Unscrambl Drive installation.
These are commercial products and you should consult their respective documentation to have them installed in your environment. Their integration with Unscrambl Drive is described later in this section, where the installer will request additional information regarding their location in the file system.
Installing the IBM JDK (for Power8 ppc64le)¶
The IBM JDK can be installed by running the following commands:
sudo subscription-manager repos --enable rhel-7-for-power-le-supplementary-rpms
sudo yum install java-1.8.0-ibm-devel
Unscrambl Drive assumes that the default path for the JDK installation is /usr/lib/jvm/java-ibm.
If that is not the case, i.e., you have installed the JDK elsewhere, Unscrambl Drive will alternatively use the UNSCRAMBL_JAVA_PATH environment variable to locate the JRE within the JDK installation, so ensure that the following variable is set and available before running the installer:
export UNSCRAMBL_JAVA_PATH=<my-own-jdk-location>
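Before running the installer, it can be useful to confirm that the JDK location it will see actually contains a Java executable. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the bin/java and jre/bin/java locations reflect the usual layout of a Java 8 JDK rather than anything the installer is documented to check.

```shell
# Print the JDK location: UNSCRAMBL_JAVA_PATH when set, otherwise the
# default path named in this guide. (Hypothetical helper.)
resolve_jdk_path() {
    echo "${UNSCRAMBL_JAVA_PATH:-/usr/lib/jvm/java-ibm}"
}

# Return 0 if an executable java binary exists under the given directory,
# either in bin/ or in jre/bin/ (assumed Java 8 JDK layout).
jdk_has_java() {
    [ -x "$1/bin/java" ] || [ -x "$1/jre/bin/java" ]
}

if jdk_has_java "$(resolve_jdk_path)"; then
    echo "java found under $(resolve_jdk_path)"
else
    echo "no java executable under $(resolve_jdk_path)"
fi
```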
Installing the OS-managed packages¶
When it comes to OS-managed dependencies, i.e., dependencies that can be installed using the operating system package management, the installer will look for and warn you about missing dependencies.
The installation package comes with a utility, dependency_checker, that can be used to ensure that all dependencies are in place ahead of the installation.
This utility inspects the environment for RedHat- or Ubuntu-provided software (referred to as OS-provided software in the rest of this documentation) as well as for specific Python packages required by Unscrambl Drive, which are provided as a virtualenv environment, pre-configured to match the Unscrambl Drive needs.
Operating System software packages must be installed using the regular mechanism employed to download and install them, usually yum on RedHat and apt-get on Ubuntu.
When using dependency_checker to extract the list of required dependencies, the output will be similar to (but not necessarily the same as) the following:
$ ./drive/bin/dependency_checker -l
List of OS package dependencies:
advance-toolchain-at8.0-runtime: 8.0 (installed)
mariadb: 5.5 (not installed)
mariadb-libs: 5.5 (installed)
mariadb-server: 5.5 (installed)
List of Python package dependencies (available in the Unscrambl Drive virtualenv):
...
Note that, in this example, one external dependency (mariadb) is not currently installed. In this case, assuming this host is running RedHat Linux, yum must be used to install mariadb.
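When the dependency list is long, the missing packages can be filtered out of the listing. The sketch below assumes the output keeps the "name: version (status)" shape shown in the example above; missing_packages is a hypothetical helper, not a tool shipped with Unscrambl Drive.

```shell
# Read `dependency_checker -l` output on stdin and print the names of the
# packages flagged "(not installed)", one per line.
missing_packages() {
    awk -F': ' '/\(not installed\)/ {
        gsub(/^[ \t]+/, "", $1)   # drop any leading indentation
        print $1
    }'
}
```

A possible use is ./drive/bin/dependency_checker -l | missing_packages, whose output can then be handed to yum or apt-get.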
Note
Installing OS packages on a server without Internet connection
In many cases, the server (or cluster) where Unscrambl Drive is going to be installed is not directly connected to the Internet.
In such cases, the installation of additional OS-level packages can be accomplished by having access to the Operating System installation CD/DVD or, simply, to an .iso image with the OS installation.
If your installation is RedHat-based, use one of the following alternatives:
- if a DVD is available, please follow the DVD-based yum repository directions outlined by RedHat to create a locally available yum repository.
- if an .iso file is available, please follow these .iso file directions to create a locally available yum repository.
If your installation is Ubuntu-based, use one of the following alternatives:
- if a DVD is available, please follow the DVD-based apt repository directions outlined by Canonical to create a locally available apt repository.
- if an .iso file is available, mount it first and then use the mounting point in the steps above as the location when running apt-cdrom. To mount the .iso, perform the following steps, either using sudo or by logging in as root:
$ mkdir -p <mounting point location>
$ mount -o loop <file>.iso <mounting point location>
When installing external operating system-managed dependencies, as long as the major and minor version numbers match, the dependency is considered satisfied.
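The matching rule above can be illustrated with a small comparison on the first two version components. This is only a sketch of the stated rule; how dependency_checker actually compares versions is not described in this guide, and both function names are hypothetical.

```shell
# Print the "major.minor" prefix of a dotted version string.
major_minor() {
    echo "$1" | cut -d. -f1,2
}

# Return 0 when the required and installed versions share major and minor numbers.
version_satisfied() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}
```

For example, version_satisfied 5.5 5.5.68 succeeds, while version_satisfied 5.5 10.1.38 does not.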
Configuring MariaDB for use by Unscrambl Drive¶
Unscrambl Drive requires a relational database to store configuration as well as runtime data used by the applications as they run.
Currently, the only supported database is MariaDB.
Installing MariaDB¶
If you need to have it installed prior to installing Unscrambl Drive, please refer to MariaDB’s online installation instructions to become acquainted with both the installation and configuration process. The actual installation is done via apt-get or yum, depending on whether you are installing it on Ubuntu or RedHat.
MariaDB is executed as a service on both Ubuntu and RedHat, and its behavior is controlled by a configuration file (usually located in /etc/my.cnf.d/server.cnf or /etc/mysql/my.cnf). In this file, specifically in the mysqld section of the configuration, the following entry must be commented out or removed:
bind-address = 127.0.0.1
This change will ensure that other hosts in the cluster can interact with the MariaDB server.
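The edit can be made by hand or scripted. The following sketch comments out the entry using sed and writes the result to standard output, so the original file is left untouched until the result has been reviewed; comment_out_bind_address is a hypothetical helper name.

```shell
# Read a MariaDB configuration file on stdin and print it with any active
# "bind-address = ..." line commented out. Lines that are already commented
# (starting with '#') are left as they are.
comment_out_bind_address() {
    sed 's/^\([[:space:]]*\)\(bind-address[[:space:]]*=\)/\1#\2/'
}
```

One way to use it is comment_out_bind_address < /etc/my.cnf.d/server.cnf > server.cnf.new, followed by a review of server.cnf.new before copying it into place as root.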
Configuration modifications only become active after a server restart, which requires restarting the specific operating system service as follows. On RedHat:
$ sudo service mariadb restart
And on Ubuntu:
$ sudo service mysql restart
Finally, ensure that the MariaDB server is automatically started at boot time by appropriately configuring init, systemd, cron or whatever mechanism is in place for automating the startup of services.
Securing MariaDB¶
Once the installation is performed, it’s also recommended that MariaDB’s installation and configuration be hardened:
$ sudo mysql_secure_installation
The simplest way to make the MariaDB server (mysqld) available to Unscrambl Drive is to install it on the same host where Unscrambl Drive is going to be placed. It is also necessary to ensure that the server is using its default port (3306):
$ netstat -lptn | grep 3306
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 11430/mysqld
Note that placing the MariaDB server on another host as well as using a different port number can be done, but both of these options require additional configuration.
Setting up a MariaDB user¶
Once MariaDB is properly installed, it must be configured with a user and password to be used by Unscrambl Drive when establishing connections with the database server. By default, both the user and password are set to drive.
To create a MariaDB user called drive with the password drive, start MariaDB’s interactive shell using MariaDB’s root user:
$ mysql -u root -p
And issue the following command:
MariaDB [(none)]> CREATE USER '<user>'@'%' IDENTIFIED BY '<password>';
Replacing <user> and <password> with drive, specifically:
MariaDB [(none)]> CREATE USER 'drive'@'%' IDENTIFIED BY 'drive';
Once the user is created, it must be given privileges to create the Unscrambl Drive databases:
MariaDB [(none)]> GRANT ALL PRIVILEGES ON `drive_%` . * TO 'drive'@'%';
MariaDB [(none)]> FLUSH PRIVILEGES;
You can now close the MariaDB interactive shell by pressing CTRL-D (the Control and the D key, together) or by using the exit command.
At this point you should be able to log in to MariaDB using the user drive and authenticate using the initial, default, password you just configured:
$ mysql -u drive -p
If the new user has been properly configured, you will once again be greeted by the MariaDB interactive shell:
$ mysql -u drive -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 39
...
MariaDB [(none)]>
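For repeatable setups, the user-creation statements shown above can be generated by a small script instead of being typed into the interactive shell. The sketch below only prints the SQL; drive_user_sql is a hypothetical helper, and it assumes the user name and password contain no quote characters, since it performs no escaping.

```shell
# Print the SQL statements that create the Drive database user and grant it
# privileges on the drive_% databases. Arguments: user name, password.
drive_user_sql() {
    user=$1
    password=$2
    printf "CREATE USER '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$password"
    printf "GRANT ALL PRIVILEGES ON \`drive_%%\`.* TO '%s'@'%%';\n" "$user"
    printf "FLUSH PRIVILEGES;\n"
}
```

The output can be reviewed first and then piped to the client, for instance drive_user_sql drive drive | mysql -u root -p.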
Using a non-default MariaDB configuration¶
The use of MariaDB on a different host as well as the use of a different user name and/or password requires updating the Unscrambl Drive configuration file unscrambl/etc/drive.json.
This JSON file can be edited using a regular text editor.
To change the password used by the Unscrambl Drive database server user, it’s necessary to first encrypt the new password so it’s not stored in clear text in the configuration file. Unscrambl Drive includes a tool that can be used to perform this operation:
$ $UNSCRAMBL_HOME/bin/password_encryptor -p <new_password>
Once you have the newly encrypted password, modify the drive|database|{userName,userPassword} and, if necessary [1], the solution|database|{userName,userPassword} entries in the configuration file with the new password as well as a new user name.
[1] As mentioned, some Unscrambl Drive installations have optional features (i.e., a solution) that also make use of a relational database.
Note that the configuration file is nested, so the notation used in the prior sentence means, e.g., the entry userName as well as the entry userPassword are under the entry database, which is under the solution entry.
Note that if you are changing one or both of these settings, MariaDB itself must be updated with this new user/password information.
To change the host and port the MariaDB server uses, locate and update one or more of the following entries: drive|database|address and, if necessary, solution|database|address. The address is a tuple with the following format: hostname:port, e.g., foo.unscrambl.com:3306.
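Before writing a new address into the configuration file, it may help to split it and confirm that both parts are what you expect. The sketch below uses plain shell parameter expansion; the function names are hypothetical, and it assumes the address always carries a port and is not an IPv6 literal.

```shell
# Print the host part of a "hostname:port" address.
address_host() {
    echo "${1%:*}"
}

# Print the port part of a "hostname:port" address.
address_port() {
    echo "${1##*:}"
}
```

With the example above, address_host foo.unscrambl.com:3306 prints foo.unscrambl.com and address_port foo.unscrambl.com:3306 prints 3306.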
Configuring Redis for use by Unscrambl Drive¶
Unscrambl Drive uses Redis, an in-memory data structure store.
You might be required to address certain OS-level requirements to ensure that Redis runs efficiently. As the Installation Process is carried out, you might face warnings similar to the following:
Warning
The operating system in this host is not optimally configured to run Redis, details:
- overcommit_memory is set to 0. Redis background save process may fail under low memory conditions. To fix this issue add ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf and then reboot or run the command ‘sysctl vm.overcommit_memory=1’ for this to take effect. For more details, please refer to http://redis.io/topics/admin
- when Transparent Huge Pages (THP) support is enabled in the Linux kernel (as is the case now), it can lead to high latency when Redis forks to persist data to disk. THP support can be disabled by executing the command ‘echo never > /sys/kernel/mm/transparent_hugepage/enabled’ as root, and adding it to your /etc/rc.local in order to retain the setting after a reboot. When making it permanent, make sure that the rc.local file has the execution permission for the owner (if this file is a symlink, ensure that the permission is set for actual file being linked). For more details, please refer to http://redis.io/topics/admin
- unable to enforce the TCP backlog settings of 511 yet-to-be-listened TCP connections in Redis. To fix this issue add ‘net.core.somaxconn=511’ to /etc/sysctl.conf and then reboot or run the command ‘sysctl -w net.core.somaxconn=511’ for this setting to take effect immediately, but only until the next reboot. For more details, please refer to http://redis.io/topics/admin
Ensure that, if/when these warnings are displayed, the changes suggested by the warning messages are put in place.
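The three settings named in these warnings can also be inspected ahead of the installation. The sketch below is an illustration rather than the installer's own check: the function names are hypothetical, while the kernel files mentioned in the usage note are the standard Linux locations for these settings.

```shell
# Extract the active mode from the contents of
# /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always madvise [never]".
thp_mode() {
    echo "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Print one line per setting that differs from what the warnings ask for.
# Arguments: overcommit_memory value, THP "enabled" contents, somaxconn value.
redis_os_report() {
    overcommit=$1
    thp=$2
    somaxconn=$3
    [ "$overcommit" = "1" ] || echo "vm.overcommit_memory is $overcommit, expected 1"
    [ "$(thp_mode "$thp")" = "never" ] || echo "transparent huge pages mode is $(thp_mode "$thp"), expected never"
    [ "$somaxconn" -ge 511 ] || echo "net.core.somaxconn is $somaxconn, expected at least 511"
    return 0
}
```

On a Linux host, the current values can be passed in with redis_os_report "$(cat /proc/sys/vm/overcommit_memory)" "$(cat /sys/kernel/mm/transparent_hugepage/enabled)" "$(cat /proc/sys/net/core/somaxconn)"; no output means all three settings already match.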
Installation Package¶
The Unscrambl Drive installation tarball includes:
- The Unscrambl Drive software platform itself, comprising all of the necessary components to run Unscrambl Drive-supported applications.
- The pre-configured Python virtualenv environment, comprising all Python dependencies required by Unscrambl Drive applications to run.
- The external open source software required by Unscrambl Drive, e.g., Apache Tomcat, Apache Kafka, among others.
Installation Process¶
The installation process must be carried out using the userid that will manage the Unscrambl Drive software.
It is recommended that this user be named drive or, if using a hybrid Unscrambl Drive/InfoSphere Streams environment, streams.
Prior to starting the actual installation, if HTTPS-secured web access to Unscrambl Drive will be made available, two items should be prepared: an (optional) DNS entry that provides a user-friendly URL to end users, and a self-signed or commercial SSL certificate, which must be on hand as it will be required to complete the product installation.
It might be helpful to become familiar with the infrastructure used to provide HTTPS access to Drive by reading the steps outlined in the Configuring a web proxy section before attempting the installation steps.
The installation process begins by un-tarring the software tarball:
$ tar xvfz drive-2.1.0-rhel07ppc64le.tar.gz
The suffix 2.1.0 denotes the version you are installing and rhel07ppc64le the specific operating system (rhel07 for RedHat 7) and hardware architecture (ppc64le for Power8, 64-bit, little endian).
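The naming convention just described can be taken apart with shell parameter expansion, which is handy when a script has to pick the installation directory from the tarball name. This is a sketch under the assumption that every tarball follows the drive-<version>-<platform>.tar.gz pattern of the example; the function names are hypothetical.

```shell
# Print the version embedded in a tarball name such as
# drive-2.1.0-rhel07ppc64le.tar.gz.
tarball_version() {
    base=${1%.tar.gz}      # drop the extension
    rest=${base#drive-}    # drop the product prefix
    echo "${rest%%-*}"     # keep everything before the first dash
}

# Print the platform tag (operating system plus architecture).
tarball_platform() {
    base=${1%.tar.gz}
    rest=${base#drive-}
    echo "${rest#*-}"      # keep everything after the first dash
}
```

For the file used in this guide, tarball_version prints 2.1.0 and tarball_platform prints rhel07ppc64le.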
Extract the software:
$ mkdir -p /opt/unscrambl/drive/rhel07ppc64le
$ tar xzvf drive-2.1.0-rhel07ppc64le.tar.gz -C /opt/unscrambl/drive/rhel07ppc64le
Other locations are acceptable, but /opt/unscrambl/drive/rhel07ppc64le is the recommended path. Now, you are ready to perform the configuration steps:
$ cd /opt/unscrambl/drive/rhel07ppc64le/unscrambl/bin
$ ./installer
installer is an interactive program and will guide you through specific installation and configuration choices:
Unscrambl Drive is a commercial product, subjected to End-User License Agreement terms. A paper-based or digital
copy of these terms must have been signed and agreed by someone authorized to do so in your organization, prior to
carrying out this configuration. A non-customer specific copy of these terms is included for your reference in this
installation package (unscrambl/license/eula.pdf). Do you confirm that you are authorized to proceed with the
configuration based on the terms specified in your organization's own license agreement with Unscrambl Inc. (y/n)?
As a first configuration step, you will be asked about the directory that will be used to host the Unscrambl Drive instance, i.e., the location in the file system that Unscrambl Drive will use to host its services and their logs, as well as the data that will be kept by the Unscrambl Drive data management services.
Next, you will be asked which network interface to use for external TCP/IP traffic. You should choose the interface that provides connectivity to other hosts in the cluster (if any) and/or external services the Unscrambl Drive application will interface with:
Please select the network interface to use for external TCP/IP traffic
(default: [eth0]):
[0] lo: 127.0.0.1
[1] eth0: 10.0.0.123
[2] tun0: 10.8.0.45
Make your choice by selecting the number corresponding to the interface
you want (default is '1' for interface 'eth0'): 1
The 'eth0' network interface will be used for all external TCP/IP traffic.
Once a network interface is selected, the installer will proceed to ask which transport protocol should be used between Unscrambl Drive’s web-based frontend and its backend services:
Do you want to secure the interaction between the Unscrambl Drive browser-based interface and its backend ([n]o:
HTTP will be used / [y]es: HTTPS will be used and you will be guided to generate and install an SSL certificate)
(y/n)?
Warning
Unscrambl Drive has access to and handles potentially sensitive information:
- Authentication information: the use of Unscrambl Drive requires a user account. Unscrambl Drive has its own directory of users or can, alternatively, defer to an enterprise-wide LDAP server. In either case, user passwords are employed to ensure that a user is both authenticated and authorized to use the system. While Unscrambl Drive never stores user passwords in the clear, certain interactions require transferring passwords between the web-based front end and the backend. Hence, encryption (through the use of HTTPS) is STRONGLY recommended.
- Metadata and structural information about subscriber and corporate data feeds: Unscrambl Drive carries out analytics employing data that is often private and sensitive. Once again, interactions between its web-based interface and its backend require manipulating such data, and encryption, in the form of HTTPS interactions, is STRONGLY recommended to preserve end-to-end confidentiality.
While Unscrambl Drive is generally hosted in an internal network, never facing non-corporate users, it does integrate with other segments of the enterprise computing environment. Unscrambl recommends that administrators take every possible precaution to protect the integrity and confidentiality of the data consumed and produced by this platform.
There are multiple possible configurations to choose from and each one has certain advantages and risks:
- HTTP, available network-wide (STRONGLY DISCOURAGED): this is the simplest form of installing Unscrambl Drive. Nevertheless, it is insecure as potentially sensitive information is transmitted in the clear, flowing from the user’s browser to the server without any encryption, including passwords (potentially, even the ones used for authenticating via enterprise-wide repositories such as a corporate LDAP server, if such an integration is eventually enabled).
Note
SSL certificates
An SSL certificate is a data file that digitally binds a cryptographic key to an organization’s identity. For instance, when installed on a web server, it activates the padlock used by the browser to indicate a secure connection and, hence, the HTTPS protocol in interactions between the browser and a server.
In general, an SSL certificate is obtained from a Certificate Authority (CA) and it attests the ownership of a public key by the named subject of the certificate, providing an assurance that an interaction is occurring between a client and a properly identified server. A CA acts as a third party, trusted both by the subject (owner) of the certificate and by the party relying upon the certificate.
The Unscrambl Drive web service can make use of an SSL certificate to ensure that interactions between the web-based client and server are properly authenticated (i.e., to provide an assurance to the browser-based client that it is indeed speaking to the intended server) as well as encrypted, such that no sensitive information flows between a client and server in clear text form.
SSL certificates can be purchased from several vendors or obtained for free from organizations such as Let’s Encrypt. Commercial SSL certificates are typically verified and accepted by mainstream web browsers such as Google Chrome and Mozilla Firefox.
SSL certificates can also be provided by any entity hosting a Public Key Infrastructure (PKI). For instance, an organization’s IT department might host its own internal PKI and issue self-signed certificates. These certificates work just like commercial certificates, but they will typically produce a warning or an outright rejection from web browsers, which will not recognize the PKI’s CA. In such cases, upon being directed by the organization’s IT department, the certificate may be added to the browser and accepted as legitimate, thus quieting down the warnings that would otherwise be raised every time a web server presenting it is accessed.
Unscrambl Drive can be configured with either type of certificate, but Unscrambl strongly recommends that a certificate be obtained from an officially recognized commercial or non-profit CA.
- HTTPS, available network-wide (RECOMMENDED): this configuration is deemed safe, as it minimizes the chances of a sensitive data breach. In such a configuration, the interactions between the browser-based user interface and Unscrambl Drive’s backend are carried out via HTTPS (encrypted). In this case, the Unscrambl Drive backend service (hosted by Apache Tomcat) is the direct destination of calls performed by a user’s browser to the server servicing the REST APIs.
The disadvantage of this approach is that the URL to access Unscrambl Drive will be in the form https://drive.company.com:<port>/drive. In other words, the port where the service runs will be part of the URL. Typically, HTTPS servers bind and run on the privileged port 443 and such a port number can be omitted from the URL. While the Apache Tomcat-based Drive backend can be configured to use port 443, this configuration is not recommended since it requires running the web server as root, which opens up the possibility of a complete takeover of the host where the web server runs should an unknown (but possible) security vulnerability be exploited.
Alternatively, it is also possible to adjust the Operating System settings to allow non-root-owned applications to use a privileged port. Nevertheless, such a configuration is both non-standard and complex from a system administration standpoint.
If this configuration is chosen, the installer will ask for a non-privileged port and offer to install an SSL certificate, commercial or self-signed, as part of the installation process.
- HTTP, available only in the localhost interface, proxied by an HTTPS web proxy (STRONGLY RECOMMENDED): this configuration is also deemed safe, as it minimizes the chances of a sensitive data breach. As with the prior configuration, the interactions between the browser-based user interface and the web proxy in front of Unscrambl Drive’s backend are HTTPS encrypted. The web proxy (both Apache httpd and nginx are supported) employs a regular local HTTP connection to the Unscrambl Drive backend. While interception of end user communication with the web server is possible if the host is compromised and root access is available to the malicious party, this is not significantly riskier than the prior alternative since in both cases man-in-the-middle attacks are equally possible.
The benefit of this approach is that the web proxy (which is specifically hardened for this task), not Unscrambl Drive’s backend, runs as root and the web proxy’s internal design is optimized for these types of interactions.
In this configuration, the installation of an SSL certificate as well as the configuration of the HTTPS endpoint is done by installing and configuring the web proxy, a step described in the Configuring a web proxy section.
Finally, an Unscrambl Drive installation can be optionally configured with additional integration points to external
software packages including IBM SPSS Modeler Solution Publisher, IBM InfoSphere Streams, as well as Oracle WebLogic
Server. Each of these integration points requires a prior licensed installation for each of these software packages.
When installing an Unscrambl Drive version enabled with one or more of these integration points, additional installation steps will take place. These steps, applicable only to the specific integration points enabled in the installation, will be carried out by the installer program to ensure that it can find the proper version for each of the external packages needed by them. For instance:
Please enter the path to an existing IBM SPSS Modeler Solution Publisher
installation to use: /opt/x86_64/ibm/spss/ModelerSolutionPublisher/17.1
Please enter the path to an existing IBM InfoSphere Streams
installation to use: /opt/rhel07ppc64le/ibm/streams/4.1.1
ERROR: failed to validate IBM InfoSphere Streams
installation at '/opt/rhel07ppc64le/ibm/streams/4.1.1'
Product information file '/opt/rhel07ppc64le/ibm/streams/4.1.1/.product' does not exist
Please provide a valid path to an existing IBM InfoSphere Streams installation
Please enter the path to an existing IBM InfoSphere Streams
installation to use: /opt/rhel07ppc64le/ibm/streams/4.1.1.0
Please enter the path to an existing Oracle WebLogic Server
installation to use: /opt/unscrambl/drive/noarch/weblogic-10.3.6.0
If everything is correctly configured, a success status message will be printed out:
the Unscrambl Drive environment has been configured successfully...
Now that the configuration is complete, a test can be executed. To run any Unscrambl Drive application, the shell where the application is going to run must be configured by invoking the environment_setter script (<drive-install-location>/unscrambl/bin/environment_setter):
$ source ./environment_setter
Once the script is source’d, several environment variables are configured and the Python virtualenv environment is activated, indicating that we are ready to start an Unscrambl Drive application:
[drive]|$ env | sort | grep UNSCRAMBL
UNSCRAMBL_ARCH=ppc64le
UNSCRAMBL_BUILD_ARCH=ppc64le
UNSCRAMBL_HOME=/opt/unscrambl/drive/rhel07ppc64le/unscrambl
UNSCRAMBL_JAVA_PATH=/usr/lib/jvm/java-ibm
UNSCRAMBL_OS=rhel07ppc64le
UNSCRAMBL_PACKAGE_VERSION=2.1.0
UNSCRAMBL_PYTHON_PATH=/opt/unscrambl/drive/rhel07ppc64le/pyenv
UNSCRAMBL_READLINK=readlink
UNSCRAMBL_TOMCAT_PATH=/opt/unscrambl/drive/noarch/apache-tomcat-8.5.15
The preparation of the shell must be made for every shell and session where an Unscrambl Drive application will run.
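Since the preparation has to be repeated in every shell, a quick check that it actually happened can save a confusing failure later. The sketch below tests a handful of the variables from the listing above; which variables are strictly required is not stated in this guide, so the list is illustrative, and missing_unscrambl_vars is a hypothetical helper.

```shell
# Print the name of each expected UNSCRAMBL_* variable that is unset or empty
# in the current shell. The variable list is an illustrative selection.
missing_unscrambl_vars() {
    for name in UNSCRAMBL_HOME UNSCRAMBL_JAVA_PATH UNSCRAMBL_PYTHON_PATH UNSCRAMBL_TOMCAT_PATH; do
        # Indirect lookup of the variable named by $name (POSIX-compatible).
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
    return 0
}
```

An empty result from missing_unscrambl_vars suggests the environment script was sourced in this shell; any names printed point to a shell that still needs to be prepared.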
Now, we can run a test application (located in <drive-install-location>/unscrambl/bin):
[drive]|$ cd unscrambl/bin
[drive]|$ ./installation_tester -c -sns
INFO: this application is using '/tmp/<user>/unscrambl' to output and store its
code-generated assets...
.
.
Note that we are invoking the test application installation_tester using the -c flag to indicate that the application should be re-built (in case it has been executed before), thus ensuring that a fresh version, based on the current installation, is executed. Normally, Unscrambl Drive applications are only rebuilt when needed and taking such a precaution is not necessary.
Configuring Unscrambl Drive’s Runtime Environment¶
The Unscrambl Drive runtime environment can be configured based on the specific performance and reliability requirements of a customer deployment. These runtime environment settings are specified in the following file: $UNSCRAMBL_HOME/etc/runtime_environment.json.
An important runtime configuration is the persistence behavior of PinPoint, which is the Redis-based distributed in-memory store used by Drive to maintain subscriber profiles. As a primary mechanism, PinPoint relies on append-only logging to persist its data. As an additional mechanism, it also periodically checkpoints its data to disk.
The checkpointing can be configured using the pinPoint|server|checkpointConfiguration|schedules setting inside the runtime_environment.json file. The default checkpoint configuration is given as follows:
"schedules":
[
{
"operationCount": 1,
"timeInSeconds": 86400
},
{
"operationCount": 1000000,
"timeInSeconds": 3600
}
]
This configuration indicates that the state is checkpointed every hour if at least 1 million profiles are updated since the last checkpoint, and every day if at least one profile is updated since the last checkpoint. More schedules can be added or the existing schedules can be updated, as needed. Since the append-only logging already provides persistence, specifying frequent checkpointing is unnecessary and can adversely affect the CPU usage of the PinPoint servers.
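The way the schedules combine can be sketched as a simple rule: a checkpoint is due as soon as any one schedule has both its time threshold and its update-count threshold met. The code below illustrates that reading with the default values; it is an interpretation of the description above, modeled on how Redis snapshot rules are usually described, not PinPoint's actual implementation, and the names are hypothetical.

```shell
# Default schedules written as "operationCount:timeInSeconds" pairs.
SCHEDULES="1:86400 1000000:3600"

# Return 0 when a checkpoint is due.
# Arguments: seconds since the last checkpoint, updates since the last checkpoint.
checkpoint_due() {
    elapsed=$1
    updates=$2
    for schedule in $SCHEDULES; do
        min_ops=${schedule%:*}     # part before the colon
        min_secs=${schedule#*:}    # part after the colon
        if [ "$elapsed" -ge "$min_secs" ] && [ "$updates" -ge "$min_ops" ]; then
            return 0
        fi
    done
    return 1
}
```

Under this reading, checkpoint_due 3600 1000000 and checkpoint_due 86400 1 both succeed, while checkpoint_due 3600 999999 does not, because neither schedule has both thresholds met.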